Methods for OpenCSR

Please check our code at this GitHub repo.

We provide instructions for running four retrieval approaches to the OpenCSR task: BM25 (off-the-shelf), DPR (EMNLP 2020), DrKIT (ICLR 2020), and DrFact (ours, NAACL 2021), as well as a Concept Re-ranker that boosts performance by learning with cross-attention.

Note that these four methods depend on one another:

training the DPR model needs the results from BM25 (to create training data);
DrFact needs to reuse DPR’s fact index and single-hop results (for creating distant supervision);
DrFact and DrKIT share many utility functions (sparse matrix operations and indexing scripts).

Detailed instructions for each method are given on its individual page.
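The dependencies above imply an order in which the methods should be run. As a minimal sketch (not part of the released code), the order can be derived with a topological sort from Python's standard library. The `DEPENDS_ON` mapping and the `run_order` helper are hypothetical names introduced here for illustration. DrKIT is listed without prerequisites because, as described above, it shares utility code with DrFact rather than consuming its results.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping: each method -> the methods whose outputs it needs first.
DEPENDS_ON = {
    "BM25": set(),
    "DPR": {"BM25"},     # DPR training data is created from BM25 results
    "DrKIT": set(),      # shares utility code with DrFact, no result dependency
    "DrFact": {"DPR"},   # reuses DPR's fact index and single-hop results
}


def run_order(depends_on):
    """Return one valid order in which the methods can be run."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
print(order)
```

Any order returned this way places BM25 before DPR and DPR before DrFact; the position of DrKIT is unconstrained.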

Folder structure

drfact_data/
- datasets/ (download from here)
- knowledge_corpus/ (download from here)
baseline_methods/
- BM25/
- DPR/
- MCQA/ (i.e., Concept Re-ranker)
language-master/language/labs/
- drkit/ (common modules for DrKIT and DrFact)
- drfact/ (for running DrFact)
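Before running any method, it can help to confirm that the layout above is in place. The following is a small sketch under stated assumptions: it treats `drfact_data/`, `baseline_methods/`, and `language-master/` as sibling folders under one root, and the names `EXPECTED_DIRS` and `missing_dirs` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Folders taken from the structure listed above (assumed to share one root).
EXPECTED_DIRS = [
    "drfact_data/datasets",
    "drfact_data/knowledge_corpus",
    "baseline_methods/BM25",
    "baseline_methods/DPR",
    "baseline_methods/MCQA",
    "language-master/language/labs/drkit",
    "language-master/language/labs/drfact",
]


def missing_dirs(root="."):
    """Return the expected folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:")
        for d in missing:
            print(f"  {d}")
    else:
        print("Folder structure looks complete.")
```

Run it from the directory that contains the three top-level folders, or pass a different root to `missing_dirs`.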

Comparisons of the four methods

[Figure: comparison of the four methods]