Methods for OpenCSR
Please check our code at this GitHub repo.
We provide instructions for running four retrieval approaches to the OpenCSR task: BM25 (off-the-shelf), DPR (EMNLP 2020), DrKIT (ICLR 2020), and DrFact (ours, NAACL 2021), as well as a Concept Re-ranker that boosts performance by learning with cross-attention.
Note that these four methods depend on one another:
- training the DPR model needs the results from BM25 (to create training data);
- DrFact needs to reuse DPR’s fact index and single-hop results (for creating distant supervision);
- DrFact and DrKIT share many utility functions (sparse matrix operations and indexing scripts).
Detailed instructions for each method are given on the individual pages.
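The dependencies listed above imply a run order. As a minimal sketch (not part of the repository), the order can be derived with a topological sort from Python's standard `graphlib` module. The `PREREQUISITES` mapping and the `run_order` function are hypothetical names introduced for illustration. Only the chain BM25, then DPR, then DrFact comes from the text. DrKIT is assumed to have no prerequisite because none is stated.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping: each method -> the methods whose outputs it needs first.
# BM25 results are used to build DPR training data; DrFact reuses DPR's fact
# index and single-hop results. DrKIT only shares utility code with DrFact,
# so it is assumed here to have no prerequisite.
PREREQUISITES = {
    "BM25": set(),
    "DPR": {"BM25"},
    "DrKIT": set(),
    "DrFact": {"DPR"},
}


def run_order(prerequisites):
    """Return one valid order in which the methods can be run."""
    return list(TopologicalSorter(prerequisites).static_order())


order = run_order(PREREQUISITES)
print(" -> ".join(order))
```

Any order this produces places BM25 before DPR and DPR before DrFact. The position of DrKIT relative to the others is not constrained.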
Folder structure
- drfact_data/
- baseline_methods/
  - BM25/
  - DPR/
  - MCQA/ (i.e., Concept Re-ranker)
- language-master/language/labs/
  - drkit/ (common modules for DrKIT and DrFact)
  - drfact/ (for running DrFact)