Methods for OpenCSR

Our code is available at this GitHub repo.

We show how to run four retrieval approaches to the OpenCSR task: BM25 (off-the-shelf), DPR (EMNLP 2020), DrKIT (ICLR 2020), and DrFact (ours, NAACL 2021). We also cover a Concept Re-ranker, which boosts performance by learning with cross-attention.

Note that these four methods depend on one another:

  • training the DPR model needs the results from BM25 (to create training data);
  • DrFact needs to reuse DPR’s fact index and single-hop results (for creating distant supervision);
  • DrFact and DrKIT share many utility functions (sparse matrix operations and indexing scripts).

Detailed instructions are given on the individual pages.
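The dependencies listed above imply an order in which the methods can be run. The following is a minimal sketch of that ordering, not part of the released code: the `DEPENDS_ON` table and the `run_order` helper are hypothetical names introduced here for illustration, and the sketch assumes Python 3.9 or later for the standard-library `graphlib` module. DrKIT is listed without prerequisites because the text above only says it shares utility code with DrFact.

```python
from graphlib import TopologicalSorter

# Hypothetical table: each method maps to the methods whose outputs it needs.
# DPR needs BM25 results for training data; DrFact reuses DPR's fact index
# and single-hop results. No prerequisite is stated for BM25 or DrKIT.
DEPENDS_ON = {
    "BM25": set(),
    "DPR": {"BM25"},
    "DrKIT": set(),
    "DrFact": {"DPR"},
}


def run_order(depends_on):
    """Return one valid order in which the methods can be run."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Prints one valid order; methods with no mutual dependency
    # (e.g. DrKIT) may appear at different positions.
    print(" -> ".join(run_order(DEPENDS_ON)))
```

Any order returned places BM25 before DPR and DPR before DrFact, which matches the dependencies described above.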

Folder structure

  • drfact_data/
    • datasets/ (download from here)
    • knowledge_corpus/ (download from here)
  • baseline_methods/
    • BM25/
    • DPR/
    • MCQA/ (i.e., Concept Re-ranker)
  • language-master/language/labs/
    • drkit/ (common modules for DrKIT and DrFact)
    • drfact/ (for running DrFact)
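Before running any of the methods, it may help to confirm that the layout above is in place. The sketch below is an illustrative check, not a script shipped with the project: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and it assumes all three top-level folders sit under a single root directory (the current directory by default).

```python
from pathlib import Path

# Directories taken from the folder structure listed above.
EXPECTED_DIRS = [
    "drfact_data/datasets",
    "drfact_data/knowledge_corpus",
    "baseline_methods/BM25",
    "baseline_methods/DPR",
    "baseline_methods/MCQA",
    "language-master/language/labs/drkit",
    "language-master/language/labs/drfact",
]


def missing_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories are present.")
```

An empty result means every folder in the list was found; anything reported as missing still needs to be downloaded or cloned.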

Comparison of the four methods