SciMON: Scientific Inspiration Machines Optimized for Novelty
[Paper] [Code/Dataset] [Slides] [Poster] [Bib]
This repository contains the datasets for the SciMON paper. The NLP dataset is based on 67,409 ACL Anthology papers from 1952 to 2022. The biomedical dataset is based on 5,704 papers from PubMed. The project data includes the following components:
data/local_context_dataset.zip: This archive contains the training, validation, and testing files for our task.
data/kg/*.json: The data/kg directory contains files that store the original Information Extraction (IE) results for all paper abstracts.
data/ct/*.csv: The data/ct directory contains files that represent the citation network for all papers.
data/gold_subset: This directory contains our gold annotation subsets.
data/biomedical.zip: This archive contains our biochemical datasets.
evaluation: This directory contains sample evaluation code.
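
The components listed above can be inspected with a few lines of standard-library Python. The following is only a minimal sketch: the helper names (`extract_archive`, `load_json_file`, `load_csv_rows`, `summarize_data`) are hypothetical and not part of this repository, and the internal layout of the files is not documented here, so the code makes no assumption about field names or column headers. It only assumes that each `data/kg/*.json` file is either a single JSON document or one JSON object per line, and that each `data/ct/*.csv` file is ordinary comma-separated text.

```python
import csv
import json
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unpack a .zip archive into dest_dir and return its member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def load_json_file(path):
    """Parse one JSON file.

    Assumption: the file is either a single JSON document or JSON Lines
    (one object per line); the actual format is not stated in the README.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-empty line separately.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_csv_rows(path):
    """Read a CSV file as a list of rows (lists of strings), header included."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def summarize_data(data_dir):
    """List the IE result files and count rows in each citation file."""
    data_dir = Path(data_dir)
    kg_files = sorted((data_dir / "kg").glob("*.json"))
    ct_files = sorted((data_dir / "ct").glob("*.csv"))
    return {
        "kg_files": [p.name for p in kg_files],
        "ct_rows": {p.name: len(load_csv_rows(p)) for p in ct_files},
    }
```

From the repository root, `extract_archive("data/local_context_dataset.zip", "data/local_context_dataset")` would unpack the task files (the destination folder name is an arbitrary choice), and `summarize_data("data")` would give a quick overview of what the `kg` and `ct` directories contain before any schema-specific processing.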