Wikipedia Pre-train Pairs Dataset
This repository contains 542,192 data pairs used for the Wikipedia fine-tuning stage. The data folder contains 166 JSON files, which hold graph-to-text pairs for the 15 categories that appear in the WebNLG dataset (Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam, WrittenWork, Athlete, Artist, City, MeanOfTransportation, CelestialBody, Politician). [Paper] [Code] [Dataset] [Slides] [Poster] [Bib]
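As a quick sanity check after downloading, the per-file pair counts can be tallied. The following is a minimal sketch, not the repository's own loader: it assumes each JSON file holds a top-level list with one pair per element, which the description above does not confirm, and the function name count_pairs and the directory name "data" are hypothetical.

```python
import json
from pathlib import Path


def count_pairs(data_dir):
    """Return {file name: number of records} for every *.json file in data_dir.

    Assumption (not confirmed by the dataset description): each file is a
    top-level JSON list with one graph-to-text pair per element.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            # Fail loudly if the assumed layout does not hold.
            raise ValueError(f"{path.name}: expected a top-level JSON list")
        counts[path.name] = len(records)
    return counts
```

If the assumed layout holds, len(count_pairs("data")) should be 166 and sum(count_pairs("data").values()) should be 542,192, matching the figures stated above.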

ReviewRobot Dataset
This dataset contains 8,110 paper-review pairs and a background KG built from 174,165 papers. It also contains information extraction results from SciIE and the various knowledge graphs built on those IE results. The detailed information can be found here. [Paper] [Dataset] [Bib]

This dataset currently gathers knowledge extraction results from 14,229 papers and 6,217 abstracts in Semantic Scholar’s CORD-19 dataset. Best Demo Award at NAACL-HLT 2021. [Paper] [KG]

PubMed Paper Reading Dataset
This dataset gathers 14,857 entities, 133 relations, and the entities' corresponding tokenized text from PubMed. It contains 875,698 training pairs, 109,462 development pairs, and 109,462 test pairs. [Paper] [Bib] [Dataset]

PubMed Term, Abstract, Conclusion, Title Dataset
This dataset gathers three types of pairs from PubMed: Title-to-Abstract (Training: 22,811/Development: 2,095/Test: 2,095), Abstract-to-Conclusion and Future work (Training: 22,811/Development: 2,095/Test: 2,095), and Conclusion and Future work-to-Title (Training: 15,902/Development: 2,095/Test: 2,095). Each pair contains an input and an output as well as the corresponding terms (from the original KB and link prediction results). [Paper] [Bib] [Dataset]

Wikipedia Person and Animal Dataset
This dataset gathers 428,748 person and 12,236 animal infoboxes with descriptions, based on the Wikipedia dump (2018/04/01) and Wikidata (2018/04/12). [Paper] [Bib] [Dataset]

ACL Title and Abstract Dataset
This dataset gathers 10,874 title and abstract pairs from the ACL Anthology Network (until 2016). [Paper] [Bib] [Dataset]


