Wikipedia Pre-train Pairs Dataset

Published:

[Paper] [Code] [Dataset] [Slides] [Poster] [Bib] [Video]

This repository contains the 542,192 data pairs used for the Wikipedia fine-tuning stage. The data folder holds 166 JSON files of graph-to-text pairs covering the 15 categories that appear in the WebNLG dataset: Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam, WrittenWork, Athlete, Artist, City, MeanOfTransportation, CelestialBody, and Politician. Detailed information can be found here.
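
As a quick sanity check after downloading, the files can be walked and their entries counted. The snippet below is a minimal sketch, not part of the released code: the helper name `count_entries` is hypothetical, the folder is assumed to be named `data`, and each file is assumed to hold a JSON list or object at the top level. The actual schema of the pairs is not described here, so the sketch does not inspect individual fields.

```python
import json
from pathlib import Path


def count_entries(data_dir):
    """Map each JSON file name in data_dir to its number of top-level entries."""
    counts = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            content = json.load(f)
        # Assumption: a file is a JSON list or object of pairs.
        # Anything else (a bare string or number) is counted as one entry.
        if isinstance(content, (list, dict)):
            counts[path.name] = len(content)
        else:
            counts[path.name] = 1
    return counts
```

Calling `count_entries("data")` and summing the values gives a total that can be compared with the 542,192 pairs stated above, provided each top-level entry corresponds to one pair. The number of keys in the result should likewise match the 166 files.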