TL;DR: The Abstractive Summarization Challenge

Task

With the vast amount of information generated online, summarization empowers users to quickly decide whether a lengthy article is worth reading in full. The problem is exacerbated on social media platforms, where users write informally with little concern for style, structure, or grammar. In this challenge, participants generate summaries specifically for social media posts.

We have created one of the largest datasets for neural summarization to date, the Webis-TLDR-17 Corpus, by mining Reddit, an online discussion forum whose conventions encourage post authors to provide a "TL;DR": a short, paraphrased summary of their own post. We invite participants to develop and test their neural summarization models on this dataset to generate such TL;DRs; as an incentive, we provide a free, extensive qualitative evaluation of your models through dedicated crowdsourcing.

Dates

November 5th, 2018: Training data available, competition begins. (Registration closed)
March 15th, 2019: Submission system opens.
April 15th, 2019: Leaderboard available (TIRA).
June 1st, 2019: Submission system closes (23:59 UTC), manual evaluation begins.

Data

We provide a corpus consisting of approximately 3 million content-summary pairs mined from Reddit. Participants are free to split it into training and validation sets as they see fit; a minimal loading sketch is shown below the download link. Details about the corpus construction can be found in our publication:
TL;DR: Mining Reddit to Learn Automatic Summarization

Zenodo Download
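The following is a minimal sketch of how one might load the corpus and create a training/validation split. It assumes the corpus is distributed as a JSON-lines file named corpus-webis-tldr-17.json with "content" and "summary" fields; please check the Zenodo release for the exact file name and schema.

    import json
    import random

    # Assumed file name and field names; verify against the Zenodo release
    # of the Webis-TLDR-17 corpus before running.
    CORPUS_PATH = "corpus-webis-tldr-17.json"

    def load_pairs(path):
        """Yield (content, summary) pairs from a JSON-lines corpus file."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                post = json.loads(line)
                yield post["content"], post["summary"]

    pairs = list(load_pairs(CORPUS_PATH))

    # Simple 95/5 split into training and validation sets;
    # participants may of course choose any split they prefer.
    random.seed(42)
    random.shuffle(pairs)
    cut = int(0.95 * len(pairs))
    train_pairs, valid_pairs = pairs[:cut], pairs[cut:]

    print(f"{len(train_pairs)} training pairs, {len(valid_pairs)} validation pairs")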

Evaluation

To provide fast approximations of model performance, the public leaderboard will be updated with ROUGE scores. You will be able to self-evaluate your software using the TIRA service; the user guide is available here. A sketch of a local ROUGE check is given below. Additionally, a qualitative evaluation will be performed through crowdsourcing: human annotators will rate each candidate summary on five linguistic qualities, as suggested by the DUC guidelines.
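As a rough illustration, here is one way to compute ROUGE scores locally before submitting. This sketch uses the third-party rouge-score package, which is not mandated by the challenge and may differ slightly from the leaderboard's scoring setup; the reference and candidate texts are placeholders.

    # pip install rouge-score   (third-party package, not part of the challenge tooling)
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    reference = "the author asks how to split the corpus for training"  # gold TL;DR (placeholder)
    candidate = "author asks how to split corpus into training sets"    # model output (placeholder)

    scores = scorer.score(reference, candidate)
    for name, score in scores.items():
        print(f"{name}: precision={score.precision:.3f} "
              f"recall={score.recall:.3f} f1={score.fmeasure:.3f}")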

Task Committee