TL;DR: The Abstractive Summarization Challenge
With the vast amount of information generated online, summarization has long helped users
quickly decide whether they need to read lengthy articles. The problem is exacerbated on
social media platforms, where users write informally with little concern for style, structure,
or grammar. In this challenge, participants will generate summaries specifically for social
media posts.
For the first time, we have created one of the largest datasets for neural summarization (the Webis-TLDR-17 corpus) by mining Reddit, an online discussion forum that encourages authors to end their posts with a "TL;DR" - a short, paraphrased summary. We invite participants to develop and test their neural summarization models on our dataset to generate such TL;DRs; as an incentive, we provide a free, extensive qualitative evaluation of your models through dedicated crowdsourcing.
November 5th, 2018 : Training data available, competition begins. (Registrations closed)
March 15th, 2019 : Submission system open.
April 15th, 2019 : Leaderboard (TIRA) goes live.
June 1st, 2019 : Submission system closes (23:59 UTC); manual evaluation begins.
We provide a corpus of approximately 3 million content-summary pairs mined from Reddit.
It is up to the participants to split it into training and validation sets as they see fit.
Details about the corpus construction can be found in our publication:
TL;DR: Mining Reddit to Learn Automatic Summarization
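A minimal sketch of loading the corpus and creating a reproducible train/validation split is shown below. It assumes the corpus is distributed as one JSON object per line and that the field names are "content" and "summary"; check the downloaded data for the actual schema before relying on these names.

```python
import json
import random

def load_pairs(path):
    # Assumption: one JSON object per line, with "content" and
    # "summary" fields holding the post and its TL;DR.
    with open(path, encoding="utf-8") as f:
        for line in f:
            post = json.loads(line)
            yield post["content"], post["summary"]

def train_val_split(pairs, val_fraction=0.05, seed=42):
    # Deterministic shuffle so the split is reproducible across runs.
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - val_fraction))
    return pairs[:cut], pairs[cut:]
```

Holding out a fixed fraction with a fixed seed keeps the validation set stable while you iterate on models.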
To provide fast approximations of model performance, the public leaderboard will be updated with ROUGE scores. You will be able to self-evaluate your software using the TIRA service; you can find the user guide here. Additionally, a qualitative evaluation will be performed through crowdsourcing: human annotators will rate each candidate summary on five linguistic qualities suggested by the DUC guidelines.
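For a rough local self-check before submitting, ROUGE-1 F1 can be approximated from clipped unigram counts. This simplified sketch is not the official scorer: it omits stemming, sentence splitting, and the ROUGE-2/ROUGE-L variants the leaderboard may also report.

```python
from collections import Counter

def rouge1_f(candidate, reference):
    # Clipped unigram overlap between candidate and reference,
    # as in ROUGE-1 (whitespace tokenization, lowercased).
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Example: 5 of 6 unigrams overlap in both directions.
print(rouge1_f("the cat is on the mat", "the cat sat on the mat"))
```

Treat these numbers only as a sanity check; the authoritative scores come from the evaluation run on TIRA.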