With the vast amount of information generated online, summarization has long empowered users to quickly decide whether a lengthy article is worth reading. The problem is exacerbated on social media platforms, where users write informally with little concern for style, structure, or grammar. In this challenge, participants will generate summaries specifically for social media posts.

We have created one of the largest datasets for neural summarization to date, the Webis-TLDR-17 Corpus, by mining Reddit, an online discussion forum that encourages authors to append a "TL;DR" (a short, paraphrased summary) to their posts. We invite participants to develop and test their neural summarization models on this dataset to generate such TL;DRs; as an incentive, we provide a free, extensive qualitative evaluation of submitted models through dedicated crowdsourcing.
For a quick overview of the state of the art in neural summarization, refer to the RELATED WORK page.


November 5th, 2018 : Training data available, competition begins.
March 15th, 2019 : Submission system and public leaderboard open. (Deadline extended)
May 1st, 2019 : Submission system closes (23:59 UTC), manual evaluation begins.

Get started

We provide a corpus of approximately three million content-summary pairs mined from Reddit. Participants are free to split it into training and validation sets as they see fit. Details about the corpus construction can be found in our publication:
TL;DR: Mining Reddit to Learn Automatic Summarization

Register · Download the corpus (Zenodo)


To provide fast approximations of model performance, the public leaderboard will be updated with ROUGE scores. You will be able to self-evaluate your software using the TIRA service; the user guide is available here. Shortly after the start of the competition, we will provide a script with which participants can perform this self-evaluation and report F1 scores for ROUGE-1, ROUGE-2, and ROUGE-L (longest common subsequence).
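For intuition about what the leaderboard measures, the following is a simplified sketch of ROUGE-N F1, the n-gram overlap between a candidate and a reference summary. The official evaluation script may apply stemming and different tokenization, so treat this only as a rough local check.

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1: n-gram overlap of candidate vs. reference.

    Tokenization here is plain whitespace splitting on lowercased text;
    the official evaluation may tokenize and stem differently.
    """
    def ngrams(text, n):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n_f1("the cat sat", "the cat sat")` yields 1.0, while disjoint summaries score 0.0; ROUGE-2 (`n=2`) penalizes word-order differences more strongly than ROUGE-1.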

Additionally, a qualitative evaluation will be performed through crowdsourcing. Human annotators will rate each candidate summary according to five linguistic qualities as suggested by the DUC guidelines.


When you register, you will get remote access to a virtual machine (Windows or Linux) in which to deploy your software. The software must be executable from the command line and must not require Internet access during the evaluation period.
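A command-line entry point for such a submission might look like the sketch below. The `--input`/`--output` flag names and the one-post-per-line format are hypothetical choices for illustration; the TIRA setup may prescribe its own interface. The trivial lead-sentence baseline stands in for your trained model's offline inference.

```python
import argparse

def summarize(text):
    """Placeholder for model inference: a trivial lead baseline that
    returns the first sentence. Replace with your trained model's
    prediction; it must run without Internet access."""
    return text.split(".")[0].strip()

def main():
    parser = argparse.ArgumentParser(description="Generate TL;DRs for posts")
    parser.add_argument("--input", required=True,
                        help="file with one post per line (hypothetical format)")
    parser.add_argument("--output", required=True,
                        help="file to write one summary per line")
    args = parser.parse_args()
    with open(args.input, encoding="utf-8") as fin, \
         open(args.output, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(summarize(line) + "\n")

# In your submission script, invoke it with:
# if __name__ == "__main__":
#     main()
```

Keeping model loading inside `main()` (rather than at import time) makes the script easier to test and keeps startup failures visible on the command line.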

NOTE: You retain full copyright of your submitted software. You agree to grant us usage rights to evaluate the data generated by your models; we agree not to share your model with third parties or use it for any purpose other than research. The generated summaries will, however, be shared with a crowdsourcing platform for evaluation.

Shahbaz Syed

Universität Leipzig

Michael Völske

Bauhaus-Universität Weimar

Martin Potthast

Universität Leipzig

Nedim Lipka

Adobe Research, San Jose

Benno Stein

Bauhaus-Universität Weimar

Hinrich Schütze

Ludwig-Maximilians-Universität, Munich