With the vast amount of information generated online, summarization has long empowered users to quickly decide whether a lengthy article is worth reading. The problem is exacerbated on social media platforms, where users write informally with little concern for style, structure, or grammar. In this challenge, participants will generate summaries specifically for social media posts.
We have created one of the largest datasets for neural summarization to date (the Webis-TLDR-17 corpus) by mining Reddit, an online discussion forum that encourages authors to append a "TL;DR" (a short, paraphrased summary) to their posts. We invite participants to develop and test their neural summarization models on our dataset to generate such TLDRs; as an incentive, we provide a free, extensive qualitative evaluation of your models through dedicated crowdsourcing.
For a quick overview of the state of the art in neural summarization, refer to this related work.
November 5th, 2018 : Training data available, competition begins.
March 15th, 2019 : Submission system and public leaderboard open. (Deadline extended)
May 1st, 2019 : Submission system closes (23:59 UTC); manual evaluation begins.
We provide a corpus of approximately 3 million content-summary pairs mined from Reddit. Participants are free to split it into training and validation sets as they see fit. Details about the corpus construction can be found in our publication:
TL;DR: Mining Reddit to Learn Automatic Summarization
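A minimal sketch of preparing the corpus for training, assuming a JSON-lines file in which each record carries `content` and `summary` fields (the field and file names here are assumptions; adjust them to match the actual release):

```python
import json
import random

def load_pairs(path):
    """Read (content, summary) pairs from a JSON-lines corpus file.

    The field names "content" and "summary" are assumptions about the
    corpus layout, not a documented schema.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            pairs.append((record["content"], record["summary"]))
    return pairs

def train_val_split(pairs, val_fraction=0.05, seed=42):
    """Shuffle deterministically and split into training/validation sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when comparing model variants on the same held-out set.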
To provide fast approximations of model performance, the public leaderboard will be updated with ROUGE scores. You will be able to self-evaluate your software using the TIRA service; you can find the user guide here. Shortly after the start of the competition, we will provide a script with which participants can perform self-evaluation and report F1 scores for ROUGE-1, ROUGE-2, and ROUGE-L.
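For intuition about what the leaderboard measures, here is an illustrative ROUGE-N F1 computation over whitespace tokens. It is only a rough approximation of the official script, which may tokenize, stem, and handle multiple references differently:

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 between two texts via simple n-gram overlap counts.

    Illustrative only: whitespace tokenization, no stemming, single
    reference. The official evaluation script may differ in all three.
    """
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

ROUGE-L differs in that it scores the longest common subsequence rather than fixed-length n-grams, rewarding in-order matches that need not be contiguous.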
Additionally, a qualitative evaluation will be performed through crowdsourcing. Human annotators will rate each candidate summary according to five linguistic qualities as suggested by the DUC guidelines.
When you register, you will get remote access to a virtual machine (Windows or Linux) in which to deploy your task software. Your software must be executable from the command line and must not require Internet access during the evaluation period.
NOTE: By submitting your software, you retain full copyright. You grant us usage rights to evaluate the data generated by your models; we agree not to share your model with third parties or to use it for any purpose other than research. The generated summaries will, however, be shared with a crowdsourcing platform for evaluation.
Adobe Research, San Jose