Model Submission
- Download the test set for a dataset (a single text file with one article per line):
CNN/Daily Mail,
XSum, or
Webis TL;DR
- Generate summaries with your model in the exact order of the articles in the test file to avoid alignment errors with the hosted models.
- Send us the output file (a single text file with one summary per line) at
shahbaz.syed[at]uni-leipzig.de or tariq.yousef[at]uni-leipzig.de.
Automatic Metrics
Summary Length: The number of words in the summary.
Novelty: The percentage of summary words that do not appear in the document.
Compression Ratio: The ratio of the number of words in the article to the number of words in the summary.
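The three surface metrics above can be sketched in a few lines of Python. This is an illustrative sketch only: whitespace tokenization and lowercasing are assumptions, and the leaderboard's actual tokenizer may differ.

```python
def summary_length(summary: str) -> int:
    """Number of words in the summary (whitespace tokenization assumed)."""
    return len(summary.split())

def novelty(document: str, summary: str) -> float:
    """Percentage of summary words that do not appear in the document."""
    doc_words = set(document.lower().split())
    summ_words = summary.lower().split()
    if not summ_words:
        return 0.0
    novel = sum(1 for w in summ_words if w not in doc_words)
    return 100.0 * novel / len(summ_words)

def compression_ratio(document: str, summary: str) -> float:
    """Word ratio between the article and the summary."""
    return len(document.split()) / max(len(summary.split()), 1)

doc = "the cat sat on the mat in the sun"
summ = "the cat relaxed outside"
print(summary_length(summ))          # 4
print(novelty(doc, summ))            # 50.0
print(compression_ratio(doc, summ))  # 2.25
```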
Factual Consistency: Computed on two levels, inspired by
Nan et al.,
2021.
- Entity-level: the percentage of named entities in the summary that are found in the document. Partial entities are also matched to their longer counterparts in the document when they share a part of the entity.
- Relation-level: the percentage of relations (extracted using
Stanford OpenIE) in the summary that are found in the document.
Since the reference summary is also treated as a model, we compute only the precision with respect to the
source document.
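The entity-level matching can be sketched as below. Entity extraction is stubbed out with precomputed entity lists; a real pipeline would run a named-entity recognizer on both texts. The partial-matching rule shown (an entity counts as present if it shares a token with a document entity) is an assumption, a simplification of the described behavior.

```python
def entity_precision(summary_entities, document_entities):
    """Percentage of summary entities found in the document, where a
    partial entity (e.g. "Obama") also matches a longer document
    counterpart (e.g. "Barack Obama") via a shared token."""
    if not summary_entities:
        return 100.0
    doc_entities_lower = {e.lower() for e in document_entities}
    doc_tokens = set()
    for ent in document_entities:
        doc_tokens.update(ent.lower().split())
    matched = 0
    for ent in summary_entities:
        tokens = ent.lower().split()
        # exact match, or any token shared with a document entity
        if ent.lower() in doc_entities_lower or any(t in doc_tokens for t in tokens):
            matched += 1
    return 100.0 * matched / len(summary_entities)

doc_ents = ["Barack Obama", "White House"]
summ_ents = ["Obama", "Angela Merkel"]
print(entity_precision(summ_ents, doc_ents))  # 50.0
```

"Obama" matches via the shared token with "Barack Obama", while "Angela Merkel" has no counterpart in the document, giving a precision of 50%.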
N-gram Abstractiveness:
We compute n-gram abstractiveness up to 4-grams following
Gehrmann et al., 2019. It is a normalized novelty score that tracks the parts of a summary that are not already covered by the n-grams it shares with the document.
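A minimal sketch of this computation: mark every summary token covered by an n-gram that also occurs in the document, and report the fraction of uncovered tokens. Tokenization and the exact normalization in Gehrmann et al., 2019 may differ; this only illustrates the idea.

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_abstractiveness(document: str, summary: str, n: int) -> float:
    """Fraction of summary tokens not covered by any n-gram shared
    with the document (whitespace tokenization assumed)."""
    doc_tokens = document.lower().split()
    summ_tokens = summary.lower().split()
    if not summ_tokens:
        return 0.0
    doc_ngrams = ngrams(doc_tokens, n)
    covered = [False] * len(summ_tokens)
    for i in range(len(summ_tokens) - n + 1):
        if tuple(summ_tokens[i:i + n]) in doc_ngrams:
            for j in range(i, i + n):
                covered[j] = True
    return 1.0 - sum(covered) / len(summ_tokens)

doc = "the quick brown fox jumps over the lazy dog"
summ = "the quick brown fox rests"
for n in range(1, 5):  # up to 4-grams
    print(n, ngram_abstractiveness(doc, summ, n))
```

Here only "rests" is never covered, so the score is 0.2 at every order from 1 to 4; summaries that copy long spans score low, while heavily paraphrased summaries score high.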