- Download the test set for a dataset (a single text file with one article per line).
- Generate summaries with your model in the exact order of the articles in the test set, to avoid alignment errors with the hosted models.
- Send us the output file with summaries (a single text file with one summary per line) to
shahbaz.syed[at]uni-leipzig.de or tariq.yousef[at]uni-leipzig.de.
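
The file format described in the steps above can be produced with a short script. The following is a minimal sketch, not an official submission tool: the function name `write_summaries` and the `summarize` callable (standing in for your model) are hypothetical, and it assumes the test set is UTF-8 text with exactly one article per line.

```python
def write_summaries(articles_path, output_path, summarize):
    """Write one summary per line, in the same order as the input articles.

    `summarize` is a hypothetical callable (article text -> summary text)
    that stands in for your own model.
    """
    count = 0
    with open(articles_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            article = line.rstrip("\n")
            # Collapse all whitespace (including newlines) so that each
            # summary occupies exactly one line of the output file.
            summary = " ".join(summarize(article).split())
            dst.write(summary + "\n")
            count += 1
    # Line i of the output corresponds to line i of the test set.
    return count
```

Blank input lines are deliberately not skipped, so the output always has as many lines as the test set and the line-by-line alignment is preserved.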
Summary Length: The number of words in the summary.
Novelty: The percentage of summary words that are not in the document.
Compression Ratio: The word ratio between the article and the summary.
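
As a rough illustration of these three word-level measures, here is a minimal sketch. It assumes lowercased whitespace tokenization and that novelty is measured against the source document; the helper names are hypothetical, and the tokenizer actually used for the reported scores may differ.

```python
def tokenize(text):
    # Assumption: lowercased whitespace tokenization.
    return text.lower().split()


def summary_length(summary):
    """Number of words in the summary."""
    return len(tokenize(summary))


def novelty(document, summary):
    """Percentage of summary words that do not occur in the document."""
    doc_words = set(tokenize(document))
    words = tokenize(summary)
    if not words:
        return 0.0
    novel = sum(1 for word in words if word not in doc_words)
    return 100.0 * novel / len(words)


def compression_ratio(document, summary):
    """Number of document words divided by number of summary words."""
    length = summary_length(summary)
    if length == 0:
        raise ValueError("summary is empty")
    return len(tokenize(document)) / length


document = "the cat sat on the mat near the door"
summary = "a cat sat there"
print(summary_length(summary))               # 4
print(novelty(document, summary))            # 50.0
print(compression_ratio(document, summary))  # 2.25
```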
We compute this on two levels inspired by
- Entity-level: the percentage of named entities in the summary that are found in the document. We also match partial entities to their longer counterparts from the document if they share parts of the
- Relation-level: the percentage of relations (extracted using Stanford OpenIE) in the summary that are found in the document.
Since we also treat the reference as a model, we only compute the precision with respect to the document.
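
The entity-level and relation-level precision can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code itself: entities and relation triples are assumed to be already extracted (for example by a named entity recognizer and Stanford OpenIE), the partial-match rule (all tokens of a summary entity appear in a longer document entity) is an assumption, and the function names are hypothetical.

```python
def entity_in_document(summary_entity, document_entities):
    """True if the entity matches a document entity exactly or partially.

    Assumed partial-match rule: every token of the summary entity occurs
    in some document entity, e.g. "Merkel" matches "Angela Merkel".
    """
    summary_tokens = set(summary_entity.lower().split())
    if not summary_tokens:
        return False
    return any(
        summary_tokens <= set(doc_entity.lower().split())
        for doc_entity in document_entities
    )


def entity_precision(summary_entities, document_entities):
    """Percentage of summary entities that are found in the document."""
    if not summary_entities:
        return 0.0
    found = sum(
        1 for entity in summary_entities
        if entity_in_document(entity, document_entities)
    )
    return 100.0 * found / len(summary_entities)


def relation_precision(summary_relations, document_relations):
    """Percentage of summary (subject, relation, object) triples in the document.

    Assumption: triples match only if all three parts are equal after lowercasing.
    """
    if not summary_relations:
        return 0.0
    normalize = lambda triple: tuple(part.lower() for part in triple)
    doc_set = {normalize(triple) for triple in document_relations}
    found = sum(1 for triple in summary_relations if normalize(triple) in doc_set)
    return 100.0 * found / len(summary_relations)
```

For example, with document entities `["Angela Merkel", "Berlin"]` and summary entities `["Merkel", "Berlin", "Paris"]`, two of the three summary entities are matched, so the entity-level precision is two thirds.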
We compute the n-gram abstractiveness up to 4-grams following Gehrmann et al., 2019. It is a normalized novelty score that tracks the parts of a summary that are already among the n-grams it has in common with the document.
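
To give an intuition for n-gram based abstractiveness, the sketch below computes, for each n from 1 to 4, the fraction of summary n-grams that do not occur in the document. This is a deliberately simplified stand-in, not the normalized score of Gehrmann et al., 2019; the function names and the whitespace tokenization are assumptions.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_fractions(document, summary, max_n=4):
    """Map each n in 1..max_n to the fraction of novel summary n-grams.

    A value is None when the summary is too short to contain any n-gram
    of that size.
    """
    # Assumption: lowercased whitespace tokenization.
    doc_tokens = document.lower().split()
    summary_tokens = summary.lower().split()
    fractions = {}
    for n in range(1, max_n + 1):
        doc_ngrams = set(ngrams(doc_tokens, n))
        summary_ngrams = ngrams(summary_tokens, n)
        if not summary_ngrams:
            fractions[n] = None
            continue
        novel = sum(1 for gram in summary_ngrams if gram not in doc_ngrams)
        fractions[n] = novel / len(summary_ngrams)
    return fractions


scores = novel_ngram_fractions("the cat sat on the mat", "the cat sat on a mat")
print(scores[2])  # 0.4 (2 of the 5 summary bigrams are novel)
print(scores[3])  # 0.5 (2 of the 4 summary trigrams are novel)
```

A fully extractive summary scores 0 for every n, while a summary that shares no words with the document scores 1.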