Model Submission

Automatic Metrics

Summary Length: the number of words in the summary.
Novelty: the percentage of summary words that do not appear in the document.
Compression Ratio: the ratio of the number of words in the article to the number of words in the summary.
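
To make these three statistics concrete, here is a minimal sketch that computes them with a plain whitespace tokenizer; the exact tokenization used for the leaderboard may differ.

```python
def word_stats(document: str, summary: str) -> dict:
    """Word-level statistics; whitespace tokenization is an assumption."""
    doc_words = document.split()
    sum_words = summary.split()
    doc_vocab = set(doc_words)
    novel = [w for w in sum_words if w not in doc_vocab]
    return {
        "summary_length": len(sum_words),                        # words in summary
        "novelty": 100.0 * len(novel) / max(len(sum_words), 1),  # % words not in document
        "compression_ratio": len(doc_words) / max(len(sum_words), 1),
    }
```
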
ROUGE: We use the Python implementation provided by Google Research.
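
The Google Research implementation is distributed on PyPI as `rouge-score`; a minimal usage sketch is shown below (the ROUGE variants and stemming flag are illustrative choices, not necessarily the exact configuration used here).

```python
# pip install rouge-score
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
prediction = "a cat was sitting on the mat"

# ROUGE-1/2/L with stemming; scorer.score(target, prediction) returns
# precision/recall/F1 for each requested variant.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, prediction)
print(scores["rougeL"].fmeasure)
```
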
Factual Consistency: We compute this at two levels, following Nan et al., 2021.
  • Entity-level: the percentage of named entities in the summary that are also found in the document. Partial entities are matched to their longer counterparts in the document when they share parts of the entity (see the sketch after this list).
  • Relation-level: the percentage of relations in the summary (extracted using Stanford OpenIE) that are also found in the document. Since the reference is treated as just another model, we only compute precision with respect to the source document.
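
A rough sketch of the entity-level precision follows; spaCy and its `en_core_web_sm` model, as well as the substring-based partial match, are assumptions for illustration. The relation-level variant would analogously compare relation triples extracted by Stanford OpenIE from the summary against those from the document.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed NER model, not necessarily the one used here

def entity_precision(document: str, summary: str) -> float:
    """Fraction of summary entities supported by the document (exact or partial match)."""
    doc_ents = {e.text.lower() for e in nlp(document).ents}
    sum_ents = [e.text.lower() for e in nlp(summary).ents]
    if not sum_ents:
        return 1.0  # nothing to verify
    supported = sum(
        1 for ent in sum_ents
        # exact match, or a partial match against a longer document entity
        if ent in doc_ents or any(ent in d or d in ent for d in doc_ents)
    )
    return supported / len(sum_ents)
```
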
N-gram Abstractiveness: We compute n-gram abstractiveness up to 4-grams, following Gehrmann et al., 2019. It is a normalized novelty score that tracks which parts of the summary are already covered by the n-grams it shares with the document.
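
One simplified reading of this idea, assuming whitespace tokenization: mark every summary token covered by a 1- to 4-gram that also appears in the document, and report the fraction of uncovered (novel) tokens. The sketch below follows that reading and is not the exact scoring script.

```python
def ngram_abstractiveness(document: str, summary: str, max_n: int = 4) -> float:
    """Fraction of summary tokens not covered by any n-gram (n <= max_n) shared
    with the document; a simplified interpretation of Gehrmann et al., 2019."""
    doc_toks, sum_toks = document.split(), summary.split()
    doc_ngrams = {
        tuple(doc_toks[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(doc_toks) - n + 1)
    }
    covered = [False] * len(sum_toks)
    for n in range(1, max_n + 1):
        for i in range(len(sum_toks) - n + 1):
            if tuple(sum_toks[i:i + n]) in doc_ngrams:
                for j in range(i, i + n):
                    covered[j] = True
    return sum(not c for c in covered) / max(len(sum_toks), 1)
```
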