This chapter describes text summarization techniques and evaluation methods that have been proposed in the literature and discusses the application of text summarization in digital libraries. First, it introduces the history of automatic text summarization and the main types of summaries. Next, it reviews approaches that have been used for single-document and multidocument summarization. Then, it describes the major approaches for evaluating generated summaries. Finally, it outlines the principal trends in automatic text summarization. The chapter aims to give the reader a clear overview of the text summarization field and to facilitate the application of text summarization in digital libraries.
Background and Types of Summaries
Research in automatic text summarization has a history of almost 50 years, dating from the earliest attempt by Luhn (1958). However, there was little work and slow progress in the first 30 years. In the 1990s, the information explosion on the World Wide Web made automatic text summarization crucial for reducing information overload, and this brought about its renaissance. Summarization can serve different purposes and different users, and thus various types of summaries have been developed.
Depending on the summarization method, a summary can be an extract (produced by sentence extraction) or an abstract (produced by an abstraction process). Using Mani’s (2001a, p. 6) definition:
Since extracts are much easier to construct automatically than abstracts, which require more complex techniques such as rephrasing and paraphrasing, extracts are generally used in current digital library systems.
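To illustrate why extracts are comparatively easy to construct automatically, the following is a minimal sketch of sentence extraction based on word frequency. It is not the method of any particular system discussed in this chapter; the function names, the tiny stopword list, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "are", "was"}

def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase a sentence and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS]

def extract_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; ties keep their original order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]
```

Because the output consists of sentences copied verbatim from the source, no language generation is needed. An abstract, by contrast, would require rephrasing or paraphrasing the selected content, which this sketch does not attempt.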
With reference to the content and intended use, a summary can be indicative, informative, or evaluative (Borko & Bernier, 1975):
An indicative summary provides an indication of what the original document is about. It can help users to determine whether the original document is worth reading or not, but users have to consult the original for details.
An informative summary reflects the content of the original document and represents the content in a concise way. It can be used as a substitute for the original document so that users do not need to read the original.
An evaluative or critical summary not only contains the main topics of the original document but also provides the abstractor’s comments on the document content.
Indicative summaries are more commonly used in current digital library systems to help users identify documents of interest. Informative summaries, on the other hand, are more often used for news articles to inform users about news events, as in Columbia’s Newsblaster.
Depending on the purpose and intended users, a summary can be generic or user-focused (Mani, 2001):
Key Terms in this Chapter
Single-document Summarization: The process of representing the main content of one document.
Multidocument Summarization: The process of representing the main content of a set of related documents on a topic, instead of only one document.
Text Summarization: The process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user and task.
Summary: A brief representation of the main content of a source (or sources).
Abstraction: A summarization approach that rephrases or paraphrases the main content of a document to form a summary.
Extraction: A kind of summarization approach that extracts important pieces of information (typically sentences) from the input to form a summary.
Extrinsic Evaluation: An evaluation that assesses the quality of summaries indirectly, through users’ performance on tasks carried out with the summaries.
Intrinsic Evaluation: An evaluation that assesses the quality of summaries through direct human judgment against criteria (e.g., grammaticality, conciseness, and readability) or by comparison with “ideal” summaries.
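As a rough illustration of the comparison with “ideal” summaries mentioned under intrinsic evaluation, the sketch below measures word overlap between a generated summary and a single ideal summary. It is a simplified unigram-overlap measure written for illustration, not a specific published metric; the function names are assumptions.

```python
import re
from collections import Counter

def unigrams(text):
    """Count lowercase word tokens in a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def unigram_overlap(candidate, ideal):
    """Return (precision, recall) of candidate words against the ideal summary."""
    cand, ref = unigrams(candidate), unigrams(ideal)
    # Multiset intersection: each word counts at most as often as in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall

# Example: every candidate word appears in the ideal summary (precision 1.0),
# but the candidate covers only half of the ideal summary's words (recall 0.5).
precision, recall = unigram_overlap("the cat sat", "the cat sat on the mat")
```

A measure of this kind captures only content overlap. Criteria such as grammaticality and readability still call for direct human judgment.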