Text Summarization and Its Types: A Literature Review

Namrata Kumari, Pardeep Singh
DOI: 10.4018/978-1-7998-4730-4.ch017

Abstract

Text summarization compresses an original text into a summary that conveys the same meaning and information as the original. A summarizer saves time and increases efficiency. This chapter gives a full overview of text summarizers, which can be categorized by methodology, function and target reader, dimension, and language. Considerable research has been conducted in the field of text summarization using different approaches. Accordingly, the chapter aims to provide an overview of how text summarizers work with different methods and to state their domain-oriented applications. Additionally, the authors discuss multilingual text summarization in detail. The chapter focuses on showing the effectiveness and shortcomings of text summarization approaches by comparing them.
Chapter Preview

Introduction

In the present era, few readers have time to go through a full document to grasp its meaning. There is therefore an immense need for automatic text summarization to save time and reduce effort. Text summarization helps in creating bulletins, headings, summaries, and brief descriptions, in finding the essential words, and so on. It produces a summary of a data set that retains the essential information without altering the actual meaning of the data. The need can be understood through an example: suppose someone wants to read the documents related to text mining in a vast, miscellaneous database. Reading all the documents one by one consumes much time, but with a list of all headings, the person can go directly to the materials related to text mining. Other examples include news headlines in a newspaper, the title of a book, and many more. Text summarization is essential because the massive growth in available information is costly to manage.

Abstractive and extractive text summarization are the two major categories of text summarization. Abstractive summarization re-creates the whole document in a few words or lines, which may include new words. Extractive summarization extracts the critical words or lines from the original document. To clarify the difference, consider an example: a man reading a document highlights the main sentences to remember the vital parts; afterwards, he rewrites the document with a pen to make notes. The highlighted sentences and the pen-written notes both describe the original document in brief but fall into different categories. The highlighted summary is an example of extractive summarization, and the pen-written summary is an example of abstractive summarization. Machines or tools used to create a summary are known as summarizers. These tools are language-dependent.

Summarizers take text data as input and produce a summary. If the tool produces a summary in one or more languages other than the original language, it is referred to as multilingual text summarization. The main idea of multilingual text summarization is to save time and reduce complexity. Earlier summarizers worked on a single document, but now many documents can be fed to the machine as input; such a tool is referred to as a multi-document summarizer. Tools are mainly designed with the target reader in mind: a summary can be indicative, informative, or query-focused. Indicative summaries give an idea of what the text is about without much content, while informative summaries provide a shortened version of the content. The necessary steps in creating a summary are:

  • a) feed text input to the machine in one or more languages;
  • b) pre-processing and feature extraction;
  • c) sentence selection and assembly;
  • d) summary generation in the desired language.

Figure 1 shows a generic architecture of text summarization. Text data is taken as input; important terms are extracted, and sentences are ranked on their basis; high-ranked sentences are selected and combined to form a summary, while low-ranked sentences are rejected; the output can be in one language (monolingual) or in different languages (multilingual).
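The extract, rank, and select flow of this generic architecture can be illustrated with a minimal sketch. It is not the method of any particular tool reviewed in the chapter: it assumes that "important terms" are simply the most frequent non-stop words, that a sentence is scored by the average frequency of its terms, and that the output stays in the input language. The function names (`split_sentences`, `tokenize`, `summarize`) and the tiny stop-word list are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive pre-processing: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)

    # Feature extraction: term frequency over the whole input.
    term_freq = Counter(w for s in sentences for w in tokenize(s))

    # Sentence ranking: average frequency of the terms in each sentence.
    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(term_freq[t] for t in tokens) / len(tokens)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )

    # Selection and assembly: keep the top sentences, reject the rest,
    # and restore document order so the summary reads coherently.
    selected = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in selected)
```

Calling `summarize(document, max_sentences=3)` on a plain-text string returns up to three of its sentences. Averaging the term frequencies, rather than summing them, is one possible design choice that keeps long sentences from winning on length alone.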

This chapter will help readers understand the basics of text summarization. To keep the chapter enjoyable and its concepts precise, a proper flow will be maintained. Real-life examples will be given to clarify the picture of text summarization, and pictorial explanations will make the chapter more interesting, notably for students. Different multilingual text summarization approaches will be discussed in detail, explaining the importance of each and the differences among them. Multilingual text summarization tools will be compared based on their accuracy (in percent), shortcomings, and effectiveness. Questions such as how accuracy is calculated and which parameters are used to calculate it will be covered in this chapter. In reviewing the tools, the dataset used will also be a significant point of comparison, since efficiency and accuracy depend on the dataset used. The trend of text summarization will help the reader learn about its past, present, and future.

Key Terms in this Chapter

Completeness: The summary comprises all the important information covered in the input data.

Summary: Brief description of the full document while preserving the full meaning.

Extractive: Selection of relevant sentences.

Intrinsic: Evaluation of a summary by calculating evaluation metrics.

Extrinsic: Evaluation that measures the summary on the basis of efficiency and acceptability.

Coherent: The summary should flow in the same way as the input data.

Multi-Document: Input consisting of more than one document file.

Precision: The positive predictive value, that is, the fraction of selected items that are relevant.

Abstractive: Paraphrasing of the data.

Redundant: Containing unwanted information; a summary should not be redundant.

Recall: The true-positive rate, that is, the fraction of relevant items that are selected.

Semantic: Relating to semantics, the branch of linguistics and logic concerned with meaning.
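As an illustration of the Precision and Recall entries above, the sketch below scores a machine summary against a human reference by counting shared words. This preview does not say which metric the chapter uses, so the word-overlap scheme (similar in spirit to ROUGE-1) and the function name `unigram_scores` are assumptions made for this example.

```python
import re
from collections import Counter


def unigram_scores(candidate, reference):
    """Word-overlap precision, recall, and F1 of a candidate summary.

    Assumption for this sketch: a "true positive" is a word occurrence
    that appears in both the candidate and the reference summary.
    """
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))

    # Multiset intersection: each shared word counts at most as often
    # as it occurs in both texts.
    overlap = sum((cand & ref).values())

    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For the candidate "the cat sat" and the reference "the cat sat on the mat", every candidate word is found in the reference (precision 1.0), but only three of the six reference words are covered (recall 0.5).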
