A Model for Text Summarization

Rasim M. Alguliyev (Azerbaijan National Academy of Sciences, Institute of Information Technology, Baku, Azerbaijan), Ramiz M. Aliguliyev (Azerbaijan National Academy of Sciences, Institute of Information Technology, Baku, Azerbaijan), Nijat R. Isazade (Azerbaijan National Academy of Sciences, Institute of Information Technology, Baku, Azerbaijan), Asad Abdi (University of Malaya, Department of Artificial Intelligence, Kuala Lumpur, Malaysia) and Norisma Idris (University of Malaya, Department of Artificial Intelligence, Kuala Lumpur, Malaysia)
Copyright: © 2017 | Pages: 19
DOI: 10.4018/IJIIT.2017010104

Abstract

Text summarization is the process of creating a concise version of one or more documents that preserves their main content. To cover all topics and reduce redundancy in summaries, this paper proposes a two-stage sentence selection method for text summarization. In the first stage, the set of sentences is clustered with the k-means method to discover all topics. In the second stage, an optimal selection of sentences is made: from each cluster, salient sentences are selected according to their contribution to the topic (cluster) and their proximity to other sentences in the cluster, so as to avoid redundancy, until the specified summary length is reached. Sentence selection is modeled as an optimization problem, which is solved with an adaptive differential evolution algorithm that uses a novel mutation strategy. In tests on the benchmark DUC2001 and DUC2002 data sets, the ROUGE scores of the summaries produced by the proposed approach demonstrate its validity compared with traditional sentence selection methods and with the top three performing systems for DUC2001 and DUC2002.

Introduction

Text summarization has attracted increasing attention in recent years because of the large amounts of text data created on social networks, the web, and other information-centric applications such as e-libraries and e-government. The explosion of electronic documents has made it difficult for users to extract useful information from them: faced with so much information, users leave many relevant and interesting documents unread. The continuing growth of online text therefore makes research on, and application of, text summarization very important and attracts many researchers. The reason is twofold: first, text summarization can help cope with information overload, and second, small form-factor devices are becoming increasingly popular.

Text summarization is the process of automatically creating a shorter version of a document or a set of documents. It is an important way of finding relevant information in large text libraries or on the Internet (Canhasi & Kononeko, 2014; Ferreira et al., 2014). It helps users access information more easily, both by reducing the time they spend dealing with it and by selecting the information most useful to them (Yang & Wang, 2008; Lloret & Palomar, 2013).

According to different criteria, text summarization techniques can be categorized as abstract-based or extract-based (whether source content is rewritten or reproduced verbatim), multi-document or single-document (whether or not there is more than one source document), query-focused or generic (whether or not a query is given), and supervised or unsupervised (whether or not a training set is used). Abstraction can be described as reading and understanding the text to recognize its content, which is then compiled into a concise text. In general, an abstract is a summary comprising concepts or ideas taken from the source that are reinterpreted and presented in a different form. An extract is a summary consisting of units of text taken from the source and presented verbatim. Single-document summarization distills one document into a shorter version, whereas multi-document summarization compresses a set of documents. Multi-document summarization can be seen as an enhancement of single-document summarization and can be used for outlining the information contained in a cluster of documents (Canhasi & Kononeko, 2014; Luo, Zhuang, He, & Shi, 2013). Generic summarization tries to extract the most general idea from the original document without any specified preference in terms of content. Query-focused document summarization is a special case of document summarization: given a query, the task is to produce a summary that responds to the information need expressed by the query (Canhasi & Kononeko, 2014). In supervised methods, the task of selecting important sentences is represented as a binary classification problem that partitions all sentences in the input into summary and non-summary sentences. Unsupervised methods do not require any training data and can therefore be applied to any text data without manual effort. The two main unsupervised learning methods commonly used in the context of text data are clustering and topic modeling (Aliguliyev, 2010; Cai, Li, & Zhang, 2013; Cai, Li, & Zhang, 2014; Cai, Li, Zhang, & Shi, 2014; Mei & Chen, 2012).
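
To make the clustering-based, extractive setting concrete, the following is a minimal Python sketch of the two-stage idea stated in the abstract: cluster the sentences to discover topics, then pick salient, non-redundant sentences from each cluster until a length limit is reached. It is an illustration under several assumptions and not the method of the paper: sentences are represented as raw term-frequency vectors, the clustering is a k-means-style procedure with cosine similarity seeded by the first k sentences, and the second stage uses a simple greedy rule in place of the adaptive differential evolution optimizer. The function names and the parameters `max_words` and `redundancy` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Stage 1 (assumed variant): k-means-style clustering by cosine similarity.

    Seeds are the first k vectors so that the result is deterministic.
    """
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels or [], centroids


def summarize(sentences, k=2, max_words=20, redundancy=0.5):
    """Stage 2 (greedy stand-in for the optimization step).

    Visit clusters in turn, take the sentence closest to the cluster centroid,
    and skip any sentence that exceeds the word budget or is too similar to a
    sentence that was already selected.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    labels, centroids = kmeans(vectors, k)
    ranked = {
        c: sorted(
            (i for i, lab in enumerate(labels) if lab == c),
            key=lambda i: -cosine(vectors[i], centroids[c]),
        )
        for c in range(len(centroids))
    }
    chosen, length, progress = [], 0, True
    while progress:
        progress = False
        for c in ranked:
            while ranked[c]:
                i = ranked[c].pop(0)
                n_words = len(tokenize(sentences[i]))
                if length + n_words > max_words:
                    continue  # would exceed the summary length
                if any(cosine(vectors[i], vectors[j]) > redundancy for j in chosen):
                    continue  # too close to an already selected sentence
                chosen.append(i)
                length += n_words
                progress = True
                break
    return [sentences[i] for i in sorted(chosen)]  # keep document order


if __name__ == "__main__":
    example = [
        "Stock markets fell sharply on Monday.",
        "The football team won the final match.",
        "Stock markets fell again on Tuesday.",
        "The football team celebrated the match win.",
    ]
    for line in summarize(example, k=2, max_words=20):
        print(line)
```

The greedy rule only approximates the trade-off between topic coverage and redundancy; a global optimizer such as the one proposed in the paper searches over whole sentence subsets instead of committing to one sentence at a time.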
