Introduction
Text summarization has attracted increasing attention in recent years because of the large amounts of text data created in social networks, on the web, and in other information-centric applications such as e-libraries and e-government services. The explosion of electronic documents has made it difficult for users to extract useful information from them: faced with so much information, users leave many relevant and interesting documents unread. The continuing growth of online text documents therefore makes research on, and applications of, text summarization very important, and the field attracts many researchers. The motivation is twofold: first, text summarization can help users cope with information overload; second, small form-factor devices, whose screens can display only a limited amount of text, are becoming increasingly popular.
Text summarization is the process of automatically creating a shorter version of a document or a set of documents that preserves its main content. It is an important way of finding relevant information in large text libraries or on the Internet (Canhasi & Kononenko, 2014; Ferreira et al., 2014). Text summarization helps users access information more easily: on the one hand, it reduces the time they must spend dealing with information, and on the other, it selects the information most useful to them (Yang & Wang, 2008; Lloret & Palomar, 2013).
According to different criteria, text summarization techniques can be categorized as abstract-based or extract-based (whether the summary rewrites the source or reproduces its sentences), single-document or multi-document (whether one document or several are summarized), generic or query-focused (whether or not a query is given), and supervised or unsupervised (whether or not a training set is used).
Abstraction can be described as reading and understanding the text to recognize its content, which is then compiled into a concise text. In general, an abstract is a summary comprising concepts and ideas taken from the source, then reinterpreted and presented in a different form. An extract, by contrast, is a summary consisting of units of text taken from the source and presented verbatim.
Single-document summarization distills one document into a shorter version, whereas multi-document summarization compresses a set of documents. Multi-document summarization can be seen as an extension of single-document summarization and can be used to outline the information contained in a cluster of documents (Canhasi & Kononenko, 2014; Luo, Zhuang, He, & Shi, 2013).
Generic summarization tries to extract the most general ideas from the original document without any specified preference regarding content. Query-focused summarization is a special case of document summarization: given a query, the task is to produce a summary that provides the information the query asks for (Canhasi & Kononenko, 2014).
In supervised summarization methods, selecting important sentences is cast as a binary classification problem that partitions all input sentences into summary and non-summary sentences. Unsupervised methods require no training data and can therefore be applied to any text data without manual annotation effort.
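The extract-based, generic, and query-focused notions above can be illustrated with a minimal sketch of an unsupervised extractive summarizer in Python. It copies the k highest-scoring sentences verbatim and returns them in their original order. The scoring rules are illustrative assumptions, not methods from the cited works: in generic mode a sentence scores the average document frequency of its words, and in query-focused mode it scores its bag-of-words cosine similarity to the query. The naive regex sentence splitter, the tiny stop-word list, and the function names (`split_sentences`, `tokens`, `extract_summary`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are", "it", "for", "on", "with"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> list[str]:
    """Lower-cased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_summary(text: str, k: int, query: Optional[str] = None) -> list[str]:
    """Return k sentences copied verbatim from `text`, in document order.

    Generic mode (query is None): score = average document frequency of the
    sentence's words. Query-focused mode: score = cosine similarity to the query.
    """
    sentences = split_sentences(text)
    doc_freq = Counter(w for s in sentences for w in tokens(s))
    query_vec = Counter(tokens(query)) if query else None

    def score(s: str) -> float:
        toks = tokens(s)
        if not toks:
            return 0.0
        if query_vec is not None:
            return cosine(Counter(toks), query_vec)
        return sum(doc_freq[w] for w in toks) / len(toks)

    # Rank sentence indices by score, keep the top k, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `extract_summary(text, 2)` yields a two-sentence generic extract, while `extract_summary(text, 1, query="dogs")` yields the single sentence most similar to the query. A supervised method could keep the same selection step but replace the hand-written `score` with a classifier trained to label each sentence as summary or non-summary.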
The two main unsupervised learning methods commonly used in the context of text data are clustering and topic modeling (Aliguliyev, 2010; Cai, Li, & Zhang, 2013; Cai, Li, & Zhang, 2014; Cai, Li, Zhang, & Shi, 2014; Mei & Chen, 2012).
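Clustering-based extraction can be sketched as follows, assuming sentences are represented as bag-of-words term-frequency vectors and grouped with a simple spherical k-means (cosine similarity), after which the sentence closest to each cluster centroid is taken as that cluster's representative. This is a generic illustration, not the specific algorithm of any of the works cited above. Seeding the centroids with the first k sentences, the iteration cap, and the function names (`vectorize`, `mean_vector`, `cluster_summary`) are hypothetical choices made to keep the sketch short and deterministic.

```python
import math
import re
from collections import Counter


def vectorize(sentence: str) -> Counter:
    """Bag-of-words term-frequency vector over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: list) -> dict:
    """Component-wise mean of sparse vectors (the cluster centroid)."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}


def cluster_summary(sentences: list[str], k: int, max_iter: int = 20) -> list[str]:
    """Pick one representative sentence per cluster, returned in document order."""
    vecs = [vectorize(s) for s in sentences]
    k = min(k, len(sentences))
    # Hypothetical deterministic seeding: the first k sentences are the initial centroids.
    centroids = [dict(v) for v in vecs[:k]]
    assignment: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each sentence joins the centroid it is most similar to.
        new_assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        if new_assignment == assignment:
            break  # converged: no sentence changed cluster
        assignment = new_assignment
        # Update step: recompute the centroid of every non-empty cluster.
        for c in range(k):
            members = [vecs[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            # Representative: the member most similar to its cluster centroid.
            chosen.append(max(members, key=lambda i: cosine(vecs[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Given sentences that fall into two themes and k = 2, the function returns one sentence per theme. Choosing the sentence nearest each centroid favors representative content, while taking only one sentence per cluster limits redundancy, since near-duplicate sentences tend to share a cluster. A topic-modeling approach would replace the clusters with latent topics.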