Conceptual Graphs as Framework for Summarizing Short Texts


Sabino Miranda-Jiménez (INFOTEC - Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación / Cátedra Conacyt, Aguascalientes, México), Alexander Gelbukh (Centro de Investigación en Computación, Instituto Politécnico Nacional, México) and Grigori Sidorov (Centro de Investigación en Computación, Instituto Politécnico Nacional, México)
DOI: 10.4018/IJCSSA.2014070104

Abstract

In this paper, a conceptual graph-based framework for summarizing short texts is proposed. Texts are represented semantically as conceptual graphs, which consist of concepts and the conceptual relations between them. To summarize a conceptual graph, its most important nodes are selected using a set of operations that are described in the paper: generalization, association, ranking, and pruning. The importance of nodes in weighted conceptual graphs is measured using a modified version of the HITS algorithm. In addition, heuristic rules based on information from WordNet (a hierarchy of concepts) and VerbNet (semantic patterns of verbs) are used to keep the resulting structures coherent. The experimental results show that this approach is effective in summarizing short texts.

Introduction

Summarization technologies are essential in today’s information society. To handle huge amounts of information efficiently, users need short documents that convey the essential information from one or more source documents, that is, their summaries. High-quality automatic text summarization is a challenging task that involves text analysis, text understanding, the use of domain knowledge, and language generation.

Automatic summarization can be characterized along several dimensions. The main factors are (1) the kind of information source: text, images, video, or voice, (2) the number of documents to be summarized: single- or multi-document, (3) the resulting summary: extractive or abstractive, (4) the purpose: generic, user-oriented, query-focused, indicative, or informative, and (5) the number of languages: monolingual or multilingual (Spärck Jones, 1999; Spärck Jones, 2007; Das & Martins, 2007; Nenkova & McKeown, 2011; Lloret & Palomar, 2012; Elhadad, Miranda-Jiménez, Steinberger, & Giannakopoulos, 2013; Torres-Moreno, 2014).

In this research, the interest is in single-document text summarization for English. The resulting summaries are considered generic and abstractive at the conceptual level.

The single-document summarization task was addressed in 2001 and 2002 at the Document Understanding Conference (DUC), then the main forum for evaluating text summarization systems and now a track of the Text Analysis Conference (TAC). In both years, none of the systems outperformed the baseline, which consisted of the first 100 words of the original documents. Summaries produced by humans were significantly better than those of all the systems. The DUC data were newswire and newspaper documents; the genre therefore affected the results, because news documents place their important ideas at the beginning of the text. The single-document summarization task was dropped in later years of DUC because of the poor results in the competitions, but it remains an open problem (Nenkova, 2005; Nenkova & McKeown, 2011).
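The lead baseline mentioned above is simple enough to sketch in a few lines. The following is a minimal illustration, assuming whitespace tokenization; the function name lead_baseline and the max_words parameter are hypothetical, and the exact tokenization used in DUC is not stated here.

```python
def lead_baseline(text: str, max_words: int = 100) -> str:
    """Return the first `max_words` whitespace-separated tokens of `text`.

    Assumption: words are split on whitespace; the DUC baseline may have
    used a different tokenizer.
    """
    words = text.split()
    return " ".join(words[:max_words])


if __name__ == "__main__":
    # Hypothetical document made of numbered placeholder tokens.
    document = " ".join(f"w{i}" for i in range(250))
    print(lead_baseline(document))
```

On news text this baseline is hard to beat because it captures the opening sentences, where the most important ideas are usually placed.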

Multi-document summarization is motivated by the information available on the Internet. Given the large amount of redundancy across documents, summarization is more useful if it can provide a brief description of a group of documents about the same topic or event. This approach is the main trend in the TAC competition and in current research. Multi-document multilingual summarization is also gaining attention because the same information can appear in several languages; the MultiLing competition provides a collection of documents and evaluations for this sort of system (Elhadad et al., 2013; Lei, Forascu, El-haj, & Giannakopoulos, 2013).

With respect to the kind of summary produced, the extractive approach is the most popular. In this approach, a summary is made of excerpts from one or more documents; it is produced by concatenating sentences selected verbatim as they appear in the documents to be summarized. The limitations of this approach are well known and include low quality and lack of coherence.

Abstractive summaries are produced to convey the important information from the original document, and sentences can be reused, combined, or pruned from it (Barzilay & McKeown, 2005; Genest & Lapalme, 2012). This approach has not been widely explored because deep text analysis is required for understanding texts. Such a deep analysis is indispensable to improve the quality of summaries (Spärck Jones, 2007; Lloret & Palomar, 2012; Saggion & Poibeau, 2013).

In this paper, we propose a framework for single-document abstractive summarization for English, based on conceptual graphs as the underlying text representation (Sowa, 1984). Our approach is based on a set of operations on conceptual graphs in order to simplify conceptual structures, namely, generalization, association, ranking, and pruning.
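To make the ranking and pruning operations more concrete, the sketch below applies a straightforward edge-weighted variant of HITS to a small directed graph and then keeps only the highest-scoring nodes. It is an illustration under stated assumptions, not the modified HITS of this paper, whose details are not given in this section. The function names weighted_hits and prune, the toy graph, its weights, and the use of the sum of hub and authority scores for ranking are all hypothetical.

```python
from math import sqrt


def weighted_hits(edges, iterations=50):
    """Edge-weighted HITS (illustrative variant, not the paper's version).

    edges: dict mapping (source, target) -> positive weight.
    Returns (hub, authority) dicts, each L2-normalized.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes pointing to the node.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), weight in edges.items():
            new_auth[dst] += weight * hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub: weighted sum of authority scores of nodes the node points to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), weight in edges.items():
            new_hub[src] += weight * auth[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


def prune(edges, scores, keep):
    """Keep only edges whose two endpoints are among the `keep` top-scored nodes."""
    top = set(sorted(scores, key=scores.get, reverse=True)[:keep])
    return {e: w for e, w in edges.items() if e[0] in top and e[1] in top}


if __name__ == "__main__":
    # Toy graph for "a cat sits on a mat": [Cat]<-(Agnt)<-[Sit]->(Loc)->[Mat].
    # Concept and relation nodes are both graph nodes; weights are made up.
    graph = {
        ("Sit", "Agnt"): 2.0,
        ("Agnt", "Cat"): 2.0,
        ("Sit", "Loc"): 1.0,
        ("Loc", "Mat"): 1.0,
    }
    hub, auth = weighted_hits(graph)
    combined = {n: hub[n] + auth[n] for n in hub}
    print(prune(graph, combined, keep=3))
```

Ranking by the sum of hub and authority scores is only one possible choice; a real system would also need the coherence heuristics mentioned in the abstract so that pruning does not leave relation nodes without their concepts.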
