Automatic Text Document Summarization Using Graph Based Centrality Measures on Lexical Network

Chandra Shakhar Yadav (SC & SS: School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India) and Aditi Sharan (SC & SS: School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India)
Copyright: © 2018 | Pages: 19
DOI: 10.4018/IJIRR.2018070102

Abstract

This article proposes a new concept, the Lexical Network, for Automatic Text Document Summarization. Instead of a number of lexical chains, the authors obtain a single network of sentences, termed LexNetwork. The network is built between sentences from different lexical and semantic relations. In this network, a node represents a sentence and an edge represents the strength between two sentences, where strength means the number of relations present between them. The importance of each sentence is decided by different centrality measures, and the most important sentences are extracted for the summary. Word sense disambiguation (WSD) is done with the Simple Lesk technique, and a Cosine-Similarity threshold (Ɵ, TH) is applied as a post-processing step. The authors suggest that a cosine similarity threshold of 10% works better than 5%, and that an Eigen-Value based centrality measure is better for the summarization process. Finally, for comparison, they use the Semantrica-Lexalytics System.
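
The idea of a sentence network ranked by centrality and filtered by a cosine threshold can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: edge strength is approximated by the number of shared words (the article counts lexical and semantic relations after WSD), the function names are hypothetical, and the threshold is read as "skip a sentence whose cosine similarity to an already selected sentence exceeds TH".

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    words = (w.strip(".,;:!?()\"'").lower() for w in sentence.split())
    return [w for w in words if w]


def edge_strength(a, b):
    """ASSUMPTION: shared distinct words stand in for the relation count."""
    return len(set(a) & set(b))


def eigenvector_centrality(weights, iterations=100):
    """Power iteration on (W + I); the identity shift avoids oscillation."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0] * n
    for _ in range(iterations):
        new = [scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        scores = [x / norm for x in new]
    return scores


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, k=2, threshold=0.10):
    """Pick up to k central sentences, skipping ones too similar to earlier picks."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0 if i == j else edge_strength(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    scores = eigenvector_centrality(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if all(cosine(tokens[i], tokens[j]) <= threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Raising or lowering `threshold` (for example 0.05 versus 0.10) changes how aggressively near-duplicate sentences are dropped from the summary.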

Introduction

Automatic Text Document Summarization plays an important role in IR (Information Retrieval) because summarization represents a large pool of text information in a meaningful and concise form, by selecting informative sentences and discarding redundant sentences (or information). Radev et al. (2002) have defined a summary as “A text that is produced from one or more texts, that convey important information in the original texts, and that is no longer than half of the original text and usually significantly less than that”. According to Kulkarni and Apte (2014), “The concept of using lexical chains helps to analyze the document semantically and the concept of correlation of sentences…”

In this paper, we present lexical chain based summarization. Morris and Hirst (1991) proposed a logical description for the implementation of Lexical Chains using Roget's Thesaurus, and Barzilay and Elhadad (1997) developed the first text document summarizer using lexical chains.

As a whole, there are three main stages for summary generation using Lexical Chains: (1) candidate word selection for chain building, typically nouns and verbs; (2) lexical chain construction and a chain scoring model to represent the original document; and (3) chain selection and chain extraction for summary generation. Generally, in the second stage the chain scoring strategy is based on TF, IDF, the number of distinct words, and position in the text.
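
The three stages can be sketched in a few lines. This is a toy illustration only: the input is assumed to be already POS-tagged, the `RELATED` table is a hypothetical stand-in for a thesaurus or WordNet lookup, and the score (occurrences times distinct words) is an illustrative choice rather than any published scoring model.

```python
# ASSUMPTION: a tiny hand-made relation table replaces a real thesaurus.
RELATED = {
    frozenset(("car", "automobile")),
    frozenset(("car", "vehicle")),
}


def related(a, b):
    """Two words are related if identical or listed in the toy table."""
    return a == b or frozenset((a, b)) in RELATED


def candidate_words(tagged_sentences):
    """Stage 1: keep nouns and verbs from pre-tagged (word, tag) pairs."""
    return [[w.lower() for w, tag in sent if tag in ("NOUN", "VERB")]
            for sent in tagged_sentences]


def build_chains(candidates):
    """Stage 2a: attach each word to the first chain holding a related word."""
    chains = []
    for idx, words in enumerate(candidates):
        for word in words:
            for chain in chains:
                if any(related(word, member) for member, _ in chain):
                    chain.append((word, idx))
                    break
            else:
                # No related chain found, so the word starts a new one.
                chains.append([(word, idx)])
    return chains


def score_chain(chain):
    """Stage 2b: illustrative score = occurrences x distinct words."""
    words = [w for w, _ in chain]
    return len(words) * len(set(words))


def select_sentences(chains, n_chains=1):
    """Stage 3: indices of sentences touched by the top-scoring chains."""
    top = sorted(chains, key=score_chain, reverse=True)[:n_chains]
    return sorted({idx for chain in top for _, idx in chain})
```

A real system would replace `related` with thesaurus lookups and fold TF, IDF, and position into `score_chain`; the control flow stays the same.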

Li and Sun (2008) proposed update-style summarization. Update style means that the actor (system) can differentiate between newly arriving information and information that is already known.

Gonzàlez and Fort (2009), using the WordNet or EuroWordNet lexical databases, proposed an algorithm for Lexical Chain construction based on global function optimization through Relaxation Labelling. They consider three kinds of relations: (1) an Extra Strong relation between a word and its repetitions, (2) a Strong relation between two words connected by a direct semantic relation, and (3) a Medium Strong relation between two words connected by some path of semantic relations. Chains are distinguished and extracted based on Strong, Medium, and Light weight.
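
The three relation classes can be illustrated with a small lookup over a graph of direct semantic links. This sketch is not the Relaxation Labelling algorithm itself: the `DIRECT` table is a hypothetical stand-in for WordNet or EuroWordNet, and the `max_depth` cut-off on path length is an assumption added to keep the search bounded.

```python
from collections import deque

# ASSUMPTION: hand-made direct links stand in for a lexical database.
DIRECT = {
    "car": {"vehicle"},
    "vehicle": {"car", "machine"},
    "machine": {"vehicle"},
}


def relation_strength(a, b, max_depth=3):
    """Classify the relation between two words, or return None if unrelated."""
    if a == b:
        return "extra-strong"      # a word and its repetition
    if b in DIRECT.get(a, ()):
        return "strong"            # one direct semantic link
    # Breadth-first search for a longer path of semantic links.
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in DIRECT.get(word, ()):
            if nxt == b:
                return "medium-strong"
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None
```

For example, "car" and "machine" have no direct link in the toy table but are joined through "vehicle", so they fall into the medium-strong class.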

Gurevych and Nahnsen (2005) proposed a lexical chain construction in which the candidate words are nouns, selected by POS tagging, and WordNet 1.7 is used.

Kulkarni and Apte (2014) considered nouns as candidate words for lexical chain creation and for correlation between sentences. Their lexical chain construction also depends on WordNet relations. They considered four types of relations for chain creation: Synonym, Hypernym, Hyponym, and Meronym. The score of each lexical chain is calculated from keyword strength and TF-IDF. Lexical chain weight is calculated from the distances of each word, and the chain with the highest weight is extracted. In the third step, the top n sentences are extracted for summary generation.

Silber and McCoy (2000) follow the research of Barzilay and Elhadad (1997) for lexical chain creation. Their modifications to WordNet are based only on nouns, for faster access.

Pourvali and Abadeh (2012) proposed an algorithm for single text document summarization based on two knowledge sources: WordNet and Wikipedia (for words that are not present in WordNet).
