A New Approach Based on the Detection of Opinion by SentiWordNet for Automatic Text Summaries by Extraction

Mohamed Amine Boudia (GeCoDe Laboratory, Dr. Tahar Moulay University of Saida, Saida, Algeria), Reda Mohamed Hamou (GeCoDe Laboratory, Dr. Tahar Moulay University of Saida, Saida, Algeria) and Abdelmalek Amine (GeCoDe Laboratory, Dr. Tahar Moulay University of Saida, Saida, Algeria)
Copyright: © 2016 | Pages: 18
DOI: 10.4018/IJIRR.2016070102

Abstract

In this paper, the authors propose a new approach to extractive text summarization based on opinion detection with SentiWordNet, using a scoring-based extraction technique adapted to opinion detection. Each text is decomposed into sentences and then represented by a vector of sentence opinion scores. The summary is produced by eliminating the sentences whose opinion differs from that of the original text, where the admissible difference is set by an opinion threshold. The following hypothesis has been verified with the Chi_2 statistical measure: “textual units that do not share the same opinion as the text are ideas used for development or comparison, and their absence does not harm the semantics of the summary.” Finally, the authors identify an interval of opinion thresholds that yields the best evaluation results.

1. Introduction and Problem Statement

Currently, one of the major problems for computer scientists is access to the content of information. Access itself, in other words the software and hardware infrastructure, is no longer an obstacle; the major problem is the exponential growth in the amount of electronic textual information. This calls for more specific tools: rapid and effective access to the content of texts has become a necessity.

A summary is an effective way to represent the content of a text and allows quick access to its semantic content. The purpose of summarization is to produce an abridged text that covers most of the content of the source text.

Text summarization is attractive for fast access to the content of textual information. A summary is a rewriting of the original text in a smaller form, produced under the constraint of preserving the semantics of the document, that is, with minimal semantic entropy. The purpose of this operation is to help readers identify the information that interests them without having to read the entire document. Automatic summaries aim to reduce the time needed to find relevant documents, or to shorten the processing of long texts by identifying the key information. The volume of electronic textual information keeps increasing, which makes access to information difficult. Producing a summary may facilitate access to information, but it is also a complex task because it requires language skills.

For automatic summarization, the current literature presents three approaches:

  • Automatic Summarization by extraction

  • Automatic Summarization by understanding

  • Automatic Summarization by automatic classification

Another line of research that has gained momentum in recent years is opinion mining, that is, detecting the opinion of a sentence, a paragraph, or a text. Our work uses opinion detection methods to produce a summary. We propose the following hypothesis:

Textual units that do not share the same opinion as the text are ideas used for development or comparison, and their absence does not harm the semantics of the summary.

In this work, we generate a summary automatically using the extraction approach. We use the scoring technique, where the score is calculated from opinion using SentiWordNet.

We build a summary from the sentences whose opinion is similar to that of the full text, according to an opinion threshold. Our work answers the following questions:

  • Is our hypothesis testable? If so, is it valid?

  • What is the impact of the opinion threshold on the quality of the summary?

  • Can opinion mining bring added value to automatic summarization?
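
The extraction scheme described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `TOY_LEXICON` is a hypothetical word-level stand-in for SentiWordNet (which actually assigns scores per synset), the sentence splitter is naive, and measuring the difference as an absolute gap between the sentence score and the whole-text score is our guess at how the opinion threshold could be applied.

```python
import re

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (positive, negative).
# The values are invented for illustration; real SentiWordNet scores are per synset.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "excellent": (1.0, 0.0),
    "useful": (0.5, 0.0),
    "bad": (0.0, 0.625),
    "poor": (0.0, 0.75),
    "slow": (0.0, 0.5),
}


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def opinion_score(text):
    """Mean of (positive - negative) over words found in the lexicon; 0.0 if none."""
    words = re.findall(r"[a-z']+", text.lower())
    polarities = [TOY_LEXICON[w][0] - TOY_LEXICON[w][1] for w in words if w in TOY_LEXICON]
    return sum(polarities) / len(polarities) if polarities else 0.0


def summarize_by_opinion(text, threshold):
    """Keep sentences whose opinion score lies within `threshold` of the text's score."""
    text_score = opinion_score(text)
    return [s for s in split_sentences(text)
            if abs(opinion_score(s) - text_score) <= threshold]


if __name__ == "__main__":
    sample = ("The tool is good. The interface is excellent. "
              "The old version was poor. It is useful.")
    for sentence in summarize_by_opinion(sample, threshold=0.7):
        print(sentence)
```

With a looser threshold more sentences survive, and with a tighter one fewer do, which is the trade-off the second question above asks about.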

2. Literature Review

The idea of producing summaries automatically emerged in the early 1950s as a branch of natural language processing (NLP). The first attempts appeared in the 1950s: the community tried to implement simple approaches, such as extracting relevant sentences according to a score, in order to arrive at a summary that is understandable and easily readable by a human. Luhn (1958) and Edmundson (1969) proposed to identify the lexical units that carry semantics by manual analysis; the result is called an extract, that is, a summary built from sentences of the original text that are considered pertinent. Their idea is to assign each sentence a weight that represents its pertinence and then to extract either the N sentences with the greatest weights, according to a reduction rate, or all sentences whose score is greater than or equal to a scoring threshold.
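
The two selection modes mentioned above (the N best sentences, or all sentences at or above a threshold) can be illustrated with a minimal sketch. The function names and the example weights are hypothetical; how the weights themselves are computed is left out, since it differs from one method to another.

```python
def select_top_n(sentences, weights, n):
    """Keep the n highest-weighted sentences, returned in original document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]


def select_by_threshold(sentences, weights, threshold):
    """Keep every sentence whose weight is greater than or equal to the threshold."""
    return [s for s, w in zip(sentences, weights) if w >= threshold]


if __name__ == "__main__":
    # Invented sentences and weights, for illustration only.
    sentences = ["Intro.", "Main claim.", "Supporting detail.", "Key result."]
    weights = [0.2, 0.9, 0.5, 0.7]
    print(select_top_n(sentences, weights, n=2))
    print(select_by_threshold(sentences, weights, threshold=0.5))
```

Selecting by rank fixes the length of the extract in advance, whereas selecting by threshold lets the length vary with the distribution of weights in the text.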

Other work focuses on automating the analysis for the detection of pertinent lexical units; several methods and variants have been proposed (Radev et al., 2001; Radev et al., 2004; Boudin et al., 2007; Carbonell et al., 1998). Boudin et al. (2008) observe that two sentences can both be selected as pertinent even though they resemble each other; they therefore worked on eliminating redundancy by means of sentence similarity measures, and they also show that their method detects the topic of the text.
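
To make the idea of redundancy elimination concrete, here is a small greedy filter based on bag-of-words cosine similarity. It is a generic illustration, not the specific method of any of the works cited above; the function names, the similarity measure, and the `max_similarity` cut-off of 0.7 are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def drop_redundant(sentences, weights, max_similarity=0.7):
    """Visit sentences from highest to lowest weight and keep one only if it is
    not too similar to any sentence already kept; return in document order."""
    order = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    kept = []
    for i in order:
        vec = bag_of_words(sentences[i])
        if all(cosine(vec, bag_of_words(sentences[j])) < max_similarity for j in kept):
            kept.append(i)
    return [sentences[i] for i in sorted(kept)]


if __name__ == "__main__":
    # Invented sentences and weights, for illustration only.
    sentences = ["The cat sat on the mat.", "The cat sat on a mat.",
                 "Stocks fell sharply today."]
    print(drop_redundant(sentences, weights=[0.9, 0.8, 0.5]))
```

Because the higher-weighted sentence is visited first, a near-duplicate with a lower weight is the one that gets dropped.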
