Abstract Retrieval over Wikipedia Articles Using Neural Network

Falah Hassan Ali Al-akashi
DOI: 10.4018/IJSSCI.2019070102

Abstract

In this article, we propose a neural network model that generates a summary of a Wikipedia article for each query, allowing users to grasp the topic without reading through the article's entire content. Wikipedia typically returns a set of articles related to a search query, leaving it to the user to locate the relevant topic. The summary is generated by extracting the sentences that are most significant to the article's topics and that match its content most strongly. Each sentence in the article is encoded as a set of features and presented as input to the network. The proposed neural network is trained on a set of randomly selected, representative Wikipedia articles, and its output is used to predict which sentences form a summary of the content relevant to the searched query. The results show that the proposed approach is robust and efficient at finding relevant summaries for most searched queries. Evaluation of the proposed approach yields ROUGE-N and ROUGE-L scores of 0.10317 and 0.13998, respectively.
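For illustration only, the pipeline the abstract describes (sentence, to feature vector, to network, to include/exclude decision) can be sketched as follows. The features used here (relative position, length, query overlap) and the classifier are hypothetical stand-ins, not the paper's actual feature set or topology, which are detailed in later sections.

```python
# Hypothetical sketch of the abstract's pipeline: encode each sentence as a
# small feature vector and let a feedforward network flag summary sentences.
# Feature choices below are illustrative assumptions, not the paper's.
from sklearn.neural_network import MLPClassifier

def encode(sentence, index, total, query_terms):
    words = sentence.lower().split()
    overlap = len(set(words) & query_terms) / max(len(query_terms), 1)
    return [index / max(total, 1),       # relative position in the article
            min(len(words) / 40, 1.0),   # normalized sentence length
            overlap]                     # fraction of query terms covered

def summarize(sentences, query, model):
    terms = set(query.lower().split())
    X = [encode(s, i, len(sentences), terms) for i, s in enumerate(sentences)]
    keep = model.predict(X)              # 1 = include sentence in the summary
    return [s for s, k in zip(sentences, keep) if k == 1]

# Training would require sentences labeled as summary/non-summary, e.g.:
# model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000).fit(X_train, y_train)
```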

1. Introduction

Text summarization (TS), sometimes called text simplification, deals with transforming an original text into simplified variants to increase its readability, usability, and understandability (Surya & Mishra, 2018). It is an information retrieval technique for computationally generating a summary of a text. The technique has been studied for decades, but with the exponential growth of data on the Internet and the World Wide Web it has become more important than ever. Most text summarizers are extraction-based algorithms, meaning they generate a summary by extracting sentences from the text. Some summarization algorithms can produce summaries that contain not only sentences present in the article but also new, automatically constructed phrases added to make the summary more readable. This functionality can make a summarization algorithm more powerful by improving the comprehensibility of the output summary. In practice, however, the automatic construction of phrases is a fairly difficult task, and there is no guarantee that the new phrases will be genuinely meaningful to the user. Moreover, given the dramatic increase of data available on the Internet, it is often nearly impossible for a web user to find and utilize the information relevant to his or her interest.
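As a baseline illustration of the extraction-based family described above (not the method proposed in this article), a minimal frequency-scored extractor could look like the following sketch, which scores each sentence by the corpus frequency of its words and keeps the top-k sentences in their original order.

```python
# Minimal frequency-based extractive summarizer, for illustration only:
# score each sentence by the average corpus frequency of its words,
# then keep the k highest-scoring sentences in article order.
from collections import Counter

def extract_summary(sentences, k=3):
    freq = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(freq[w] for w in words) / max(len(words), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]  # preserve article order
```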

Query-dependent multi-document summarization using a graph-based approach has demonstrated the efficiency of document summarization (Bhaskar & Bandyopadhyay, 2010; Chali & Hasan, 2012). Cosine similarity has likewise improved the ability of multi-document summarizers to produce relevant abstract summaries (Ranjan et al., 2016), while Wikipedia has been shown to provide strong topic signals for summarizing diverse documents (Gong et al., 2014). Wikipedia is one of the largest repositories and a rich source of information and knowledge, making it possible for users to find both (Al-akashi, 2014). Many researchers therefore prefer Wikipedia as a reference for document summarization and simplification. Ramanathan et al. (2009) used an algorithm that maps document sentences to semantic concepts in Wikipedia and selects sentences for the summary using the frequency of the mapped-to concepts.
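For reference, the cosine similarity used in such systems compares the query and a sentence as term-frequency vectors, cos(q, s) = q·s / (‖q‖ ‖s‖). A minimal sketch, assuming simple whitespace tokenization rather than any cited system's exact preprocessing:

```python
# Cosine similarity over bag-of-words term-frequency vectors; sketch only.
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```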

Nevertheless, users need to find information within a plethora of large resources such as Wikipedia. Search engines generally respond to a query with a list of matching results retrieved from such resources, but it is still largely left to the user to go through the content and check whether the information or knowledge is relevant to his or her needs. Text summarization, however, is not limited to syntactic similarity; it also exploits semantic similarity and, sometimes, role labelling. At the semantic level, systems deploy WordNet and sentence compression to enable sentence generation, or semantic role labelling to obtain a semantic representation of the text, and then use segmentation to form clusters of related pieces of text (Bhartiya & Singh, 2014).
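The WordNet-based semantic level mentioned above can be illustrated with NLTK's WordNet interface (this requires nltk.download('wordnet') and is a generic sketch, not the exact measure used by the cited systems): take the best path similarity between any sense pair of two words.

```python
# Illustrative WordNet word similarity via NLTK: best path similarity
# across all sense pairs. path_similarity can return None for senses
# in different hierarchies, hence the `or 0.0` guard.
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

# word_similarity("car", "automobile") -> 1.0 (they share a synset)
```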

Neural networks are another solution for text summarization, simplification, and abstracting, used to tackle the obstacles of bulk data (Kaikhah, 2004) and news articles (Kaikhah, 2015). Ensemble neural networks consist of a cluster of networks, each having the same form, whose goal is to extract all the sentences in an article that relate to the user's expectations. Words in the sentence and in the query are tagged with their parts of speech and a process identifier, and the two taggings are compared to extract related sentences, as sketched below. This helps a user find the abstracts that are related to his or her query. User perception can indicate whether a particular piece of information or knowledge is relevant, or whether it simply needs to be summarized to circumvent its reading difficulties. A part of speech is commonly defined as a linguistic category of lexical items (generally words) that share common syntactic or morphological characteristics; grammarians often divide English into eight parts of speech: noun, verb, adjective, adverb, pronoun, preposition, conjunction, and interjection. The deployed neural network is explained in depth in the following sections. The words "text" and "document" are used interchangeably throughout this manuscript; both refer to the Wikipedia article.
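A minimal sketch of the POS-based matching described above, using NLTK's tagger (requires the punkt and averaged_perceptron_tagger models); the overlap measure and threshold here are illustrative assumptions, not the paper's implementation:

```python
# Tag query and sentence with parts of speech and count the overlapping
# (word, tag) pairs; sentences above a threshold are treated as related.
import nltk

def pos_overlap(query, sentence):
    q_tags = set(nltk.pos_tag(nltk.word_tokenize(query.lower())))
    s_tags = set(nltk.pos_tag(nltk.word_tokenize(sentence.lower())))
    return len(q_tags & s_tags)

def related_sentences(sentences, query, threshold=2):
    return [s for s in sentences if pos_overlap(query, s) >= threshold]
```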
