Concept Map Information Content Enhancement Using Joint Word Embedding and Latent Document Structure

Kodaikkaavirinaadan Urkalan, Geetha T. V.

Source Title: International Journal on Semantic Web and Information Systems (IJSWIS) 16(4)

DOI: 10.4018/IJSWIS.2020100103

Abstract

The concept map (CM) can be enhanced by extracting precise propositions, representing them compactly, and adding useful features that increase the information content (IC). To enhance the IC with domain knowledge of the document, an automatic enhanced CM (ECM) generation method is proposed that uses word-embedding-based concept and relation representation and organizes concepts using the latent semantic structure. To improve concept significance, precise identification of similar items, clustering of topically associated concepts, and hierarchical clustering of semantically related concepts are carried out. This augments the IC of the CM with additional information and yields a CM with concise and informative content. A joint word embedding based on various contexts is used to determine the distributional features critical for these enhancements. Summarization of the ECM to visualize the document summary is used to illustrate its usefulness. The work is evaluated on the ACL Anthology, GENIA, and CRAFT datasets, and the information gain is approximately three times higher than that of a general CM.

1. Introduction

A Concept Map (CM), introduced by Novak & Gowin (1984), consists of concepts as nodes and directed, labeled edges indicating relations between these concept nodes. A CM is a graph that effectively organizes, represents, and visualizes the knowledge present in a text (Qasim et al., 2013; Zubrinic et al., 2012). The challenge in constructing a CM is to organize concepts expressively, so that the result matches the human understanding of those concepts. The CM has to provide an overview of the document that is concise and effortless to understand (Falke et al., 2017). Advantages of a CM include the concise presentation of concepts, explicit visualization of the relationships among them, and the indication of the important concepts.
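
To make the graph view concrete, the following minimal Python sketch stores a CM as a collection of concept-relation-concept propositions with directed, labeled edges. The names `Proposition`, `ConceptMap`, `add`, and `degree`, as well as the sample propositions, are illustrative assumptions and are not taken from the cited work.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposition(NamedTuple):
    """One directed, labeled edge: source --relation--> target."""
    source: str
    relation: str
    target: str


class ConceptMap:
    """Hypothetical minimal CM: concepts are nodes, relations are edge labels."""

    def __init__(self):
        self.concepts = set()
        self._out = defaultdict(list)  # concept -> [(relation, target), ...]

    def add(self, source, relation, target):
        self.concepts.update((source, target))
        self._out[source].append((relation, target))

    def propositions(self):
        return [Proposition(s, r, t)
                for s in sorted(self._out)
                for r, t in self._out[s]]

    def degree(self, concept):
        """In-degree plus out-degree, a crude hint at concept importance."""
        out_deg = len(self._out.get(concept, []))
        in_deg = sum(1 for edges in self._out.values()
                     for _, target in edges if target == concept)
        return out_deg + in_deg


# Toy propositions paraphrasing the definition above (assumed sample data).
cm = ConceptMap()
cm.add("concept map", "consists of", "concepts")
cm.add("concept map", "visualizes", "knowledge")
cm.add("concepts", "are linked by", "relations")

for p in cm.propositions():
    print(f"{p.source} --{p.relation}--> {p.target}")
```

Node degree is used here only as a simple stand-in for "important concepts"; the paper derives significance from additional relations and statistics.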

Manual construction of a CM from text is a daunting task, as it requires extensive domain knowledge to extract the proper set of concepts and relations and to organize them efficiently. Concept map mining (CMM) refers to the automatic or semi-automatic creation of a CM from text (Villalon & Calvo, 2008). In semi-automatic CMM (Kowata et al., 2010; Zubrinic et al., 2012), concepts suggested by an automatic system are selected manually to construct the CM. In automatic CMM (Falke & Gurevych, 2017; Qasim et al., 2013; Zubrinic et al., 2012), the concepts and relations that form the CM are selected automatically using available resources. Generally, a CM can be constructed from text using statistical, Natural Language Processing (NLP), dictionary-based, or machine-learning-based approaches.

CMs have been employed in diverse applications in previous studies, for example to structure information repositories (Fox & Richardson, 2005), as a teaching tool (Roy, 2008), and for concise text representation (Valerio et al., 2012). CMs have also been used in many NLP applications, such as keyword suggestion (Amiri et al., 2008), document classification (Valerio et al., 2008), knowledge exploration and organization (Attapattu et al., 2014), question answering (Kim et al., 2017), and summarization (Falke et al., 2017).

The generic CM can be enhanced in numerous ways: by extracting precise propositions, by representing it compactly via redundancy removal, and by adding features to the concepts and organizing them so that the information content increases. Falke & Gurevych (2017) leveraged the predicate-argument structure of the sentence to extract propositions that serve as input to CMM. To extract the predicate-argument structures, different dependency-parsing-based NLP tools such as OpenIE (Open Information Extraction), PropS, and SRL (Semantic Role Labelling) using Mate are used. The information content of the generated CM was based solely on the lexical relation, i.e., the predicate of the proposition, and not on other latent features. The proposed automatic CMM takes as input Concept-Relation-Concept triplets extracted from the document sentences using a Tree-Based Convolutional Neural Network (TBCNN).

In this work, the focus is on compact representation by merging similar concepts in the document. To enhance the information content of the CM, concepts are linked via the latent semantic structure that exists in the document, i.e., by topically clustering the concepts and hierarchically linking the topically clustered concepts. The significance of each concept is determined from these relations together with its statistical value and is used to represent the concept. The enhanced CM (ECM) is formalized as a multigraph that includes the added relations and is organized efficiently. The summary of the ECM is extracted by selecting concepts based on their significance, coverage, and cohesion. Additional concepts and relations are added to the summary to keep it connected.
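
The merging step can be illustrated with a short Python sketch that groups concepts whose embedding vectors are nearly parallel. It is only an approximation under stated assumptions: the three-dimensional vectors are invented toy values rather than the joint word embedding, the 0.9 cosine threshold is arbitrary, and single-link grouping with union-find is a stand-in for whatever similarity criterion the full paper defines. The names `cosine`, `merge_similar`, and `toy` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar(embeddings, threshold=0.9):
    """Group concept labels whose vectors reach the similarity threshold.

    Uses union-find, so groups are formed by single-link chaining.
    """
    names = list(embeddings)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Invented toy vectors; a real system would look these up in a trained embedding.
toy = {
    "concept map": [0.90, 0.10, 0.00],
    "concept maps": [0.88, 0.12, 0.01],
    "word embedding": [0.10, 0.90, 0.20],
    "summary": [0.00, 0.20, 0.90],
}
groups = merge_similar(toy)
print(groups)
```

Each resulting group would be collapsed into a single node of the map, with the edges of its members redirected to that node, which is one way to obtain the compact representation described above.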

The rest of the paper is organized as follows. Section 2 reviews existing CMM methods and their limitations. Section 3 details the process of ECM generation using joint word embedding. Section 4 describes the datasets and tools used in the process. Section 5 presents the evaluation of the ECM and its summary, along with the results. Section 6 discusses the proposed work and the improvements that can be carried out in the future.
