1. Introduction
A Concept Map (CM), introduced by Novak and Gowin (1984), consists of concepts as nodes and directed, labeled edges indicating the relations between these concept nodes. A CM is a graph that effectively organizes, represents, and visualizes the knowledge present in a text (Qasim et al., 2013; Zubrinic et al., 2012). The challenge in constructing a CM is to organize the concepts expressively, so that the map matches human understanding of those concepts. A CM has to provide a concise overview of the document that is easy to understand (Falke et al., 2017). The advantages of a CM include the concise presentation of concepts, the explicit visualization of the relationships among them, and the indication of the most important concepts.
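The definition above can be made concrete with a small data structure. The following Python sketch is illustrative only: the class `ConceptMap` and its methods are hypothetical names, not part of any cited system. It stores concepts as nodes and each relation as a directed edge labeled with the relation phrase.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """A concept map: concept nodes plus directed, labeled edges (hypothetical sketch)."""

    concepts: set[str] = field(default_factory=set)
    # Each edge is (source concept, relation label, target concept).
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_proposition(self, source: str, relation: str, target: str) -> None:
        """Add one concept-relation-concept proposition to the map."""
        self.concepts.update((source, target))
        self.edges.add((source, relation, target))

    def relations_from(self, concept: str) -> list[tuple[str, str]]:
        """Return sorted (relation, target) pairs for edges leaving `concept`."""
        return sorted((r, t) for s, r, t in self.edges if s == concept)


cm = ConceptMap()
cm.add_proposition("concept map", "consists of", "concepts")
cm.add_proposition("concept map", "consists of", "relations")
cm.add_proposition("relations", "connect", "concepts")
print(cm.relations_from("concept map"))
# [('consists of', 'concepts'), ('consists of', 'relations')]
```

Keeping the relation label on the edge (rather than as a separate node) mirrors the directed, labeled edges of the CM definition.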
Manual construction of a CM from text is a daunting task, as it requires elaborate domain knowledge to extract the proper set of concepts and relations and to organize them efficiently. Concept map mining (CMM) refers to the automatic or semi-automatic creation of a CM from text (Villalon & Calvo, 2008). In semi-automatic CMM (Kowata et al., 2010; Zubrinic et al., 2012), concepts suggested by an automatic system are manually selected to construct the CM. In automatic CMM (Falke & Gurevych, 2017; Qasim et al., 2013; Zubrinic et al., 2012), the concepts and relations that form the CM are selected automatically using the available resources. In general, a CM can be constructed from text using statistical, Natural Language Processing (NLP), dictionary-based, or machine learning-based approaches.
CMs have been employed in diverse applications in previous studies, such as structuring information repositories (Fox & Richardson, 2005), as a teaching tool (Roy, 2008), and for concise text representation (Valerio et al., 2012). CMs have also been used in many NLP applications, such as keyword suggestion systems (Amiri et al., 2008), document classification (Valerio et al., 2008), knowledge explorers and organizers (Attapattu et al., 2014), question answering (Kim et al., 2017), and summarization (Falke et al., 2017).
A generic CM can be enhanced in numerous ways: by extracting precise propositions, by representing it compactly through redundancy removal, and by adding features to the concepts and organizing them so as to increase the information content. Falke & Gurevych (2017) leveraged the predicate-argument structure of sentences to extract the propositions that serve as input to CMM. To extract the predicate-argument structures, dependency-parsing-based NLP tools such as OpenIE (Open Information Extraction), PropS, and SRL (Semantic Role Labelling) using Mate were used. The information content of the resulting CM was based solely on the lexical relation, i.e., the predicate of the proposition, and not on other latent features. The proposed automatic CMM instead takes as input Concept-Relation-Concept triplets extracted from the document sentences using a Tree-based Convolutional Neural Network (TBCNN).
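As a minimal sketch of how extracted triplets might be aggregated into a CM, the Python code below assumes the Concept-Relation-Concept triplets have already been produced; the TBCNN extraction step itself is not shown. The helper names `normalize` and `build_concept_map`, the lowercase-and-whitespace normalization, and the use of raw frequency as a concept's statistical value are illustrative assumptions, not the method of this paper.

```python
from collections import Counter, defaultdict

# A (head concept, relation phrase, tail concept) triplet, assumed to come from
# an upstream extractor such as the TBCNN step (not shown here).
Triplet = tuple[str, str, str]


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so surface variants share one node."""
    return " ".join(phrase.lower().split())


def build_concept_map(
    triplets: list[Triplet],
) -> tuple[Counter, dict[tuple[str, str], set[str]]]:
    """Aggregate triplets into concept frequencies and labeled edges.

    Edges are keyed by (head, tail); one pair may carry several relation labels.
    """
    frequency: Counter = Counter()
    edges: defaultdict = defaultdict(set)
    for head, relation, tail in triplets:
        head, tail = normalize(head), normalize(tail)
        frequency.update([head, tail])  # hypothetical statistical value: raw count
        edges[(head, tail)].add(normalize(relation))
    return frequency, dict(edges)


triplets = [
    ("Concept Map", "consists of", "concepts"),
    ("concept  map", "visualizes", "knowledge"),
    ("Concepts", "are linked by", "relations"),
]
frequency, edges = build_concept_map(triplets)
print(frequency["concept map"], edges[("concept map", "knowledge")])
# 2 {'visualizes'}
```

Normalization here only merges trivially different surface forms; merging semantically similar concepts needs a similarity measure, such as one based on word embeddings.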
In this work, the focus is on a compact representation obtained by merging similar concepts in the document. To enhance the information content of the CM, concepts are linked via the latent semantic structure of the document, i.e., by clustering the concepts topically and hierarchically linking the topically clustered concepts. The significance of each concept is determined from these relations together with its statistical value, and is used to represent the concepts. The ECM is formalized as a multigraph with the added relations and is organized efficiently. A summary of the ECM is extracted by selecting concepts based on their significance, coverage, and cohesion, and additional concepts and relations are added to the summary to ensure that it remains connected.
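The merging and summary-selection steps can be sketched as follows, under several assumptions that are not taken from this paper: the concept vectors are toy values standing in for embeddings, similar concepts are merged greedily when their cosine similarity reaches a hypothetical threshold of 0.9, significance is taken to be frequency plus degree in the multigraph, and connectedness is restored by adding the concepts on a shortest connecting path. Topical clustering, hierarchical linking, and the coverage and cohesion criteria are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter, deque


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two (assumed nonzero) embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def merge_similar(vectors: dict[str, list[float]], threshold: float = 0.9) -> dict[str, str]:
    """Greedily map each concept to the first earlier representative whose vector
    is at least `threshold` similar; otherwise the concept becomes a representative."""
    reps: list[str] = []
    mapping: dict[str, str] = {}
    for concept, vec in vectors.items():
        match = next((r for r in reps if cosine(vec, vectors[r]) >= threshold), None)
        if match is None:
            reps.append(concept)
            match = concept
        mapping[concept] = match
    return mapping


def significance(edges, frequency) -> Counter:
    """Hypothetical score: concept frequency plus degree in the multigraph
    (parallel edges of different relation types each count)."""
    score = Counter(frequency)
    for u, v, _relation_type in edges:
        score[u] += 1
        score[v] += 1
    return score


def connecting_path(adjacency, sources, target):
    """Multi-source BFS: shortest node path from the summary set to `target`."""
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for nxt in sorted(adjacency.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def summarize(edges, scores, k: int) -> set[str]:
    """Take the k most significant concepts, then add connecting concepts so the
    summary stays connected (concepts unreachable from it are skipped)."""
    adjacency: dict[str, set[str]] = {}
    for u, v, _relation_type in edges:  # direction and type ignored for connectivity
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    chosen = sorted(scores, key=lambda c: (-scores[c], c))[:k]
    if not chosen:
        return set()
    summary = {chosen[0]}
    for concept in chosen[1:]:
        path = connecting_path(adjacency, sorted(summary), concept)
        if path:
            summary.update(path)
    return summary


vectors = {"concept map": [1.0, 0.0], "concept maps": [0.99, 0.1], "knowledge": [0.0, 1.0]}
mapping = merge_similar(vectors)  # "concept maps" is merged into "concept map"
raw_edges = [
    ("concept maps", "knowledge", "lexical"),
    ("concept map", "knowledge", "topical"),  # parallel edge, different type
    ("knowledge", "text", "lexical"),
    ("text", "sentence", "hierarchical"),
    ("sentence", "triplet", "lexical"),
    ("concept map", "visualization", "hierarchical"),
]
edges = [(mapping.get(u, u), mapping.get(v, v), t) for u, v, t in raw_edges]
frequency = {"concept map": 5, "triplet": 4, "knowledge": 1, "text": 1, "sentence": 1}
scores = significance(edges, frequency)
summary = summarize(edges, scores, k=2)
print(sorted(summary))
# ['concept map', 'knowledge', 'sentence', 'text', 'triplet']
```

With k=2, only "concept map" and "triplet" are selected by score; the intermediate concepts are added solely to keep the summary connected, while the low-scoring "visualization" is left out.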
The rest of the paper is organized as follows. Section 2 reviews existing CMM methods and their limitations. Section 3 details the process of ECM generation using joint word embedding. Section 4 describes the dataset and tools used. Section 5 presents a detailed description of the evaluation of the ECM and its summary, along with the results. Section 6 concludes with remarks on the proposed work and the improvements that can be carried out in the future.