A Semantic Knowledge-Based Framework for Information Extraction and Exploration

Abduladem Aljamel, Taha Osman, Dhavalkumar Thakker
Copyright: © 2021 | Pages: 25
DOI: 10.4018/IJDSST.2021040105

Abstract

The availability of online documents that describe domain-specific information provides an opportunity to employ a knowledge-based approach to extracting information from web data. This research proposes a novel, comprehensive semantic knowledge-based framework that helps to transform unstructured data into a form that data scientists can easily exploit. The resultant semantic knowledgebase is reasoned upon to infer new facts and classify events that might be of importance to end users. The target use case for the framework implementation was the financial domain, which represents an important class of dynamic applications that require the modelling of non-binary relations. Such complex relations are becoming increasingly common in the era of linked open data. This research in modelling and reasoning upon such relations is a further contribution of the proposed semantic framework, where non-binary relations are semantically modelled by adapting the semantic reasoning axioms to fit the intermediate resources required by N-ary relations.

1. Introduction

An increasing amount of data is being made available online. It can be exploited to inform data analytics and Decision Support Systems (DSS) for a variety of applications, such as those in the financial services domain. However, this online data is diverse in volume and complexity, is largely unstructured, and is written in natural language. This makes manual exploitation of the data by end users very difficult. Therefore, automated Information Extraction (IE) techniques are needed to extract useful information and represent it in a machine-understandable semantic model. However, transforming largely informative unstructured text into a structured knowledgebase that can be reasoned upon to infer new knowledge, predictions or decisions of interest to a specific beneficiary group is a very complex task. Addressing that complexity requires in-depth expertise in utilising and integrating various methods and technologies associated with Natural Language Processing (NLP), knowledge representation and Machine Learning (ML). Recently, advances in Semantic Web Technologies (SWT) have been used extensively in data analytics and decision-support systems in several application domains, such as financial investment recommendation, clinical management, system audit management, network security management, justice and legal advice, waste-water management, power consumption management and electronic issue management.

As a result, there is a pressing need for a comprehensive framework that offers an intelligent roadmap for aligning the discrepancies in knowledge presentation across the various contributing information sources, and that delivers intelligent query methods against the extracted information and its semantic model. Such a framework, in the authors' view, should benefit from knowledge of the problem domain that can assist the fundamental tasks of NLP, which are Named Entity Recognition (NER) and Relation Extraction.

Domain knowledge is knowledge about a specific field or subject of interest that is understood by practitioners in that field. Compiling this knowledge requires in-depth analysis of the characteristics of the problem domain. These characteristics could concern the grammar and the meaning of words in the context of a sentence structure, or the style of the language of the domain. It is crucial to comprehend these characteristics so that they can be engineered as linguistic or structural features. These features can then be employed in the implementation of IE systems using a variety of approaches, such as rule-based or ML-based ones (Aljamel, 2018).
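To illustrate how domain characteristics can be engineered as features for a rule-based IE system, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical gazetteer of company names and a hypothetical list of acquisition verbs as the domain knowledge, and combines them into a single extraction pattern.

```python
import re

# Hypothetical domain knowledge: a tiny gazetteer of company names and a few
# verbs that signal an acquisition in financial news. Neither list comes from
# the original article.
COMPANIES = {"Acme Corp", "Globex", "Initech"}
ACQUISITION_VERBS = ("acquired", "bought", "purchased")


def extract_acquisitions(sentence):
    """Return (buyer, target) pairs found by a gazetteer-plus-pattern rule."""
    # Longer names first so that a longer name is preferred over a shorter one.
    names = "|".join(
        re.escape(name) for name in sorted(COMPANIES, key=len, reverse=True)
    )
    verbs = "|".join(ACQUISITION_VERBS)
    # Structural feature: <company> [has] <acquisition verb> <company>
    pattern = re.compile(rf"({names})\s+(?:has\s+)?(?:{verbs})\s+({names})")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(sentence)]


pairs = extract_acquisitions("Acme Corp has acquired Globex for $2 billion.")
```

A real system would draw its gazetteer and patterns from the domain ontology and would handle far more linguistic variation; the sketch only shows how a lexical feature (the gazetteer) and a structural feature (the sentence pattern) work together.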

In this paper, a knowledge-based framework is proposed that is based on the authors' extensive research and development efforts in building a knowledge-driven financial recommender system (Aljamel, 2018). The framework adopts SWT for domain knowledge representation because they can be utilised to represent the problem domain in a highly structured knowledge model (ontology) that enables software agents to comprehend domain-related information, and thus assist in automating the extraction of concepts and relations of relevance to the domain of interest. The semantic ontology is formally expressed using the standardised Semantic Web languages, which are Resource Description Framework (RDF), RDF Schema (RDFS) and Web Ontology Language (OWL), and facilitates the inference of new facts from the extracted and semantically tagged information to support decision-making and knowledge exploration activities. Furthermore, because the targeted domain-specific knowledge is heavily represented by non-binary relations, the authors have investigated how to represent these relations in the domain-specific ontology model by using N-ary relation patterns (Hogan, 2020).
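The idea of an N-ary relation pattern can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' ontology: triples are plain Python tuples rather than RDF, every name under the `ex:` prefix is hypothetical, and the classification rule is ordinary Python standing in for an OWL or rule-based axiom. An acquisition involving a buyer, a target, an amount and a date cannot be expressed as one binary triple, so an intermediate resource represents the event and each participant is linked to it by a binary property.

```python
# Triples are plain (subject, predicate, object) tuples. All "ex:" names are
# hypothetical and are not taken from the authors' ontology.
def nary_acquisition(event_id, buyer, target, amount_usd, date):
    """Model one acquisition as an intermediate resource with binary links."""
    return {
        (event_id, "rdf:type", "ex:Acquisition"),
        (event_id, "ex:buyer", buyer),
        (event_id, "ex:target", target),
        (event_id, "ex:amountUSD", amount_usd),
        (event_id, "ex:date", date),
    }


def classify_major(triples, threshold_usd=1_000_000_000):
    """Toy rule: tag acquisitions at or above the threshold as major events."""
    acquisitions = {
        s for s, p, o in triples if p == "rdf:type" and o == "ex:Acquisition"
    }
    inferred = set()
    for s, p, o in triples:
        # The rule reads its input from the intermediate resource, not from
        # the buyer or the target directly.
        if s in acquisitions and p == "ex:amountUSD" and o >= threshold_usd:
            inferred.add((s, "rdf:type", "ex:MajorAcquisition"))
    return inferred


graph = nary_acquisition(
    "ex:acq1", "ex:AcmeCorp", "ex:Globex", 2_000_000_000, "2020-05-01"
)
graph |= classify_major(graph)  # add the inferred fact to the knowledgebase
```

The point of the sketch is that any reasoning rule has to be written against the intermediate resource (`ex:acq1` here), since that is where the participants of the relation meet.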

The proposed knowledge-based framework presents a comprehensive methodology for IE and exploration that comprises the following processes: analysing and modelling the domain knowledge, extracting information from unstructured data, constructing the semantic knowledgebase, enriching the semantic knowledgebase and, lastly, exploiting the resulting semantic knowledgebase by intelligently exploring and processing it to support decision-making. Delivering these processes requires the integration of several diverse technologies, including NLP, knowledge representation, ML and evolutionary optimisation algorithms.
