Integrating Ontologies and Bayesian Networks in Big Data Analysis

Hadrian Peter (University of the West Indies, Cave Hill, Barbados) and Charles Greenidge (University of the West Indies, Cave Hill, Barbados)
Copyright: © 2014 | Pages: 8
DOI: 10.4018/978-1-4666-5202-6.ch115

Chapter Preview




Many definitions have been advanced for the term “ontology.” Two of the more common are: (1) a strategy for representing knowledge in a consistent fashion; and (2) a description of the types of entities within a given domain and the relationships among them. The power of ontologies lies in their utility for reasoning by means of software applications.
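The second definition can be made concrete with a minimal sketch. The toy domain, the relation names (`is_a`, `has_part`), and the helper functions below are hypothetical illustrations, not part of the framework described in this chapter; they only show how a description of entities and relationships supports simple reasoning in software.

```python
# Hypothetical toy ontology stored as (subject, relation, object) triples.
TRIPLES = {
    ("Laptop", "is_a", "Computer"),
    ("Computer", "is_a", "Device"),
    ("Computer", "has_part", "Processor"),
}


def ancestors(concept, triples=TRIPLES):
    """Return every concept reachable from `concept` by following is_a links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_a(concept, candidate, triples=TRIPLES):
    """True if `candidate` is a direct or inherited superclass of `concept`."""
    return candidate in ancestors(concept, triples)
```

Here `is_a("Laptop", "Device")` holds even though no triple states it directly; the fact is inferred by chaining the two `is_a` links, which is the kind of reasoning the definitions above refer to.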

Our ontology-based framework is motivated by a number of earlier approaches to information extraction on the Web. These include hand-written wrappers, which are limited by robustness and scalability hurdles (Crescenzi et al., 2001; Shen et al., 2008); ontology-based matching of data on the Web (Hassell et al., 2006; Embley et al., 1998; Isaac et al., 2007); Jaccard-based measures (Euzenat & Shvaiko, 2007); work on table extraction issues (Holzinger et al., 2006); the MARSON system for performing mappings between a relational schema and an OWL-based ontology (Hu & Qu, 2007); an ontology modeling system for the identification and extraction of instance data from tabular Web pages (Shchekotykhin et al., 2007); treatments of key issues in data retrieval (DR), information retrieval (IR), knowledge representation (KR), and information extraction (IE) (Manning et al., 2008); and studies of the twin problems of information overload and search (Lee et al., 2008).

Our chapter focuses on mapping Web data, and external data from other sources, to domain ontologies, allowing several IE issues to be addressed directly. We use a variety of techniques to make sense of the structure and meaning of these data, ultimately providing a match to a domain ontology. In particular, the WordNet lexical database (Gomez-Perez et al., 2004; Euzenat & Shvaiko, 2007; Fellbaum, 1998) is used to facilitate some basic matching activities. We also make use of current search engine capability in our ontology mapping process. Allowing search engine inputs (and those from other sources) helps us to align the matching process with data as they exist online, rather than as construed in some selectively crafted catalog that may not be representative of Web data (Schoop et al., 2006). Figure 1 is a simple ontology showing the different types of relationships between entities, and is adopted from Greenidge and Peter (2010).
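As a rough illustration of matching a Web term to a domain ontology, the sketch below scores a term against concept labels using a Jaccard measure over word sets. It is not the matching procedure used in this chapter: the labels, the threshold, and the small `SYNONYMS` table (a hand-made stand-in for a WordNet lookup) are all assumptions chosen for the example.

```python
import re

# Hypothetical concept labels from a domain ontology.
ONTOLOGY_LABELS = ["laptop computer", "mobile phone", "digital camera"]

# Hand-made stand-in for a lexical resource such as WordNet (assumption).
SYNONYMS = {"notebook": "laptop", "cellphone": "mobile", "cell": "mobile"}


def tokens(text):
    """Lower-case the text, split it into words, and normalize known synonyms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {SYNONYMS.get(word, word) for word in words}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(term, labels=ONTOLOGY_LABELS, threshold=0.3):
    """Return (label, score) for the closest label, or (None, score) below threshold."""
    scored = [(jaccard(tokens(term), tokens(label)), label) for label in labels]
    score, label = max(scored)
    return (label, score) if score >= threshold else (None, score)
```

With these assumptions, a term such as "Notebook computer" normalizes to the same word set as the label "laptop computer", while a term sharing no words with any label falls below the threshold and is left unmatched.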

Figure 1. Simplified ontology

Key Terms in this Chapter

Information Retrieval: The activity of obtaining information relevant to an information need from a collection of information resources. Searches can be based on metadata.

Ontology: A method of representing items of knowledge (for example, ideas, facts, things) in a way that defines the relationships and classifications of concepts within a specified domain of knowledge.

Bayesian Network: A graphical model that encodes probabilistic relationships among variables of interest. It is used in artificial intelligence systems to make inferences from probabilistic information.

Big Data: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

Data Mining: The process of analyzing data from different perspectives and summarizing it into useful information; sometimes called knowledge discovery.
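The Bayesian Network entry above can be illustrated with a minimal sketch of inference by enumeration. The three-variable network (rain and sprinkler as independent causes of wet grass) and all of the probabilities are hypothetical textbook-style values, not taken from this chapter.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all values are assumptions).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factored along the edges of the graph."""
    p_wet = P_WET[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain(wet=True):
    """P(Rain = True | WetGrass = wet), summing out the sprinkler variable."""
    numerator = sum(joint(True, s, wet) for s in (True, False))
    evidence = sum(joint(r, s, wet) for r, s in product((True, False), repeat=2))
    return numerator / evidence
```

Calling `posterior_rain()` shows the inference step: observing wet grass raises the probability of rain above its prior of 0.2, because the graph structure tells us how the evidence bears on each cause.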
