Departing the Ontology Layer Cake

Abel Browarnik (Tel Aviv University, Israel) and Oded Maimon (Tel Aviv University, Israel)
DOI: 10.4018/978-1-4666-8690-8.ch007

Abstract

In this chapter we analyze Ontology Learning and its goals, as well as the input expected when learning ontologies: peer-reviewed scientific papers in English. After reviewing the shortcomings of the Ontology Learning Layer Cake model, we suggest an alternative model based on linguistic knowledge. The suggested model would find the meaning of simple components of text, namely statements. From these it is easy to derive cases and roles that map reality as a set of entities and relationships, or RDF triples, somewhat equivalent to entity-relationship diagrams. The time complexity of the suggested ontology learning framework is constant (O(1)) per sentence, and O(n) for an ontology with n sentences. We conclude that the Ontology Learning Layer Cake is not adequate for Ontology Learning from text.

Introduction

An ontology is defined in our context as a formal, explicit specification of a shared conceptualization. Formal refers to the fact that the ontology should be machine-readable (therefore ruling out natural language). Explicit means that the types of concepts used and the constraints on their use are explicitly defined. Shared reflects the notion that an ontology captures consensual knowledge, that is, knowledge that is not private to some individual but accepted by a group.

Most available ontologies are crafted and maintained with human intervention. Ontologies represent reality and, as such, require frequent updates, which makes them resources that are both costly and difficult to obtain and maintain. To overcome this problem, the discipline of Ontology Learning has emerged. Learning is interpreted in the literature as the process of creating the ontology and populating it. In our work, the goal of Ontology Learning is the (at least semi-) automatic extraction of knowledge, possibly structured as simple or composite statements, from a given corpus of textual documents, to form an ontology.
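The idea of an ontology formed from simple statements can be illustrated with a minimal Python sketch. It is not the chapter's framework: the Triple type, the add_statements function and the three example statements are hypothetical, chosen only to show statements stored as subject-predicate-object triples and accumulated in one pass over the input.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A simple statement as subject, predicate, object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def add_statements(ontology: set[Triple], statements: list[Triple]) -> set[Triple]:
    """Add each statement to the ontology in a single pass over the input.

    Each insertion into a set takes constant time on average, so the pass
    is linear in the number of statements.
    """
    for statement in statements:
        ontology.add(statement)
    return ontology


# Hypothetical, hand-written statements; a real system would derive them from text.
statements = [
    Triple("nanoparticle", "has_property", "toxicity"),
    Triple("paper", "reports_on", "nanoparticle"),
    Triple("nanoparticle", "has_property", "toxicity"),  # repeated statement
]

ontology = add_statements(set(), statements)
```

Storing the triples in a set means a statement repeated across papers is kept only once, so the ontology grows only when a paper contributes something new.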

Surveys conducted since the early days of Ontology Learning show the different approaches used to tackle the problem. In fact, most, if not all, of the approaches follow a model named the Ontology Learning Layer Cake and share many features, such as statistics-based information retrieval, machine learning, and data and text mining, resorting to linguistics-based techniques for certain tasks. The approaches include multiple steps towards learning an ontology, namely term extraction, concept formation, creation of a taxonomy of concepts, relation extraction and, finally, rule extraction. Usually, big corpora are required to obtain good results. Web textual data is often the target of choice, due to its abundance.
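The sequence of steps listed above can be sketched as a pipeline in which each layer runs on the result of the previous one. This is a hedged illustration only: the names LAYERS, make_stage and run_layer_cake are hypothetical, and each stage is a placeholder that records its own name rather than performing any real extraction.

```python
from typing import Callable

# The five steps named in the text, in the order they are applied.
LAYERS = [
    "term extraction",
    "concept formation",
    "concept taxonomy",
    "relation extraction",
    "rule extraction",
]

State = dict[str, list[str]]


def make_stage(name: str) -> Callable[[State], State]:
    """Build a placeholder stage (hypothetical helper).

    A real stage would apply statistical, machine learning or linguistic
    techniques; this one only appends its name to the list of completed layers.
    """
    def stage(state: State) -> State:
        return {**state, "completed": state["completed"] + [name]}
    return stage


def run_layer_cake(corpus: list[str]) -> State:
    """Run the layers in order, feeding each layer the previous layer's state."""
    state: State = {"corpus": corpus, "completed": []}
    for name in LAYERS:
        state = make_stage(name)(state)
    return state


result = run_layer_cake(["An example sentence from a corpus."])
```

The sketch shows only the sequential structure: a later layer never runs before an earlier one has produced its state.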

Our work includes the following items:

Analysis of the Goals of Ontology Learning from Text

An ontology represents, in our view, a “portion” of the world that we are looking at. As an example, the toxicity of engineered nanoparticles (nanotoxicity) is a domain that we would like to model. Many organizations, among them the European Commission, are endorsing and financing research projects on the subject (see, for instance, NHECD; Browarnik et al., 2009). The nanotoxicity domain is relatively new and dynamic. Every new paper published on the subject may add a new detail to the nanotoxicity model (i.e., a new concept, or a new relation between concepts). Hence, we see Ontology Learning as a tool for modeling a domain and keeping it up to date.

Analysis of the Input Used to Learn Ontologies

The input used for modeling a domain depends on the domain itself. What matters when considering the input is whether or not it consists of well-formed text. Modeling scientific domains such as nanotoxicity would certainly be based on peer-reviewed papers or other kinds of scientific articles or books. These are texts that can be deemed well-formed and quality-checked. We may safely assume that a domain is not learnt from sources such as email or new media (such as posts on Facebook or Twitter), because these media often contain ill-formed text. While such media could be used for tasks such as sentiment analysis, they seem inappropriate for Ontology Learning from text. Indeed, the quality of the input is one of the parameters to be taken into account when devising an ontology learning framework.
