Ontologies
Many definitions have been advanced for the term “ontology.” Two of the more common ones are: (1) a strategy for representing knowledge in a consistent fashion; and (2) a description of the types of entities within a given domain and the relationships among them. The power of ontologies lies in their usefulness for reasoning by means of software applications.
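To make the second definition more concrete, the following is a minimal sketch in Python. It models an ontology as a set of concepts plus typed, directed relations, and it performs one simple kind of automated reasoning: following `is_a` links transitively. The class name `Ontology`, the relation labels `is_a` and `has_part`, and the product concepts are hypothetical illustrations chosen for this sketch. They are not part of the framework described in this chapter, and a production system would more likely use a standard representation such as OWL.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology (hypothetical): named concepts plus typed, directed relations."""
    concepts: set[str] = field(default_factory=set)
    # Each relation is stored as a (subject, relation_type, object) triple.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Record a relation and register both endpoints as concepts."""
        self.concepts.update((subject, obj))
        self.relations.add((subject, relation, obj))

    def ancestors(self, concept: str) -> set[str]:
        """Return every concept reachable from `concept` via 'is_a' links.

        This transitive closure is a minimal example of software reasoning
        over the ontology: the fact Laptop is_a Product is never stated but
        is inferred from Laptop is_a Computer and Computer is_a Product.
        """
        found: set[str] = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for subj, rel, obj in self.relations:
                if subj == current and rel == "is_a" and obj not in found:
                    found.add(obj)
                    frontier.append(obj)
        return found


# Usage with hypothetical product-domain concepts.
onto = Ontology()
onto.add("Laptop", "is_a", "Computer")
onto.add("Computer", "is_a", "Product")
onto.add("Laptop", "has_part", "Battery")
print(sorted(onto.ancestors("Laptop")))  # ['Computer', 'Product']
```

Keeping relations as plain triples mirrors the subject, predicate and object shape used by RDF and OWL. This makes the sketch easy to follow, but it is much less expressive than a real ontology language.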
The motivation for our ontology-based framework comes from a number of earlier approaches to information extraction on the Web. These include hand-written wrappers, whose main limitations are robustness and scalability (Crescenzi et al., 2001; Shen et al., 2008); ontology-based matching of data on the Web (Hassell et al., 2006; Embley et al., 1998; Isaac et al., 2007); Jaccard-based similarity measures (Euzenat & Shvaiko, 2007); table extraction issues (Holzinger et al., 2006); the MARSON system for performing mappings between relational schemas and an OWL-based ontology (Hu & Qu, 2007); an ontology modeling system for identifying and extracting instance data from tabular Web pages (Shchekotykhin et al., 2007); key issues in data retrieval (DR), information retrieval (IR), knowledge representation (KR) and information extraction (IE) (Manning et al., 2008); and the twin problems of information overload and search (Lee et al., 2008).
Our chapter focuses on mapping Web data, and external data from other sources, to domain ontologies. This allows several IE issues to be addressed directly. We use a variety of techniques to make sense of the structure and meaning of these data, ultimately matching them to a domain ontology. In particular, the WordNet lexical database (Gomez-Perez et al., 2004; Euzenat & Shvaiko, 2007; Fellbaum, 1998) is used to support some basic matching activities. We also make use of current search engine capabilities in our ontology mapping process. Accepting inputs from search engines (and from other sources) helps us align the matching process with data as they exist online, rather than as construed in a selectively crafted catalog that may not be representative of Web data (Schoop et al., 2006). Figure 1 is a simple ontology showing the different types of relationships between entities, and is adopted from (Greenidge & Peter, 2010).
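As a rough illustration of the kind of basic label matching described above, the following sketch maps a Web table's column header to the closest concept name in a domain ontology. It splits both labels into word tokens and expands those tokens with synonyms. It then scores each candidate with the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|. Several parts are assumptions made for this example rather than details of the chapter's method. The hand-written `SYNONYMS` dictionary is a hypothetical stand-in for WordNet lookups. The concept names, the function names and the 0.3 acceptance threshold are also illustrative.

```python
import re
from typing import Optional

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "cost": {"price"},
    "price": {"cost"},
    "maker": {"manufacturer", "brand"},
    "manufacturer": {"maker", "brand"},
    "brand": {"maker", "manufacturer"},
}


def tokens(label: str) -> set[str]:
    """Split camelCase, lower-case, and keep alphanumeric runs ("unitPrice" -> {"unit", "price"})."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def expand(words: set[str]) -> set[str]:
    """Return the tokens together with all of their listed synonyms."""
    out = set(words)
    for word in words:
        out |= SYNONYMS.get(word, set())
    return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(column: str, concepts: list[str], threshold: float = 0.3) -> Optional[str]:
    """Return the concept whose expanded tokens best overlap the column's, or None below threshold."""
    if not concepts:
        return None
    col = expand(tokens(column))
    scored = [(jaccard(col, expand(tokens(c))), c) for c in concepts]
    # key= compares scores only, so ties go to the earliest concept in the list.
    score, concept = max(scored, key=lambda pair: pair[0])
    return concept if score >= threshold else None


# Usage with hypothetical ontology concepts and Web column headers.
concepts = ["Manufacturer", "Price", "ScreenSize"]
print(best_match("Unit Cost", concepts))       # Price
print(best_match("Brand", concepts))           # Manufacturer
print(best_match("screen_size_in", concepts))  # ScreenSize
print(best_match("Weight", concepts))          # None
```

Both labels are expanded before scoring, so a synonym pair such as cost/price counts as shared vocabulary. The trade-off is that unmatched synonyms also enlarge the union and lower the score. This is one reason a threshold like 0.3 would need tuning against real Web data.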