Automatic Topic Ontology Construction Using Semantic Relations from WordNet and Wikipedia

V. Subramaniyaswamy (Department of Computer Science and Engineering, SASTRA University, Thanjavur, Tamilnadu, India)
Copyright: © 2013 | Pages: 29
DOI: 10.4018/jiit.2013070104

Due to the explosive growth of web technology, a huge amount of information is available as web resources over the Internet. To access relevant content from these resources effectively, considerable attention has been paid to the semantic web as a means of efficient knowledge sharing and interoperability. A topic ontology is a hierarchy of topics interconnected by semantic relations, and it is increasingly used in web mining techniques. Reviews of past research reveal that semiautomatic ontology construction is not capable of handling high usage. This shortcoming prompted the authors to develop an automatic topic ontology construction process. Earlier attempts by other researchers at automatic ontology construction proved challenging because of time, cost, and maintenance. In this paper, the authors propose a novel corpus based approach that enriches the set of categories in the Open Directory Project (ODP) by automatically identifying concepts and their associated semantic relationships using external knowledge resources such as Wikipedia and WordNet. The approach relies on concept acquisition and semantic relation extraction. A framework based on the Jena API has been developed to organize the extracted semantic concepts, while Protégé provides the platform for visualizing the automatically constructed topic ontology. To evaluate performance, web documents were classified using an SVM classifier based on the ODP and on the topic ontology. The topic ontology based classification produced better accuracy than the ODP based classification.
1. Introduction

The unbridled growth of the World Wide Web (WWW) has made a huge amount of information and resources available over the Internet. This rapid growth has made searching for information on the web a challenging task (Sridevi & Nagaveni, 2011). To access web resources, a large number of standard web mining algorithms and information retrieval techniques have been developed based on simple keyword matching. Yet, in a large corpus of documents, users are often unable to retrieve the desired information because these techniques do not consider the semantic concepts in web content (Fortuna, Grobelnik, & Mladenic, 2005). To overcome this challenge, the semantic web has evolved with ontologies that describe the conceptual relationships between entities in a specific domain. An ontology is simply defined as a taxonomy, or hierarchy, of concepts (Wimmer et al., 2012; Deborah et al., 2012). It is mainly constructed to provide a knowledge representation that describes web resources using intelligent techniques, for both human understanding and machine processing (David & Antonio, 2004). In an ontology, the concepts of a specific domain are formulated using a proper encoding mechanism that supports efficient information retrieval and reduces the information load caused by a large corpus of documents (Nicola, 1998). An ontology creation methodology for domain experts should be efficient and easy to learn (Nikolai, 2011). An ontology represents a set of concepts and the relationships among them for a particular domain (Jongwoo & Veda, 2011).

A topic ontology is defined as a hierarchy of topics that are interconnected using semantic relations (Xujuan, Yuefeng, Yue, & Raymond, 2006). It is represented as a graph in which each node is a specific topic, and the nodes together form a topic hierarchy. Further, a group of relevant topics is related to a specific concept in the topic ontology by maintaining a hierarchical semantic relationship among the concepts in the topics. The construction process of a topic ontology involves extracting keywords using standard text mining and information retrieval techniques, so the construction is based purely on the semantic relevance of the keywords. However, the keyword based construction approach is not efficient, as it is not possible to construct an ontology from a large corpus of web documents in this way (Ana, Rocío, Carlos, & Filippo, 2010).
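The graph view of a topic ontology described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `TopicNode`, the helper `paths`, the relation label `related_to`, and the example topics are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One topic in the hierarchy (hypothetical structure, for illustration)."""
    name: str
    # Child topics: the hierarchical (parent -> subtopic) edges of the graph.
    children: list = field(default_factory=list)
    # Other semantic links, stored as (relation, target topic name) pairs.
    related: list = field(default_factory=list)

    def add_child(self, name):
        """Create a subtopic under this node and return it."""
        child = TopicNode(name)
        self.children.append(child)
        return child


def paths(node, prefix=()):
    """Yield every root-to-topic path as a tuple of topic names."""
    current = prefix + (node.name,)
    yield current
    for child in node.children:
        yield from paths(child, current)


# Toy hierarchy; the topics are examples only, not data from the paper.
root = TopicNode("Science")
cs = root.add_child("Computer Science")
ai = cs.add_child("Artificial Intelligence")
ai.related.append(("related_to", "Machine Learning"))

all_paths = list(paths(root))
```

Keeping hierarchical edges (`children`) separate from other semantic links (`related`) is one possible design choice; it keeps the topic hierarchy a tree while still allowing cross-links between topics.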

Because of the shortcomings of keyword based construction, we propose building on the Open Directory Project (ODP), a multilingual open content directory of World Wide Web links (Dengya & Heinz, 2009). The ODP lists the set of categories related to a specific concept. We propose a hyperlink based approach in which the ontology is constructed by exploring and discovering the semantic concepts related to the categories in the ODP. The main advantage of this approach is that it allows users to extend the categories according to their own perspective when constructing a topic ontology. It requires users to have only a basic knowledge of the topic they are searching for in order to enrich the existing ontology. Hence, we use knowledge-based web resources, such as Wikipedia and WordNet, to obtain background semantic knowledge about the categories in the ODP.
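The idea of enriching an ODP category with background knowledge from external resources can be sketched as follows. This is only a rough illustration under stated assumptions: the two dictionaries are stand-ins for WordNet and Wikipedia lookups (no real API is called), the function name `enrich_category` is hypothetical, and the entries are toy values rather than results from the paper.

```python
# Stand-in lookup tables. A real system would query WordNet and Wikipedia;
# these toy entries exist only to make the sketch runnable.
TOY_WORDNET = {
    "car": [("synonym", "automobile"), ("hypernym", "motor vehicle")],
}
TOY_WIKIPEDIA = {
    "car": [("linked_article", "Engine"), ("synonym", "automobile")],
}


def enrich_category(category, sources):
    """Collect (relation, concept) pairs for a category from each source.

    Duplicates across sources are dropped; first-seen order is kept.
    """
    seen = set()
    enriched = []
    for source in sources:
        for relation, concept in source.get(category.lower(), []):
            if (relation, concept) not in seen:
                seen.add((relation, concept))
                enriched.append((relation, concept))
    return enriched


result = enrich_category("Car", [TOY_WORDNET, TOY_WIKIPEDIA])
```

Here `result` holds the merged relations for the category "Car", with the duplicate synonym from the second source removed. A category that is missing from every source simply yields an empty list.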
