Named Entity Recognition for Ontology Population using Background Knowledge from Wikipedia

Ziqi Zhang (University of Sheffield, UK) and Fabio Ciravegna (University of Sheffield, UK)
DOI: 10.4018/978-1-60960-625-1.ch005

Abstract

Named Entity Recognition (NER) deals with identifying atomic text elements and classifying them into pre-defined ontological classes. It is an enabling technique for many complex knowledge acquisition tasks. The recent proliferation of Web resources has opened new opportunities and challenges for knowledge acquisition. In the domain of NER and its application to ontology population, considerable research has been dedicated to exploiting background knowledge from Web resources to improve system accuracy. This chapter reviews the existing literature in this domain, with an emphasis on using background knowledge extracted from Web resources. The authors discuss the benefits of using background knowledge and the inadequacies of existing work. They then propose a novel method that automatically creates domain-specific background knowledge by exploring the Wikipedia knowledge base in a domain- and language-independent way. The authors show empirically that the method can be adapted to ontology population and that it generates high-quality background knowledge that improves the accuracy of domain-specific NER.

Introduction

An ontology encompasses a set of terms or concepts and the relations between them, which collectively represent a domain of interest. It is the essential artifact for enabling the Semantic Web. For this reason, ontology learning has attracted constant attention from researchers and practitioners in various domains. Automatic ontology learning consists of a number of different tasks, such as term extraction and normalization, synonym identification, concept and instance recognition, and relation extraction. Cimiano (2006) argues that one of the most challenging tasks is ontology population, which addresses finding relevant instances of relations as well as of concepts, the latter being closely related to the task of named entity recognition (NER). NER, in turn, is considered one of the fundamental techniques for ontology learning, and it has been studied extensively in this context, for example by Giuliano (2009) and Weber and Buitelaar (2006).

The NER task originates from the Sixth Message Understanding Conference (MUC6) (Grishman & Sundheim, 1996), which defines the task as recognizing named entities and classifying them into proper concept classes. Despite extensive research on this topic over the last fifteen years, state-of-the-art solutions still suffer from a lack of portability and extensibility, largely due to their dependence on domain-specific knowledge resources such as specialist lexicons and training corpora, and the cost of building and maintaining such resources. Recent years have witnessed the exponential growth of Web resources and the emergence of high-quality, large-scale, collaboratively maintained knowledge resources such as Wikipedia and Wiktionary, which have proved useful in knowledge discovery and acquisition. The abundance of such resources has created both opportunities and new challenges for NER and ontology population. This has attracted significant attention from researchers, who have proposed new methods of mining useful background knowledge from these resources to enhance various knowledge discovery and acquisition tasks, such as NER and ontology population (Kazama & Torisawa, 2008; Giuliano & Gliozzo, 2008; Giuliano, 2009), computing semantic relatedness and similarity (Strube & Ponzetto, 2006; Gabrilovich & Markovitch, 2007; Zesch et al., 2008), and sense disambiguation (Cucerzan, 2007).

However, some issues remain unsolved. Are there systematic ways of using a specific Web resource as background knowledge? Is there a generic method for exploiting domain-specific knowledge? How do we combine different Web resources into a coherent knowledge base for the task of entity recognition? This chapter reviews existing work that addresses these issues and, as an attempt to address them, proposes a novel method of mining domain-specific background knowledge from Wikipedia and using that knowledge for NER and ontology population. The rest of the chapter is structured as follows. First, we introduce the NER task and its relation to ontology population. Next, we explain the importance of using background knowledge in NER and the new opportunities opened by the increasingly available Web resources, and we review existing work carried out in this direction. We then introduce our novel method of exploiting background knowledge from the most popular knowledge resource on the Web, Wikipedia, and compare our approach with others. Finally, we discuss the advantages and limitations of the proposed method and future trends of research, and conclude the chapter.
