Knowledge Base Refinement Using Limited Amount of Efforts from Experts

Ki Chan (Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China), Wai Lam (Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China) and Tak-Lam Wong (Department of Mathematics and Information Technology, The Hong Kong Institute of Education, Hong Kong, China)
Copyright: © 2014 | Pages: 19
DOI: 10.4018/ijkbo.2014040101

Abstract

Knowledge bases are essential for supporting decision making during intelligent information processing. Automatic construction of knowledge bases becomes infeasible without labeled data, that is, a complete table of data records including answers to queries. Preparing such information requires huge effort from experts. The authors propose a new knowledge base refinement framework based on pattern mining and active learning. The framework uses an existing knowledge base constructed from a different domain (the source domain) that solves the same task, together with some data collected from the target domain. The knowledge base investigated in this paper is represented by a model known as Markov Logic Networks. The proposed method first analyzes the unlabeled target domain data and actively asks the expert to provide labels (or answers) for a very small number of automatically selected queries. The idea is to identify the target domain queries whose underlying relations are not sufficiently described by the existing source domain knowledge base. Potential relational patterns are discovered and new logic relations are constructed for the target domain by exploiting both the limited amount of labeled target domain data and the unlabeled target domain data. The authors have conducted extensive experiments by applying their approach to two different text mining applications, namely pronoun resolution and segmentation of citation records, demonstrating consistent improvements.
Article Preview

Introduction

In many information systems, different information processing components are required for building intelligent applications (Su et al., 2009; Mohanty et al., 2010). Knowledge bases are particularly useful in aiding decision making as expert knowledge can be flexibly captured and utilized. Expert knowledge can be represented as comprehensible rules for decision making in different applications (Chandra & Ravi, 2009; Liang & Rubin, 2009). However, we often encounter situations where we already have an existing knowledge base from a source domain and we wish to apply it to solve the same task in a target domain which is different from the source domain. Typically, direct application of the source knowledge base to the target domain would result in a large degradation in performance due to the difference between the two domains. One solution is to acquire expert knowledge for the target domain to manually refine the knowledge base. Another solution is to collect a sufficient amount of labeled data via manual annotation in the target domain so that the knowledge base can be automatically discovered. But additional expert knowledge is expensive to acquire, and manually annotating sufficient data in the target domain may be costly or even infeasible. Hence, a useful approach would be to refine the existing source domain knowledge base for the target domain using a very small amount of labeled target domain data. Labeled data refers to pieces of information containing the answers or labels provided by experts to certain queries in the domain. An automated computer algorithm can be developed for analyzing the data and automatically constructing a model for solving the task related to the domain. This model can be regarded as a knowledge base which can aid the prediction of the answers to queries given some observations.

We investigate the refinement of an existing knowledge base represented in Markov Logic Networks (MLN) (Richardson & Domingos, 2006). A standard MLN combines probabilistic graphical models with first-order logic. It consists of a first-order knowledge base, which is a set of first-order logic formulae describing the logic relations of the task, and a set of weights, with one weight associated with each formula. The first-order logic representation enables flexible model construction, capturing knowledge such as relations among entities. The problem setting investigated in this paper is described as follows. Suppose we need to solve a particular task. Typically, an existing source domain MLN suitable for problem solving in the source domain is available. We wish to refine it so that it is suitable for the target domain. During the refinement, a limited amount of target domain data is selected automatically and the truth values (annotations) of the queries on the data are acquired from experts. This limited amount of labeled target domain data and the remaining unlabeled target domain data are used to refine the source domain MLN for the target domain. Note that unlabeled target domain data refers to the data elements not selected for annotation.
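To make the pairing of formulae and weights concrete, the following minimal sketch applies the standard MLN definition (Richardson & Domingos, 2006), under which a possible world x has probability proportional to exp(sum_i w_i * n_i(x)), where n_i(x) is the number of true groundings of formula i in x. The toy predicates `Smokes` and `Cancer`, the constants, and the weight 1.5 are illustrative assumptions and do not come from the paper; brute-force enumeration of worlds is only workable at this toy scale.

```python
import itertools
import math

# Hypothetical toy domain (not from the paper): two constants, two predicates.
CONSTANTS = ["a", "b"]
ATOMS = [f"{pred}({c})" for pred in ("Smokes", "Cancer") for c in CONSTANTS]


def n_smoking_implies_cancer(world):
    """Count the true groundings of Smokes(x) => Cancer(x) in a world."""
    return sum(
        (not world[f"Smokes({c})"]) or world[f"Cancer({c})"] for c in CONSTANTS
    )


# The knowledge base: (weight, grounding-count function) pairs.
# The weight 1.5 is an arbitrary illustrative value.
FORMULAS = [(1.5, n_smoking_implies_cancer)]


def score(world):
    """Unnormalized weight of a world: exp(sum_i w_i * n_i(world))."""
    return math.exp(sum(w * n(world) for w, n in FORMULAS))


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def probability(world):
    """Normalize the score by the partition function Z (brute force)."""
    z = sum(score(w) for w in all_worlds())
    return score(world) / z
```

In this sketch a world that violates one grounding of the formula is less probable than one that satisfies both groundings by a factor of exp(1.5), which illustrates how weights soften the hard constraints of first-order logic.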

In our previous work (Chan et al., 2010), we proposed a method for logic relation refinement using unlabeled data only. In the current paper, we propose a new MLN knowledge base refinement framework based on pattern mining and active learning. Our method first analyzes the unlabeled target domain data and actively asks the expert to provide labels (or answers) for a very small number of automatically selected queries. The idea is to identify the target domain queries whose underlying relations are not sufficiently described by the existing source domain knowledge base. Although the source and the target domains may have different underlying data distributions, they must also share certain similarities since they solve the same task. Potential relational patterns in the unlabeled target domain data are discovered and new logic formulae are constructed.
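The preview does not state how queries are selected, so the sketch below substitutes a generic active learning heuristic, uncertainty sampling, purely for illustration. It assumes that the source domain knowledge base yields a marginal probability for each target domain query and treats the queries whose marginals are closest to 0.5 as the ones it describes least well. The function names, the `marginals` input, and the `budget` parameter are hypothetical and are not the authors' selection criterion.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 at certainty."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_queries(marginals, budget):
    """Return the `budget` query names with the most uncertain predictions.

    `marginals` maps a query name to the probability that the query is
    true, as predicted by the existing (source domain) model.
    """
    ranked = sorted(
        marginals, key=lambda q: binary_entropy(marginals[q]), reverse=True
    )
    return ranked[:budget]
```

For example, given predicted marginals of 0.95, 0.5 and 0.7 for three queries and a budget of two, the sketch would send the second and third queries to the expert and leave the confidently predicted first query unlabeled.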
