Bootstrapping of Semantic Relation Extraction for a Morphologically Rich Language: Semi-Supervised Learning of Semantic Relations

Balaji Jagan (Anna University, Chennai, India), Ranjani Parthasarathi (Department of Information Science and Technology, Anna University, Chennai, India) and T. V. Geetha (Department of Computer Science and Engineering, Anna University, Chennai, India)
Copyright: © 2019 | Pages: 31
DOI: 10.4018/IJSWIS.2019010106


This article focuses on a bootstrapping approach to extracting the semantic relations that exist between two different concepts in Tamil text. The proposed system, Bootstrapping Approach to Semantic UNL Relation Extraction (BASURE), extracts generic relations between the components of a sentence by exploiting the morphological richness of Tamil. Tamil is a partially free word order language, which means that the concepts participating in a semantic relation can occur anywhere in the sentence, not necessarily in a fixed order. Here, the authors use Universal Networking Language (UNL), an interlingua framework, to represent the word-based features, and aim to define the UNL semantic relations that exist between any two constituents of a sentence. The morphological suffix, lexical category and UNL semantic constraints associated with a word form the tuples of the patterns used for bootstrapping. Most systems define the initial set of seed patterns manually; this article instead uses a rule-based approach to obtain the word-based features that form the tuples of the patterns. A bootstrapping approach is then applied to extract all possible instances from the corpus and to generate new patterns. The authors also introduce the use of the UNL Ontology to determine the semantic similarity between the semantic tuples of patterns and hence to learn new patterns from the text corpus iteratively. The use of the UNL Ontology makes this approach general and domain independent. The results are evaluated and compared with existing approaches, and they show that this approach is generic, can extract all sentence-based UNL semantic relations, and significantly improves the performance of generic semantic relation extraction.
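
The bootstrapping loop described above can be illustrated with a minimal sketch. This is not the BASURE implementation: the names `WordFeatures`, `Pattern`, `similar`, `matches` and `bootstrap` are hypothetical, the suffix labels are placeholders rather than real Tamil morphology, and the parent lookup in `toy_ontology` is only a stand-in for similarity over the UNL Ontology. It assumes that a word pair matches a pattern when suffix and lexical category agree and the semantic constraints are similar, and that every matched pair is promoted to a new pattern.

```python
from typing import NamedTuple


class WordFeatures(NamedTuple):
    suffix: str      # morphological suffix (placeholder label, not real Tamil data)
    category: str    # lexical category
    constraint: str  # UNL semantic constraint


class Pattern(NamedTuple):
    first: WordFeatures
    second: WordFeatures
    relation: str    # UNL relation label, e.g. "plc"


def similar(a, b, ontology):
    """Assumed similarity test: identical constraints, or same parent concept."""
    if a == b:
        return True
    parent_a = ontology.get(a)
    return parent_a is not None and parent_a == ontology.get(b)


def matches(word, slot, ontology):
    """Suffix and category must agree exactly; constraints only need to be similar."""
    return (word.suffix == slot.suffix
            and word.category == slot.category
            and similar(word.constraint, slot.constraint, ontology))


def bootstrap(seeds, pairs, ontology, max_iters=10):
    """Grow the pattern set until an iteration yields no new pattern."""
    patterns = set(seeds)
    instances = set()
    for _ in range(max_iters):
        new_patterns = set()
        for first, second in pairs:
            for p in patterns:
                if matches(first, p.first, ontology) and matches(second, p.second, ontology):
                    instances.add((first, second, p.relation))
                    candidate = Pattern(first, second, p.relation)
                    if candidate not in patterns:
                        new_patterns.add(candidate)
        if not new_patterns:
            break
        patterns |= new_patterns
    return patterns, instances


# Hypothetical toy data: child concept -> parent concept.
toy_ontology = {"city": "place", "village": "place", "person": "animate"}
seed = Pattern(WordFeatures("LOC", "noun", "city"),
               WordFeatures("PAST", "verb", "do"), "plc")
pairs = [
    (WordFeatures("LOC", "noun", "village"), WordFeatures("PAST", "verb", "do")),
    (WordFeatures("LOC", "noun", "person"), WordFeatures("PAST", "verb", "do")),
]
patterns, instances = bootstrap([seed], pairs, toy_ontology)
```

In this toy run the pair with the constraint "village" matches the seed because "village" and "city" share a parent, so it yields one instance and one new pattern; the pair with "person" is rejected. A real system would also need pattern scoring or filtering to limit semantic drift, which this sketch omits.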

1. Introduction

Semantic relations are relations that can exist between words, expressions and/or concepts, and they are the core elements in representing the information conveyed by natural language text. As discussed by Hjørland (2007), semantic relations are determined by query- or situation-specific information, universal entities, deep semantics common to all languages, domain-specific information, etc. Semantic relations exist between any two components in a sentence: concepts, words or named entities.

Semantic relations include domain-specific relations (Bach & Badaskar, 2008) such as ORG-LOC (organization, location) (Liu & Zhu, 2010) and EMP-ORG (employee, organization) (Sun, 2009); ontological relations such as is-a (Hearst, 1992) and part-whole (Girju et al., 2006); and semantic relations between constituents of a sentence. The last group covers semantic roles (Gildea & Jurafsky, 2002; Pantel & Deepak, 2004) such as agent, patient and beneficiary, noun-modifier relations (Nastase & Szpakowicz, 2003), and UNL semantic relations (UNDL, 2009) such as agt (agent), plc (place), obj (object), cag (co-agent), tim (time), pos (possessor), mod (modifier) and man (manner). While semantic roles define relations only between noun constituents and the verb, with the standard set usually consisting of 8-10 relations, Universal Networking Language (UNL) defines 46 semantic relations between all types of constituents of a sentence. This work attempts to extract all 46 UNL-defined semantic relations and thereby obtain a complete semantic representation of natural language sentences.

Semantic relation extraction is the task of automatically identifying and classifying the semantic relationships within a text or a document. It is useful in various Natural Language Processing (NLP) applications such as information extraction, ontology learning, summarization and question answering (Xu, 2007). Semantic relations can be extracted from text through two basic approaches: rule-based techniques (Garcia & Gamallo, 2011) and machine learning techniques (Xu, 2007). Rule-based approaches to relation extraction normally require a specially designed set of rules for certain types or structures of sentences, and additional rules are required to handle other types of sentences. In general, it is difficult to adapt these rules to a new domain. It is therefore challenging to design rule-based systems that handle all types of sentences in a generic way, irrespective of the domain. In addition, rule-based systems have no way of defining the effect of combining rules, or the link between individual rules and the overall strategy of rule ordering. Moreover, rule-based systems are unable to learn when new situations are encountered, since rules cannot modify themselves and there is no general way to learn when to break them.

Unsupervised (Quan et al., 2014; Zeng et al., 2014; Xu et al., 2015) and semi-supervised (Sarhan et al., 2016) techniques for semantic relation extraction have been actively studied. Rule-based systems have been found to be insufficient for text extracted from the web, where documents contain many sentence structures and information from many domains (Balaji et al., 2011). Machine learning techniques have therefore been applied to automatic semantic relation extraction. Most machine learning techniques require corpora tagged specifically for semantic relation extraction; however, such corpora are not readily available in many languages.
