1. Introduction
Semantic relations are relations that hold between words, expressions, or concepts, and they are the core elements in representing the information conveyed by natural language text. As discussed by Hjørland (2007), semantic relations are determined by factors such as query- or situation-specific information, universal entities, deep semantics common to all languages, and domain-specific information. Semantic relations can hold between any two components of a sentence: concepts, words, or named entities.
Semantic relations include domain-specific relations (Bach & Badaskar, 2008) such as ORG-LOC (organization, location) (Liu & Zhu, 2010) and EMP-ORG (employee, organization) (Sun, 2009); ontological relations such as is-a (Hearst, 1992) and part-whole (Girju et al., 2006); and relations between the constituents of a sentence. The last group covers semantic roles (Gildea & Jurafsky, 2002; Pantel & Deepak, 2004) such as agent, patient, and beneficiary; noun-modifier relations (Nastase & Szpakowicz, 2003); and UNL semantic relations (UNDL, 2009) such as agt (agent), plc (place), obj (object), cag (co-agent), tim (time), pos (possessor), mod (modifier), and man (manner). While semantic roles define relations only between noun constituents and the verb, with the standard set usually consisting of 8-10 relations, Universal Networking Language (UNL) defines 46 semantic relations between all types of constituents of a sentence. In this work, we attempt to extract all 46 UNL-defined semantic relations and thereby obtain a complete semantic representation of natural language sentences.
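To make the idea of a relation-based sentence representation concrete, the following minimal sketch stores a sentence as a list of labeled binary relations, using only the relation labels glossed above (agt, obj, plc). The example sentence, its hand-written analysis, and the names `Relation` and `to_unl_style` are illustrative assumptions and are not taken from the UNL specification or from the system described in this work.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One labeled binary relation between two sentence constituents."""
    label: str      # relation label, e.g. "agt" (agent)
    head: str       # first constituent (here, the verb)
    dependent: str  # second constituent related to the head


# Hand-written, illustrative analysis of "John ate an apple in the park".
relations = [
    Relation("agt", "eat", "John"),   # John is the agent of eating
    Relation("obj", "eat", "apple"),  # the apple is the object of eating
    Relation("plc", "eat", "park"),   # the park is the place of eating
]


def to_unl_style(rel: Relation) -> str:
    """Render a relation in a simplified label(head, dependent) notation."""
    return f"{rel.label}({rel.head}, {rel.dependent})"


for rel in relations:
    print(to_unl_style(rel))
```

Unlike a semantic-role analysis, a representation of this kind is not restricted to noun-verb pairs; relations such as mod or man could link other constituent types in the same list.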
Semantic relation extraction is the task of automatically identifying and classifying the semantic relationships within a text or document. It is useful in various Natural Language Processing (NLP) applications such as information extraction, ontology learning, summarization, and question answering (Xu, 2007). Semantic relations can be extracted from text through two basic approaches: rule-based techniques (Garcia & Gamallo, 2011) and machine learning techniques (Xu, 2007). Rule-based approaches normally require a specially designed set of rules for particular types or structures of sentences, and additional rules are needed to handle other types of sentences. In general, it is difficult to adapt these rules to a new domain, so designing rule-based systems that handle all types of sentences generically, irrespective of domain, is a challenging task. In addition, rule-based systems have no way of defining the effect of combining rules, or the link between individual rules and the overall strategy of rule ordering. Moreover, when new situations are encountered, rule-based systems are unable to learn, since rules cannot modify themselves and there is no general way to learn when to break rules.
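The brittleness of hand-written rules can be illustrated with a minimal sketch of a single lexico-syntactic rule in the spirit of Hearst (1992), which extracts is-a pairs from sentences of the form "X such as A, B and C". The regular expression, the function name `extract_is_a`, and the example sentences are assumptions made for illustration; this is not the rule set of any system cited here.

```python
import re

# One hand-written rule: "<hypernym> such as <item>, <item> and <item>".
# It only covers this exact surface structure.
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:(?:, | and | or )\w+)*)")


def extract_is_a(sentence: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the 'such as' rule."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumerated items on the separators the rule allows.
        for hyponym in re.split(r", | and | or ", match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


# Sentence structure covered by the rule.
print(extract_is_a("He plays instruments such as guitar, piano and violin."))

# Same kind of relation, different structure: the rule finds nothing.
print(extract_is_a("Guitars, pianos and other instruments were sold."))
```

The second sentence expresses the same is-a relation through an "and other" construction, so covering it would require a further rule, which is the scaling problem described above.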
Unsupervised (Quan et al., 2014; Zeng et al., 2014; Xu et al., 2015) and semi-supervised (Sarhan et al., 2016) techniques for semantic relation extraction have been actively studied. For text extracted from the web, where documents contain many sentence structures and information from many domains, rule-based systems have been found to be insufficient (Balaji et al., 2011). Machine learning techniques have therefore been applied to automatic semantic relation extraction. Most of these techniques require corpora specially tagged for semantic relation extraction; however, such corpora are not readily available in many languages.