Towards Controlled Natural Language for Semantic Annotation

Brian Davis, Pradeep Dantuluri, Siegfried Handschuh, Hamish Cunningham

Source Title: International Journal on Semantic Web and Information Systems (IJSWIS) 6(4)

DOI: 10.4018/jswis.2010100103

Abstract

Richly interlinked metadata constitute the foundation of the Semantic Web. Manual semantic annotation is a labor-intensive task that requires otherwise non-expert users to be trained in formal ontological descriptions. Although automatic annotation tools attempt to ease this knowledge acquisition barrier, their development often requires access to specialists in Natural Language Processing (NLP). This challenges researchers to develop user-friendly annotation environments. Controlled Natural Languages (CNLs) offer an incentive to the novice user to annotate while simultaneously authoring his/her documents in a user-friendly manner. CNLs have been successfully applied to ontology authoring, but little research has focused on their application to semantic annotation. This paper describes two novel approaches to semantic annotation, which permit non-expert users to simultaneously author and annotate meeting minutes using CNL. Finally, this work provides empirical evidence that, for certain scenarios, applying CNLs to semantic annotation can be more user-friendly than a standard manual semantic annotation tool.

Introduction

The Semantic Web endeavors to bring machine-processable meaning to the content of webpages. It envisions the Web as a universal medium for data, information and knowledge exchange, creating an environment where intelligent software agents can travel freely between web resources, carrying out sophisticated tasks for users¹. In order for the Semantic Web to become a reality, we need, as a primus inter pares, semantic data. The process of providing semantic data is very often referred to as semantic annotation, because it frequently involves the embellishment of existing data, i.e. the text, with semantic metadata that describe the associated text. Hence semantic annotation is one of the core challenges for building the Semantic Web.

Manual semantic annotation, however, is a complex and laborious task that is both time-consuming and expensive, often requiring specialist annotators or the training of such annotators. This may require (arguably unnecessary) exposure to formal ontological description. Such formal data representation can act as a significant deterrent for non-expert users or organizations that seek to annotate resources as part of their daily activity and thereby benefit fully from the adoption of Semantic Web technologies. While (semi-)automatic annotation tools attempt to remove this constriction, commonly known as the knowledge acquisition bottleneck, their application often requires access to specialists who can combine Natural Language Processing (NLP) and Machine Learning (ML) with Semantic Web ontology languages. Such specialists are costly and rare. Furthermore, the creation or acquisition of quality language resources to bootstrap such approaches may require significant investment, which may not be justifiable for small to medium enterprises. Consequently, this challenges researchers to develop user-friendly manual annotation environments to support the knowledge acquisition process.

Controlled Natural Languages (CNLs) offer an incentive to the novice user to annotate while simultaneously authoring her documents in a user-friendly manner, and at the same time shield her from the underlying complex knowledge representation formalisms of ontology languages. “Controlled Natural Languages are subsets of natural language whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity.”² The use of CNLs for ontology authoring and population is by no means a new concept, and it has already evolved into quite an active research area (Smart, 2008). Furthermore, a natural overlap exists between tools used for ontology creation and for semantic annotation. For instance, the Controlled Language for Information Extraction (CLIE) technology permits ontology creation and population by mapping both concept definitions and instances of concepts to an ontological representation using a CNL called CLOnE, Controlled Language for Ontology Editing (Funk et al., 2008). Despite such efforts, very little research has focused on applying CNLs to semantic annotation. The reader should note that there is a subtle difference between the process of ontology creation and population and that of semantic annotation. We describe semantic annotation as “a process as well as the outcome of the process”. Hence it describes i) “the process of adding semantic data or metadata to content given an agreed ontology and ii) it describes the semantic data or metadata itself as a result of this process” (Handschuh, 2005). Of particular importance here is the notion of the addition or association of semantic metadata to content.

Latent Annotation

As with any annotation environment, a major drawback is that in order to create metadata about a document, the author must first create the content and then annotate it in an additional, a posteriori annotation step. In the context of our application of CNL to semantic annotation, we seek to merge the authoring and annotation steps into one. This process differs from classic a posteriori annotation, resulting in a new type of annotation which we call latent annotation. Latent comes from the Latin word with identical spelling, whose etymology is derived from the Latin verb latere (to lie hidden), a nod to a posteriori (later, what comes after)³.
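
The idea of merging authoring and annotation into one step can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration: the sentence pattern, the `ex:`, `item:` and `person:` prefixes, and the function name `annotate` are not the CNL, ontology or tooling described in this article. The sketch only shows how a sentence written in a restricted form could yield metadata at the moment it is authored, while free text is left unannotated.

```python
import re
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical controlled pattern for meeting minutes:
#   "<Person> will <action> by <Deadline>."
ACTION_ITEM = re.compile(
    r"^(?P<person>[A-Z][a-z]+) will (?P<action>.+) by (?P<deadline>[A-Z][a-z]+)\.$"
)


def annotate(sentence: str) -> Optional[List[Triple]]:
    """Return triples for a sentence that fits the controlled pattern, else None."""
    match = ACTION_ITEM.match(sentence.strip())
    if match is None:
        # The sentence is outside the controlled pattern: it stays plain text.
        return None
    # Build an identifier for the action item from its description.
    item = "item:" + re.sub(r"\W+", "_", match["action"]).strip("_")
    return [
        Triple(item, "rdf:type", "ex:ActionItem"),
        Triple(item, "ex:assignedTo", "person:" + match["person"]),
        Triple(item, "ex:dueOn", match["deadline"]),
    ]


if __name__ == "__main__":
    for line in [
        "Brian will circulate the draft by Friday.",
        "We discussed the budget at length.",
    ]:
        print(line, "->", annotate(line))
```

In this sketch the author writes only the sentence; the triples are a by-product of the sentence matching the restricted form, so no separate a posteriori annotation pass is needed. A real CNL would rely on a grammar and lexicon rather than a single regular expression.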
