CorTag: A Language for a Contextual Tagging of the Words Within Their Sentence

Yves Kodratoff; Jérôme Azé; Lise Fontaine

doi:10.4018/978-1-60566-274-9.ch010

Source Title: Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration

Abstract

This chapter argues that, in order to extract significant knowledge from masses of technical texts, field specialists must be provided with programming tools with which they can themselves program their text analysis tools. These programming tools, besides supporting the programming effort of the field specialists, must also help them gather the field knowledge necessary for defining and retrieving what they regard as significant knowledge. This field knowledge must be included in a well-structured and easy-to-use part of the programming tool. In this chapter, we present CorTag, a programming tool designed to correct existing tags in a text and to assist the field specialist in retrieving the knowledge and/or information he or she is looking for.

Introduction And Motivation

In this paper we present a new programming language, called CorTag, which is devoted to tagging words within the boundaries of the sentence in which they are contained. The context we are concerned with here is therefore limited to the sentence and the words within it. The tagging process in CorTag includes syntactic, functional and semantic tags. Ultimately CorTag is designed to correct the existing tags in highly specialised or technical texts.

Our primary aim is to contribute to the creation of a system which is able to find interesting pieces of knowledge within specialised texts. No attempt is made at a broader understanding of natural language. Our ambition is to be able to spot parts of the texts that may be of particular interest to the specialist of a given technical domain. As we shall see, the process does nevertheless require a kind of ‘primitive’ understanding of the text.

In creating this new language, we have been motivated by two facts which, despite being intuitively obvious, are challenging when used as a base for the building of a computer system.

The first of these is that the number of genre-specific texts is increasing exponentially. It follows that humans can no longer handle these masses of texts and the whole process has to be automated. The scientific community is certainly aware of this need, as exemplified by the large number of competitions and challenges dealing with many topics expressed in many different languages. This has led to the development of software solutions, each devoted to solving at least one of the problems encountered at some step of the overall process. In order to make these steps explicit, let us propose a tentative list of the main ones. The text mining process starts by gathering the texts of interest, which we will refer to as ‘text gathering’. The process ends when the desired information has been found in the text. This final step is identified here as ‘information extraction’. A large set of intermediate steps takes place between these two, and the precise set depends on the state of the retrieved texts and the nature of the information sought. The following sequence shows one possible ordering of the necessary intermediate steps:

text gathering → sorting → standardization → creation/improvement of lexicon → tagging and/or parsing → terminology → concept recognition → co-reference resolution → finding the relations among concepts → information extraction.

In the following, when speaking of any particular step, step_n, we will always assume that all earlier steps, step_n-k, have been executed before it. We shall not, however, assume that they have been completed correctly. One of the main difficulties is that these different levels of Natural Language (NL) processing are mutually dependent. In general, the context-independent processes can be performed quite satisfactorily, while the context-dependent ones are very challenging, as we shall exemplify. Unfortunately, the users (and sometimes even the creators) of software specialized for step_n are not aware that this software cannot function properly if some earlier step_n-k has not been properly completed. For example, ‘sorting’, a step which will be described later in the paper, illustrates well the dependencies amongst steps. Sorting is not really context-dependent, as we shall explain, and it should therefore be relatively easy to complete. However, an improperly performed step_n causes mistakes at step_n+k, which then spread throughout the process. It is the context-dependent steps which are most affected by this. Since many of the context-dependent mistakes of step_n-k cannot be detected before step_n, we need a language to backtrack and correct them. This defines the first primary constraint placed on CorTag’s development.
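
To make the idea of backtracking concrete, the following is a minimal sketch in Python, not in CorTag itself, whose syntax is not shown in this preview. It assumes that an earlier tagging step has left a wrong tag on one word, and that a later step corrects it using only the context of the sentence. The tag labels (DET, NOUN, VERB, ADJ), the Token class, and the rule fix_verb_after_determiner are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    word: str
    tag: str  # tag assigned by an earlier step; it may be wrong


# A correction rule inspects one sentence, rewrites tags in place,
# and returns the number of tags it changed.
Rule = Callable[[List[Token]], int]


def fix_verb_after_determiner(sentence: List[Token]) -> int:
    """Hypothetical rule: a word tagged VERB right after a DET becomes a NOUN."""
    changed = 0
    for prev, cur in zip(sentence, sentence[1:]):
        if prev.tag == "DET" and cur.tag == "VERB":
            cur.tag = "NOUN"
            changed += 1
    return changed


def correct_sentence(sentence: List[Token], rules: List[Rule]) -> int:
    """Apply every rule within the boundaries of a single sentence."""
    return sum(rule(sentence) for rule in rules)


# Output of an assumed earlier tagging step: 'increase' is wrongly tagged VERB.
sentence = [
    Token("the", "DET"),
    Token("increase", "VERB"),
    Token("was", "VERB"),
    Token("significant", "ADJ"),
]

changed = correct_sentence(sentence, [fix_verb_after_determiner])
print(changed, [(t.word, t.tag) for t in sentence])
```

The rule never looks beyond the sentence, which mirrors the sentence-bounded context described in the introduction. A real correction language would presumably let the field specialist write such rules without programming them by hand in a general-purpose language.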
