XAR: An Integrated Framework for Semantic Extraction and Annotation

Naveen Ashish, Sharad Mehrotra

Source Title: Cases on Semantic Interoperability for Information Systems Integration: Practices and Applications

DOI: 10.4018/978-1-60566-894-9.ch011

Abstract

The authors present the XAR framework, which supports free text information extraction and semantic annotation. The language underpinning XAR, the authors argue, allows probabilistic reasoning to be combined with the rule language, provides higher-level predicates that capture text features and relationships, and defines and supports advanced features such as token consumption and stratified negation in the rule language and its semantics. The XAR framework also allows semantic information to be incorporated as integrity constraints in the extraction and annotation process. The framework aims to fill a gap, the authors claim, in Web-based information extraction systems. XAR provides an extraction and annotation framework by permitting the integrated use of hand-crafted extraction rules, machine-learning based extractors, and semantic information about the particular domain of interest. The XAR system has been deployed in an emergency response scenario with civic agencies in North America and in a scenario with the IT department of a county-level community clinic.

Introduction

The vision of semantic interoperability on a large scale, such as that envisioned by the concept of the Semantic Web (Berners-Lee, Hendler & Lassila, 2001), continues to sustain interest and excitement. The availability of automated tools for semantic annotation of data on the open Web is recognized as critical for Semantic Web enablement. In the process of semantic annotation we annotate significant entities and relationships in documents and pages on the Web, thus making them amenable to machine processing. The time and investment required to mark up and annotate Web content manually are prohibitive for all but a handful of Web content providers, which leads us to develop automated tools for this task. As an example, consider Web pages of academic researchers with their biographies in free text, as shown in Figure 1.

Figure 1. Semantic Annotation of Web Content

The annotation of significant concepts on such pages, such as a researcher’s current job title, academic degrees, alma maters, and the dates of the various degrees (as shown in Figure 1), can then enable Semantic Web agents or integration applications over such data. Such annotation or mark-up tools are largely based on information extraction technology. While information extraction itself is a widely investigated area, one still lacks powerful, general-purpose, and yet easy-to-use frameworks and systems for information extraction, particularly for extracting information from free text, which makes up a significant fraction of the content on the open Web. In this chapter we describe XAR, a framework and system for free text information extraction and semantic annotation. XAR powers semantic extraction and annotation by permitting the integrated use of 1) hand-crafted extraction rules, 2) existing machine-learning based extractors, and 3) semantic information about the particular domain of interest, expressed in the form of database integrity constraints.

We have designed XAR to be an open-source framework that can be used by end-user application developers with minimal training and prior expertise, as well as by the research community as a platform for information extraction research. Over the last year we have used XAR for semantic annotation of Web documents in a variety of domains. These applications range from annotating the details of particular events in online news stories, as part of an application for internet news monitoring, to annotating free text clinical notes, as part of a business intelligence application in the health-care domain. This chapter is organized as follows. In the next section we provide an overview of XAR from a user perspective, i.e., as a framework for developing extraction applications. We then present the technical details of our approach, including the XAR system architecture, algorithmic issues, and implementation details. We present experimental evaluations assessing the effectiveness of the system in a variety of domains. We also describe use case studies of application development using XAR in two different organizations. Finally, we discuss related work and provide a conclusion.

The XAR System

We first describe XAR from a user perspective, i.e., as a framework for developing extraction applications and performing annotation tasks. The extraction step in annotation is treated as one of slot-filling. For instance, in the researcher bios task, each Web page provides values for slots or attributes such as the job title, academic degrees, and dates. The two primary paradigms (Feldman et al., 2002) for automated information extraction systems are (i) using hand-crafted extraction rules, and (ii) using a machine-learning based extractor that can be trained for information extraction in a particular domain. Extraction applications in XAR are developed using either hand-crafted extraction rules (Feldman et al., 2002) or machine-learning based extractors (Kayed 2006), and both are further complemented with semantic information in the form of integrity constraints. We describe and illustrate each of these aspects.
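To make the slot-filling idea concrete, the following is a minimal sketch in Python. It is not XAR's rule language or API, neither of which is shown in this excerpt. Every name in it (Degree, DEGREE_RULE, rule_extractor, learned_extractor_stub, satisfies_constraints, fill_slots) is hypothetical, the "learned" extractor is a stub that returns a fixed and deliberately wrong candidate, and the integrity constraint (a lower degree cannot be dated after a higher one) is an assumed example of domain knowledge.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Degree:
    """One filled slot group: a degree, where it was earned, and when."""
    degree: str
    institution: str
    year: int
    source: str  # "rule" or "learned" (hypothetical labels)


# Hypothetical hand-crafted rule: "<degree> from <institution> in <year>".
DEGREE_RULE = re.compile(
    r"(Ph\.D\.|M\.S\.|B\.S\.)\s+from\s+(.+?)\s+in\s+((?:19|20)\d{2})"
)

# Assumed ordering of degrees used by the integrity constraint below.
RANK = {"B.S.": 0, "M.S.": 1, "Ph.D.": 2}


def rule_extractor(text):
    """Apply the hand-crafted rule and return candidate slot values."""
    return [
        Degree(m.group(1), m.group(2), int(m.group(3)), "rule")
        for m in DEGREE_RULE.finditer(text)
    ]


def learned_extractor_stub(text):
    """Stand-in for a trained extractor; returns a fixed, wrong candidate."""
    return [Degree("B.S.", "MIT", 1999, "learned")]


def satisfies_constraints(candidate, accepted):
    """Integrity constraint: a lower degree may not be dated after a higher one."""
    for other in accepted:
        lo, hi = sorted((candidate, other), key=lambda d: RANK[d.degree])
        if RANK[lo.degree] < RANK[hi.degree] and lo.year > hi.year:
            return False
    return True


def fill_slots(text):
    """Merge candidates from both extractors, keeping only consistent ones."""
    accepted = []
    for cand in rule_extractor(text) + learned_extractor_stub(text):
        if satisfies_constraints(cand, accepted):
            accepted.append(cand)
    return accepted


bio = ("She received her Ph.D. from Stanford University in 1995 "
       "and her B.S. from MIT in 1989.")
for d in fill_slots(bio):
    print(d.degree, d.institution, d.year, d.source)
```

In this sketch the rule-based candidates are considered first, so the stub's B.S. dated 1999 is rejected because it conflicts with the already accepted Ph.D. dated 1995. Processing order stands in here for whatever combination strategy the real system uses.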
