XAR: An Integrated Framework for Semantic Extraction and Annotation

Naveen Ashish (University of California-Irvine, USA) and Sharad Mehrotra (University of California-Irvine, USA)

Source Title: Cases on Semantic Interoperability for Information Systems Integration: Practices and Applications

DOI: 10.4018/978-1-60566-894-9.ch011

Abstract

The authors present the XAR framework for free-text information extraction and semantic annotation. The language underpinning XAR, the authors argue, allows for the inclusion of probabilistic reasoning with the rule language, provides higher-level predicates capturing text features and relationships, and defines and supports advanced features such as token consumption and stratified negation in the rule language and semantics. The XAR framework also allows the incorporation of semantic information as integrity constraints in the extraction and annotation process. The authors claim that XAR fills a gap in Web-based information extraction systems: it provides an extraction and annotation framework that permits the integrated use of hand-crafted extraction rules, machine-learning based extractors, and semantic information about the particular domain of interest. The XAR system has been deployed in an emergency response scenario with civic agencies in North America and in a scenario with an IT department of a county-level community clinic.

Chapter Preview

Introduction

The vision of semantic interoperability on a large scale, such as that envisioned by the concept of the Semantic Web (Berners-Lee, Hendler & Lassila, 2001), continues to sustain interest and excitement. The availability of automated tools for semantic annotation of data on the open Web is recognized as critical for Semantic Web enablement. In the process of semantic annotation we annotate significant entities and relationships in documents and pages on the Web, thus making them amenable to machine processing. The time and investment required to mark up and annotate Web content manually are prohibitive for all but a handful of Web content providers, which leads us to develop automated tools for this task. As an example, consider Web pages of academic researchers with their biographies in free text, as shown in Figure 1.

Figure 1. Semantic Annotation of Web Content

The annotation of significant concepts on such pages, such as a researcher’s current job title, academic degrees, alma maters, and the dates of those degrees (as shown in Figure 1), can then enable Semantic Web agents or integration applications over such data. Such annotation or mark-up tools are largely based on information extraction technology. While information extraction itself is a widely investigated area, one still lacks powerful, general-purpose, and yet easy-to-use frameworks and systems for information extraction, particularly for extraction from free text, which makes up a significant fraction of the content on the open Web. In this chapter we describe XAR, a framework and system for free-text information extraction and semantic annotation. XAR permits the integrated use of 1) hand-crafted extraction rules, 2) existing machine-learning based extractors, and 3) semantic information about the particular domain of interest, in the form of database integrity constraints, to power semantic extraction and annotation.

We have designed XAR to be an open-source framework that can be used by end-user application developers with minimal training and prior expertise, as well as by the research community as a platform for information extraction research. Over the last year we have used XAR for semantic annotation of Web documents in a variety of domains. These applications range from the semantic annotation of details of particular events in online news stories, as part of an application for internet news monitoring, to the semantic annotation of free-text clinical notes, as part of a business intelligence application in the health-care domain.

This chapter is organized as follows. In the next section we provide an overview of XAR from a user perspective, i.e., as a framework for developing extraction applications. We then present the technical details of our approach, including the XAR system architecture, algorithmic issues, and implementation details. We present experimental evaluations assessing the effectiveness of the system in a variety of domains. We also describe use case studies of application development using XAR in two different organizations. Finally, we discuss related work and provide a conclusion.

The XAR System

We first describe XAR from a user perspective, i.e., as a framework for developing extraction applications and performing annotation tasks. The extraction step in annotation is treated as one of slot-filling. For instance, in the researcher bios task, each Web page provides values for slots or attributes such as the job title, academic degrees, and dates. The two primary paradigms for automated information extraction systems are (i) using hand-crafted extraction rules (Feldman et al., 2002), and (ii) using a machine-learning based extractor that can be trained for information extraction in a particular domain (Kayed 2006). Extraction applications in XAR are developed using either paradigm, and the resulting extractors are further complemented with semantic information in the form of integrity constraints. We describe and illustrate each of these aspects.
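The slot-filling view can be illustrated with a minimal Python sketch. It is not XAR's rule language or API, neither of which appears in this preview. The names `Candidate`, `rule_extractor`, `ml_extractor_stub`, and `fill_slots`, the regular expression, the confidence values, and the constraint that a bachelor's degree precedes a doctorate are all assumptions made for illustration. The sketch pools candidates from a hand-crafted rule and a stand-in for a trained extractor, then keeps the highest-scoring slot assignment that satisfies the integrity constraint.

```python
import re
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    slot: str          # slot (attribute) name, e.g. "PhD_year"
    value: str         # extracted text value
    confidence: float  # assumed confidence in [0, 1]


# Hypothetical hand-crafted rule: a degree name followed, within the same
# sentence, by a four-digit year.
DEGREE_RULE = re.compile(r"\b(PhD|MS|BS)\b[^.]*?\b((?:19|20)\d{2})\b")


def rule_extractor(text: str) -> List[Candidate]:
    """Hand-crafted rule: high-confidence (degree, year) pairs."""
    return [Candidate(f"{m.group(1)}_year", m.group(2), 0.9)
            for m in DEGREE_RULE.finditer(text)]


def ml_extractor_stub(text: str) -> List[Candidate]:
    """Stand-in for a trained extractor (assumption): proposes every
    four-digit year as a possible PhD year, with low confidence."""
    years = re.findall(r"\b(?:19|20)\d{2}\b", text)
    return [Candidate("PhD_year", year, 0.4) for year in years]


def satisfies_constraints(assignment: Dict[str, str]) -> bool:
    """Assumed integrity constraint: a BS is earned before a PhD."""
    if "BS_year" in assignment and "PhD_year" in assignment:
        return int(assignment["BS_year"]) < int(assignment["PhD_year"])
    return True


def fill_slots(text: str) -> Dict[str, str]:
    """Pick one value per slot, maximizing the product of confidences
    over assignments that satisfy the integrity constraint."""
    by_slot: Dict[str, List[Candidate]] = {}
    for cand in rule_extractor(text) + ml_extractor_stub(text):
        by_slot.setdefault(cand.slot, []).append(cand)

    best: Dict[str, str] = {}
    best_score = -1.0
    for combo in product(*(by_slot[slot] for slot in sorted(by_slot))):
        assignment = {c.slot: c.value for c in combo}
        if not satisfies_constraints(assignment):
            continue  # discard assignments that violate the constraint
        score = 1.0
        for c in combo:
            score *= c.confidence
        if score > best_score:
            best, best_score = assignment, score
    return best


bio = ("She received her PhD in computer science from Stanford in 1998 "
       "and her BS from MIT in 1992.")
print(fill_slots(bio))
```

In this sketch the stand-in extractor also proposes 1992 as a PhD year. That candidate is rejected because it would place the doctorate in the same year as the bachelor's degree, which shows in miniature how domain constraints can prune extractor output. Exhaustive enumeration is used only for brevity and would not scale to many slots.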
