Domain Adaptation in Part-of-Speech Tagging

Miriam Lúcia Domingues, Eloi Luiz Favero

Source Title: Emerging Applications of Natural Language Processing: Concepts and New Research

DOI: 10.4018/978-1-4666-2169-5.ch003

Abstract

Many Natural Language Processing (NLP) applications rely on the accuracy of part-of-speech taggers. Although many taggers achieve good accuracy in the domain in which they were trained, that accuracy typically does not carry over to new domains because of problems such as different linguistic structures or the presence of new words. The need for domain adaptation has emerged as a new challenge for part-of-speech tagging and for most NLP tasks. The goal of this chapter is to highlight solutions that handle labeled and unlabeled data, to describe methods that use such data to solve the domain adaptation problem, and to present a case study that has achieved significant accuracy rates on tagging journalistic and scientific texts.

Introduction

Many state-of-the-art Natural Language Processing (NLP) applications based on supervised learning have good accuracy for the domain or genre in which they were trained; however, most of them exhibit a lack of portability to new domains due to problems such as different linguistic structures or the presence of new words. As a result, domain adaptation, which is the ability to exhibit good performance on both the training (source) and the new (target) domains, has emerged as a new challenge. This challenge arises in many NLP tasks, such as Part-Of-Speech (POS) tagging, Named Entity (NE) recognition, parsing, Word Sense Disambiguation (WSD), and relation extraction.

Published literature has addressed the importance of domain adaptation in NLP tasks by applying machine learning methods, such as supervised (Chelba & Acero, 2006; Daumé, 2007), unsupervised (Blitzer, McDonald, & Pereira, 2006; Jiang & Zhai, 2007; Huang & Yates, 2010), and ensemble methods (Daumé III & Marcu, 2006).

Jiang and Zhai (2007) cited several examples of domain adaptation problems. The first example is POS tagging, where the source domain is journalistic data and the target domain is scientific data. The second example is NE recognition, where the source domain is news articles and the target domain is personal blogs. The third example is personalized spam filtering, where a filter trained on many labeled spam and ham emails from publicly available sources must be adapted to an individual user’s inbox, because each user receives a different distribution of emails and has an individual notion of what constitutes spam.

The objective of this chapter is to present the state of the art in domain adaptation, focusing on solutions for POS tagging, an important preprocessing task in many NLP applications. Specifically, we present experiments with the adaptation of a hybrid POS tagger, which improves tagging accuracy by reducing errors on new or Out-Of-Vocabulary (OOV) words and by adjusting the tagger to handle the different data distributions of the source and target domains. This tagger has been trained on Portuguese texts to reach similar levels of accuracy on texts from two different domains: journalistic and scientific.
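To make the OOV problem concrete, the following minimal sketch measures the share of target-domain tokens that never occur in the source-domain training data. It is an illustration only, not the chapter's method: the function name `oov_rate`, the lowercasing step, and the two toy sentences are assumptions introduced here.

```python
def oov_rate(source_tokens, target_tokens):
    """Return the fraction of target tokens absent from the source vocabulary.

    Assumption: tokens are compared case-insensitively. A real tagger may
    normalize words differently.
    """
    vocabulary = {token.lower() for token in source_tokens}
    if not target_tokens:
        return 0.0
    unseen = [token for token in target_tokens if token.lower() not in vocabulary]
    return len(unseen) / len(target_tokens)


# Toy data (hypothetical): a "journalistic" source and a "scientific" target.
source = "A casa é grande .".split()
target = "A proteína é grande .".split()

print(oov_rate(source, target))
```

A higher rate on the target domain than on held-out source text is one simple signal that the tagger will meet many words it has never seen tagged.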

In the following sections, we first describe basic concepts of POS tagging and its main approaches. Then, we present the current state of the art in domain adaptation, including any related issues and problems. We highlight solutions using NLP systems that handle labeled and unlabeled data, taking the perspectives adopted by researchers working on NLP. There is also a brief overview of domain adaptation solutions in POS tagging. We then present a case study with a Portuguese POS tagger, followed by a discussion of future research directions and the conclusions of this chapter.

Part-Of-Speech Tagging

POS tagging is the basic task of labeling a word or a token in a sentence with its grammatical category, such as noun, adjective, or verb. Punctuation marks are usually tagged as well. When a suitable automatic tagging algorithm is given a string of words and a specified tag set, the tagger outputs annotated results such as the following:

• A/ART casa/N é/V grande/ADJ ./. (The house is big.)
• Maria/NPROP casa/V hoje_à_noite/ADV ./. (Maria marries tonight.)

The tags in these examples come from the Mac-Morpho tag set (Aluísio et al., 2003): ART = article, N = noun, V = verb, ADJ = adjective, NPROP = proper noun, and ADV = adverb. The punctuation mark "." is tagged as itself.
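The word/TAG notation used in the examples is easy to read back into a program. The sketch below is a hedged illustration: the helper name `parse_tagged` is hypothetical, and the `TAG_NAMES` table lists only the tags named in this section, not the full Mac-Morpho tag set.

```python
# Only the tags mentioned in this section; the full tag set is larger.
TAG_NAMES = {
    "ART": "article",
    "N": "noun",
    "V": "verb",
    "ADJ": "adjective",
    "NPROP": "proper noun",
    "ADV": "adverb",
    ".": "punctuation mark",
}


def parse_tagged(sentence):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the last slash so a word that contains '/' stays intact.
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


for word, tag in parse_tagged("Maria/NPROP casa/V hoje_à_noite/ADV ./."):
    print(word, tag, TAG_NAMES.get(tag, "unknown tag"))
```

Splitting on the last slash rather than the first is a design choice that keeps the final token `./.` and any word containing a slash parseable.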

A word is ambiguous when it has more than one grammatical category, such as the word “casa” in the example. (In Portuguese, the word “casa” may refer to the noun house or to the verb to marry.) The tag with the correct grammatical category will be assigned according to the context of the word in the sentence. For disambiguation, taggers use a large set of methods and techniques with different approaches to tag the words with the greatest accuracy possible. Tags may include more lexical attributes, such as gender, number, verbal mood, tense, and person. For example, the word “casa” may be tagged as NFS, a noun (N) that is feminine (F) in gender and singular (S) in number.
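The role of context in disambiguation can be sketched with a toy tagger. Everything here is an assumption made for illustration: the tiny `LEXICON`, the two context rules in `PREFERRED_AFTER`, the noun fallback for unknown words, and the function names. It is not the hybrid tagger discussed in this chapter, and real taggers use far richer statistical or rule-based models.

```python
# Hypothetical lexicon: each word maps to its possible tags.
LEXICON = {
    "a": ["ART"],
    "casa": ["N", "V"],  # ambiguous: "house" (noun) or "marries" (verb)
    "é": ["V"],
    "grande": ["ADJ"],
    "maria": ["NPROP"],
    "hoje_à_noite": ["ADV"],
    ".": ["."],
}

# Hypothetical context rules: tag preferred right after a given previous tag.
PREFERRED_AFTER = {"ART": "N", "NPROP": "V"}


def tag_words(words):
    """Tag each word, using the previous tag to resolve ambiguous words."""
    tags = []
    previous = None
    for word in words:
        # Assumption: unknown words default to noun.
        candidates = LEXICON.get(word.lower(), ["N"])
        preferred = PREFERRED_AFTER.get(previous)
        choice = preferred if preferred in candidates else candidates[0]
        tags.append(choice)
        previous = choice
    return tags


def describe_composite(tag):
    """Expand a composite tag such as NFS, assuming a category+gender+number layout."""
    category = {"N": "noun"}[tag[0]]
    gender = {"F": "feminine"}[tag[1]]
    number = {"S": "singular"}[tag[2]]
    return category, gender, number


print(tag_words(["A", "casa", "é", "grande", "."]))
print(tag_words(["Maria", "casa", "hoje_à_noite", "."]))
print(describe_composite("NFS"))
```

The same surface form "casa" receives N after an article and V after a proper noun, which mirrors the two example sentences above.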
