Development of Part of Speech Tagger for Assamese Using HMM

Surjya Kanta Daimary, Vishal Goyal, Madhumita Barbora, Umrinderpal Singh

Source Title: Natural Language Processing: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-7998-0951-7.ch054

Abstract

This article presents a Part-of-Speech (POS) tagger for Assamese based on the Hidden Markov Model (HMM). Over the years, many language processing tasks have been addressed for Western and South Asian languages, but very little work has been done for Assamese. With this in view, a POS tagger for Assamese is developed using the Stochastic Approach. Assamese is a free word-order, highly agglutinative, and morphologically rich language, so a POS tagger with good accuracy will help in developing other NLP tasks for Assamese. For this work, an annotated corpus of 271,890 words with a BIS tagset consisting of 38 tag labels is used. The model is trained on 256,690 words and the remaining words are used for testing. The system obtained an accuracy of 89.21%, which is compared with other existing stochastic models.

1. Introduction

Part-of-Speech (POS) tagging is the process of marking every word in a natural language sentence with its corresponding part-of-speech category, such as noun, verb, adjective, or adverb, based on both its definition and its context. Besides words, punctuation characters and symbols are also labeled accordingly. It is a very important process because it resolves the ambiguity of words in a sentence by assigning an accurate POS label to each word depending on the context. Because Assamese is a morphologically rich and agglutinative language, several words belong to more than one POS category, which makes them ambiguous. Nouns and verbs are also inflected in a sentence in accordance with their grammatical characteristics. POS tagging therefore becomes a challenging task for Assamese. A POS tagger tries to assign accurate POS labels to ambiguous words in a sentence according to the context, and it has a vital role in various NLP applications because POS-tagged data is used in many other NLP tasks (Jurafsky & Martin, 2000). In Parsing, the tagged data helps in finding noun and verb groups. In Named Entity Recognition, it helps in determining proper names such as the name of a person, place, or thing. In Information Retrieval, it helps in selecting proper nouns or other important word classes from a given text. In Speech Recognition, it helps in modeling a language. In Machine Translation, it helps in generating the probability of translating a word from the source language into the target language. It is useful for many other NLP applications as well. POS tagging is thus considered an initial step in language processing. Because a POS tagger has a great impact on other NLP systems, a tagging result with high accuracy is always desirable.

There are several methods of POS tagging, which fall into three main approaches: the Rule Based Approach, the Stochastic Approach, and the Hybrid Approach. Rule Based POS tagging is the earliest approach, in which hand-written linguistic rules are used for tagging. These rules identify the appropriate tag for an ambiguous word. This method depends on a dictionary or lexicon to generate the possible POS tags for every word in the input text. The Stochastic Approach is based on the probabilities of words occurring with a particular tag. The tag that occurs most frequently in the training data is assigned to an unknown or ambiguous word. The probability of a given sequence of tags is calculated from the frequencies observed in the annotated training corpus. The Hybrid Approach combines more than one method, usually rule-based and statistical methods. Such a model uses the essential features of statistical approaches and adds rules for better efficiency. The POS tagger developed for Assamese follows the Stochastic Approach. It uses a bigram Hidden Markov Model (HMM), which is one of the methods within this approach. The HMM is a probabilistic model that uses an annotated training corpus. Tagging is done by computing the tag sequence probability and the word likelihood probability from the corpus. Because it learns from annotated data, this is a supervised learning method, and the HMM therefore requires a large annotated corpus to achieve high accuracy. An unsupervised learning method, by contrast, does not use an annotated corpus and instead calculates the probabilities using automatic word groupings.
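
To make the two probabilities above concrete, the following is a minimal Python sketch of a bigram HMM tagger. It is an illustration under stated assumptions, not the authors' implementation: the toy English corpus, the three tags (not the BIS tagset), the `<s>` start marker, the add-one smoothing, and the function names are all hypothetical. Tag sequence probabilities are estimated from tag bigram counts, word likelihood probabilities from (tag, word) counts, and the best tag sequence is found with the Viterbi algorithm.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker, not from the chapter


def train(tagged_sentences):
    """Count tag bigrams and (tag, word) pairs in an annotated corpus."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of words it labels
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            vocabulary.add(word)
            previous = tag
    return transitions, emissions, vocabulary


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocabulary):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocabulary) + 1  # one extra slot for unknown words
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy annotated corpus (placeholder data, not Assamese).
corpus = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
model = train(corpus)
print(viterbi(["a", "cat", "barks"], *model))
```

Working in log space avoids numerical underflow on long sentences, and the reserved unknown-word slot keeps unseen words from receiving zero probability. A real tagger trained on a large corpus would likely need a more careful treatment of unknown words than add-one smoothing.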

The rest of this paper is divided into five sections. Section 2 reviews related work, and Section 3 describes the morphological characteristics of Assamese. Section 4 describes the approach, and Section 5 evaluates the system. Section 6 concludes the paper.
