Co-Occurrence-Based Error Correction Approach to Word Segmentation

Ekawat Chaowicharat, Kanlaya Naruedomkul

Source Title: Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches

ISBN13: 9781613504475 | ISBN10: 1613504470 | EISBN13: 9781613504482

DOI: 10.4018/978-1-61350-447-5.ch023

MLA

Chaowicharat, Ekawat, and Kanlaya Naruedomkul. "Co-Occurrence-Based Error Correction Approach to Word Segmentation." Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, edited by Chutima Boonthum-Denecke, et al., IGI Global, 2012, pp. 354-364. https://doi.org/10.4018/978-1-61350-447-5.ch023

APA

Chaowicharat, E. & Naruedomkul, K. (2012). Co-Occurrence-Based Error Correction Approach to Word Segmentation. In C. Boonthum-Denecke, P. McCarthy, & T. Lamkin (Eds.), Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches (pp. 354-364). IGI Global. https://doi.org/10.4018/978-1-61350-447-5.ch023

Chicago

Chaowicharat, Ekawat, and Kanlaya Naruedomkul. "Co-Occurrence-Based Error Correction Approach to Word Segmentation." In Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, edited by Chutima Boonthum-Denecke, Philip M. McCarthy, and Travis Lamkin, 354-364. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-61350-447-5.ch023

Abstract

Many word segmentation algorithms have been proposed, yet there is still room for improvement. Co-Occurrence-Based Error Correction (CBEC), the approach proposed in this chapter, is a novel Thai word segmentation approach designed to provide accurate segmentation results based on context and purpose. CBEC first segments the input string quickly using any available algorithm; maximal matching was used in the experiment. Next, CBEC checks this segmentation output against an error risk data bank to determine whether there is any error risk. The error risk data bank is built from a training corpus; the current version is based on the training corpus available at BEST 2009. Finally, CBEC re-segments the input string using the co-occurrence score of the word sequence to ensure the accuracy of the segmentation result.
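The three stages named in the abstract (a fast first pass, a lookup in an error risk bank, and re-segmentation by co-occurrence score) can be illustrated with a minimal sketch. This is not the chapter's implementation. The abstract does not say how the error risk bank is structured or how the co-occurrence score is computed, so the sketch assumes a risk bank that is simply a set of risky words, a score that sums adjacent-pair counts, and re-segmentation that is attempted only when a risk is flagged. The lexicon, the counts, and all function names are hypothetical toy values chosen to show one Thai ambiguity.

```python
# Toy data (all hypothetical, for illustration only).
LEXICON = {"ตา", "ตาก", "กลม", "ลม", "โต"}
# Assumed form of the error risk bank: words that the first pass tends to get wrong.
RISK_BANK = {"ตาก"}
# Assumed co-occurrence table: counts of adjacent word pairs from a training corpus.
COOC = {("ตา", "กลม"): 5, ("กลม", "โต"): 4, ("ตาก", "ลม"): 3}


def maximal_matching(text, lexicon):
    """Greedy longest-match segmentation, scanning left to right."""
    max_len = max(len(word) for word in lexicon)
    words, i = [], 0
    while i < len(text):
        # Try the longest possible lexicon word starting at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit one character as its own token.
            words.append(text[i])
            i += 1
    return words


def all_segmentations(text, lexicon):
    """Enumerate every split of text into lexicon words (exponential; toy use only)."""
    if not text:
        return [[]]
    results = []
    for j in range(1, len(text) + 1):
        if text[:j] in lexicon:
            for rest in all_segmentations(text[j:], lexicon):
                results.append([text[:j]] + rest)
    return results


def cooccurrence_score(words, cooc):
    """Assumed scoring: sum of co-occurrence counts over adjacent word pairs."""
    return sum(cooc.get(pair, 0) for pair in zip(words, words[1:]))


def cbec_like_segment(text, lexicon, risk_bank, cooc):
    """First pass, risk check, then re-segmentation by co-occurrence score."""
    first_pass = maximal_matching(text, lexicon)
    if not any(word in risk_bank for word in first_pass):
        return first_pass  # no error risk flagged, keep the fast result
    candidates = all_segmentations(text, lexicon) or [first_pass]
    return max(candidates, key=lambda ws: cooccurrence_score(ws, cooc))


if __name__ == "__main__":
    text = "ตากลมโต"
    print(maximal_matching(text, LEXICON))
    print(cbec_like_segment(text, LEXICON, RISK_BANK, COOC))
```

With this toy data, the greedy first pass picks the longer word "ตาก" and so splits the string as ตาก | ลม | โต. Because "ตาก" is in the assumed risk bank, the alternatives are rescored, and ตา | กลม | โต wins on the co-occurrence counts. A real system would replace the exhaustive enumeration with a bounded search around the flagged span.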
