Character Segmentation Scheme for OCR System: For Myanmar Printed Documents

Htwe Pa Pa Win (University of Computer Studies, Myanmar), Phyo Thu Thu Khine (University of Computer Studies, Myanmar), and Khin Nwe Ni Tun (University of Computer Studies, Myanmar)
DOI: 10.4018/978-1-4666-3906-5.ch018

Abstract

Automatic Optical Character Recognition (OCR) systems for machine-printed text are highly desirable for a multitude of modern IT applications, including digital library software. However, state-of-the-art OCR systems cannot handle Myanmar script, as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. To make the system more accurate, the authors propose a method that isolates character images using not only projection methods but also structural analysis of wrongly segmented characters. To demonstrate the effectiveness of the segmentation technique, the authors apply a new hybrid feature extraction method and use an SVM classifier to recognize the character images. The proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods increase both segmentation accuracy and recognition rates.
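
The projection methods mentioned in the abstract can be illustrated with a generic vertical projection profile. The following is a minimal sketch, not the authors' implementation: the function names, the `min_gap` parameter, and the list-of-lists binary image format (1 for ink, 0 for background) are assumptions made for illustration. It covers only the projection step and omits the structural analysis that the chapter applies to wrongly segmented characters.

```python
from typing import List, Tuple


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def segment_columns(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a text-line image into column ranges (start, end), end exclusive.

    A segment is closed once at least `min_gap` consecutive empty columns
    follow it. `min_gap` is a hypothetical tuning parameter.
    """
    profile = vertical_projection(image)
    segments: List[Tuple[int, int]] = []
    start = None  # first column of the segment being built
    gap = 0       # number of empty columns seen since the last ink column

    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The segment ends just after the last column that had ink.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0

    if start is not None:
        # Close a segment that runs to the right edge, trimming trailing gap.
        segments.append((start, len(profile) - gap))
    return segments


# Toy 2-row "text line" with three ink runs separated by empty columns.
line_image = [
    [1, 1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0],
]
print(segment_columns(line_image, min_gap=1))
```

A larger `min_gap` merges pieces separated by narrow gaps, which is one simple way to trade over-segmentation against under-segmentation. Projection alone cannot separate touching or overlapping glyphs, which is the kind of case the abstract assigns to structural analysis.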

2. Nature Of Myanmar Script

In Myanmar script, there is no distinction between upper case and lower case characters. Text is written horizontally from left to right. The character set consists of 35 consonants (including ‘978-1-4666-3906-5.ch018.g01’ and ‘978-1-4666-3906-5.ch018.g02’), 8 vowel signs, 7 independent vowels, 5 combining marks, 6 symbols and punctuation marks, and 10 digits. Each word is formed by combining consonants, vowels, and various signs, and the script has its own composition rules for combining vowels, consonants, and modifiers. There are more than 1881 glyphs in total, and many characters in this language have similar shapes (e.g., 978-1-4666-3906-5.ch018.g03, 978-1-4666-3906-5.ch018.g04, and so on). In written text, a space follows each phrase rather than each word or syllable. Myanmar characters are made up of circular shapes, straight lines (horizontal, vertical, or slanted), and dots (Hussain, Durrani, & Gul, 2005; Maw, 2001; Alexander, 2003).
