Segmentation of Multiple Touching Hand Written Devnagari Compound Characters: Image Segmentation for Feature Extraction

Prashant Madhukar Yawalkar (MET Institute of Engineering, India), Madan Uttamrao Kharat (MET Institute of Engineering, India) and Shyamrao V. Gumaste (MET Institute of Engineering, India)
Copyright: © 2018 | Pages: 24
DOI: 10.4018/978-1-5225-5775-3.ch008

Abstract

One of the most widely used steps in the process of reducing images to information is segmentation, which divides the image into regions that hopefully correspond to structural units in the scene or distinguish objects of interest. Segmentation is often described by analogy to visual processes as a foreground/background separation, implying that the selection procedure concentrates on a single kind of feature and discards the rest. Machine-printed or handwritten scripts can have various font types or writing styles. The writing styles can be roughly categorized into discrete style (handprint or boxed style), continuous style (cursive style), and mixed style. The ambiguity of character segmentation has three major sources: (1) variability of character size and inter-character space; (2) confusion between inter-character and within-character space; and (3) touching between characters.
Chapter Preview

Introduction

One of the most widely used steps in the process of reducing images to information is segmentation, which divides the image into regions that hopefully correspond to structural units in the scene or distinguish objects of interest. The segmentation of connected handwritten characters or digits is a main bottleneck in handwritten character recognition systems. Touching strings fall into two major categories, single-touching and multiple-touching strings, which are further divided into five subtypes of touching, as shown in Figure 1. Many algorithms have been proposed in past years, and they can be classified into three categories based on the segmentation approach: foreground-based, background-based, and recognition-based (Chen & Jhing-Fa, 2000).

Methods that work on foreground pixels (black pixels in a binary image) belong to the foreground-based approach. Several techniques are possible in this category, such as contour tracing and stroke analysis (Tang, Tu, Liu, Lee, Lin, & Shyu, 1998). These methods tend to become much less stable when applied to multiple-touching numeral strings or to single-touching strings with a long touching part. Most of the connected numeral strings of type 1 and type 2 in Figure 1 can be segmented successfully with foreground-based methods, but these methods may fail or give imprecise results when separating connected numeral strings of type 3 (no obvious segmentation point), type 4 (containing a useless stroke), or type 5 (multiple-touching). A sample of single-touching and multiple-touching handwritten Devnagari compound characters is shown in Figure 2.

Methods that work on background pixels (white pixels in a binary image) belong to the background-based approach. Background-based methods first locate feature points on the background regions (such as face-up valleys, face-down valleys, and loop regions) or on the background skeletons (such as upper segments, lower segments, and hole segments). The algorithm then connects these feature points to obtain the segmentation path. Background-based methods still fail to separate single-touching strings with a long touching part and multiple-touching strings, especially when there are more than two touching points. As with the foreground-based approach, most of the connected numeral strings of type 1 and type 2 in Figure 1 are segmented successfully, but the methods usually fail or give imprecise results when segmenting connected numeral strings of type 3, type 4, or type 5 (Chen & Jhing-Fa, 2000).

Methods that apply a recognizer to separate the connected numeral strings belong to the recognition-based approach. In this approach, the segmentation accuracy depends heavily on the robustness of the recognizer, and the process is time-consuming. The approach usually fails to separate connected strings in which the left and right characters or digits overlap. It may also fail on connected numeral strings of type 4, because the useless strokes in such strings can cause the recognizer to fail.

The chapter starts with an introduction, then presents the basics of image processing, followed by a discussion of various segmentation techniques and of various techniques for feature extraction. Finally, it discusses in detail the segmentation of single- and multiple-touching handwritten character or numeral strings, which can help improve the accuracy of a character recognition system.
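To make the background-based idea described above more concrete, the following is a minimal sketch, not the method of this chapter or of the cited works. It assumes a tiny binary image given as a list of strings (`#` for foreground, `.` for background), treats a "face-up valley" as a dip in the upper contour and a "face-down valley" as a rise in the lower contour, and connects a matching pair with a straight vertical cut. The names `column_profiles`, `valley_runs`, `candidate_cuts`, and `IMAGE` are hypothetical and introduced only for illustration.

```python
# Toy binary image: two blobs joined by a thin horizontal bridge
# (a simple single-touching case). '#' = foreground, '.' = background.
IMAGE = [
    "..........",
    ".###..###.",
    ".###..###.",
    ".########.",
    ".###..###.",
    ".###..###.",
    "..........",
]


def column_profiles(img):
    """Return (top, bottom): first and last foreground row of each column.

    Columns without any foreground pixel get None in both lists.
    """
    height, width = len(img), len(img[0])
    top, bottom = [], []
    for x in range(width):
        rows = [y for y in range(height) if img[y][x] == "#"]
        top.append(rows[0] if rows else None)
        bottom.append(rows[-1] if rows else None)
    return top, bottom


def valley_runs(profile, deeper):
    """Find runs of equal-valued columns that are 'deeper' than both neighbours.

    `deeper(a, b)` decides whether profile value a lies further inside the
    shape than b. Runs next to an empty column or the image border are skipped.
    """
    runs = []
    x, width = 0, len(profile)
    while x < width:
        if profile[x] is None:
            x += 1
            continue
        start = x
        while x + 1 < width and profile[x + 1] == profile[start]:
            x += 1
        end = x
        left = profile[start - 1] if start > 0 else None
        right = profile[end + 1] if end + 1 < width else None
        if (left is not None and right is not None
                and deeper(profile[start], left)
                and deeper(profile[start], right)):
            runs.append((start, end))
        x += 1
    return runs


def candidate_cuts(img):
    """Pair face-up and face-down valleys that share columns.

    Returns a list of (column, top_row, bottom_row) vertical cut candidates.
    """
    top, bottom = column_profiles(img)
    face_up = valley_runs(top, lambda a, b: a > b)       # upper contour dips down
    face_down = valley_runs(bottom, lambda a, b: a < b)  # lower contour rises up
    cuts = []
    for up_start, up_end in face_up:
        for down_start, down_end in face_down:
            lo, hi = max(up_start, down_start), min(up_end, down_end)
            if lo <= hi:
                col = (lo + hi) // 2
                cuts.append((col, top[col], bottom[col]))
    return cuts


if __name__ == "__main__":
    print(candidate_cuts(IMAGE))  # prints [(4, 3, 3)]
```

On the toy image the sketch proposes a single cut through the bridge at column 4. Because it looks only at the outer contours, it would be expected to struggle on exactly the cases the text identifies as hard, such as long touching parts or strings with more than two touching points.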

Figure 1. Types of connected strings

Figure 2. Sample of single-touching and multiple-touching characters
