Search-Based Classification for Offline Tifinagh Alphabets Recognition

Mohammed Erritali, Youssef Chouni, Youssef Ouadid
DOI: 10.4018/978-1-7998-4444-0.ch013
Abstract

The main difficulty in developing a successful optical character recognition (OCR) system lies in the confusion between characters. In Amazigh writing (the Tifinagh alphabet), some characters resemble one another up to a rotation or a change of scale. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increased the recognition rate but at the expense of processing time, which becomes increasingly prohibitive. Thus, reducing character confusion and recognition time is the major challenge of OCR systems. In this chapter, the authors present an offline OCR system for Tifinagh characters.
1 Introduction

The Amazigh language is spoken by many populations around the world, especially in North Africa. Since its normalization in 2001, research centers have conducted numerous in-depth studies to promote this alphabet and widen its scope, which has led to a growing number of documents written in Amazigh. Automatic processing and recognition of these documents has since become a very active field of research. Although OCR research is well advanced for Arabic, Latin, and Chinese scripts, OCR for Amazigh script is still in its infancy. The goal is to develop a fast and accurate OCR system that converts the text of such documents into a machine-readable representation that computers can easily reproduce.

The main difficulty in developing an efficient OCR system is the confusion between characters. Automatic character recognition consists of describing each image by features obtained through an analysis of its visual content.

This analysis must cope with noise introduced during acquisition and with shape variability caused by changes in scale and rotation, which accentuates both intra-class variation and inter-class resemblance. This visual variation creates complex relationships between character classes and their visual content, which leads to confusion between characters and makes recognition very difficult. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increased the recognition rate but at the expense of processing time, which becomes increasingly prohibitive.

Oulamara and Duvernoy (1988) used the Hough transform to extract straight segments together with their attributes (length and orientation). In the learning phase, the characters are analyzed in the parameter space to build a reading matrix containing the feature vectors of the reference images. Using a local database, the authors achieved interesting results. However, according to Djematene et al. (1998), this technique is not appropriate because the Hough transform does not produce a correct segmentation.
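To make the segment-extraction step concrete, here is a minimal sketch of a standard Hough line transform on a binary character image, assuming NumPy is available. Each ink pixel votes in (rho, theta) space; the strongest peak gives a line whose orientation and extent (the length covered by the ink pixels lying on it) are reported. The function names `hough_accumulator` and `strongest_segment`, the 1-degree angle step, and the half-pixel tolerance are hypothetical choices for illustration. This is not Oulamara and Duvernoy's implementation, and it returns only one segment rather than the full set used to build their reading matrix.

```python
import numpy as np

def hough_accumulator(binary, n_theta=180):
    """Each ink pixel (x, y) votes for every line rho = x*cos(t) + y*sin(t)."""
    h, w = binary.shape
    diag = int(np.ceil(np.hypot(h, w)))  # upper bound on |rho|
    # normal angles t in [0, 180) degrees; the step is an assumption
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    cols = np.arange(n_theta)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        rho = np.round(x * cos_t + y * sin_t).astype(int)
        acc[rho + diag, cols] += 1  # shift rho so row indices are >= 0
    return acc, thetas, diag

def strongest_segment(binary, n_theta=180):
    """Return (orientation in degrees, length in pixels, votes) of the peak line."""
    acc, thetas, diag = hough_accumulator(binary, n_theta)
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    rho, theta = r_idx - diag, thetas[t_idx]
    ys, xs = np.nonzero(binary)
    # ink pixels lying on the detected line (half-pixel tolerance, assumed)
    on_line = np.abs(xs * np.cos(theta) + ys * np.sin(theta) - rho) <= 0.5
    # project them onto the line direction to measure the segment extent
    t = -xs[on_line] * np.sin(theta) + ys[on_line] * np.cos(theta)
    length = t.max() - t.min() + 1
    # the segment direction is perpendicular to the normal angle theta
    orientation = (np.rad2deg(theta) + 90.0) % 180.0
    return orientation, length, int(acc[r_idx, t_idx])
```

A full system would extract several peaks (with non-maximum suppression so that neighboring bins of the same stroke are not counted twice) and store their lengths and orientations as the character's feature vector.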

Ait Ouguengay and Taalabi (2009a) proposed a recognition system based on a multilayer artificial neural network (ANN) with a single hidden layer that classifies feature vectors composed of geometric properties (horizontal and vertical projections, centers of gravity in x and y, perimeter, area, compactness, and second-order central moments). Tested on a local database, the system achieved interesting results.
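The geometric feature vector listed above can be sketched with NumPy as follows. This sketch assumes a binary image already normalized to a fixed size (so the projection profiles have a constant length), counts boundary pixels as the perimeter, and uses the common definition compactness = 4π·area/perimeter²; the exact definitions used by Ait Ouguengay and Taalabi may differ, and the function name `geometric_features` is hypothetical.

```python
import numpy as np

def geometric_features(img):
    """Geometric feature vector of a binary character image (True = ink).

    Assumes the image is size-normalized and contains at least one ink pixel.
    """
    img = np.asarray(img, dtype=bool)
    ys, xs = np.nonzero(img)
    if xs.size == 0:
        raise ValueError("image contains no ink pixels")
    area = xs.size
    h_proj = img.sum(axis=1)        # horizontal projection: ink per row
    v_proj = img.sum(axis=0)        # vertical projection: ink per column
    cx, cy = xs.mean(), ys.mean()   # centers of gravity in x and y
    # perimeter (assumed definition): ink pixels with a background 4-neighbor
    p = np.pad(img, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((img & ~interior).sum())
    compactness = 4.0 * np.pi * area / perimeter ** 2
    # second-order central moments
    dx, dy = xs - cx, ys - cy
    mu20, mu02, mu11 = (dx ** 2).sum(), (dy ** 2).sum(), (dx * dy).sum()
    return np.concatenate([h_proj, v_proj,
                           [cx, cy, perimeter, area, compactness, mu20, mu02, mu11]])
```

The resulting vector would then be fed to the input layer of the single-hidden-layer network; in practice the features are usually rescaled to comparable ranges before training.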

Amrouch et al. (2009) proposed an approach based on extracting directional information from the Hough transform of each character as an observation vector, which feeds a hidden Markov model (HMM). The results obtained are promising. However, according to the authors, the models discriminate poorly because each character is represented by a single reference image in the learning phase.

To address this issue, Amrouch et al. (2012) replaced the Hough transform with a technique that extracts a set of structural features from the character contour based on the points of maximum deviation. In the learning and classification phase, they combined dynamic programming with continuous HMMs. This approach has the advantage of being independent of the number of recognition classes (in terms of memory and speed), since a single model is built for all classes. The results, which are quite encouraging, showed that continuous HMMs are more robust. However, the main drawback of this approach lies in the feature extraction phase: detecting the points of maximum deviation appears restrictive for some Amazigh fonts.
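HMM-based recognition of this kind rests on scoring an observation sequence against a model with the forward algorithm. The sketch below assumes discrete observations (for example, quantized stroke directions) and, for simplicity, one HMM per character class, labeling a sequence with the class whose model assigns it the highest likelihood. Amrouch et al.'s 2012 system used continuous HMMs combined with dynamic programming, which this sketch does not reproduce; the names `log_forward` and `classify` and the `(log_pi, log_A, log_B)` model layout are hypothetical.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood log P(obs | model) computed with the forward algorithm.

    obs    : sequence of symbol indices (e.g. quantized directions, assumed)
    log_pi : (N,) log initial-state probabilities
    log_A  : (N, N) log transitions, log_A[i, j] = log P(state j | state i)
    log_B  : (N, M) log emissions, log_B[i, k] = log P(symbol k | state i)
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j <- logsumexp_i(alpha_i + log_A[i, j]) + log_B[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Return the label whose HMM gives obs the highest log-likelihood."""
    return max(models, key=lambda label: log_forward(obs, *models[label]))
```

Working in log space with `np.logaddexp` avoids numerical underflow on long sequences. In a real system the model parameters would be estimated from training images (for example with Baum-Welch) rather than set by hand.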
