1. Introduction
The recognition of Arabic handwritten script is a difficult task, mainly because the quality and style of handwriting vary both within a single writer and between writers.
Over the last two decades, many handwritten text recognition systems (Chherawala, Roy & Cheriet, 2013; Elarian et al., 2015; Sudir & Ravishankar, 2015) have been developed for a range of applications, achieving acceptable to good results. Most of this work has focused on feature extraction methods such as the Gabor filter, the Scale Invariant Feature Transform (SIFT) descriptor, Invariant Moments (IM), Gradient-Structural-Concavity (GSC), the Histogram of Oriented Gradients (HOG) and raw pixel intensities. The choice of these hand-crafted features determines the performance of the systems used for classification and recognition. However, building good features from a text image is hard and complex work. Moreover, features that are discriminative for Latin or Asian text are not necessarily discriminative for Arabic text, and vice versa.
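To make the notion of a hand-crafted feature concrete, the sketch below computes a magnitude-weighted histogram of gradient orientations over a tiny grayscale image. It is only in the spirit of HOG: it treats the whole image as a single cell and omits the cell grid and block normalization of the full descriptor. The function name `orientation_histogram`, the 9-bin setting and the toy images are assumptions made for illustration, not taken from the cited works.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of a grayscale image."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal central difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    total = sum(hist)
    # Normalize so the histogram sums to 1 (left as all zeros for a flat image).
    return [h / total for h in hist] if total else hist


# Toy 4x4 images (illustrative only): a vertical edge and a horizontal edge.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]

hist_vertical = orientation_histogram(vertical_edge)
hist_horizontal = orientation_histogram(horizontal_edge)
```

The two edges place all of their weight in different bins, which is the kind of stroke-direction information such descriptors are designed to capture. Every design decision here (bin count, gradient operator, normalization) is made by hand, which is why feature quality depends so heavily on the designer and on the script.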
In recent years, deep architectures have become popular in many machine learning (ML) and pattern recognition (PR) applications. Deep learning (DL) methods have been used effectively for handwriting recognition and have been applied to digit and handwritten Arabic/Latin text databases. For example, Hinton et al. (Hinton, Osindero & Teh, 2006; Hinton & Salakhutdinov, 2006) proposed a greedy training algorithm to construct a multi-layer network architecture that learns higher-level feature representations, named the Deep Belief Network (DBN). Furthermore, LeCun et al. (LeCun, Bottou, Bengio & Haffner, 1998) used the Convolutional Neural Network (CNN) as a feature extraction and classification technique. Both methods, DBN and CNN, have obtained very high accuracy on the MNIST dataset, reaching 98.75% and 99.47% respectively. The advantage of these deep networks is their ability to handle high-dimensional input and to learn features from the input data (image, speech) automatically through their deep layers. This allows raw data to be used as input instead of a manually extracted feature vector, and lets the network learn complex decision boundaries between classes. These automatically learned features are invariant to shifts and shape distortions of the input text image. In contrast, a hand-crafted feature extractor needs elaborately designed features, or even a combination of different feature types, to achieve distortion invariance for each character.
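One mechanism behind the shift tolerance mentioned above is the combination of a sliding filter with max pooling. The toy sketch below shows it on a 1-D signal: the filter response moves with the pattern, but the pooled value does not change. The filter weights here are fixed by hand, whereas a CNN would learn them; all names and values are hypothetical and are not taken from the cited networks.

```python
def correlate1d(signal, kernel):
    """Valid-mode sliding dot product, the operation a convolutional layer applies."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(feature_map):
    """Keep only the strongest response and discard its position."""
    return max(feature_map)


def place(pattern, offset, length=10):
    """Embed the pattern in an all-zero signal at the given offset."""
    signal = [0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


# Hypothetical 1-D "stroke" and a filter tuned to it.
pattern = [1, 2, 1]
kernel = [1, 2, 1]

original = place(pattern, offset=2)
shifted = place(pattern, offset=5)

resp_original = correlate1d(original, kernel)
resp_shifted = correlate1d(shifted, kernel)

# The peak moves with the input, but the pooled feature is identical.
pooled_original = global_max_pool(resp_original)
pooled_shifted = global_max_pool(resp_shifted)
```

Real CNNs pool over small local windows rather than the whole map, so the tolerance they gain is to small shifts and builds up layer by layer.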
In the context of handwriting recognition, algorithms such as the Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Artificial Neural Network (ANN) and Hidden Markov Model (HMM) have been extensively exploited by researchers, who have obtained many favorable results. These systems have demonstrated their performance and accuracy in a wide range of applications. However, among these shallow methods, the HMM faces a major problem in Arabic handwriting recognition because of the huge variability and distortion of patterns. The MLP, in turn, shows two restrictions in classification tasks. First, there is no theoretical relationship between the classification task and the MLP structure. Second, the MLP derives separating hyperplanes in the feature representation space that are not optimal in terms of the margin between the examples of two different classes.
On the other hand, the support vector machine, invented by Vapnik (Vapnik, 1998) and considered one of the strongest and most robust shallow algorithms in machine learning, has become a popular approach in various domains (Byun & Lee, 2003; Gorgevik, Cakmakov & Radevski, 2001; Guo, Li & Chan, 2000; Lahiani, Elleuch & Kherallah, 2015), such as pattern recognition, classification, image processing, and hand gesture recognition. The SVM provides high generalization performance. Moreover, it uses the Structural Risk Minimization (SRM) principle and tries to avoid over-fitting by choosing the decision hyperplane that maximizes the margin between classes (Vapnik, 1998).
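The margin criterion can be illustrated with a small sketch. Assuming a toy, linearly separable 2-D dataset (hypothetical, chosen for illustration), the code below compares two hyperplanes that both classify every point correctly. It does not train an SVM; it only evaluates the quantity an SVM maximizes, namely the smallest distance from any training point to the decision boundary.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over the dataset.
    A positive value means every point lies on its correct side."""
    norm = math.hypot(*w)
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Toy data: the negative class lies on x = 0, the positive class on x = 2.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two separating hyperplanes (lines in 2-D), written as (w, b) for w.x + b = 0.
centered = ((1.0, 0.0), -1.0)    # the line x = 1, midway between the classes
off_center = ((1.0, 0.0), -0.5)  # the line x = 0.5, closer to the negative class

margin_centered = geometric_margin(*centered, points, labels)
margin_off_center = geometric_margin(*off_center, points, labels)
```

Both lines achieve zero training error, but the centered one leaves twice the margin. For this toy data it is the maximum-margin separator, so it is the one an SVM would select, whereas a classifier that only minimizes training error has no reason to prefer it over the off-center line.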