Introduction
Nowadays, the cloud offers a wide variety of web services. Cloud computing addresses several problems, such as resource limits, cost, and standardization, through two concrete solutions: scalability and virtualization of resources.
The principle of optical character recognition (OCR) is to move from an encoding in which data are perceived as pixels (an image) to an encoding in which data are perceived as characters (text).
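To make the pixel-to-character idea concrete, here is a minimal sketch, assuming a toy recognizer based on nearest-template matching: a small binary image is compared pixel by pixel (Hamming distance) with one template per character, and the character of the closest template is returned. The 5x5 templates, the names `TEMPLATES`, `to_pixels` and `recognize`, and the choice of template matching itself are illustrative assumptions, not the method of this work, which relies on learned (deep learning) classifiers.

```python
# Toy nearest-template recognizer: pixels in, character out.
# Hypothetical 5x5 templates ('#' = ink, '.' = background); a real system
# would learn from thousands of handwritten samples instead.
TEMPLATES: dict[str, list[str]] = {
    "\u2D5C": [  # TIFINAGH LETTER YAT (a cross)
        "..#..",
        "..#..",
        "#####",
        "..#..",
        "..#..",
    ],
    "\u2D4F": [  # TIFINAGH LETTER YAN (a vertical stroke)
        "..#..",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def to_pixels(rows: list[str]) -> list[int]:
    """Flatten a grid of '#'/'.' rows into a vector of 0/1 pixels."""
    return [1 if c == "#" else 0 for row in rows for c in row]


def recognize(rows: list[str]) -> str:
    """Return the character whose template differs from the image in the fewest pixels."""
    pixels = to_pixels(rows)

    def hamming(label: str) -> int:
        template = to_pixels(TEMPLATES[label])
        return sum(p != t for p, t in zip(pixels, template))

    return min(TEMPLATES, key=hamming)


# A slightly distorted cross: 2 pixels off from YAT, 4 pixels off from YAN.
noisy_cross = [
    "..#..",
    "..#..",
    "####.",
    "..#..",
    ".##..",
]
print(ascii(recognize(noisy_cross)))  # prints '\u2d5c' (YAT)
```

Template matching only works for clean, aligned glyphs; handwriting varies too much in shape, size, and position, which is one reason learned classifiers are preferred for handwritten Tifinagh.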
In this work, we focus only on the recognition of handwritten characters. Tifinagh is the script of the Berber languages, also called Tamazight. Tamazight is spoken by millions of people across the northwest of the African continent. Since Tamazight has recently become an official language in Algeria, the computerization of its script has become one of the most popular research subjects among Algerian researchers.
To support this language, several works in the field of machine learning have been carried out. There are databases containing thousands of images representing the 33 characters of the Tifinagh alphabet, and several shape recognition and segmentation techniques have emerged. Our objective is to contribute to these efforts by building a web service that, through cloud computing, makes Tifinagh character recognition available on a large scale without requiring powerful machines.
The Tifinagh script has existed since 500 BC. It disappeared from the northern zone of the Berber world with the arrival of the Arabs around 700 AD, but it remained in use among the Tuaregs. There are several variants of Berber writing; the best known are the western Libyan, eastern Libyan, and Saharan variants. This diversity is due to the large area of the African continent occupied by Berber populations and the great distances separating the different tribes. In 1970, the Berber Academy was created in Paris with the objective of proposing an alphabet for writing Kabyle, one of the most widely used Berber dialects today. Major progress was later made in Morocco, where the Royal Institute of Amazigh Culture (IRCAM) adopted “neo-Tifinagh” as the official Berber alphabet (see figure 1).
The majority of our population owns a smartphone, which encourages the use of optical character recognition (OCR). Cloud technology has enabled the emergence of a new field: OCR web services. In this setting, the models are stored in the cloud while the client interface runs on the user's smartphone. Such services enable instant translation of historical documents and product notices, as well as video subtitling, and they can also be part of a disability assistance system.
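This split, with the model in the cloud and the interface on the phone, can be sketched with Python's standard library alone. In this minimal sketch, a hypothetical `/recognize` endpoint receives a flattened pixel vector as JSON and returns the recognized character, while `recognize_remote` plays the role of the smartphone client. The endpoint path, the JSON format, the helper names, and the dummy `predict` rule are assumptions for illustration; a real service would load the trained model once at startup and call it inside `predict`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(pixels: list[int]) -> str:
    """Hypothetical stand-in for the trained model stored in the cloud.

    Dummy rule for illustration only: a 5x5 cross (YAT) has 9 ink pixels,
    a vertical stroke (YAN) has 5.
    """
    return "\u2D5C" if sum(pixels) >= 9 else "\u2D4F"


class OCRHandler(BaseHTTPRequestHandler):
    """Server side: receives a pixel vector, replies with the recognized character."""

    def do_POST(self) -> None:
        if self.path != "/recognize":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            pixels = [int(p) for p in body["pixels"]]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'pixels' list")
            return
        payload = json.dumps({"label": predict(pixels)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def start_server(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), OCRHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Connect directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def recognize_remote(url: str, pixels: list[int]) -> str:
    """Client side (e.g. the smartphone app): send pixels, read back the label."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"pixels": pixels}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request) as response:
        return json.loads(response.read())["label"]


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    cross = [1] * 9 + [0] * 16  # flattened 5x5 image with 9 ink pixels
    print(ascii(recognize_remote(f"http://{host}:{port}/recognize", cross)))  # '\u2d5c'
    server.shutdown()
```

Running the server in a background thread keeps it usable from a notebook cell, where a blocking `serve_forever()` call would freeze the cell. Serving real phones would additionally require a public address and HTTPS.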
Figure 1.
Tifinagh keyboard from the Lexilogos website
Developing an OCR system for the Tifinagh alphabet is not an easy task because databases of handwritten characters are limited. Another obstacle is the existence of several variants: we can distinguish the basic Tifinagh characters defined by IRCAM, the extended Tifinagh characters (IRCAM), other Tifinagh letters, and the modern Tuareg letters. Choosing the number of classes is therefore a real dilemma, and this decision directly affects the quality of the model.
In machine learning, several types of classifiers can be distinguished, and no single classifier is best for every problem. Several configurations must therefore be tested before making a decision, varying both the inputs and the hyperparameter values.
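Testing several configurations is commonly automated as a grid search: every combination of hyperparameter values is evaluated on held-out validation data and the best-scoring one is kept. The sketch below is a minimal illustration, assuming a toy k-nearest-neighbours classifier on made-up 2-D points; it searches over the number of neighbours `k` and the distance metric. The data, the classifier, and the function names are hypothetical stand-ins for the models and inputs discussed here.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy data: 2-D feature vectors with class labels "a" and "b".
TRAIN = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.2), "b"), ((1.1, 0.8), "b")]
VALID = [((0.15, 0.2), "a"), ((0.95, 0.9), "b"), ((0.6, 0.5), "b")]

METRICS = {
    "euclidean": math.dist,
    "manhattan": lambda p, q: sum(abs(a - b) for a, b in zip(p, q)),
}


def knn_predict(x, k, metric):
    """Majority label among the k training points closest to x."""
    dist = METRICS[metric]
    neighbours = sorted(TRAIN, key=lambda item: dist(x, item[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def accuracy(k, metric):
    """Fraction of validation samples that k-NN classifies correctly."""
    hits = sum(knn_predict(x, k, metric) == y for x, y in VALID)
    return hits / len(VALID)


def grid_search(grid):
    """Evaluate every combination of hyperparameter values; return the best."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((accuracy(**config), config))
    best_score, best_config = max(results, key=lambda r: r[0])
    return best_config, best_score, results


best, score, _ = grid_search({"k": [1, 3, 5], "metric": ["euclidean", "manhattan"]})
print(best, score)  # {'k': 5, 'metric': 'euclidean'} 1.0
```

Varying the inputs (for example, image size or feature extraction) fits the same pattern: it simply becomes one more key in the grid. The number of configurations grows multiplicatively with each key, which is what makes this evaluation costly.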
Deep learning provides some of the best-performing classifiers, but its major drawback is its complexity: building the model requires considerable training time. One solution is to parallelize independent tasks across several processors or GPUs, and cloud computing makes it possible to create several virtual machines that perform these computations quickly and in parallel.
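Because each configuration is trained independently, the evaluations are embarrassingly parallel. The following minimal sketch assumes a toy gradient-descent linear fit as a stand-in for deep-learning training and dispatches one configuration per worker process with Python's `concurrent.futures.ProcessPoolExecutor`; on a cloud platform, each worker could instead be a separate virtual machine or GPU. The function names, the toy data, the `executor_cls` hook, and the (learning rate, epochs) grid are illustrative assumptions.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import product

# Hypothetical toy data for y = 2x + 1, standing in for a real training set.
XS = [0.0, 0.25, 0.5, 0.75, 1.0]
YS = [2 * x + 1 for x in XS]


def train_and_score(config: tuple[float, int]) -> tuple[tuple[float, int], float]:
    """Fit y = w*x + b by full-batch gradient descent and return (config, MSE).

    Each call depends only on its own config, so calls can run in parallel.
    """
    lr, epochs = config
    w = b = 0.0
    n = len(XS)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(XS, YS)]
        w -= lr * 2 / n * sum(e * x for e, x in zip(errors, XS))
        b -= lr * 2 / n * sum(errors)
    mse = sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / n
    return config, mse


def evaluate_all(configs, workers=4, executor_cls=ProcessPoolExecutor):
    """Train every configuration concurrently and return {config: mse}."""
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(train_and_score, configs))


if __name__ == "__main__":  # guard required when worker processes are spawned
    grid = list(product([0.01, 0.1], [100, 1000]))  # (learning rate, epochs)
    scores = evaluate_all(grid)
    best = min(scores, key=scores.get)
    print("best (lr, epochs):", best, "MSE:", scores[best])
```

Passing `executor_cls=ThreadPoolExecutor` runs the same code in threads, which is handy where worker processes are unavailable, although CPU-bound Python code then gains little from parallelism.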
In this work, we aim to show the advantages of using cloud computing to develop an OCR web service for Tifinagh characters. Google Colab offers a relatively simple configuration and free access to GPUs, and the code can easily be shared and executed in the form of APIs.