Tifinagh Handwriting Character Recognition Using a CNN Provided as a Web Service

Ouahab Kadri, Abderrezak Benyahia, Adel Abdelhadi
Copyright: © 2022 |Pages: 17
DOI: 10.4018/IJCAC.297093

Abstract

Many cloud providers offer high-precision Optical Character Recognition (OCR) services. However, no provider offers Tifinagh OCR as a web service. Several works have proposed powerful Tifinagh OCR systems, but none has been developed as a web service. In this paper, we present a new architecture for Tifinagh handwriting recognition as a web service, based on a deep learning model built with Google Colab. For the implementation of our proposal, we used the new version of the TensorFlow library and a very large database of Tifinagh characters composed of 60,000 images from Mohammed V University in Rabat. Experimental results show that the TensorFlow library running on a Tensor Processing Unit (TPU) constitutes a very promising framework for developing fast and very precise Tifinagh OCR web services. The results also show that our method, based on a convolutional neural network (CNN), outperforms existing methods based on support vector machines and extreme learning machines.

Introduction

Nowadays, the cloud offers a variety of web services. Cloud computing addresses several problems, such as limited resources, cost, and standardization, through concrete solutions: scalability and virtualization of resources.

The principle of optical character recognition (OCR) is to move from an encoding that represents data as pixels (an image) to an encoding that represents data as characters (text).
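The pixel-to-character mapping can be illustrated with a toy nearest-template classifier. This is only a sketch of the general idea, not the CNN used in this work: the 3x3 binary patterns are invented placeholders and do not reproduce real Tifinagh letter shapes, and the function names are hypothetical. Only the two Unicode code points (U+2D30 and U+2D31, from the Tifinagh block) are real.

```python
# Toy illustration: map small binary pixel grids to Unicode characters.
# The 3x3 patterns are invented placeholders, NOT real Tifinagh glyph shapes.
TEMPLATES = {
    "\u2d30": ((0, 1, 0), (1, 0, 1), (0, 1, 0)),  # placeholder pattern for U+2D30
    "\u2d31": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),  # placeholder pattern for U+2D31
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized binary grids."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(image):
    """Return the character whose template is closest to the input pixels."""
    return min(TEMPLATES, key=lambda ch: hamming(image, TEMPLATES[ch]))


def recognize_line(images):
    """Turn a sequence of character images into a text string."""
    return "".join(recognize(img) for img in images)
```

A real recognizer replaces the fixed templates with a learned model, but the input (pixels) and output (character codes) keep the same shape.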

In this work, we are concerned only with the recognition of handwritten characters. Tifinagh is the script of the Berber languages, also called Tamazight. Tamazight is spoken by millions of people across the northwest of the African continent. Since Tamazight has recently become an official language in Algeria, its computerization has become one of the most popular research subjects among Algerian researchers.

To develop this language, several works in the field of machine learning have been carried out. There are databases containing thousands of images representing the 33 characters of the Tifinagh alphabet. Several shape recognition and segmentation techniques have emerged. Our objective is to contribute to these efforts by building a web service that makes Tifinagh character recognition available on a large scale through cloud computing, without the need for powerful machines.

The Tifinagh script has existed since 500 BC. It disappeared from the northern zone of the Berber world with the establishment of the Arabs around 700 AD, but it remained in use among the Tuaregs. There are several variants of Berber writing; the known ones are Western Libyan, Oriental, and Saharan. The many varieties are due to the large area of the African continent occupied by the Berber population and the large distances separating the different tribes. In 1970, the Berber Academy was created in Paris. Its objective was to propose a Berber alphabet for writing Kabyle, one of the most used Berber dialects today. Major progress was made in Morocco, where the Royal Institute for Amazigh Culture adopted “neo-Tifinagh” as the official Berber alphabet (see Figure 1).

Most of the population now owns a smartphone, which has encouraged the use of optical character recognition (OCR). Cloud technology has enabled the emergence of a new field: OCR web services. In this approach, the models are stored in the cloud and the client interface runs on the user's smartphone. This makes possible the instant translation of historical documents and product notices, as well as video subtitling. It can also be part of a disability assistance system.

Figure 1. Tifinagh keyboard from Lexilogos website

Developing an OCR system for the Tifinagh alphabet is not an easy task because databases of handwritten characters are limited. Another obstacle is the existence of several variants: we can distinguish the basic Tifinagh characters according to IRCAM, the extended Tifinagh characters (IRCAM), other Tifinagh letters, and the modern Tuareg letters. The number of classes is therefore a real dilemma, and the decision has a direct effect on the quality of the model.

In machine learning, we can distinguish several types of classifiers. No single classifier is best in all cases; several configurations must be tested before making a decision. These tests vary both the inputs and the hyperparameter values.
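Testing several configurations can be sketched as an exhaustive grid search over hyperparameter values. The sketch below is illustrative only: `toy_evaluate` is a hypothetical stand-in for "train a model and return its validation accuracy", and the hyperparameter names and values in `GRID` are assumptions, not the settings used in this work.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Evaluate every combination in `grid`; return (best_config, best_score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Hypothetical scoring function standing in for a real training run."""
    lr_penalty = abs(config["learning_rate"] - 0.01) * 10
    batch_penalty = abs(config["batch_size"] - 64) / 1000
    return 1.0 - lr_penalty - batch_penalty


# Assumed search space, chosen only for illustration.
GRID = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}

best_config, best_score = grid_search(toy_evaluate, GRID)
```

The cost grows with the product of the list lengths (here 3 x 3 = 9 runs), which is one reason the training time discussed next matters.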

Deep learning is among the best-performing classification approaches, but its major drawback is its complexity: the model construction phase requires considerable time to complete the learning. The solution is to parallelize independent tasks across several processors or GPUs. Cloud computing makes it possible to create several virtual machines that provide fast, parallel computation.
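Dispatching independent tasks to a pool of workers can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: `train_fold` is a hypothetical placeholder for one independent training job, and a thread pool is used so the sketch runs anywhere. For CPU-bound work, a process pool, separate virtual machines, or GPUs would be needed to obtain a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def train_fold(fold_id):
    """Hypothetical placeholder for an independent training job."""
    workload = sum(i * i for i in range(1000 * (fold_id + 1)))
    return fold_id, workload


def run_parallel(task, inputs, max_workers=4):
    """Run `task` on each input using a worker pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, inputs))


results = run_parallel(train_fold, range(4))
```

Because the tasks share no state, the same pattern carries over to other executors by swapping the pool class.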

In this work, we try to show the advantages of using cloud computing to develop an OCR web service for Tifinagh characters. Google Colab offers a relatively simple configuration and free access to GPUs, and it makes code easy to share and therefore easy to expose for execution in the form of APIs.
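The request and response cycle of such an OCR API might look like the following sketch. Everything here is an assumption made for illustration: the JSON field names (`image`, `character`), the base64 encoding of the image, and `predict_character`, which is a dummy placeholder for the trained model rather than a real recognizer.

```python
import base64
import json


def predict_character(image_bytes):
    """Dummy placeholder for the trained model; the rule below is arbitrary."""
    return "\u2d30" if len(image_bytes) % 2 == 0 else "\u2d31"


def handle_request(body):
    """Take a JSON request body and return (status_code, JSON response body)."""
    try:
        payload = json.loads(body)
        image_bytes = base64.b64decode(payload["image"], validate=True)
    except (ValueError, KeyError, TypeError):
        # Covers malformed JSON, a missing field, and invalid base64 data.
        error = {"error": "expected JSON with a base64-encoded 'image' field"}
        return 400, json.dumps(error)
    label = predict_character(image_bytes)
    return 200, json.dumps({"character": label}, ensure_ascii=False)
```

Keeping the handler a pure function of the request body makes it easy to test without a running server; any HTTP framework can then wrap it.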
