Introducing the Ultimate NOUN Dataset for Online Handwritten Alphabet Recognition

DOI: 10.4018/979-8-3693-1418-0.ch010

Abstract

For decades, machine learning researchers have worked to improve the capabilities of systems and to develop new methods and approaches, but they often face a scarcity of data in many machine learning fields. For this reason, and because of the pressing needs of our own research in online handwriting recognition, we had to create a dedicated dataset of Arabic characters written online on a graphics tablet. In this work, we created a dataset called “Noon” in nearby university laboratories and used its first version in research published in international peer-reviewed journals. The dataset was built in several phases, which are presented in the following sections of this chapter.

Introduction

Even though computers are pervading the world of communication, writing and speech remain two privileged modes of communication. On a more or less distant horizon, many prospective works seek to make the computer disappear from the user's environment. The user, no longer feeling its constraints, will nevertheless be connected to information systems through natural interfaces: a hand gesture, a glance, speech... and, of course, handwriting. From the old utopian concept of the paperless office, we are moving toward the paradigm of the office without a perceptible computer.

Today, it is still very often the human who makes the effort to adapt. The mini-keyboards of mobile phones are an example: their ergonomics are quite limited when we try, for instance, to compose a short message. However, the trend is clear, and important progress has been made in bringing the digital world closer to that of handwriting. The appearance of personal digital assistants (PDAs), tablet PCs and smartphones, the latest generation of phones that combine a multitude of features (personal diaries, notepads, video games, still and video cameras, etc.), confirms this trend.

Very significant progress has also been made in the comfort of digital pens. They have become very similar to conventional pens: their weight and volume were halved in three years. It is thus possible to write with an almost ordinary pen, on paper, and to process this information in real time, potentially anywhere on the planet. Alongside these hardware advances, the interpretation of the written traces that will proliferate across many devices still needs to be better mastered. Our work falls within the framework of online handwriting input applications on devices with limited capabilities.

Five thousand years after its invention and five hundred fifty years after its automation, writing is still at the heart of human communication. In an age of increasingly sophisticated and efficient human-machine interaction using “buttons”, microphones or cameras, it is natural to seek to understand handwriting automatically. Since the first attempts, address-reading systems for automatic mail processing and check-reading systems have developed significantly and are now widely used. However, computer understanding of handwriting is still far from fully satisfactory, because handwriting recognition is a very broad field in terms of both its applications and its techniques (Sharma & Jayagopi, 2021).

Substantial text corpora are available for language analysis and recognition. In automatic online handwriting recognition, on the other hand, annotated databases are fewer. Unlike Arabic handwriting, Latin handwriting has several such datasets, such as UNIPEN, an international project for collecting online handwriting data and samples in which a large number of universities and companies took part. The project had a closed stage in which around 40 donors carried out training and testing trials. Several companies and universities contributed online handwriting samples to UNIPEN. The first release of the UNIPEN online handwriting database is a “training package” containing enough data to build both training and test configurations. UNIPEN is a large database, widely used to evaluate the performance of online Latin handwriting recognition systems (Ratzlaff, 2003). It is the reference database for developing and comparing handwriting recognition systems and contains traces from more than 200 writers. Its difficulty stems mainly from the number of writers and, consequently, from the many allographs they use. The R01/V06 version is the one most often used in the field (Parizeau et al., 2001).
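
UNIPEN data are distributed as plain text in which dot-prefixed keywords structure the file, and `.PEN_DOWN` / `.PEN_UP` open pen-down and pen-up components followed by lines of "x y" coordinates. The Python sketch below illustrates how pen-down strokes could be read from such text. It is a minimal sketch under stated assumptions: `parse_unipen_strokes` is a hypothetical helper (not part of any official UNIPEN tool), only `.PEN_DOWN` and `.PEN_UP` are interpreted, every other keyword (e.g. `.COORD`, `.SEGMENT`) is skipped, and coordinate lines are assumed to begin with two integers.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Stroke = List[Point]


def parse_unipen_strokes(text: str) -> List[Stroke]:
    """Return the pen-down strokes found in UNIPEN-style text.

    Minimal sketch (hypothetical helper): only .PEN_DOWN and .PEN_UP are
    interpreted. Other keyword lines (those starting with '.') are skipped,
    and coordinate lines are assumed to start with two integers "x y".
    """
    strokes: List[Stroke] = []
    current: Optional[Stroke] = None  # None while the pen is up

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            keyword = line.split()[0]
            if keyword in (".PEN_DOWN", ".PEN_UP"):
                # Either keyword closes the stroke in progress, if any
                # (empty strokes are dropped).
                if current:
                    strokes.append(current)
                current = [] if keyword == ".PEN_DOWN" else None
            continue  # all other keywords are ignored in this sketch
        if current is not None:
            # Coordinates recorded while the pen is up are not collected.
            x, y = line.split()[:2]
            current.append((int(x), int(y)))

    if current:  # the text ended while the pen was still down
        strokes.append(current)
    return strokes
```

For example, text containing `.PEN_DOWN`, the lines `10 20` and `11 22`, then `.PEN_UP` yields a single stroke `[(10, 20), (11, 22)]`. A complete reader would also honour header keywords describing coordinate order and resolution, which this sketch deliberately ignores.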

The IRONOFF database (IReste ON/OFF database) is a dual online and offline database collected and distributed by LORIA. It contains many isolated characters, digits and words in French and English, stored in UNIPEN format. The database was built so that each online point can be projected onto its corresponding position in the scanned image and, conversely, every element of the offline trace can be temporally indexed (Poisson, 2005). It was acquired with a Wacom UltraPad tablet; the typical spatial resolution is on the order of 300 dots per inch, and the sampling rate is on the order of 100 points per second. It was collected from around 700 different writers. The following table shows sub-corpora of the database: isolated digits, lowercase letters and uppercase letters (Rabhi et al., 2021).
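
The two-way online/offline correspondence described for IRONOFF can be illustrated with a small Python sketch. It assumes the tablet and the scanned sheet are aligned so that only a scale change and a translation separate the two coordinate frames, and it uses the orders of magnitude quoted above (300 dpi, 100 points per second) as defaults. The scan resolution `SCAN_DPI` is a hypothetical value not given in the text, and `timestamps`, `to_pixel` and `index_pixels` are illustrative names, not IRONOFF's actual tooling.

```python
from typing import Dict, List, Sequence, Tuple

# Orders of magnitude quoted for the Wacom UltraPad acquisition.
TABLET_DPI = 300.0
SAMPLE_RATE_HZ = 100.0
# Assumption: resolution of the scanned paper image (not given in the text).
SCAN_DPI = 300.0


def timestamps(n_points: int, rate_hz: float = SAMPLE_RATE_HZ) -> List[float]:
    """Acquisition time in seconds of each sample, assuming a constant rate."""
    return [i / rate_hz for i in range(n_points)]


def to_pixel(
    x: float,
    y: float,
    tablet_dpi: float = TABLET_DPI,
    scan_dpi: float = SCAN_DPI,
    offset: Tuple[float, float] = (0.0, 0.0),
) -> Tuple[int, int]:
    """Online -> offline: project a tablet point onto the scanned image.

    Assumes only a scale change and a translation (offset, in pixels)
    separate the tablet frame from the image frame.
    """
    scale = scan_dpi / tablet_dpi
    return round(x * scale + offset[0]), round(y * scale + offset[1])


def index_pixels(
    points: Sequence[Tuple[float, float]],
    rate_hz: float = SAMPLE_RATE_HZ,
    **projection,
) -> Dict[Tuple[int, int], List[float]]:
    """Offline -> online: map each pixel hit by a sample to its time(s)."""
    index: Dict[Tuple[int, int], List[float]] = {}
    for (x, y), t in zip(points, timestamps(len(points), rate_hz)):
        index.setdefault(to_pixel(x, y, **projection), []).append(t)
    return index
```

For instance, with the default equal resolutions, the sampled points `(0, 0)`, `(300, 0)` and `(0, 0)` index pixel `(0, 0)` at 0.0 s and 0.02 s and pixel `(300, 0)` at 0.01 s. At 100 points per second a fast stroke can skip several pixels between samples, so only sampled positions are indexed here; indexing every inked pixel would require interpolating along each segment, and a real registration may also need to correct rotation or skew.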
