1. Introduction
Automatic handwriting recognition is the technique by which a computer system recognizes characters and other symbols written in a person’s natural handwriting. The recognition of both alphabetic characters and numeric digits is becoming increasingly important as today’s technologies continue to improve. Handwriting recognition has numerous applications, including the automatic scanning of personal checks deposited into a bank account at an ATM. Other applications include devices such as PDAs and tablet PCs, where a stylus is used to write on a screen and the computer then converts the handwriting into digital text.
Another noteworthy application of handwriting recognition is signature verification. Every year, millions of dollars are lost to fraudulent credit card charges, which could be prevented by more stringent signature verification policies. For example, many store clerks do not routinely check a customer’s signature against the one on his or her credit card. Even if signature verification were regularly conducted, the clerk’s knowledge of handwriting forgery would probably be limited, and the verification would therefore be superficial. Signature verification performed by specialized computer software could analyze a signature far more thoroughly than a human examiner and might lessen the burden on the criminal justice system, which frequently investigates accusations of signature forgery (Huber & Headrick, 1999).
A few statistical techniques have been proposed within the handwriting recognition community, such as clustering procedures with Hidden Markov Models, neural network models (Morasso et al., 1993), maximum likelihood estimators (Sas & Kurzynski, 2007), and feature extraction methods using distance measures such as the Kullback-Leibler distance. Several previously studied statistical handwriting identification models involve hierarchical clustering techniques. Nosary et al. (2003) proposed a probabilistic approach to defining clusters, in which the probability that each handwritten character or digit belongs to a given cluster is learned.
Another statistical clustering approach was developed in Smyth (1997), where an algorithm was presented to cluster sequences into a predefined number of clusters, along with a preliminary method to find the number of clusters through cross-validation with Monte Carlo estimation. This theoretical approach relies on iterative re-estimation of parameters via an instance of the expectation–maximization (EM) algorithm, which requires careful initialization. Furthermore, the structure of the model is limited to a mixture of fixed-length left-right Hidden Markov Models, which may not correctly model sequences of varying length in the data. The idea of using Hidden Markov Models for clustering handwritten characters was later taken up by Perrone & Connell (2000), but their approach also depends on initialization parameters, so some supervised information is needed to achieve good performance.
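The sensitivity of EM to its starting point can be illustrated with a small sketch. The model in Smyth (1997) is a mixture of Hidden Markov Models; the code below substitutes a one-dimensional Gaussian mixture purely for brevity, so it is not that paper's algorithm. The function name `em_gaussian_mixture_1d`, the toy data, and the initial values are all assumptions made for illustration.

```python
import math


def em_gaussian_mixture_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means.

    Illustrative stand-in for a mixture of HMMs: same E-step/M-step
    structure, much simpler component model.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variances
    weights = [1.0 / k] * k        # assumed equal starting weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v))
                / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300   # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                  # component received no mass
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse
    return weights, means, variances


# Toy data: two well-separated groups of points.
toy_data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]

# A reasonable start separates the two groups.
_, good_means, _ = em_gaussian_mixture_1d(toy_data, [1.0, 4.0])
# Identical starting means leave the components indistinguishable.
_, bad_means, _ = em_gaussian_mixture_1d(toy_data, [2.5, 2.5])
print(good_means, bad_means)
```

With the first initialization the two means settle on the two groups; with identical starting means both components receive the same responsibilities at every step and never separate, which is one simple way a poor initialization can defeat EM.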
A research group at George Mason University and Gannon Technologies, under funding from the FBI, developed the system known as FLASH ID, which stands for Forensic Language-independent Analysis System for Handwriting Identification (Saunders et al., 2011). The method consists of extracting features from graphs of characters and digits, building a graph feature vector, and identifying the unknown character or digit graph by matching it against a database containing a set of known character/digit graphs. These graphs are denoted as isocodes and are built by taking the ends and cross-points of curves as nodes and the curves themselves as edges. The distribution of the data sample of isocodes is then compared to the population distribution using the Kullback-Leibler distance.
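The final comparison step can be sketched as follows. This is a minimal illustration of the Kullback-Leibler distance between a sample distribution and a population distribution over discrete categories, not the FLASH ID implementation. The isocode labels (`"iso_a"` and so on), the population frequencies, the helper names, and the use of additive smoothing to avoid zero probabilities are all assumptions introduced here.

```python
import math
from collections import Counter


def empirical_distribution(labels, support, alpha=1.0):
    """Relative frequencies over a fixed support, with additive smoothing.

    `alpha` is an assumed smoothing constant; it keeps every category's
    probability strictly positive so the log ratio below is defined.
    """
    counts = Counter(labels)
    total = len(labels) + alpha * len(support)
    return {s: (counts[s] + alpha) / total for s in support}


def kl_divergence(p, q):
    """D(P || Q) = sum over x of p(x) * log(p(x) / q(x)), in nats."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


# Hypothetical isocode categories and population frequencies.
support = ["iso_a", "iso_b", "iso_c"]
population = {"iso_a": 0.5, "iso_b": 0.3, "iso_c": 0.2}

# Hypothetical isocodes observed in one writer's sample.
sample = ["iso_a", "iso_a", "iso_c", "iso_c", "iso_c", "iso_b"]
sample_dist = empirical_distribution(sample, support)

print(kl_divergence(sample_dist, population))
```

A value of zero means the sample distribution matches the population exactly, and larger values indicate a more distinctive sample. The quantity is not symmetric in its two arguments, so the direction of the comparison (sample against population, as here, or the reverse) is a design choice.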