A common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Published in Chapter:
A Metaheuristic Algorithm for OCR Baseline Detection of Arabic Languages
F. Daneshfar (University of Kurdistan, Iran), W. Fathy (University of Kurdistan, Iran), and B. Alaqeband (University of Kurdistan, Iran)
Copyright: © 2015
Pages: 28
DOI: 10.4018/978-1-4666-7258-1.ch023
Abstract
Preprocessing is a critical stage of Optical Character Recognition (OCR) systems for cursive languages. Baseline detection is one of the main preprocessing steps, so it plays a fundamental role in these systems; improving it can reduce word recognition errors. This chapter proposes a metaheuristic- and mathematical-based algorithm that improves baseline detection relative to well-known baseline detection algorithms. The main advantages of the proposed method are simplicity, high processing speed, and reliability. The solution is tested on the IFN/ENIT database, a well-known and widely used database. However, the proposed solution is applicable to any standard database for cursive-language OCR.
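For readers unfamiliar with baseline detection, the sketch below illustrates the conventional horizontal projection approach, which is one of the well-known techniques in this area. It is not the metaheuristic algorithm proposed in the chapter. It assumes a non-empty binarized word image given as rows of 0/1 values, with 1 marking ink; the function names and the sample image are hypothetical.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def estimate_baseline(image):
    """Return the index of the row containing the most ink pixels.

    Assumption: `image` is a non-empty list of equal-length rows of 0/1
    values. In cursive scripts the connecting strokes concentrate ink on
    the baseline, so the densest row serves as a simple estimate.
    Ties resolve to the topmost row.
    """
    profile = horizontal_projection(image)
    return max(range(len(profile)), key=lambda r: profile[r])


# Tiny hypothetical word image: row 3 carries the connecting stroke.
word = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

baseline_row = estimate_baseline(word)
```

A single densest row is only a rough estimate. It tends to degrade on skewed or very short words, which is one reason more elaborate baseline detection methods are studied.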