Script Identification of Camera Based Bilingual Document Images Using SFTA Features

B.V. Dhandra, Satishkumar Mallappa, Gururaj Mukarambi
Copyright: © 2019 | Pages: 12
DOI: 10.4018/IJTHI.2019100101

Abstract

In this article, an exhaustive experiment is carried out to test the performance of Segmentation-based Fractal Texture Analysis (SFTA) features with nt = 4 and nt = 8 threshold pairs, geometric features, and their combinations. A unified algorithm is designed to identify the scripts of camera-captured bilingual document images containing the international language English together with each one of the Hindi, Kannada, Telugu, Malayalam, Bengali, Oriya, Punjabi, and Urdu scripts. The SFTA algorithm decomposes the input image into a set of binary images, from which the fractal dimension of the resulting regions is computed in order to describe the segmented texture patterns. This motivates the use of SFTA features as texture features for identifying the scripts of camera-based document images, which are affected by non-homogeneous illumination and varying resolution. The experiment is carried out on eleven scripts, each with 1000 sample images, at block sizes of 128 × 128, 256 × 256, 512 × 512 and 1024 × 1024. It is observed that the 512 × 512 block size gives the maximum accuracy of 86.45% for the Gujarathi and English script combination and is therefore the optimal size. The novelty of this article is that a unified algorithm is developed for the script identification of bilingual document images.
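To make the SFTA pipeline concrete, the following is a minimal sketch of SFTA-style feature extraction, assuming scikit-image and NumPy. The function names, the multi-level Otsu thresholding, and the box-counting estimator are illustrative choices, not the authors' implementation.

```python
import numpy as np
from skimage.filters import threshold_multiotsu
from skimage.segmentation import find_boundaries


def box_counting_dimension(borders):
    """Estimate the fractal dimension of a binary border map by box counting."""
    max_exp = int(np.log2(min(borders.shape)))
    sizes = 2 ** np.arange(1, max_exp)            # box side lengths 2, 4, 8, ...
    counts = []
    for s in sizes:
        h = borders.shape[0] // s * s             # crop to a multiple of s
        w = borders.shape[1] // s * s
        blocks = borders[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    counts = np.asarray(counts, dtype=float)
    ok = counts > 0
    if ok.sum() < 2:
        return 0.0
    # N(s) ~ s^(-D): the slope of log N versus log(1/s) estimates D
    return np.polyfit(np.log(1.0 / sizes[ok]), np.log(counts[ok]), 1)[0]


def sfta_features(gray, nt=4):
    """One (fractal dimension, mean gray level, pixel count) triple per
    binary image of the two-threshold decomposition of `gray`."""
    thresholds = threshold_multiotsu(gray, classes=nt + 1)   # nt thresholds
    upper = int(gray.max())
    t_ext = list(thresholds) + [upper]
    pairs = list(zip(t_ext[:-1], t_ext[1:]))                 # contiguous pairs
    pairs += [(t, upper) for t in thresholds]                # one-sided pairs
    feats = []
    for lo, hi in pairs:
        region = (gray > lo) & (gray <= hi)
        borders = find_boundaries(region, mode='inner')
        feats += [box_counting_dimension(borders),
                  float(gray[region].mean()) if region.any() else 0.0,
                  int(region.sum())]
    return np.asarray(feats, dtype=float)
```

Under these assumptions, a 512 × 512 block with nt = 4 yields 2 × 4 binary images and hence a 24-dimensional feature vector, which could then be fed to a KNN or SVM classifier.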

1. Introduction

In the present scenario, most people use smartphones, and the rapid development of mobile technology has influenced people's lifestyles, encouraging them to capture scenes containing text. Preprocessing and recognition of text from camera-captured scene images is a challenging task due to variations in image formats and possible degradations such as blur, uneven lighting, low resolution, and low contrast, which make it difficult to separate text and script from background noise. Hence, automatic extraction of text and recognition of its script from camera images are still open problems for researchers seeking the highest recognition accuracy.

This motivated us to design a unified algorithm for bilingual script identification from camera-captured document images.

Very little work is reported in the literature on script identification from camera-captured bilingual document images. Linlin Li and Chew Lim Tan (2008) proposed a statistical technique for script identification of Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, Korean, Roman, Thai, and Bengali from camera-based images; they performed character-level identification from text lines using signature-generating and script-generating template methods and reported 91% recognition accuracy. Gururaj M. et al. (2017) proposed camera-based tri-lingual script identification based on LBP features and obtained average recognition accuracies of 96.60% and 98.00% for 128 × 128 blocks, 98.71% and 98.07% for 256 × 256, 99.70% and 98.00% for 512 × 512, and 94.90% and 99.01% for 1024 × 1024 blocks using KNN and SVM classifiers, respectively, for English, Hindi, and Kannada scripts.
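As an illustration of block-level LBP texture features of the kind used in that work, here is a hedged sketch assuming scikit-image and scikit-learn; the parameter choices (P = 8, R = 1, uniform mapping) and the KNN usage shown are assumptions, not the published configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.neighbors import KNeighborsClassifier


def lbp_histogram(gray_block, P=8, R=1):
    """Normalized histogram of uniform LBP codes over one image block."""
    codes = local_binary_pattern(gray_block, P, R, method='uniform')
    # the 'uniform' mapping produces P + 2 distinct code values
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist


# Usage sketch: one histogram per labelled script block, then KNN.
# X = np.stack([lbp_histogram(block) for block in training_blocks])
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
```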

O.K. Fasil, S. Manjunath and V.N. Manjunath Aradhya (2017) considered word-level script identification of English with Kannada, English with Malayalam, and Malayalam with Kannada from scene images of bus sign boards. They used morphological operations to localize the text and extracted Gabor, Log-Gabor, and wavelet features (Gabor = 16, Log-Gabor = 16, and wavelet = 28), a total of 60 potential features, for script identification, and reported a recognition accuracy of 97.40% for document images containing English and Kannada using a KNN classifier.
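A minimal sketch of the Gabor part of such a filter-bank feature set follows, assuming skimage.filters.gabor; the 4 frequencies × 4 orientations = 16 energy features mirror the 16 Gabor features mentioned above, but the exact parameter values are assumptions.

```python
import numpy as np
from skimage.filters import gabor


def gabor_energy_features(gray, frequencies=(0.1, 0.2, 0.3, 0.4), n_orient=4):
    """Mean magnitude of the complex Gabor response per (frequency, angle)."""
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            theta = k * np.pi / n_orient          # 0, 45, 90, 135 degrees
            real, imag = gabor(gray, frequency=f, theta=theta)
            feats.append(np.hypot(real, imag).mean())
    return np.asarray(feats)                      # 4 x 4 = 16 features
```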

Nabin Sharma et al. (2013) proposed an algorithm to identify scripts from video frames; they extracted features using Zernike moments, Gabor, and gradient features and used an SVM classifier to identify English, Bengali, and Hindi scripts, obtaining average recognition accuracies of 82.90% for short words and 89.15% for long words, respectively.

P. Shivakumara et al. (2014) proposed a method to identify scripts at word level from video, using gradient-angular features extracted from Arabic, Chinese, English, Japanese, Korean, and Tamil scripts, and obtained an 88.2% average classification rate.

S. Lu and C.L. Tan (2006) used character density, vertical character distribution, and document image vectorization and obtained 95% recognition accuracy. Danni Zhao et al. (2012) reported work on identifying six video scripts at block level using Spatial Gradient Features (SGF); a dataset of 770 frames belonging to the six scripts was considered, achieving an 82.11% average recognition rate.

Anguelos Nicolaou et al. (2016) introduced hand-crafted texture features, namely SRS-LBP and MLP, for script identification from video text and handwritten text and achieved 98.1% with SRS-LBP and 92.78% with MLP, respectively.

Zumra Malik et al. (2015) presented a system for video script identification that combines various textural features. They considered SFTA, LPQ, LBP, HOG, GLCM, Gabor, and AR-coefficient features and achieved a recognition rate of 86.54% with LPQ-based features alone and 96.75% with the combination of all features.

Akhtar Jamil et al. (2016) designed and developed a system for script identification from video images of multilingual artificial text using GLCM and LBP features and achieved 89% recognition accuracy for five scripts (English, Urdu, Hindi, Chinese, and Arabic).
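For reference, a small sketch of GLCM texture descriptors of the kind mentioned above, assuming scikit-image; the distances, angles, and property set are illustrative choices rather than that system's configuration.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops


def glcm_features(gray, distances=(1, 2),
                  angles=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Contrast, homogeneity, energy, and correlation from a normalized,
    symmetric gray-level co-occurrence matrix (gray must be uint8)."""
    glcm = graycomatrix(gray, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ('contrast', 'homogeneity', 'energy', 'correlation')
    # one value per (property, distance, angle) combination
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```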

Trung Quy Phan et al. (2013) proposed an approach for recognizing text from perspective-distorted natural scene images (StreetViewText-Perspective, SVT) using Scale-Invariant Feature Transform (SIFT) descriptors with a bag of key points, obtaining recognition accuracies of 76.5% and 82.2% at character and word level for the ICDAR dataset, and 67.0% and 73.3% at character and word level for the SVT database.
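A hedged sketch of such a bag-of-keypoints encoding over SIFT descriptors follows, assuming opencv-python and scikit-learn; the vocabulary size k = 100 and the k-means clustering step are illustrative assumptions, not the authors' exact pipeline.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans


def bag_of_keypoints(images, k=100):
    """Encode each 8-bit grayscale image as a normalized histogram of its
    SIFT descriptors' nearest visual words from a k-means vocabulary."""
    sift = cv2.SIFT_create()
    per_image = []
    for img in images:
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None
                         else np.zeros((0, 128), np.float32))
    vocab = KMeans(n_clusters=k, n_init=10).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc) if len(desc) else np.array([], dtype=int)
        h, _ = np.histogram(words, bins=k, range=(0, k))
        hists.append(h / max(h.sum(), 1))       # normalize; guard empty images
    return np.stack(hists)
```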
