A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy: An Advancement

Shalini Puri (Birla Institute of Technology, Mesra, Ranchi, India) and Satya Prakash Singh (Birla Institute of Technology, Mesra, Ranchi, India)

Source Title: Journal of Information Technology Research (JITR) 12(4)

DOI: 10.4018/JITR.2019100106

Abstract

This article introduces an advanced Hindi printed document classification system, built on tri-layered segmentation and a bi-level classifier, which categorizes imaged documents into pre-defined, mutually exclusive categories by using SVM for character classification and Fuzzy matching for document classification. During training, the enhanced, noise-free image is segmented into lines and words by profiling. The system then obtains Shirorekha Less (SL) isolated characters, along with upper, left, and right modifier components, from the SL words. These components use their locations and the inter character-modifier component distance to associate only with their corresponding characters. Next, confidence values of all characters are calculated with SVM training, and all characters are mapped to Romanized labels to generate the words. Finally, documents are classified by Fuzzy-based matching of the detected Romanized words against the predefined classes. The average execution times for SL characters are 0.22675 sec. and 0.20375 sec., and the classification accuracies are 74.61% and 80.73%, for training and testing, respectively.
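The line and word segmentation "by profiling" mentioned in the abstract is commonly done with projection profiles: ink pixels are summed along rows to find text lines, then along columns within each line to find words. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes a binarized image given as a list of rows of 0/1 values (1 = ink), and the function names `runs_of_ink`, `segment_lines`, and `segment_words` are hypothetical.

```python
def runs_of_ink(profile, threshold=0):
    """Return (start, end) pairs of consecutive indices whose value exceeds threshold.

    The end index is exclusive, so profile[start:end] is one run of ink.
    """
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins here
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended on the previous index
            start = None
    if start is not None:                  # a run that reaches the last index
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Find text lines as runs of rows that contain ink (horizontal profile)."""
    row_profile = [sum(row) for row in image]
    return runs_of_ink(row_profile)


def segment_words(image, line):
    """Find words inside one line as runs of columns that contain ink (vertical profile)."""
    top, bottom = line
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(col_profile)
```

Because the Shirorekha (header line) joins the characters of a Hindi word, blank columns inside a line typically fall between words rather than between characters, which is why a plain column profile is a reasonable first cut for word boundaries. Separating characters within a word requires removing the Shirorekha first, which this sketch does not attempt.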

Introduction

Over the past two decades, with the digitization of printed documents and the continuous growth in their analysis, Devanagari script and Hindi-based processing systems have established a consistent place in information searching, extraction, and retrieval from text and imaged documents. Although text processing and accurate information retrieval (Puri & Kaushik, 2011; Puri & Kaushik, 2012) from Hindi printed scanned documents (Puri & Singh, 2018) have always been complicated and challenging, the field has achieved a great deal of success in accurate word and character recognition and has recently attracted considerable attention from researchers. In this article, a new automated Hindi Printed Document Classification System using Support Vector Machine and Fuzzy logic (HPDC-SF) is introduced, which proves to be an efficient advancement over currently available offline Hindi document processing systems. HPDC-SF is designed to classify scanned printed imaged documents into pre-defined, mutually exclusive categories by using a Support Vector Machine (SVM) for character-level classification and Fuzzy matching for document-level classification.
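To make the document-level stage more concrete, the following sketch scores a list of Romanized words against per-class keyword lists and picks the best-scoring class. This preview does not specify the Fuzzy matching that HPDC-SF actually uses, so plain string similarity from Python's standard `difflib` module is swapped in here as a stand-in. The names `best_similarity` and `classify_document`, the `min_similarity` cutoff, and the keyword lists in the usage note are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_similarity(word, keywords):
    """Highest similarity ratio (0.0 to 1.0) between word and any class keyword."""
    return max(SequenceMatcher(None, word, keyword).ratio() for keyword in keywords)


def classify_document(words, class_keywords, min_similarity=0.8):
    """Score each class by summing similarities of words that match it closely.

    words          : Romanized words recognized in the document
    class_keywords : dict mapping a class label to a non-empty keyword list
    min_similarity : assumed cutoff; weaker matches contribute nothing
    Returns (best_label, scores).
    """
    scores = {}
    for label, keywords in class_keywords.items():
        similarities = [best_similarity(word, keywords) for word in words]
        scores[label] = sum(s for s in similarities if s >= min_similarity)
    best_label = max(scores, key=scores.get)
    return best_label, scores
```

For example, with hypothetical keyword lists `{"sports": ["khel", "cricket", "khiladi"], "politics": ["sarkar", "chunav", "mantri"]}` and recognized words `["sarkaar", "chunav", "khel"]`, the slightly misrecognized `"sarkaar"` still counts toward `"politics"`, so that class wins. Tolerating such near-misses is the point of matching approximately rather than exactly, since character recognition errors propagate into the Romanized words.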

Many Hindi-based processing systems have emerged in recent years through the combination of artificial intelligence (Padhy, 2005), pattern recognition, image processing (Gonzalez & Woods, 2008), and text mining (Han, Kamber, & Pei, 2012) concepts. These systems have contributed substantially to discrete and dynamic real-time application areas in distributed environments. Applications of automatic Hindi text processing systems cover text syntax and semantics, editors, spell checkers, formatters, linguistics-based grammar and vocabulary, convertors, translators, transliteration, summarization, speech recognition with conversion, cross-lingual processing, and many other related fields. On the other hand, many methodologies for Hindi text imaged documents have also emerged in recent years (Puri & Singh, 2018; Sinha, 2009), covering the extraction and recognition of optical characters, words, and lines in multi-script, multi-colored, multi-form, multi-pattern, multi-oriented, multi-font, and multi-sized documents. There is therefore a strong need for an advanced imaged document processing and classification system that can work beyond Optical Character Recognition (OCR). Such a system needs to build words from the extracted optical characters, gather the image contents, and classify the Hindi printed images optimally. Accuracy estimation is a major and highly critical aspect of such systems, because only correct OCR, word building, and effective classifier implementation can lead to accurate classification of Hindi printed images (Puri & Singh, 2018). The application areas of these automated document processing systems include categorization of Government legal files and security files, identification of property owners, etc. In addition, they play a major role in separating important text images from non-important ones.

To estimate the measures and efficiency of HPDC-SF, various experiments have been performed on different types of Hindi printed images, which were collected from different Government sites, newsletters, novels, magazines, blogs, newspaper cuttings, etc.
