Voice-Based Image Captioning System for Assisting Visually Impaired People Using Neural Networks

Nivedita M., AsnathVictyPhamila Y., Umashankar Kumaravelan, Karthikeyan N.
DOI: 10.4018/978-1-6684-3843-5.ch011

Abstract

Many people worldwide live with visual impairment. The authors' idea is to design a novel image captioning model for assisting blind people using a deep learning-based architecture. Automatically understanding an image and generating a description of it involves tasks from two complex fields: computer vision and natural language processing. The first task is to correctly identify the objects present in the given image along with their attributes, and the next is to connect the identified objects with their actions and generate statements that are syntactically correct. From real-time video, features are extracted using a convolutional neural network (CNN), and the feature vectors are given as input to a long short-term memory (LSTM) network to generate appropriate captions in a natural language (English). The captions can then be converted into audio files to which visually impaired people can listen. The model is tested on two standard image captioning datasets, Flickr 8K and MSCOCO, and evaluated using the BLEU score.

Introduction

Image captioning is the task of perceiving a scene or image, establishing connections among the various objects in the image, and assigning a concise description or summary to it. The architectures and methods used for image captioning have developed steadily within deep learning, but these models generally adhere to a common base structure with few alterations.

The entire model generally comprises two sub-models: an encoder (CNN) for extracting features from the image, and a decoder (an NLP language model) for producing the captions based on the input features. During the training phase, the encoder's output is passed directly to the language model alongside the training captions. On top of this design, attention models are additionally implemented to mimic the visual attention of a real human being, capturing and leveraging visual elements of the image when generating each word of the caption.
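To make this encoder-decoder structure concrete, the following is a minimal sketch in PyTorch. The ResNet-50 backbone, embedding size, hidden size, and vocabulary size are illustrative assumptions, not the chapter's exact configuration.

```python
# Minimal CNN encoder + LSTM decoder sketch (sizes are assumptions).
import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """CNN encoder: maps an image to a fixed-length feature vector."""
    def __init__(self, embed_size=256):
        super().__init__()
        resnet = models.resnet50(weights=None)          # pretrained weights optional
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop final FC
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                            # backbone frozen for simplicity
            feats = self.backbone(images).flatten(1)     # (batch, 2048)
        return self.fc(feats)                            # (batch, embed_size)

class Decoder(nn.Module):
    """LSTM decoder: generates caption tokens conditioned on image features."""
    def __init__(self, embed_size=256, hidden_size=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image features as the first "token" of the sequence,
        # then predict a distribution over the vocabulary at every step.
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)
```

At inference time the decoder would instead be unrolled step by step, feeding each predicted word back in until an end-of-sentence token is produced.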

In this chapter, we shall give a brief introduction to the major components involved in image captioning, a summary of those components, and some examples of available image captioning methods, metrics, and datasets. We shall then take a look at the proposed image captioning system for the visually impaired.

Figure 1. General image captioning architecture (left: image feature extraction; right: language model generating caption outputs y1, y2, .. from inputs x1, x2, ..)

Computer Vision

In Computer Vision, the main element in many algorithms is called a filter. A filter is used to extract a particular type of information from the image. For example, the Sobel and Prewitt filters are used to extract edges. Similarly, we can make algorithms learn filters for colors, shapes and other image features.
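To illustrate, here is a small sketch of edge extraction with Sobel filters. It assumes NumPy and SciPy are available, and the input is a placeholder grayscale image rather than data from the chapter's datasets.

```python
# Illustrative edge detection by convolving Sobel filters over an image.
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])       # responds to vertical edges
sobel_y = sobel_x.T                    # responds to horizontal edges

image = np.random.rand(64, 64)         # placeholder grayscale image
edges_x = convolve2d(image, sobel_x, mode="same", boundary="symm")
edges_y = convolve2d(image, sobel_y, mode="same", boundary="symm")
magnitude = np.hypot(edges_x, edges_y) # combined edge strength per pixel
```

Hand-designed filters like these extract one fixed kind of feature; the point of the next section is that a CNN learns its filters from data instead.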

Figure 2. Edge detection filters (left to right: vertical, horizontal, and diagonal filters)

Convolutional Neural Networks

The base architecture of any image-related deep learning model is the CNN. A convolutional neural network (ConvNet/CNN) takes an image as input and learns weights and biases that capture the objects present in the image, which helps differentiate one object from another. In the previous section we talked about hand-designed filters; a CNN learns such filters automatically from the data.

The characteristics of the image are learnt by these ConvNets. Using CNNs we can extract a large number of such features and then pass them to a feed-forward neural network to classify the images (see the sketch after the list below). Since CNNs can learn different features quickly, the pre-processing required is minimal compared to other classification algorithms. Generally, CNNs consist of three major layers:

  • Convolutional Layer (CL)

  • Pooling Layer (PL)

  • Fully Connected Layer (FCL)
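A minimal sketch of how these three layer types stack, again in PyTorch; the channel counts, the 3x224x224 input size, and the 10 output classes are illustrative assumptions.

```python
# A tiny CNN showing the three layer types in order (illustrative sizes).
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # Convolutional Layer (CL)
    nn.ReLU(),
    nn.MaxPool2d(2),                             # Pooling Layer (PL): 224 -> 112
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 10),               # Fully Connected Layer (FCL)
)
```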

Convolutional Layer

A CL is the main layer and consists of a set of filters. Each filter is convolved across the input image: the dot product between the filter and each patch of the input is computed, and the result is the filter's 2-dimensional activation map. The network learns filters that activate when a certain type of feature is detected at some spatial position in the input.
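The following naive loop makes the dot-product view explicit; it is a pedagogical sketch (a "valid" convolution with no padding or stride), not how libraries implement the operation in practice.

```python
# Each output cell is the dot product of the filter with one input patch.
# (Strictly this is cross-correlation, which is what deep learning
# libraries compute under the name "convolution".)
import numpy as np

def activation_map(image, filt):
    fh, fw = filt.shape
    oh, ow = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * filt)
    return out  # the filter's 2-dimensional activation map
```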
