Introduction
Over the last decades, we have been experiencing an explosion of textual information from sources such as social media, digital books, and online encyclopedias. Natural Language Processing (NLP) techniques have been designed to help users analyze and extract insights from huge amounts of textual data. Innovative Machine Learning (ML) approaches, such as neural networks and deep learning models, have shown significant improvements in many NLP applications (information retrieval, document clustering, etc.).
Text categorization (TC) is a fundamental task in diverse text mining applications such as sentiment analysis (Kim, 2014), question classification (Alami, En-Nahnahi, Zidani, & Ouatik, 2019), information filtering, and topic classification (El-Alami & El Alaoui, 2018). This process consists of assigning a predefined label or category to a textual document. However, building a TC system remains a challenging task for two main reasons: (1) the high dimensionality of the feature space, which decreases the performance of the categorization system; and (2) the existence of redundant and noisy features that mislead the TC results. To address these issues, various feature representation methods have been proposed. The best-known representations are Bag-of-Words (BoW), pLSA, LDA, word embeddings, and doc2vec. BoW (Wang & Manning, 2012) extracts patterns such as unigrams, bigrams, and higher-order n-grams as features by treating text as a collection of independent tokens. However, this method cannot capture the semantics within texts and fails to reflect similarities among words. pLSA (Cai & Hofmann, 2003) and LDA (Hingmire, Chougule, Palshikar, & Chakraborti, 2013) are topic modeling methods that are generally applied to select more discriminative features, but they suffer from inference problems.
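To make the BoW limitation concrete, the following is a minimal, dependency-free sketch (not the authors' implementation) of the Bag-of-Words representation: each document becomes a vector of token counts over a shared vocabulary, so word order and word similarity are discarded entirely.

```python
# Minimal illustrative Bag-of-Words sketch: documents are mapped to
# count vectors over a shared vocabulary. Note that "cat" and "dog"
# end up as unrelated dimensions -- no semantic similarity is captured.
from collections import Counter

def bow_vectors(docs):
    """Return (sorted vocabulary, list of count vectors) for raw documents."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vocab, vectors

vocab, vecs = bow_vectors(["the cat sat on the mat",
                           "the dog sat on the log"])
print(vocab)    # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vecs[0])  # [1, 0, 0, 1, 1, 1, 2]
```

Each new vocabulary word adds a dimension, which is exactly the high-dimensionality problem noted above.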
More efficient representations such as word embeddings (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013b) and document embeddings (Le & Mikolov, 2014) are language modeling techniques that represent vocabulary words or whole texts as low-dimensional vectors of real numbers learned by neural language models. These representations have shown good performance in Arabic text categorization. However, they ignore the information embedded in lexical databases.
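The following toy example (with hand-made, purely hypothetical embedding values) illustrates why such dense vectors help: cosine similarity between embeddings can reflect word relatedness, whereas any two distinct words in a BoW/one-hot space are equally dissimilar.

```python
# Illustrative only: the embedding values below are invented, not
# learned. Real embeddings would come from a model such as word2vec.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional embeddings (hypothetical values for illustration).
emb = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.8, 0.9, 0.2, 0.1],
    "apple": [0.1, 0.0, 0.9, 0.8],
}

print(cosine(emb["king"], emb["queen"]))  # high: related words
print(cosine(emb["king"], emb["apple"]))  # low: unrelated words
```

In a learned embedding space, related words receive nearby vectors, which is precisely the similarity information that BoW cannot express.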
While several TC systems have been proposed for other languages (English, French, etc.), Arabic TC still faces numerous difficulties in addition to the challenges discussed above. This can be explained by the complexity of the Arabic language, which is both inflectional and derivational.
In this paper, we explore deep neural models and retrofitting for Arabic text categorization to address the aforementioned shortcomings: the lack of semantics, the high dimensionality of the representation space, and the complexity of the Arabic language. Retrofitting is a graph-based learning technique that uses lexical relational resources to train higher-quality semantic vectors; it is employed here for further enhancement. Deep neural networks, for their part, achieve strong results in many NLP tasks. Convolutional Neural Networks (CNNs) perform well at extracting salient features and enabling deeper models (Kim, 2014; Kalchbrenner, Grefenstette, & Blunsom, 2014). Long Short-Term Memory networks (LSTMs) are a kind of Recurrent Neural Network (RNN) whose cell connections form a directed graph along a sequence; they have demonstrated great capability in modeling the dynamic behavior of sequential data (Hochreiter & Schmidhuber, 1997). The main contributions of this work can be summarized as follows: