A Dynamic Scaling Approach in Hadoop YARN

Warda Ismahene Nemouchi, Souheila Boudouda, Nacer Eddine Zarour
Copyright © 2022 | Pages: 17
DOI: 10.4018/IJOCI.286176

Abstract

In cloud-based Big Data applications, Hadoop has been widely adopted for the distributed processing of large-scale data sets. However, the energy wasted by data centers remains an important research axis, due to resource overuse and extra overhead costs. A practical way to overcome this challenge is dynamic scaling of resources in a Hadoop YARN cluster. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on the workload. It relies on two algorithms (scaling up/down) that automate the scaling process in the cluster. The aim of this article is to ensure both the energy efficiency and the performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study on sentiment analysis of tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets posted by users on Twitter. The results show improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.

Introduction

Cloud computing (CC) has emerged as a recent paradigm that combines distributed computation with server virtualization and storage capacity (Shah & Trivedi, 2015). Its fundamental idea revolves around providing multiple services to customers over the internet through three models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The use of CC minimizes the burden on users and helps them focus on their core business. It liberates them from any concerns or costs related to infrastructure (Kalagiakos & Karampelas, 2011) and allows companies to scale their computations as they grow. Deploying applications on the cloud offers multiple advantages, including scalability, resource sharing, on-demand services, and distributed computation (Balashandan & Shivika, 2017). The cloud has also proved more versatile than traditional infrastructure from both the service-quality and the security perspectives (Armbrust et al., 2010).

The use of CC has given rise to cloud-based applications, especially when it comes to dealing with large-scale data, or in other words, Big Data applications. The fundamental goal of Big Data is to derive knowledge and insights from previously collected or real-time data, passing through successive phases of cleaning, processing, and analysis that improve the decision-making process. Several properties, referred to as the V model, differentiate Big Data from traditional data: a large volume of varied data generated at high velocity (Khan et al., 2015). These properties pose major challenges to both companies and researchers, not only because of the demanding requirements for handling and processing such data, but also because of the need to reduce response times to minutes or even seconds (near real time). Hence, most enterprises deploy their data on the cloud for its elastic, on-demand, self-service, and resource-pooling nature (Wang et al., 2015; Rajput et al., 2019).

One of the most widely used cloud-based Big Data applications is Apache Hadoop, a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop implements MapReduce, a programming model for running and analyzing data processing in parallel (Shah & Trivedi, 2015). It was designed from the outset to scale to thousands of machines in a shared-nothing architecture (Apache Foundation, n.d.). Running Hadoop in the cloud makes adding and removing nodes much smoother.
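To make the programming model concrete, the classic word-count job below is a minimal sketch of a Hadoop MapReduce program. It is illustrative only and not taken from the article; the class names are our own, but the API calls are the standard org.apache.hadoop.mapreduce interfaces.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on mappers
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the map and reduce phases are independent per key, YARN can spread the same job over however many NodeManagers the cluster currently has, which is what makes elastic scaling of the cluster size transparent to the application.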

Although the cloud has proved beneficial for Big Data (Jannapureddy et al., 2019), running large-scale data computations has a huge influence on the energy consumption of data centers. Most users tend to preconfigure the cluster's resources to handle the maximum workload. In addition, the scalability of the cloud can lead to uncontrolled growth of resources to meet users' demands, which results in more unused servers and thus wasted energy (Wang et al., 2015). Energy costs have been reported to account for a large fraction of the total cost of ownership of data centers (Jam et al., 2013; Wang et al., 2015). According to a study of a sample of 5,000 servers at Google, the CPU utilization of servers in such large-scale data centers is quite low, ranging from 10% to 20%, and up to 60% of computing resources run without even being used (Barroso & Hölzle, 2009). For that reason, dynamic scaling is required to use resources efficiently. Multiple research works have been proposed to achieve energy efficiency and reduce operational costs by dynamically adjusting the acquired resources to the workload (Hosamani et al., 2020). In other words, nodes are added or removed automatically according to the current workload, and idle ones are placed in a lower-power standby mode (Manikandan & Ravi, 2014); a sketch of such a decision loop is given below.
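As a rough illustration of what threshold-based dynamic scaling involves, the following sketch shows a hypothetical monitoring loop. It is not the DSHYARN algorithm from this paper: the thresholds, the sampling interval, and the helpers clusterCpuUtilization, wakeStandbyNode, and pickLeastLoadedNode are all assumptions. Only the decommissioning mechanism is standard YARN: the ResourceManager re-reads the exclude file named by yarn.resourcemanager.nodes.exclude-path when `yarn rmadmin -refreshNodes` is run.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class ScalingMonitor {
  private static final double UPPER = 0.80;  // illustrative scale-up threshold
  private static final double LOWER = 0.20;  // illustrative scale-down threshold
  private static final Path EXCLUDES =       // yarn.resourcemanager.nodes.exclude-path
      Paths.get("/etc/hadoop/conf/yarn.exclude");

  public static void main(String[] args) throws Exception {
    while (true) {
      double cpu = clusterCpuUtilization();
      if (cpu > UPPER) {
        // Scale up: bring a standby node back and clear it from the exclude list.
        String host = wakeStandbyNode();
        removeFromExcludes(host);
        refreshNodes();
      } else if (cpu < LOWER) {
        // Scale down: mark the least loaded node for decommissioning.
        String host = pickLeastLoadedNode();
        appendToExcludes(host);
        refreshNodes();
      }
      TimeUnit.MINUTES.sleep(5);  // sampling interval (illustrative)
    }
  }

  // Standard YARN command to make the ResourceManager re-read its node lists.
  static void refreshNodes() throws IOException, InterruptedException {
    new ProcessBuilder("yarn", "rmadmin", "-refreshNodes")
        .inheritIO().start().waitFor();
  }

  static void appendToExcludes(String host) throws IOException {
    Files.writeString(EXCLUDES, host + System.lineSeparator(),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  }

  static void removeFromExcludes(String host) throws IOException {
    List<String> kept = Files.readAllLines(EXCLUDES).stream()
        .filter(line -> !line.trim().equals(host))
        .collect(Collectors.toList());
    Files.write(EXCLUDES, kept);
  }

  // Placeholders: a real monitor would query the ResourceManager's metrics
  // and a cloud provider's API; these stubs only make the sketch compile.
  static double clusterCpuUtilization() { return 0.5; }
  static String wakeStandbyNode() { return "worker-standby-1"; }
  static String pickLeastLoadedNode() { return "worker-3"; }
}

Recent Hadoop versions also support graceful decommissioning (yarn rmadmin -refreshNodes -g <timeout>), which lets running containers finish before a node is powered down, avoiding the recomputation cost of killed tasks.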
