A New Fault-Tolerant Algorithm Based on Replication and Preemptive Migration in Cloud Computing

Abderraziq Semmoud, Mourad Hakem, Badr Benmammar, Jean-Claude Charr
Copyright: © 2022 | Pages: 14
DOI: 10.4018/IJCAC.305214

Abstract

Cloud computing is a promising paradigm that offers users computational advantages in terms of cost, flexibility, and availability. Nevertheless, with potentially thousands of connected machines, faults become more frequent. Consequently, fault-tolerant load balancing becomes necessary to optimize resource utilization while ensuring the reliability of the system. Several fault tolerance techniques for cloud computing have been proposed in the literature, but they suffer from shortcomings: some rely on checkpoint-recovery, which increases the average waiting time and thus the mean response time, while others rely on task replication, which reduces the cloud's efficiency in terms of resource utilization under variable loads. To address these deficiencies, an efficient and adaptive fault-tolerant algorithm for load balancing is proposed. Several series of test-bed scenarios, simulated with CloudSim, are considered to assess the behavior of the proposed algorithm.

Introduction

The cloud is emerging as a large-scale distributed computing infrastructure that enables resource sharing and coordinated problem solving at a time when information must be accessible anywhere and anytime. It provides highly scalable, secure, and efficient mechanisms for discovering and negotiating remote access to computing resources in a transparent manner.

System load is a measure of the amount of work that a computer system performs. If the load on some computers is generally heavier than on others, or if some processors execute tasks more slowly than others because of resource heterogeneity, those machines become overloaded. Load balancing aims to ensure that all processors share the workload over the long term. Although load balancing is essential to ensure high availability of applications in an increasingly critical environment, failures become inevitable as the number of components in the cloud system increases. Therefore, a load balancing algorithm should be fault tolerant, i.e., it should keep balancing the load uniformly despite arbitrary node or link failures.

Fault tolerance can be provided through one of two approaches: a proactive or a reactive fault tolerance policy. The former avoids failures by predicting them and taking preventive actions before they occur. Preemptive migration (Engelmann et al., 2009), software rejuvenation (Pokharel & Park, 2010), and self-healing (Dijkstra, 1974) are examples of proactive techniques. The latter follows a recovery policy that restores the system from a failed state once a failure has occurred. Reactive fault tolerance techniques fall into two classes. The first class lets the application continue executing until termination even if some nodes fail: compensation mechanisms in the run-time environment or in the application algorithm prevent the complete failure of the whole application (Cappello, 2009). This class includes replication (Schneider, 1993), algorithm-based fault tolerance (Chen et al., 2016), natural fault tolerance (Ibrahim et al., 2013), rescue workflow (Sindrilaru et al., 2010), and failure masking (Gamell et al., 2015). In the second class, the effect of failures is repaired either by continuing the execution on the assumption that the application will later recover a normal state, or by re-executing the failed parts of the application. Rollback-recovery (Mansouri et al., 2018), forward recovery (Malekpour, 2006), task resubmission (Plankensteiner et al., 2009), retry (Lakhan & Li, 2020), and task migration (Chakravorty et al., 2006) are common techniques of this class.
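
To make the notions of load, heterogeneity, and reactive recovery more concrete, the following minimal Python sketch places tasks on heterogeneous nodes and re-places the pending tasks of a node that fails (a simple form of task resubmission). It is an illustration under stated assumptions, not the algorithm proposed in the article: the `Node` class, the `speed` attribute, and the greedy earliest-finish placement rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a processing node (not taken from the article)."""
    name: str
    speed: float                                # assumed: work units processed per second
    queue: list = field(default_factory=list)   # sizes of pending tasks, in work units
    alive: bool = True

    def finish_time(self) -> float:
        """Time needed to drain the queue at this node's speed."""
        return sum(self.queue) / self.speed


def assign(task: float, nodes: list) -> Node:
    """Greedy placement: put the task on the live node where it would finish earliest."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        raise RuntimeError("no live node available")
    best = min(candidates, key=lambda n: (sum(n.queue) + task) / n.speed)
    best.queue.append(task)
    return best


def resubmit_on_failure(failed: Node, nodes: list) -> int:
    """Reactive recovery: mark the node as failed and re-place its pending tasks."""
    failed.alive = False
    pending, failed.queue = failed.queue, []
    for task in sorted(pending, reverse=True):  # largest tasks first
        assign(task, nodes)
    return len(pending)


if __name__ == "__main__":
    nodes = [Node("fast", 2.0), Node("slow", 1.0)]
    for size in [4, 4, 4, 4, 4, 4]:
        assign(size, nodes)
    print([(n.name, n.finish_time()) for n in nodes])  # both nodes finish at 8.0
    resubmit_on_failure(nodes[1], nodes)
    print(nodes[0].finish_time())                      # 12.0 after absorbing the failed node's tasks
```

Because placement accounts for node speed, the faster node receives twice as much work and both nodes finish at the same time. After the failure, the surviving node absorbs the orphaned tasks, which shows why a purely reactive policy can lengthen response time.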

In this work, we design a hybrid failure management mechanism for load balancing in a cloud computing environment. This mechanism is based on preemptive migration and replication to proactively take preventive actions before a failure occurs and reactively manage the occurrence of failures. With this mechanism, we can effectively reduce resource usage, costs, and network traffic in data centers.
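
One way to picture how preemptive migration and replication might be combined is the Python sketch below. It is only an illustration under assumptions, since this preview does not specify the authors' decision rules: the per-node failure risk is taken as a given input, and the two thresholds `RISK_MIGRATE` and `RISK_REPLICATE`, the `plan` function, and the least-loaded target selection are hypothetical.

```python
from typing import Dict, List, Set, Tuple

RISK_MIGRATE = 0.7    # assumed threshold: at or above this, evacuate the node proactively
RISK_REPLICATE = 0.3  # assumed threshold: at or above this, keep a replica on another node


def least_loaded(loads: Dict[str, float], exclude: Set[str]) -> str:
    """Return the least loaded node that is not in `exclude`."""
    candidates = {n: load for n, load in loads.items() if n not in exclude}
    if not candidates:
        raise RuntimeError("no eligible target node")
    return min(candidates, key=candidates.get)


def plan(loads: Dict[str, float], risk: Dict[str, float]) -> List[Tuple[str, str, str]]:
    """Build (action, source, target) decisions from current loads and predicted risks."""
    loads = dict(loads)  # work on a copy so the caller's data is left untouched
    risky = {n for n, r in risk.items() if r >= RISK_MIGRATE}
    actions = []
    for node in sorted(loads):
        if node in risky:
            # Proactive step: move the work away before the predicted failure.
            target = least_loaded(loads, exclude=risky)
            loads[target] += loads[node]
            loads[node] = 0.0
            actions.append(("migrate", node, target))
        elif risk[node] >= RISK_REPLICATE:
            # Reactive safeguard: a replica elsewhere survives an unpredicted failure.
            target = least_loaded(loads, exclude=risky | {node})
            loads[target] += loads[node]
            actions.append(("replicate", node, target))
    return actions


if __name__ == "__main__":
    loads = {"a": 5.0, "b": 2.0, "c": 1.0, "d": 3.0}
    risk = {"a": 0.9, "b": 0.4, "c": 0.1, "d": 0.0}
    print(plan(loads, risk))
    # [('migrate', 'a', 'c'), ('replicate', 'b', 'd')]
```

In this sketch, replication is reserved for nodes of intermediate risk, so low-risk nodes incur no duplicate work. This reflects the general motivation for a hybrid scheme, namely limiting the resource overhead of blanket replication.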

In the following, we summarize the contributions and the novelties of the presented study:

  • The design of a new distributed algorithm for fault-tolerant load balancing which is able to deal with both scalability and dynamicity of the targeted system.

  • The combination of proactive and reactive fault tolerance techniques in an adaptive way to cope with unpredictable failures.

  • The preservation of fault tolerance while keeping the global load of the system as balanced as possible.

The rest of this paper is organized as follows: Section 2 describes related work on load balancing and fault tolerance techniques for the cloud environment. The system model and the problem formulation are given in Section 3. In Section 4, the distributed fault tolerance approach is presented. Section 5 focuses on the simulation setup and experimental results. Finally, the paper ends with a summary of the contributions and directions for future work.
