A Fault Tolerance and Recovery Formal Model for IoT Systems

Sahar Smaali, Riadh Benbessem, Hatem Mohamed Nazim Touati
Copyright: © 2022 | Pages: 24
DOI: 10.4018/IJOCI.305840

Abstract

In an increasingly connected world, the Internet of Things (IoT), Cloud, and Fog computing are major assets that make it possible to overcome previously inconceivable limits in terms of innovation. However, fault tolerance remains a major challenge for ensuring the dependability of IoT systems. To tackle this issue, we propose a generic microservice architecture called FaTMA (Fault Tolerance Microservice Architecture for IoT) that detects failures of Things by continuously monitoring their states in real time. In addition, it offers mechanisms to strengthen the reliability of the designed systems. We adopt Bigraphical Reactive Systems (BRS) as the formalism for defining a formal model that describes the architectural elements of the different IoT system layers and their behavior. This model provides a clear separation between the various microservices controlling this type of system and their side effects. Executing the proposed model with the BigraphER tool makes it possible to simulate and analyze different failure scenarios as well as their recovery strategies.

Introduction

The emergence of paradigms such as IoT, Cloud, and Fog computing has given birth to a new type of complex system comprising many users, sensors, and applications that must react to context data, provide services, and manipulate devices. Moreover, with the advances in sensor hardware technology and inexpensive materials, ensuring the reliability of billions of connected devices and sensors has become a major challenge for IoT. System dependability allows users to place justified confidence in the service delivered to them (Avižienis et al., 2004) and encompasses attributes such as availability, security, and maintainability. If IoT systems are not as reliable as the traditional systems they are replacing, they will not meet the expectations of their users (Terry, 2016).

Given the trend of using unreliable, low-cost devices, system dependability is threatened by failures and errors (Power et al., 2018) at every architectural level of an IoT system: sensor and actuator nodes may crash, network links may be interrupted, and processing and storage components may malfunction (Moghaddam & Muccini, 2019). Fault tolerance is one way to ensure system dependability: it allows the system to continue delivering a reliable service despite the presence of faults and errors. Fault tolerance aims to avoid failures by detecting faults, restoring the system, and applying corrective maintenance strategies.
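
The detect/restore/maintain cycle described above can be illustrated with a minimal heartbeat-based sketch. It is not the FaTMA mechanism: the class `ThingMonitor`, the timeout, the standby-replica mapping, and the `failover`/`maintenance` actions are all hypothetical, assuming that each Thing periodically reports a heartbeat and may have one standby replica.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThingMonitor:
    """Hypothetical heartbeat-based failure detector (illustrative only, not FaTMA)."""
    timeout: float  # seconds of silence before a Thing is considered failed
    replicas: Dict[str, str] = field(default_factory=dict)  # thing -> hypothetical standby replica
    last_seen: Dict[str, float] = field(default_factory=dict)  # thing -> time of last heartbeat

    def heartbeat(self, thing_id: str, now: float) -> None:
        """Record that a Thing reported its state at time `now`."""
        self.last_seen[thing_id] = now

    def failed(self, now: float) -> List[str]:
        """Detection: Things whose last heartbeat is older than `timeout`."""
        return sorted(t for t, ts in self.last_seen.items() if now - ts > self.timeout)

    def recover(self, now: float) -> Dict[str, str]:
        """Recovery: fail over to the standby replica unless it has also failed;
        otherwise flag the Thing for corrective maintenance. Failed Things stop being monitored."""
        failed_now = self.failed(now)
        actions: Dict[str, str] = {}
        for thing in failed_now:
            standby = self.replicas.get(thing)
            if standby is not None and standby not in failed_now:
                actions[thing] = f"failover:{standby}"
                self.last_seen[standby] = now  # start (or refresh) monitoring of the replica
            else:
                actions[thing] = "maintenance"
            del self.last_seen[thing]
        return actions


monitor = ThingMonitor(timeout=5.0, replicas={"temp-1": "temp-1b"})
monitor.heartbeat("temp-1", 0.0)
monitor.heartbeat("hum-1", 0.0)
monitor.heartbeat("hum-1", 4.0)
print(monitor.failed(6.0))    # ['temp-1']
print(monitor.recover(6.0))   # {'temp-1': 'failover:temp-1b'}
print(monitor.recover(10.0))  # {'hum-1': 'maintenance'}
```

A timeout-based detector is only one possible design choice; it trades detection latency (the timeout) against false suspicions caused by slow or lossy networks.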

In practice, building a fault-tolerant IoT system is a complex task, mainly because of the extremely large variety of edge devices, data computing technologies, networks, and other resources that may be involved in the development process (Ghosh et al., 2020). Over the last few years, increasing effort has been devoted to this issue, and several contributions address IoT fault tolerance from the hardware and network perspectives. However, little attention has been given to investigating and analyzing the issue from the software architecture point of view. This perspective remains largely under-explored in the contemporary literature (Ghosh et al., 2020), although software engineering provides good support for fault detection and for the recovery of failed services.

In this article, we propose a formal model for the specification and verification of fault-tolerant IoT systems. Our contribution is twofold:

  1. We propose a generic Fault Tolerance Microservice Architecture (called FaTMA) that takes into account failures in the different layers of an IoT system (Things, Fog, Cloud). Our architecture is based on the microservices style and on the Fog infrastructure to provide decentralized control and fault management, loose coupling between services, and greater flexibility.

  2. We provide a formal model to consolidate the FaTMA architecture. In particular, we adopt bigraphical reactive systems (BRS) (Milner, 2008) as a semantic formal framework for specifying the structural and behavioral aspects of our architecture. Moreover, the FaTMA bigraphical model allows us to analyze and simulate various fault tolerance and system recovery scenarios in the early phases of the software life cycle.
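
To give an intuition of how reaction rules rewrite a system's structure, the following toy Python sketch rewrites a nested Cloud/Fog/Thing structure. It is not BigraphER and not a faithful BRS implementation: it models only the place graph (nesting), ignores the link graph, sites, and regions, and applies the first matching rule deterministically, top-down. The controls `FailedThing` and `Alert` and the rules `detect` and `recover` are hypothetical, chosen only to mirror a failure-detection and recovery scenario.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A node of a toy place graph: a control name and the nodes nested inside it."""
    control: str
    children: Tuple["Node", ...] = ()


Rule = Callable[[Node], Optional[Node]]  # returns the rewritten node, or None if no match


def detect(n: Node) -> Optional[Node]:
    """Hypothetical rule: a Fog node hosting a FailedThing raises an Alert."""
    controls = [c.control for c in n.children]
    if n.control == "Fog" and "FailedThing" in controls and "Alert" not in controls:
        return Node("Fog", n.children + (Node("Alert"),))
    return None


def recover(n: Node) -> Optional[Node]:
    """Hypothetical rule: an alerted Fog node restores one FailedThing and clears the Alert."""
    controls = [c.control for c in n.children]
    if n.control == "Fog" and "FailedThing" in controls and "Alert" in controls:
        kids = list(n.children)
        kids[controls.index("FailedThing")] = Node("Thing")  # in-place swap, indices unchanged
        del kids[controls.index("Alert")]
        return Node("Fog", tuple(kids))
    return None


def step(term: Node, rules: List[Tuple[str, Rule]]) -> Optional[Tuple[str, Node]]:
    """Apply the first rule that matches, trying the root before its children."""
    for name, rule in rules:
        result = rule(term)
        if result is not None:
            return name, result
    for i, child in enumerate(term.children):
        hit = step(child, rules)
        if hit is not None:
            name, new_child = hit
            kids = term.children[:i] + (new_child,) + term.children[i + 1:]
            return name, Node(term.control, kids)
    return None


def simulate(term: Node, rules: List[Tuple[str, Rule]],
             max_steps: int = 100) -> Tuple[Node, List[str]]:
    """Rewrite until no rule applies (or max_steps is reached); return final state and trace."""
    trace: List[str] = []
    for _ in range(max_steps):
        hit = step(term, rules)
        if hit is None:
            break
        name, term = hit
        trace.append(name)
    return term, trace


system = Node("Cloud", (Node("Fog", (Node("Thing"), Node("FailedThing"))),))
final, trace = simulate(system, [("detect", detect), ("recover", recover)])
print(trace)  # ['detect', 'recover']
```

The sequence of rule names plays the role of a simulation trace; a real BRS tool such as BigraphER also handles the link graph and non-deterministic choice between matching rules, which this sketch deliberately omits.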

The remainder of the paper is structured as follows. In Section 2, we outline the principle of our approach and present the FaTMA architecture. In Section 3, we explain how it enforces fault tolerance in IoT systems. Section 4 gives an overview of BRS and shows how they are adopted to formally model, simulate, and analyze the FaTMA architecture. Section 5 discusses existing work related to our approach. Finally, Section 6 summarizes the paper and discusses ongoing work.
