InfoScipedia
A Free Service of IGI Global Publishing House

What is Fail-Stop Failure

Handbook of Research on Scalable Computing Technologies
A fail-stop failure is a type of failure that causes the affected component of a system to stop operating.
Published in Chapter:
Scalable Fault Tolerance for Large-Scale Parallel and Distributed Computing
Zizhong Chen (Colorado School of Mines, USA)
Copyright: © 2010 | Pages: 24
DOI: 10.4018/978-1-60566-661-7.ch033
Abstract
Today’s long-running scientific applications typically tolerate failures through checkpoint/restart, in which all process states of an application are periodically saved to stable storage. However, as the number of processors in a system increases, the amount of data that must be saved to stable storage also increases linearly. The classical checkpoint/restart approach therefore has a potential scalability problem on large parallel systems. In this chapter, we introduce scalable techniques for tolerating a small number of process failures in large-scale parallel and distributed computing. We present several encoding strategies for diskless checkpointing that improve the scalability of the technique. We introduce the algorithm-based checkpoint-free fault tolerance technique, which tolerates fail-stop failures without checkpointing or rollback recovery. Coding approaches and floating-point erasure-correcting codes are also introduced to help applications survive multiple simultaneous process failures. The introduced techniques are scalable in the sense that the overhead of surviving k failures among p processes does not increase as the number of processes p increases. Experimental results demonstrate that the introduced techniques are highly scalable.
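The abstract does not spell out the encoding strategies it refers to. As a rough illustration of the general idea behind checksum-based diskless checkpointing, the sketch below keeps an element-wise sum of all process states in memory and uses it to rebuild the state of one process after a fail-stop failure. The function names, the list-of-floats state representation, and the sample values are all hypothetical and are not taken from the chapter; the sketch assumes exactly one process fails.

```python
from typing import List, Optional

Vector = List[float]


def encode_checksum(states: List[Vector]) -> Vector:
    """Element-wise sum of all process states.

    In a diskless scheme this checksum would live in the memory of a
    spare process rather than on stable storage (assumption).
    """
    return [sum(column) for column in zip(*states)]


def recover_state(states: List[Optional[Vector]], checksum: Vector) -> Vector:
    """Rebuild the one missing state (marked None) from the survivors."""
    missing = [i for i, s in enumerate(states) if s is None]
    if len(missing) != 1:
        raise ValueError("a single checksum can recover exactly one failed process")
    recovered = list(checksum)
    for s in states:
        if s is not None:
            # Subtract each surviving state from the checksum.
            recovered = [r - x for r, x in zip(recovered, s)]
    return recovered


# Hypothetical example: three processes, each holding a two-element state.
process_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
checksum = encode_checksum(process_states)

# Process 1 stops (fail-stop): its state is lost, the others are intact.
after_failure = [process_states[0], None, process_states[2]]
rebuilt = recover_state(after_failure, checksum)
print(rebuilt)  # [3.0, 4.0]
```

A single sum can recover only one lost state; surviving several simultaneous failures needs more redundant encodings, which is where the erasure-correcting codes mentioned in the abstract come in. With general floating-point data, recovery by subtraction is also subject to round-off error, so the rebuilt state may differ slightly from the original.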