Abstract
Error recovery capability is examined in processing arrays that employ spare nodes for fault tolerance. Spares can provide fault tolerance to high-performance single-package arrays, where it is not feasible to repair faulty subsystems. The cost of such a fault-tolerance solution, redundant hardware that idles until needed, may not be practical. Manufacturers must be offered hardware solutions to fault tolerance that provide useful work at all times. In this paper, new schemes are presented in which idling spares can be utilized to improve error recovery. Without expedient error recovery, computation in environments experiencing frequent errors can be burdened with extra cost in terms of job completion time. Further, in such environments, a job may never be able to reach completion. Spares will aid in the validation and in the selection of recovery points in systems experiencing randomly distributed errors. Successful job completion in environments of error bursts is performed with the aid of a scheme [1] that identifies reliable data when periodic on-line testing is available. Spares will help identify the boundaries of reliable data. We consider these features in mesh arrays that are used in digital signal processing applications. Preliminary simulations highlight the overhead of our schemes in terms of job completion times in environments burdened with transient errors.
| Original language | English |
|---|---|
| Pages (from-to) | 318-326 |
| Number of pages | 9 |
| Journal | IEEE International Workshop on Defect and Fault Tolerance in VLSI Systems |
| State | Published - 1996 |
| Event | Proceedings of the 1996 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT'96 - Boston, MA, USA; Duration: Nov 6 1996 → Nov 8 1996 |
| Title | Recovery schemes for mesh arrays utilizing dedicated spares |