Why do computers stop and what can be done about it?
An analysis of the failure statistics of a commercially available fault-tolerant system
shows that administration and software are the major contributors to failure. Various
approaches to software fault-tolerance are then discussed -- notably process-pairs,
transactions and reliable storage. It is pointed out that faults in production software are
often soft (transient) and that a transaction mechanism combined with persistent process-
pairs provides fault-tolerant execution -- the key to software fault-tolerance.
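The last claim can be illustrated with a toy sketch. It is not taken from the article: the names `TransientFault`, `run_transaction`, `run_with_process_pair` and `make_flaky_debit` are hypothetical, the "processes" are just labels within a single Python program, and the soft fault is simulated by an operation that fails a fixed number of times. Under those assumptions, it shows how aborting a failed transaction and letting the backup retry it from the last committed state can mask a transient fault:

```python
import copy


class TransientFault(Exception):
    """Stands in for a soft (transient) fault that goes away on retry."""


def run_transaction(committed_state, operation):
    """Apply operation to a private copy and return it only if no fault occurs."""
    working = copy.deepcopy(committed_state)
    operation(working)  # may raise; committed_state is never modified
    return working


def run_with_process_pair(committed_state, operation, processes=("primary", "backup")):
    """Try the transaction on each member of the pair until one commits."""
    for name in processes:
        try:
            return run_transaction(committed_state, operation), name
        except TransientFault:
            # Abort: the partially updated working copy is discarded and
            # the next process restarts from the committed state.
            continue
    raise RuntimeError("all processes in the pair failed")


def make_flaky_debit(amount, failures=1):
    """Build a debit operation that faults `failures` times, then succeeds."""
    remaining = {"n": failures}

    def debit(state):
        state["balance"] -= amount  # partial update happens before the fault
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFault("simulated soft fault")

    return debit


state = {"balance": 100}
new_state, served_by = run_with_process_pair(state, make_flaky_debit(30))
print(new_state, served_by)
```

In this sketch the primary faults after a partial update, the update is thrown away with the working copy, and the backup completes the same transaction. A fault that recurs on every attempt (a hard fault) would not be masked this way.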
Read the article over here or alternatively here.