This is the blog of Ronald Bartels. It wanders on and off the subject of problem management (that is how it started), but it is best described by Ray, who says this is Daddy's thoughts! Such as: the best music is from the Eighties, and a wee dram helps in solving most inconveniences.
Why do computers stop and what can be done about it?
An analysis of the failure statistics of a commercially available fault-tolerant system
shows that administration and software are the major contributors to failure. Various
approaches to software fault-tolerance are then discussed -- notably process-pairs,
transactions and reliable storage. It is pointed out that faults in production software are
often soft (transient) and that a transaction mechanism combined with persistent
process-pairs provides fault-tolerant execution -- the key to software fault-tolerance.
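
To make the last point concrete, here is a minimal sketch of why transactions plus a takeover help with soft faults. It is a single-process toy, not real process-pairs: the names Store, TransientFault, run_with_takeover and make_flaky_debit are all invented for illustration, and the "backup" is simply a retry that starts again from the last committed state. The assumption is that a soft fault tends to disappear when the work is attempted again.

```python
import copy


class TransientFault(Exception):
    """Hypothetical stand-in for a soft (transient) fault."""


class Store:
    """Toy 'reliable storage': only committed state survives a failure."""

    def __init__(self):
        self.committed = {"balance": 100}

    def run_transaction(self, work):
        scratch = copy.deepcopy(self.committed)  # work on a private copy
        work(scratch)                            # may raise; committed state is untouched
        self.committed = scratch                 # commit: all or nothing


def run_with_takeover(store, work, max_attempts=3):
    """Run work as a transaction; on a fault, retry from the committed state.

    Each retry plays the role of the backup taking over from the primary.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            store.run_transaction(work)
            return attempt
        except TransientFault:
            continue  # half-done work in scratch is discarded
    raise RuntimeError("fault persisted; probably not transient")


def make_flaky_debit(amount, failures):
    """Build a debit that fails 'failures' times after a partial update."""
    remaining = {"n": failures}

    def debit(state):
        state["balance"] -= amount  # the partial update happens first
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFault("simulated soft fault")

    return debit


if __name__ == "__main__":
    store = Store()
    attempts = run_with_takeover(store, make_flaky_debit(30, failures=1))
    print(attempts, store.committed["balance"])  # prints: 2 70
```

The first attempt fails after it has already changed the balance, but because the change was made on a scratch copy, the committed balance is still 100 when the second attempt starts. That is the whole trick: the transaction throws away half-done work, and the takeover gives the soft fault a chance to go away.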