This is the blog of Ronald Bartels that wanders on and off the subject of problem management (that is how it started), but it is best described by Ray who says this is Daddy's thoughts! Like the best music is from the Eighties and a wee dram helps in solving most inconveniences.
What uptime really means
I have written previously that it is not possible to improve uptime
directly; one can only minimize the impact of major incidents (read about
the major incident process in IT here).
Minimizing that impact is what ultimately improves uptime, and it is not
a new idea: the aviation industry has been using it for decades to
improve flight safety. It follows that investigating accidents in IT will
improve IT safety. Every time a plane falls out of the sky, no stone is
left unturned until the precise reason is known. IT is not as diligent
and evidently not as safety conscious. However, safety in IT is more than
measuring the power availability to a server in an arbitrary data centre
using the "how many 9s" technique!
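
For anyone who has not met the "how many 9s" arithmetic, here is a rough sketch of what it works out to. It assumes a plain 365-day year, and the function name is my own invention for illustration, not something from a standard tool.

```python
# Minutes in a non-leap year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget implied by an availability of N nines.

    Two nines means 99%, three nines means 99.9%, and so on.
    Hypothetical helper, named here only for illustration.
    """
    unavailability = 10 ** -nines  # e.g. 2 nines -> 0.01
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: about {minutes:,.1f} minutes of downtime per year")
```

Note what the figure leaves out: it is a yearly budget for one measured thing, such as power to a server, and it says nothing about why an outage happened or how much it hurt.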