I have previously written that it is not possible to improve uptime directly, only to minimize the impact of major incidents (read about the major incident process in IT here). Over time, this philosophy does improve uptime, and it is not a new idea: the aviation industry has used it for decades to improve flight safety. It follows that investigating incidents in IT just as rigorously should improve IT safety. Every time a plane falls out of the sky, no stone is left unturned until the precise cause is known. IT is not as diligent and, as a result, not as safety conscious. Safety in IT is, however, about more than measuring the power availability to a server in an arbitrary data centre using the "how many 9s" technique!
Read the full article on LinkedIn's Pulse here.