This paper describes the problem management process my company uses to investigate, classify, communicate and remediate the causes of service outages. Most outages have multiple addressable root causes; our process links each of these to the outage so that it can be analyzed and assigned its own remediation actions. Root causes can also be analyzed independently of the outages they contributed to, which provides powerful trending metrics. The paper also discusses how our problem management system evolved, along with the classification methods it uses and the items it tracks. This process has proven very effective at eliminating repeat outages.
Read the paper here.
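
As a rough illustration of the linkage described in the abstract, the sketch below models an outage that carries several root causes, each with its own remediation action, and then counts root causes by category across outages to get a simple trend. It is only a minimal sketch under stated assumptions: the names `RootCause`, `Outage`, `cause_trend`, the `category` field, and the sample records are all hypothetical and are not taken from the paper or from the system it describes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class RootCause:
    """One addressable cause of an outage (hypothetical fields)."""
    cause_id: str
    category: str      # assumed classification label, e.g. "change control"
    remediation: str   # action assigned to address this cause


@dataclass
class Outage:
    """An outage linked to every root cause that contributed to it."""
    outage_id: str
    root_causes: List[RootCause] = field(default_factory=list)


def cause_trend(outages: List[Outage]) -> Counter:
    """Count root causes by category, independent of the outage they came from."""
    return Counter(rc.category for o in outages for rc in o.root_causes)


# Made-up sample data, for illustration only.
outages = [
    Outage("OUT-1", [
        RootCause("RC-1", "change control", "require peer review of changes"),
        RootCause("RC-2", "monitoring", "add an alert on queue depth"),
    ]),
    Outage("OUT-2", [
        RootCause("RC-3", "change control", "add a rollback step to the runbook"),
    ]),
]

for category, count in cause_trend(outages).most_common():
    print(category, count)
```

Keeping root causes as separate records, rather than as a single free-text field on the outage, is what would let each one carry its own remediation and be counted on its own in a sketch like this.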