
Service Outage Analysis

An outage analysis is conducted for the impacted service. Two areas are assessed, each with a maximum score of 4. The service outage score is the sum of the two area scores, expressed as a percentage of the maximum of 8.
  • Period - measured by elapsed time.
  • Consequence - determined by financial loss or business perception.
Measurement scale
Service period classification
  • (4) Critical - App, server, link (network or voice) unavailable for greater than 4 hours or degraded for greater than 1 day - negative business delivery for more than 1 month.
  • (3) Major - App, server, link (network or voice) unavailable for greater than 1 hour or degraded for greater than 4 hours - negative business delivery for more than 1 week.
  • (2) Moderate - App, server, link (network or voice) unavailable for greater than 30 minutes or degraded for greater than 1 hour - negative business delivery for more than 1 day.
  • (1) Minor - App, server, link (network or voice) unavailable for greater than 5 minutes or degraded for greater than 30 minutes - negative business delivery for more than 1 hour.
  • (0) Low (default) - App, server, link (network or voice) unavailable for less than 5 minutes or degraded for less than 30 minutes - negative business delivery for less than 1 hour.
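
The period scale above can be sketched as a small lookup. This is only an illustration: the names `PERIOD_THRESHOLDS` and `classify_period` are made up for this example, and it covers only the unavailable and degraded durations, not the business delivery part. The scale does not say what happens at exactly 5 minutes, 30 minutes and so on, so the sketch assumes a duration equal to a threshold falls into the lower class.

```python
from datetime import timedelta

# (score, unavailable for greater than, degraded for greater than),
# ordered from Critical down to Minor so the first match wins.
PERIOD_THRESHOLDS = [
    (4, timedelta(hours=4), timedelta(days=1)),         # Critical
    (3, timedelta(hours=1), timedelta(hours=4)),        # Major
    (2, timedelta(minutes=30), timedelta(hours=1)),     # Moderate
    (1, timedelta(minutes=5), timedelta(minutes=30)),   # Minor
]


def classify_period(unavailable=timedelta(0), degraded=timedelta(0)):
    """Return the period score (0 to 4) for the given outage durations.

    Assumption: a duration exactly equal to a threshold does not count
    as "greater than" it, so it falls into the lower class.
    """
    for score, unavailable_limit, degraded_limit in PERIOD_THRESHOLDS:
        if unavailable > unavailable_limit or degraded > degraded_limit:
            return score
    return 0  # Low (default)
```

For example, `classify_period(unavailable=timedelta(minutes=90))` gives 3 (Major), because 90 minutes is more than 1 hour but not more than 4 hours.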

Service consequence outage classification
  • (4) Critical - Financial loss, which puts a business unit in a critical position - greater than $10m or substantial loss of credibility or litigation or prosecution or fatality or disability.
  • (3) Major - Financial loss which severely impacts the profitability of a business unit - greater than $1m or serious loss of credibility or sanction or impairment.
  • (2) Moderate - Financial loss which impacts the profitability of the business unit, greater than $100k or embarrassment or reported to regulator or hospitalization.
  • (1) Minor - Financial loss with a visible impact on profitability but no real effect, greater than $10k or some embarrassment or rule or process breaches or medical treatment.
  • (0) Low (default) - Financial loss with no real effect, less than $10k or irritating or no legal or regulatory issue or no medical treatment.
Example
  • The period is rated as 3 - Major - App, server, link (network or voice) unavailable for greater than 1 hour or degraded for greater than 4 hours.
  • The consequence is rated as 2 - Moderate - Financial loss which impacts the profitability of the business unit, greater than $100k or embarrassment or reported to regulator or hospitalization.
  • The score is thus 5 out of a maximum of 8 = 62.5%, rounded to 63%
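
The arithmetic in the example can be written as a few lines of code. This is a minimal sketch, and the function name `outage_score` and its parameters are assumptions made for illustration. It returns the unrounded percentage and leaves rounding to the caller.

```python
def outage_score(period, consequence, max_per_area=4):
    """Return the service outage score as a percentage (0.0 to 100.0).

    Each area is rated from 0 to max_per_area; the result is the sum of
    both ratings divided by the maximum possible total.
    """
    for name, value in (("period", period), ("consequence", consequence)):
        if not 0 <= value <= max_per_area:
            raise ValueError(f"{name} must be between 0 and {max_per_area}")
    return (period + consequence) / (2 * max_per_area) * 100


# The example above: period 3 (Major) and consequence 2 (Moderate).
print(outage_score(3, 2))  # 62.5
```

The example quotes this value as 63% after rounding up.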
Read about the Major Incident Process here.
