Major incident tsunami


A major incident tsunami occurs when a group of Information Technology major incidents arrives in a single wave over a very short period. A real tsunami is a single large wave, often caused by an earthquake with a deep-sea epicentre, and as with the real thing, the effects of an IT major incident tsunami can be devastating.
Examples of major incident tsunamis are:
  • a worm or virus outbreak, such as Nimda;
  • a patching bug that disables a large proportion of a corporation's desktops;
  • a power failure in the data centre where the backup systems fail to operate;
  • a failure of both components in a redundant system;
  • security certificates expiring on a large number of networked devices, rendering them all inoperable;
  • a migration fault that floods the call centre with simultaneous calls.
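To make the idea concrete, here is a minimal sketch, in Python, of how such a wave might be detected automatically: count the major incidents opened within a sliding time window and flag a tsunami when the count spikes. The five-incident threshold and fifteen-minute window are illustrative assumptions, not figures from any standard.

```python
from datetime import datetime, timedelta

# Hypothetical threshold and window: flag a "tsunami" when five or more
# major incidents open within any fifteen-minute span. Both values are
# illustrative assumptions only.
TSUNAMI_THRESHOLD = 5
WINDOW = timedelta(minutes=15)

def is_tsunami(incident_times: list[datetime]) -> bool:
    """Return True if any sliding window of length WINDOW contains
    TSUNAMI_THRESHOLD or more major incidents."""
    times = sorted(incident_times)
    start = 0
    for end, opened_at in enumerate(times):
        # Shrink the window from the left until it spans at most WINDOW.
        while opened_at - times[start] > WINDOW:
            start += 1
        if end - start + 1 >= TSUNAMI_THRESHOLD:
            return True
    return False

# Example: six major incidents logged two minutes apart -> a tsunami.
wave = [datetime(2024, 3, 1, 9, 0) + timedelta(minutes=2 * i) for i in range(6)]
print(is_tsunami(wave))  # True
```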
When a major incident tsunami hits, the internal support capabilities of an organization are overwhelmed. The best way to handle these occurrences is to bring in IT firefighters on short-term contracts: specialists who are experienced in dealing with major incident tsunamis and skilled in the processes of resolving them.
These modern-day IT firefighters are similar to the oil-well firefighters personified by the legendary Red Adair.

It is my prediction that over the next five years the rate of major incident tsunamis will increase. Eventually, each of these incidents could be graded in the way earthquakes are graded on the Richter scale, making it easy to identify a major incident tsunami: any incident above a certain value would be graded as a tsunami. I suggest naming it the Hopper Scale, after Grace Hopper, the first bug detector. Any ideas on how to calculate this scale?
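As a starting point for that discussion, here is one possible sketch of a Hopper Scale magnitude: a base-10 logarithm of total impact, measured in user-minutes of disruption, so that each whole point represents roughly a tenfold increase in impact, in the spirit of the Richter scale. Both the impact measure and the formula are my own assumptions, not an established standard.

```python
import math

def hopper_magnitude(users_affected: int, outage_minutes: float) -> float:
    """Hypothetical Hopper Scale magnitude.

    Logarithmic like the Richter scale: each whole point represents
    roughly a tenfold increase in total impact (user-minutes of
    disruption). The base-10 log and the choice of user-minutes as the
    impact measure are illustrative assumptions only.
    """
    impact = max(users_affected * outage_minutes, 1)  # avoid log of zero
    return round(math.log10(impact), 1)

# A single desktop down for an hour:    magnitude ~1.8
print(hopper_magnitude(1, 60))
# 10,000 users offline for eight hours: magnitude ~6.7
print(hopper_magnitude(10_000, 8 * 60))
```

On such a scale, any incident above an agreed cutoff, say magnitude 6, would automatically be graded as a tsunami.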

