No faults is bad news

In our technological world we strive to be error-free. Everything should run as smoothly as clockwork, with never an item out of place. When something breaks, the reaction is to mouth off and blame, levelling accusations of incompetence left, right and centre.
But an error-free environment is not optimal. You probably think I have finally lost my marbles, but I am making the case that no faults is bad news.
The reason is that a techie who has had no faults, or only limited exposure to them, is not able to deal with the situation when the sh*t really hits the fan. Without that exposure, the individual will in all probability run around like a headless chicken and be unable to execute a workaround within an acceptable time period.
Yes, there is a Catch-22! Most companies do not want faults, but without faults techies do not gain the experience to deal with major incidents. The answer is to have simulations and induced failures for training. A company needs regularly scheduled continuity testing where link and system failures are induced, the reaction is measured and the appropriate counter-measures are documented.
As an example, if you owned a BMW X6 and it malfunctioned in Harare, what would be the expected outcome? Alternatively, if you owned a Volksie and the same happened, there would be a totally different outcome, as there would be a larger population base in Harare able to fix a Volksie. In your company you need techies that are able to fix things, so what are you doing to achieve that goal?
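If you want somewhere to start, the induce, measure and document loop can be tracked with very little tooling. Here is a rough Python sketch of a drill log. The `Drill` record, its field names and the `summarise` helper are all made up for illustration, and the "acceptable time period" is simply whatever target your company sets per scenario.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Drill:
    """One induced-failure exercise. All names here are hypothetical."""
    scenario: str              # what was broken on purpose, e.g. "primary link pulled"
    induced_at: datetime       # when the fault was induced
    workaround_at: datetime    # when the techie had a workaround in place
    target: timedelta          # acceptable time to workaround (your own figure)
    counter_measures: list[str] = field(default_factory=list)  # what was documented

    @property
    def reaction_time(self) -> timedelta:
        # Measured reaction: time from induced fault to working workaround.
        return self.workaround_at - self.induced_at

    @property
    def within_target(self) -> bool:
        return self.reaction_time <= self.target


def summarise(drills):
    """Return the scenarios that missed their target, slowest first."""
    missed = [d for d in drills if not d.within_target]
    missed.sort(key=lambda d: d.reaction_time, reverse=True)
    return [d.scenario for d in missed]
```

Feed it the results of each scheduled test and the scenarios it returns are the ones where your techies need more practice.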

