
An integrated approach to working with problems

Problems are latently present in all work environments. I have previously written about dealing with major incidents here, which have typically resulted from underlying problems. In this post I'll focus on outlining an integrated approach to working with problems. It entails a number of processes and methodologies, and the items detailed below are the important considerations. The approach is a mature, analytical and measured way to achieve excellence within an organization.

You need to write the story of the problem! You don't have to be a Kipling, but documenting the problem provides structure to your thinking. Additionally, it often helps to explain it to someone else because in the explaining, the problem is clarified and sometimes even solved. And if no one is available you can always talk to a rubber duck, as written about here.

Dashboards and metrics

Firstly, you require dashboards that provide metrics on the operating status of your work environment. Without metrics you will never be able to determine that a problem is occurring. Dashboards can be simple, like those found in motor vehicles. A metric in a motor vehicle is fuel: you use the fuel gauge to determine the most appropriate time to refuel. In a work environment, staring at your phone waiting for it to ring is not a dashboard. That is reactive. Dashboards provide proactive and often visual feedback. An extreme example is NASA's mission control (view the video below). You are not required to implement a NASA-type mission control, but choose an appropriate dashboard that will provide visual feedback on your business. As an example, in the emergency section of a hospital, a monitor displaying the number of patients in each emergency category and the waiting time for each is extremely useful and provides confidence to both staff and patients.
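As a minimal sketch of the hospital example, the Python snippet below reduces a waiting list to the two figures such a monitor might show: the number of patients in each category and the longest current wait. The Patient record, the category names and the arrival times are illustrative assumptions, not data from a real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Patient:
    category: str        # hypothetical emergency category name
    arrived: datetime    # when the patient joined the queue


def summarise(patients, now):
    """Return {category: (patient_count, longest_wait_minutes)}."""
    grouped = defaultdict(list)
    for p in patients:
        wait_minutes = int((now - p.arrived).total_seconds() // 60)
        grouped[p.category].append(wait_minutes)
    return {cat: (len(waits), max(waits)) for cat, waits in grouped.items()}


# Illustrative data only.
now = datetime(2024, 1, 1, 12, 0)
queue = [
    Patient("urgent", datetime(2024, 1, 1, 11, 45)),
    Patient("urgent", datetime(2024, 1, 1, 11, 30)),
    Patient("routine", datetime(2024, 1, 1, 10, 0)),
]
dashboard = summarise(queue, now)
for category, (count, longest) in sorted(dashboard.items()):
    print(f"{category:<8} patients={count} longest wait={longest} min")
```

The point of the sketch is that a dashboard is a summary refreshed continuously, so staff see a category filling up before anyone has to phone in.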

Crime scenes (the locations of the problem)

The location where the problem has occurred needs to be investigated. It is preferable to secure the area, gather all the evidence and log it, just like a crime scene. The way you start solving a murder is no different from the way you start solving a problem in a work-related incident. In reality, this is a key part of production and manufacturing. Let me explain.

Taiichi Ohno, who refined the Toyota Production System, would take new managers and engineers to the factory and draw a chalk circle on the floor. The subordinate would be told to stand in the circle and to observe and note down what he saw. When Ohno returned he would check; if the person in the circle had not seen enough he would be asked to keep observing. Ohno was trying to imprint upon his future managers and engineers that the only way to truly understand what happens in the factory was to go there. It was here that value was added and here that waste could be observed. This was known as Genchi Genbutsu and is a primary method to start solving problems. If the problem exists in the factory then it needs to be understood and solved in the factory and not on the top floors of some office block or city skyscraper.


Kipling does provide good advice for what to put in your notepad:

I KEEP six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
I send them over land and sea,
I send them east and west;
But after they have worked for me,
I give them all a rest.

Prevailing conditions and business impact

Take note of the prevailing conditions. These might be economic or even weather related. It is a big assumption to discount prevailing conditions. It is also important to take a snapshot of the prevailing conditions at the time of the problem. If the problem remains unresolved and it happens again, a comparison of prevailing conditions might provide significant insight.

Additionally, if it is a technical problem it is important to determine and measure the business impact. How has the problem affected the organisation experiencing it, not only from an internal resource perspective but also from an external interaction and commercial point of view?

On the morning of Monday, 29th August 2005, Hurricane Katrina hit the Gulf coast of the US. New Orleans, Louisiana suffered the main brunt of the hurricane, but the major damage and loss of life occurred when the levee system catastrophically failed. Floodwaters surged into 80% of the city and lingered for weeks. At least 1,836 people lost their lives in the hurricane and resulting floods, making it one of the largest natural disasters in the history of the United States. On July 31, 2006 the Independent Levee Investigation Team released a report on the Greater New Orleans area levee failures. Their report

identified flaws in design, construction and maintenance of the levees. But underlying it all, the report stated, were the problems with the initial model used to determine how strong the system should be.

The hypothetical model storm upon which storm protection plans were based is called the Standard Project Hurricane or SPH. The model storm was simplistic, and led to an inadequate network of levees, flood walls, storm gates and pumps. The report also found that

“the creators of the standard project hurricane, in an attempt to find a representative storm, actually excluded the fiercest storms from the database.”

When the probability of an occurrence is great, it is incorrect to assume that it will only happen far into the future. It is best to realize that major events can happen at any time within the probability period, not only at the end of it. Work on prevailing conditions, not perceived projected conditions.
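A short worked calculation makes the point. Assume, purely for illustration, that a severe storm has the same independent chance of occurring in every year, so that a "1-in-100-year" event has a 1% chance in any single year. The chance of seeing at least one such event within n years is then 1 - (1 - p)^n. The return period and the horizons below are made-up figures, not values from the levee report.

```python
def chance_within(years, return_period):
    """Probability of at least one event within `years`, assuming each year
    is independent with annual probability 1 / return_period."""
    annual = 1.0 / return_period
    return 1.0 - (1.0 - annual) ** years


for horizon in (1, 10, 30, 100):
    print(f"{horizon:>3} years: {chance_within(horizon, 100):.1%}")
```

Under these assumptions the event has roughly a one in four chance of occurring within 30 years, and it is as likely to arrive in the first year as in the hundredth.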


Something that is important, especially when the crisis is significant, is to realise that you need to be skilled in fighting fires. That is, the problem might require an immediate workaround to maintain service. In that case you are not solving the problem but temporarily alleviating any further negative consequences.

The Devil's Cigarette Lighter was a natural gas well fire at Gassi Touil in the Sahara Desert of Algeria. Ignited when a pipe ruptured on 6th November, 1961, the well produced natural gas whose flame rose up to 240 metres. The flame was seen from orbit by John Glenn during the flight of his Mercury spacecraft on 20th February, 1962. The blowout and fire were estimated to have consumed enough gas to supply Paris for three months. After burning for almost six months, the fire was extinguished by well fire expert Red Adair, who used explosives to deprive the flame of oxygen. Red Adair is the ultimate firefighter, and his story highlights the fact that part of dealing with problems is to extinguish the flames. His example is an extreme case, but it is one thing to resolve the consequences and cause of a problem and another to deal with the immediate effects!


It is one thing to collect data about a problem and record it, but a totally different skill is required to interpret it. Here you look at visual representations by graphing the data in an appropriate fashion. As an example, bar graphs are often referred to as Manhattan graphs. Just as the large buildings are prominent in the Manhattan skyline, so too are the significant bits of data that are represented in a graph. Convert the data to a visual representation and this will aid in the process of solving the problem. View the video below to see the importance of processing visual information:

Lessons learnt

In February 1945, a force of around 70,000 US Marines invaded Iwo Jima, a significant volcanic island 840 kilometres south of Tokyo. The island was defended by over 22,000 Japanese, and the Americans expected it to fall within five days. Instead the battle lasted more than seven times longer, with 6,800 US fatalities, 20,000 US wounded, and the death of 20,700 defenders.

The significance of this action was that the Marines resolved to aggregate all the lessons learned from the costly invasion of Iwo Jima and channelled these lessons into future conflicts.

Some problems have been solved before, and it is a waste of resources to solve them again from first principles every time. In a work environment, different people might work on resolving different problems at different times. If the successful resolution of problems is pooled into a knowledge base then future problems will be dealt with far more efficiently. Thus it is important not only to populate a lessons learnt knowledge base when a problem is solved, but also to reference it when dealing with a problem, to find a potential resolution or even insight into how to deal with the current problem.


When working with problems, time is the most crucial attribute to record. Refer here to the expanded incident life-cycle. The time an event happens and the time between events provide the most significant clues to a problem's source. As an example, it is important to know when the event occurred as opposed to when it was detected. The two might not have happened at the same time, and that gap could in itself be a problem.
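As a rough sketch of what recording time makes possible, the snippet below takes a few events, each with the time it occurred and the time it was detected, and derives the detection lag and the gap between occurrences. The two-field record and the timestamps are illustrative assumptions only.

```python
from datetime import datetime

# Each record: (occurred, detected). Timestamps are illustrative only.
events = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 20)),
    (datetime(2024, 3, 11, 9, 5), datetime(2024, 3, 11, 9, 10)),
    (datetime(2024, 3, 18, 9, 2), datetime(2024, 3, 18, 10, 2)),
]

# Lag between an event happening and someone noticing it.
detection_lag_minutes = [
    (detected - occurred).total_seconds() / 60 for occurred, detected in events
]

# Gap between consecutive occurrences; a regular gap hints at a scheduled cause.
occurrences = sorted(occurred for occurred, _ in events)
gaps_days = [
    (later - earlier).total_seconds() / 86400
    for earlier, later in zip(occurrences, occurrences[1:])
]

print("detection lag (min):", detection_lag_minutes)
print("gap between events (days):", [round(g, 2) for g in gaps_days])
```

In this made-up data the events recur about every seven days at around the same hour, which would point the investigation towards something that runs weekly, while the hour-long detection lag on the last event is a finding in its own right.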

The best illustration of how time solved a problem is that of Harrison, a carpenter. Time solved the problem of determining longitude and hence your exact position on Earth.

Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface and is best determined using time measurements. Galileo Galilei proposed that with accurate knowledge of the orbits of the moons of Jupiter one could use their positions as a universal clock to determine longitude, but this was practically difficult, especially at sea. An English clockmaker, John Harrison, invented the marine chronometer, helping solve the problem of accurately establishing longitude at sea and thus revolutionising safe long distance travel. Harrison’s watches were rediscovered after the First World War, restored and given the designations H1 to H5 by Rupert T. Gould. Harrison completed the manufacturing of H4 in 1759.
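The link between time and longitude is simple arithmetic: the Earth turns 360 degrees in 24 hours, or 15 degrees per hour, so the difference between local solar time and the time at a reference meridian (kept by the chronometer) gives the longitude. The sketch below assumes the two times are read at the same instant and ignores refinements such as the equation of time; the function name and the sample readings are illustrative.

```python
DEGREES_PER_HOUR = 360 / 24  # the Earth turns 15 degrees every hour


def longitude_from_times(local_solar_hours, reference_hours):
    """Longitude in degrees; positive is east of the reference meridian.

    Both arguments are times of day in decimal hours at the same instant:
    local solar time observed on the ship, and the reference-meridian time
    read from the chronometer.
    """
    difference = local_solar_hours - reference_hours
    # Wrap into [-12, 12) hours so the answer lies in [-180, 180) degrees.
    difference = (difference + 12) % 24 - 12
    return difference * DEGREES_PER_HOUR


# Local noon is observed while the chronometer reads 15:00 at the reference.
print(longitude_from_times(12.0, 15.0))  # -45.0, i.e. 45 degrees west
```

The same arithmetic shows why accuracy mattered so much: a clock that is wrong by four minutes puts the ship one full degree of longitude away from where the navigator believes it to be.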


It is crucial to be able to mitigate the risk associated with problems, and thus an established risk analysis methodology needs to be adopted and utilized. How will we know if the problem is required to be solved or not? How will we know which problems need to be worked on and prioritized over others? Read here about a proposed rapid risk assessment methodology.
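One common way to answer those questions, offered here only as an illustrative sketch and not as the rapid risk assessment methodology referred to above, is to score each problem by likelihood and impact and to work on the highest scores first. The 1-to-5 scales, the problem names and the scores are all hypothetical.

```python
# Each problem: (name, likelihood 1-5, impact 1-5). Hypothetical values.
problems = [
    ("intermittent link drops", 4, 3),
    ("backup job fails monthly", 2, 5),
    ("slow report generation", 5, 1),
]


def risk_score(likelihood, impact):
    """Simple likelihood x impact score; higher means work on it sooner."""
    return likelihood * impact


# Highest score first, so the list reads as a work queue.
ranked = sorted(problems, key=lambda p: risk_score(p[1], p[2]), reverse=True)
for name, likelihood, impact in ranked:
    print(f"{risk_score(likelihood, impact):>2}  {name}")
```

A low score also gives a defensible answer to the first question: a problem that is both unlikely and harmless may not need solving at all.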

Meerkats have a sentinel or lookout role performed by non-breeding members of the community. They watch for possible predators and other potential threats to the community. This behaviour is also called the raised guarding position. This position rotates amongst different members of the group in no particular order or structure. Sentinels are usually around when the group is foraging away from the burrow. The meerkat on the lookout will sound an alarm by producing a distinct bark. This allows the offspring to escape inside the burrows and under the protection of adults. Meerkats are aware that life is full of risks, like cobras and eagles, and thus plan to mitigate those risks. In the workplace a person cannot be ignorant about the risks associated with problems occurring. Evaluate what you have done to mitigate those risks!

Escalation and grading

When working on problems there needs to be a communications channel through which the status and resolution are escalated. It is beneficial if this communication is handled separately by resources who aren't directly involved in resolving the problem, as this makes better use of resources.

Just as important, once the problem is resolved the resources involved in the problem need to be reviewed and graded. Without this you will not assign resources optimally to problem solving. But don't be like Dilbert:

The article was originally published over on LinkedIn: An integrated approach to working with problems

