Kepner-Tregoe: Houston, we have a problem!

The Apollo 13 team is famous for bringing back the astronauts stranded in space by solving difficult and complex problems. The teams solved those problems using the Kepner-Tregoe (KT) methodology, and this post collects some pointers and resources about it. KT is one of the recommended ITSM/ITIL problem solving techniques.
Here is a presentation about problem solving in IT which includes Kepner-Tregoe, and here is a presentation to the ITSMF.
The Sun Global Resolution Troubleshooting (SGRT) blog is here; it uses a modified version of Kepner-Tregoe.

The Fallacy of People Problems, and How to Solve Them is an article which points out that there are a number of causes to be investigated, and that they are usually human-related.

The KT template is available here.

This paper talks about using KT in project management.

The New Rational Manager is a book published about the KT technique. Read this article on making rational decisions.
Here is a summary of the rational process:
Assess Situations
  • Identify concerns by listing them
  • Separate the level of concern
  • Set the priority level to measure seriousness, urgency and growth potential
  • Decide what action to take next
  • Plan for who is involved, what they will be doing, where they will be involved and the extent of involvement
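
The priority step above can be sketched in a few lines of Python. This is only an illustration: the summary names the three dimensions (seriousness, urgency and growth potential) but not how to score them, so the low/medium/high scale, the simple sum and the example concerns are all my own assumptions.

```python
from dataclasses import dataclass

# Hypothetical 3-point scale; the summary names the dimensions only,
# not how they should be scored or combined.
SCALE = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Concern:
    name: str
    seriousness: str
    urgency: str
    growth: str

    def priority(self) -> int:
        # Assumption: equal weighting, combined by simple addition.
        return SCALE[self.seriousness] + SCALE[self.urgency] + SCALE[self.growth]


def rank_concerns(concerns):
    """Return the concerns ordered from highest to lowest priority."""
    return sorted(concerns, key=lambda c: c.priority(), reverse=True)


# Made-up concerns, purely for illustration.
concerns = [
    Concern("Nightly backup job fails", "high", "medium", "high"),
    Concern("Printer on floor 2 offline", "low", "medium", "low"),
    Concern("Disk usage climbing on mail server", "medium", "low", "high"),
]

for concern in rank_concerns(concerns):
    print(concern.priority(), concern.name)
```

The ranking only decides which concern gets attention first; deciding the next action and planning who is involved remain a conversation, not a calculation.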
Make Decisions
  • Identify what is being decided
  • Establish and classify objectives
  • Separate the objectives into must and want categories
  • Generate the alternatives
  • Evaluate the alternatives by scoring the wants against the main objective
  • Review adverse consequences
  • Make the best possible choice
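
The must/want steps above lend themselves to a small worked example. The sketch below assumes one common way of doing it: a must is a pass/fail filter, each want carries a weight, and each surviving alternative is scored against each want. The 1-10 weights, the 0-10 scores and the tool names are hypothetical, and the review of adverse consequences is deliberately left to the people making the decision.

```python
def decide(alternatives, musts, wants):
    """Rank alternatives that satisfy every must by their weighted want score.

    alternatives: name -> {"meets": set of musts satisfied,
                           "scores": want -> score (assumed 0-10)}
    musts:        set of mandatory objectives (pass/fail)
    wants:        want -> weight (assumed 1-10)
    """
    totals = {}
    for name, alt in alternatives.items():
        # An alternative that misses any must is dropped before scoring.
        if not musts <= alt["meets"]:
            continue
        totals[name] = sum(
            weight * alt["scores"].get(want, 0) for want, weight in wants.items()
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up decision: choosing a service desk tool.
musts = {"within budget", "vendor support"}
wants = {"ease of use": 8, "reporting": 5, "integration": 10}
alternatives = {
    "Tool A": {
        "meets": {"within budget", "vendor support"},
        "scores": {"ease of use": 7, "reporting": 9, "integration": 6},
    },
    "Tool B": {
        "meets": {"within budget", "vendor support"},
        "scores": {"ease of use": 9, "reporting": 5, "integration": 8},
    },
    "Tool C": {
        "meets": {"within budget"},  # no vendor support, so it is eliminated
        "scores": {"ease of use": 10, "reporting": 10, "integration": 10},
    },
}

for name, total in decide(alternatives, musts, wants):
    print(name, total)
```

Tool C shows why the separation matters: it scores best on every want but never reaches the scoring step because it fails a must.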
See the Upcoming Potential Opportunity
  • State the action
  • List the potential opportunities
  • Consider the possible solutions
  • Take the action to address the likely cause/solution
  • Prepare actions to enhance likely effects
  • Set triggers for capitalizing actions
Uncover and Handle Problems
  • State the problem
  • Specify the problem by asking what is and what is not
  • Develop possible causes
  • Test possible causes
  • Determine the most probable cause
  • Verify any assumptions
  • Try the best possible solution and monitor
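
The "what is and what is not" specification can be used to test possible causes, roughly as sketched below. The summary does not say how the test is carried out, so this is one assumed reading: a cause is weaker for every IS fact it fails to explain and for every IS NOT fact it would have affected. The facts and causes are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Specification:
    is_facts: set = field(default_factory=set)      # where the problem IS seen
    is_not_facts: set = field(default_factory=set)  # where it could be, but IS NOT


@dataclass
class Cause:
    name: str
    would_affect: set  # everything this cause would be expected to break


def check_cause(cause, spec):
    """Return (IS facts left unexplained, IS NOT facts contradicted)."""
    unexplained = spec.is_facts - cause.would_affect
    contradicted = spec.is_not_facts & cause.would_affect
    return unexplained, contradicted


def probable_causes(causes, spec):
    """Rank causes by how many facts they fail to fit; fewest first."""
    ranked = []
    for cause in causes:
        unexplained, contradicted = check_cause(cause, spec)
        ranked.append((len(unexplained) + len(contradicted), cause.name))
    return sorted(ranked)


# Made-up incident: logins fail for some users only.
spec = Specification(
    is_facts={"branch A users", "web login"},
    is_not_facts={"branch B users", "VPN login"},
)
causes = [
    Cause("Expired web certificate",
          {"branch A users", "branch B users", "web login"}),
    Cause("Branch A proxy misconfiguration",
          {"branch A users", "web login"}),
    Cause("Authentication server down",
          {"branch A users", "branch B users", "web login", "VPN login"}),
]

for mismatches, name in probable_causes(causes, spec):
    print(mismatches, name)
```

The cause with the fewest mismatches is only the most probable one; its assumptions still have to be verified before a fix is tried and monitored.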
Foresee Future Problems
  • State the action
  • List the potential problems
  • Consider the potential problem causes
  • Take the action to address the likely causes
  • Prepare actions to reduce likely effects
  • Set triggers for contingent actions

