SDQ - Service Delivery Quadrants

SDQ (Service Delivery Quadrants) is my doodle of a framework for network and systems management. The framework combines the following:
  • Two dimensional FCAPS (derived from Welcher)
  • ITIL’s expanded incident lifecycle
  • Toyota’s TPS problem solving method
FCAPS is a well-known set of categories for network and systems management. Products typically bracket themselves into one or more of these speciality categories. However, most products focus largely on the element level, with very few, if any, able to offer service views and nearly none providing a business view. The legacy element view needs to be extended to a network, service and business view. I have remodeled FCAPS to OPACS:
  • (O) Outages
  • (P) Performance
  • (A) Accounting
  • (C) Configuration
  • (S) Security
The differences between FCAPS and OPACS are:
  • Performance and Configuration are swapped. This gives the model a more logical sequence.
  • Faults are renamed to Outages. An outage can be a blackout, a brownout or a whiteout.
The reason for the focus on outages and not faults is that most availability is measured using the five nines methodology. This is incorrect as 5h1t happens! The measurement should concentrate on the outage, as this is where the improvement lies. You'll have better availability by reducing the length of outages instead of exclusively trying to increase the time between outages. At best you can reduce the rate of outages over time.
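The arithmetic behind this argument can be sketched in a few lines of Python. This is a minimal illustration, assuming the common steady-state definition of availability as uptime divided by uptime plus downtime; the function names and the sample figures (1000 hours between outages, 4 hour outages) are hypothetical and not taken from any measured system.

```python
def availability(hours_between_outages: float, outage_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return hours_between_outages / (hours_between_outages + outage_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Minutes of downtime in a 365-day year at the given availability."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical figures, for illustration only.
baseline = availability(1000.0, 4.0)         # an outage every 1000 h, lasting 4 h
shorter_outages = availability(1000.0, 2.0)  # same outage rate, outages halved
longer_gaps = availability(2000.0, 4.0)      # same outage length, gaps doubled

for label, value in [
    ("baseline", baseline),
    ("shorter outages", shorter_outages),
    ("longer gaps", longer_gaps),
]:
    print(f"{label:16s} {value:.5%}  "
          f"{downtime_minutes_per_year(value):8.1f} min/year down")
```

Under this definition, halving the length of outages buys the same availability as doubling the time between them. The point made above is that outage length is usually the more practical lever to pull. For reference, five nines (99.999%) leaves a budget of roughly 5.3 minutes of downtime per year.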
The business view is typically provided by end user experience monitoring and spans the categories of performance, accounting, configuration and outages. Cisco’s SAA (now IP SLA) is a good example of a feature that facilitates this type of view. However, IP SLA is cumbersome to operate and is not automated.

Another of Cisco’s box of tricks worthy of mention is NetFlow. NetFlow operates in the accounting category. Many product sets are still immature in their NetFlow visualizations.
Outages or issues are a given in complex networks, so strong incident handling is required. This is outlined in ITIL’s expanded incident lifecycle. Vendors and service providers can provide:
  • Committed detection times via monitoring (in most cases the customer tells you!)
  • Committed repair times, as repair is a logistics issue.
  • Committed restore times, as restoration is a technology and system response metric.
  • Committed recover times, as recovery is a committed process and restart procedure.
Usually, vendors and service providers are unable to commit to diagnostic times. This becomes the network and systems management sweet spot. OPACS activities need to be optimized to deliver acceptable times to diagnose. However, from a business and services view the most crucial component is the implementation of a workaround, which can be implemented at any stage.
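One way to make the lifecycle measurable is to timestamp each phase boundary and derive the phase durations from the timestamps. The sketch below is illustrative only: the class name, field names, phase ordering (detect, diagnose, repair, recover, restore) and the sample incident are all assumptions made for the example, not the output of any particular service desk tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class IncidentTimeline:
    """Timestamps marking the end of each phase (hypothetical field names)."""
    occurred: datetime
    detected: datetime
    diagnosed: datetime
    repaired: datetime
    recovered: datetime
    restored: datetime

    def phase_durations(self) -> Dict[str, timedelta]:
        """Time spent in each phase, in lifecycle order."""
        return {
            "detect": self.detected - self.occurred,
            "diagnose": self.diagnosed - self.detected,
            "repair": self.repaired - self.diagnosed,
            "recover": self.recovered - self.repaired,
            "restore": self.restored - self.recovered,
        }

    def longest_phase(self) -> str:
        """Name of the phase that consumed the most time."""
        durations = self.phase_durations()
        return max(durations, key=lambda phase: durations[phase])


# A made-up incident, for illustration only.
day = datetime(2024, 1, 15)
timeline = IncidentTimeline(
    occurred=day.replace(hour=9, minute=0),
    detected=day.replace(hour=9, minute=10),
    diagnosed=day.replace(hour=11, minute=10),
    repaired=day.replace(hour=11, minute=40),
    recovered=day.replace(hour=11, minute=55),
    restored=day.replace(hour=12, minute=0),
)

for phase, duration in timeline.phase_durations().items():
    print(f"{phase:9s} {duration}")
print("longest phase:", timeline.longest_phase())
```

In this made-up incident, diagnosis takes two of the three hours, which is the pattern described above: the committed phases are short and the uncommitted diagnostic phase dominates. Trending these durations across many incidents would show where OPACS tooling should be improved.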
No network or systems management system is worth anything unless it provides due diligence in the form of documentation and knowledge.

