SDQ (Service Delivery Quadrants) is my doodle of a framework for network and systems management. The framework combines the following:
- Two-dimensional FCAPS (derived from Welcher)
- ITIL’s expanded incident lifecycle
- Toyota’s TPS problem solving method
FCAPS (Fault, Configuration, Accounting, Performance, Security) is a well-known set of categories for network and systems management. Products typically position themselves in one or more of these speciality categories. However, most products focus largely on the element level, with very few, if any, able to offer service views and almost none providing a business view. The legacy element view needs to be extended to network, service and business views. I have remodeled FCAPS as OPACS:
- (O) Outages
- (P) Performance
- (A) Accounting
- (C) Configuration
- (S) Security
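As a rough sketch of the two-dimensional idea, the five OPACS categories can be crossed with the four views (element, network, service, business) to give a grid of cells that a product either covers or does not. The `Category` and `View` names and the element-only product below are hypothetical illustrations, not part of any real tool.

```python
from enum import Enum
from itertools import product


class Category(Enum):
    OUTAGES = "O"
    PERFORMANCE = "P"
    ACCOUNTING = "A"
    CONFIGURATION = "C"
    SECURITY = "S"


class View(Enum):
    ELEMENT = 1
    NETWORK = 2
    SERVICE = 3
    BUSINESS = 4


# Enum members iterate in definition order, so this spells the acronym.
acronym = "".join(c.value for c in Category)

# Every (view, category) pair is one cell of the grid: 4 views x 5 categories.
all_cells = set(product(View, Category))

# Hypothetical product that only works at the element level.
covered = {(View.ELEMENT, c) for c in Category}

# Cells the hypothetical product leaves uncovered.
gaps = all_cells - covered

print(acronym, len(all_cells), len(gaps))
```

Listing the gaps this way makes it obvious how much of the grid an element-only product leaves empty.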
The differences between FCAPS and OPACS are:
- Performance and Configuration are swapped. This gives the model a more logical sequence.
- Faults is renamed to Outages. An outage can be a blackout, a brownout or a whiteout.
The reason for the focus on outages rather than faults is that availability is usually measured using the five nines methodology. This is the wrong focus, as 5h1t happens! The measurement should concentrate on the outage, as this is where the improvement lies. You'll get better availability by reducing the length of outages than by trying exclusively to increase the time between outages. At best you can reduce the rate of outages over time.
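To put rough numbers on this, here is a minimal sketch using the common steady-state formula availability = MTBF / (MTBF + MTTR). The MTBF and MTTR figures are made-up illustrations, not measurements. It suggests that halving the outage length buys the same availability as doubling the time between outages, and it shows how small the yearly downtime budget for five nines is.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures only (assumptions, not measured data).
baseline = availability(mtbf_hours=1000.0, mttr_hours=4.0)

# Option 1: halve the length of each outage.
shorter_outages = availability(mtbf_hours=1000.0, mttr_hours=2.0)

# Option 2: double the time between outages.
fewer_outages = availability(mtbf_hours=2000.0, mttr_hours=4.0)

# Both options improve on the baseline by the same amount, because
# availability depends only on the ratio MTTR / MTBF.
same_gain = math.isclose(shorter_outages, fewer_outages)

# Downtime allowed per year at five nines (99.999%), in minutes.
minutes_per_year = 365 * 24 * 60
five_nines_budget_minutes = (1 - 0.99999) * minutes_per_year

print(f"baseline:        {baseline:.6f}")
print(f"shorter outages: {shorter_outages:.6f}")
print(f"fewer outages:   {fewer_outages:.6f}")
print(f"five nines budget: about {five_nines_budget_minutes:.2f} minutes per year")
```

Since the two options are equivalent on paper, the practical question is which one you can actually influence, and outage length is usually the one within reach.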
A business view is typically provided by end-user experience monitoring and spans the categories of performance, accounting, configuration and outages. A good example of this type of view is facilitated by Cisco’s SAA (now called IP SLA). However, IP SLA is cumbersome to operate and is not automated.
Another item from Cisco’s box of tricks worth a mention is NetFlow, which operates in the accounting category. Many product sets are still immature in their NetFlow visualizations.
Outages or issues are a given in complex networks, so strong incident handling is required. This is outlined in ITIL’s expanded incident lifecycle. Vendors and service providers can provide:
- Committed detection times, via monitoring (although in most cases the customer tells you!)
- Committed repair times, as this is a logistics issue.
- Committed restore times, as this is a technology and system response metric.
- Committed recovery times, as this is a committed process and restart procedure.
Usually, vendors and service providers are unable to commit to diagnosis times. This becomes the network and systems management sweet spot: OPACS activities need to be optimized to deliver acceptable times to diagnose. However, from a business and service view, the most crucial component is the implementation of a workaround, which can happen at any stage.
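The lifecycle phases can be sketched as timestamps on an incident record, which makes the uncommitted diagnosis time visible next to the committed ones. This is a minimal sketch, assuming the phase order detect, diagnose, repair, recover, restore; the `Incident` class, its field names and the sample times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Incident:
    occurred: datetime
    detected: datetime
    diagnosed: datetime
    repaired: datetime
    recovered: datetime
    restored: datetime
    workaround: Optional[datetime] = None  # may land at any stage

    def phase_durations(self) -> Dict[str, timedelta]:
        """Time spent in each phase of the lifecycle."""
        return {
            "detect": self.detected - self.occurred,
            "diagnose": self.diagnosed - self.detected,
            "repair": self.repaired - self.diagnosed,
            "recover": self.recovered - self.repaired,
            "restore": self.restored - self.recovered,
        }

    def business_impact(self) -> timedelta:
        """Impact ends at the workaround if there is one, else at restore."""
        end = self.workaround if self.workaround is not None else self.restored
        return end - self.occurred


# Made-up sample incident: times are minutes after the outage starts.
start = datetime(2024, 1, 1, 9, 0)
incident = Incident(
    occurred=start,
    detected=start + timedelta(minutes=10),
    diagnosed=start + timedelta(minutes=70),
    repaired=start + timedelta(minutes=100),
    recovered=start + timedelta(minutes=110),
    restored=start + timedelta(minutes=120),
    workaround=start + timedelta(minutes=30),
)

durations = incident.phase_durations()
longest_phase = max(durations, key=durations.get)
impact = incident.business_impact()

print(longest_phase, impact)
```

In this sample, diagnosis is the longest phase, yet the workaround cuts the business impact to a quarter of the full two-hour incident.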