
The IF THEN ELSE of SD-WAN reflects on reality versus the lab

The underlying reason behind closed loop automation in Software Defined Wide Area Networking (SD-WAN), and why it must not ignore the negative events that will sink the system.


When I first started programming many moons ago on a ZX81, within the first day I learnt about:

IF logical expression THEN procedure
ELSE procedure

I would have a procedure for when the data matched and a procedure for when it did not. Most programmers now avoid the latter. They code for the data that matches and ignore mismatches. Basically, error handling goes out the window. Often a decent programme would double in size once all the error handling routines were added. In those days we started with 16KB of memory, so space was at a premium, but we still implemented it.

SD-WAN is effectively closed loop automation: it relies on a set of conditions and on the actions associated with those conditions being matched. It is also crucial to handle mismatches as well as outliers.
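As a rough illustration of that idea, here is a minimal Python sketch of a condition/action loop with a mandatory ELSE branch. The `Rule` class, the `evaluate` function and the event fields such as `loss_pct` are hypothetical names chosen for this example, not part of any SD-WAN product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One IF ... THEN pair (hypothetical structure for illustration)."""
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], str]


def evaluate(rules: List[Rule], event: Dict,
             on_mismatch: Callable[[Dict], str]) -> str:
    """Run the first matching rule; otherwise run the ELSE handler."""
    for rule in rules:
        if rule.condition(event):
            return rule.action(event)
    # The ELSE branch: a mismatch is handled, never silently dropped.
    return on_mismatch(event)


# Assumed example rule: reroute when measured packet loss exceeds 2%.
rules = [
    Rule(
        name="high_loss",
        condition=lambda e: e.get("loss_pct", 0) > 2,
        action=lambda e: "reroute",
    )
]


def unmatched(event: Dict) -> str:
    # Outliers and unknown events are at least recorded for follow-up.
    return "log_unmatched"
```

Making `on_mismatch` a required argument is the design point: the caller cannot build the loop without deciding what happens when nothing matches.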


Now over to SD-WAN and error handling, but let us rather call it exception handling, because three types of exception can occur, namely loss, faults and errors. It is important to distinguish between them, as this distinction is fundamentally the basis of the well-known CIA security framework. This is specified as follows:

  • C = Confidentiality (where the exceptions are a loss);
  • I = Integrity (where the exceptions are errors); and
  • A = Availability (where the exceptions are faults).
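The mapping above could be captured in code along these lines. This is only a sketch: the event names in `EVENT_TYPES` are assumptions made up for the example, and a real CPE would have its own event catalogue.

```python
from enum import Enum
from typing import Optional


class ExceptionType(Enum):
    """The three exception types and the CIA property each one threatens."""
    LOSS = "confidentiality"
    ERROR = "integrity"
    FAULT = "availability"


# Hypothetical event names, used purely for illustration.
EVENT_TYPES = {
    "cpe_moved": ExceptionType.LOSS,
    "crc_errors": ExceptionType.ERROR,
    "link_down": ExceptionType.FAULT,
}


def classify(event_name: str) -> Optional[ExceptionType]:
    """Return the exception type, or None for an event we do not know."""
    return EVENT_TYPES.get(event_name)
```

Returning `None` for an unknown event keeps the mismatch visible to the caller rather than forcing it into one of the three buckets.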


Let us start with the first. Loss is usually associated with theft. In our neck of the savannah that is a common occurrence, from the cable on the last mile to the SD-WAN CPE itself. I have previously written about power management and the SD-WAN CPE here using IoT technology. Of course, when someone is going to abscond with your CPE, the first thing they are going to do is unplug the power. Monitoring the power, as I explained in the aforementioned article, is a good start. The next obvious aspect is movement: if the SD-WAN CPE had an accelerometer (or an alternative), it could detect and notify on movement. Of course, the additional attributes of geo-tagging that I wrote about here are also relevant. Additionally, the SFP cage and the use of smart SFPs that I wrote about here will also detect and solve the cable break issues on a last mile that has been provisioned on fibre.
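A minimal sketch of how the power and movement signals might be combined is shown below. It assumes a monitor that can still report after mains power is removed (for example a battery-backed IoT sensor); the function name and the alert labels are hypothetical.

```python
def loss_alert(power_present: bool, movement_detected: bool) -> str:
    """Combine two signals into one alert level (illustrative labels)."""
    if not power_present and movement_detected:
        # Unplugged and moving: the pattern described for a CPE theft.
        return "probable_theft"
    elif movement_detected:
        return "movement"
    elif not power_present:
        return "power_lost"
    else:
        return "ok"
```

Every combination of the two inputs has an explicit outcome, so no state falls through unhandled.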


The next one is errors. Errors can be detected using a higher-level protocol scheme, which is slow, or the SD-WAN hardware can read the error counters directly on the hardware and make a determination. As an example, wireless connections might experience high bit error rates (BERs), and it would be better to deal with these via queuing or alternative paths than by dropping packets. Also, during congestion it would be better to handle traffic in a deterministic manner. The problem when broadband is used is that the throughput to use for QoS calculations is not always consistent. Thus the mechanism that I described here is relevant. It facilitates QoS calculations that are closer to the actual link capability instead of a perceived one.
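To make the counter-based approach concrete, here is a small sketch that turns two samples of hardware counters into an error rate and then a decision. The 1% threshold, the function names and the action labels are assumptions for illustration only; how the counters are actually read depends on the hardware.

```python
from typing import Optional


def error_rate(prev_errors: int, prev_packets: int,
               cur_errors: int, cur_packets: int) -> Optional[float]:
    """Errors per packet between two counter samples, or None if unusable."""
    packets = cur_packets - prev_packets
    errors = cur_errors - prev_errors
    if packets <= 0 or errors < 0:
        # Counter reset or no traffic: no determination can be made.
        return None
    return errors / packets


def choose_action(rate: Optional[float], threshold: float = 0.01) -> str:
    """Pick an action, including for the case where the data is unusable."""
    if rate is None:
        return "resample"
    if rate > threshold:
        return "use_alternative_path"
    else:
        return "keep_path"
```

For example, `choose_action(error_rate(0, 0, 5, 100))` sees a 5% error rate and selects the alternative path, while a counter reset leads to a resample rather than a guess.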


Finally, faults. In an SD-WAN environment there are both uplinks (connection to the carrier) and downlinks (connection to the client). It is crucial that the SD-WAN portal measures the uptime availability of these connections and reports them as separate metrics. We have previously talked about cable breaks on the carrier side, which will lead to faults, but the actual probability of faults is higher on the client side. The fibre cables get bent or unplugged. The equipment itself fails, so it is crucial to know the status of the downlink as well as the downstream networking kit. Here the use of LLDP, as I have described here, will be crucial.
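Reporting the two directions separately might look like the sketch below. It assumes availability is derived from periodic up/down polls; the function names and the report keys are hypothetical.

```python
from typing import Dict, List, Optional


def availability_pct(samples: List[bool]) -> Optional[float]:
    """Percentage of polls in which the link was up; None with no data."""
    if not samples:
        return None
    return 100.0 * sum(1 for up in samples if up) / len(samples)


def report(uplink_samples: List[bool],
           downlink_samples: List[bool]) -> Dict[str, Optional[float]]:
    """Keep uplink and downlink availability as two separate metrics."""
    return {
        "uplink_availability_pct": availability_pct(uplink_samples),
        "downlink_availability_pct": availability_pct(downlink_samples),
    }
```

Keeping the two figures apart shows at a glance whether an outage sits on the carrier side or the client side, which a single blended number would hide.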

In conclusion, when building an SD-WAN CPE, "Fuck Everything, Do Five Blades."

Fusion Broadband South Africa

IF you would like to contribute THEN please comment below ELSE please click the like button.

Ronald works connecting Internet inhabiting things at Fusion Broadband.

