
Prestik, Scotch tape and barbed wire data centres

 



Many times I have encountered Prestik, Scotch tape and barbed wire data centres. They are called this because these seem to be the only tools the techies have to keep things up and running. You know you are in one because you can see the dirt, the spaghetti cabling and the graffiti around the place. Management denies every request for anything else but still expects a quality service. The motive seems to be budget constraints, with operational service delivery in the data centre low on the list of priorities. The result is predictable: major incidents that cause large productivity losses and even financial damage to the companies involved.

The symptoms of a Prestik, Scotch tape and barbed wire data centre are:

  • Limited documentation and non-existent standard operating procedures;
  • No visible emergency procedures;
  • Poor labeling;
  • Continuous change; and
  • Headless chickens.

This is often made worse when responsibility for the data centre is split between the service delivery and operations departments. Invariably, the two sides end up pointing fingers at each other.

Water leaks are not uncommon in data centres. Most often they are caused by air handling units icing up or by piping perishing or failing. Other culprits include a relocated kitchen with a burst geyser, an ablution facility in an inappropriate location, or a burst main a long way off whose floodwater finds its way into the data centre because it is located in the basement. Workable and tested flood detection is a must, and emergency plans must be in place for when it happens. If you don't have a sump pump in your storeroom, you definitely won't have time to order one off eBay and have it delivered before you are floating out of the data centre like Noah, using one of your server cabinets as an ark.
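As a minimal sketch of what "workable and tested" can mean for the detection logic, assume a hypothetical leak sensor that is polled and reports either wet or dry. The function below raises the alarm only after several wet readings in a row, so a single splash or sensor glitch does not set it off. The names `first_alarm` and `consecutive` are illustrative and do not come from any particular product; feeding the function recorded or simulated readings is one way to drill the logic before a real leak does.

```python
from typing import Iterable, Optional


def first_alarm(readings: Iterable[bool], consecutive: int = 3) -> Optional[int]:
    """Return the index of the reading that raises the flood alarm, or None.

    Each reading is True if the (hypothetical) leak sensor reports water.
    The alarm is raised only after `consecutive` wet readings in a row,
    so a single splash or sensor glitch does not set it off.
    """
    run = 0
    for i, wet in enumerate(readings):
        run = run + 1 if wet else 0  # count consecutive wet readings
        if run >= consecutive:
            return i
    return None


if __name__ == "__main__":
    # A simulated drill: one glitch, then a sustained leak.
    drill = [False, True, False, True, True, True]
    print(first_alarm(drill))
```

In the simulated drill the isolated wet reading at index 1 is ignored, and the alarm fires at index 5 on the third consecutive wet reading.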

Power outages and rolling blackouts are also a common occurrence. It is naive to assume that backup power will work without testing, and even more naive to run a test of a few minutes when the real outage will last a few hours.

The following example shows why a full test of four to eight hours is required. Modern air handling units use variable-speed drives, which also makes them more efficient. Their controllers take inputs from thermostats and airflow sensors, which allow the units to operate at optimal levels. Air handling units are not connected to UPS but directly to utility power, which is backed up by generators; if they were connected to UPS, they would drain the reserve power far too quickly. In a power failure the air handling units will power cycle, because there is a delay before the generators start and can sustain a load. However, the controllers that manage the sensors must themselves be on UPS. Secondly, these controllers need to be programmed to delay the start of each air handling unit once generator power is available. If all the air handling units start at the same time, the load will be too high and the start will fail. Giving each unit a different start delay lets each one draw its start-up spike at a different moment, so the high load does not arrive all at once (much as Ken Mattingly, the grounded command module pilot, had to sequence the power-up within strict limits in the movie Apollo 13). A modern data centre will start heating up within 5 minutes, and a critical shutdown will be required within 45 minutes. A 5-minute power test will never expose these kinks in the system, so a full test is always required.
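The staggered start can be illustrated with a small load model. This is a sketch under assumed numbers, not real equipment data: four hypothetical air handling units that each draw 20 kW when running and 60 kW during a 5-second start-up spike. The code sums the generator load second by second, once with all units starting together and once with each start offset by 10 seconds. Real inrush figures and generator ratings would come from the equipment datasheets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AirHandler:
    name: str
    running_kw: float  # steady-state draw once started (assumed figure)
    inrush_kw: float   # draw during the start-up spike (assumed figure)
    inrush_s: int      # length of the start-up spike in seconds (assumed figure)


def load_at(t: int, units: List[AirHandler], start_s: List[int]) -> float:
    """Total load in kW at second t after generator power becomes available."""
    total = 0.0
    for unit, start in zip(units, start_s):
        if t < start:
            continue                  # unit is still waiting for its delayed start
        if t < start + unit.inrush_s:
            total += unit.inrush_kw   # unit is in its start-up spike
        else:
            total += unit.running_kw  # unit has settled to its running load
    return total


def peak_load(units: List[AirHandler], start_s: List[int], horizon_s: int) -> float:
    """Highest total load seen at any whole second from 0 to horizon_s."""
    return max(load_at(t, units, start_s) for t in range(horizon_s + 1))


def staggered_starts(units: List[AirHandler], gap_s: int) -> List[int]:
    """Start unit i at i * gap_s seconds; gap_s should exceed the spike length."""
    return [i * gap_s for i in range(len(units))]


# Four hypothetical, identical units.
EXAMPLE_UNITS = [AirHandler(f"AHU-{i + 1}", 20.0, 60.0, 5) for i in range(4)]

if __name__ == "__main__":
    together = peak_load(EXAMPLE_UNITS, [0] * len(EXAMPLE_UNITS), 60)
    staggered = peak_load(EXAMPLE_UNITS, staggered_starts(EXAMPLE_UNITS, 10), 60)
    print(f"All units at once: peak {together:.0f} kW")
    print(f"Staggered by 10 s: peak {staggered:.0f} kW")
```

With these assumed figures the simultaneous start peaks at 240 kW, while the staggered start peaks at 120 kW, so a generator that can carry the staggered sequence may still fail on a simultaneous one. Keeping the gap between starts longer than the spike means no two spikes ever overlap.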

IT guys like to view network status using management tools and web-based tools. Facility guys walk around with clipboards and eyeball the equipment. Both methods have merit, but relying on only one has none. It always amazes me that consultants install UPSs costing a million bucks and then overlook the web or Ethernet management module for the unit. I have seen a data centre fail because no web module was installed and no one knew the generator was not charging the UPS.
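As a sketch of the check that was missing in that failure, assume the UPS web or Ethernet module already provides periodic samples of whether the load is on battery and the estimated charge remaining. Many UPS network cards expose values like these, for example over SNMP using the standard UPS-MIB, but the polling itself is left out here. The hypothetical `check_ups` function flags the exact scenario above: the generator is running, yet the UPS is still on battery and its charge is falling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpsSample:
    minute: float      # minutes since the mains failure
    on_battery: bool   # True if the UPS is supplying the load from its batteries
    charge_pct: float  # estimated battery charge remaining, 0-100


def check_ups(samples: List[UpsSample], generator_running: bool) -> List[str]:
    """Return alert messages for a series of UPS readings (oldest first).

    Hypothetical helper: the samples are assumed to come from the UPS's
    web or Ethernet management module.
    """
    if not samples:
        return ["No UPS readings: is the management module installed?"]
    alerts: List[str] = []
    latest = samples[-1]
    # The failure described above: generator up, but the UPS is not using its power.
    if generator_running and latest.on_battery:
        alerts.append("UPS still on battery although the generator is running")
    first = samples[0]
    elapsed = latest.minute - first.minute
    drop = first.charge_pct - latest.charge_pct
    if elapsed > 0 and drop > 0:
        rate = drop / elapsed  # percent per minute, assuming a linear discharge
        minutes_left = latest.charge_pct / rate
        alerts.append(f"Battery discharging: about {minutes_left:.0f} minutes left")
    return alerts


if __name__ == "__main__":
    readings = [UpsSample(0, True, 100), UpsSample(10, True, 80), UpsSample(20, True, 60)]
    for alert in check_ups(readings, generator_running=True):
        print(alert)
```

For the sample readings both alerts fire, with roughly 30 minutes of battery estimated to remain, which is the warning nobody received in the failure described above.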

When a major incident happens and large amounts of moolah are flushed down the toilet, it is too late to say that the data centre should have been built with more than Prestik, Scotch tape and barbed wire.

Ronald Bartels works at Fusion Broadband and is driving SD-WAN adoption in South Africa.

This article was originally published over at LinkedIn: Prestik, Scotch tape and barbed wire data centres
