Skip to main content

Why Do Internet Services Fail, and What Can Be Done About It?

The number and popularity of large-scale Internet services such as Google, MSN, and Yahoo! have grown significantly in recent years. Such services are poised to increase further in importance as they become the repository for data in ubiquitous computing systems and the platform upon which new global-scale services and applications are built. These services' large scale and need for 24x7 operation have led their designers to incorporate a number of techniques for achieving high availability. Nonetheless, failures still occur.  Although the architects and operators of these services might see such problems as failures on their part, these system failures provide important lessons for the systems community about why large-scale systems fail, and what techniques could prevent failures. In an attempt to answer the question "Why do Internet services fail, and what can be done about it?" we have studied over a hundred post-mortem reports of user-visible failures from three large-scale Internet services. In this paper we
  • identify which service components are most failure-prone and have the highest Time to Repair (TTR), so that service operators and researchers can know what areas most need improvement;
  • discuss in detail several instructive failure case studies;
  • examine the applicability of a number of failure mitigation techniques to the actual failures we studied; and
  • highlight the need for improved operator tools and systems, collection of industry-wide failure data, and creation of service-level benchmarks.
Read the paper here.

Ron - the problem management guy

Comments

Popular posts from this blog

LDWin: Link Discovery for Windows

LDWin supports the following methods of link discovery: CDP - Cisco Discovery Protocol LLDP - Link Layer Discovery Protocol Download LDWin from here.

easywall - Web interface for easy use of the IPTables firewall on Linux systems written in Python3.

Firewalls are becoming increasingly important in today’s world. Hackers and automated scripts are constantly trying to invade your system and use it for Bitcoin mining, botnets or other things. To prevent these attacks, you can use a firewall on your system. IPTables is the strongest firewall in Linux because it can filter packets in the kernel before they reach the application. Using IPTables is not very easy for Linux beginners. We have created easywall - the simple IPTables web interface . The focus of the software is on easy installation and use. Access this neat software over on github: easywall

Using OpenSSL with Ed Harmoush 1/6 Generating Public & Private Keys