
Check_MK - comprehensive IT monitoring solution in the tradition of Nagios


Components of the Check_MK Project

Configuration & Check Engine

What sets Check_MK's Configuration & Check Engine (CCE) apart is its elegant method for configuring Nagios. Instead of hand-maintained Nagios configuration data, it uses automatic service detection and a configuration generator. Check_MK also executes checks in its own efficient way: each host is contacted only once per check interval, and the check results are sent to Nagios as passive checks. This saves substantial resources on both the server and the client.
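To make the passive-check idea concrete, here is a minimal sketch that formats results in the syntax of the Nagios external command `PROCESS_SERVICE_CHECK_RESULT`. It only illustrates what a passive result looks like on the Nagios side; it is an assumption about one common submission route, not a description of how Check_MK hands results over internally. The helper names and the host and service names are hypothetical.

```python
import time


def passive_result(host, service, state, output, timestamp=None):
    """Format one passive service check result as a Nagios external command line."""
    if state not in (0, 1, 2, 3):  # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
        raise ValueError("state must be 0, 1, 2 or 3")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}\n"


def results_for_host(host, checks, timestamp=None):
    """Turn the results gathered in a single contact with a host into command lines.

    `checks` is an iterable of (service, state, output) tuples.
    """
    return [passive_result(host, svc, state, out, timestamp)
            for svc, state, out in checks]
```

A single contact with a host can therefore yield many service results at once, for example `results_for_host("web01", [("CPU load", 0, "OK - load 0.42"), ("Disk /", 1, "WARN - 85% used")])`.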

Livestatus

Livestatus is a Nagios event broker module that provides direct access to the status data of hosts and services via a UNIX socket. This gives add-ons such as NagVis fast and efficient access to status data and makes the NDO database unnecessary. Access uses a simple protocol of its own and is possible from any programming language without a special library, even from the shell. Livestatus also supports the Icinga core.
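As a minimal sketch of how little is needed to talk to Livestatus, the Python below builds a query and sends it over the UNIX socket using only the standard library. The helper names `build_query` and `query_livestatus` are made up for this illustration, and the socket path in the usage note is an assumption; the real path depends on the installation.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query: a GET line, header lines, then a blank line."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {condition}" for condition in filters]
    lines.append("OutputFormat: json")
    # An empty line terminates the query.
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path, query):
    """Send one query over the UNIX socket and return the decoded JSON rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Close the sending side so the server knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

With a socket at a hypothetical path such as `/var/lib/nagios/rw/live`, all critical services could then be fetched with `query_livestatus("/var/lib/nagios/rw/live", build_query("services", ["host_name", "description", "state"], ["state = 2"]))`.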

Multisite

The Multisite web GUI replaces the classical Nagios GUI and can also be used without Check_MK's Configuration & Check Engine. Alongside a modern look and fast page loading, it offers a user-definable interface, distributed monitoring by integrating multiple monitoring instances via Livestatus, integration of NagVis and PNP4Nagios, an integrated LDAP connection, access to status data via a web service and much more. Multisite uses Livestatus to access the status data.

WATO

Check_MK's Web Administration Tool (WATO) makes it possible to administer a Check_MK-based system entirely through a browser. This is not restricted to managing hosts, services and the typical Check_MK rules; it also covers users, roles, groups, time periods, classical Nagios checks and much more. A modern role concept allows permissions to be assigned so that tasks can be reliably delegated to colleagues.

Notify

Check_MK's new notification system makes configuring notifications simple and flexible. Multiple channels can be defined and configured differently per user. For example, a user can receive emails around the clock but SMS only for serious problems during on-call hours, without the need to define multiple artificial users. Users can also configure their notifications themselves.

Business Intelligence

The BI module is integrated into the Multisite GUI. It aggregates status data from numerous hosts and services into an overall status for complex applications and similar processes. This provides a quick overview for managers and user helpdesks. It also quickly answers the question of how a concrete problem affects an application. The integrated "What if?" analysis simplifies downtime planning.
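The aggregation idea can be sketched in a few lines of Python. This is an illustration only: it assumes a simple "worst state wins" rule and a severity ordering chosen for the sketch, and the tree layout, function names and service names are hypothetical rather than taken from the BI module itself.

```python
# Nagios state codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# Assumed severity ranking for this sketch: OK < WARNING < UNKNOWN < CRITICAL.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def worst_state(states):
    """Return the most severe state, or OK if there are no states."""
    return max(states, key=SEVERITY.__getitem__, default=0)


def aggregate(node, states):
    """Compute the state of a node in an aggregation tree.

    A node is either a service name (leaf) or a (name, children) tuple.
    `states` maps service names to their current state codes.
    """
    if isinstance(node, str):
        return states[node]
    _name, children = node
    return worst_state(aggregate(child, states) for child in children)


def what_if(node, states, assumed):
    """Re-evaluate the tree with some service states overridden."""
    return aggregate(node, {**states, **assumed})
```

For a tree such as `("Web shop", [("Frontend", ["web01/HTTP", "web02/HTTP"]), "db01/MySQL"])`, calling `what_if` with `{"db01/MySQL": 2}` shows how the overall state would change if the database were taken down.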

Mobile

The mobile version of the Multisite GUI is optimised for smartphones and gives access to all status data while on the move. Commands such as acknowledging problems and setting downtimes can also be executed. The mobile GUI is automatically available when Multisite is installed, and mobile devices are recognised automatically.

Event Console

The Check_MK Event Console integrates the processing of log messages and SNMP traps into the monitoring. Its own daemon, mkeventd, is configured through a flexible rule set that determines which incoming messages are classified and how. In this way messages can be counted, correlated, expected, rewritten and much more. The Event Console even includes a built-in syslog daemon that receives messages directly on port 514.

Download the tool here, or check out a more comprehensive list of tools here.
