Default check_mk checks - How to remove/disable them?
Posted: Wed Jan 27, 2016 5:58 pm
I'm completely lost and unsure as to how this happened:
I'm building up my own checks in my check_mk file. The checks work on my hosts, and they appear fine in the web interface.
I have Nagios working (setup, email notifications, etc.), so I figured it was OK to add the rest of my environment to Nagios and continue building my checks.
I wrote a small Ansible playbook that installs the check_mk RPM, installs xinetd, and performs a few other tasks on the rest of the hosts in my environment. It works fine. I started adding the hosts to my check_mk file, and after ironing out some network issues, all of the hosts in my environment showed up in Nagios.
Then I saw an odd Nagios email alert from a server about a check I didn't add to my check_mk file.
Now all of the servers I've added are showing what resemble the default Nagios checks. I can't figure out why my check_mk configuration isn't taking precedence.
This is basically what my check_mk file looks like; I've posted something small just to get my point across:
Code:
all_hosts = [
    "server01|linux|home",
    "server02|linux"
]

define_hostgroups = True

host_groups = [
    ( "linux", [ "linux" ], ALL_HOSTS ),
    ( "data directory", [ "datad" ], ALL_HOSTS )
]

checks = [
    ( [ "linux" ], ALL_HOSTS, "df", "/", (90.0, 95.0) ),
    ( [ "linux" ], ALL_HOSTS, "mem.used", None, (95.0, 99.0) ),
    ( [ "datad" ], ALL_HOSTS, "df", "/data", (90.0, 95.0) )
]

service_contactgroups = [
    ( "admins", [ "linux" ], ALL_HOSTS, ALL_SERVICES )
]

I would expect the web interface to display exactly those services. Instead, it shows this:
CPU load - OK 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 OK - 15 min load 0.62 at 32 Cores (0.02 per Core)
CPU utilization - OK 01-27-2016 22:46:12 0d 3h 36m 1s 1/1 OK - user: 2.2%, system: 1.0%, wait: 0.2%, steal: 0.0%, total: 3.4%
Check_MK - OK 01-27-2016 22:46:12 1d 11h 16m 11s 1/1 OK - Agent version 1.2.7i3p4, execution time 0.5 sec
Disk IO SUMMARY - OK 01-27-2016 22:46:12 0d 3h 36m 1s 1/1 OK - Utilization: 2.2%, Read: 7.46 kB/s, Write: 36.26 kB/s, Average Wait: 4.74 ms , Average Read Wait: 11.86 ms , Average Write Wait: 4.32 ms , Latency: 3.02 ms
Interface 2 - OK 01-27-2016 22:46:12 0d 3h 36m 1s 1/1 OK - [eth0] (up) MAC: 00:25:90:87:c6:64, 1 Gbit/s, in: 956.81 kB/s(0.8%), out: 40.68 kB/s(0.0%)
Kernel Context Switches - OK 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 OK - 6659/s
Kernel Major Page Faults - OK 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 OK - 0/s
Kernel Process Creations - OK 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 OK - 6/s
MD Softraid md0 - CRITICAL 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 CRIT - raid state is 'inactive' (should be 'active')
MD Softraid md125 - CRITICAL 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 CRIT - raid state is 'inactive' (should be 'active')
MD Softraid md126 - OK 01-27-2016 22:46:12 0d 0h 41m 7s 1/1 OK - raid active, disk state is [3/3] [UUU]
It's especially odd because one of the other servers that I had set up in Nagios before today, let's call it server03, only shows the checks from the current check_mk file.
Any ideas? Thanks in advance.
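Edit: after digging through the Check_MK docs, my current theory is that these extra services come from Check_MK's automatic service discovery (inventory), which runs alongside the manual checks list rather than being replaced by it. If that's right, something like the following in main.mk should suppress them. This is only a sketch: the check-type names below are my guesses based on the service names (`cmk -L` should list the real ones), and ALL_HOSTS is the usual Check_MK constant.

```python
# main.mk fragment (sketch) -- suppress services added by automatic inventory.

# Drop entire check types from inventory on every host
# (names are guesses; verify with `cmk -L`):
ignored_checktypes = [ "kernel", "diskstat", "md" ]

# Or ignore individual discovered services by name, per host-tag rule:
ignored_services = [
    ( [ "linux" ], ALL_HOSTS, [ "MD Softraid md0", "MD Softraid md125" ] ),
]
```

I believe the inventory then has to be re-run (something like `cmk -II <host>` followed by `cmk -R`) before the services disappear, but I'd appreciate confirmation from someone who knows the precedence rules between the manual checks list and the inventory.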