check_live for Nagios 4
Posted: Mon Sep 19, 2016 9:48 am
by disconnex
I recently replaced my production Nagios server due to a hardware issue. In doing so, I upgraded from Nagios 3 on CentOS 6.5 to Nagios 4 on CentOS 7. When porting over our configs and checks, the one plugin I had an issue with was check_live. I came across a bug report for RHEL that said Nagios 4 does not support livestatus, which check_live uses. Does anyone know of a workaround, or an alternative to check_live on Nagios 4? Or has anyone else encountered this and is willing to share the steps they took to correct it?
Re: check_live for Nagios 4
Posted: Mon Sep 19, 2016 3:50 pm
by mcapra
Can you share the full contents of the check_live script or the source you acquired this plugin from?
Re: check_live for Nagios 4
Posted: Tue Sep 20, 2016 7:49 am
by disconnex
Thank you for the reply. Here is the code from check_live.py
Code:
    #!/usr/bin/python
    import socket
    import sys
    import argparse

    parser = argparse.ArgumentParser(description='check service status for a hostgroup')
    parser.add_argument('-H', '--hostgroup', help='Host group to check')
    parser.add_argument('-S', '--service', required=True, help='Service to check')
    parser.add_argument('-W', '--warn', required=True, help='Number of WARN services to alert on')
    parser.add_argument('-C', '--critical', required=True, help='Number of CRIT services to alert on')
    parser.add_argument('--socket', default='/var/spool/nagios/rw/live', help='livestatus socket location, defaults to /var/spool/nagios/rw/live')
    args = parser.parse_args()

    if '%' in args.warn or '%' in args.critical:
        percent = True
    else:
        percent = False

    socket_path = args.socket
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(socket_path)

    filter = ''
    if args.hostgroup is not None:
        filter += 'Filter: host_groups >= %s\n' % (args.hostgroup)
    filter += 'Filter: description = %s\n' % (args.service)

    if percent:
        query = '''GET services
    Stats: last_hard_state = 0
    Stats: last_hard_state = 1
    Stats: last_hard_state = 2
    Stats: last_hard_state = 3
    Stats: state >= 0
    %s
    '''% (filter)
    else:
        query = '''GET services
    Stats: last_hard_state = 0
    Stats: last_hard_state = 1
    Stats: last_hard_state = 2
    Stats: last_hard_state = 3
    %s
    '''% (filter)

    s.send(query)
    s.shutdown(socket.SHUT_WR)
    answer = s.recv(100000000).strip().split(';')
    print "%s OK %s WARN %s CRIT %s on %s" % (answer[0], answer[1], answer[2], args.service, args.hostgroup)

    if percent:
        if int(answer[2]) >= int(float(args.critical.strip('%'))/100*int(answer[4])):
            sys.exit(2)
        elif int(answer[1]) >= int(float(args.warn.strip('%'))/100*int(answer[4])):
            sys.exit(1)
        else:
            sys.exit(0)
    else:
        if int(answer[2]) >= int(args.critical):
            sys.exit(2)
        elif int(answer[1]) >= int(args.warn):
            sys.exit(1)
        else:
            sys.exit(0)
This check calls on livestatus, which from what I have read is no longer supported. I came across an op5 workaround, but that GitHub page is no longer available.
An example of how we use this check:
Code:
    define service{
        use                     pager-service
        host_name               bumble
        service_description     CDH Tasktrackers
        check_command           check_live!-H cdh-cluster -S "HTTP - Tasktracker" -W 7% -C 7%
    }
    define service{
        use                     pager-service
        host_name               bumble
        service_description     CDH Datanodes
        check_command           check_live!-H cdh-cluster -S "HTTP - Datanode" -W 7% -C 7%
    }
It's used in our clusters, such as Hadoop, where we have hundreds of nodes and only need to be alerted when a certain percentage are down. We also use it to check the percentage of snapshot age and overflow bytes:
Code:
    define service{
        use                     pager-service
        host_name               bumble
        service_description     IAD TS Snapshot Age
        check_command           check_live!-H ts-ext -S "JMX - SnapshotFileLastModified" -W 50% -C 50%
    }
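For anyone reading along, here is a minimal sketch of how the percentage thresholds in these definitions turn into an exit code. It mirrors the arithmetic in check_live.py but takes the livestatus answer as a string, so it runs without a socket. The function name `evaluate` and the sample answer strings are made up for illustration; they are not real output from our server.

```python
def evaluate(answer_line, warn, crit):
    """Return the Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL).

    answer_line is a semicolon-separated Stats row in the order
    OK;WARN;CRIT;UNKNOWN[;TOTAL], as check_live.py expects it.
    """
    answer = answer_line.strip().split(';')
    # As in check_live.py: if either threshold has a '%', both are
    # treated as percentages of the total (the fifth Stats column).
    percent = '%' in warn or '%' in crit
    if percent:
        total = int(answer[4])
        crit_n = int(float(crit.strip('%')) / 100 * total)
        warn_n = int(float(warn.strip('%')) / 100 * total)
    else:
        crit_n, warn_n = int(crit), int(warn)
    if int(answer[2]) >= crit_n:
        return 2
    if int(answer[1]) >= warn_n:
        return 1
    return 0


# Hypothetical answers for a group of 8 services.
print(evaluate("6;2;0;0;8", "25%", "50%"))  # 2 WARN >= 25% of 8
print(evaluate("3;0;5;0;8", "25%", "50%"))  # 5 CRIT >= 50% of 8
print(evaluate("6;2;0;0", "3", "5"))        # absolute thresholds
```

One thing this makes visible: the threshold is truncated with `int()`, so on a small group a low percentage can come out as 0, and the `>=` comparison is then always true. With 7% and only 10 services, for example, the check would report CRITICAL even with nothing down.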
If there is a fix for livestatus in Nagios 4, that would be awesome. If there is another plugin that would yield the same results, that would be great too.
Re: check_live for Nagios 4
Posted: Tue Sep 20, 2016 2:37 pm
by mcapra
You should be able to configure livestatus for Nagios 4 as of livestatus version 1.2.6:
https://mathias-kettner.de/checkmk_livestatus.html
Refer to the "Livestatus now supports Nagios 4" section of this document. 1.2.8 has dependencies that fail on my CentOS 7 machine for some reason, but I was able to get it working with 1.2.6 following the instructions on that page. The paths in the Python script may need to be updated for the latest spool file locations; not entirely sure on that one.
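To check the socket path before touching the plugin, something like the following might help. It is only a sketch: the helper names are my own, the path is simply the default from the posted script and may not match where your livestatus build puts its socket, and I am assuming the `status` table exposes a `program_version` column.

```python
import os
import socket
import stat


def build_query(table, columns):
    """Build a livestatus query; the trailing blank line terminates it."""
    return "GET %s\nColumns: %s\n\n" % (table, " ".join(columns))


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def query_livestatus(path, query):
    """Send one query to the socket and return the raw response text."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        s.sendall(query.encode("utf-8"))
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        s.close()
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    # Assumption: default path taken from check_live.py; adjust as needed.
    path = "/var/spool/nagios/rw/live"
    if is_unix_socket(path):
        print(query_livestatus(path, build_query("status", ["program_version"])))
    else:
        print("no livestatus socket at %s" % path)
```

If this prints a version string, the same path can be passed to check_live.py with `--socket`.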