check_live for Nagios 4
I recently replaced my production Nagios server due to a hardware issue. In doing so, I upgraded from Nagios 3 on CentOS 6.5 to Nagios 4 on CentOS 7. When porting over our configs and checks, the one plugin I had an issue with was check_live. I came across a bug report for RHEL that said Nagios 4 does not support livestatus, which check_live uses. Does anyone know of a workaround, or an alternative to check_live on Nagios 4? Or has anyone else encountered this and is willing to share the steps they took to correct it?
Re: check_live for Nagios 4
Can you share the full contents of the check_live script or the source you acquired this plugin from?
Former Nagios employee
https://www.mcapra.com/
Re: check_live for Nagios 4
Thank you for the reply. Here is the code from check_live.py
This check queries livestatus, which from what I have read is no longer supported on Nagios 4. I came across an op5 workaround; however, that GitHub page is no longer available.
It's used in our clusters, such as Hadoop, where we have hundreds of nodes and only need to be alerted when a certain percentage are down. We also use it to alert on a percentage of the snapshot age and overflow bytes checks:
If there is a fix for livestatus in Nagios 4, that would be awesome; however, if there is another plugin that would yield the same results, that would be great too.
Code: Select all
#!/usr/bin/python
import socket
import sys
import argparse
parser = argparse.ArgumentParser(description='check service status for a hostgroup')
parser.add_argument('-H', '--hostgroup', help='Host group to check')
parser.add_argument('-S', '--service', required=True, help='Service to check')
parser.add_argument('-W', '--warn', required=True, help='Number of WARN services to alert on')
parser.add_argument('-C', '--critical', required=True, help='Number of CRIT services to alert on')
parser.add_argument('--socket', default='/var/spool/nagios/rw/live', help='livestatus socket location, defaults to /var/spool/nagios/rw/live')
args = parser.parse_args()
if '%' in args.warn or '%' in args.critical:
    percent = True
else:
    percent = False
socket_path = args.socket
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(socket_path)
filter = ''
if args.hostgroup is not None:
    filter += 'Filter: host_groups >= %s\n' % (args.hostgroup)
filter += 'Filter: description = %s\n' % (args.service)
if percent:
    query = '''GET services
Stats: last_hard_state = 0
Stats: last_hard_state = 1
Stats: last_hard_state = 2
Stats: last_hard_state = 3
Stats: state >= 0
%s
'''% (filter)
else:
    query = '''GET services
Stats: last_hard_state = 0
Stats: last_hard_state = 1
Stats: last_hard_state = 2
Stats: last_hard_state = 3
%s
'''% (filter)
s.send(query)
s.shutdown(socket.SHUT_WR)
answer = s.recv(100000000).strip().split(';')
print "%s OK %s WARN %s CRIT %s on %s" % (answer[0], answer[1], answer[2], args.service, args.hostgroup)
if percent:
    if int(answer[2]) >= int(float(args.critical.strip('%'))/100*int(answer[4])):
        sys.exit(2)
    elif int(answer[1]) >= int(float(args.warn.strip('%'))/100*int(answer[4])):
        sys.exit(1)
    else:
        sys.exit(0)
else:
    if int(answer[2]) >= int(args.critical):
        sys.exit(2)
    elif int(answer[1]) >= int(args.warn):
        sys.exit(1)
    else:
        sys.exit(0)
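The script above is written for Python 2 (print statement, str passed to send). If the new server only has Python 3 available, the query part could be sketched roughly as below. This is only a sketch under assumptions: the function names build_query, parse_stats and query_livestatus are made up for illustration and are not part of check_live.py, and it always requests the total count rather than only in percent mode.

```python
import socket


def build_query(service, hostgroup=None):
    """Build a livestatus stats query for one service description."""
    lines = [
        "GET services",
        "Stats: last_hard_state = 0",  # OK
        "Stats: last_hard_state = 1",  # WARNING
        "Stats: last_hard_state = 2",  # CRITICAL
        "Stats: last_hard_state = 3",  # UNKNOWN
        "Stats: state >= 0",           # every matching service (total)
    ]
    if hostgroup is not None:
        lines.append("Filter: host_groups >= %s" % hostgroup)
    lines.append("Filter: description = %s" % service)
    # A blank line terminates the request.
    return "\n".join(lines) + "\n\n"


def parse_stats(raw):
    """Turn a reply such as b'290;4;6;0;300\\n' into a dict of counts."""
    ok, warn, crit, unknown, total = (
        int(x) for x in raw.decode().strip().split(";")
    )
    return {"ok": ok, "warn": warn, "crit": crit,
            "unknown": unknown, "total": total}


def query_livestatus(socket_path, query):
    """Send one query to the livestatus UNIX socket, return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall(query.encode())    # Python 3 sockets take bytes, not str
        s.shutdown(socket.SHUT_WR)   # signal end of request
        chunks = []
        while True:                  # read until livestatus closes its side
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Usage would be along the lines of parse_stats(query_livestatus('/var/spool/nagios/rw/live', build_query('HTTP - Datanode', 'cdh-cluster'))), with the socket path taken from the script's --socket default.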
An example of how we use this check:
Code: Select all
define service{
use pager-service
host_name bumble
service_description CDH Tasktrackers
check_command check_live!-H cdh-cluster -S "HTTP - Tasktracker" -W 7% -C 7%
}
define service{
use pager-service
host_name bumble
service_description CDH Datanodes
check_command check_live!-H cdh-cluster -S "HTTP - Datanode" -W 7% -C 7%
}
Code: Select all
define service{
use pager-service
host_name bumble
service_description IAD TS Snapshot Age
check_command check_live!-H ts-ext -S "JMX - SnapshotFileLastModified" -W 50% -C 50%
}
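To make the -W/-C percentages in these definitions concrete, here is a small sketch of the threshold arithmetic the script performs. The helper names threshold and exit_code are hypothetical, and unlike the original (which switches both arguments to percent mode if either contains a %), this sketch treats each argument on its own.

```python
def threshold(arg, total):
    """Convert '7%' or '5' into an absolute service count.

    For a percentage this is the same arithmetic check_live.py uses:
    the result is truncated toward zero, not rounded.
    """
    if "%" in arg:
        return int(float(arg.strip("%")) / 100 * total)
    return int(arg)


def exit_code(warn_count, crit_count, total, warn_arg, crit_arg):
    """Return the Nagios exit code: 2 = CRITICAL, 1 = WARNING, 0 = OK."""
    if crit_count >= threshold(crit_arg, total):
        return 2
    if warn_count >= threshold(warn_arg, total):
        return 1
    return 0


# 300 matching services at 7%: the threshold is 21 services.
print(threshold("7%", 300))
print(exit_code(0, 20, 300, "7%", "7%"))
print(exit_code(0, 21, 300, "7%", "7%"))
```

One thing worth knowing: because of the truncation, a small group (say 10 services at 7%) gets a threshold of 0, so the comparison 0 >= 0 makes the check CRITICAL even when nothing is down.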
Re: check_live for Nagios 4
You should be able to configure livestatus for Nagios 4 as of livestatus version 1.2.6:
https://mathias-kettner.de/checkmk_livestatus.html
Refer to the "Livestatus now supports Nagios 4" section of this document. 1.2.8 has dependencies that fail on my CentOS 7 machine for some reason, but I was able to get it working with 1.2.6 following the instructions on that page. The paths in the Python script may need to be updated for the latest spool file locations; not entirely sure on that one.
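On the socket path question, a quick way to check where the live socket ended up is something like the sketch below. The function name find_live_socket is made up, and only the first candidate path comes from the script's --socket default; the other two are guesses at common layouts and may not match your install.

```python
import os
import stat

# Assumed candidate locations. Only the first is taken from check_live.py;
# the others are guesses and should be adjusted to the local install.
CANDIDATES = [
    "/var/spool/nagios/rw/live",
    "/usr/local/nagios/var/rw/live",
    "/var/lib/nagios/rw/live",
]


def find_live_socket(paths=CANDIDATES):
    """Return the first path that exists and is a UNIX socket, else None."""
    for path in paths:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return path
        except OSError:
            continue  # missing path, or no permission to stat it
    return None


if __name__ == "__main__":
    print(find_live_socket())
```

Whatever path it finds can then be passed to the plugin with --socket instead of editing the default in the script.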
Former Nagios employee
https://www.mcapra.com/