check_live for Nagios 4


Postby disconnex » Mon Sep 19, 2016 9:48 am

I recently replaced my production Nagios server because of a hardware issue. In doing so, I upgraded from Nagios 3 on CentOS 6.5 to Nagios 4 on CentOS 7. While porting over our configs and checks, the one plugin I had trouble with was check_live. I came across a RHEL bug report saying that Nagios 4 does not support livestatus, which check_live uses. Does anyone know of a workaround, or an alternative to check_live on Nagios 4? Or has anyone else run into this and is willing to share the steps they took to fix it?
disconnex
 
Posts: 2
Joined: Mon Sep 19, 2016 9:41 am

Re: check_live for Nagios 4

Postby mcapra » Mon Sep 19, 2016 3:50 pm

Can you share the full contents of the check_live script or the source you acquired this plugin from?
mcapra
Support Tech
 
Posts: 1397
Joined: Thu May 05, 2016 3:54 pm
Location: Nagios Enterprises

Re: check_live for Nagios 4

Postby disconnex » Tue Sep 20, 2016 7:49 am

Thank you for the reply. Here is the code from check_live.py:

Code:
#!/usr/bin/python

import socket
import sys
import argparse

parser = argparse.ArgumentParser(description='check service status for a hostgroup')
parser.add_argument('-H', '--hostgroup', help='Host group to check')
parser.add_argument('-S', '--service',   required=True, help='Service to check')
parser.add_argument('-W', '--warn',      required=True, help='Number of WARN services to alert on')
parser.add_argument('-C', '--critical',  required=True, help='Number of CRIT services to alert on')
parser.add_argument('--socket', default='/var/spool/nagios/rw/live', help='livestatus socket location, defaults to /var/spool/nagios/rw/live')
args = parser.parse_args()

if '%' in args.warn or '%' in args.critical:
  percent = True
else:
  percent = False

socket_path = args.socket

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(socket_path)

filter = ''
if args.hostgroup is not None:
  filter += 'Filter: host_groups >= %s\n' % (args.hostgroup)
filter += 'Filter: description = %s\n' % (args.service)

if percent:
  query = '''GET services
Stats: last_hard_state = 0
Stats: last_hard_state = 1
Stats: last_hard_state = 2
Stats: last_hard_state = 3
Stats: state >= 0
%s
'''% (filter)
else:
  query = '''GET services
Stats: last_hard_state = 0
Stats: last_hard_state = 1
Stats: last_hard_state = 2
Stats: last_hard_state = 3
%s
'''% (filter)

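# sendall() keeps writing until the whole query is sent; send() may write only part of it.
s.sendall(query)

# Half-close the socket so livestatus knows the query is complete.
s.shutdown(socket.SHUT_WR)

# Read until livestatus closes the connection; a single recv() may return a partial reply.
chunks = []
while True:
  chunk = s.recv(4096)
  if not chunk:
    break
  chunks.append(chunk)
answer = ''.join(chunks).strip().split(';')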

print "%s OK %s WARN %s CRIT %s on %s" % (answer[0], answer[1], answer[2], args.service, args.hostgroup)
if percent:
  if int(answer[2]) >= int(float(args.critical.strip('%'))/100*int(answer[4])):
    sys.exit(2)
  elif int(answer[1]) >= int(float(args.warn.strip('%'))/100*int(answer[4])):
    sys.exit(1)
  else:
    sys.exit(0)
else:
  if int(answer[2]) >= int(args.critical):
    sys.exit(2)
  elif int(answer[1]) >= int(args.warn):
    sys.exit(1)
  else:
    sys.exit(0)


This check queries livestatus, which, from what I have read, is no longer supported on Nagios 4. I came across an op5 workaround, but its GitHub page is no longer available.

An example of how we use this check:

Code:
define service{
        use                     pager-service
        host_name               bumble
        service_description     CDH Tasktrackers
        check_command           check_live!-H cdh-cluster -S "HTTP - Tasktracker" -W 7% -C 7%
        }

define service{
        use                     pager-service
        host_name               bumble
        service_description     CDH Datanodes
        check_command           check_live!-H cdh-cluster -S "HTTP - Datanode" -W 7% -C 7%
        }


We use it on clusters such as Hadoop, where we have hundreds of nodes and only need to be alerted when a certain percentage are down. We also use it to check the percentage of snapshot age and overflow bytes:

Code:
define service{
        use                     pager-service
        host_name               bumble
        service_description     IAD TS Snapshot Age
        check_command           check_live!-H ts-ext -S "JMX - SnapshotFileLastModified" -W 50% -C 50%
        }
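For anyone reading along, here is a minimal sketch of how the percentage thresholds in these definitions turn into service counts. It mirrors the arithmetic in check_live.py above, but the function names `threshold` and `exit_code` are made up for illustration and are not part of the plugin. It also treats -W and -C independently, whereas the original script switches both to percent mode if either one contains a `%`. Written for Python 3.

```python
def threshold(spec, total):
    """Turn a -W/-C value such as '7%' or '5' into an absolute service count.

    Hypothetical helper: same arithmetic as check_live.py, which truncates
    percent-of-total to an integer.
    """
    if spec.endswith('%'):
        return int(float(spec.strip('%')) / 100 * total)
    return int(spec)


def exit_code(warn_count, crit_count, total, warn_spec, crit_spec):
    """Return the Nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    # CRITICAL is tested first, as in the original script.
    if crit_count >= threshold(crit_spec, total):
        return 2
    if warn_count >= threshold(warn_spec, total):
        return 1
    return 0
```

With 300 matching services, `-C 7%` works out to a threshold of 21 services in CRITICAL. Because the result is truncated, 7% of fewer than 15 services becomes 0, and since the comparison is `>=`, the check would then return CRITICAL even with nothing down.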


If there is a fix for livestatus in Nagios 4, that would be awesome; if there is another plugin that would yield the same results, that would be great too.
disconnex
 
Posts: 2
Joined: Mon Sep 19, 2016 9:41 am

Re: check_live for Nagios 4

Postby mcapra » Tue Sep 20, 2016 2:37 pm

You should be able to configure livestatus for Nagios 4 as of version 1.2.6:
https://mathias-kettner.de/checkmk_livestatus.html

Refer to the "Livestatus now supports Nagios 4" section of this document. 1.2.8 has dependencies that fail on my CentOS 7 machine for some reason, but I was able to get it working with 1.2.6 following the instructions on that page. The paths in the Python script may need to be updated for the latest spool file locations; not entirely sure on that one.
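If you are unsure where the socket ended up after the upgrade, a rough sketch like the following can probe a few candidate paths. It assumes Python 3 and that the livestatus `status` table exposes a `program_version` column. The helper names `livestatus_query` and `find_socket` are made up for this example, and any candidate path other than the script's default `/var/spool/nagios/rw/live` is a guess that depends on how Nagios was installed.

```python
import socket


def livestatus_query(socket_path, query, timeout=5.0):
    """Send one livestatus query over a UNIX socket and return the raw reply text."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(socket_path)
        s.sendall(query.encode('utf-8'))
        # Half-close so livestatus knows the query is complete.
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        s.close()
    return b''.join(chunks).decode('utf-8')


def find_socket(candidates):
    """Return the first path that answers a trivial query, or None if none do."""
    for path in candidates:
        try:
            livestatus_query(path, 'GET status\nColumns: program_version\n')
        except OSError:
            # Missing file, permission denied, connection refused, or timeout.
            continue
        return path
    return None
```

For example, `find_socket(['/var/spool/nagios/rw/live', '/usr/local/nagios/var/rw/live'])` returns whichever of the two responds. The second path is only an assumption about a source install. Whatever path it finds can then be passed to check_live.py with `--socket`.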
mcapra
Support Tech
 
Posts: 1397
Joined: Thu May 05, 2016 3:54 pm
Location: Nagios Enterprises

