Nagios ping check returning "1" as 'OK'
Posted: Mon Dec 28, 2020 4:39 pm
We have written a custom script for ping checks. If pings to a device fail, we want the check to return a '1' status so Nagios reads that as a Warning. We do not want these devices to alarm with a Critical status (2). Right now our script returns a '1' value, but Nagios is not seeing that as a Warning; it sees it as OK. We can change the script to return a '2' value and Nagios picks that up as a Critical, but that is not the notification we want.
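For reference, the Nagios plugin convention maps exit codes 0, 1, 2 and 3 to OK, WARNING, CRITICAL and UNKNOWN. A minimal sketch of that mapping follows; the names `NAGIOS_STATES` and `state_name` are made up for illustration and are not part of Nagios or of the script in this post.

```python
# Standard Nagios plugin exit codes and the state name each one stands for.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Return the state name for a plugin exit code (hypothetical helper)."""
    # Assumption of this sketch: anything outside 0-3 is labelled UNKNOWN.
    # Nagios's own handling of out-of-range codes may differ.
    return NAGIOS_STATES.get(exit_code, "UNKNOWN")


print(state_name(1))
```

So an exit status of 1 is the value that should correspond to a Warning under this convention.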
Here are examples of return values:
Host is down and ping return value is set to: 2
[-@nagios Ping]$ python pinghost.py 172.20.17.18
PING 172.20.17.18 (172.20.17.18) 56(84) bytes of data.
--- 172.20.17.18 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2030ms
Warning - Failed Ping Host
[-@nagios Ping]$ echo $?
2
Device that is up and running
[-@nagios Ping]$ python pinghost.py 172.20.6.40
PING 172.20.6.40 (172.20.6.40) 56(84) bytes of data.
--- 172.20.6.40 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.698/2.698/2.698/0.000 ms
Okay - Ping Host
[-@nagios Ping]$ echo $?
0
[-@nagios Ping]$
Host is down, Script return value is set to: 1
[-@nagios Ping]$ python pinghost.py 172.20.17.18
PING 172.20.17.18 (172.20.17.18) 56(84) bytes of data.
--- 172.20.17.18 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms
Warning - Failed Ping Host
[-@nagios Ping]$ echo $?
1
[-@nagios Ping]$
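The same exit status can also be read the way a calling process would see it, instead of through `echo $?`. This is only a sketch, assuming Python 3.5 or later for `subprocess.run`; `exit_code_of` is a hypothetical helper, and the child command is a stand-in that exits with 1 rather than the real pinghost.py.

```python
import subprocess
import sys


def exit_code_of(argv):
    """Run a command and return its exit status, like `echo $?` in the shell."""
    return subprocess.run(argv).returncode


# Stand-in child process that exits with 1, the status pinghost.py
# reports for a failed ping in the example above.
code = exit_code_of([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code)
```

Replacing the stand-in with `[sys.executable, "pinghost.py", "172.20.17.18"]` would report the status of the actual script.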
Anyone know why Nagios would not be picking up the '1' value to indicate a Warning?
Thanks!
Here is the script:
#!/usr/bin/python
# Date Created: 12/21/2017
# Description: This script will check for ping
#
# Usage: pinghost.py <host>
################################################################################
import sys
import os

# Used to map to alarm file
sys.path.append(os.path.join(os.path.dirname(os.path.realpath(__file__)), os.pardir))
from alarm import alarm


class Ping(alarm):
    """Ping a host and exit with a Nagios status code."""

    def __init__(self, host):
        self.host = host

    def get_ping_entry(self):
        # response = os.system("ping -c 2 -W 1 " + self.host)
        response = os.system("ping -c 1 -w 3 -q " + self.host)
        # print(response)
        if response == 0:
            print("Okay - Ping Host")
            sys.exit(0)
        else:
            print("Warning - Failed Ping Host")
            sys.exit(1)  # return value for missing pings


if __name__ == '__main__':
    if len(sys.argv) == 2:
        p = Ping(sys.argv[1])
        p.get_ping_entry()
    else:
        print("Usage: pinghost.py <host>")
        sys.exit(3)  # UNKNOWN, so a usage error is not reported as OK
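A possible variant of the script above, as a sketch only: it assumes Python 3.5 or later, drops the `alarm` base class, and uses `subprocess.run` with an argument list instead of `os.system`. The names `classify` and `check_ping` are invented for this example. Ping's own output is discarded so that the status message is the first line printed, since Nagios takes the first line of plugin output as the status text.

```python
import subprocess
import sys

OK, WARNING = 0, 1  # Nagios plugin exit codes for OK and WARNING


def classify(ping_returncode):
    """Map ping's return code to a (plugin exit code, message) pair."""
    if ping_returncode == 0:
        return OK, "Okay - Ping Host"
    return WARNING, "Warning - Failed Ping Host"


def check_ping(host):
    """Ping the host with the same options as the original script."""
    # Passing a list avoids handing the host string to a shell.
    result = subprocess.run(
        ["ping", "-c", "1", "-w", "3", "-q", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify(result.returncode)


if __name__ == "__main__" and len(sys.argv) == 2:
    status, message = check_ping(sys.argv[1])
    print(message)
    sys.exit(status)
```

Keeping the mapping in `classify` means the Warning versus Critical decision sits in one place and can be checked without sending any pings.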