
Check failing to run correctly on new XI server

Posted: Tue Aug 18, 2020 9:02 am
by danniiffxi
OK, so on the left is our current production XI server on CentOS 6, and on the right is the new build on CentOS 7. It is almost complete, but I have this one issue that is bugging me.

As you can see in the screenshot, the service check on the left works fine and takes 16 seconds to execute; on the right it fails after 2 seconds.

Image

Now this is the bit that is confusing me. When I run the check from the CLI on the new server it works perfectly, but it fails when run from the GUI.

Code: Select all

[root@nagxit02 libexec]# /usr/local/nagios/libexec/check_internet
OK - Internet Bearer is via Primary
The script is a custom one I wrote that periodically checks our internet bearer status over our 10GB link. If our main site fails, the internet will fail over to our secondary site; if both fail, the check should go critical.

Code: Select all

#!/bin/bash
# set -x
# Check if the Internet Bearer has switched from Primary to Backup
#

# Check which Bearer is being used
# ----------------------------------------------------------
sudo traceroute -I 8.8.8.8 > /tmp/traceroute.txt

# Alert if it's the wrong one
# ----------------------------------------------------------
cat /tmp/traceroute.txt | grep "111.111.111.111" > /dev/null 2>&1
Primary=$?
if [ ${Primary} -eq 0 ]; then
  echo "OK - Internet Bearer is via Primary"
  exit 0;
fi

cat /tmp/traceroute.txt | grep "111.111.111.111" > /dev/null 2>&1
Backup=$?
if [ ${Backup} -eq 0 ]; then
  echo "WARNING - Internet Bearer is on Backup"
  exit 1;
fi

echo "CRITICAL - Internet Bearer is DOWN !!"
cat /tmp/traceroute.txt
exit 2

Any idea how I can get this to run correctly in the GUI? It's worth noting that the other 8000+ checks are working fine.
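A note on the script as posted: both grep lines search for the same address (111.111.111.111, presumably anonymized), so the Backup branch can never match as written. The decision logic can also be pulled out of the traceroute call so it is easy to test on its own. What follows is only a minimal sketch under assumptions: PRIMARY_HOP and BACKUP_HOP are hypothetical placeholders (documentation addresses, not the real hops), and the UNKNOWN branch for an empty file is an addition that the original script does not have.

```bash
# Sketch only: PRIMARY_HOP and BACKUP_HOP are hypothetical placeholders.
# Substitute the real next-hop address of each bearer.
PRIMARY_HOP="192.0.2.1"
BACKUP_HOP="198.51.100.1"

# classify_bearer FILE
# Reads saved traceroute output from FILE, prints a Nagios status line
# and returns the matching plugin exit code (0, 1, 2 or 3).
classify_bearer() {
  local trace_file="$1"

  # An empty or missing file means traceroute produced no output at all,
  # which is not the same thing as "both bearers are down".
  if [ ! -s "${trace_file}" ]; then
    echo "UNKNOWN - no traceroute output in ${trace_file}"
    return 3
  fi

  # -F: fixed string, -w: whole word, so 192.0.2.1 does not match 192.0.2.10
  if grep -qwF "${PRIMARY_HOP}" "${trace_file}"; then
    echo "OK - Internet Bearer is via Primary"
    return 0
  fi

  if grep -qwF "${BACKUP_HOP}" "${trace_file}"; then
    echo "WARNING - Internet Bearer is on Backup"
    return 1
  fi

  echo "CRITICAL - Internet Bearer is DOWN !!"
  cat "${trace_file}"
  return 2
}
```

In a plugin this would be called right after the traceroute step, for example `classify_bearer /tmp/traceroute.txt; exit $?`.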

Re: Check failing to run correctly on new XI server

Posted: Tue Aug 18, 2020 9:21 am
by scottwilkerson
What are the permissions on this file on the server on the right?

Code: Select all

ls -l /tmp/traceroute.txt
Can it be read/written by the nagios user?
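To answer that question from the point of view of a particular user rather than by reading the mode bits, a small helper along these lines could be used. It is only a sketch: check_rw is a hypothetical name, and it reports on whichever user runs it.

```bash
# check_rw FILE
# Reports whether the user running this function can read and write FILE,
# and whether the file has any content.
check_rw() {
  local f="$1"

  if [ ! -e "${f}" ]; then
    echo "missing: ${f}"
    return 1
  fi

  if [ -r "${f}" ] && [ -w "${f}" ]; then
    if [ -s "${f}" ]; then
      echo "readable and writable: ${f}"
    else
      echo "readable and writable but empty: ${f}"
    fi
    return 0
  fi

  echo "not readable and writable: ${f}"
  return 1
}
```

From a root bash shell, one way to run it as the nagios user would be `sudo -u nagios bash -c "$(declare -f check_rw); check_rw /tmp/traceroute.txt"`.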

Re: Check failing to run correctly on new XI server

Posted: Tue Aug 18, 2020 12:23 pm
by danniiffxi
Hi Scott

Both servers were set the same with the following permissions.

This is from the server that works:

Code: Select all

[root@nagip01 ~]# ls -l /tmp/traceroute.txt
-rw-r--r-- 1 nagios nagios 641 Aug 18 18:14 /tmp/traceroute.txt
This is the new server

Code: Select all

[root@nagxit02 ~]# ls -l /tmp/traceroute.txt
-rw-r--r-- 1 nagios nagios 0 Aug 18 18:10 /tmp/traceroute.txt

I then did a chmod 777 and ran the test again. Unfortunately, it still fails in the GUI.

Code: Select all

[root@nagxit02 ~]# chmod 777 /tmp/traceroute.txt
[root@nagxit02 ~]# ls -l /tmp/traceroute.txt
-rwxrwxrwx 1 nagios nagios 0 Aug 18 18:10 /tmp/traceroute.txt
GUI output

Code: Select all

[nagios@nagxit02 ~]$ /usr/local/nagios/libexec/check_internet
CRITICAL - Internet Bearer is DOWN !!
CLI

Code: Select all

[root@nagxit02 ~]# /usr/local/nagios/libexec/check_internet
OK - Internet Bearer is via HQ

Re: Check failing to run correctly on new XI server

Posted: Tue Aug 18, 2020 12:37 pm
by scottwilkerson
I also see this in your script

Code: Select all

sudo traceroute -I 8.8.8.8 > /tmp/traceroute.txt
Does the nagios user have sudoers permissions to do this on the new server?
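As root, `sudo -l -U nagios` lists what the nagios user is allowed to run. Because the script sends only stdout to the file, any error sudo prints is lost, and the result is an empty /tmp/traceroute.txt like the 0-byte file shown earlier. A hedged sketch of a wrapper that keeps stderr and the exit status is below; run_probe is a hypothetical helper name, not part of Nagios, and mapping failures to UNKNOWN (3) is an assumption about how you would want to report them.

```bash
# run_probe OUTFILE COMMAND [ARGS...]
# Runs COMMAND, writing both stdout and stderr to OUTFILE. If the command
# exits non-zero, prints an UNKNOWN line containing the first line of the
# captured output and returns 3; otherwise returns 0.
run_probe() {
  local out_file="$1"
  shift
  local rc=0

  "$@" > "${out_file}" 2>&1 || rc=$?

  if [ "${rc}" -ne 0 ]; then
    echo "UNKNOWN - '$*' exited with status ${rc}: $(head -n 1 "${out_file}")"
    return 3
  fi
  return 0
}
```

In the check this might be called as `run_probe /tmp/traceroute.txt sudo -n traceroute -I 8.8.8.8 || exit 3`. The `-n` flag makes sudo fail immediately instead of prompting for a password, so a missing sudoers rule would show up in the check output rather than as an empty file.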

Re: Check failing to run correctly on new XI server

Posted: Tue Aug 25, 2020 10:04 am
by danniiffxi
Hi Scott

Sorry for the late reply, I have been off for a while. It's all working now, you can lock this one, thank you.

Re: Check failing to run correctly on new XI server

Posted: Tue Aug 25, 2020 10:54 am
by scottwilkerson
danniiffxi wrote:Hi Scott

Sorry for the late reply, I have been off for a while. It's all working now, you can lock this one, thank you.
Great!

Locking thread