Check works when testing via CLI and GUI, but not live

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
philip.ianni
Posts: 29
Joined: Tue Dec 29, 2015 12:35 pm

Post by philip.ianni »

I'm having a weird issue with an ESXi check.

The check runs successfully as the nagios user on the command line, and also when tested in the GUI (which runs as the apache user, as far as I know). However, when I implement it live, it times out. I have tried increasing the timeout interval by a lot, but that does not help. Interestingly, I have six hosts under that one service definition, and three of them work live.

It may be relevant that the path to the script is $USER1$/check_vmware/check_vmware_esx, so the plugin actually sits in a folder inside the libexec directory.

The script also references a credential file.
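One way to narrow down a check that passes from an interactive shell but times out live is to rerun it with an empty environment and a hard time limit, which is closer to how a scheduled check runs than a login shell is. The following is a minimal sketch under that assumption, not a reproduction of what the Nagios worker does: `run_like_worker` is a hypothetical helper name, and the time limit and plugin arguments are placeholders to be filled in from the service definition.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an empty environment and a time limit.
# Usage: run_like_worker SECONDS COMMAND [ARGS...]
run_like_worker() {
    limit=$1
    shift
    # env -i clears every inherited variable (PATH, HOME, PERL5LIB, ...).
    # timeout (GNU coreutils) stops the command after $limit seconds and
    # then returns exit status 124.
    timeout "$limit" env -i "$@"
}
```

Called from a shell started as the nagios user, with the full plugin path and the same arguments the service definition passes, an exit status of 124 means the time limit was hit; a different failure suggests the plugin depends on something from the interactive environment.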
bwallace
Posts: 1145
Joined: Tue Nov 17, 2015 1:57 pm

Re: Check works when testing via CLI and GUI, but not live

Post by bwallace »

1) To clarify, from what area of the UI are you running the tests?

2) Let's check permissions real quick. What is the output of the following?

Code:

grep nag /etc/group

Code:

grep "User \|Group " /etc/httpd/conf/httpd.conf
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Post by rkennedy »

To add to what @bwallace mentioned, what are the permissions on the credential file?

Post by philip.ianni »

nagios:x:500:nagios,apache
nagcmd:x:501:nagios,apache



# . On SCO (ODT 3) use "User nouser" and "Group nogroup".
# when the value of (unsigned)Group is above 60000;
# don't use Group #-1 on these systems!
User apache
Group apache


I am running the tests inside the service definition via the "Test Command" button.

The credential file is owned by apache:nagios.

**Keep in mind that this check works for some of the hosts but not all, despite using the same file.
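Ownership alone does not show whether the nagios user can read the credential file; the mode bits matter as well. As a rough sketch, the helper below prints owner, group, and octal mode for a file. `show_perms` is a hypothetical name, the sketch assumes GNU `stat` (as found on typical Linux systems), and the credential file path has to be supplied, since the thread does not give it.

```shell
#!/bin/sh
# Hypothetical helper: print "owner:group mode" for a file.
# Assumes GNU stat; the -c format option is not available in BSD stat.
# Usage: show_perms FILE
show_perms() {
    stat -c '%U:%G %a' "$1"
}
```

With owner apache and group nagios, the nagios user would reach the file through the group bits, so a mode such as 640 allows reading while 600 would not.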

Post by rkennedy »

What error messages are you seeing for the three that currently do not work?

Post by philip.ianni »

The error I get is "(Service check timed out after 60.01 seconds)"
lmiltchev
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

You tested your check from the CLI and under the CCM, but have you tried forcing an immediate check from the "Service Status Detail" page? Can you show us a screenshot of the "Service Status Detail" page ("Advanced" tab)?

Post by philip.ianni »

Here is some more info.

The first screenshot shows that most of the checks work live, but not all of them. The host showing as unknown will eventually time out.
BeforeFail.PNG
As requested, here is the Advanced tab for the host that failed:
After.PNG
I think I found the problem: when I force a new check while tailing /var/log/messages, I can see some errors:

Code:

Jan 14 10:50:47 vfmsrv107 nagios: wproc:   host=DC02-VMH-03; service=System Temperatures;
Jan 14 10:50:47 asddsasdrv107 nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 14 10:50:47 vasddasdrv107 nagios: wproc:   stderr line 01: Use of uninitialized value $PID_old in kill at /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl line 1630.
Jan 14 10:50:47 vfasdasdrv107 nagios: Warning: Check of service 'System Temperatures' on host 'DC02-VMH-03' timed out after 60.007s!
Jan 14 10:50:47 vfasdasdrv107 nagios: wproc: Core Worker 50711: job 132 (pid=58179): Dormant child reaped
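The `wproc` lines above carry the fields needed to tell a timeout from a plugin failure. As a small sketch, the hypothetical helper `wproc_field` below pulls one `key=value;` field out of such lines read on standard input. It assumes only the line layout shown in this log excerpt and nothing more about the Nagios log format.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "key=value;" field from
# wproc log lines on standard input. Lines without the field print nothing.
# Usage: wproc_field KEY < logfile
wproc_field() {
    sed -n "s/.*[ ;]$1=\([^;]*\);.*/\1/p"
}
```

For example, `grep 'wproc:' /var/log/messages | wproc_field early_timeout` prints `1` for the excerpt above, which points to the worker stopping the check at the timeout rather than the plugin exiting on its own.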

Post by philip.ianni »

So it looks like there is some issue with the old PID ($PID_old) in the plugin. I'm looking into it now, but I'm afraid I don't know Perl that well.

I've posted the plugin here just in case someone is willing to quickly debug it.

I appreciate the help thus far.

Post by philip.ianni »

Sorry, it would not let me post the script in the above post. Here it is:
check_vmware_esx.pl