
Check works when testing via CLI and GUI, but not live

Posted: Fri Jan 08, 2016 3:58 pm
by philip.ianni
I'm having a weird issue with an ESXi check.

The check runs successfully as user nagios on the command line, and also when testing in the GUI (which runs as user apache as far as I know). However, when I implement it live, it times out. I have tried increasing the timeout interval by a lot, but that does not help. What is interesting is that I have 6 hosts under that one service definition and three of them work live.

It may be relevant that the directory structure for the script is $USER1$/check_vmware/check_vmware_esx, so the plugin is actually sitting in a folder which is sitting in the libexec directory.

The script is also referencing a credential file.
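One way to narrow down a "works in the CLI, fails live" case is to run the plugin with an emptied environment and a hard time limit, which is closer to how a daemon starts it than an interactive login shell is. The sketch below is only an illustration: `run_like_daemon` is a hypothetical helper name, and the idea that the environment is what differs here is an assumption, not something established in this thread.

```shell
#!/bin/sh
# run_like_daemon TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with an emptied environment (only a
# minimal PATH is kept) and stops it after TIMEOUT_SECONDS.
# With GNU coreutils, `timeout` exits with status 124 when the limit is hit.
run_like_daemon() {
    limit=$1
    shift
    env -i PATH=/usr/bin:/bin timeout "$limit" "$@"
}
```

Run it from a shell that is already running as the nagios user, passing the same plugin path and arguments that the live command definition uses. If the check only hangs this way, something in the interactive environment is likely masking the problem.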

Re: Check works when testing via CLI and GUI, but not live

Posted: Mon Jan 11, 2016 10:59 am
by bwallace
1) To clarify, from what area of the UI are you running the tests?

2) Let's check permissions real quick. What is the output of the following?

Code:

grep nag /etc/group

Code:

grep "User \|Group " /etc/httpd/conf/httpd.conf

Re: Check works when testing via CLI and GUI, but not live

Posted: Mon Jan 11, 2016 11:38 am
by rkennedy
To add to what @bwallace mentioned, what are the permissions on the credential file?
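A quick way to collect that information is sketched below. It assumes GNU `stat` (as shipped on RHEL/CentOS-style systems); `show_file_access` is a hypothetical helper name, and the path of the credential file is not given in this thread, so it is passed in as an argument.

```shell
#!/bin/sh
# show_file_access FILE
# Hypothetical helper: prints "owner:group mode readable=yes|no" for FILE,
# where "readable" refers to the user running this function.
show_file_access() {
    file=$1
    if [ -r "$file" ]; then
        readable=yes
    else
        readable=no
    fi
    # GNU stat: %U = owner name, %G = group name, %a = octal mode
    printf '%s readable=%s\n' "$(stat -c '%U:%G %a' "$file")" "$readable"
}
```

Running it once as nagios and once as apache shows whether both accounts can actually read the file.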

Re: Check works when testing via CLI and GUI, but not live

Posted: Tue Jan 12, 2016 2:29 pm
by philip.ianni
nagios:x:500:nagios,apache
nagcmd:x:501:nagios,apache



# . On SCO (ODT 3) use "User nouser" and "Group nogroup".
# when the value of (unsigned)Group is above 60000;
# don't use Group #-1 on these systems!
User apache
Group apache


I am doing the tests inside the service definition via the "Test Command" button

Permissions (owner:group) on the credential file are apache:nagios

**Keep in mind this check works for some of the hosts but not all, despite using the same file

Re: Check works when testing via CLI and GUI, but not live

Posted: Tue Jan 12, 2016 2:33 pm
by rkennedy
What error messages are you seeing for the 3 that do not work currently?

Re: Check works when testing via CLI and GUI, but not live

Posted: Wed Jan 13, 2016 11:31 am
by philip.ianni
The error I get is "(Service check timed out after 60.01 seconds)"

Re: Check works when testing via CLI and GUI, but not live

Posted: Wed Jan 13, 2016 3:00 pm
by lmiltchev
You tested your check from the CLI and under the CCM, but have you tried forcing an immediate check from the "Service Status Detail" page? Can you show us a screenshot of the "Service Status Detail" page ("Advanced" tab)?

Re: Check works when testing via CLI and GUI, but not live

Posted: Thu Jan 14, 2016 10:56 am
by philip.ianni
Here is some more info

The first photo shows that most of the checks work live, but not all of them. The unknown host will eventually time out.
BeforeFail.PNG
As requested, here is the Advanced tab of the host that failed:
After.PNG
I think I found the problem. When I force a new check while tailing /var/log/messages, I can see some errors:

Code:

Jan 14 10:50:47 vfmsrv107 nagios: wproc:   host=DC02-VMH-03; service=System Temperatures;
Jan 14 10:50:47 asddsasdrv107 nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 14 10:50:47 vasddasdrv107 nagios: wproc:   stderr line 01: Use of uninitialized value $PID_old in kill at /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl line 1630.
Jan 14 10:50:47 vfasdasdrv107 nagios: Warning: Check of service 'System Temperatures' on host 'DC02-VMH-03' timed out after 60.007s!
Jan 14 10:50:47 vfasdasdrv107 nagios: wproc: Core Worker 50711: job 132 (pid=58179): Dormant child reaped
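To see which host/service pairs are hitting the timeout without reading the whole log, the "timed out after" warnings can be filtered out of the log. This is a minimal sketch that only relies on the warning format shown in the excerpt above; `timed_out_checks` is a hypothetical helper name.

```shell
#!/bin/sh
# timed_out_checks
# Hypothetical helper: reads syslog lines on stdin and prints
# "host;service;seconds" for every service-check timeout warning.
timed_out_checks() {
    sed -n "s/.*Warning: Check of service '\([^']*\)' on host '\([^']*\)' timed out after \([0-9.]*\)s.*/\2;\1;\3/p"
}
```

For example, `timed_out_checks < /var/log/messages` lists every timeout recorded in that file, which makes it easier to compare the three failing hosts with the three working ones.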

Re: Check works when testing via CLI and GUI, but not live

Posted: Thu Jan 14, 2016 11:12 am
by philip.ianni
So it looks like there is some old PID issue with the plugin. I'm looking into it now, but I'm afraid I don't know Perl that well.

I've posted the plugin here just in case someone is willing to quickly debug it

I appreciate the help thus far
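Since the warning in the log points at a specific line (1630) of the plugin, printing the lines around it is a cheap first step before anyone digs into the Perl. A small sketch; `show_context` is a hypothetical helper name and the default radius of 5 lines is an arbitrary choice.

```shell
#!/bin/sh
# show_context FILE LINE [RADIUS]
# Hypothetical helper: prints LINE and RADIUS lines on either side of it
# (default 5), each prefixed with its line number.
show_context() {
    file=$1
    line=$2
    radius=${3:-5}
    awk -v l="$line" -v r="$radius" \
        'NR >= l - r && NR <= l + r { printf "%d: %s\n", NR, $0 }' "$file"
}
```

For the message above, that would be `show_context /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl 1630`, which shows where `$PID_old` is passed to `kill` and what sets it just before.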

Re: Check works when testing via CLI and GUI, but not live

Posted: Thu Jan 14, 2016 11:15 am
by philip.ianni
Sorry, it would not let me post the script in the above post. Here it is:
check_vmware_esx.pl