Check works when testing via CLI and GUI, but not live
philip.ianni
- Posts: 29
- Joined: Tue Dec 29, 2015 12:35 pm
Here is a weird issue I'm having with an ESXi check.
The check runs successfully as user nagios on the command line, and also when testing in the GUI (which runs as user apache, as far as I know). However, when I implement it live, it times out. I have tried increasing the timeout interval by a lot, but it does not help. Interestingly, I have 6 hosts under that one service definition, and three of them work live.
It may be relevant that the script lives at $USER1$/check_vmware/check_vmware_esx, so the plugin is actually sitting in a subfolder of the libexec directory.
The script also references a credential file.
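One way to compare the CLI run with what Nagios sees is to time the plugin under the same limit Nagios enforces. The live error reported later in this thread says 60.01 seconds, which matches the usual default of `service_check_timeout=60` in nagios.cfg; Nagios kills a check at that limit no matter what timeout the plugin itself is given, so raising only the plugin's own timeout would not change the result. The helper below is a minimal sketch, not part of the plugin: `time_check` is a hypothetical name, and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
# time_check LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU `timeout`, reports the exit
# status and elapsed wall-clock seconds on stderr, and returns CMD's
# status (124 means the limit was reached and CMD was killed).
time_check() {
  local limit="$1"
  shift
  local start end rc=0
  start=$(date +%s)
  timeout "$limit" "$@" || rc=$?
  end=$(date +%s)
  echo "exit=$rc elapsed=$((end - start))s" >&2
  return "$rc"
}
```

For example, `time_check 60 sudo -u nagios /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl ARGS` against one of the hosts that fails live, where `ARGS` is a placeholder for whatever the service definition passes. Keep in mind that the Nagios daemon's environment may still differ from a `sudo` session.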
Re: Check works when testing via CLI and GUI, but not live
1) To clarify, from what area of the UI are you running the tests?
2) Let's check permissions real quick. What is the output of the following?
Code:
grep nag /etc/group
Code:
grep "User \|Group " /etc/httpd/conf/httpd.conf
Be sure to check out the Knowledgebase for helpful articles and solutions!
Re: Check works when testing via CLI and GUI, but not live
To add to what @bwallace mentioned, what are the permissions on the credential file?
Former Nagios Employee
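To answer the permissions question precisely, it can help to print owner, group and octal mode together, and then confirm the file is readable by the nagios account itself, since that is the account live checks run under (the GUI test reportedly runs as apache). A minimal sketch assuming GNU `stat` as found on CentOS/RHEL; `file_perms` and `readable_as` are hypothetical helper names.

```shell
# file_perms FILE
# Hypothetical helper: prints "owner:group mode" for FILE using GNU stat
# (%U = owner name, %G = group name, %a = octal permission bits).
file_perms() {
  stat -c '%U:%G %a' "$1"
}

# readable_as USER FILE
# Hypothetical helper: succeeds if USER can read FILE. Switching users
# needs sudo rights; /usr/bin/test performs the actual read check.
readable_as() {
  sudo -u "$1" test -r "$2"
}
```

Usage, with a placeholder path: `file_perms /path/to/credential_file` and `readable_as nagios /path/to/credential_file && echo "nagios can read it"`.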
philip.ianni
Re: Check works when testing via CLI and GUI, but not live
nagios:x:500:nagios,apache
nagcmd:x:501:nagios,apache
# . On SCO (ODT 3) use "User nouser" and "Group nogroup".
# when the value of (unsigned)Group is above 60000;
# don't use Group #-1 on these systems!
User apache
Group apache
I am doing the tests inside the service definition via the "Test Command" button.
Ownership of the credential file is apache:nagios.
**Keep in mind that this check works for some of the hosts but not all, despite using the same file.
Re: Check works when testing via CLI and GUI, but not live
What error messages are you seeing for the 3 hosts that currently do not work?
philip.ianni
Re: Check works when testing via CLI and GUI, but not live
The error I get is "(Service check timed out after 60.01 seconds)"
Re: Check works when testing via CLI and GUI, but not live
You tested your check from the CLI and in the CCM, but have you tried forcing an immediate check from the "Service Status Detail" page? Can you show us a screenshot of the "Service Status Detail" page ("Advanced" tab)?
Be sure to check out our Knowledgebase for helpful articles and solutions!
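The same forced check can also be submitted from a shell through Nagios Core's external command file using the `SCHEDULE_FORCED_SVC_CHECK` command, which makes it easy to repeat while watching logs. A sketch, assuming external commands are enabled and the command file sits at `/usr/local/nagios/var/rw/nagios.cmd` (check `command_file` in nagios.cfg on your system); `force_svc_check` is a hypothetical helper name.

```shell
# force_svc_check HOST SERVICE [COMMAND_FILE]
# Hypothetical helper: writes one external command in the form
#   [time] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
# The default path is an assumption; use command_file from nagios.cfg.
force_svc_check() {
  local host="$1" service="$2"
  local cmd_file="${3:-/usr/local/nagios/var/rw/nagios.cmd}"
  local now
  now=$(date +%s)
  printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' \
    "$now" "$host" "$service" "$now" > "$cmd_file"
}
```

For example, `force_svc_check DC02-VMH-03 "System Temperatures"`, run as a user allowed to write the command file (typically nagios).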
philip.ianni
Re: Check works when testing via CLI and GUI, but not live
Here is some more info.
The first photo shows that most of the checks work live, but not all of them. The host shown as unknown will eventually time out.
As requested, here is the Advanced tab for the host that failed.
I think I found the problem: when I force a new check while tailing /var/log/messages, I can see some errors:
Code:
Jan 14 10:50:47 vfmsrv107 nagios: wproc: host=DC02-VMH-03; service=System Temperatures;
Jan 14 10:50:47 asddsasdrv107 nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 14 10:50:47 vasddasdrv107 nagios: wproc: stderr line 01: Use of uninitialized value $PID_old in kill at /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl line 1630.
Jan 14 10:50:47 vfasdasdrv107 nagios: Warning: Check of service 'System Temperatures' on host 'DC02-VMH-03' timed out after 60.007s!
Jan 14 10:50:47 vfasdasdrv107 nagios: wproc: Core Worker 50711: job 132 (pid=58179): Dormant child reaped
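When tailing a busy /var/log/messages, a filter for the worker-process (`wproc:`) and timeout lines keeps the relevant entries together. A sketch using GNU grep; `nagios_check_log` is a hypothetical name, and the pattern is inferred from the log excerpt above, so other Nagios versions may word these lines differently.

```shell
# nagios_check_log [FILE]
# Hypothetical helper: prints Nagios worker (wproc) lines and service-check
# timeout warnings from FILE, or from stdin when no FILE is given ("-").
# --line-buffered keeps output flowing when fed from `tail -f`.
nagios_check_log() {
  grep --line-buffered -E \
    'nagios: (wproc:|Warning: Check of service .* timed out)' "${1:--}"
}
```

For example, `tail -f /var/log/messages | nagios_check_log` while forcing the check on the failing host.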
philip.ianni
Re: Check works when testing via CLI and GUI, but not live
So it looks like there is some old PID issue with the plugin. I'm looking into it now, but I'm afraid I don't know Perl that well.
I've posted the plugin here in case someone is willing to quickly debug it.
I appreciate the help thus far.
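Since the warning names a specific line, one low-effort starting point is to print the code around line 1630 and see where `$PID_old` is (or is not) assigned before `kill` uses it. Perl's "Use of uninitialized value" warning means the variable was undefined at that point; whether that causes the timeout or is only a side effect of the check being killed is still an open question here. `show_context` is a hypothetical helper built on awk.

```shell
# show_context FILE LINE [RADIUS]
# Hypothetical helper: prints LINE of FILE plus RADIUS lines (default 10)
# on either side, each prefixed with its line number.
show_context() {
  local file="$1" line="$2" radius="${3:-10}"
  local first=$(( line > radius ? line - radius : 1 ))
  local last=$(( line + radius ))
  awk -v first="$first" -v last="$last" \
    'NR >= first && NR <= last { printf "%6d  %s\n", NR, $0 }' "$file"
}
```

For example, `show_context /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl 1630`, and `grep -n 'PID_old' /usr/local/nagios/libexec/check_vmware/check_vmware_esx.pl` to list every place the variable appears.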
philip.ianni
Re: Check works when testing via CLI and GUI, but not live
Sorry, it would not let me post the script in the above post. Here it is: