Constantly getting Unknown Errors monitoring ESX

Posted: Thu Jun 25, 2015 3:12 pm
by bosecorp
I am constantly getting this error:

State: CRITICAL
Info:
ESX3 CRITICAL - HOST CPU Unknown error
Date/Time: 2015-06-25 15:14:29


I tried the suggestions recommended here:

https://support.nagios.com/forum/viewto ... or#p125599

but they did not help.

After some time, the system retries and the check goes back to normal. This happens about 40 times on any given day.

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Thu Jun 25, 2015 3:59 pm
by lmiltchev
Does the check work every time when you run it from the command line? Can you show us the actual command, run from the CLI and the output of it? Are you using Mod Gearman?
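
Because the failure is intermittent, a single manual run may not reproduce it. A minimal sketch for running the same check many times and counting failures follows; `repeat_check` is a hypothetical helper name, and the check command itself is whatever appears in your service definition (it is not shown in this thread). It assumes only the standard plugin convention that exit code 0 means OK.

```shell
#!/bin/sh
# Run a check command repeatedly and count the runs that do not exit 0.
# Usage: repeat_check RUNS COMMAND [ARGS...]
# (repeat_check is a hypothetical helper, not part of Nagios or Mod Gearman.)
repeat_check() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Plugins exit 0 for OK; any other code is WARNING, CRITICAL or UNKNOWN.
        if ! output=$("$@" 2>&1); then
            failures=$((failures + 1))
            echo "run $i failed: $output"
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}
```

Call it with the run count followed by the full check command and its arguments, ideally as the same user the worker runs under, so that permission or auth-file differences show up.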

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Fri Jun 26, 2015 10:23 am
by bosecorp
The error is random. If I run it from the CLI, it might work.

It seems that I only get the error at times.

And yes, I am using mod_gearman.

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Fri Jun 26, 2015 10:49 am
by lmiltchev
"if I run it from the CLI, it might work."
It is possible that the check might work locally, but it would fail when it's run from the remote worker if you haven't copied over the auth file. This can explain why it is working intermittently.
If this is a timeout issue, which is not very likely, you can try increasing the timeout to, say, 60 seconds (the default is 30).

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 8:49 am
by bosecorp
I tried increasing the timeout to 90.

The worker this check runs on is on the same server as XI, so it cannot be an issue with the scripts.

I can try moving this check to a different worker.

I agree, this does not seem to be a timeout issue.

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 9:33 am
by lmiltchev
Which Mod Gearman version are you currently using? Can you show us the worker config? Hide any sensitive info.

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 10:04 am
by bosecorp
mod_gearman.x86_64 1.5.0b1-1.el6 @/mod_gearman-1.5.0b1-1.el6.x86_64

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 2:40 pm
by tgriep
Can you check whether you are running the latest VMware wizard?
Go to Admin > Manage Config Wizards and see if the VMware wizard is at the latest version.
Also, are your gearman settings configured so that this check runs only on the local server and not on a worker?

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 2:58 pm
by bosecorp
Yes, I do have one worker.

Here is the version: 1.6

Re: Constantly getting Unknown Errors monitoring ESX

Posted: Mon Jun 29, 2015 3:32 pm
by jdalrymple
Hi Bosecorp

I reviewed your latest profile.zip, and it appears that your ESX servers should indeed be monitored by your primary XI instance and not a remote worker (assuming the configs haven't changed a lot).

I think the next step is to see if there are any verbose errors in your nagios.log

grep -i esx /usr/local/nagios/var/nagios.log
You might find a great deal of useless output from that command, but if this is happening 40 times a day you should see some output related to the issue. Take a look, and if you see anything interesting, point it out here. Otherwise we may have to turn on debugging at your local gearman worker and look there.
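
To cut through the noise, the grep above can be narrowed to the "Unknown error" text from the alert and tallied by hour, which may show whether the failures cluster at particular times. This is only a sketch: `esx_errors_by_hour` is a hypothetical helper name, it assumes each nagios.log line starts with the usual `[epoch]` timestamp, and it reports hours in UTC rather than local time.

```shell
#!/bin/sh
# Count ESX "Unknown error" entries in a Nagios log per UTC hour of day.
# Usage: esx_errors_by_hour /usr/local/nagios/var/nagios.log
# (esx_errors_by_hour is a hypothetical helper; assumes "[epoch] ..." log lines.)
esx_errors_by_hour() {
    grep -i 'esx' "$1" | grep -i 'unknown error' |
    awk '{
        # First field looks like [1435259669]; strip the brackets.
        ts = substr($1, 2, length($1) - 2)
        hour = int((ts % 86400) / 3600)
        count[hour]++
    }
    END {
        # Print only the hours that had at least one matching entry.
        for (h = 0; h < 24; h++)
            if (h in count) printf "%02d:00 UTC  %d\n", h, count[h]
    }'
}
```

If the counts line up with a backup window or another scheduled job, that would be worth mentioning here before turning on worker debugging.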