CRITICAL: Return code of 255 is out of bounds. (worker:)
Posted: Fri Apr 21, 2017 1:50 pm
by jordielaforge15
I am getting the error - CRITICAL: Return code of 255 is out of bounds. (worker: worker-name). Right underneath that error is this one: UNKNOWN - check_by_ssh: Remote command 'nice -n19 /home/vi-admin/box293_check_vmware.pl --server 10.1.2.100 --check Host_CPU_Usage --concurrent_checks 75 --timeout 120' returned status 255.
I have a mod_gearman worker running a check through a VMA (VMware management assistant) to monitor ESXi host health, memory, CPU, etc. The check works fine, but fairly often it throws the above errors for just a few seconds, then they clear and everything goes back to OK. I can run the check manually from the worker with 100% success, but Nagios is throwing these errors quite often. Any ideas on how to make this stop happening?
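For context on the two messages: Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN), and ssh exits with 255 when ssh itself hits an error, so a 255 passed back to the worker falls outside the range it can map to a state. A minimal Python sketch of that mapping follows; `describe_return_code` is a hypothetical helper name for illustration, not part of Nagios or mod_gearman.

```python
# Standard Nagios plugin exit codes; any other value is reported as out of bounds.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code):
    """Return the plugin state name, or flag the code as out of bounds."""
    if code in PLUGIN_STATES:
        return PLUGIN_STATES[code]
    return "out of bounds ({})".format(code)


if __name__ == "__main__":
    # 255 is the value seen in the error message in this post.
    for code in (0, 2, 255):
        print(code, "->", describe_return_code(code))
```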
Re: CRITICAL: Return code of 255 is out of bounds. (worker:)
Posted: Fri Apr 21, 2017 2:31 pm
by dwhitfield
It looks like this is your first post, so let me say this. If you're a customer, you should post in the customer forum. If you don't have access to the customer forum (and are a customer!) email [email protected] with your forum username, forum email, and customer email, and they will set you up for customer forum access.
With that out of the way...
This may be due to a recent upgrade of perl on the mod_gearman server.
Can you please post your yum.log (probably in /var/log), assuming the mod_gearman server uses yum? If not, post the apt logs or the equivalent.
The following may also be useful (you likely only have one of these files):
Code:
tail -n 500 /var/log/gearmand/gearmand.log
tail -n 500 /var/log/gearmand.log
I've seen cpan Config::IniFiles fix this for others, but make sure you have a backup first. Mixing cpan and yum perl can cause issues.
Re: CRITICAL: Return code of 255 is out of bounds. (worker:)
Posted: Fri Apr 21, 2017 3:06 pm
by jordielaforge15
Thanks for the reply.
[2017-04-21 14:50:36][7492][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State
[2017-04-21 14:51:26][13637][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host Health
[2017-04-21 14:51:26][9592][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Memory
[2017-04-21 14:51:44][29410][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State
[2017-04-21 14:52:36][7492][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host Health
[2017-04-21 14:52:46][25265][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State
[2017-04-21 14:52:56][26210][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Memory
[2017-04-21 14:53:46][17291][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host Health
[2017-04-21 14:53:56][2659][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State
[2017-04-21 14:54:16][4535][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Memory
Looks like it is timing out. The last update in the yum log was from last October, so I don't think that is the issue.
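To see which checks hit the timeout most often, the log lines above can be tallied per service. This is only a sketch: the regular expression is written against the line format shown in this post, and `count_timeouts` is a made-up helper name, not a mod_gearman tool.

```python
import re
from collections import Counter

# Matches worker log lines of the form shown above, for example:
# [2017-04-21 14:50:36][7492][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State
TIMEOUT_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\[\d+\]\[\w+\s*\]\s+"
    r"timeout \((?P<seconds>\d+)s\) hit for servicecheck: (?P<check>.+)$"
)


def count_timeouts(lines):
    """Count timeout events per (service check, timeout in seconds)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_LINE.match(line.strip())
        if match:
            key = (match.group("check"), int(match.group("seconds")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2017-04-21 14:50:36][7492][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State",
        "[2017-04-21 14:51:26][9592][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Memory",
        "[2017-04-21 14:51:44][29410][INFO ] timeout (60s) hit for servicecheck: AGP_01VM01 - 01VM01 Host State",
    ]
    for (check, seconds), hits in count_timeouts(sample).most_common():
        print("{} hit the {}s timeout {} time(s)".format(check, seconds, hits))
```

In practice the lines would come from the worker log file rather than a hard-coded list.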
Re: CRITICAL: Return code of 255 is out of bounds. (worker:)
Posted: Sun Apr 23, 2017 5:15 pm
by dwhitfield
Your check's --timeout is set to 120, but the log shows mod_gearman timing the checks out at 60 seconds. You could try raising the mod_gearman timeout to 120 and see if you still get the same errors.
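The reasoning is that when several layers each enforce their own timeout, the shortest one fires first, so the plugin never gets its full 120 seconds. A rough Python sketch of that comparison is below. The layer names are just labels for the two values seen in this thread, and `first_timeout` is a hypothetical helper.

```python
def first_timeout(layers):
    """Return (name, seconds) for the layer with the smallest timeout."""
    name = min(layers, key=layers.get)
    return name, layers[name]


if __name__ == "__main__":
    layers = {
        "plugin --timeout": 120,   # from the check command in the first post
        "mod_gearman worker": 60,  # from "timeout (60s) hit" in the worker log
    }
    name, seconds = first_timeout(layers)
    print("{} fires first at {}s".format(name, seconds))
```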