CRITICAL: Return code of 255 is out of bounds. (worker:)

yangzhiyao2653
Posts: 27
Joined: Fri May 18, 2018 1:15 am

CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by yangzhiyao2653 »

I am getting the error - CRITICAL: Return code of 255 is out of bounds. (worker: worker-name). Right underneath that error is this one: UNKNOWN - check_by_ssh: Remote command '$USER1$/check_by_ssh -E 1 -t 120 -l vi-admin -H $ARG1$ -C "~/box293_check_vmware.pl --timeout 120 --server $ARG2$ --check Host_Memory_Usage --host \"$HOSTADDRESS$\" --perfdata_option Memory_Free:1,Memory_Total:1,Memory_Used:1,Memory_Used%:1 --reporting_si \"$ARG4$\" --warning \"$ARG5$\" --critical \"$ARG6$\" \"$ARG7$\" \"$ARG8$\""' returned status 255.
I have a Mod-Gearman worker running a check through a vMA (vSphere Management Assistant) to check ESXi host health, memory, CPU, etc. The check works fine, but fairly often it throws the errors above for a few seconds; then they clear and everything goes back to OK. I can run the check manually from the worker with 100% success, but Nagios reports these errors quite often. Any ideas on how to stop this from happening?
My problem is just like the one in this thread, but that thread did not solve the actual problem: https://support.nagios.com/forum/viewto ... =6&t=43576
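For context on the message itself: Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN), and anything else is reported as out of bounds. The ssh client itself exits with 255 when an error occurs on its side, so a 255 here may point at the SSH connection rather than at the plugin. A minimal sketch of that mapping follows; the function name `describe_return_code` is hypothetical and not part of Nagios or Mod-Gearman.

```python
# Valid Nagios plugin exit codes and their state names.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code: int) -> str:
    """Return the state name for a plugin exit code, or flag it as out of bounds.

    Hypothetical helper for illustration only.
    """
    if code in NAGIOS_STATES:
        return NAGIOS_STATES[code]
    return f"out of bounds ({code})"


if __name__ == "__main__":
    for code in (0, 2, 255):
        print(code, describe_return_code(code))
```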
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by scottwilkerson »

Is it possible that any of your workers don't have shared keys with the server? I see it is calling check_by_ssh

This could cause it to fail on one worker and not another.
Former Nagios employee
yangzhiyao2653
Posts: 27
Joined: Fri May 18, 2018 1:15 am

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by yangzhiyao2653 »

I'm sure there is no problem with the worker, because it also processes other queues. I can run the check manually from the worker with 100% success, but Nagios reports these errors quite often.
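Since the failure is intermittent, a single manual run may not catch it. One possible approach is to run the same command repeatedly on the worker, ideally under the same user account the worker service runs as, and count the exit codes. The sketch below assumes that approach; `tally_exit_codes` is a hypothetical helper, and the demo command is a placeholder to be replaced with the real check_by_ssh command line.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(command, runs=20, timeout=120):
    """Run `command` several times and count each exit code seen.

    Runs that exceed `timeout` seconds are counted under the key "timeout".
    """
    counts = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout)
            counts[result.returncode] += 1
        except subprocess.TimeoutExpired:
            counts["timeout"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder command: substitute the actual check command as a list of arguments.
    demo = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(tally_exit_codes(demo, runs=3))
```

Any key other than 0 to 3 in the result, or a "timeout" entry, would show that the manual runs are not 100% successful after all.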
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by scottwilkerson »

Are there any errors in the mod_gearman logs on the worker that is having this issue?
yangzhiyao2653
Posts: 27
Joined: Fri May 18, 2018 1:15 am

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by yangzhiyao2653 »

I think I found the reason. My mod_gearman worker is configured with a timeout of 120 s, but the timeout shown in the log is 60 s. I restarted the service, but the result is still the same. Can you explain this?
In /etc/mod_gearman2/worker.conf, job_timeout=120, but my mod_gearman_worker reports timeout (60s).
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by scottwilkerson »

You may need to adjust your service_check_timeout setting in your nagios.cfg and then restart Nagios.

This is the maximum time, in seconds, that Nagios will wait for a service check to return.
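Several timeouts are stacked in this setup (service_check_timeout in nagios.cfg, job_timeout in worker.conf, and the -t / --timeout values in the command), and the smallest one is likely the one that fires first. A rough sketch for comparing them is below. The helper names are hypothetical, the sample values are made up for illustration, and treating the last assignment in a file as the effective one is an assumption about how these files are read.

```python
def read_setting(config_text: str, key: str):
    """Return the last value assigned to `key` in key=value text, or None.

    Assumption: if a key appears more than once, the last assignment wins.
    """
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, raw = line.partition("=")
        if sep and name.strip() == key:
            value = raw.strip()
    return value


def smallest_timeout(timeouts: dict):
    """Return (name, seconds) for the smallest timeout that is set."""
    known = {name: int(value) for name, value in timeouts.items() if value is not None}
    return min(known.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical file contents; read the real files on the server and on each worker.
    nagios_cfg = "service_check_timeout=60\n"
    worker_conf = "# worker settings\njob_timeout=120\n"
    timeouts = {
        "service_check_timeout": read_setting(nagios_cfg, "service_check_timeout"),
        "job_timeout": read_setting(worker_conf, "job_timeout"),
        "check_by_ssh -t": "120",
    }
    print(smallest_timeout(timeouts))
```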
yangzhiyao2653
Posts: 27
Joined: Fri May 18, 2018 1:15 am

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by yangzhiyao2653 »

I had already set it to 120 seconds before, but the problem still has not improved.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by scottwilkerson »

When you looked at the /etc/mod_gearman2/worker.conf for the job_timeout setting, did you look at the file on all of your workers?

Did you restart the mod_gearman workers after adjusting it up to 120?
yangzhiyao2653
Posts: 27
Joined: Fri May 18, 2018 1:15 am

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by yangzhiyao2653 »

I checked the worker that reports the warning, but I have adjusted all of the workers and restarted the mod-gearman2-worker and gearmand services; nagios.cfg was also adjusted to 120 seconds. I can't think of what could be causing the timeout.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: CRITICAL: Return code of 255 is out of bounds. (worker:)

Post by scottwilkerson »

Do you see the errors in the worker logs?