Return code of 255 for service alerts are being triggered

Posted: Tue Sep 22, 2020 11:59 am
by Jagannadharao
Dear Team,
We are seeing frequent "Return code of 255 for service 'AAAAAA' on host 'BBBBBBB' was out of bounds" alerts for one or more file system usage services on a Solaris 11 server. This is generating a high volume of false alerts every day.

Kindly advise how to stop these alerts from being sent repeatedly, and instead send a single alert indicating that one or more services are returning this error.

Please refer to the attached screenshot for additional details.
A total of 310 alerts were triggered in a 24-hour period.



Thank you.

Best Regards,
Jagan

Re: Return code of 255 for service alerts are being triggered

Posted: Wed Sep 23, 2020 11:17 am
by benjaminsmith
Hi,

That error message usually means Nagios was able to connect to the remote host but did not get a valid return code from the plugin. Please run the service check directly from the terminal and post the output to the thread. Also, immediately after running the check, type echo $? to see the return code provided by the plugin.

For example:

Code:

/usr/local/nagios/libexec/check_nrpe -H REMOTEHOST -c check_disk
echo $?

Please refer to the following KB article for instructions on testing commands from the shell.

Nagios XI - How To Test Check Commands From The Command-line
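
The manual test above can also be wrapped in a small helper that prints the plugin output together with the return code and the state that code maps to. This is only a rough sketch: the function names nagios_state and run_check are made up for illustration and are not part of Nagios. It relies on the standard plugin convention that 0, 1, 2 and 3 mean OK, WARNING, CRITICAL and UNKNOWN, and that anything else (such as 255) is reported as out of bounds.

```shell
#!/bin/sh
# Hypothetical helpers for testing a check by hand; not shipped with Nagios.

# Map a plugin return code to the state it represents.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF BOUNDS" ;;   # e.g. 255
    esac
}

# Run a check command, print its output (stdout and stderr),
# then print the return code and the state it maps to.
run_check() {
    output=$("$@" 2>&1)
    rc=$?
    printf '%s\n' "$output"
    printf 'return code: %s (%s)\n' "$rc" "$(nagios_state "$rc")"
    return "$rc"
}
```

With these functions loaded in the shell, the check from the example could be run as run_check /usr/local/nagios/libexec/check_nrpe -H REMOTEHOST -c check_disk. Running it as the nagios user is usually closest to what the monitoring engine itself sees.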

In the meantime, if you'd like to reduce the number of notifications, you can set the notification interval to 0 and it will only send a single notification.

Regards,
Benjamin
notification_interval: This directive is used to define the number of "time units" to wait before re-notifying a contact that this service is still down or unreachable. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this host - only one problem notification will be sent out.
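
As a rough illustration of the directive described above, a service definition with re-notification disabled might look like the fragment below. The host and service names are the placeholders from the original post; the generic-service template and the check_nrpe!check_disk command are assumptions and will differ per installation. In Nagios XI this setting is normally changed through the Core Config Manager rather than by editing files by hand.

```
define service {
    use                     generic-service         ; assumed template name
    host_name               BBBBBBB
    service_description     AAAAAA
    check_command           check_nrpe!check_disk   ; assumed check command
    notification_interval   0                       ; send one problem notification, no re-notification
}
```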