We're using Nagios Core 3.5.1, and every once in a while we run into a confusing problem.
All of the services on every one of our hosts return "(null)" as their output, with no apparent pattern. Occasionally "No address associated with hostname" appears as well.
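"No address associated with hostname" appears to be the glibc resolver's message for a lookup that returns no address, so it may be worth recording whether name resolution on the Nagios server itself fails during an outage. A minimal Python sketch, assuming the monitored hosts are resolved through the system resolver; `check_resolution` and `watch_resolution` are hypothetical helper names, not part of Nagios:

```python
import socket
import time


def check_resolution(names):
    """Resolve each name with getaddrinfo and record the outcome.

    Returns a dict mapping name -> (ok, detail), where detail is the sorted
    list of addresses on success or the resolver's error text on failure.
    """
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
            # info[4] is the sockaddr tuple; its first element is the address.
            addrs = sorted({info[4][0] for info in infos})
            results[name] = (True, addrs)
        except socket.gaierror as exc:
            # On glibc systems this can be "No address associated with hostname".
            results[name] = (False, exc.strerror)
    return results


def watch_resolution(names, interval=60):
    """Loop forever, printing a timestamped line for every failed lookup."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for name, (ok, detail) in check_resolution(names).items():
            if not ok:
                print(f"[{stamp}] {name}: {detail}", flush=True)
        time.sleep(interval)
```

For example, `watch_resolution(["hosta.example.com", "hostb.example.com"])` (hypothetical names; substitute the host addresses from your own Nagios configuration) could run in a screen session on the Nagios server, and its output compared with the alert times in the service history.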
Here's a small snippet of our service history as the above was happening:
[2015-05-25 14:17:42] SERVICE ALERT: hosta;HTTPS;CRITICAL;SOFT;1;No address associated with hostname
[2015-05-25 14:17:42] SERVICE ALERT: hostb;Swap status Quiet;WARNING;SOFT;3;(null)
[2015-05-25 14:17:42] SERVICE ALERT: hostc;Swap status;WARNING;SOFT;1;(null)
[2015-05-25 14:17:42] SERVICE ALERT: hostd;Ping some.site.com;WARNING;SOFT;1;(null)
[2015-05-25 14:17:42] SERVICE ALERT: hoste;Current Users;WARNING;SOFT;1;(null)
About 10-20 minutes later, the services all start to recover and we get a mixture of warning and recovery notifications. A little while after that, everything is back to OK.
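Since every alert in the snippet carries the same timestamp, it may help to confirm across the full log that the "(null)" and "No address associated with hostname" results arrive as one burst for all hosts at once, which would point at the Nagios server rather than at the individual hosts or NRPE agents. A rough Python sketch that groups matching SERVICE ALERT lines by timestamp, assuming the line layout shown above; the raw nagios.log file writes a Unix timestamp inside the brackets, which the pattern also accepts. `suspicious_alerts` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [2015-05-25 14:17:42] SERVICE ALERT: hostb;Swap status;WARNING;SOFT;3;(null)
# and the raw nagios.log form with an epoch timestamp, e.g. [1432563462].
ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)

DEFAULT_MARKERS = ("(null)", "No address associated with hostname")


def suspicious_alerts(lines, markers=DEFAULT_MARKERS):
    """Group alerts whose plugin output contains any marker, keyed by timestamp.

    Returns {timestamp: [(host, service, output), ...]}; lines that are not
    SERVICE ALERT entries are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if not m:
            continue
        output = m.group("output")
        if any(marker in output for marker in markers):
            groups[m.group("ts")].append(
                (m.group("host"), m.group("service"), output)
            )
    return dict(groups)
```

Passing an open file handle for nagios.log and printing the number of entries per timestamp would show whether the bad results always cluster at a single instant.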
This happens every few months, without any changes to our configs.
The logs do not show anything beyond what is already in the plugin output. Also, running the affected checks manually, either through check_nrpe or by executing the plugin directly as the nagios user, returns sane values.
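Because the problem cannot be reproduced on demand, one option is to run an affected check in a loop outside of Nagios and log only when the result looks broken, so the next occurrence leaves a timestamped trace to compare against the Nagios logs. A minimal sketch, assuming Python 3.7+; `run_check`, `looks_broken`, and `watch` are hypothetical helpers, and the command in the usage note is a placeholder:

```python
import subprocess
import time


def run_check(cmd, timeout=30):
    """Run a plugin command and return (exit_code, first_line_of_stdout).

    Returns (None, reason) if the command could not be started or timed out.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return None, f"{type(exc).__name__}: {exc}"
    lines = proc.stdout.splitlines()
    return proc.returncode, (lines[0] if lines else "")


def looks_broken(code, output):
    """True when the result resembles the symptom: no exit code at all,
    empty output, or output that is literally '(null)'."""
    return code is None or output.strip() in ("", "(null)")


def watch(cmd, interval=60, log_path="check_watch.log"):
    """Loop forever, appending a timestamped line whenever a check looks broken."""
    while True:
        code, output = run_check(cmd)
        if looks_broken(code, output):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as fh:
                fh.write(f"[{stamp}] rc={code} output={output!r}\n")
        time.sleep(interval)
```

Run as the nagios user on the Nagios server, something like `watch(["/usr/local/nagios/libexec/check_nrpe", "-H", "hostb", "-c", "check_swap"])` would exercise the same path Nagios uses; the plugin path and the `check_swap` NRPE command name are assumptions to replace with your own.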
We would like to know:
A. What causes this, or how we can find out what causes it.
B. If this is a known issue, how to deal with it.
As mentioned, the logs show nothing useful. What adds to the confusion is that when we manually execute any of the affected checks, they return OK values; we have not yet been able to reproduce a "(null)" output by hand.
Does anyone have advice on what I can do the next time this happens to find the cause and implement a fix? Any and all tips are welcome.