Dear Support team,
We are planning to roll out NRPE on Linux servers in our infrastructure to monitor disks, hardware and system load.
I installed NRPE on a few servers for testing purposes and everything seems fine. When I shut down a server, Nagios raises a ticket for the host being down, but at times the server takes a while to stop all its services before it halts. In that case Nagios first raises a service ticket stating that NRPE has timed out, and later a host down ticket. This behaviour is legitimate, but in my case it results in an unwanted double alert. Is there a trick to avoid this?
I understand one way is to increase the value of max_check_attempts for the NRPE service check, but we don't want to risk setting it too high.
Thanks in advance.
NRPE and alerts
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: NRPE and alerts
You could also increase the retry interval. By default, a failing check is retried every minute until max_check_attempts is reached.
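To compare the two options, here is a rough sketch of how long a failing service check stays in a soft state before it goes hard and alerts. It assumes retry_interval is given in minutes (the usual case when interval_length is left at 60 seconds) and ignores scheduling latency and the check timeout itself. The function name is only illustrative and not part of Nagios.

```python
def minutes_until_hard_state(max_check_attempts: int, retry_interval: float) -> float:
    """Approximate minutes between the first failed check and the hard state.

    The first failure counts as attempt 1; each further attempt is scheduled
    retry_interval minutes after the previous one (assumption: no latency).
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    if retry_interval < 0:
        raise ValueError("retry_interval must not be negative")
    return (max_check_attempts - 1) * retry_interval


# Illustrative values only: (max_check_attempts, retry_interval in minutes)
for attempts, retry in [(3, 1), (3, 3), (6, 1)]:
    window = minutes_until_hard_state(attempts, retry)
    print(f"max_check_attempts={attempts}, retry_interval={retry}: "
          f"about {window} minutes before the service alert")
```

With these numbers, raising retry_interval from 1 to 3 while keeping 3 attempts gives the host roughly 6 minutes to finish shutting down, slightly more than doubling max_check_attempts to 6 would, and without adding extra check executions.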