Best practice for monitoring process crash/restarts?
Posted: Wed Mar 07, 2012 4:46 pm
We have some processes that we'd like to be alerted about if they crash. Normally we'd just monitor the service with Nagios XI, but this process restarts itself after a crash and logs the crash in /var/log/messages.
Are there any best practices for detecting that this process has crashed? Simply monitoring the service won't work, since a crash and restart will most likely happen between two NRPE checks. We can parse the log files for the crash message, but then when do you take the alert off the board as all clear?
Just wondering if anyone else is monitoring for these sorts of crash/restart events, and how you handle them.
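To make the log-parsing idea concrete, here is a rough sketch of one possible approach, not a recommended practice: treat a crash as an alert only while it is younger than a fixed time window, so the check clears itself once the window passes. Everything named here is an assumption for illustration: the `myproc.*crashed` pattern, the `check_recent_crashes` function, and the 30-minute window are hypothetical, and the parsing assumes traditional syslog timestamps such as "Mar  7 16:46:01" at the start of each line.

```python
import re
from datetime import datetime, timedelta

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

# Hypothetical crash marker; replace with whatever the process really logs.
CRASH_PATTERN = re.compile(r"myproc.*crashed")


def check_recent_crashes(lines, now, window=timedelta(minutes=30)):
    """Return (exit_code, message) based on crash lines newer than `window`."""
    crashes = []
    for line in lines:
        if not CRASH_PATTERN.search(line):
            continue
        try:
            # Traditional syslog stamps carry no year, so borrow it from `now`.
            ts = datetime.strptime(f"{now.year} {line[:15]}", "%Y %b %d %H:%M:%S")
            if ts > now:  # entry was written before a year rollover
                ts = ts.replace(year=now.year - 1)
        except ValueError:
            continue  # line does not start with a timestamp we understand
        if now - ts <= window:
            crashes.append(ts)
    if crashes:
        last = max(crashes)
        return CRITICAL, f"CRITICAL - {len(crashes)} crash(es) in window, last at {last:%H:%M:%S}"
    return OK, "OK - no crashes in window"
```

A wrapper script run through NRPE could read /var/log/messages, call this function with `datetime.now()`, print the message, and exit with the returned code. The window length is the trade-off: too short and a crash can clear before anyone sees it, too long and an old crash lingers on the board.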