Best practice for monitoring process crash/restarts?

cscholz · Post by **cscholz** » Wed Mar 07, 2012 4:46 pm

We have some processes that we'd like to be alerted about if they crash. Typically we'd just monitor the service as normal with NagiosXI. However, this process restarts itself after it crashes and logs the crash in /var/log/messages.

Are there any best practices for monitoring when this process has crashed? You can't simply monitor the service, since there's a good chance that a crash and restart will both happen between two NRPE checks. We can parse the log files and check for the crash, but then when do you take it off the board as all clear?

Just wondering if anyone else is monitoring for these sorts of crash/restart events, and how you handle them.

scottwilkerson · Post by **scottwilkerson** » Thu Mar 08, 2012 1:26 pm

This is going to come down to preference: when do you want it marked all clear?

What I mean by that is, is it all clear if the process restarts correctly?

cscholz · Post by **cscholz** » Mon Mar 12, 2012 8:54 am

scottwilkerson wrote:This is going to come down to preference, when do you want it marked all clear?

What I mean by that is, is it all clear if the process restarts correctly?

If it restarted cleanly and has been running that way for, say, 10 minutes without another restart, I would consider that all clear. The most important thing is the email alert to the team so we know to pull the logs for the crash.

I am tempted to use logwatch for this, but I don't know if that can run every minute without adding to system load on production systems.
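To make the rule above concrete, here is a minimal Python sketch of a Nagios-style check: it reports CRITICAL while the newest crash line in the log is less than 10 minutes old, and OK otherwise. This is only an illustration, not a tested plugin. The crash pattern and the process name `myproc` are made up, and the timestamp parsing assumes the classic syslog format without a year (e.g. `Mar 12 08:41:07`).

```python
import re
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical crash marker: replace with whatever the process really logs.
CRASH_PATTERN = re.compile(r"myproc.*crash", re.IGNORECASE)

# Assumed classic syslog timestamp (first 15 characters, no year field).
TS_FORMAT = "%Y %b %d %H:%M:%S"


def last_crash_time(lines, now):
    """Return the timestamp of the newest matching log line, or None."""
    latest = None
    for line in lines:
        if not CRASH_PATTERN.search(line):
            continue
        try:
            # Syslog lines carry no year, so borrow the current one.
            ts = datetime.strptime(f"{now.year} {line[:15]}", TS_FORMAT)
            if ts > now:
                # A "future" line must have been written last year.
                ts = ts.replace(year=now.year - 1)
        except ValueError:
            continue  # line does not start with a parseable timestamp
        if latest is None or ts > latest:
            latest = ts
    return latest


def check(lines, now, quiet_period=timedelta(minutes=10)):
    """Return (exit_code, message) according to the quiet-period rule."""
    crashed_at = last_crash_time(lines, now)
    if crashed_at is not None and now - crashed_at < quiet_period:
        return CRITICAL, f"CRITICAL - crash logged at {crashed_at:%b %d %H:%M:%S}"
    minutes = int(quiet_period.total_seconds() // 60)
    return OK, f"OK - no crash logged in the last {minutes} minutes"
```

A wrapper run through NRPE would open /var/log/messages, call `check(fh, datetime.now())`, print the message, and exit with the returned code. The check interval has to be shorter than the quiet period, otherwise a crash can age out before it is ever seen. This sketch also rereads the whole file on every run, which a real plugin would want to avoid on busy production systems.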

scottwilkerson · Post by **scottwilkerson** » Mon Mar 12, 2012 12:02 pm

I found some third-party plugins that look like they could do the trick.

The first is at http://www.unixautomation.com/unix-log- ... alysis.htm. It looks like the developer wants $9.95 for it, but it does exactly what you want: you can set both the amount of time and the pattern. I haven't used it, I have only read the documentation.

The second option is open source and more comprehensive, and I believe you may be able to use it to fulfill your needs:
http://labs.consol.de/lang/en/nagios/check_logfiles/

Nagios Support Forum