Best practice for monitoring process crash/restarts?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
cscholz
Posts: 36
Joined: Wed Mar 07, 2012 4:40 pm

Best practice for monitoring process crash/restarts?

Post by cscholz »

We have some processes that we'd like to be alerted about if they crash. Typically we'd just monitor the service as normal with Nagios XI; however, this process restarts itself after it crashes and logs the crash in /var/log/messages.

Are there any best practices for monitoring when this process has crashed? You can't simply monitor the service, since there's a good chance that a crash and restart will both happen between two NRPE checks. We can parse log files and check for the crash, but then when do you take it off the board as all clear?

Just wondering if anyone else is monitoring for these sorts of crash/restart events, and how you handle them.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Best practice for monitoring process crash/restarts?

Post by scottwilkerson »

This is going to come down to preference: when do you want it marked all clear?

What I mean by that is, is it all clear if the process restarts correctly?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Best practice for monitoring process crash/restarts?

Post by cscholz »

scottwilkerson wrote: This is going to come down to preference: when do you want it marked all clear?

What I mean by that is, is it all clear if the process restarts correctly?
If it restarted cleanly and has been running that way for, say, 10 minutes without another restart I would consider that all clear. The most important thing is the email alert to the team so we know to pull the logs for the crash.

I am tempted to use logwatch for this, but I don't know if that can run every minute without adding to system load on production systems.
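
To make the rule above concrete (alert when a crash is logged, go back to all clear once 10 minutes pass without another one), here is a minimal sketch of a Nagios-style check in Python. It is only an illustration under several assumptions: the log uses the classic syslog timestamp prefix ("Mar  7 16:40:01") with English month names, the crash pattern `myproc.*(segfault|crashed)` is a made-up placeholder, and the function names are hypothetical rather than part of any existing plugin.

```python
import re
import sys
from datetime import datetime, timedelta

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def parse_syslog_time(line, now):
    """Parse a classic syslog prefix like 'Mar  7 16:40:01'; None if absent."""
    # Syslog lines carry no year, so try the current year first and fall
    # back to the previous one if the result would lie in the future.
    for year in (now.year, now.year - 1):
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        if stamp <= now + timedelta(days=1):
            return stamp
    return None


def check_crashes(lines, pattern, now, window=timedelta(minutes=10)):
    """Return (exit_code, message); CRITICAL while the newest match is inside the window."""
    regex = re.compile(pattern)
    latest = None
    for line in lines:
        if regex.search(line):
            stamp = parse_syslog_time(line, now)
            if stamp is not None and (latest is None or stamp > latest):
                latest = stamp
    if latest is not None and now - latest <= window:
        return CRITICAL, f"CRITICAL - crash logged at {latest:%H:%M:%S}"
    minutes = int(window.total_seconds() // 60)
    return OK, f"OK - no crash logged in the last {minutes} minutes"


def main(path="/var/log/messages", pattern=r"myproc.*(segfault|crashed)"):
    # Hypothetical defaults; adjust the path and pattern to the real process.
    with open(path, errors="replace") as log:
        code, message = check_crashes(log, pattern, datetime.now())
    print(message)
    return code

# In a real plugin file, finish with: sys.exit(main())
```

Run through NRPE, a check like this stays CRITICAL for 10 minutes after the last logged crash, which should trigger the email, and then recovers on its own. The check interval has to be shorter than the window, otherwise a crash can age out between two checks. This sketch also rereads the whole log on every run, which is the kind of overhead a dedicated log-checking plugin is designed to avoid.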

Re: Best practice for monitoring process crash/restarts?

Post by scottwilkerson »

I found some 3rd party plugins that look like they could do the trick.

The first is at http://www.unixautomation.com/unix-log- ... alysis.htm and, although it looks like the developer wants $9.95 for it, it does exactly what you want: you can set the amount of time and the pattern. I haven't used this; I have only read the documentation.

The second option is open source and more comprehensive, and I believe you may be able to use it to fulfill your needs:
http://labs.consol.de/lang/en/nagios/check_logfiles/