restart a service by SSH.

lraymond
Posts: 29
Joined: Thu Jul 12, 2012 10:14 am

restart a service by SSH.

Post by lraymond »

OK, I had a nice long thread that sadly never reached a working solution: a Java server runs out of RAM, and I can trip a memory alert, but using NRPE I just can't kill and restart the service. I got some great suggestions, but never got the kill/restart to work.

So, still having the Java issues, I am wondering whether I can kick off a local process instead. I'm sure I can, but I'm not sure what or how. I can set up some SSH keys and some port forwarding on my load balancer, so that when check_nrpe!check_mem goes critical, Nagios simply runs /usr/lib/nagios/plugins/restartremoteservice.sh

That would be a local bash script that would SSH in (using keys, so no password), kill the Java PID, and restart the app!
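A minimal sketch of what such a local script might look like. The host, port, key path, and remote restart command are hypothetical placeholders (the thread does not give them), and the script defaults to a dry run that only prints the ssh command it would execute.

```shell
#!/usr/bin/env bash
# Sketch of restartremoteservice.sh. All values below are ASSUMED placeholders,
# not taken from the thread: adjust host, forwarded port, key, and command.
REMOTE_HOST="${REMOTE_HOST:-nagios@lb.example.com}"
REMOTE_PORT="${REMOTE_PORT:-2222}"
KEY_FILE="${KEY_FILE:-/var/lib/nagios/.ssh/id_rsa}"
# pkill -x matches the process name exactly, so the remote shell running this
# command string (which contains the word "java") is not matched itself.
REMOTE_CMD="${REMOTE_CMD:-pkill -x java; sleep 5; /etc/init.d/myapp start}"

# Build the ssh command as an array so each argument stays intact.
build_ssh_cmd() {
    SSH_CMD=(ssh -i "$KEY_FILE" -p "$REMOTE_PORT"
             -o BatchMode=yes -o ConnectTimeout=10
             "$REMOTE_HOST" "$REMOTE_CMD")
}

# Print the command in dry-run mode (the default); run it when DRY_RUN=0.
restart_remote() {
    build_ssh_cmd
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "${SSH_CMD[*]}"
    else
        "${SSH_CMD[@]}"
    fi
}

restart_remote
```

BatchMode=yes makes ssh fail instead of prompting for a password, which is usually what you want when Nagios runs the script unattended. Once the printed command looks right, running it with DRY_RUN=0 performs the actual restart.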

Thanks to everyone who helped with restart version 1; now I'm going to try restart version 2 :)
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Re: restart a service by SSH.

Post by yancy »

lraymond,

It sounds like you already have a working script which will SSH to a machine and perform some actions.

You can set up event handling to execute your script upon a particular event (such as a critical check_nrpe result):
http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
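As a rough illustration of the event handler approach, here is a sketch of a wrapper that Nagios could call with the service state macros. The wrapper's file name (restart_handler.sh) and the command definition shown in the comment are assumptions for illustration; only the restart script path comes from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical event handler wrapper. An ASSUMED matching command definition:
#
#   define command{
#       command_name  restart_gfs3
#       command_line  /usr/lib/nagios/plugins/restart_handler.sh $SERVICESTATE$ $SERVICESTATETYPE$
#   }
#
# Nagios runs event handlers on state changes, including recoveries, so the
# wrapper has to decide for itself whether the event warrants a restart.
RESTART_SCRIPT="${RESTART_SCRIPT:-/usr/lib/nagios/plugins/restartremoteservice.sh}"

# Print "restart" for a HARD CRITICAL state and "ignore" for everything else.
decide_action() {
    local state="$1" state_type="$2"
    case "$state" in
        CRITICAL)
            if [ "$state_type" = "HARD" ]; then
                echo restart
            else
                echo ignore
            fi
            ;;
        *)
            echo ignore
            ;;
    esac
}

# Run the restart script only when decide_action says so.
handle_event() {
    if [ "$(decide_action "$1" "$2")" = "restart" ]; then
        "$RESTART_SCRIPT"
    fi
}

# Act only when Nagios passed both macros as arguments.
if [ "$#" -ge 2 ]; then
    handle_event "$1" "$2"
fi
```

Reacting only to HARD states is one possible policy; a handler could equally act on a particular SOFT attempt if an earlier restart is preferred.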

Regards,

-Yancy
lraymond
Posts: 29
Joined: Thu Jul 12, 2012 10:14 am

Re: restart a service by SSH.

Post by lraymond »

Cool, got things going. The only issue seems to be that when the event fires, it does find and kill Java and writes an entry in the log, but then it fires again 2 minutes later. I would like to tell Nagios to check every minute or two, but if something happens, fire the event handler and then wait 5 minutes (something like that). The service looks like this:

define service{
          use                   generic-service
          host_name             GFS3
          service_description   Memory + Restart
          check_command         check_nrpe_lb!check_mem
          event_handler         restart_gfs3
          max_check_attempts    1
          check_interval        2
          retry_interval        2
}
I looked at http://nagios.sourceforge.net/docs/3_0/ ... val_length to see whether I could use interval_length, but Nagios complained about it on restart. So I changed the check interval to 4, but that still doesn't seem to be enough time, as the script comes back 4 minutes later and restarts the service again. The second restart is enough, and on the third pass all is green, but I would like to say:

Check every minute. If critical, don't wait for a second attempt; just fire the event handler, then go to sleep and wait 5 minutes (something like that).
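Note that interval_length is a global option in the main nagios.cfg file rather than a service directive, so it cannot delay a single service. One possible way to get the "wait 5 minutes" behaviour is to keep the delay in the handler itself: a sketch of a cooldown guard that skips the restart if one was already attempted recently. The stamp file path and the variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Cooldown guard sketch. STAMP_FILE and COOLDOWN_SECS are ASSUMED names/values.
STAMP_FILE="${STAMP_FILE:-/var/tmp/restart_gfs3.last}"
COOLDOWN_SECS="${COOLDOWN_SECS:-300}"   # 5 minutes

# Succeed if the last recorded restart is at least COOLDOWN_SECS before $1
# (an epoch timestamp), or if no valid restart time has been recorded.
cooldown_expired() {
    local now="$1" last=0
    if [ -r "$STAMP_FILE" ]; then
        last="$(cat "$STAMP_FILE")"
    fi
    # Treat an empty or non-numeric stamp as "never restarted".
    case "$last" in
        ''|*[!0-9]*) last=0 ;;
    esac
    [ $(( now - last )) -ge "$COOLDOWN_SECS" ]
}

# Record the epoch time of the restart being attempted.
mark_restart() {
    printf '%s\n' "$1" > "$STAMP_FILE"
}

# Run the given command only if the cooldown window has passed.
guarded_restart() {
    local now
    now="$(date +%s)"
    if cooldown_expired "$now"; then
        mark_restart "$now"
        "$@"
    else
        echo "restart skipped: last restart less than ${COOLDOWN_SECS}s ago"
    fi
}

# Example call from the event handler:
#   guarded_restart /usr/lib/nagios/plugins/restartremoteservice.sh
```

With a guard like this, the service can keep check_interval at 1 and max_check_attempts at 1, so the first critical result fires the handler, while repeat firings inside the window are ignored and the Java process gets time to come back up.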

Thanks