NRPE: Automatic restart of multiple services

mhixson2
Posts: 96
Joined: Wed Jun 24, 2015 3:02 pm

Re: NRPE: Automatic restart of multiple services

Post by mhixson2 »

jdalrymple wrote: Did you try it with only 1 failed service for sure? We have to crawl before we can walk.
I did, and just did again to be sure. I changed the service check and variable definition to cover only one service, as shown below.

Code:

 host_name    [hostname]
 service_description  #TEST restart dead service
 use    #standard-service
 check_command   check_nrpe!check_service!service=mfcom!'crit=not state_is_ok()'!!!!!
 event_handler   restart_service
 event_handler_enabled  1
 _SERVICE    mfcom
 register    1
I stopped the service on the host and confirmed that Nagios detected it as stopped, but there was no automatic restart across several forced immediate checks via the UI.

The restart command works when run for that one service from the command line.

Code:

./check_nrpe -H [hostname] -p 5666 -t 30 -c restart_service -a mfcom
The Citrix MFCOM Service service is stopping.
The Citrix MFCOM Service service was stopped successfully.

The Citrix MFCOM Service service is starting.
The Citrix MFCOM Service service was started successfully.|
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Post by jdalrymple »

Debugging time. Sorry to make you a guinea pig, but you're already set up for it.

Change your code to this:

Code:

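    #!/bin/bash
    # Event handler for restarting Windows services.
    # Uses bash-only features (${var//pattern/}, read -a, <<<, [[ ]]), so it needs bash rather than plain sh.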
        case "$1" in
                OK)
                        ;;
                WARNING)
                        ;;
                UNKNOWN)
                        ;;
                CRITICAL)
                        crittext='CRITICAL: '
                        autodeltext=' (auto), delayed ()'
                        autotext=' (auto)'
                        stoppedtext='=stopped'
                        perfpipetext='|'
                        stripped=${3//$crittext/}
                        stripped=${stripped//$autodeltext/}
                        stripped=${stripped//$autotext/}
                        stripped=${stripped//$stoppedtext/}
                        stripped=${stripped//$perfpipetext/}
                        stripped=${stripped//, /,}
                        IFS=',' read -a array <<< "$stripped"
                        pattern='\ '
                        for ((i=0; i<${#array[@]}; i++));
                        do
                                if [[ ${array[$i]} =~ $pattern ]]; then
                                        array[$i]="'${array[$i]}'"
                                fi
                        done
                        services=$(IFS=,; echo "${array[*]}")
                        echo "/usr/local/nagios/libexec/check_nrpe -H \"$2\" -p 5666 -c restart_service -a \"$services\"" >> /tmp/foo
                ;;
        esac

        exit 0
Then share the contents of /tmp/foo with us.
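
To try the parsing step in isolation, here is a rough sketch that applies the same removals as the CRITICAL branch above, in the same order, but written with sed and a plain loop so it runs in any POSIX shell. The function name parse_services and the "Print Helper" service are made up for illustration, and the multi-service output format is an assumption extrapolated from the single-service output seen in this thread.

```shell
#!/bin/sh
# Sketch only: parse_services is a hypothetical helper, not part of the handler.
parse_services() {
    # Same removals as the handler's CRITICAL branch, in the same order.
    stripped=$(printf '%s\n' "$1" | sed \
        -e 's/CRITICAL: //g' \
        -e 's/ (auto), delayed ()//g' \
        -e 's/ (auto)//g' \
        -e 's/=stopped//g' \
        -e 's/|//g' \
        -e 's/, /,/g')

    # Split on commas and wrap any name containing a space in single quotes.
    result=
    old_ifs=$IFS
    IFS=,
    set -f                      # no filename globbing while splitting
    for name in $stripped; do
        case $name in
            *" "*) name="'$name'" ;;
        esac
        result="${result:+$result,}$name"
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# Single service, as seen in this thread.
parse_services "CRITICAL: Spooler=stopped (auto), delayed ()|"
# Two services (assumed format), one with a space in its name.
parse_services "CRITICAL: Spooler=stopped (auto), Print Helper=stopped (auto)"
```

If the printed list does not match the service names you expect, the plugin output probably differs from the strings being stripped, and the removals would need adjusting.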

Post by mhixson2 »

Hmm... file 'foo' is not being created. I tried changing the target directory from /tmp (owned by root) to /home/nagios/tmp, but that made no difference. I also tried running the check command that calls restart_service (it ran successfully) and running the script directly (./restart_service.sh); in neither case was file 'foo' created.

Post by jdalrymple »

That puts me at something of a loss... I copied/pasted:

Code:

[jdalrymple@localhost ~]$ ./foodbar.sh CRITICAL 127.0.0.1 "CRITICAL: Spooler=stopped (auto), delayed ()|"
[jdalrymple@localhost ~]$ cat /tmp/foo
/usr/local/nagios/libexec/check_nrpe -H "127.0.0.1" -p 5666 -c restart_service -a "Spooler"
I'm not sure what's missing. As whatever user you're logged on as, try `touch /tmp/file` to see for sure whether it's a permissions problem. If it is, there is no reason (for debugging purposes) this script couldn't be run as root, since it only writes a file.
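
The permissions check can also be scripted. This is only a sketch: can_write_in is a hypothetical helper name, and it tests the user who runs it, so to test the Nagios account it would have to be run as that user (for example through `sudo -u nagios`, assuming the account is named nagios as in the default install).

```shell
# Hypothetical helper: succeeds if the current user can create a file in a directory.
can_write_in() {
    dir=$1
    probe="$dir/.write_probe.$$"
    # The subshell contains any redirection failure; errors are discarded.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

if can_write_in /tmp; then
    echo "$(id -un) can write to /tmp"
else
    echo "$(id -un) cannot write to /tmp"
fi
```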

Post by mhixson2 »

Ok, thanks.
I will dig into this further this afternoon. I'll report back.

Post by mhixson2 »

Ok, here are some results when manually running the script via command line like you did. My syntax was all wrong when I tried to do it.

Code:

$ ./restart_service.sh CRITICAL [hostname] "CRITICAL: mfcom=stopped (auto), delayed ()|"
$ cat /tmp/foo
/usr/local/nagios/libexec/check_nrpe -H "[hostname]" -p 5666 -c restart_service -a "mfcom"
So the event handler isn't working? I've confirmed the event handler is specified in the check settings and 'event handler enabled' is on.
Last edited by mhixson2 on Thu Jul 16, 2015 12:21 pm, edited 1 time in total.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Sorry, the original code is correct but my command was wrong (I updated the original post). Change your command to this and it should work.

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
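
The quotes around '$SERVICEOUTPUT$' matter because the script expects the whole plugin output as its third argument. The sketch below imitates that with a shell variable; show_args and the address 192.0.2.10 are stand-ins invented for the example. It is only an approximation of what Nagios does: with literal text substitution, unquoted parentheses and the pipe character would additionally be treated as shell syntax.

```shell
# Stand-in for the handler script: report the argument count and $3.
show_args() {
    echo "argc=$# third=$3"
}

# Plugin output as seen in this thread.
output='CRITICAL: mfcom=stopped (auto), delayed ()|'

# Unquoted: the shell splits the output on whitespace, so $3 is only the first word.
show_args CRITICAL 192.0.2.10 $output
# prints: argc=7 third=CRITICAL:

# Quoted: the whole output arrives as a single argument.
show_args CRITICAL 192.0.2.10 "$output"
# prints: argc=3 third=CRITICAL: mfcom=stopped (auto), delayed ()|
```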

Post by mhixson2 »

ssax wrote: Sorry, the original code is correct but my command was wrong (I updated the original post). Change your command to this and it should work.

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
Awesome, thanks. I've made the change.

As of now, it's still not working on its own, though every component seems to work individually. I found that I had named a few of the items in this process the same, so I've named each (event handler command, batch file, shell script) uniquely and can confirm that on every check with a critical status, the file 'foo' is being populated with a working command entry:

Code:

/usr/local/nagios/libexec/check_nrpe -H "[host IP]" -p 5666 -c restart_service_batch -a "mfcom"
Which, when run on its own, works. But it still doesn't fire off automatically. Any thoughts on that?
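
One thing worth ruling out: the debugging version of the handler posted earlier in this thread only appends the command text to the file and never executes it. Whether that version is still the one in use here is an assumption, not something confirmed above. If it is, a small wrapper that records the command and then runs it would cover both needs. log_and_run is a hypothetical name chosen for this sketch.

```shell
# Hypothetical helper: append the command line to a log file, then execute it.
log_and_run() {
    logfile=$1
    shift
    printf '%s\n' "$*" >> "$logfile"
    "$@"
}

# Inside the handler's CRITICAL branch this might look like the line below
# (path, port, and file name taken from this thread; the NRPE command name
# should be whichever one is defined on the Windows host):
# log_and_run /tmp/foo /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$services"
```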

Post by ssax »

Let's check the permissions:

Code:

ls -l /usr/local/nagios/libexec/restart_service.sh
Also, make sure that you have the event handler enabled and that you have the proper event handler selected (see the image below).
event_handler.png

Post by mhixson2 »

Sure.

Permissions:

Code:

$ ls -l /usr/local/nagios/libexec/restart_service_script.sh
-rwxrwxr-x 1 nagios nagios 901 Jul 16 13:45 /usr/local/nagios/libexec/restart_service_script.sh
Service check and template settings:
service-check-settings.png
service-template.png