
Re: NRPE: Automatic restart of multiple services

Posted: Wed Jul 15, 2015 3:47 pm
by mhixson2
jdalrymple wrote: Did you try it with only 1 failed service for sure? We have to crawl before we can walk.
I did, and I just did it again to be sure. I changed the service check and variable definition to cover only one service, as shown below.

Code: Select all

 host_name    [hostname]
 service_description  #TEST restart dead service
 use    #standard-service
 check_command   check_nrpe!check_service!service=mfcom!'crit=not state_is_ok()'!!!!!
 event_handler   restart_service
 event_handler_enabled  1
 _SERVICE    mfcom
 register    1
I stopped the service on the host and confirmed that Nagios detected it as stopped, but there was no automatic restart across several forced immediate checks from the UI.

The script does work when I restart that one service from the command line.

Code: Select all

./check_nrpe -H [hostname] -p 5666 -t 30 -c restart_service -a mfcom
The Citrix MFCOM Service service is stopping.
The Citrix MFCOM Service service was stopped successfully.

The Citrix MFCOM Service service is starting.
The Citrix MFCOM Service service was started successfully.|

Re: NRPE: Automatic restart of multiple services

Posted: Wed Jul 15, 2015 3:58 pm
by jdalrymple
Debugging time. Sorry to make you a guinea pig, but you're already set up for it.

Change your code to this:

Code: Select all

    #!/bin/bash
    # Event Handler for Restarting Windows Services
    # (bash, not plain sh: the script relies on ${var//pat/}, read -a, <<< and [[ ]])
        case "$1" in
                OK)
                        ;;
                WARNING)
                        ;;
                UNKNOWN)
                        ;;
                CRITICAL)
                        crittext='CRITICAL: '
                        autodeltext=' (auto), delayed ()'
                        autotext=' (auto)'
                        stoppedtext='=stopped'
                        perfpipetext='|'
                        stripped=${3//$crittext/}
                        stripped=${stripped//$autodeltext/}
                        stripped=${stripped//$autotext/}
                        stripped=${stripped//$stoppedtext/}
                        stripped=${stripped//$perfpipetext/}
                        stripped=${stripped//, /,}
                        IFS=',' read -a array <<< "$stripped"
                        pattern='\ '
                        for ((i=0; i<${#array[@]}; i++));
                        do
                                if [[ ${array[$i]} =~ $pattern ]]; then
                                        array[$i]="'${array[$i]}'"
                                fi
                        done
                        services=$(IFS=,; echo "${array[*]}")
                        echo "/usr/local/nagios/libexec/check_nrpe -H \"$2\" -p 5666 -c restart_service -a \"$services\"" >> /tmp/foo
                ;;
        esac

        exit 0
Then share the contents of /tmp/foo with us.
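For anyone who wants to test the string handling separately from Nagios, here is a rough sketch of the same stripping steps as a standalone function. It is not the script above: `extract_services` is a made-up name, and it uses sed so it runs under plain sh. The first sample input is the output format shown in this thread; the multi-service format mentioned in the comment is an assumption based on what the script strips.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reduce check_service output
# to the comma-separated service list that is passed to restart_service.
extract_services() {
    printf '%s\n' "$1" | sed \
        -e 's/|//g' \
        -e 's/^CRITICAL: //' \
        -e 's/ (auto), delayed ()//g' \
        -e 's/ (auto)//g' \
        -e 's/=stopped//g' \
        -e 's/, /,/g' \
        -e "s/\([^,]* [^,]*\)/'\1'/g"   # wrap names containing a space in single quotes
}

extract_services 'CRITICAL: Spooler=stopped (auto), delayed ()|'
# prints: Spooler
# Assumed multi-service format: "CRITICAL: A=stopped (auto), B C=stopped (auto)|"
```

Running the function by hand with the exact text Nagios shows for the service makes it easy to see whether the list passed to check_nrpe is what you expect.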

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 9:46 am
by mhixson2
Hmm... file 'foo' is not being created. I tried changing the target directory from /tmp (owned by root) to /home/nagios/tmp, but that made no change. I also tried running the check command calling restart_service (ran successfully) and the script directly (./restart_service.sh) and in no case was file 'foo' created.

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 10:12 am
by jdalrymple
That puts me at something of a loss... I copied/pasted:

Code: Select all

[jdalrymple@localhost ~]$ ./foodbar.sh CRITICAL 127.0.0.1 "CRITICAL: Spooler=stopped (auto), delayed ()|"
[jdalrymple@localhost ~]$ cat /tmp/foo
/usr/local/nagios/libexec/check_nrpe -H "127.0.0.1" -p 5666 -c restart_service -a "Spooler"
I'm not sure what's missing. As whatever user you're logged on as, just try `touch /tmp/file` and see for sure whether it's a permissions thing. If it is, there is no reason (for debugging purposes) this script couldn't be run as root, since it's only writing a file.

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 10:29 am
by mhixson2
Ok, thanks.
I will dig into this further this afternoon. I'll report back.

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 11:58 am
by mhixson2
Ok, here are some results when manually running the script via command line like you did. My syntax was all wrong when I tried to do it.

Code: Select all

$ ./restart_service.sh CRITICAL [hostname] "CRITICAL: mfcom=stopped (auto), delayed ()|"
$ cat /tmp/foo
/usr/local/nagios/libexec/check_nrpe -H "[hostname]" -p 5666 -c restart_service -a "mfcom"
So the event handler isn't working? I've confirmed the event handler is specified in the check settings and 'event handler enabled' is on.

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 12:20 pm
by ssax
Sorry, the original code is correct, but my command was wrong (I updated the original post). Change your command to this and it should work.

Code: Select all

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
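One reason the single quotes around $SERVICEOUTPUT$ matter: the script reads the plugin output from its third argument, and without quotes the text is split on spaces. A small sketch of that effect, where `show_third` is a made-up stand-in for the event handler and the sample output is the one from this thread:

```shell
#!/bin/sh
# Hypothetical stand-in for the event handler: prints only its third argument.
show_third() {
    printf '%s\n' "$3"
}

output='CRITICAL: mfcom=stopped (auto), delayed ()|'

# Unquoted: the shell splits the text on spaces, so $3 is just the first word.
show_third CRITICAL 127.0.0.1 $output
# prints: CRITICAL:

# Quoted: the whole plugin output arrives as one argument.
show_third CRITICAL 127.0.0.1 "$output"
# prints: CRITICAL: mfcom=stopped (auto), delayed ()|
```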

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 1:06 pm
by mhixson2
ssax wrote: Sorry, the original code is correct, but my command was wrong (I updated the original post). Change your command to this and it should work.

Code: Select all

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
Awesome, thanks. I've made the change.

As of now, it's still not working on its own, though every component seems to work individually. I found that I had named a few of the items in this process the same, so I've named each (event handler command, batch file, shell script) uniquely and can confirm that on every check with a critical status, the file 'foo' is being populated with a working command entry:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H "[host IP]" -p 5666 -c restart_service_batch -a "mfcom"
Which, when run on its own, works. But it still doesn't fire off automatically. Any thoughts on that?
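One thing worth checking: the debugging version of the script posted earlier only appends the command to /tmp/foo and never runs it, so a populated /tmp/foo does not restart anything by itself. Whether that is the cause here is an assumption. Below is a minimal sketch of a step that logs the command and then runs it. `run_restart`, `CHECK_NRPE`, `NRPE_COMMAND`, and `LOG` are names made up for illustration; the check_nrpe path, the port, and the restart_service command name come from this thread.

```shell
#!/bin/sh
# Sketch only: log the restart command, then actually execute it.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"  # path used in the thread
NRPE_COMMAND="${NRPE_COMMAND:-restart_service}"                   # NRPE command name on the host
LOG="${LOG:-/tmp/foo}"                                            # debug log used in the thread

run_restart() {
    host=$1
    services=$2
    # Same debug line as before, so the log still shows what was attempted.
    echo "$CHECK_NRPE -H \"$host\" -p 5666 -c $NRPE_COMMAND -a \"$services\"" >> "$LOG"
    # The part the debug script leaves out: run the command and keep its output.
    "$CHECK_NRPE" -H "$host" -p 5666 -c "$NRPE_COMMAND" -a "$services" >> "$LOG" 2>&1
}

# Inside the CRITICAL) branch this would be called as, for example:
#   run_restart "$2" "$services"
```

If the NRPE command was renamed (for example to restart_service_batch, as above), set `NRPE_COMMAND` to match. Keeping the command's output in the same log shows whether the restart was attempted and what the remote side answered.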

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 2:35 pm
by ssax
Let's check the permissions:

Code: Select all

ls -l /usr/local/nagios/libexec/restart_service.sh
Also, make sure that you have the event handler enabled and that you have the proper event handler selected (see the image below).
event_handler.png

Re: NRPE: Automatic restart of multiple services

Posted: Thu Jul 16, 2015 2:45 pm
by mhixson2
Sure.

Permissions:

Code: Select all

$ ls -l /usr/local/nagios/libexec/restart_service_script.sh
-rwxrwxr-x 1 nagios nagios 901 Jul 16 13:45 /usr/local/nagios/libexec/restart_service_script.sh
Service check and template settings:
service-check-settings.png
service-template.png