Re: NRPE: Automatic restart of multiple services

Posted: Fri Jul 10, 2015 9:33 am
by mhixson2
ssax wrote: Please post the full check (and the output) that you are running initially to check the services so I can see if it has what we need.
service/check definition:

Code:

define service {
        host_name                       [hostname]
        service_description             #TEST restart dead service
        use                             #standard-service
        check_command                   check_nrpe!check_service!service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc!'crit=not state_is_ok()'!!!!!
        event_handler                   restart_service
        event_handler_enabled           1
        _SERVICE                        "Citrix Encryption Service",CitrixCseEngine,IMAService,MFCom,cpsvc,IMAAdvanceSrv,RadeHlprSvc,RadeSvc
        register                        1
        }
Command run to test the check:

Code:

./check_nrpe -H [hostname] -t 30 -c check_service -a service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc 'crit=not state_is_ok()'
output:

Code:

OK: All 8 service(s) are ok.|
Let me know if you need more.
Thanks!

Re: NRPE: Automatic restart of multiple services

Posted: Fri Jul 10, 2015 12:45 pm
by ssax
What is the output when one of the services is stopped? What about this command?

You could try it with the Spooler service if you want something to test with.

Code:

./check_nrpe -H [hostname] -t 30 -c check_service -a 'Citrix Encryption Service' CitrixCseEngine IMAService MFCom cpsvc IMAAdvanceSrv RadeHlprSvc

Re: NRPE: Automatic restart of multiple services

Posted: Mon Jul 13, 2015 9:48 am
by mhixson2
ssax wrote: What is the output when one of the services is stopped?
Running ./check_nrpe -H [hostname] -t 30 -c check_service -a service=Spooler 'crit=not state_is_ok()' returns: CRITICAL: Spooler=stopped (auto), delayed ()|
ssax wrote: What about this command?

Code:

./check_nrpe -H [hostname] -t 30 -c check_service -a 'Citrix Encryption Service' CitrixCseEngine IMAService MFCom cpsvc IMAAdvanceSrv RadeHlprSvc
returned: CHECK_NRPE: Invalid packet type received from server.

Thanks

Re: NRPE: Automatic restart of multiple services

Posted: Mon Jul 13, 2015 1:34 pm
by ssax
What client (and version) are you using on the remote host?

Re: NRPE: Automatic restart of multiple services

Posted: Mon Jul 13, 2015 1:42 pm
by mhixson2
ssax wrote: What client (and version) are you using on the remote host?
It's a Windows host using NSClient++ (NSCP-0.4.3.143-x64.msi).

Re: NRPE: Automatic restart of multiple services

Posted: Mon Jul 13, 2015 5:08 pm
by ssax
Try this. It doesn't even use the _SERVICE variable; it works on the check_nrpe output instead.

Change your command to:

Code:

$USER1$/servicerestart.sh $SERVICESTATE$ $HOSTADDRESS$ '$SERVICEOUTPUT$'
Then edit your servicerestart.sh script and change it to:

Code:

#!/bin/bash
# Event Handler for Restarting Windows Services
# Needs bash: uses ${var//pattern/}, arrays, [[ ]] and <<< (not POSIX sh)
case "$1" in
        OK)
                ;;
        WARNING)
                ;;
        UNKNOWN)
                ;;
        CRITICAL)
                crittext='CRITICAL: '
                autodeltext=' (auto), delayed ()'
                autotext=' (auto)'
                stoppedtext='=stopped'
                stripped=${3//$crittext/}
                stripped=${stripped//$autodeltext/}
                stripped=${stripped//$autotext/}
                stripped=${stripped//$stoppedtext/}
                stripped=${stripped//, /,}
                IFS=',' read -a array <<< "$stripped"
                pattern='\ '
                for ((i=0; i<${#array[@]}; i++));
                do
                        if [[ ${array[$i]} =~ $pattern ]]; then
                                array[$i]="'${array[$i]}'"
                        fi
                done
                services=$(IFS=,; echo "${array[*]}")
                /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$services"
        ;;
esac

exit 0
Then modify your windows bat file and change it to:

Code:

@echo off
SET SERVICES=%1
SET SERVICESM=%SERVICES:'=%
:: Loop through services and restart them
:LOOP
 FOR /F "tokens=1,* delims=," %%F IN (%SERVICESM%) DO (
net stop "%%F"
net start "%%F"
SET SERVICESM="%%G"
GOTO LOOP
)

@exit 0
Let me know if that works for you.
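
For reference, the string handling in the CRITICAL branch can be tried on its own before it is wired into Nagios. The following is a minimal sketch, not the script above: it swaps the bash substitutions for sed so it also runs under a plain POSIX sh, and the function name stopped_services is made up for illustration. It assumes the output has the shape shown earlier in the thread (CRITICAL: Name=stopped (auto), delayed ()) and that several stopped services would be separated by a comma and a space.

```shell
#!/bin/sh
# Sketch only: reduce NSClient++ check_service output to a comma-separated
# list of service names. stopped_services is a hypothetical helper name.
stopped_services() {
    printf '%s\n' "$1" | sed \
        -e 's/^CRITICAL: //' \
        -e 's/ (auto), delayed ()//g' \
        -e 's/ (auto)//g' \
        -e 's/=stopped//g' \
        -e 's/, /,/g'
}

# Sample taken from the thread; prints: Spooler
stopped_services 'CRITICAL: Spooler=stopped (auto), delayed ()'
```

Running the function by hand against a captured output line shows what would be passed to check_nrpe -a without touching the Windows host.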

Re: NRPE: Automatic restart of multiple services

Posted: Tue Jul 14, 2015 10:31 am
by mhixson2
Hmm.. no, not working at all.

I renamed the shell script on my end a while ago, so I updated the command to $USER1$/restart_service.sh $SERVICESTATE$ $HOSTADDRESS$ $SERVICEOUTPUT$ to match the new name.

The command name is restart_service as well, so I changed runcmd in the shell script you provided to restart_service.

I also simplified things for this test again, so the Nagios service now monitors only Spooler. When Spooler is stopped, it is not restarted automatically.

However, when running ./check_nrpe -H [hostname] -p 5666 -c restart_service -a Spooler manually on the Nagios server, the service restarts successfully:

Code:

The Print Spooler service is stopping.
The Print Spooler service was stopped successfully.

The Print Spooler service is starting.
The Print Spooler service was started successfully.|
Thanks!
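
Since the manual check_nrpe call works but the handler does not, one way to narrow it down is to record exactly what Nagios passes to the script. A minimal sketch, assuming the handler is allowed to write to a log file; the function name log_handler_call and the path /tmp/restart_service.debug are placeholders, not part of the setup above. Each argument is printed in brackets, which shows whether the plugin output arrives as one argument or is split into several words (that can depend on how $SERVICEOUTPUT$ is quoted in the command definition).

```shell
#!/bin/sh
# Sketch only: append the arguments of one handler call to a log file.
# log_handler_call and the log path used below are hypothetical.
log_handler_call() {
    logfile=$1
    shift
    {
        printf '%s argc=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$#"
        n=1
        for arg in "$@"; do
            # Brackets make leading/trailing spaces and word splits visible.
            printf '  arg%s=[%s]\n' "$n" "$arg"
            n=$((n + 1))
        done
    } >> "$logfile"
}

# Intended use, as the first line of the handler script:
#   log_handler_call /tmp/restart_service.debug "$@"
```

If the log shows more than three arguments for a CRITICAL event, the output was split before the script saw it, and $3 would hold only the first word.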

Re: NRPE: Automatic restart of multiple services

Posted: Wed Jul 15, 2015 9:46 am
by jdalrymple
Looks like the script isn't accounting for the pipe for perfdata. Try this:

Code:

#!/bin/bash
# Event Handler for Restarting Windows Services
# Needs bash: uses ${var//pattern/}, arrays, [[ ]] and <<< (not POSIX sh)
    case "$1" in
            OK)
                    ;;
            WARNING)
                    ;;
            UNKNOWN)
                    ;;
            CRITICAL)
                    crittext='CRITICAL: '
                    autodeltext=' (auto), delayed ()'
                    autotext=' (auto)'
                    stoppedtext='=stopped'
                    perfpipetext='|'
                    stripped=${3//$crittext/}
                    stripped=${stripped//$autodeltext/}
                    stripped=${stripped//$autotext/}
                    stripped=${stripped//$stoppedtext/}
                    stripped=${stripped//$perfpipetext/}
                    stripped=${stripped//, /,}
                    IFS=',' read -a array <<< "$stripped"
                    pattern='\ '
                    for ((i=0; i<${#array[@]}; i++));
                    do
                            if [[ ${array[$i]} =~ $pattern ]]; then
                                    array[$i]="'${array[$i]}'"
                            fi
                    done
                    services=$(IFS=,; echo "${array[*]}")
                    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c runcmd -a "$services"
            ;;
    esac

    exit 0
Also, try it with multiple stopped services. I have a feeling...
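
On the perfdata pipe: Nagios plugin output is the status text, optionally followed by | and performance data. The script above deletes the | character itself. An alternative, sketched here under the assumption that nothing after the first | is needed for the restart, is to cut the string at the first |. strip_perfdata is a hypothetical helper name, and the expansion used is plain POSIX sh.

```shell
#!/bin/sh
# Sketch only: keep the status text, drop "|" and anything after it.
# strip_perfdata is a hypothetical helper name.
strip_perfdata() {
    # ${1%%|*} removes the longest suffix that starts with "|",
    # i.e. everything from the first "|" onward.
    printf '%s\n' "${1%%|*}"
}

# Sample taken from the thread; prints: CRITICAL: Spooler=stopped (auto), delayed ()
strip_perfdata 'CRITICAL: Spooler=stopped (auto), delayed ()|'
```

Cutting at the first | also copes with output that carries real perfdata after the pipe, which deleting the single character would leave attached to the last service name.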

Re: NRPE: Automatic restart of multiple services

Posted: Wed Jul 15, 2015 3:18 pm
by mhixson2
Unfortunately it's still not working. Here are the details of my config in case I missed something along the way.

Service:

Code:

 host_name    [hostname]
 service_description  #TEST restart dead service
 use    #standard-service
 check_command   check_nrpe!check_service!service='Citrix Encryption Service' service=CitrixCseEngine service=IMAService service=MFCom service=cpsvc service=IMAAdvanceSrv service=RadeHlprSvc service=RadeSvc!'crit=not state_is_ok()'!!!!!
 event_handler   restart_service
 event_handler_enabled  1
 _SERVICE    "Citrix Encryption Service",CitrixCseEngine,IMAService,MFCom,cpsvc,IMAAdvanceSrv,RadeHlprSvc,RadeSvc
 register    1
Batch: (location on host Windows server: C:\Program Files\NSClient++\scripts)

Code:

@echo off
SET SERVICES=%1
SET SERVICESM=%SERVICES:'=%
:: Loop through services and restart them
:LOOP
FOR /F "tokens=1,* delims=," %%F IN (%SERVICESM%) DO (
net stop "%%F"
net start "%%F"
SET SERVICESM="%%G"
GOTO LOOP
)

@exit 0
Shell script (location on Nagios server: /usr/local/nagios/libexec)

Code:

#!/bin/sh
# Event Handler for Restarting Windows Services
    case "$1" in
            OK)
                    ;;
            WARNING)
                    ;;
            UNKNOWN)
                    ;;
            CRITICAL)
                    crittext='CRITICAL: '
                    autodeltext=' (auto), delayed ()'
                    autotext=' (auto)'
                    stoppedtext='=stopped'
                    perfpipetext='|'
                    stripped=${3//$crittext/}
                    stripped=${stripped//$autodeltext/}
                    stripped=${stripped//$autotext/}
                    stripped=${stripped//$stoppedtext/}
                    stripped=${stripped//$perfpipetext/}
                    stripped=${stripped//, /,}
                    IFS=',' read -a array <<< "$stripped"
                    pattern='\ '
                    for ((i=0; i<${#array[@]}; i++));
                    do
                            if [[ ${array[$i]} =~ $pattern ]]; then
                                    array[$i]="'${array[$i]}'"
                            fi
                    done
                    services=$(IFS=,; echo "${array[*]}")
                    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a "$services"
            ;;
    esac

    exit 0
nsclient.ini

Code:

[/settings/default]
;A comma separated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
	allowed hosts=[Nagios  server IP]

[/modules]
;Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
	CheckSystem=1
;Various file and disk related things.
	CheckDisk=1
;Listens for incoming NRPE connection and processes incoming requests.
	NRPEServer=1
;Execute external scripts
	CheckExternalScripts=enabled

[/settings/NRPE/server]
	allowed ciphers=ADH
;Allow characters in command definitions
	allow nasty characters=1
;Allow -a arguments in command definitions
	allow arguments=1

[/settings/external scripts]
;Allow arguments to be passed to external scripts
	allow arguments=1

[/settings/external scripts/scripts]
;Reboot machine event handler
	reboot_machine=scripts\reboot_machine.ps1
;Restart service event handler
	restart_service=scripts\restart_service.bat "$ARG1$"

[/settings/log]
;Enable debug level logging
	;file name = nsclient.log
	;level = debug
Additionally, the manual test below succeeds. Since the check and the restart_service script both work, would that point to the event handler not doing what it's supposed to?

Code:

./check_nrpe -H [hostname] -p 5666 -t 30 -c restart_service -a MFCom,IMAService,'Citrix Encryption Service'
The Citrix MFCOM Service service is stopping.
The Citrix MFCOM Service service was stopped successfully.

The Citrix MFCOM Service service is starting.
The Citrix MFCOM Service service was started successfully.

The Citrix Independent Management Architecture service is stopping.
The Citrix Independent Management Architecture service was stopped successfully.

The Citrix Independent Management Architecture service is starting.
The Citrix Independent Management Architecture service was started successfully.

The Citrix Encryption Service service is stopping.
The Citrix Encryption Service service was stopped successfully.

The Citrix Encryption Service service is starting.
The Citrix Encryption Service service was started successfully.|
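
For comparison with the manual test above, the quoting loop in the handler can be exercised separately. This is a minimal POSIX sh sketch rather than the posted bash loop, and quote_spaced_names is a hypothetical name. It wraps any comma-separated name that contains a space in single quotes, which is the form the batch file later strips with %SERVICES:'=%.

```shell
#!/bin/sh
# Sketch only: single-quote every comma-separated name that contains a space.
# quote_spaced_names is a hypothetical helper name.
# The body runs in a subshell so IFS and "set -f" do not leak to the caller.
quote_spaced_names() (
    IFS=,
    set -f            # no filename expansion while splitting on commas
    result=
    for name in $1; do
        case $name in
            *' '*) name="'$name'" ;;
        esac
        result="${result:+$result,}$name"
    done
    printf '%s\n' "$result"
)

# Prints: MFCom,IMAService,'Citrix Encryption Service'
quote_spaced_names 'MFCom,IMAService,Citrix Encryption Service'
```

Note that in the manual command the local shell removes the single quotes before check_nrpe runs, whereas the handler sends them literally inside "$services", so the two calls do not hand NSClient++ byte-for-byte the same argument.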

Re: NRPE: Automatic restart of multiple services

Posted: Wed Jul 15, 2015 3:25 pm
by jdalrymple
Did you try it with only 1 failed service for sure? We have to crawl before we can walk.