Not working: Restart windows service with NRPE

Posted: Wed Jun 24, 2015 10:42 pm
by mhixson
[I have requested access to the Customer Support forum previously and do not yet have access.]

System profile attached. Nagios installed manually on a minimal CentOS 6 VM. Everything is vanilla.

I am trying to follow the guide here for automatically restarting a Windows service via NRPE when it enters a critical state, but I cannot get it to work.

1. The first point of confusion is the "Test The Command From The Nagios Server" step on page 2. I don't understand how this is supposed to work at this point in the guide. No commands have been defined yet in Nagios. What am I missing?

2. My Windows servers already have NSClient++ installed and are configured to use NRPE checks only (nsclient.ini contains NRPEServer=1 and no NSClientServer entry). Several checks are already up and running on each. The guide has you set up a host with the Windows server config wizard and add the monitored service (spooler) in the process. That uses check_nt by default, which requires NSClientServer=1 in nsclient.ini. I don't have that set, as I would prefer to use NRPE. Does this guide require NSClientServer? Here's my current nsclient.ini. There are probably some unnecessary entries that I added while troubleshooting; please advise on any necessary edits.

Code: Select all

[/settings/default]
allowed hosts=<nagios server IP>

[/modules]
CheckSystem=1
CheckDisk=1
NRPEServer=1
CheckExternalScripts=enabled

[/settings/NRPE/server]
allowed ciphers=ADH
allow nasty characters=1
allow arguments=1

[/settings/external scripts]
allow arguments=1

[/settings/external scripts/scripts]
restart_service=scripts\restart_service.bat “$ARG1$”
allow_arguments=1
3. My batch file is named restart_service.bat, and my command in Nagios is named restart_service. Everything else is named and configured exactly as in the guide. When I execute this command on the Nagios server:

Code: Select all

./check_nrpe -H <hostname or IP> -p 5666 -c restart_service -a spooler
It returns:

Code: Select all

The service name is invalid.

More help is available by typing NET HELPMSG 2185.

The service name is invalid.

More help is available by typing NET HELPMSG 2185.|
When I execute:

Code: Select all

./servicerestart.sh CRITICAL <hostname or IP> spooler
It returns the same "service name is invalid" error above.
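
For readers without the guide at hand, an event handler invoked this way typically looks something like the following. This is a minimal sketch, not the guide's exact script: the check_nrpe path, the handle_state function name, and the CHECK_NRPE override variable are assumptions. The argument order (state, host, service name) and the restart_service command name follow the commands quoted in this post.

```shell
#!/bin/sh
# Sketch of an event handler in the style of servicerestart.sh.
# Arguments: $1 = service state, $2 = host address, $3 = Windows service name.
# The check_nrpe location below is an assumption; override CHECK_NRPE if yours differs.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

handle_state() {
    case "${1:-}" in
        CRITICAL)
            # Only a CRITICAL state triggers the remote restart command.
            "$CHECK_NRPE" -H "${2:-}" -p 5666 -c restart_service -a "${3:-}"
            ;;
        *)
            # OK, WARNING, UNKNOWN, or anything else: nothing to do.
            ;;
    esac
    # Always report success so the handler itself never looks like a failure.
    return 0
}

handle_state "$@"
```

Because the handler only forwards the service name, any error it prints comes from the NRPE command on the Windows side, which is why both invocations show the same message.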

Needless to say, the service does not restart automatically once it's stopped.
I'm pretty new to Nagios so I hope I'm overlooking something really simple. I've been banging my head against this one for a while. Let me know if further info is needed.

Thanks!

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 9:28 am
by jdalrymple
I suspect there is an arguments issue - what that issue is isn't apparent right at the moment. I assume your batch file looks exactly like this:

Code: Select all

@echo off
net stop %1
net start %1
@exit 0
Let's change it to look like this just for debugging purposes:

Code: Select all

@echo off
net stop spooler
net start spooler
@exit 0
Also it may be useful to add this to nsclient.ini and restart nscp:

Code: Select all

[/settings/log]
file name = nsclient.log
level = debug

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 10:27 am
by mhixson
Excellent! Now when I run either of these commands, the service is bounced. And the event handler is restarting the stopped service automatically as it should.

Code: Select all

./check_nrpe -H <hostname or IP> -p 5666 -c restart_service -a spooler

Code: Select all

./servicerestart.sh CRITICAL <hostname or IP> spooler

Code: Select all

The Print Spooler service is stopping.
The Print Spooler service was stopped successfully.
The Print Spooler service is starting.
The Print Spooler service was started successfully.|
That is with the batch file specifying the spooler service as you noted. When I put it back to the %1 variable, it fails with that same "The service name is invalid." error.

What about my "allow arguments=1" options in my nsclient.ini file? I think I have it in three different places. Which should be necessary for this to work?
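
To see where each of those entries sits, a small helper can print every "allow arguments" style line together with the section it belongs to. This is a rough sketch that assumes a copy of nsclient.ini is readable from a POSIX shell; list_allow_arguments is a hypothetical name. Note that keys under [/settings/external scripts/scripts] are read as script definitions (alias=command), so an entry there is not a setting.

```shell
#!/bin/sh
# Hypothetical helper: print each "allow arguments" / "allow_arguments" line
# prefixed by the INI section in which it appears.
list_allow_arguments() {
    awk '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        /^\[.*\]/ { section = $0; next }        # remember the current section
        tolower($0) ~ /^allow[ _]arguments[ ]*=/ { print section ": " $0 }
    ' "$1"
}

# Example (the path is an assumption):
# list_allow_arguments ./nsclient.ini
```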

Thanks

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 10:35 am
by mhixson
Here's the entry in nsclient.log when it fails while having the %1 in the batch file:

Code: Select all

2015-06-25 11:32:28: debug:D:\source\nscp\modules\CheckExternalScripts\CheckExternalScripts.cpp:459: Command line: scripts\restart_service.bat “spooler”

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 10:55 am
by tgriep
Can you go to the Core Config Manager and edit that service?
Click on the Misc Settings tab, then click on the Manage Variable Definitions button.
Make sure the Variable Name and Variable Definition are set as shown below.

Code: Select all

Variable Name    Variable Definition
_SERVICE         spooler

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 11:30 am
by mhixson
Confirmed. Here are screenshots of the service common and variable settings.
Attachments: common.png, variable.png

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 11:48 am
by ssax
The quotes may be causing a problem.

Can you try changing it from:

Code: Select all

restart_service=scripts\restart_service.bat “$ARG1$”
To:

Code: Select all

restart_service=scripts\restart_service.bat "$ARG1$"
Notice the difference in the quotation marks.
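
Typographic quotes like these usually creep in when a command is copied from a PDF or a web page. cmd.exe does not treat them as quoting characters, so they are presumably passed to the batch file as part of %1, which is consistent with the command line shown in the nsclient.log entry earlier in the thread. A minimal sketch for spotting them, assuming a UTF-8 copy of nsclient.ini is available on a machine with a POSIX shell (find_curly_quotes and the example path are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list lines that contain typographic ("curly") quotes.
# Assumes the file is UTF-8 encoded; a Windows-1252 file would need other patterns.
find_curly_quotes() {
    # -n prints line numbers, -F treats each -e pattern as a literal string.
    grep -nF -e '“' -e '”' -e '‘' -e '’' "$1"
}

# Example (the path is an assumption):
# find_curly_quotes ./nsclient.ini
```

Any line it reports should be retyped with plain ASCII quotation marks.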

Re: Not working: Restart windows service with NRPE

Posted: Thu Jun 25, 2015 12:04 pm
by mhixson
Nailed it! All is working. Thanks Sax!!