monitor device on internet?

rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: monitor device on internet?

Post by rkennedy »

Glad to see you got this working!

Do you mind if I mark this thread out as resolved, and close it out?
Former Nagios Employee
jriker1
Posts: 115
Joined: Tue Dec 15, 2015 8:40 pm

Re: monitor device on internet?

Post by jriker1 »

So if I wanted the server to reboot when the website can't be reached, BUT didn't want it to be reactionary and do it on the first one or two failed tries, how would I say: if this keeps failing over something like 20 minutes without a successful check in between, then run the reboot batch file? I'd hate to have it trigger a reboot because of a minor hiccup in the network or something.

Thanks.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: monitor device on internet?

Post by rkennedy »

You should be able to use event handlers, see this link - https://assets.nagios.com/downloads/nag ... dlers.html

Keep in mind, they will trigger on every single state change (up -> down, down -> up), so you will want to build a script that detects what state the service is in and, if it is critical, sends the reboot command to the Windows machine.
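
A minimal sketch of that state check, assuming the handler receives $SERVICESTATE$ and $SERVICESTATETYPE$ as its first two arguments. The function name should_reboot is hypothetical, and this variant acts only on a HARD CRITICAL state; it is an illustration, not the script from the linked article.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an event handler should act.
# $1 mirrors $SERVICESTATE$, $2 mirrors $SERVICESTATETYPE$.
should_reboot() {
    state="$1"
    state_type="$2"
    # Recoveries (OK), warnings and unknowns never trigger a reboot.
    [ "$state" = "CRITICAL" ] || return 1
    # Act only once the retries are exhausted and the state is HARD.
    [ "$state_type" = "HARD" ] || return 1
    return 0
}

# Example call with literal values instead of Nagios macros.
if should_reboot CRITICAL HARD; then
    echo "would reboot"
else
    echo "no action"
fi
```

Keeping the decision in one small function makes it easy to exercise by hand with different state combinations before wiring in the real reboot command.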
Former Nagios Employee
jriker1
Posts: 115
Joined: Tue Dec 15, 2015 8:40 pm

Re: monitor device on internet?

Post by jriker1 »

So if I set using that link's example:

max_check_attempts 4

It won't technically trigger the event handler until the 4th failure or no?

Thanks.

JR

EDIT: Never mind, the code in the article you sent explains things.
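
For the 20-minute window asked about earlier, a rough estimate of how long a failing service takes to reach a HARD state can be sketched as below. It assumes the default interval_length of 60 seconds (so retry_interval is in minutes) and ignores scheduling drift; minutes_to_hard is a hypothetical helper name, not a Nagios command.

```shell
#!/bin/sh
# Hypothetical helper: minutes between the first failed check and the HARD state.
# Assumes interval_length is 60 seconds, so retry_interval is in minutes.
minutes_to_hard() {
    max_check_attempts="$1"
    retry_interval="$2"
    # The first failure is attempt 1; each later attempt waits one retry_interval.
    echo $(( (max_check_attempts - 1) * retry_interval ))
}

# Five attempts, retried every 5 minutes.
minutes_to_hard 5 5
```

Under those assumptions, max_check_attempts 5 with retry_interval 5 puts the HARD state about 20 minutes after the first failure.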
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: monitor device on internet?

Post by rkennedy »

Glad to see you found your answer. Did you need any more assistance with this?
Former Nagios Employee
jriker1
Posts: 115
Joined: Tue Dec 15, 2015 8:40 pm

Re: monitor device on internet?

Post by jriker1 »

OK, here's what I did. Let me know if it looks good or if any issues jump out. I am remote right now and don't want to actively test this and then not be able to get to the server if something goes wrong, so I have not restarted Nagios yet:

On Windows server:
Downloaded PsShutdown so I have more robust reboot tools. The built-in tools may work, but I'm used to using these.
Created a batch file called reboot.bat containing:

Code:

"c:\Program Files (x86)\PSTools\PsShutdown.exe" -u administrator -p mypassword -r -t 0
Saved the file in the c:\Program Files\NSClient++\scripts folder

Edited C:\Program Files\NSClient++\nsclient.ini and added the following to the end:

Code:

[/settings/external scripts/scripts]
reboot=scripts\reboot.bat
Restarted NSClient service

On Nagios Server:
Created a reboot.sh file in /usr/local/nagios/libexec/eventhandlers containing:

Code:

#!/bin/sh
#
# Event handler script for rebooting the monitored Windows server via NRPE
#
# Note: This script will only reboot the server if the service check has
#       failed 3 times (in a "soft" state) or if the service somehow
#       manages to fall into a "hard" error state.
#

# What state is the HTTP service in?
case "$1" in
OK)
	# The service just came back up, so don't do anything...
	;;
WARNING)
	# We don't really care about warning states, since the service is probably still running...
	;;
UNKNOWN)
	# We don't know what might be causing an unknown error, so don't do anything...
	;;
CRITICAL)
	# Aha!  The HTTP service appears to have a problem - perhaps we should restart the server...
	# Is this a "soft" or a "hard" state?
	case "$2" in
		
	# We're in a "soft" state, meaning that Nagios is in the middle of retrying the
	# check before it turns into a "hard" state and contacts get notified...
	SOFT)
			
		# What check attempt are we on?  We don't want to restart the web server on the first
		# check, because it may just be a fluke!
		case "$3" in
				
		# Wait until the check has been tried 3 times before restarting the web server.
		# If the check fails on the 4th time (after we restart the web server), the state
		# type will turn to "hard" and contacts will be notified of the problem.
		# Hopefully this will restart the web server successfully, so the 4th check will
		# result in a "soft" recovery.  If that happens no one gets notified because we
		# fixed the problem!
		3)
			echo -n "Restarting HTTP service (3rd soft critical state)..."
			# Ask NSClient++ (via NRPE) to run the reboot script on the Windows host
			/usr/local/nagios/libexec/check_nrpe -H "$4" -c reboot -a "$3"
			;;
		esac
		;;
				
	# The HTTP service somehow managed to turn into a hard error without getting fixed.
	# It should have been restarted by the code above, but for some reason it didn't.
	# Let's give it one last try, shall we?  
	# Note: Contacts have already been notified of a problem with the service at this
	# point (unless you disabled notifications for this service)
	HARD)
		echo -n "Restarting HTTP service..."
		# Ask NSClient++ (via NRPE) to run the reboot script on the Windows host
		/usr/local/nagios/libexec/check_nrpe -H "$4" -c reboot -a "$3"
		;;
	esac
	;;
esac
exit 0
Created command entry with:

Code:

define command {
        command_name winsrv_restart_server
        command_line /usr/local/nagios/libexec/eventhandlers/reboot.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTNAME$
}
Created service entry in my windows cfg:

Code:

define service {
        use                     generic-service
        service_description     Server_Restart
        host_name               homeserver
        max_check_attempts      4
        check_command           check_https!-H remote.iceonet.net -S
        event_handler           winsrv_restart_server
}
The last step is to restart Nagios. Thoughts? Should I change anything?

Thanks.

JR
Last edited by jriker1 on Mon Jan 18, 2016 10:16 am, edited 2 times in total.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: monitor device on internet?

Post by rkennedy »

This looks good to go. I'd say give it a try once you aren't working remotely. One thing to check first: can you try executing the command from your Nagios machine, as the nagios user, to confirm that the permissions are correct and the NSClient++ configuration is working?
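
One way to rehearse that manual test while still remote is a dry-run wrapper that prints the command instead of running it. This is only a sketch: run_as_nagios and the DRY_RUN variable are hypothetical names, the host homeserver is taken from the service definition above, and it assumes sudo is available and the Nagios user is called nagios.

```shell
#!/bin/sh
# Hypothetical wrapper: show (or run) a command as the nagios user.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
run_as_nagios() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Only print the command line that would be executed.
        echo "sudo -u nagios $*"
    else
        sudo -u nagios "$@"
    fi
}

# Prints the command; with DRY_RUN=0 it would really ask the host to reboot.
run_as_nagios /usr/local/nagios/libexec/check_nrpe -H homeserver -c reboot -t 30
```

The -t 30 raises the check_nrpe timeout to 30 seconds. Running with DRY_RUN=0 reboots the Windows server if everything is configured correctly, so it is best left until the box is physically reachable.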

Should you run into errors, let us know what happens and we'll work with you through it. Glad to see this coming together though!
Former Nagios Employee
jriker1
Posts: 115
Joined: Tue Dec 15, 2015 8:40 pm

Re: monitor device on internet?

Post by jriker1 »

Well, so far no success. I'm not sure of the right way to emulate the website not responding, so I killed the World Wide Web service. After the first failed check the service went to WARNING status. After all four checks it was still in WARNING, and that was all she wrote: nothing happened. Is there anything I'm missing, or something I should be setting or doing? Also note that after the fourth check it shows a HARD WARNING. The way the script is written, it looks like it's looking for a SOFT warning or a hard failure, neither of which I think is happening. Thoughts?
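
For reference, the state Nagios passes to the handler comes from the plugin's exit code, and the handler script above does nothing for WARNING. A small sketch of that mapping follows; state_from_exit_code is a hypothetical helper name, and codes outside 0 to 3 are simply labelled OUT_OF_RANGE here rather than guessing how Nagios treats them.

```shell
#!/bin/sh
# Hypothetical helper: translate a plugin exit code into the Nagios state name.
state_from_exit_code() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT_OF_RANGE" ;;
    esac
}

# A check that exits with 1 reaches the handler as WARNING, not CRITICAL.
state_from_exit_code 1
```

Running the check command by hand and looking at its exit status (echo $? right after it) shows which branch of the handler's case statement would be taken.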

Note: I'm working through this by defining the -e parameter on check_http to force a particular response code to be treated as valid and anything else as critical. That said, if I manually do:

check_nrpe -H 192.168.0.1

I get

I (0.4.4.15 2015-11-25) seem to be doing fine...

If I do

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.1 -c reboot

I get

CHECK_NRPE: Socket timeout after 10 seconds.

Also, if I run the batch file manually on the Windows box, the reboot works, but only if I run it as administrator. Otherwise I get access denied. What happens when check_nrpe makes the call to the batch file?

Thoughts?

Thanks.

JR
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: monitor device on internet?

Post by rkennedy »

This sounds like it could be related to UAC / permissions on the Windows side.

Can you post your NSClient log file?
Former Nagios Employee
jriker1
Posts: 115
Joined: Tue Dec 15, 2015 8:40 pm

Re: monitor device on internet?

Post by jriker1 »

Will assume it's the nsclient.txt file. Here is everything in there:
2016-01-05 16:52:57: error:c:\source\nscp\modules\NRPEServer\NRPEServer.cpp:132: CA not found: C:\Program Files\NSClient++/security/ca.pem (generating a default CA)
2016-01-05 16:52:57: error:C:\source\build\x64\dist\modules\NRPEServer\module.cpp:34: Exception in Failed to load NRPEServer: : An invalid argument was supplied
2016-01-05 16:52:57: error:c:\source\nscp\service\NSClient++.cpp:739: Plugin refused to load: NRPEServer
2016-01-05 16:52:57: error:C:\source\build\x64\dist\modules\NSClientServer\module.cpp:39: Exception in Failed to load NSClientServer: : An invalid argument was supplied
2016-01-05 16:52:57: error:c:\source\nscp\service\NSClient++.cpp:739: Plugin refused to load: NSClientServer
2016-01-17 20:23:34: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:23:55: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:24:14: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:25:08: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:29:00: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:29:10: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
2016-01-17 20:31:10: error:c:\source\nscp\include\socket/connection.hpp:149: Failed to send data: The file handle supplied is not valid
Thanks.

JR