Adjust NRPE Wait times

magna.vis · Post by **magna.vis** » Fri Nov 13, 2015 5:43 pm

TL;DR: Is there a way to adjust the amount of time NRPE waits for the return of a PowerShell script? I fully expect this to take hours.

Long Version: I have Veeam Backup and Recovery doing the backups of my Hyper-V VMs. The backup is scripted in PowerShell, and has some neat features (like cleanup, emailing me the status, doing backups of multiple VMs, etc). Because it runs in PS, I wanted to have Nagios trigger the backup using NRPE, and wait for the status that the script will eventually return. That way, I can have Nagios generate an email to me in the event of a failure rather than having to check the backups daily. The issue I'm having is that it will undoubtedly take at least 1.5 hours every night (possibly up to two). Really, NRPE will have to wait that long for a status returned from the server. Is this possible?

When I run a test PS Script, I get this:

Code: Select all

root@CAC-Nagios:/usr/local/nagios/libexec# ./check_nrpe -H 10.1.1.18
I (0.4.3.143 2015-04-29) seem to be doing fine...
root@CAC-Nagios:/usr/local/nagios/libexec# ./check_nrpe -H 10.1.1.18 -c backup_gabriel
SUCCESS: Everything is going to be fine!

And when I point that script at the actual backup, it gives a timeout error after 10 seconds. The backup, however, does continue on to completion, but then Nagios reports a "Warn" state for the service because (well I assume it's because) it doesn't wait long enough to get the status of the script. I have done a few other things, that I think are fine, but may be good to sanity check. Here's the timeperiod def. for the backup so that it runs in a particular interval:

Code: Select all

# This is the window for triggering the VEEAM backup job on CAC-NAS
define timeperiod{
	timeperiod_name		server_backup
	alias				Window for backups
	sunday				23:00-01:30
	monday				23:00-01:30
	tuesday				23:00-01:30
	wednesday			23:00-01:30
	thursday			23:00-01:30
	friday				23:00-01:30
	saturday			23:00-01:30
	}

Here's the service template that leverages the time period, and ensures that it won't run twice in that window (if someone could sanity check this, I would really appreciate it. I didn't know exactly what else needed to be changed):

Code: Select all

define service{
		name							backup-service		; The 'name' of this service template
		active_checks_enabled			1					; Active service checks are enabled
		passive_checks_enabled			0					; Passive service checks are enabled/accepted
		parallelize_check				1					; Active service checks should be parallelized (disabling this can lead to major performance problems)
		obsess_over_service				0					; We should obsess over this service (if necessary)
		check_freshness					0					; Default is to NOT check service 'freshness'
		notifications_enabled			1					; Service notifications are enabled
		event_handler_enabled			1       			; Service event handler is enabled
		flap_detection_enabled			0					; Flap detection is enabled
		process_perf_data				1					; Process performance data
		retain_status_information		1					; Retain status information across program restarts
		retain_nonstatus_information	1					; Retain non-status information across program restarts
		is_volatile						0					; The service is not volatile
		check_period					server_backup		; The service can be checked at any time of the day
		max_check_attempts				1					; Re-check the service up to 1 times in order to determine its final (hard) state
		normal_check_interval			360					; Check the service every 6 hours under normal conditions, this puts it well outside of it's run window
		retry_check_interval			180					; Re-check the service every two minutes until a hard state can be determined
		contact_groups					admins				; Notifications get sent out to everyone in the 'admins' group
		notification_options			w,u,c,r				; Send notifications about warning, unknown, critical, and recovery events
		notification_interval			1560				; Re-notify about service problems every day and 2 hours, prevents repeat notifications
		notification_period				24x7				; Notifications can be sent out at any time
		failure_prediction_enabled		0					; We will wait for it to actually fail thank you!!
		register						0					; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
		}

And here's the service def. for the actual check in the windows.cfg file:

Code: Select all

define service{
    use					backup-service
    host_name			CAC-NAS				
    service_description	Veeam Backup		
    check_command		check_nrpe!backup_gabriel
    }

All of these check out, and Nagios will run the script just fine when I have it pointing to the test script instead of to the actual backup script, so it appears like everything else is fine. Thank you for taking the time to look it over!

jrdalrymple · Post by **jrdalrymple** » Sat Nov 14, 2015 11:12 am

Your exact situation has been discussed on the forums before.

1) Take a look at this plugin, it looks backwards in time instead of looking at current time:
https://exchange.nagios.org/directory/P ... ps/details

2) In the other thread (probably in the customer forums) I recall the right solution to your problem being to modify the plugin to be used in a passive check. In this case it sounds like your powershell must have a start-sleep and a loop. Maybe just put a passive "this VM backed up successfully" nsca send in there. I think that's how I'd do it.

Post by **Box293** » Mon Nov 16, 2015 12:26 am

Theoretically it's possible. You need to increase the timeout in nagios.cfg to allow a plugin to run that long (globally).

You also need to use the -t argument with the NRPE command to define the timeout (by default it is 10 seconds).

You are probably better off looking at adjusting your script so it sends the check result back to nagios as a passive check and defining a freshness threshold. You could do this with NRPD and NCPA.

magna.vis · Post by **magna.vis** » Mon Nov 16, 2015 4:44 pm

Thank you both for your replies. I did search the forum before posting but it's quite saturated with NRPE stuff, and I didn't know exactly what I was looking for to be honest. Thanks for the link.

I'll post back eventually with what I've done. It may be easier to write the script to create an output file, then parse the output file several hours later for a status rather than involve other plugins than NRPE.

hsmith · Post by **hsmith** » Mon Nov 16, 2015 4:56 pm

Let us know what you come up with, or what we can do to help you.

Thanks!

Nagios Support Forum

Adjust NRPE Wait times

Adjust NRPE Wait times

Re: Adjust NRPE Wait times

Re: Adjust NRPE Wait times

Re: Adjust NRPE Wait times

Re: Adjust NRPE Wait times