
Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 1:42 pm
by jeremie.grund
We are running Nagios Core 3.2.3 on openSUSE 11.4. We recently started using passive checks in Nagios with the results being sent through NRDP.

My passive check template is defined as

Code: Select all

define service{
        name                            passive-service 	; The 'name' of this service template
        active_checks_enabled           0       		; Active service checks are disabled
        passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
        parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       		; We should obsess over this service (if necessary)
        check_freshness                 1       		; Check service 'freshness' (the default is to NOT check it)
        notifications_enabled           1       		; Service notifications are enabled
        event_handler_enabled           1       		; Service event handler is enabled
        flap_detection_enabled          1       		; Flap detection is enabled
        failure_prediction_enabled      1       		; Failure prediction is enabled
        process_perf_data               1       		; Process performance data
        retain_status_information       1       		; Retain status information across program restarts
        retain_nonstatus_information    1       		; Retain non-status information across program restarts
        is_volatile                     0       		; The service is not volatile
        check_period                    24x7			; The service can be checked at any time of the day
        max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10			; Check the service every 10 minutes under normal conditions
        retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
#        contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
	notification_options		w,u,c,r			; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60			; Re-notify about service problems every hour
        notification_period             24x7			; Notifications can be sent out at any time
	check_command			check_dummy!2!"Service has not checked in"
         register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
And my check is defined as

Code: Select all

define service{
	use			passive-service
	host_name		PSVMDB07
	service_description	DYNCUSTOM Mirror
	freshness_threshold	600
	notification_period	24x7_except_sql_maint
	}
The check results are submitted by a SQL Agent job that runs every minute. My intention was that if no result arrives within 10 minutes, check_dummy would raise a critical alert.

This was working fine until this weekend, when the database server was patched. Now, even though the agent job still runs every minute, Nagios doesn't seem to wait the full 10 minutes before reporting that the service hasn't checked in.

The alert history shows:

[11-10-2015 13:37:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
[11-10-2015 13:36:56] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;CRITICAL;SOFT;1;CRITICAL: Service has not checked in
[11-10-2015 13:35:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
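
For anyone who wants to measure the gaps between those entries rather than eyeball them, here is a minimal sketch in Python. It assumes the usual `[MM-DD-YYYY HH:MM:SS] SERVICE ALERT: host;service;state;type;attempt;output` layout seen in the three lines above; the names `parse_alerts` and `gaps` are made up for illustration and are not part of Nagios.

```python
import re
from datetime import datetime

# The three history entries quoted in this post.
HISTORY = """\
[11-10-2015 13:37:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
[11-10-2015 13:36:56] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;CRITICAL;SOFT;1;CRITICAL: Service has not checked in
[11-10-2015 13:35:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
"""

LINE_RE = re.compile(r"\[(?P<ts>[^\]]+)\] SERVICE ALERT: (?P<fields>.*)")


def parse_alerts(text):
    """Return (timestamp, state) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that is not a SERVICE ALERT line
        when = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S")
        # Fields: host;service;state;state_type;attempt;plugin output
        fields = match.group("fields").split(";", 5)
        events.append((when, fields[2]))
    return sorted(events)


events = parse_alerts(HISTORY)
gaps = [
    (prev_state, state, int((when - prev_when).total_seconds()))
    for (prev_when, prev_state), (when, state) in zip(events, events[1:])
]
for prev_state, state, seconds in gaps:
    print(f"{prev_state} -> {state}: {seconds} s")
# prints:
# OK -> CRITICAL: 110 s
# CRITICAL -> OK: 10 s
```

So the CRITICAL was logged 110 seconds after the previous OK entry, well short of the 600 second freshness_threshold.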

Any thoughts as to why Nagios isn't honoring the freshness_threshold?

Thanks,

Jeremie

Re: Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 2:09 pm
by scottwilkerson
Do both of these machines have the correct date/time and timezone?

Re: Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 2:23 pm
by jeremie.grund
scottwilkerson wrote: Do both of these machines have the correct date/time and timezone?
I was really hoping it would be something simple like DST, but no: both the Nagios server and the server submitting the passive check have the same time.

Re: Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 3:22 pm
by rkennedy
Can you verify that the global option is enabled in your nagios.cfg by running the command below, and post the output?

Code: Select all

cat /usr/local/nagios/etc/nagios.cfg|grep freshness

Re: Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 3:42 pm
by jeremie.grund
rkennedy wrote: Can you verify the global option is on in your nagios.cfg with the below command, and post the output?

Code: Select all

cat /usr/local/nagios/etc/nagios.cfg|grep freshness

# check the "freshness" of service results. Enabling this option
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_service_freshness=1
# check the "freshness" of service check results. If you have
# disabled service freshness checking, this option has no effect.
service_freshness_check_interval=60
# check the "freshness" of host results. Enabling this option
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_host_freshness=0
# check the "freshness" of host check results. If you have
# disabled host freshness checking, this option has no effect.
host_freshness_check_interval=60
# will add to any host and service freshness thresholds that
additional_freshness_latency=15


So if I understand correctly, service_freshness_check_interval is just how often Nagios checks whether a service is stale, while staleness itself is determined by comparing freshness_threshold against the time of the last passive check result. If Nagios checks every 60 seconds and a passive check result is submitted every 60 seconds, but the threshold is 600, wouldn't Nagios have to go 10 minutes without receiving any check result before running the service's check_command?
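
To make that expectation concrete, here is a small Python sketch of the model described above. It is a simplified illustration, not Nagios's actual logic: it ignores restarts, check latency, and additional_freshness_latency, and the function names `is_stale` and `first_stale_sweep` are hypothetical.

```python
def is_stale(last_result_ts, now_ts, freshness_threshold):
    """Simplified model: stale once the last result is older than the threshold."""
    return (now_ts - last_result_ts) > freshness_threshold


def first_stale_sweep(last_result_ts, freshness_threshold, sweep_interval, horizon):
    """Return the time of the first freshness sweep that finds the result stale.

    Sweeps happen every sweep_interval seconds, up to and including horizon.
    Returns None if no sweep within the horizon sees a stale result.
    """
    for now_ts in range(sweep_interval, horizon + 1, sweep_interval):
        if is_stale(last_result_ts, now_ts, freshness_threshold):
            return now_ts
    return None


# Last passive result arrives at t=0 and then nothing else is received.
# freshness_threshold=600 and service_freshness_check_interval=60, as posted.
print(first_stale_sweep(0, 600, 60, 3600))  # prints 660

# A result that is only 110 seconds old should not count as stale under this model.
print(is_stale(0, 110, 600))  # prints False
```

Under this model the check_command would only run about 11 minutes after the last result, which is what I expected to see.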

Re: Passive Check Freshness Not Working

Posted: Tue Nov 10, 2015 5:53 pm
by rkennedy
According to https://assets.nagios.com/downloads/nag ... hness.html, it looks like you don't have a check command configured to run when the service is detected as stale:
"Configure the check_command option in your host or service definitions to reflect a valid command that should be used to actively check the host or service when it is detected as stale."
The part you'll be looking to add is:

Code: Select all

check_command	 no-backup-report	; this command is run only if the service results are "stale"

Re: Passive Check Freshness Not Working

Posted: Wed Nov 11, 2015 11:07 am
by jeremie.grund
I have a check_command in the passive-service template that calls check_dummy to set a critical alert.

Re: Passive Check Freshness Not Working

Posted: Wed Nov 11, 2015 12:02 pm
by jdalrymple
To clear some things up, the alert doesn't come from the freshness threshold at all; it comes from the active check. I think you probably already understood that. What we need to figure out is why the active check is occurring.

Can you check objects.cache to make sure active checks are truly disabled for that service?

The other thing to consider is Predictive Dependency Checks. I don't recall how much these honor active_checks_enabled 0. Let's start with objects.cache though.

Re: Passive Check Freshness Not Working

Posted: Wed Nov 11, 2015 12:46 pm
by jeremie.grund
Do you have a URL that explains how to check objects.cache? This is something I haven't done yet.

Thanks

Re: Passive Check Freshness Not Working

Posted: Wed Nov 11, 2015 1:42 pm
by jdalrymple
objects.cache is a file, usually in /usr/local/nagios/var/, though your environment may differ. Its location is explicitly defined in nagios.cfg:

Code: Select all

[jdalrymple@jrd-cent66-2 ~]$ grep object_cache_file /usr/local/nagios/etc/nagios.cfg
object_cache_file=/usr/local/nagios/var/objects.cache
Once you know the location of the objects.cache file, try this horrible command on it:

Code: Select all

grep -Pzo "(?s)^define\s?(host|service)?\s?{.*?(DYNCUSTOM Mirror).*?}" /usr/local/nagios/var/objects.cache | tac | grep -Pzo "(?s)^.?}.*?(DYNCUSTOM Mirror).*?\{" | tac
Replace "/usr/local/nagios/var/objects.cache" with your path.
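
If the grep pipeline is awkward to work with, the same lookup can be sketched in Python. This is only an illustration: it assumes objects.cache stores each object as a `define service {` block with one whitespace-separated directive per line and a closing `}` on its own line, and the sample text below is invented for the demo rather than taken from a real cache. `find_service_blocks` is a hypothetical helper name.

```python
import re

# Invented sample in the assumed objects.cache layout (not real output).
SAMPLE = (
    "define service {\n"
    "\thost_name\tPSVMDB07\n"
    "\tservice_description\tDYNCUSTOM Mirror\n"
    "\tactive_checks_enabled\t0\n"
    "\tpassive_checks_enabled\t1\n"
    "\tcheck_freshness\t1\n"
    "\tfreshness_threshold\t600\n"
    "\t}\n"
    "define service {\n"
    "\thost_name\tPSVMDB07\n"
    "\tservice_description\tOther Service\n"
    "\tactive_checks_enabled\t1\n"
    "\t}\n"
)


def find_service_blocks(cache_text, service_description):
    """Return one dict of directives per service block with a matching description."""
    blocks = []
    current = None  # directives of the block being read, or None when outside a block
    for raw_line in cache_text.splitlines():
        line = raw_line.strip()
        if current is None:
            if re.match(r"define\s+service\s*\{", line):
                current = {}
        elif line == "}":
            if current.get("service_description") == service_description:
                blocks.append(current)
            current = None
        elif line and not line.startswith("#"):
            # Split "directive<whitespace>value"; the value may contain spaces.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return blocks


matches = find_service_blocks(SAMPLE, "DYNCUSTOM Mirror")
for block in matches:
    for key in ("host_name", "active_checks_enabled", "check_freshness", "freshness_threshold"):
        print(f"{key} = {block.get(key)}")
```

To run it against a real installation, read the file named by object_cache_file in nagios.cfg and pass its contents as cache_text in place of SAMPLE.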