Passive Check Freshness Not Working

jeremie.grund
Posts: 10
Joined: Mon Nov 09, 2015 1:41 pm

Passive Check Freshness Not Working

Post by jeremie.grund »

We are running Nagios Core 3.2.3 on openSUSE 11.4 and recently started using passive checks, with the results submitted through NRDP.

My passive check template is defined as follows:

Code:

define service{
        name                            passive-service 	; The 'name' of this service template
        active_checks_enabled           0       		; Active service checks are disabled
        passive_checks_enabled          1    		   	; Passive service checks are enabled/accepted
        parallelize_check               1       		; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       		; We should obsess over this service (if necessary)
        check_freshness                 1       		; Check service 'freshness' (the default is NOT to check it)
        notifications_enabled           1       		; Service notifications are enabled
        event_handler_enabled           1       		; Service event handler is enabled
        flap_detection_enabled          1       		; Flap detection is enabled
        failure_prediction_enabled      1       		; Failure prediction is enabled
        process_perf_data               1       		; Process performance data
        retain_status_information       1       		; Retain status information across program restarts
        retain_nonstatus_information    1       		; Retain non-status information across program restarts
        is_volatile                     0       		; The service is not volatile
        check_period                    24x7			; The service can be checked at any time of the day
        max_check_attempts              3			; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10			; Check the service every 10 minutes under normal conditions
        retry_check_interval            2			; Re-check the service every two minutes until a hard state can be determined
#        contact_groups                  admins			; Notifications get sent out to everyone in the 'admins' group
	notification_options		w,u,c,r			; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60			; Re-notify about service problems every hour
        notification_period             24x7			; Notifications can be sent out at any time
	check_command			check_dummy!2!"Service has not checked in"
        register                        0       		; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
And my service is defined as follows:

Code:

define service{
	use			passive-service
	host_name		PSVMDB07
	service_description	DYNCUSTOM Mirror
	freshness_threshold	600
	notification_period	24x7_except_sql_maint
	}
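For reference, the effective configuration of the DYNCUSTOM Mirror service is the template plus the service's own overrides. The Python sketch below models that merge under simplifying assumptions (a single `use` template, no additive `+` values, only a subset of the template's directives copied in); `resolve`, `PASSIVE_SERVICE` and `DYNCUSTOM_MIRROR` are hypothetical names, not Nagios APIs.

```python
# Simplified, hypothetical model of Nagios template inheritance:
# a service inherits every directive from the template named in `use`,
# and any directive it sets itself overrides the inherited value.

# Subset of the passive-service template above.
PASSIVE_SERVICE = {
    "name": "passive-service",
    "active_checks_enabled": "0",
    "passive_checks_enabled": "1",
    "check_freshness": "1",
    "max_check_attempts": "3",
    "notification_period": "24x7",
    "check_command": 'check_dummy!2!"Service has not checked in"',
    "register": "0",
}

# The DYNCUSTOM Mirror service above.
DYNCUSTOM_MIRROR = {
    "use": "passive-service",
    "host_name": "PSVMDB07",
    "service_description": "DYNCUSTOM Mirror",
    "freshness_threshold": "600",
    "notification_period": "24x7_except_sql_maint",
}

# `register` is not inherited from a template (per the Nagios docs);
# `name` only identifies the template, so this sketch drops it too.
NOT_INHERITED = {"name", "register"}


def resolve(obj: dict, template: dict) -> dict:
    """Return the effective directives of `obj` after applying `template`."""
    effective = {k: v for k, v in template.items() if k not in NOT_INHERITED}
    effective.update({k: v for k, v in obj.items() if k != "use"})
    return effective


if __name__ == "__main__":
    for key, value in sorted(resolve(DYNCUSTOM_MIRROR, PASSIVE_SERVICE).items()):
        print(f"{key:24} {value}")
```

Running it lists active_checks_enabled 0, check_freshness 1 and freshness_threshold 600 for the service, with notification_period overridden to 24x7_except_sql_maint.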
The check results are submitted by a SQL Agent job that runs every minute. My intention was that if no result arrived within 10 minutes, check_dummy would raise an alert.

This worked fine until this weekend, when the database server was patched. Now, even though the agent job still runs every minute, Nagios doesn't seem to wait the 10 minutes before reporting that the service hasn't checked in.

The alert history shows:

[11-10-2015 13:37:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
[11-10-2015 13:36:56] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;CRITICAL;SOFT;1;CRITICAL: Service has not checked in
[11-10-2015 13:35:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK
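Reading those timestamps as data helps. The alert log records state changes rather than every passive result, so the OK alert at 13:35:06 is the latest result we can be sure of before the CRITICAL; any later result would only make the gap smaller. A minimal Python sketch (assuming the timestamps are MM-DD-YYYY, which does not change the gaps; `parse_alert` and `gap_before_first_critical` are hypothetical helpers) computes how old that result was when the CRITICAL fired:

```python
import re
from datetime import datetime

# The alert-history lines quoted above, reordered oldest first.
ALERTS = [
    "[11-10-2015 13:35:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK",
    "[11-10-2015 13:36:56] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;CRITICAL;SOFT;1;CRITICAL: Service has not checked in",
    "[11-10-2015 13:37:06] SERVICE ALERT: PSVMDB07;DYNCUSTOM Mirror;OK;SOFT;2;OK: Mirror OK",
]

ALERT_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] SERVICE ALERT: (?P<fields>.+)$")


def parse_alert(line: str):
    """Return (timestamp, state) for one SERVICE ALERT line."""
    m = ALERT_RE.match(line)
    if m is None:
        raise ValueError(f"not a SERVICE ALERT line: {line!r}")
    # Assumption: MM-DD-YYYY, i.e. Nagios date_format=us.
    ts = datetime.strptime(m.group("ts"), "%m-%d-%Y %H:%M:%S")
    # Fields: host;service;state;state type;attempt;plugin output
    _host, _service, state, _type, _attempt, _output = m.group("fields").split(";", 5)
    return ts, state


def gap_before_first_critical(lines):
    """Seconds between the first CRITICAL alert and the alert just before it."""
    events = [parse_alert(line) for line in lines]
    for (prev_ts, _), (ts, state) in zip(events, events[1:]):
        if state == "CRITICAL":
            return (ts - prev_ts).total_seconds()
    return None


if __name__ == "__main__":
    gap = gap_before_first_critical(ALERTS)
    print(f"CRITICAL raised {gap:.0f} s after the previous OK alert (threshold: 600 s)")
```

It reports 110 s, so at worst the last result was far younger than the 600 s freshness_threshold when the CRITICAL appeared.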

Any thoughts on why Nagios isn't honoring the freshness_threshold?

Thanks,

Jeremie
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Passive Check Freshness Not Working

Post by scottwilkerson »

Do both of these machines have the correct date/time and timezone?
Former Nagios employee
jeremie.grund
Posts: 10
Joined: Mon Nov 09, 2015 1:41 pm

Re: Passive Check Freshness Not Working

Post by jeremie.grund »

scottwilkerson wrote: Do both of these machines have the correct date/time and timezone?
I was really hoping it would be something simple like DST, but no: both the Nagios server and the server submitting the passive check have the same time.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Passive Check Freshness Not Working

Post by rkennedy »

Can you verify that the global option is enabled in your nagios.cfg with the command below, and post the output?

Code:

cat /usr/local/nagios/etc/nagios.cfg|grep freshness
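If grep isn't convenient, the same check can be sketched in Python. This is an illustrative helper (`read_main_config` and `freshness_settings` are hypothetical names), assuming the usual nagios.cfg layout of one `key=value` per line with `#` comments. Unlike `grep freshness`, it prints only active directives, not the comment lines that mention freshness.

```python
import os
import sys

DEFAULT_CFG = "/usr/local/nagios/etc/nagios.cfg"  # path used in the command above


def read_main_config(text: str) -> dict:
    """Parse nagios.cfg-style 'key=value' lines, skipping blanks and '#' comments.

    Repeatable directives such as cfg_file keep only their last value here.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def freshness_settings(text: str) -> dict:
    """Only the directives whose name mentions freshness, like `grep freshness`."""
    return {k: v for k, v in read_main_config(text).items() if "freshness" in k}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CFG
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for key, value in freshness_settings(fh.read()).items():
                print(f"{key}={value}")
    else:
        print(f"{path} not found; pass the path to nagios.cfg as the first argument")
```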
Former Nagios Employee
jeremie.grund
Posts: 10
Joined: Mon Nov 09, 2015 1:41 pm

Re: Passive Check Freshness Not Working

Post by jeremie.grund »

rkennedy wrote: Can you verify that the global option is enabled in your nagios.cfg with the command below, and post the output?

Code:

cat /usr/local/nagios/etc/nagios.cfg|grep freshness

Output:
# check the "freshness" of service results. Enabling this option
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_service_freshness=1
# check the "freshness" of service check results. If you have
# disabled service freshness checking, this option has no effect.
service_freshness_check_interval=60
# check the "freshness" of host results. Enabling this option
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_host_freshness=0
# check the "freshness" of host check results. If you have
# disabled host freshness checking, this option has no effect.
host_freshness_check_interval=60
# will add to any host and service freshness thresholds that
additional_freshness_latency=15


So if I understand correctly, service_freshness_check_interval only controls how often Nagios checks whether the service is stale, and staleness itself is decided by comparing freshness_threshold against the time of the last passive check result? If Nagios checks every 60 seconds, the passive check result is submitted every 60 seconds, and the threshold is 600, wouldn't Nagios have to go 10 minutes without receiving any check result before running the service's check_command?
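That reading can be checked with a simplified Python model. Assumptions: the freshness check runs every service_freshness_check_interval seconds, a result counts as stale once last_result + freshness_threshold is earlier than the current time, and each result is timestamped when it arrives. additional_freshness_latency is left out because the Nagios documentation says it is only added to thresholds Nagios calculates automatically, not to an explicit freshness_threshold. `first_stale_tick` is a hypothetical helper, not Nagios code.

```python
def first_stale_tick(result_times, threshold, reaper_interval, horizon):
    """Return the first freshness-check tick (seconds) at which the service is stale.

    Simplified model, not Nagios source:
      * the freshness check runs every `reaper_interval` seconds
        (service_freshness_check_interval);
      * a result is stale once last_result + threshold < now, with
        `threshold` = freshness_threshold set explicitly on the service;
      * before the first result, program start (t = 0) stands in for it.
    Returns None if the service never goes stale within `horizon` seconds.
    """
    for tick in range(reaper_interval, horizon + 1, reaper_interval):
        last = max((t for t in result_times if t <= tick), default=0)
        if last + threshold < tick:
            return tick  # the template's check_command (check_dummy) would run here
    return None


if __name__ == "__main__":
    hour = 3600
    # Job submitting every minute for a full hour.
    steady = range(30, hour, 60)
    # Job that submits every minute but stops after t = 1230 s.
    stopped = range(30, 1231, 60)
    print("steady :", first_stale_tick(steady, 600, 60, hour))   # None
    print("stopped:", first_stale_tick(stopped, 600, 60, hour))  # 1860
```

With a result every minute the service never goes stale; once the job stops, the check_command only runs just over 10 minutes after the last result (tick 1860 for a last result at 1230), which is the behavior described in the original post.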
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Passive Check Freshness Not Working

Post by rkennedy »

Taken from https://assets.nagios.com/downloads/nag ... hness.html: it looks like you don't have a check configured to run when the service is detected as stale.
"Configure the check_command option in your host or service definitions to reflect a valid command that should be used to actively check the host or service when it is detected as stale."
The part you'll want to add is:

Code:

check_command	 no-backup-report	; this command is run only if the service results are "stale"
Former Nagios Employee
jeremie.grund
Posts: 10
Joined: Mon Nov 09, 2015 1:41 pm

Re: Passive Check Freshness Not Working

Post by jeremie.grund »

I have a check_command in the passive-service template that calls check_dummy to set a critical alert.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Passive Check Freshness Not Working

Post by jdalrymple »

To clear some things up: the alert doesn't come from the freshness threshold at all; it comes from the active check. You probably already understood that. What we need to figure out is why the active check is occurring.

Can you check objects.cache to make sure active checks are truly disabled for that service?

The other thing to consider is Predictive Dependency Checks; I don't recall how strictly these honor active_checks_enabled 0. Let's start with objects.cache, though.
jeremie.grund
Posts: 10
Joined: Mon Nov 09, 2015 1:41 pm

Re: Passive Check Freshness Not Working

Post by jeremie.grund »

Do you have a URL explaining how to check objects.cache? This is something I haven't done yet.

Thanks
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Passive Check Freshness Not Working

Post by jdalrymple »

objects.cache is a file, usually in /usr/local/nagios/var/, although your environment may differ. Its location is set by the object_cache_file directive in nagios.cfg:

Code:

[jdalrymple@jrd-cent66-2 ~]$ grep object_cache_file /usr/local/nagios/etc/nagios.cfg
object_cache_file=/usr/local/nagios/var/objects.cache
Once you know the location of the objects.cache file, try this horrible command on it:

Code:

grep -Pzo "(?s)^define\s?(host|service)?\s?{.*?(DYNCUSTOM Mirror).*?}" /usr/local/nagios/var/objects.cache | tac | grep -Pzo "(?s)^.?}.*?(DYNCUSTOM Mirror).*?\{" | tac
Replace "/usr/local/nagios/var/objects.cache" with your path.
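As a more readable alternative to that grep pipeline, here is a minimal Python sketch that splits objects.cache into blocks and prints the directives relevant to this thread. It assumes the layout Nagios writes there (`define TYPE {`, one whitespace-separated directive per line, a closing `}` on its own line); `parse_objects` and `find_service` are hypothetical helpers, and the default path is the one shown above.

```python
import os
import re
import sys

DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")
DEFAULT_CACHE = "/usr/local/nagios/var/objects.cache"  # adjust to your object_cache_file
WANTED = ("active_checks_enabled", "passive_checks_enabled", "check_freshness",
          "freshness_threshold", "check_command")


def parse_objects(text: str):
    """Yield (object_type, directives) for every 'define TYPE {' ... '}' block."""
    obj_type, directives = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if obj_type is None:
            m = DEFINE_RE.match(line)
            if m:
                obj_type, directives = m.group(1), {}
        elif line == "}":
            yield obj_type, directives
            obj_type = None
        elif line:
            # Directive name, whitespace, then a value that may itself contain spaces.
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1] if len(parts) > 1 else ""


def find_service(text: str, host: str, description: str):
    """Return the directives of the matching service block, or None."""
    for obj_type, d in parse_objects(text):
        if (obj_type == "service" and d.get("host_name") == host
                and d.get("service_description") == description):
            return d
    return None


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_CACHE
    if not os.path.exists(path):
        print(f"{path} not found; pass your objects.cache path as the first argument")
    else:
        with open(path, encoding="utf-8", errors="replace") as fh:
            svc = find_service(fh.read(), "PSVMDB07", "DYNCUSTOM Mirror")
        if svc is None:
            print("service not found in", path)
        else:
            for key in WANTED:
                print(f"{key:24} {svc.get(key, '(not set)')}")
```

Save it under any name and pass your objects.cache path as the first argument; if active_checks_enabled shows 1 there, the running configuration differs from what the template suggests.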