
Service checks go stale 30s after passive check received.

Posted: Wed Nov 22, 2017 6:16 pm
by invade
Hi. We have a fresh build of Nagios 4.3.4 on CentOS 7 that is receiving passive host / service checks from numerous systems via gearman.

Everything works fine except that, for a handful of systems, all the service checks go stale 30 seconds after the check is received (the host checks are fine).

Below are the log entries for one of the services on one of the systems:

Code:

[Wed Nov 22 22:34:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:34:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:35:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:35:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:39:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:39:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:40:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:40:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:44:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:44:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:45:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:45:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:49:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:49:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:50:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 29s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:50:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:54:31 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:54:31 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:55:00 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 27s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:55:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
As you can see, the passive service check is received, the service alert is set to OK, and then 30 seconds later there is a warning that the service checks are stale and the service alert is set to CRITICAL.
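To put numbers on that pattern, here is a minimal Python sketch that reads log lines in the format shown above and reports, for each stale warning, how long ago the last passive result arrived, plus the "stale by" and threshold figures from the warning itself. The function and variable names are my own, and it assumes the log contains a single service, as in the excerpt.

```python
import re
from datetime import datetime

# "[Wed Nov 22 22:34:21 2017] message" as shown in the excerpt above
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<msg>.*)$")
STALE_RE = re.compile(
    r"stale by (\d+)d (\d+)h (\d+)m (\d+)s "
    r"\(threshold=(\d+)d (\d+)h (\d+)m (\d+)s\)"
)


def to_seconds(days, hours, minutes, seconds):
    """Convert the d/h/m/s fields of a log message to seconds."""
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def stale_gaps(lines):
    """Yield (gap, stale_by, threshold) in seconds for each stale warning.

    gap is the time between the most recent PASSIVE SERVICE CHECK line
    and the warning. Assumes all lines belong to one service.
    """
    last_passive = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        msg = match.group("msg")
        if msg.startswith("PASSIVE SERVICE CHECK:"):
            last_passive = when
        elif msg.startswith("Warning: The results of service") and last_passive is not None:
            stale = STALE_RE.search(msg)
            if stale:
                parts = stale.groups()
                gap = int((when - last_passive).total_seconds())
                yield gap, to_seconds(*parts[:4]), to_seconds(*parts[4:])


sample = [
    "[Wed Nov 22 22:34:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK",
    "[Wed Nov 22 22:35:01 2017] Warning: The results of service 'System-Partitions' "
    "on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  "
    "I'm forcing an immediate check of the service.",
]

for gap, stale_by, threshold in stale_gaps(sample):
    print(f"warning {gap}s after passive result; stale by {stale_by}s; threshold {threshold}s")
```

For the first pair of entries this gives a gap of 40 seconds, while the warning itself claims the result is 4530 seconds past a threshold of 840 seconds.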

As I say, we have exactly the same host and service checks on numerous systems that don't exhibit this behaviour.

Does anyone know why this is happening for the handful of systems?

Thanks in advance.

Re: Service checks go stale 30s after passive check received

Posted: Mon Nov 27, 2017 3:54 pm
by tgriep
Can you run the following as root on the Nagios server and post the output?

Code:

ps -ef --cols=300
Can you also post that service's settings from the objects.cache file so we can review them?
It is typically located here:

Code:

/usr/local/nagios/var/objects.cache
Thanks
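Since objects.cache can be large, a small script can pull out just the one service definition. The following is a rough Python sketch, not an official tool: it assumes the "define <type> { ... }" layout with one "key value" pair per line and a closing brace on its own line, and the function names are made up for illustration.

```python
def parse_objects(text):
    """Parse 'define <type> { key value ... }' blocks into dictionaries.

    Repeated keys inside one block (as in timeperiod definitions) overwrite
    each other, which is acceptable when only service lookups are needed.
    """
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.startswith("define ") and line.endswith("{"):
                current = {"_type": line[len("define "):-1].strip()}
        elif line == "}":
            objects.append(current)
            current = None
        elif line and not line.startswith("#"):
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def find_service(objects, host_name, service_description):
    """Return the first service definition matching host and description."""
    for obj in objects:
        if (
            obj["_type"] == "service"
            and obj.get("host_name") == host_name
            and obj.get("service_description") == service_description
        ):
            return obj
    return None


# Invented sample; in practice read the text from the objects.cache file.
sample_cache = """
define service {
\thost_name\thost
\tservice_description\tSystem-Partitions
\tcheck_command\tservice_stale
\tfreshness_threshold\t3600
\tcheck_freshness\t1
\t}
"""

service = find_service(parse_objects(sample_cache), "host", "System-Partitions")
for key in ("check_command", "check_freshness", "freshness_threshold"):
    print(key, "=", service[key])
```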

Re: Service checks go stale 30s after passive check received

Posted: Thu Nov 30, 2017 7:55 am
by invade
ps -ef --cols=300
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov22 ? 00:00:14 /usr/lib/systemd/systemd --system --deserialize 16
root 2 0 0 Nov22 ? 00:00:00 [kthreadd]
root 3 2 0 Nov22 ? 00:00:01 [ksoftirqd/0]
root 5 2 0 Nov22 ? 00:00:00 [kworker/0:0H]
root 7 2 0 Nov22 ? 00:00:00 [migration/0]
root 8 2 0 Nov22 ? 00:00:00 [rcu_bh]
root 9 2 0 Nov22 ? 00:00:04 [rcu_sched]
root 10 2 0 Nov22 ? 00:00:03 [watchdog/0]
root 12 2 0 Nov22 ? 00:00:00 [kdevtmpfs]
root 13 2 0 Nov22 ? 00:00:00 [netns]
root 14 2 0 Nov22 ? 00:00:00 [xenwatch]
root 15 2 0 Nov22 ? 00:00:00 [xenbus]
root 17 2 0 Nov22 ? 00:00:00 [khungtaskd]
root 18 2 0 Nov22 ? 00:00:00 [writeback]
root 19 2 0 Nov22 ? 00:00:00 [kintegrityd]
root 20 2 0 Nov22 ? 00:00:00 [bioset]
root 21 2 0 Nov22 ? 00:00:00 [kblockd]
root 22 2 0 Nov22 ? 00:00:00 [md]
root 27 2 0 Nov22 ? 00:00:00 [kswapd0]
root 28 2 0 Nov22 ? 00:00:00 [ksmd]
root 29 2 0 Nov22 ? 00:00:02 [khugepaged]
root 30 2 0 Nov22 ? 00:00:00 [crypto]
root 38 2 0 Nov22 ? 00:00:00 [kthrotld]
root 40 2 0 Nov22 ? 00:00:00 [kmpath_rdacd]
root 41 2 0 Nov22 ? 00:00:00 [kpsmoused]
root 43 2 0 Nov22 ? 00:00:00 [ipv6_addrconf]
root 62 2 0 Nov22 ? 00:00:00 [deferwq]
root 117 2 0 Nov22 ? 00:00:00 [kauditd]
root 180 2 0 Nov22 ? 00:00:00 [rpciod]
root 181 2 0 Nov22 ? 00:00:00 [xprtiod]
root 249 2 0 Nov22 ? 00:00:00 [ata_sff]
root 251 2 0 Nov22 ? 00:00:00 [scsi_eh_0]
root 252 2 0 Nov22 ? 00:00:00 [scsi_tmf_0]
root 255 2 0 Nov22 ? 00:00:00 [scsi_eh_1]
root 256 2 0 Nov22 ? 00:00:00 [scsi_tmf_1]
root 269 2 0 Nov22 ? 00:00:00 [bioset]
root 270 2 0 Nov22 ? 00:00:00 [xfsalloc]
root 271 2 0 Nov22 ? 00:00:00 [xfs_mru_cache]
root 272 2 0 Nov22 ? 00:00:00 [xfs-buf/xvda1]
root 273 2 0 Nov22 ? 00:00:00 [xfs-data/xvda1]
root 274 2 0 Nov22 ? 00:00:00 [xfs-conv/xvda1]
root 275 2 0 Nov22 ? 00:00:00 [xfs-cil/xvda1]
root 276 2 0 Nov22 ? 00:00:00 [xfs-reclaim/xvd]
root 277 2 0 Nov22 ? 00:00:00 [xfs-log/xvda1]
root 278 2 0 Nov22 ? 00:00:00 [xfs-eofblocks/x]
root 279 2 0 Nov22 ? 00:02:33 [xfsaild/xvda1]
root 356 1 0 Nov22 ? 00:05:59 /usr/lib/systemd/systemd-journald
root 439 1 0 Nov22 ? 00:00:00 /sbin/auditd
root 479 2 0 Nov22 ? 00:00:00 [ttm_swap]
root 517 2 0 Nov22 ? 00:00:00 [edac-poller]
root 553 1 0 Nov22 ? 00:01:53 /usr/sbin/rsyslogd -n
root 554 1 0 Nov22 ? 00:00:01 /usr/lib/systemd/systemd-logind
polkitd 559 1 0 Nov22 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
dbus 561 1 0 Nov22 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
chrony 563 1 0 Nov22 ? 00:00:01 /usr/sbin/chronyd
root 776 1 0 Nov22 ? 00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
root 843 1 0 Nov22 ? 00:00:51 /usr/bin/python -Es /usr/sbin/tuned -l -P
root 846 2 0 Nov22 ? 00:00:01 [kworker/0:1H]
root 1017 1 0 Nov22 ? 00:00:01 /usr/sbin/crond -n
root 1018 1 0 Nov22 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
root 1022 1 0 Nov22 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 1374 2 0 Nov22 ? 00:00:01 [kworker/u30:2]
root 1438 1 0 Nov22 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 1598 1 0 Nov22 ? 00:00:00 /usr/sbin/gssproxy -D
root 2186 1 0 Nov22 ? 00:00:00 /usr/sbin/sshd -D
nagios 7008 1 0 Nov24 ? 00:10:59 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
nagios 7009 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7010 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7011 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7012 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7013 7008 0 Nov24 ? 00:00:28 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
root 13650 1 0 Nov23 ? 00:00:23 /usr/sbin/httpd -DFOREGROUND
root 13923 2 0 Nov24 ? 00:00:01 [kworker/u30:1]
apache 17439 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17440 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17441 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17442 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17443 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
postfix 25952 26248 0 10:59 ? 00:00:00 pickup -l -t unix -u
root 26046 2186 0 11:58 ? 00:00:00 sshd: centos [priv]
centos 26049 26046 0 11:58 ? 00:00:00 sshd: centos@pts/0
centos 26050 26049 0 11:58 pts/0 00:00:00 -bash
root 26162 26050 0 12:03 pts/0 00:00:00 sudo su
root 26163 26162 0 12:03 pts/0 00:00:00 su
root 26164 26163 0 12:03 pts/0 00:00:00 bash
root 26248 1 0 Nov22 ? 00:00:01 /usr/libexec/postfix/master -w
postfix 26250 26248 0 Nov22 ? 00:00:00 qmgr -l -t unix -u
root 26589 2 0 12:12 ? 00:00:00 [kworker/0:2]
root 27104 2 0 12:22 ? 00:00:00 [kworker/0:0]
root 27370 2 0 12:28 ? 00:00:00 [kworker/0:1]
root 27867 26164 0 12:37 pts/0 00:00:00 ps -ef --cols=300
---------------------------------------------------------------------------------------------------------------------
The objects.cache file is quite big, so I am showing only the relevant information for one monitored site.
########################################
# NAGIOS OBJECT CACHE FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
#
# Created: Fri Nov 24 10:26:29 2017
########################################

define timeperiod {
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

define timeperiod {
timeperiod_name 24x7_sans_holidays
alias 24x7 Sans Holidays
december 25 00:00-00:00
july 4 00:00-00:00
january 1 00:00-00:00
thursday 4 november 00:00-00:00
monday 1 september 00:00-00:00
monday -1 may 00:00-00:00
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

define timeperiod {
timeperiod_name none
alias No Time Is A Good Time
}

define timeperiod {
timeperiod_name us-holidays
alias U.S. Holidays
january 1 00:00-00:00
july 4 00:00-00:00
december 25 00:00-00:00
monday -1 may 00:00-00:00
monday 1 september 00:00-00:00
thursday 4 november 00:00-00:00
}

define timeperiod {
timeperiod_name workhours
alias Normal Work Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
define command {
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

define command {
command_name check_dhcp
command_line $USER1$/check_dhcp $ARG1$
}

define command {
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_hpjd
command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}

define command {
command_name check_local_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}

define command {
command_name check_local_mrtgtraf
command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}

define command {
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}

define command {
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}

define command {
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}

define command {
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

define command {
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

define command {
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}

define command {
command_name check_tcp
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}

define command {
command_name check_udp
command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}

define command {
command_name host_stale
command_line /usr/local/nagios/libexec/check_dummy 2 "No Recent Passive Host Checks."
}

define command {
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

define command {
command_name notifyhost-{Host}
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" -r {Host}@{Nagios server} $CONTACTEMAIL$
}

define command {
command_name notifyservice-{Host}
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -r {Host}@{Nagios server} -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

define command {
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /var/log/nagios/host-perfdata.out
}

define command {
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/log/nagios/service-perfdata.out
}

define command {
command_name service_stale
command_line /usr/local/nagios/libexec/check_dummy 2 "No Recent Passive Service Checks."
}

define contactgroup {
contactgroup_name {Host}
alias {Host}
members {Host}
}

define hostgroup {
hostgroup_name 0-Diallers
alias 0-Diallers
members {Host}
}

define hostgroup {
hostgroup_name {Host group}
alias {Host group}
members {Host}
}

define contact {
contact_name {Host}
alias {Host}
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,c
host_notification_options d
service_notification_commands notifyservice-{Host}
host_notification_commands notifyhost-{Host}
email {Email address}
minimum_importance 0
host_notifications_enabled 1
service_notifications_enabled 1
can_submit_commands 1
retain_status_information 1
retain_nonstatus_information 1
}

define host {
host_name {Host}
alias {Host}
address {Host}
check_command host_stale
contact_groups {Host}
notification_period 24x7
initial_state o
importance 0
check_interval 0.000000
retry_interval 1.000000
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 600
check_freshness 1
notification_options r,d,u
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
notes latlng:43.774441,-79.367734
retain_status_information 1
retain_nonstatus_information 1
_GTYPE D
}

define service {
host_name {Host}
service_description {Service}
check_period 24x7
check_command service_stale
contact_groups {Host}
notification_period 24x7
initial_state o
importance 0
check_interval 0.000000
retry_interval 1440.000000
max_check_attempts 1
is_volatile 0
parallelize_check 1
active_checks_enabled 0
passive_checks_enabled 1
obsess 0
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 3600
check_freshness 1
notification_options r,w,u,c
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}

Re: Service checks go stale 30s after passive check received

Posted: Fri Dec 01, 2017 9:21 am
by tgriep
The settings for those checks look like they should work. You should check the Mod Gearman settings; it may be sending in old check results, which would trigger the freshness check.

Re: Service checks go stale 30s after passive check received

Posted: Tue Dec 05, 2017 8:23 am
by invade
As far as I can tell there is nothing wrong with mod_gearman.

Is the freshness checking based on when the checks were received, or some timestamp included in the check?

According to the logs, the checks are received less than a minute before the "stale" warning. If Nagios is not using the time the check was received to determine the freshness, what else could it be using?

Thanks.

Re: Service checks go stale 30s after passive check received

Posted: Tue Dec 05, 2017 2:44 pm
by tgriep
Freshness should be measured from the time the check result was received.
Can you PM me your objects.cache and status.dat files from the server, along with the host name and service name that are having the issue?
If they are large, you will have to zip them up first.
Thanks
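To illustrate why the logged figures do not fit a "time received" model, here is a small Python sketch using the timestamps from the first log excerpt. It is a simplified model, not Nagios's actual code: it assumes a result is stale once its age exceeds the threshold, and that the "stale by" value in the warning is the time elapsed past (reference time + threshold). Both assumptions and both function names are mine.

```python
from datetime import datetime, timedelta


def is_stale(last_result, now, threshold_s):
    """Simplified model: stale when the last result is older than the threshold."""
    return (now - last_result).total_seconds() > threshold_s


def implied_reference(now, stale_by_s, threshold_s):
    """Work backwards from a warning to the timestamp it appears to be based on.

    Assumes stale_by = now - (reference + threshold).
    """
    return now - timedelta(seconds=stale_by_s + threshold_s)


received = datetime(2017, 11, 22, 22, 34, 21)  # PASSIVE SERVICE CHECK line
warned = datetime(2017, 11, 22, 22, 35, 1)     # stale warning line
threshold_s = 14 * 60                          # threshold=0d 0h 14m 0s
stale_by_s = 1 * 3600 + 15 * 60 + 30           # stale by 0d 1h 15m 30s

print(is_stale(received, warned, threshold_s))
print(implied_reference(warned, stale_by_s, threshold_s))
```

Under these assumptions the first line prints False, since the result is only 40 seconds old, and the second prints 2017-11-22 21:05:31. That suggests the freshness check is working from a timestamp about an hour and a half old, not from the passive result that had just been logged.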

Note: PM Received and shared with the other Techs.

Re: Service checks go stale 30s after passive check received

Posted: Wed Dec 06, 2017 4:48 am
by invade
tgriep wrote: Freshness should be measured from the time the check result was received.
Can you PM me your objects.cache and status.dat files from the server, along with the host name and service name that are having the issue?
If they are large, you will have to zip them up first.
Thanks
PM sent as requested. Many thanks.

Re: Service checks go stale 30s after passive check received

Posted: Wed Dec 06, 2017 10:20 am
by tgriep
It looks like the service is not being updated with the current check status when an OK result comes in.
It could be caused by a bad entry in Nagios's status files.

To fix that, you would have to stop the nagios process and delete the retention.dat file and then start the nagios process so it can be rebuilt.

A couple of things happen when this is done.
Any notes added to an object and any downtime will be lost.
Also, the system will act as if it is starting for the first time, so it will recheck all hosts and services. Be prepared for that.
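
The stop / remove / start sequence could be sketched as below. This is only a dry-run helper that prints the commands without running them. It assumes Nagios is managed by systemd under the unit name "nagios", and the retention file path is a placeholder: use the value of state_retention_file from your nagios.cfg. The sketch also moves the file aside instead of deleting it, so it can be restored if needed.

```python
import shlex


def retention_reset_plan(retention_file, service_name="nagios"):
    """Return the commands for a retention reset, in order, without running them.

    service_name is an assumed systemd unit name; retention_file should be
    the state_retention_file value from nagios.cfg.
    """
    backup = retention_file + ".bak"
    return [
        ["systemctl", "stop", service_name],
        ["mv", retention_file, backup],  # keep a copy instead of deleting
        ["systemctl", "start", service_name],
    ]


# Placeholder path for illustration only.
for command in retention_reset_plan("/var/log/nagios/retention.dat"):
    print(" ".join(shlex.quote(part) for part in command))
```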

Re: Service checks go stale 30s after passive check received

Posted: Wed Dec 06, 2017 11:43 am
by invade
tgriep wrote: It looks like the service is not being updated with the current check status when an OK result comes in.
It could be caused by a bad entry in Nagios's status files.

To fix that, you would have to stop the nagios process and delete the retention.dat file and then start the nagios process so it can be rebuilt.

A couple of things happen when this is done.
Any notes added to an object and any downtime will be lost.
Also, the system will act as if it is starting for the first time, so it will recheck all hosts and services. Be prepared for that.
Thanks for the assistance. I have implemented the changes as requested but unfortunately the issue continues to occur.

Re: Service checks go stale 30s after passive check received

Posted: Wed Dec 06, 2017 3:42 pm
by tgriep
Do you ever see that service go into an OK state after receiving the passive check?
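
One way to check this on the server is to read the service's entry in status.dat right after a passive result arrives. Below is a minimal Python sketch that assumes the usual "servicestatus { key=value ... }" block layout; the sample values are invented for illustration and the function name is my own.

```python
def service_status(text, host_name, service_description):
    """Return the key/value pairs of the matching servicestatus block, or None."""
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if block is None:
            if line == "servicestatus {":
                block = {}
        elif line == "}":
            if (
                block.get("host_name") == host_name
                and block.get("service_description") == service_description
            ):
                return block
            block = None
        elif "=" in line:
            key, _, value = line.partition("=")
            block[key] = value
    return None


# Invented sample; in practice read the text from the status.dat file.
sample_status = """
servicestatus {
\thost_name=host
\tservice_description=System-Partitions
\tcurrent_state=2
\tlast_check=1511390111
\tplugin_output=CRITICAL: No Recent Passive Service Checks.
\t}
"""

status = service_status(sample_status, "host", "System-Partitions")
# current_state: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
print(status["current_state"], status["last_check"], status["plugin_output"])
```

Comparing current_state and last_check before and after a passive result comes in shows whether the OK result is actually being stored.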