Service checks go stale 30s after passive check received.

invade
Posts: 29
Joined: Thu Nov 16, 2017 7:45 am

Service checks go stale 30s after passive check received.

Post by invade »

Hi. We have a fresh build of Nagios 4.3.4 on CentOS 7 that is receiving passive host / service checks from numerous systems via gearman.

Everything works fine except that, for a handful of systems, all the service checks go stale 30 seconds after the check is received (the host checks are fine).

Below are the log entries for one of the services on one of the systems:

Code:

[Wed Nov 22 22:34:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:34:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:35:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:35:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:39:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:39:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:40:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:40:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:44:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:44:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:45:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 30s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:45:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:49:21 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:49:21 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:50:01 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 29s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:50:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
[Wed Nov 22 22:54:31 2017] PASSIVE SERVICE CHECK: host;System-Partitions;0;DISK OK
[Wed Nov 22 22:54:31 2017] SERVICE ALERT: host;System-Partitions;OK;HARD;1;DISK OK
[Wed Nov 22 22:55:00 2017] Warning: The results of service 'System-Partitions' on host 'host' are stale by 0d 1h 15m 27s (threshold=0d 0h 14m 0s).  I'm forcing an immediate check of the service.
[Wed Nov 22 22:55:11 2017] SERVICE ALERT: host;System-Partitions;CRITICAL;HARD;1;CRITICAL: No Recent Passive Service Checks.
As you can see, the passive service check is received and the service goes to OK. Then, 30 to 40 seconds later, Nagios warns that the service results are stale and the service goes to CRITICAL.
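To put numbers on that gap, a small script can pair each passive result with the stale warning that follows it. This is only a rough sketch: the function name `stale_gaps` is made up for illustration, and the regular expressions assume the log lines have exactly the human-readable shape shown in the excerpt above.

```python
import re
from datetime import datetime

# Assumed line shapes, copied from the log excerpt in this post.
STAMP = r"\[(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\] "
PASSIVE = re.compile(STAMP + r"PASSIVE SERVICE CHECK: (?P<host>[^;]+);(?P<svc>[^;]+);")
STALE = re.compile(
    STAMP + r"Warning: The results of service '(?P<svc>[^']+)' on host '(?P<host>[^']+)' are stale"
)


def parse_ts(text):
    """Parse a timestamp such as 'Wed Nov 22 22:34:21 2017'."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def stale_gaps(lines):
    """Return (host, service, seconds) for each stale warning that follows a passive result."""
    last_passive = {}  # (host, service) -> time of the most recent passive result
    gaps = []
    for line in lines:
        m = PASSIVE.match(line)
        if m:
            last_passive[(m["host"], m["svc"])] = parse_ts(m["ts"])
            continue
        m = STALE.match(line)
        if m and (m["host"], m["svc"]) in last_passive:
            key = (m["host"], m["svc"])
            delta = parse_ts(m["ts"]) - last_passive[key]
            gaps.append((key[0], key[1], int(delta.total_seconds())))
    return gaps
```

Fed the excerpt above, this should report gaps of 40, 40, 40, 40 and 29 seconds for System-Partitions.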

As I say, we have exactly the same host and service checks on numerous systems that don't exhibit this behaviour.

Does anyone know why this is happening for just this handful of systems?

Thanks in advance.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Service checks go stale 30s after passive check received

Post by tgriep »

Can you run the following as root on the Nagios server and post the output?

Code:

ps -ef --cols=300
Can you also post that service's settings from the objects.cache file so we can review them?
The file is typically located here:

Code:

/usr/local/nagios/var/objects.cache
Thanks
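If the file is large, something like the following could pull out just the one service definition. It is a rough sketch, not a full parser: the function name `find_definitions` is hypothetical, and it assumes each directive sits on its own line as a name followed by whitespace and a value, with a lone `}` closing the block.

```python
def find_definitions(text, object_type, **wanted):
    """Return each 'define <object_type> { ... }' block whose directives match `wanted`.

    Every block is returned as a dict of directive name -> value (both strings).
    """
    matches = []
    current = None  # directives of the block being read, or None when outside one
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            parts = line.split()
            # Accept both 'define service {' and 'define service{'.
            if len(parts) >= 2 and parts[0] == "define" and parts[1].rstrip("{") == object_type:
                current = {}
        elif line == "}":
            if all(current.get(name) == value for name, value in wanted.items()):
                matches.append(current)
            current = None
        elif line and not line.startswith("#"):
            fields = line.split(None, 1)
            current[fields[0]] = fields[1] if len(fields) > 1 else ""
    return matches
```

For the service in the log excerpt, you would read the file into a string and call `find_definitions(text, "service", host_name="host", service_description="System-Partitions")`, then look at the `freshness_threshold` and `check_freshness` entries of the result.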
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Service checks go stale 30s after passive check received

Post by invade »

ps -ef --cols=300
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov22 ? 00:00:14 /usr/lib/systemd/systemd --system --deserialize 16
root 2 0 0 Nov22 ? 00:00:00 [kthreadd]
root 3 2 0 Nov22 ? 00:00:01 [ksoftirqd/0]
root 5 2 0 Nov22 ? 00:00:00 [kworker/0:0H]
root 7 2 0 Nov22 ? 00:00:00 [migration/0]
root 8 2 0 Nov22 ? 00:00:00 [rcu_bh]
root 9 2 0 Nov22 ? 00:00:04 [rcu_sched]
root 10 2 0 Nov22 ? 00:00:03 [watchdog/0]
root 12 2 0 Nov22 ? 00:00:00 [kdevtmpfs]
root 13 2 0 Nov22 ? 00:00:00 [netns]
root 14 2 0 Nov22 ? 00:00:00 [xenwatch]
root 15 2 0 Nov22 ? 00:00:00 [xenbus]
root 17 2 0 Nov22 ? 00:00:00 [khungtaskd]
root 18 2 0 Nov22 ? 00:00:00 [writeback]
root 19 2 0 Nov22 ? 00:00:00 [kintegrityd]
root 20 2 0 Nov22 ? 00:00:00 [bioset]
root 21 2 0 Nov22 ? 00:00:00 [kblockd]
root 22 2 0 Nov22 ? 00:00:00 [md]
root 27 2 0 Nov22 ? 00:00:00 [kswapd0]
root 28 2 0 Nov22 ? 00:00:00 [ksmd]
root 29 2 0 Nov22 ? 00:00:02 [khugepaged]
root 30 2 0 Nov22 ? 00:00:00 [crypto]
root 38 2 0 Nov22 ? 00:00:00 [kthrotld]
root 40 2 0 Nov22 ? 00:00:00 [kmpath_rdacd]
root 41 2 0 Nov22 ? 00:00:00 [kpsmoused]
root 43 2 0 Nov22 ? 00:00:00 [ipv6_addrconf]
root 62 2 0 Nov22 ? 00:00:00 [deferwq]
root 117 2 0 Nov22 ? 00:00:00 [kauditd]
root 180 2 0 Nov22 ? 00:00:00 [rpciod]
root 181 2 0 Nov22 ? 00:00:00 [xprtiod]
root 249 2 0 Nov22 ? 00:00:00 [ata_sff]
root 251 2 0 Nov22 ? 00:00:00 [scsi_eh_0]
root 252 2 0 Nov22 ? 00:00:00 [scsi_tmf_0]
root 255 2 0 Nov22 ? 00:00:00 [scsi_eh_1]
root 256 2 0 Nov22 ? 00:00:00 [scsi_tmf_1]
root 269 2 0 Nov22 ? 00:00:00 [bioset]
root 270 2 0 Nov22 ? 00:00:00 [xfsalloc]
root 271 2 0 Nov22 ? 00:00:00 [xfs_mru_cache]
root 272 2 0 Nov22 ? 00:00:00 [xfs-buf/xvda1]
root 273 2 0 Nov22 ? 00:00:00 [xfs-data/xvda1]
root 274 2 0 Nov22 ? 00:00:00 [xfs-conv/xvda1]
root 275 2 0 Nov22 ? 00:00:00 [xfs-cil/xvda1]
root 276 2 0 Nov22 ? 00:00:00 [xfs-reclaim/xvd]
root 277 2 0 Nov22 ? 00:00:00 [xfs-log/xvda1]
root 278 2 0 Nov22 ? 00:00:00 [xfs-eofblocks/x]
root 279 2 0 Nov22 ? 00:02:33 [xfsaild/xvda1]
root 356 1 0 Nov22 ? 00:05:59 /usr/lib/systemd/systemd-journald
root 439 1 0 Nov22 ? 00:00:00 /sbin/auditd
root 479 2 0 Nov22 ? 00:00:00 [ttm_swap]
root 517 2 0 Nov22 ? 00:00:00 [edac-poller]
root 553 1 0 Nov22 ? 00:01:53 /usr/sbin/rsyslogd -n
root 554 1 0 Nov22 ? 00:00:01 /usr/lib/systemd/systemd-logind
polkitd 559 1 0 Nov22 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
dbus 561 1 0 Nov22 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
chrony 563 1 0 Nov22 ? 00:00:01 /usr/sbin/chronyd
root 776 1 0 Nov22 ? 00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid eth0
root 843 1 0 Nov22 ? 00:00:51 /usr/bin/python -Es /usr/sbin/tuned -l -P
root 846 2 0 Nov22 ? 00:00:01 [kworker/0:1H]
root 1017 1 0 Nov22 ? 00:00:01 /usr/sbin/crond -n
root 1018 1 0 Nov22 ttyS0 00:00:00 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt220
root 1022 1 0 Nov22 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 1374 2 0 Nov22 ? 00:00:01 [kworker/u30:2]
root 1438 1 0 Nov22 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 1598 1 0 Nov22 ? 00:00:00 /usr/sbin/gssproxy -D
root 2186 1 0 Nov22 ? 00:00:00 /usr/sbin/sshd -D
nagios 7008 1 0 Nov24 ? 00:10:59 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
nagios 7009 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7010 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7011 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7012 7008 0 Nov24 ? 00:00:00 /usr/sbin/nagios --worker /var/spool/nagios/cmd/nagios.qh
nagios 7013 7008 0 Nov24 ? 00:00:28 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
root 13650 1 0 Nov23 ? 00:00:23 /usr/sbin/httpd -DFOREGROUND
root 13923 2 0 Nov24 ? 00:00:01 [kworker/u30:1]
apache 17439 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17440 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17441 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17442 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 17443 13650 0 Nov26 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
postfix 25952 26248 0 10:59 ? 00:00:00 pickup -l -t unix -u
root 26046 2186 0 11:58 ? 00:00:00 sshd: centos [priv]
centos 26049 26046 0 11:58 ? 00:00:00 sshd: centos@pts/0
centos 26050 26049 0 11:58 pts/0 00:00:00 -bash
root 26162 26050 0 12:03 pts/0 00:00:00 sudo su
root 26163 26162 0 12:03 pts/0 00:00:00 su
root 26164 26163 0 12:03 pts/0 00:00:00 bash
root 26248 1 0 Nov22 ? 00:00:01 /usr/libexec/postfix/master -w
postfix 26250 26248 0 Nov22 ? 00:00:00 qmgr -l -t unix -u
root 26589 2 0 12:12 ? 00:00:00 [kworker/0:2]
root 27104 2 0 12:22 ? 00:00:00 [kworker/0:0]
root 27370 2 0 12:28 ? 00:00:00 [kworker/0:1]
root 27867 26164 0 12:37 pts/0 00:00:00 ps -ef --cols=300
---------------------------------------------------------------------------------------------------------------------
The objects.cache file is quite big, so I am only showing the relevant information for one monitored site.
########################################
# NAGIOS OBJECT CACHE FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
#
# Created: Fri Nov 24 10:26:29 2017
########################################

define timeperiod {
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

define timeperiod {
timeperiod_name 24x7_sans_holidays
alias 24x7 Sans Holidays
december 25 00:00-00:00
july 4 00:00-00:00
january 1 00:00-00:00
thursday 4 november 00:00-00:00
monday 1 september 00:00-00:00
monday -1 may 00:00-00:00
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

define timeperiod {
timeperiod_name none
alias No Time Is A Good Time
}

define timeperiod {
timeperiod_name us-holidays
alias U.S. Holidays
january 1 00:00-00:00
july 4 00:00-00:00
december 25 00:00-00:00
monday -1 may 00:00-00:00
monday 1 september 00:00-00:00
thursday 4 november 00:00-00:00
}

define timeperiod {
timeperiod_name workhours
alias Normal Work Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
define command {
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}

define command {
command_name check_dhcp
command_line $USER1$/check_dhcp $ARG1$
}

define command {
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_hpjd
command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}

define command {
command_name check_local_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}

define command {
command_name check_local_mrtgtraf
command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}

define command {
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}

define command {
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}

define command {
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}

define command {
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

define command {
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

define command {
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}

define command {
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}

define command {
command_name check_tcp
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}

define command {
command_name check_udp
command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}

define command {
command_name host_stale
command_line /usr/local/nagios/libexec/check_dummy 2 "No Recent Passive Host Checks."
}

define command {
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

define command {
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

define command {
command_name notifyhost-{Host}
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" -r {Host}@{Nagios server} $CONTACTEMAIL$
}

define command {
command_name notifyservice-{Host}
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -r {Host}@{Nagios server} -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

define command {
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /var/log/nagios/host-perfdata.out
}

define command {
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /var/log/nagios/service-perfdata.out
}

define command {
command_name service_stale
command_line /usr/local/nagios/libexec/check_dummy 2 "No Recent Passive Service Checks."
}

define contactgroup {
contactgroup_name {Host}
alias {Host}
members {Host}
}

define hostgroup {
hostgroup_name 0-Diallers
alias 0-Diallers
members {Host}
}

define hostgroup {
hostgroup_name {Host group}
alias {Host group}
members {Host}
}

define contact {
contact_name {Host}
alias {Host}
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,c
host_notification_options d
service_notification_commands notifyservice-{Host}
host_notification_commands notifyhost-{Host}
email {Email address}
minimum_importance 0
host_notifications_enabled 1
service_notifications_enabled 1
can_submit_commands 1
retain_status_information 1
retain_nonstatus_information 1
}

define host {
host_name {Host}
alias {Host}
address {Host}
check_command host_stale
contact_groups {Host}
notification_period 24x7
initial_state o
importance 0
check_interval 0.000000
retry_interval 1.000000
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 600
check_freshness 1
notification_options r,d,u
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
notes latlng:43.774441,-79.367734
retain_status_information 1
retain_nonstatus_information 1
_GTYPE D
}

define service {
host_name {Host}
service_description {Service}
check_period 24x7
check_command service_stale
contact_groups {Host}
notification_period 24x7
initial_state o
importance 0
check_interval 0.000000
retry_interval 1440.000000
max_check_attempts 1
is_volatile 0
parallelize_check 1
active_checks_enabled 0
passive_checks_enabled 1
obsess 0
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 0
flap_detection_options a
freshness_threshold 3600
check_freshness 1
notification_options r,w,u,c
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
}
Last edited by invade on Tue Dec 05, 2017 6:24 am, edited 1 time in total.

Re: Service checks go stale 30s after passive check received

Post by tgriep »

The settings for those checks look like they should work. You should check the settings for Mod Gearman; maybe it is sending in old check results, which would trigger the freshness check.

Re: Service checks go stale 30s after passive check received

Post by invade »

As far as I can tell there is nothing wrong with mod_gearman.

Is the freshness checking based on when the checks were received, or some timestamp included in the check?

According to the logs, the checks are received less than a minute before the "stale" warning. If Nagios is not using the time the check was received to determine the freshness, what else could it be using?

Thanks.
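One way to see what Nagios might be comparing against is to work backwards from the warning itself. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that "stale by" counts from the moment the result expired, i.e. from the reference time plus the threshold. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

DURATION = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")


def parse_duration(text):
    """Turn a string such as '0d 1h 15m 30s' into a timedelta."""
    days, hours, minutes, seconds = map(int, DURATION.fullmatch(text.strip()).groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def implied_reference_time(warning_time, stale_by, threshold):
    """Work out the 'last result' time that would produce the given warning.

    Assumed model: expiry = reference + threshold, and stale_by = warning_time - expiry.
    """
    logged_at = datetime.strptime(warning_time, "%a %b %d %H:%M:%S %Y")
    return logged_at - parse_duration(stale_by) - parse_duration(threshold)


# Values taken from the first stale warning in the log excerpt.
print(implied_reference_time("Wed Nov 22 22:35:01 2017", "0d 1h 15m 30s", "0d 0h 14m 0s"))
# prints 2017-11-22 21:05:31
```

Under that assumption, the 22:35:01 warning points to a reference time of 21:05:31, about an hour and a half before the passive result logged at 22:34:21, and the 22:40:01 warning points to 21:10:31. So whatever timestamp is being used seems to advance in step with the checks but lag well behind the time they are logged.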

Re: Service checks go stale 30s after passive check received

Post by tgriep »

Freshness should be checked against the time the check result was received.
Can you PM me your objects.cache file and the status.dat file from your server, as well as the host name and service name that are having the issue?
If they are large, you will have to zip them up first.
Thanks

Note: PM Received and shared with the other Techs.

Re: Service checks go stale 30s after passive check received

Post by invade »

tgriep wrote: Freshness should be checked against the time the check result was received.
Can you PM me your objects.cache file and the status.dat file from your server, as well as the host name and service name that are having the issue?
If they are large, you will have to zip them up first.
Thanks
PM sent as requested. Many thanks.

Re: Service checks go stale 30s after passive check received

Post by tgriep »

It looks like the service is not being updated with the current check status when an OK result comes in.
This could be caused by a bad entry in Nagios's status files.

To fix that, you would have to stop the nagios process, delete the retention.dat file, and then start the nagios process so the file can be rebuilt.

A couple of things happen when this is done.
Any notes added to an object and any downtime will be lost.
Also, the system will act as if it is starting for the first time, so it will recheck all hosts and services. Be prepared for that.
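The stop, delete, start sequence could be scripted roughly as follows. This is a sketch under stated assumptions: the stop and start commands and the retention file path are supplied by the caller because they differ between installs, the function name `reset_retention` is made up, and the file is renamed to a backup rather than deleted so that it can be put back if needed.

```python
import subprocess
import time
from pathlib import Path


def reset_retention(retention_file, stop_cmd, start_cmd):
    """Stop Nagios, move the retention file aside, then start Nagios again.

    Returns the backup path, or None if there was no retention file to move.
    """
    retention_file = Path(retention_file)
    subprocess.run(stop_cmd, check=True)  # abort here if the stop command fails
    backup = None
    try:
        if retention_file.exists():
            stamp = time.strftime("%Y%m%d-%H%M%S")
            backup = retention_file.with_name(f"{retention_file.name}.{stamp}.bak")
            # Rename instead of delete: a deliberate change from the steps above.
            retention_file.rename(backup)
    finally:
        subprocess.run(start_cmd, check=True)  # always try to bring Nagios back up
    return backup
```

On a systemd host the two commands might be something like `["systemctl", "stop", "nagios"]` and `["systemctl", "start", "nagios"]`, but the unit name is a guess and should be checked against your own install.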

Re: Service checks go stale 30s after passive check received

Post by invade »

tgriep wrote: It looks like the service is not being updated with the current check status when an OK result comes in.
This could be caused by a bad entry in Nagios's status files.

To fix that, you would have to stop the nagios process, delete the retention.dat file, and then start the nagios process so the file can be rebuilt.

A couple of things happen when this is done.
Any notes added to an object and any downtime will be lost.
Also, the system will act as if it is starting for the first time, so it will recheck all hosts and services. Be prepared for that.
Thanks for the assistance. I have implemented the changes as requested but unfortunately the issue continues to occur.

Re: Service checks go stale 30s after passive check received

Post by tgriep »

Do you ever see that service go into an OK state after receiving the passive check?
Locked