Nagios Support Forum

Posted: **Mon Nov 30, 2015 3:24 am**

Hello,

I'm using Nagios Core 4.1.1.

Whenever Nagios detects a host as being down, it schedules the next check of that host a couple of days in the future.

For example :

Host Status: DOWN (for 1d 14h 33m 37s)
Status Information: (Host check timed out after 15.00 seconds)
Performance Data:
Current Attempt: 2/4 (SOFT state)
Last Check Time: 11-28-2015 18:45:21
Check Type: ACTIVE
Check Latency / Duration: 0.000 / 15.004 seconds
Next Scheduled Active Check: 11-30-2015 09:21:35
Last State Change: 11-28-2015 18:45:36
Last Notification: N/A (notification 0)

The host, which is still reported as down at the moment, is up and running...

This host uses the following template :

define host{
name windows-template ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_interval 3 ; Actively check the server every 3 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 4 ; Check each server 4 times (max)
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}

And the chained template is :

define host{
name generic-host ; The name of this host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_period 24x7 ; Send host notifications at any time
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
check_command check-host-alive ; Default command to check Linux hosts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

In my nagios.cfg file I have :

interval_length=60
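
For reference, with interval_length=60 the template's retry_interval of 1 works out to 60 seconds, so in a SOFT state I would expect a retry about one minute after the last check. Here is a minimal Python sketch of that arithmetic, using the timestamps from the status output above. The function name is hypothetical, and this only illustrates the spacing I expected, not how Nagios actually schedules checks internally:

```python
from datetime import datetime, timedelta

INTERVAL_LENGTH = 60  # seconds per interval unit (interval_length in nagios.cfg)
RETRY_INTERVAL = 1    # retry_interval from windows-template
FMT = "%m-%d-%Y %H:%M:%S"  # date format shown in the web interface


def expected_retry(last_check: datetime) -> datetime:
    """Naive expectation: last check plus one retry interval (assumption)."""
    return last_check + timedelta(seconds=RETRY_INTERVAL * INTERVAL_LENGTH)


# Values copied from the host status shown earlier in this post.
last_check = datetime.strptime("11-28-2015 18:45:21", FMT)
scheduled = datetime.strptime("11-30-2015 09:21:35", FMT)

expected = expected_retry(last_check)
overshoot = scheduled - expected  # how far the real schedule is past the expectation

print("expected retry :", expected.strftime(FMT))
print("scheduled check:", scheduled.strftime(FMT))
print("overshoot      :", overshoot)
```

The gap between the two values is more than a day and a half, which is far beyond anything the configured intervals would explain.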

This problem only occurs for hosts; it doesn't seem to occur for service checks.

Would someone have an idea?

Posted: **Mon Nov 30, 2015 1:10 pm**

Can you attach a screenshot of your Performance Info from the Nagios Core home page?

Posted: **Mon Nov 30, 2015 7:53 pm**

Can you find the host object in objects.cache (/usr/local/nagios/var/objects.cache) and post it here, please?

Is the date and time correct on the server?

Code:

date

Posted: **Tue Dec 01, 2015 7:00 am**

Here is the Performance Info.

The time is properly configured on the Nagios server.

Posted: **Tue Dec 01, 2015 7:04 am**

And here is the host object :

define host {
host_name VMWNEOIEM
alias Monitoring Néonat – serveur alarmes
address 192.168.164.100
check_period 24x7
check_command check-host-alive
contact_groups winadmins
notification_period 24x7
initial_state o
importance 0
check_interval 3.000000
retry_interval 1.000000
max_check_attempts 4
active_checks_enabled 1
passive_checks_enabled 1
obsess 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options a
freshness_threshold 0
check_freshness 0
notification_options r,d,u
notifications_enabled 1
notification_interval 120.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=_HOST_
retain_status_information 1
retain_nonstatus_information 1
}

Posted: **Tue Dec 01, 2015 5:56 pm**

Are you seeing anything in the nagios logs pertaining to time drift? Particularly the word "Compensating" as that is used when time drift is detected. Not sure where you have your log file, but usually it is under /usr/local/nagios/var/nagios.log.

Posted: **Tue Dec 01, 2015 5:59 pm**

Thanks for that, nothing stands out at this point.

Can you find the host object in retention.dat (/usr/local/nagios/var/retention.dat) and post it here, please?

Posted: **Wed Dec 02, 2015 9:03 am**

I indeed see 3 time drifts in the history, but they occurred about a month ago, and the issue I'm concerned about happened, for the last time, about 4 days ago.

Code:

nagios-11-05-2015-00.log:[1446652806] Warning: A system time change of 2992 seconds (0d 0h 49m 52s forwards in time) has been detected.  Compensating...
nagios-11-06-2015-00.log:[1446731808] Warning: A system time change of 4439 seconds (0d 1h 13m 59s forwards in time) has been detected.  Compensating...
nagios-11-07-2015-00.log:[1446809013] Warning: A system time change of 2247 seconds (0d 0h 37m 27s forwards in time) has been detected.  Compensating...
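
To make these warnings easier to compare, here is a small Python sketch that pulls the timestamp, size and direction out of each line. The regular expression is written against the three lines above; the "backwards" alternative is an assumption about how a backward change would be worded, and the function name is made up for illustration:

```python
import re

# The three warnings exactly as grep printed them (file name prefix included).
LINES = [
    "nagios-11-05-2015-00.log:[1446652806] Warning: A system time change of 2992 seconds (0d 0h 49m 52s forwards in time) has been detected.  Compensating...",
    "nagios-11-06-2015-00.log:[1446731808] Warning: A system time change of 4439 seconds (0d 1h 13m 59s forwards in time) has been detected.  Compensating...",
    "nagios-11-07-2015-00.log:[1446809013] Warning: A system time change of 2247 seconds (0d 0h 37m 27s forwards in time) has been detected.  Compensating...",
]

# "backwards" is assumed wording; only "forwards" appears in the lines above.
DRIFT_RE = re.compile(
    r"\[(?P<epoch>\d+)\] Warning: A system time change of "
    r"(?P<seconds>\d+) seconds \(.*?(?P<direction>forwards|backwards) in time\)"
)


def parse_drifts(lines):
    """Return (epoch, seconds, direction) for each time-change warning found."""
    drifts = []
    for line in lines:
        match = DRIFT_RE.search(line)
        if match:
            drifts.append(
                (int(match["epoch"]), int(match["seconds"]), match["direction"])
            )
    return drifts


drifts = parse_drifts(LINES)
for epoch, seconds, direction in drifts:
    print(epoch, f"{seconds / 60:.1f} min", direction)
```

All three jumps are forward and each is well under two hours, so on their own they would not account for a check being pushed out by more than a day.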

From the retention.dat file :

Code:

host {
host_name=VMWNEOIEM
modified_attributes=0
check_command=check-host-alive
check_period=24x7
notification_period=24x7
event_handler=
has_been_checked=1
check_execution_time=4.006
check_latency=0.000
check_type=0
current_state=0
last_state=0
last_hard_state=0
last_event_id=14525906
current_event_id=14525916
current_problem_id=0
last_problem_id=6645594
plugin_output=PING OK - Packet loss = 0%, RTA = 0.39 ms
long_plugin_output=
performance_data=rta=0.388000ms;800.000000;4000.000000;0.000000 pl=0%;10;100;0
last_check=1449064303
next_check=1449064487
check_options=0
current_attempt=1
max_attempts=4
normal_check_interval=3.000000
retry_check_interval=3.000000
state_type=1
last_state_change=1448906608
last_hard_state_change=1443796210
last_time_up=1449064307
last_time_down=1448906608
last_time_unreachable=0
notified_on_down=0
notified_on_unreachable=0
last_notification=0
current_notification_number=0
current_notification_id=0
notifications_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
flap_detection_enabled=1
process_performance_data=1
obsess=1
is_flapping=0
percent_state_change=0.00
check_flapping_recovery_notification=0
state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
}

Unfortunately the host has since been checked as being up, and it is now actively checked properly on a regular basis.

I can't reproduce the problem at will : it only occurs from time to time when a host is detected as being DOWN, not always...

Next time it happens, I'll collect the same data you have asked for so far; we might see something different then.

Posted: **Wed Dec 02, 2015 5:33 pm**

OK great, next time it happens grab the info from retention.dat and objects.cache and we'll see what can be identified. Also, if possible, include any relevant entries from nagios.log.

Posted: **Thu Dec 03, 2015 5:38 am**

OK... I'm currently facing an issue that might be related.

One of my hosts has been down for about 30 minutes, and Nagios hasn't detected it yet.

When I look at the web interface for this host :

Last Check Time: 12-03-2015 08:12:46 (About 4 hours ago).
Next Scheduled Check: within 3 minutes, but keeps being delayed!

If I reload the page one minute later, I will see that the next scheduled check has been delayed by a minute. So it keeps delaying the check of this host...

(Attached image: Next_scheduled_check_A)

(Attached image: Next_scheduled_check_B)

Here is a view of the Performance Info page :

(Attached image: Performances)

Top from the Nagios server :

Code:


VMWMON001 : 

top - 11:19:04 up 17 days,  1:59,  2 users,  load average: 0.45, 0.58, 0.50
Tasks: 201 total,   1 running, 194 sleeping,   0 stopped,   6 zombie
Cpu(s):  1.0%us,  1.5%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3107088k total,  2912016k used,   195072k free,   159316k buffers
Swap:   524280k total,      916k used,   523364k free,  1782572k cached

retention.dat from the problematic host :

Code:

host {
host_name=EXPSDYN001
modified_attributes=0
check_command=check-host-alive
check_period=24x7
notification_period=24x7
event_handler=
has_been_checked=1
check_execution_time=4.008
check_latency=0.001
check_type=0
current_state=0
last_state=0
last_hard_state=0
last_event_id=0
current_event_id=0
current_problem_id=0
last_problem_id=0
plugin_output=PING OK - Packet loss = 0%, RTA = 0.35 ms
long_plugin_output=
performance_data=rta=0.348000ms;800.000000;4000.000000;0.000000 pl=0%;10;100;0
last_check=1449126766
next_check=1449137503
check_options=8
current_attempt=1
max_attempts=4
normal_check_interval=3.000000
retry_check_interval=3.000000
state_type=1
last_state_change=1448462865
last_hard_state_change=1448462865
last_time_up=1449126770
last_time_down=0
last_time_unreachable=0
notified_on_down=0
notified_on_unreachable=0
last_notification=0
current_notification_number=0
current_notification_id=0
notifications_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
flap_detection_enabled=1
process_performance_data=1
obsess=1
is_flapping=0
percent_state_change=0.00
check_flapping_recovery_notification=0
state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
}
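
To compare the two retention.dat entries posted in this thread, here is a rough Python sketch that parses the key=value lines of a host block and reports the gap between last_check and next_check. The helper names are hypothetical, and only the fields needed for the comparison are copied from the blocks above:

```python
def parse_block(text):
    """Parse 'key=value' lines of one retention.dat host block into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return fields


def check_gap_minutes(fields):
    """Minutes between the last check and the next scheduled check."""
    return (int(fields["next_check"]) - int(fields["last_check"])) / 60


# Excerpts of the two host blocks posted in this thread.
HEALTHY = """host {
host_name=VMWNEOIEM
last_check=1449064303
next_check=1449064487
normal_check_interval=3.000000
}"""

PROBLEM = """host {
host_name=EXPSDYN001
last_check=1449126766
next_check=1449137503
normal_check_interval=3.000000
}"""

gaps = {}
for block in (HEALTHY, PROBLEM):
    fields = parse_block(block)
    gaps[fields["host_name"]] = check_gap_minutes(fields)
    print(fields["host_name"], f"{gaps[fields['host_name']]:.2f} min")
```

For VMWNEOIEM the gap is about 3 minutes, which matches check_interval, while for EXPSDYN001 it is close to 3 hours even though both blocks list the same normal_check_interval.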

objects.cache for the problematic host :

Code:


define host {
        host_name       EXPSDYN001
        alias   Syngo Dynamics
        address 192.168.163.28
        check_period    24x7
        check_command   check-host-alive
        contact_groups  medtech,dba,admins
        notification_period     24x7
        initial_state   o
        importance      0
        check_interval  3.000000
        retry_interval  1.000000
        max_check_attempts      4
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    r,d,u
        notifications_enabled   1
        notification_interval   120.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        action_url      /pnp4nagios/graph?host=$HOSTNAME$&srv=_HOST_
        retain_status_information       1
        retain_nonstatus_information    1
        }

Output of grep EXPSDYN001 on nagios.log :

Code:

[1449097200] CURRENT HOST STATE: EXPSDYN001;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.31 ms
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;CPU_load;OK;HARD;1;OK: CPU load is ok.
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive C:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive C:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive C:\ used space;OK;HARD;1;OK C: Total: 99.999GB - Used: 47.179GB (48%) - Free: 52.82GB (52%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive F:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive F:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive F:\ used space;OK;HARD;1;OK F: Total: 100GB - Used: 18.716GB (19%) - Free: 81.284GB (81%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive G:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive G:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive G:\ used space;OK;HARD;1;OK G: Total: 102GB - Used: 128.801MB (1%) - Free: 101.874GB (99%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive L:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive L:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive L:\ used space;OK;HARD;1;OK L: Total: 1TB - Used: 859.752GB (84%) - Free: 164.121GB (16%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive M:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive M:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive M:\ used space;OK;HARD;1;OK M: Total: 1TB - Used: 936.361GB (92%) - Free: 87.512GB (8%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive T:\ busy time;OK;HARD;1;OK: DiskTime = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive T:\ queue;OK;HARD;1;OK: DiskQueue = 0
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Drive T:\ used space;OK;HARD;1;OK T: Total: 100GB - Used: 765.629MB (1%) - Free: 99.252GB (99%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Memory_load;WARNING;HARD;15;WARNING: physical: Total: 7.99GB - Used: 6.605GB (82%) - Free: 1.385GB (17%), virtual: Total: 8TB - Used: 108.445MB (0%) - Free: 8TB (99%), committed: Total: 19.973GB - Used: 8.758GB (43%) - Free: 11.215GB (56%), committed: Total: 19.973GB - Used: 8.758GB (43%) - Free: 11.215GB (56%)
[1449097200] CURRENT SERVICE STATE: EXPSDYN001;Memory_pages_by_second;OK;HARD;1;OK: MemoryPagingBySecond = 0
[1449097665] Unable to run check for service 'CPU_load' on host 'EXPSDYN001'
[1449098806] Unable to run check for service 'Drive T:\ used space' on host 'EXPSDYN001'
[1449099465] Unable to run check for service 'CPU_load' on host 'EXPSDYN001'
[1449099538] Unable to run check for service 'Drive M:\ queue' on host 'EXPSDYN001'
[1449100006] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 13 seconds...
[1449100916] Unable to run check for service 'Memory_pages_by_second' on host 'EXPSDYN001'
[1449100960] Unable to run check for service 'Drive M:\ used space' on host 'EXPSDYN001'
[1449100968] SERVICE ALERT: EXPSDYN001;Memory_load;OK;HARD;15;OK: physical: Total: 7.99GB - Used: 6.348GB (79%) - Free: 1.642GB (20%), virtual: Total: 8TB - Used: 108.445MB (0%) - Free: 8TB (99%), committed: Total: 19.973GB - Used: 8.506GB (42%) - Free: 11.467GB (57%), committed: Total: 19.973GB - Used: 8.506GB (42%) - Free: 11.467GB (57%)
[1449101267] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ busy time by 9 seconds...
[1449101625] Warning: The check of service 'Drive F:\ used space' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449100660; next_check=1449100960).  I'm scheduling an immediate check of the service...
[1449103731] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive G:\ queue by 12 seconds...
[1449103785] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:CPU_load by 9 seconds...
[1449103797] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_pages_by_second by 6 seconds...
[1449103844] Warning: The check of service 'Drive L:\ busy time' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449103020; next_check=1449103200).  I'm scheduling an immediate check of the service...
[1449103847] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 11 seconds...
[1449103851] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ busy time by 12 seconds...
[1449103857] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 12 seconds...
[1449103858] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ queue by 8 seconds...
[1449103919] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 5 seconds...
[1449103924] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 6 seconds...
[1449103930] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ queue by 5 seconds...
[1449103935] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ queue by 13 seconds...
[1449103948] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ queue by 13 seconds...
[1449103974] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:CPU_load by 11 seconds...
[1449104026] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ used space by 13 seconds...
[1449105062] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_pages_by_second by 16 seconds...
[1449105129] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 6 seconds...
[1449105135] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 7 seconds...
[1449105142] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 13 seconds...
[1449105439] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_pages_by_second by 5 seconds...
[1449105477] Unable to run check for service 'Drive C:\ used space' on host 'EXPSDYN001'
[1449105600] Unable to run check for service 'Drive F:\ busy time' on host 'EXPSDYN001'
[1449106320] Unable to run check for service 'Drive F:\ busy time' on host 'EXPSDYN001'
[1449106955] SERVICE ALERT: EXPSDYN001;Memory_load;WARNING;HARD;15;WARNING: physical: Total: 7.99GB - Used: 6.422GB (80%) - Free: 1.568GB (19%), virtual: Total: 8TB - Used: 109.551MB (0%) - Free: 8TB (99%), committed: Total: 19.973GB - Used: 8.582GB (42%) - Free: 11.39GB (57%), committed: Total: 19.973GB - Used: 8.582GB (42%) - Free: 11.39GB (57%)
[1449107565] Unable to run check for service 'Drive M:\ used space' on host 'EXPSDYN001'
[1449107830] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ queue by 12 seconds...
[1449107830] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 12 seconds...
[1449107843] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ queue by 15 seconds...
[1449107843] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 10 seconds...
[1449107865] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ used space by 10 seconds...
[1449107940] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ busy time by 9 seconds...
[1449107941] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ busy time by 12 seconds...
[1449107943] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ busy time by 14 seconds...
[1449107949] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ busy time by 6 seconds...
[1449107953] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ busy time by 13 seconds...
[1449107981] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive L:\ busy time by 10 seconds...
[1449108002] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ busy time by 15 seconds...
[1449108034] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 5 seconds...
[1449108035] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ queue by 14 seconds...
[1449108039] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 9 seconds...
[1449108285] Warning: The check of service 'Drive F:\ used space' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449107340; next_check=1449107640).  I'm scheduling an immediate check of the service...
[1449108868] Unable to run check for service 'Drive F:\ queue' on host 'EXPSDYN001'
[1449109124] Warning: The check of service 'Drive C:\ used space' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449108178; next_check=1449108478).  I'm scheduling an immediate check of the service...
[1449111165] Unable to run check for service 'Drive C:\ queue' on host 'EXPSDYN001'
[1449111950] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449113685] Unable to run check for service 'Drive C:\ queue' on host 'EXPSDYN001'
[1449115080] Unable to run check for service 'Drive G:\ queue' on host 'EXPSDYN001'
[1449116205] Unable to run check for service 'Drive C:\ queue' on host 'EXPSDYN001'
[1449117676] Unable to run check for service 'Drive M:\ busy time' on host 'EXPSDYN001'
[1449117687] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ queue by 11 seconds...
[1449118665] Warning: The check of service 'CPU_load' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449117842; next_check=1449118022).  I'm scheduling an immediate check of the service...
[1449118846] Unable to run check for service 'Memory_load' on host 'EXPSDYN001'
[1449118905] Unable to run check for service 'Drive C:\ queue' on host 'EXPSDYN001'
[1449118940] Unable to run check for service 'Drive M:\ busy time' on host 'EXPSDYN001'
[1449119085] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ queue by 12 seconds...
[1449119092] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive L:\ queue by 16 seconds...
[1449119300] Unable to run check for service 'Drive M:\ busy time' on host 'EXPSDYN001'
[1449119385] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ used space by 7 seconds...
[1449119745] Unable to run check for service 'CPU_load' on host 'EXPSDYN001'
[1449121005] Unable to run check for service 'CPU_load' on host 'EXPSDYN001'
[1449121286] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449122204] Warning: The check of service 'Drive M:\ queue' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449121366; next_check=1449121546).  I'm scheduling an immediate check of the service...
[1449124365] Warning: The check of service 'CPU_load' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449123525; next_check=1449123705).  I'm scheduling an immediate check of the service...
[1449124725] Warning: The check of service 'Drive C:\ busy time' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449123860; next_check=1449124040).  I'm scheduling an immediate check of the service...
[1449124725] Unable to run check for service 'Drive C:\ busy time' on host 'EXPSDYN001'
[1449125146] Unable to run check for service 'Memory_load' on host 'EXPSDYN001'
[1449125520] Unable to run check for service 'Drive G:\ queue' on host 'EXPSDYN001'
[1449125587] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive L:\ queue by 9 seconds...
[1449125587] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive G:\ used space by 7 seconds...
[1449125625] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:CPU_load by 14 seconds...
[1449125966] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449126705] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ busy time by 8 seconds...
[1449127046] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449127066] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive L:\ busy time by 14 seconds...
[1449127126] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Memory_load by 5 seconds...
[1449127226] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449127545] Warning: The check of service 'Drive F:\ used space' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449126590; next_check=1449126890).  I'm scheduling an immediate check of the service...
[1449127680] Unable to run check for service 'Drive G:\ queue' on host 'EXPSDYN001'
[1449128160] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449128391] SERVICE ALERT: EXPSDYN001;Memory_load;OK;HARD;15;OK: physical: Total: 7.99GB - Used: 6.377GB (79%) - Free: 1.613GB (20%), virtual: Total: 8TB - Used: 109.551MB (0%) - Free: 8TB (99%), committed: Total: 19.973GB - Used: 8.547GB (42%) - Free: 11.426GB (57%), committed: Total: 19.973GB - Used: 8.547GB (42%) - Free: 11.426GB (57%)
[1449129960] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449130680] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449130920] Unable to run check for service 'Drive G:\ queue' on host 'EXPSDYN001'
[1449131145] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ used space by 9 seconds...
[1449131684] Warning: The check of service 'Drive M:\ busy time' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449130820; next_check=1449131000).  I'm scheduling an immediate check of the service...
[1449131906] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449131925] Warning: The check of service 'Drive G:\ queue' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449131100; next_check=1449131280).  I'm scheduling an immediate check of the service...
[1449132404] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ busy time by 14 seconds...
[1449132416] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive C:\ queue by 12 seconds...
[1449132550] SERVICE ALERT: EXPSDYN001;Drive L:\ used space;UNKNOWN;HARD;3;Filter processing failed: Error: Failed to get size for: 3: The system cannot find the path specified.
[1449133526] Unable to run check for service 'Drive T:\ busy time' on host 'EXPSDYN001'
[1449133762] SERVICE ALERT: EXPSDYN001;Memory_pages_by_second;CRITICAL;HARD;6;CRITICAL: MemoryPagingBySecond = 124
[1449133942] SERVICE ALERT: EXPSDYN001;Memory_pages_by_second;OK;HARD;6;OK: MemoryPagingBySecond = 13
[1449134625] Unable to run check for service 'Drive G:\ queue' on host 'EXPSDYN001'
[1449134767] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ busy time by 15 seconds...
[1449134953] SERVICE ALERT: EXPSDYN001;Drive L:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449134959] SERVICE ALERT: EXPSDYN001;Drive L:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449135003] SERVICE ALERT: EXPSDYN001;Drive L:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449135116] SERVICE ALERT: EXPSDYN001;Drive F:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449135136] Unable to run check for service 'Drive L:\ queue' on host 'EXPSDYN001'
[1449135210] SERVICE ALERT: EXPSDYN001;Drive T:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449135236] SERVICE ALERT: EXPSDYN001;Drive M:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449135256] SERVICE ALERT: EXPSDYN001;Drive G:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449135263] SERVICE ALERT: EXPSDYN001;Memory_pages_by_second;CRITICAL;HARD;6;Connection refused or timed out
[1449135286] SERVICE ALERT: EXPSDYN001;Drive C:\ used space;CRITICAL;HARD;3;Connection refused or timed out
[1449136065] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive G:\ queue by 15 seconds...
[1449136236] Unable to run check for service 'Drive F:\ queue' on host 'EXPSDYN001'
[1449136440] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449136492] SERVICE ALERT: EXPSDYN001;Memory_load;CRITICAL;HARD;15;Connection refused or timed out
[1449136800] Unable to run check for service 'Drive L:\ busy time' on host 'EXPSDYN001'
[1449137280] SERVICE ALERT: EXPSDYN001;Drive M:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449137289] SERVICE ALERT: EXPSDYN001;Drive C:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449137304] SERVICE ALERT: EXPSDYN001;Drive F:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449137308] SERVICE ALERT: EXPSDYN001;Drive G:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449137323] SERVICE ALERT: EXPSDYN001;Drive M:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449137334] SERVICE ALERT: EXPSDYN001;Drive C:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449137340] SERVICE ALERT: EXPSDYN001;CPU_load;CRITICAL;HARD;15;Connection refused or timed out
[1449137343] SERVICE ALERT: EXPSDYN001;Drive G:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449137349] SERVICE ALERT: EXPSDYN001;Drive T:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449137499] SERVICE ALERT: EXPSDYN001;Drive F:\ queue;CRITICAL;HARD;15;Connection refused or timed out
[1449137520] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive L:\ busy time by 8 seconds...
[1449137625] Warning: The check of service 'Drive T:\ busy time' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449136766; next_check=1449136946).  I'm scheduling an immediate check of the service...
[1449137697] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:CPU_load by 9 seconds...
[1449137706] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:CPU_load by 15 seconds...
[1449137986] SERVICE ALERT: EXPSDYN001;Drive T:\ busy time;CRITICAL;HARD;15;Connection refused or timed out
[1449138040] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive M:\ queue by 10 seconds...

I can see that I am hitting the Max concurrent service checks quite often. I'll increase this value. But is a "Host check" impacted by this limit?
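
To get a feel for how often each kind of message shows up, here is a small Python sketch that tallies the three recurring scheduler messages in the grep output above. The category labels and the function name are my own, and the sample only contains a handful of the lines already posted. It does not answer whether host checks are affected by the limit; it only counts what the log reports:

```python
import re
from collections import Counter

# Labels are hypothetical; the patterns match message text seen in the log above.
PATTERNS = {
    "nudged": re.compile(r"Max concurrent service checks \(\d+\) has been reached"),
    "unable_to_run": re.compile(r"Unable to run check for service"),
    "orphaned": re.compile(r"looks like it was orphaned"),
}


def summarize(lines):
    """Count log lines per category; lines matching no pattern are ignored."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


# A few lines copied from the output above (raw strings keep the backslashes).
SAMPLE = [
    r"[1449097665] Unable to run check for service 'CPU_load' on host 'EXPSDYN001'",
    r"[1449100006] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive T:\ used space by 13 seconds...",
    r"[1449101267] Max concurrent service checks (80) has been reached.  Nudging EXPSDYN001:Drive F:\ busy time by 9 seconds...",
    r"[1449108285] Warning: The check of service 'Drive F:\ used space' on host 'EXPSDYN001' looks like it was orphaned (results never came back; last_check=1449107340; next_check=1449107640).  I'm scheduling an immediate check of the service...",
    r"[1449133762] SERVICE ALERT: EXPSDYN001;Memory_pages_by_second;CRITICAL;HARD;6;CRITICAL: MemoryPagingBySecond = 124",
]

counts = summarize(SAMPLE)
for label in PATTERNS:
    print(label, counts[label])
```

To run it on the real log, the same function can be fed the lines of nagios.log (for example /usr/local/nagios/var/nagios.log, if that is where the log lives) instead of SAMPLE.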

Nagios takes days to retry a DOWN HOST
