High Service Check Latency
Posted: Thu Oct 24, 2013 6:06 am
Hi, we are experiencing high service check latency on one of our distributed Nagios Core servers. The Nagios configuration is quite small (see the tactical overview output below), but before I restarted Nagios the service check latency figures were in the thousands!
Service Check Execution Time: 0.01 / 6.52 / 1.295 sec
Service Check Latency: 0.62 / 223.07 / 118.967 sec
Host Check Execution Time: 4.00 / 4.22 / 4.080 sec
Host Check Latency: 0.01 / 306.80 / 135.256 sec
# Active Host / Service Checks: 50 / 488
# Passive Host / Service Checks: 0 / 0
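For anyone who wants to compare these figures over time, here is a rough Python sketch that parses the min / max / avg lines above and flags latency metrics whose average is over a threshold. The 60-second threshold and the function names are just my own assumptions for illustration, not anything Nagios defines.

```python
import re

# Matches lines such as "Service Check Latency: 0.62 / 223.07 / 118.967 sec".
# The count lines starting with "#" do not match and are skipped.
STAT_LINE = re.compile(
    r"^(?P<name>[A-Za-z ]+):\s*(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)"
    r"\s*/\s*(?P<avg>[\d.]+)\s*sec$"
)


def parse_stats(text):
    """Return {metric name: (min, max, avg)} for every min / max / avg line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group("name").strip()] = (
                float(match.group("min")),
                float(match.group("max")),
                float(match.group("avg")),
            )
    return stats


def high_latency(stats, threshold_sec=60.0):
    """Names of latency metrics whose average exceeds the assumed threshold."""
    return [
        name
        for name, (_, _, avg) in stats.items()
        if name.endswith("Latency") and avg > threshold_sec
    ]


# Figures copied from the tactical overview in this post.
SAMPLE = """\
Service Check Execution Time: 0.01 / 6.52 / 1.295 sec
Service Check Latency: 0.62 / 223.07 / 118.967 sec
Host Check Execution Time: 4.00 / 4.22 / 4.080 sec
Host Check Latency: 0.01 / 306.80 / 135.256 sec
"""

if __name__ == "__main__":
    for name in high_latency(parse_stats(SAMPLE)):
        print(name)
```

With the numbers from this post, both latency metrics are flagged while the execution times are not, which is the mismatch I am trying to understand.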
The server is a small VM with 4 CPUs and 4 GB of memory, and the load average is consistently close to zero (see the top snapshot below).
top - 20:19:44 up 87 days, 1:34, 2 users, load average: 0.32, 0.14, 0.05
Tasks: 155 total, 1 running, 154 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4043664k total, 1353964k used, 2689700k free, 184248k buffers
Swap: 2097144k total, 96k used, 2097048k free, 587180k cached
I know I could try altering the following parameters:
max_concurrent_checks=0
max_check_result_reaper_time=30
check_result_reaper_frequency=10
but is that required for such a small number of hosts/services? Is there something fundamental I've missed in the basic configuration?
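In case it helps when comparing settings between servers, below is a minimal Python sketch that pulls those three directives out of the text of a main config file. The helper name `read_directives` is hypothetical, and the sample text simply reuses the values listed above; the real file location depends on the install, so I have left it out.

```python
# Directives discussed in this post (names as used in the Nagios main config).
WANTED = (
    "max_concurrent_checks",
    "max_check_result_reaper_time",
    "check_result_reaper_frequency",
)


def read_directives(cfg_text, wanted=WANTED):
    """Return {directive: value} for the wanted key=value lines in cfg_text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in wanted:
            found[key] = value.strip()
    return found


# Sample text using the values from this post; not a real config file.
SAMPLE_CFG = """\
# performance-related settings
max_concurrent_checks=0
max_check_result_reaper_time=30
check_result_reaper_frequency=10
log_file=/tmp/example.log
"""

if __name__ == "__main__":
    for key, value in read_directives(SAMPLE_CFG).items():
        print(f"{key} = {value}")
```

To run it against a real server, read the main config file into a string and pass that instead of `SAMPLE_CFG`.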
Thanks
Steve.