high service check latency

Posted: Fri Oct 28, 2011 9:23 am
by raggmopp
Hi all:

Dell PE2950, 16 GB RAM, plenty of disk space, etc.
Just upgraded to Nagios 3.3.1 from Nagios 3.2.3
MySQL 5.0.77
NDO2DB 1.4b9
RRDTool 1.4.5
NRPE 2.8.1

I have been running Nagios for a while (since Nagios 2.x) and have upgraded along the way; the latest upgrade was from 3.2.3. Every previous upgrade went off without problems except this one to Nagios 3.3.1.

Service check latency has jumped from about 1 second to 200+ seconds. I searched for tuning tips and made the following changes in nagios.cfg, with little effect:
max_concurrent_checks=100
check_result_reaper_frequency=15
max_check_result_reaper_time=25
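When tuning changes seem to have little effect, it is worth confirming which values nagios.cfg actually contains (a directive can end up duplicated or commented out). Below is a minimal Python sketch, assuming the plain `key=value` format with `#` comment lines; `parse_nagios_cfg` is a hypothetical helper for illustration, not part of Nagios.

```python
from collections import defaultdict


def parse_nagios_cfg(text: str) -> dict[str, list[str]]:
    """Collect key=value settings from nagios.cfg-style text.

    Values are kept as lists because some directives (for example
    cfg_file or broker_module) may legitimately appear more than once.
    """
    settings: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values containing '=' survive.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        settings[key.strip()].append(value.strip())
    return dict(settings)


sample = """
# scheduling tweaks from the post
max_concurrent_checks=100
check_result_reaper_frequency=15
max_check_result_reaper_time=25
"""

cfg = parse_nagios_cfg(sample)
for name in ("max_concurrent_checks", "check_result_reaper_frequency",
             "max_check_result_reaper_time"):
    # Every occurrence is printed, so duplicates are easy to spot.
    print(name, cfg.get(name, ["<unset>"]))
```

Pointing the same function at the real file (read with `open(...).read()`) shows at a glance whether a setting appears more than once.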


The output below is from nagiostats:
Nagios Stats 3.3.1
Copyright (c) 2003-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 07-25-2011
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File: /usr/local/nagios/var/status.dat
Status File Age: 0d 0h 0m 11s
Status File Version: 3.3.1

Program Running Time: 0d 16h 27m 5s
Nagios PID: 6224
Used/High/Total Command Buffers: 0 / 3 / 4096

Total Services: 2023
Services Checked: 2023
Services Scheduled: 2020
Services Actively Checked: 2023
Services Passively Checked: 0
Total Service State Change: 0.000 / 9.870 / 0.020 %
Active Service Latency: 0.008 / 324.337 / 242.021 sec
Active Service Execution Time: 0.011 / 52.108 / 0.874 sec
Active Service State Change: 0.000 / 9.870 / 0.020 %
Active Services Last 1/5/15/60 min: 128 / 902 / 1850 / 1935
Passive Service Latency: 0.000 / 0.000 / 0.000 sec
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit: 2021 / 1 / 0 / 1
Services Flapping: 0
Services In Downtime: 0

Total Hosts: 152
Hosts Checked: 152
Hosts Scheduled: 28
Hosts Actively Checked: 152
Host Passively Checked: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 471.193 / 284.723 sec
Active Host Execution Time: 0.007 / 0.162 / 0.044 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 4 / 16 / 29 / 29
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 152 / 0 / 0
Hosts Flapping: 0
Hosts In Downtime: 0

Active Host Checks Last 1/5/15 min: 5 / 17 / 49
   Scheduled: 5 / 16 / 44
   On-demand: 0 / 1 / 5
   Parallel: 5 / 16 / 46
   Serial: 0 / 0 / 0
   Cached: 0 / 1 / 3
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 179 / 939 / 2898
   Scheduled: 179 / 939 / 2898
   On-demand: 0 / 0 / 0
   Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min: 0 / 0 / 0
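nagiostats prints its timing metrics as three numbers: min / max / average. To track latency over time it helps to pull those triples out of saved output. A small Python sketch follows; the helper name and the 30-second threshold are hypothetical choices for illustration only.

```python
import re

# Matches lines such as "Active Service Latency: 0.008 / 324.337 / 242.021 sec"
# (min / max / average). Lines ending in "%" are deliberately not matched.
TRIPLE = re.compile(
    r"^\s*(?P<name>[^:]+):\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<max>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*sec\s*$"
)


def parse_latency(output: str) -> dict[str, tuple[float, float, float]]:
    """Return {metric name: (min, max, avg)} for every '... sec' triple."""
    result = {}
    for line in output.splitlines():
        m = TRIPLE.match(line)
        if m:
            result[m["name"].strip()] = (
                float(m["min"]), float(m["max"]), float(m["avg"])
            )
    return result


sample = """\
Active Service Latency: 0.008 / 324.337 / 242.021 sec
Active Service Execution Time: 0.011 / 52.108 / 0.874 sec
Total Service State Change: 0.000 / 9.870 / 0.020 %
Active Host Latency: 0.000 / 471.193 / 284.723 sec
"""

stats = parse_latency(sample)
THRESHOLD_SEC = 30.0  # hypothetical alert threshold on average latency
for name, (_min, max_, avg) in stats.items():
    flag = "HIGH" if "Latency" in name and avg > THRESHOLD_SEC else "ok"
    print(f"{name}: avg={avg:.3f}s max={max_:.3f}s [{flag}]")
```

With the figures from this post, both latency lines are flagged while the execution time (under 1 s on average) is not, which points at scheduling rather than slow plugins.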


I have not been able to find a cause or a solution. Has anybody else seen this?

Thanks

Re: high service check latency

Posted: Sat Oct 29, 2011 8:02 am
by xvvivan
Hi,

I had a similar problem in the past, and the cause was ndoutils.
Could you try temporarily disabling the "broker_module" line?
This is only a test to identify a possible cause.
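One way to run that test without losing the original setting is to comment the line out instead of deleting it. A minimal Python sketch, assuming nagios.cfg is handled as plain text; `disable_broker_modules` is a hypothetical helper, the ndomod.o path in the sample is only an example, and Nagios has to be restarted for the change to take effect.

```python
def disable_broker_modules(cfg_text: str) -> str:
    """Return nagios.cfg text with every broker_module line commented out.

    Meant for a temporary test only: keep a copy of the original file so
    the change can be reverted once latency has been checked.
    """
    out = []
    for line in cfg_text.splitlines(keepends=True):
        if line.lstrip().startswith("broker_module="):
            out.append("#" + line)  # comment out, keep the original text
        else:
            out.append(line)
    return "".join(out)


# Example config text; the paths are hypothetical.
example_cfg = (
    "log_file=/usr/local/nagios/var/nagios.log\n"
    "broker_module=/usr/local/nagios/bin/ndomod.o "
    "config_file=/usr/local/nagios/etc/ndomod.cfg\n"
)
print(disable_broker_modules(example_cfg), end="")
```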

Regards

Ivan

Re: high service check latency

Posted: Mon Oct 31, 2011 12:04 pm
by raggmopp
I tried disabling NDO and MySQL: no change. The Nagios process runs fine for a while, then starts eating memory and spawning a bunch of children that become defunct, and %sys and the load average start going up. At that point, service check latency starts climbing dramatically.
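To confirm that the defunct children really belong to Nagios, one option on Linux is to scan /proc for zombie processes (state `Z`) whose parent PID matches the Nagios PID (6224 in the nagiostats output above). This is a Linux-only sketch; both function names are hypothetical.

```python
import os


def parse_proc_stat(line: str) -> tuple[int, str, str, int]:
    """Parse (pid, comm, state, ppid) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split around the first '(' and the last ')'.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    fields = line[close_paren + 1:].split()
    state, ppid = fields[0], int(fields[1])
    return pid, comm, state, ppid


def defunct_children(parent_pid: int, proc_root: str = "/proc") -> list[int]:
    """List PIDs of zombie (state 'Z') processes whose parent is parent_pid."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, _comm, state, ppid = parse_proc_stat(fh.read())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if state == "Z" and ppid == parent_pid:
            zombies.append(pid)
    return zombies


if __name__ == "__main__" and os.path.isdir("/proc"):
    # 6224 is the Nagios PID reported by nagiostats in this thread.
    print(defunct_children(6224))
```

A steadily growing list while latency climbs would support the idea that Nagios is not reaping its check processes.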

Re: high service check latency

Posted: Tue Nov 01, 2011 1:06 pm
by mguthrie
Try:

max_concurrent_checks=0
check_result_reaper_frequency=5
max_check_result_reaper_time=15
If the latency problems still exist, post your entire nagios.cfg. Either a bad setting is blocking the loop that executes new checks, or your system is grossly underpowered for the checks that have been scheduled.
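Per the Nagios documentation, max_concurrent_checks=0 places no limit on the number of service checks running in parallel, and the two reaper settings are given in seconds. A minimal Python sketch of applying the suggested values to nagios.cfg text; `set_options` is a hypothetical helper that only rewrites uncommented `key=value` lines and appends any key it did not find. Nagios still needs a restart afterwards.

```python
def set_options(cfg_text: str, updates: dict[str, str]) -> str:
    """Rewrite key=value lines in nagios.cfg text, appending missing keys."""
    remaining = dict(updates)
    out = []
    for line in cfg_text.splitlines():
        key, sep, _ = line.strip().partition("=")
        key = key.strip()
        # Commented lines start with '#', so their key never matches.
        if sep and key in updates:
            out.append(f"{key}={updates[key]}")
            remaining.pop(key, None)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


RECOMMENDED = {
    "max_concurrent_checks": "0",           # 0 = no concurrency limit
    "check_result_reaper_frequency": "5",   # seconds between reaper runs
    "max_check_result_reaper_time": "15",   # max seconds per reaper run
}

current = (
    "max_concurrent_checks=100\n"
    "check_result_reaper_frequency=15\n"
    "max_check_result_reaper_time=25\n"
)
print(set_options(current, RECOMMENDED), end="")
```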

Re: high service check latency

Posted: Wed Nov 02, 2011 8:46 am
by raggmopp
Hi all:

Good news. I did a recompile and reinstall, and service check latency has fallen to 0.5 sec on average (from 600 sec) and is staying stable.
My previous compile did not include the --with-perlcache option.

A recompile (with the perlcache option) and reinstall has made a dramatic improvement.
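The rebuild described above comes down to re-running configure with the extra option and reinstalling. Below is a Python sketch, assuming it is run from the top of the Nagios 3.x source tree. `rebuild` is hypothetical and only lists the steps unless `dry_run=False`. Any configure flags from the original build would need to be added by hand.

```python
import shlex
import subprocess

# "--with-perlcache" is the option mentioned in this thread; flags from the
# original build (prefix, user/group, etc.) are not shown and must be added.
BUILD_STEPS = [
    ["./configure", "--with-perlcache"],
    ["make", "all"],
    ["make", "install"],
]


def rebuild(src_dir: str, dry_run: bool = True) -> list[str]:
    """Run (or, by default, just list) the configure/make steps in src_dir."""
    executed = []
    for argv in BUILD_STEPS:
        executed.append(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence if any step fails.
            subprocess.run(argv, cwd=src_dir, check=True)
    return executed


for step in rebuild(".", dry_run=True):
    print(step)
```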

Many thanks!

Re: high service check latency

Posted: Thu Nov 03, 2011 6:08 am
by AndersKarl
I tried that and it worked great. Thanks!