
Post by Guest »


From: SCHAER Frederic cea.fr>
Subject: nagios 3 host checks logic problem on some kernels/distros
Newsgroups: gmane.network.nagios.devel
Date: 2007-09-10 16:17:30 GMT (1 week, 15 hours and 23 minutes ago)
Hi,

I think I have identified a problem (but not the solution) in the Nagios 3 source tree.
I tried both the 3.0b3 and CVS HEAD sources and could not get rid of the problem.
I'm running a 2.4.21 kernel on a RHEL3 box.

What happens is that as soon as I start Nagios 3, it starts eating all of the CPU.
Stracing the nagios process shows this (and almost only this):
gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0

I tried stracing nagios on an Ubuntu Feisty (7.04) box, and the output is very different: there are nanosleep calls...
I tried enabling and disabling nanosleep at Nagios compile time, but this did not solve my problem.

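For anyone who wants to compare traces from different boxes, here is a minimal Python sketch that tallies syscalls in saved strace output and flags a trace with no sleeping calls at all. The function names and the set of "sleeping" calls are my own assumptions for illustration, not anything from Nagios or strace; the sample is the trace quoted above.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a plain strace line,
# e.g. "gettimeofday({1189419621, 161574}, NULL) = 0".
SYSCALL_RE = re.compile(r"^(\w+)\(")

# Assumption: calls that would let the process yield the CPU.
SLEEPING_CALLS = {"nanosleep", "select", "poll"}


def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def looks_like_busy_loop(counts):
    """True if syscalls were seen but none of them is a sleeping call."""
    return sum(counts.values()) > 0 and not any(
        counts[name] for name in SLEEPING_CALLS
    )


STRACE_SAMPLE = """\
gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0
"""

counts = tally_syscalls(STRACE_SAMPLE.splitlines())
print(counts.most_common())
print("busy loop suspected:", looks_like_busy_loop(counts))
```

This is only a heuristic: a short trace window without sleeping calls does not prove a spin loop, but a trace consisting solely of gettimeofday() and time() matches the CPU usage described here.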
With full debugging enabled, I get this kind of output when Nagios starts:
[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'...
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x()
[1189438977.881691] [001.0] [pid=18234] check_time_against_period()
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time

If I run Nagios for just 2 seconds and then hit CTRL+C, I still see this:
>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l
971
>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l
971

I have 53 hosts defined. I don't understand why Nagios keeps checking the same host over and over, or why this does not happen on all systems.

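The same comparison can be made in one pass over the debug log with a small Python sketch that counts scheduled-check attempts per host. The function name is hypothetical, and the regular expression assumes the message wording shown in the debug excerpt above.

```python
import re
from collections import Counter

# Assumes the debug message wording quoted in this post.
ATTEMPT_RE = re.compile(r"Attempting to run scheduled check of host '([^']+)'")


def attempts_per_host(lines):
    """Count 'Attempting to run scheduled check' messages per host name."""
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines taken from the debug excerpt above; only the first one counts.
DEBUG_SAMPLE = [
    "[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled "
    "check of host 'wn010': check options=0, latency=0.874000",
    "[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()",
]

for host, count in attempts_per_host(DEBUG_SAMPLE).most_common():
    print(host, count)
```

To run it on a real log, pass an open file instead of the sample, for example `attempts_per_host(open("/var/log/nagios/nagios.debug"))`. If a single host accounts for every attempt, as the grep counts above suggest for 'wn010', the scheduler is retrying that one host rather than cycling through all of them.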
Deactivating host checks magically "solves" the problem.

I just found out that commenting out the hosts' "check_command" causes this behaviour (with host_checks_enabled=true), and that defining a correct check_command prevents Nagios from being so CPU hungry...

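To find hosts that might trigger this, here is a rough Python sketch that lists host definitions with no check_command directive of their own. It is a sketch under assumptions: the function names are hypothetical, the parser handles only simple `define host{ ... }` blocks, and it ignores template inheritance (`use`), so a host reported here may still inherit a check_command from a template. In the sample, 'wn010' is the host from this post, while 'wn011', the addresses, and the command name are placeholders.

```python
import re

# Matches "define host{ ... }" blocks but not "define hostgroup{ ... }".
DEFINE_HOST_RE = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def strip_comments(text):
    """Drop whole-line '#' comments and anything after a ';'."""
    kept = []
    for line in text.splitlines():
        if line.strip().startswith("#"):
            continue
        kept.append(line.split(";", 1)[0])
    return "\n".join(kept)


def hosts_without_check_command(config_text):
    """Return host_name values of host blocks lacking a check_command line."""
    missing = []
    for body in DEFINE_HOST_RE.findall(strip_comments(config_text)):
        directives = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        if "check_command" not in directives:
            missing.append(directives.get("host_name", "<unnamed>"))
    return missing


# Placeholder configuration for illustration only.
SAMPLE_CONFIG = """\
define host{
    host_name       wn010
    address         192.0.2.10
    #check_command  check-host-alive
    }
define host{
    host_name       wn011
    address         192.0.2.11
    check_command   check-host-alive
    }
"""

print(hosts_without_check_command(SAMPLE_CONFIG))
```

Any host it reports is only a candidate; whether a missing check_command really causes the loop on a given system still has to be checked against the debug log.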
Hope I helped...

Cheers

Dear List,

I can confirm the problem Frederic reported.
I am using Nagios 3.0b3 on CentOS 4.4.
After starting Nagios, the process takes nearly 100% CPU (see top output below).
Disabling host checks lets the process return to normal values.
As far as I can remember, the problem did not occur with nagios3.0a (but I cannot verify that at the moment).

Tasks: 89 total, 3 running, 86 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.0% us, 1.3% sy, 0.0% ni, 72.6% id, 0.0% wa, 0.1% hi, 0.0% si
Mem: 4041580k total, 1373844k used, 2667736k free, 60200k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1137348k cached

PID US

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: frederic.schaer cea.fr