Posted: Tue Sep 18, 2007 12:11 am
From: SCHAER Frederic cea.fr>
Subject: nagios 3 host checks logic problem on some kernels/distros
Newsgroups: gmane.network.nagios.devel
Date: 2007-09-10 16:17:30 GMT (1 week, 15 hours and 23 minutes ago)
Hi,

I think I have identified a problem (but not the solution) in the nagios 3
source tree.
I tried with both the 3.0b3 and CVS HEAD source files and could not get
rid of the problem.
I'm running a 2.4.21 kernel on a RHEL3 box.

What happens is that as soon as I start nagios 3, it starts eating all of
the CPU.
Stracing the nagios process shows this (and almost only this):
gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0

I tried stracing nagios on an Ubuntu Feisty (7.04) box, and the output is
quite different: there are nanosleep calls.
I tried enabling and disabling nanosleep at nagios compile time, but
this did not solve my problem.

With full debugging enabled, I get this kind of output at nagios startup:
[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'...
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x()
[1189438977.881691] [001.0] [pid=18234] check_time_against_period()
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time

If I run nagios for just 2 seconds and then hit CTRL+C, I still see this:
>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l
971
>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l
971
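
The three greps above can be folded into a single per-host tally. This is only a minimal sketch: it assumes the debug log keeps each message on one line with the wording shown above, and count_attempts_per_host is a hypothetical helper name, not something shipped with nagios.

```shell
# Hypothetical helper: tally scheduled host check attempts per host name.
# The default path is the debug log used in the post; pass another path as $1.
count_attempts_per_host() {
    # Print only the host name found between the single quotes,
    # then count how often each name occurs, busiest host first.
    sed -n "s/.*Attempting to run scheduled check of host '\([^']*\)'.*/\1/p" \
        "${1:-/var/log/nagios/nagios.debug}" \
        | sort | uniq -c | sort -rn
}
```

If the scheduler really is stuck on one host, the output should be a single line for that host rather than a spread over all defined hosts.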

I have 53 hosts defined, and I don't understand why nagios keeps checking
the same host over and over, nor why this does not happen on all systems.

De-activating host checks magically "solves" the problem.

I just found out that commenting out the hosts' "check_command" causes this
behaviour (with host_checks_enabled=true), and that defining a correct
check_command prevents nagios from being so CPU hungry...
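
Since the trigger appears to be a host without a check_command, a quick scan of the object configuration may help to find affected hosts. This is a rough sketch, assuming "define host { ... }" blocks with one directive per line. It does not follow template inheritance via "use", and hosts_without_check_command is a hypothetical helper, not a nagios tool.

```shell
# Hypothetical helper: print the host_name of every "define host" block
# that has no active (uncommented) check_command line.
# Assumption: directives inherited from templates via "use" are NOT resolved.
hosts_without_check_command() {
    awk '
        # Start of a host definition: reset the per-block state.
        /^[ \t]*define[ \t]+host[ \t]*\{/ { in_host = 1; name = "(unnamed)"; has_cmd = 0; next }
        in_host && /^[ \t]*host_name[ \t]/     { name = $2 }
        in_host && /^[ \t]*check_command[ \t]/ { has_cmd = 1 }
        # End of the block: report the host if no check_command was seen.
        in_host && /^[ \t]*\}/                 { if (!has_cmd) print name; in_host = 0 }
    ' "$@"
}
```

Pass it the object configuration files to scan, for example the file that holds the 53 host definitions.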

Hope I helped.

Cheers

Dear List,
I can confirm the problem Frederic reported.
I am using Nagios 3.0b3 on CentOS 4.4.
After starting nagios, the process takes nearly 100% CPU (see the
top output below).
Disabling host checks lets the process return to normal values.
As far as I can remember, the problem did not occur with Nagios 3.0a (but
I cannot verify this at the moment).
Tasks: 89 total, 3 running, 86 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.0% us, 1.3% sy, 0.0% ni, 72.6% id, 0.0% wa, 0.1% hi, 0.0% si
Mem: 4041580k total, 1373844k used, 2667736k free, 60200k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1137348k cached
PID US
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: frederic.schaer cea.fr