[Nagios-devel] nagios 3 host checks logic problem on some

Posted: Mon Sep 10, 2007 8:17 am
by Guest

Hi,

I think I have identified a problem (but not the solution) in the nagios
3 source tree...

I tried both the 3.0b3 and CVS HEAD sources and could not get rid of
the problem.

I'm running a 2.4.21 kernel on a RHEL3 box.

What happens is that as soon as I start nagios 3, it starts eating all
of the CPU.

Stracing the nagios process shows this (and almost only this):

gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0

I tried stracing nagios on an Ubuntu Feisty (7.04) box, and the output is
very different: there are nanosleep calls...

I tried both enabling and disabling nanosleep at nagios compile time,
but this did not solve my problem.

With full debugging enabled, I get this kind of output at nagios startup:

[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'...
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x()
[1189438977.881691] [001.0] [pid=18234] check_time_against_period()
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time

If I run nagios for just 2 seconds and then hit CTRL+C, I already see
this:

>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l
971
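
For anyone who wants to repeat the counts above for every host at once, here is a rough Python sketch. It only assumes that the debug log contains lines of the form quoted earlier ("Attempting to run scheduled check of host '...'"); the function name is my own and is not part of nagios.

```python
import re
from collections import Counter

# Pattern for the debug line quoted above; the host name is captured.
ATTEMPT_RE = re.compile(r"Attempting to run scheduled check of host '([^']+)'")


def count_check_attempts(lines):
    """Return a Counter mapping host name -> number of scheduled check attempts.

    `lines` is any iterable of log lines (an open file works).
    This helper is hypothetical, not a nagios API.
    """
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling it on an open handle to /var/log/nagios/nagios.debug and printing counts.most_common() should show at a glance whether a single host accounts for all of the attempts, as wn010 does here.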

I have 53 hosts defined, and I don't understand why nagios keeps checking
the same host over and over... or why this does not happen on all systems.

Disabling host checks magically "solves" the problem.

I just found out that commenting out the hosts' "check_command" causes this
behaviour (with host_checks_enabled=true), and that defining a valid
check_command stops nagios from being so CPU hungry...
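
If you want to check whether your own object files have hosts in this state, something like the following Python sketch might help. It is only a rough text scan: it assumes the usual "define host { ... }" block syntax with "#" and ";" comments, and it does NOT resolve templates, so a host that inherits check_command through "use" will be reported as well. The function name and the sample values in the comment are made up for illustration.

```python
import re

# Matches the opening line of a host definition ("define host {" or "define host{"),
# but not "define hostgroup {" and similar.
HOST_START_RE = re.compile(r"define\s+host\s*\{")


def hosts_without_check_command(config_text):
    """Return the names of 'define host' blocks that set no check_command.

    Hypothetical helper, not part of nagios. Template inheritance ("use")
    is deliberately ignored, so inherited check_commands are not seen.
    """
    missing = []
    current = None  # directives of the host block being read, or None
    for raw in config_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop inline ';' comments
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        if current is None:
            if HOST_START_RE.match(line):
                current = {}
        elif line.startswith("}"):
            if not current.get("check_command"):
                name = current.get("host_name") or current.get("name")
                missing.append(name or "<unnamed>")
            current = None
        else:
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return missing

# Example (made-up values): a block containing only
#   host_name wn010
#   # check_command check-host-alive
# is reported, because the check_command line is commented out.
```

Any host this reports that really has no check_command anywhere is a candidate for the loop described above.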

Hope this helps...

Cheers



This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]