[Nagios-devel] nagios 3 host checks logic problem on some

Post by Guest »

Hi,

I think I identified a problem (but not the solution) in the nagios
3 source tree...

I tried with both the 3.0b3 and cvs HEAD source files and could not get
rid of the problem.

I'm running a 2.4.21 kernel on a RHEL3 box.

What happens is that as soon as I start nagios 3, it starts eating all
of the CPU.

Stracing the nagios process shows this (and almost only this):

gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0

I tried stracing nagios on an Ubuntu Feisty (7.04) box, and the output is
quite different: there are nanosleep calls...

I tried activating and deactivating nanosleeps at nagios compile time,
but this did not solve my problem.
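
To compare the two traces without reading them line by line, the syscalls can be tallied by name. This is only a rough sketch: it assumes plain strace output (no -f pid prefixes), and the sample lines are copied from the trace above. The function name tally_syscalls is mine, not part of any tool.

```python
import re
from collections import Counter

# Sample lines copied from the strace output in this post.
STRACE_SAMPLE = """\
gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
"""


def tally_syscalls(strace_text):
    """Count how often each syscall name appears in strace output."""
    counts = Counter()
    for line in strace_text.splitlines():
        # A syscall line starts with the call name followed by '('.
        match = re.match(r"\s*(\w+)\(", line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = tally_syscalls(STRACE_SAMPLE)
print(counts.most_common())
# A Counter returns 0 for names it never saw, so this reports
# whether the process ever slept during the traced interval.
print("nanosleep calls seen:", counts["nanosleep"])
```

On the RHEL3 box this kind of tally would show only time-related calls, while on the Ubuntu box nanosleep should appear in the list.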

With full debugging enabled, I get this kind of output when nagios starts:

[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'...
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x()
[1189438977.881691] [001.0] [pid=18234] check_time_against_period()
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time

If I run nagios for just 2 seconds and then hit CTRL+C, I already see
this:

>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l
971

>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l
971
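
The same comparison can be done in one pass by counting the attempts per host name, which makes it obvious when a single host accounts for every attempt. A minimal sketch, assuming the debug log format shown above; the sample lines are taken from this post and the helper name attempts_per_host is made up for illustration.

```python
import re
from collections import Counter

# A few lines copied from the debug output in this post.
DEBUG_SAMPLE = """\
[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
"""

# Captures the quoted host name from each "Attempting to run" line.
ATTEMPT = re.compile(r"Attempting to run scheduled check of host '([^']+)'")


def attempts_per_host(debug_text):
    """Return a Counter mapping host name to number of check attempts."""
    counts = Counter()
    for line in debug_text.splitlines():
        match = ATTEMPT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


per_host = attempts_per_host(DEBUG_SAMPLE)
for host, attempts in per_host.most_common():
    print(host, attempts)
```

To run it against the real log, the contents of /var/log/nagios/nagios.debug would be passed in instead of DEBUG_SAMPLE.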

I have 53 hosts defined; I don't understand why nagios keeps checking
the same host over and over... and why this does not happen on all systems.

De-activating host checks magically "solves" the problem.

I just found out that commenting out the hosts' "check_command" causes this
behaviour (with host_checks_enabled=true), and that defining a correct
check_command prevents nagios from being so CPU hungry...
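
For anyone wanting to check whether their own setup is affected, here is a rough sketch that lists host definitions with no active check_command. It is a simplified reader, not how nagios itself parses its config: it ignores template inheritance via "use", so a host that gets its check_command from a template would be reported wrongly. The sample config, the addresses and the check-host-alive command name are made up for illustration.

```python
import re

# Hypothetical object config, made up for illustration only.
CONFIG_SAMPLE = """\
define host{
    host_name       wn010
    address         192.0.2.10
    #check_command  check-host-alive
    }
define host{
    host_name       wn011
    address         192.0.2.11
    check_command   check-host-alive
    }
"""


def hosts_without_check_command(config_text):
    """Return host_name values of 'define host' blocks lacking check_command."""
    missing = []
    # Grab the body of each 'define host{ ... }' block (not hostgroup etc.).
    for block in re.findall(r"define\s+host\s*\{(.*?)\}", config_text, re.S):
        directives = {}
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blank lines and commented-out directives
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        if "check_command" not in directives:
            missing.append(directives.get("host_name", "<unnamed>"))
    return missing


print(hosts_without_check_command(CONFIG_SAMPLE))
```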

Hope I helped...

Cheers



This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]