[Nagios-devel] Nagios FIFO bug - out of order service results


I posted this to the Nagios devel mailing list, but I don't know if it is
going to be accepted. I also contacted you a long time ago about this bug,
although at the time I did not have a clear idea of what was happening. I
hope, by the way, that everything is OK with you. Nice slides from FOSDEM;
a pity I missed it, but my company doesn't spend much money on this.


Hi.
I have been using Nagios for a long time, and there is a bug that has
persisted for a long time.
I have services that are passive only; each one reflects the state of a
script that runs periodically. When the script starts, it puts the service
in an UNKNOWN state, and when it finishes, it puts it in OK.
Sometimes the OK state was not processed. I went into the source code and
added some debug statements that log to syslog, and I found that even
though the external commands are sent in the correct order (first the
UNKNOWN and then, later, the OK state), Nagios sometimes reads the service
check results from the FIFO in the wrong order, not as expected of a FIFO.
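
For readers unfamiliar with passive checks, here is a minimal sketch of how
such a script might submit its two results. The line format is the standard
PROCESS_SERVICE_CHECK_RESULT external command; the function names and the
command file path in the usage note are assumptions for illustration, not
taken from the setup described here.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit_passive_result(command_file, host, service, return_code, output):
    """Write one result line to the Nagios external command file (a named pipe)."""
    line = format_passive_result(host, service, return_code, output)
    # Open, write a single line, and close, so each result is sent on its own.
    # Opening a FIFO for writing blocks until Nagios has it open for reading.
    with open(command_file, "w") as fifo:
        fifo.write(line)
```

A script would call submit_passive_result(path, "bill2", "BOASV", 3, "BOASV
STARTED") when it starts and the same call with return code 0 and "BOASV
STOP" when it finishes, where path is whatever command_file is set to in
nagios.cfg (often /usr/local/nagios/var/rw/nagios.cmd on source installs,
but that is an assumption).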

What I did was write "###write_svc" to syslog every time
write_svc_message is called (which happens when a passive check result
is submitted) and write "###read_svc" to syslog when Nagios reads the
service check result from the FIFO.
I also write "###reap_checkresults" when Nagios processes a check result
(which should happen after a read_svc_message from the FIFO).

What I expected was to see Nagios process the results as a FIFO
(the oldest result is processed first) and not as a LIFO (the most
recent service result is processed first).
It does behave as a FIFO most of the time, but not always!

This is clear to see in the few log lines below:


Mar 12 19:26:20 bill2 nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;bill2;BOASV;3;BOASV STARTED
Mar 12 19:26:20 bill2 nagios: ###write_svc: service 'BOASV' on host
'bill2' | ret=3 | out=BOASV STARTED .
Mar 12 19:26:25 bill2 nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;bill2;BOASV;0;BOASV STOP
Mar 12 19:26:25 bill2 nagios: ###write_svc: service 'BOASV' on host
'bill2' | ret=0 | out=BOASV STOP .

Mar 12 19:26:35 bill2 nagios: ###read_svc: service 'BOASV' on host
'bill2' | ret=0 | out=BOASV STOP .
Mar 12 19:26:35 bill2 nagios: ###reap_checkresults: service 'BOASV' on
host 'bill2' | ret=0 | out=BOASV STOP .

Mar 12 19:26:35 bill2 nagios: ###read_svc: service 'BOASV' on host
'bill2' | ret=3 | out=BOASV STARTED .
Mar 12 19:26:35 bill2 nagios: ###reap_checkresults: service 'BOASV' on
host 'bill2' | ret=3 | out=BOASV STARTED .
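
Spotting this by eye gets tedious on a busy system, so here is a rough
sketch of a checker that compares write order with read order per service.
It relies only on the custom "###write_svc" and "###read_svc" markers shown
above; the function name is hypothetical, and it assumes the log excerpt
starts before the first write of each service, otherwise it may report
false positives.

```python
import re

# Matches the custom debug markers from the log above. \s+ is used between
# words because syslog lines in the post are wrapped across two lines.
# The optional "3D" tolerates quoted-printable "=3D" left over from mail.
MARKER = re.compile(
    r"###(write_svc|read_svc):\s+service\s+'([^']*)'\s+on\s+host\s+'([^']*)'"
    r"\s+\|\s+ret=(?:3D)?(\d+)")


def find_reordered(log_text):
    """Return (host, service) pairs whose results were read out of write order."""
    written, read = {}, {}
    for kind, service, host, ret in MARKER.findall(log_text):
        target = written if kind == "write_svc" else read
        target.setdefault((host, service), []).append(int(ret))
    reordered = []
    for key, codes_read in read.items():
        codes_written = written.get(key, [])
        # In a true FIFO the reads so far are a prefix of the writes.
        if codes_read != codes_written[:len(codes_read)]:
            reordered.append(key)
    return reordered
```

Run against the log lines above, this reports ('bill2', 'BOASV'), since the
writes carry return codes 3 then 0 while the reads carry 0 then 3.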


This happened on Red Hat 7.3, kernel 2.4.20-20.7smp, with Nagios 1.2, but
I also saw it happen on RHAS 3.0. We have many different platforms
with Nagios deployed, but they are all Red Hat based.
I have seen this problem since Nagios 1.0; I also tried the latest
CVS 1.x branch and I still see it happening.


Anyone, Ethan, any ideas about what is happening?
There is an obvious problem with the FIFO that holds the service check
results, but is it due to some OS bug or to some incorrect use/initialization
of the FIFO in the source code?
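
As a baseline for the OS side of that question, here is a small sketch that
pushes two results through a kernel pipe and reads them back. It does not
reproduce how Nagios itself writes to or reads from its result pipe (the
function name and message strings are made up for the test); it only shows
the ordering one writer and one reader should see from a pipe.

```python
import os


def roundtrip_through_pipe(messages):
    """Write messages to a kernel pipe in order, then read them back."""
    read_fd, write_fd = os.pipe()
    # Messages here are small, so they fit in the pipe buffer and the
    # writes do not block even though nothing is reading yet.
    with os.fdopen(write_fd, "w") as writer:
        for message in messages:
            writer.write(message + "\n")
    # The writer is closed, so read() returns everything up to end of file.
    with os.fdopen(read_fd) as reader:
        return reader.read().splitlines()
```

Calling roundtrip_through_pipe(["3;BOASV STARTED", "0;BOASV STOP"]) returns
the two messages in the order they were written. If the same holds on the
affected machine, that points toward how the results are written or read
rather than toward the pipe itself, though it does not prove it.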


Regards,
Sergio Freire


-- 
Sergio Freire
Serviços e Redes Móveis
PT Inovação, SA
Rua Eng. José Ferreira Pinto Basto
3810 - 106 Aveiro - Portugal
Tel. +351 234403609
WwW http://www.ptinovacao.pt
Jabber [email protected]
Blog http://blog.globalpt.net/nelito

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]