[Nagios-devel] Proposed functionality added to event loop.
Posted: Tue May 19, 2009 5:28 pm
Hi Everyone,
I'm working on a latency issue in Nagios 2.12 and I was reviewing the event_loop one more time.
It struck me that we could add some quick instrumentation to find out why a service check is not being run.
You see, if we have 27,000 checks in the system and we are running at a rate of 27,000 checks every 15 minutes, then each check makes it to the top of the event_list_low queue at least 4 times an hour.
If latencies are running at 3000 seconds (which they are on dev) and we have 27,000 checks executed per hour, that means each event is only making it to the top of the queue once every 50 minutes.
What we could do is add an "I wasn't executed" flag to the check and set it as a bit field, so that if a check doesn't execute we know why it didn't execute.
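To make the idea concrete, here is a rough sketch of what such a bit field might look like. Everything in it is hypothetical: the struct, the function names, and the reason bits are placeholders I made up for illustration, not anything taken from the Nagios 2.12 source, and the real list of reasons would have to come from reading the check-execution code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical "why wasn't I executed" bits. None of these names exist
 * in Nagios; they only illustrate one bit per possible reason. */
enum skip_reason {
    SKIP_NONE               = 0,
    SKIP_CHECKS_DISABLED    = 1 << 0, /* active checks switched off */
    SKIP_OUTSIDE_TIMEPERIOD = 1 << 1, /* not inside the check period */
    SKIP_ALREADY_RUNNING    = 1 << 2, /* previous run still in flight */
    SKIP_MAX_PARALLEL       = 1 << 3, /* parallel check limit reached */
    SKIP_DEPENDENCY_FAILED  = 1 << 4  /* an execution dependency failed */
};

/* Stand-in for the real service structure (assumption). */
struct check_sketch {
    const char  *description;
    double       latency;
    unsigned int skip_flags; /* OR of enum skip_reason bits */
};

/* Record one more reason; earlier reasons are kept. */
void note_skip(struct check_sketch *c, unsigned int reason)
{
    c->skip_flags |= reason;
}

/* Reset the flags, e.g. once the check has actually run. */
void clear_skips(struct check_sketch *c)
{
    c->skip_flags = SKIP_NONE;
}

static const struct {
    unsigned int bit;
    const char  *name;
} skip_names[] = {
    { SKIP_CHECKS_DISABLED,    "checks_disabled"    },
    { SKIP_OUTSIDE_TIMEPERIOD, "outside_timeperiod" },
    { SKIP_ALREADY_RUNNING,    "already_running"    },
    { SKIP_MAX_PARALLEL,       "max_parallel"       },
    { SKIP_DEPENDENCY_FAILED,  "dependency_failed"  }
};

/* Write a comma-separated list of the set reasons into buf and return buf.
 * Output is cut short if buf is too small. */
const char *describe_skips(unsigned int flags, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (i = 0; i < sizeof skip_names / sizeof skip_names[0]; i++) {
        int n;

        if (!(flags & skip_names[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "," : "", skip_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* error or truncated: stop here */
        used += (size_t)n;
    }
    return buf;
}
```

The event loop would call something like note_skip() at each point where it decides not to run a check that has reached the top of the queue, and describe_skips() could then be used when logging checks whose latency is well above average. Using OR rather than assignment means a check that was passed over for several different reasons keeps all of them.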
In this way we could pare down the reasons for latencies pretty quickly, for example on the checks that have, say, 500 latency when the average is 100.
Obviously, if all checks are executing when they reach the top of the queue then this does nothing; however, if some are falling through the cracks, this could pretty quickly explain why.
If I do this, would there be any interest in a patch, or would this type of thing be too specific?
Let me know what you think.
Sincerely,
Steve
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]