Re: [Nagios-devel] Instrumenting Nagios

Post by Guest »


Sounds like what you want is a way to wait
until you know the program is in a bad way,
and only then turn on the profiling. Perhaps
there is some mechanism such as described here:
http://www.cs.utah.edu/flux/oskit/html/oskit-wwwch32.html
that might do the trick. (I'm not saying that
particular implementation is appropriate or
set up in the standard gprof package; I have
not looked in detail. All I'm saying is that
you might look for such a facility.)

If you can find such a mechanism, you might
need to send Nagios a custom command to get it
to enable or disable the profiling, so you
have control from the outside.

--- On Thu, 5/21/09, Steven D. Morrey wrote:

> From: Steven D. Morrey
> Subject: Re: [Nagios-devel] Instrumenting Nagios
> To: "Nagios Developers List"
> Date: Thursday, May 21, 2009, 6:48 AM
> gprof doesn't like Nagios.
> It generates a new profile data for each fork.
> I have 30,000 service checks on 3,000 hosts that run each hour.
> Even then it's ok for 30 minutes or an hour, but when you
> are trying to debug something that takes 2 or 3 days to
> show, it becomes nearly impossible to manage.
> oprofile buggered the entire system on my development boxes
> (SLES 9 on VMWare).
> Hence the need to instrument just the important parts.
> Unless you folks know of some switch or another I can pass
> in at compile time to get the profile data to be
> manageable.
>
> Thanks!
>
> Sincerely,
> Steve
>
> ________________________________________
> From: eponymous alias [[email protected]]
> Sent: Wednesday, May 20, 2009 7:50 PM
> To: Nagios Developers List
> Subject: Re: [Nagios-devel] Instrumenting Nagios
>
> To the extent that such delays may be partly
> due to general cost of computing, profiling the
> entire nagios binary would not be a bad idea.
> gprof is your friend.
>
> --- On Tue, 5/19/09, Steven D. Morrey wrote:
>
> > From: Steven D. Morrey
> > Subject: [Nagios-devel] Instrumenting Nagios
> > To: "[email protected]"
> > Date: Tuesday, May 19, 2009, 11:11 AM
> > Hi Everyone,
> >
> > We're trying to track down a high latency issue we're
> > having with our Nagios system and I'm hoping to get some
> > advice from folks.
> > Here's what’s going on.
> >
> > We have a system running Nagios 2.12 and DNX 0.19 (latest)
> > This setup is comprised of 1 main nagios server and 3 DNX
> > "worker nodes".
> >
> > We have 29000+ service checks across about 2500 hosts. Over
> > the last year we average about 250 or more services alarming
> > at any given time. We also have on average about 10 hosts
> > down at any given time.
> >
> > My original thought was that perhaps DNX was slowing down,
> > maybe a leak or something so I instrumented DNX, by timing
> > from the moment it's handed a job until it posts the results
> > into the circular results buffer.
> > This figure holds steady at 3.5s.
> >
> > I am pretty sure all checks are getting executed (at least,
> > all the ones that are enabled) eventually. Just more and
> > more slowly over time.
> > Clearly, some checks are being delayed by something or even
> > many things. What I'd like to do is to perhaps extend
> > nagiostats to gather information about why latency is at the
> > level it is, to see if we can't determine why Nagios is
> > waiting to run these checks.
> >
> > What should we be looking at, either in the event loop or
> > outside of it, to get a good overview of how what and why
> > nagios is doing what it's doing?
> >
> > We are thinking of adding counters to the different events
> > (both h

...[email truncated]...
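
For what it's worth, the two ideas in this thread (per-event counters, and switching the instrumentation on from outside only once the daemon is already in a bad way) can be sketched together. What follows is only an illustration in Python, not Nagios or DNX code: the EventTimer class, its method names, and the choice of SIGUSR1 as the outside trigger are all assumptions made up for this sketch. A real patch would be in C and would more likely hook into the external command file than a signal.

```python
import signal
import time
from collections import defaultdict


class EventTimer:
    """Hypothetical helper: per-event-type counters that can be toggled at runtime."""

    def __init__(self):
        self.enabled = False  # instrumentation starts switched off
        # event type -> [number of timed calls, total seconds spent]
        self.stats = defaultdict(lambda: [0, 0.0])

    def toggle(self, signum=None, frame=None):
        """Flip instrumentation on or off; the signature also fits a signal handler."""
        self.enabled = not self.enabled

    def run(self, event_type, handler, *args):
        """Run one event handler, timing it only while instrumentation is enabled."""
        if not self.enabled:
            return handler(*args)
        start = time.monotonic()
        try:
            return handler(*args)
        finally:
            entry = self.stats[event_type]
            entry[0] += 1
            entry[1] += time.monotonic() - start

    def install_signal(self, signum=None):
        """Assumption: use SIGUSR1 (POSIX only) as the outside control channel."""
        if signum is None:
            signum = signal.SIGUSR1
        signal.signal(signum, self.toggle)
```

Under those assumptions, the daemon would call install_signal() once at startup and wrap each event dispatch in run(); sending "kill -USR1 <pid>" two or three days in, when latency has climbed, starts the counters, and a second signal stops them. While disabled, the only overhead per event is one boolean test, which is the point of not profiling the whole run.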


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]