Re: [Nagios-devel] Instrumenting Nagios

Guest · Post by **Guest** » Thu May 21, 2009 3:32 pm

Sounds like what you want is a way to wait
until you know the program is in a bad way,
and only then turn on the profiling. Perhaps
there is some mechanism such as described here:
http://www.cs.utah.edu/flux/oskit/html/oskit-wwwch32.html
that might do the trick. (I'm not saying that
particular implementation is appropriate or
set up in the standard gprof package; I have
not looked in detail. All I'm saying is that
you might look for such a facility.)

If you can find such a mechanism, you might
need to send Nagios a custom command to get it
to enable or disable the profiling, so you
have control from the outside.

--- On Thu, 5/21/09, Steven D. Morrey wrote:

> From: Steven D. Morrey
> Subject: Re: [Nagios-devel] Instrumenting Nagios
> To: "Nagios Developers List"
> Date: Thursday, May 21, 2009, 6:48 AM
>
> gprof doesn't like Nagios.
> It generates a new profile data for each fork.
> I have 30,000 service checks on 3,000 hosts that run each hour.
> Even then it's ok for 30 minutes or an hour, but when you
> are trying to debug something that takes 2 or 3 days to
> show, it becomes nearly impossible to manage.
> oprofile buggered the entire system on my development boxes
> (SLES 9 on VMWare).
> Hence the need to instrument just the important parts.
> Unless you folks know of some switch or another I can pass
> in at compile time to get the profile data to be manageable.
>
> Thanks!
>
> Sincerely,
> Steve
>
> ________________________________________
> From: eponymous alias [[email protected]]
> Sent: Wednesday, May 20, 2009 7:50 PM
> To: Nagios Developers List
> Subject: Re: [Nagios-devel] Instrumenting Nagios
>
> To the extent that such delays may be partly
> due to general cost of computing, profiling the
> entire nagios binary would not be a bad idea.
> gprof is your friend.
>
> --- On Tue, 5/19/09, Steven D. Morrey wrote:
>
> > From: Steven D. Morrey
> > Subject: [Nagios-devel] Instrumenting Nagios
> > To: "[email protected]"
> > Date: Tuesday, May 19, 2009, 11:11 AM
> >
> > Hi Everyone,
> >
> > We're trying to track down a high latency issue we're
> > having with our Nagios system and I'm hoping to get some
> > advice from folks.
> > Here's what's going on.
> >
> > We have a system running Nagios 2.12 and DNX 0.19 (latest)
> > This setup is comprised of 1 main nagios server and 3 DNX
> > "worker nodes".
> >
> > We have 29000+ service checks across about 2500 hosts. Over
> > the last year we average about 250 or more services alarming
> > at any given time. We also have on average about 10 hosts
> > down at any given time.
> >
> > My original thought was that perhaps DNX was slowing down,
> > maybe a leak or something so I instrumented DNX, by timing
> > from the moment it's handed a job until it posts the results
> > into the circular results buffer.
> > This figure holds steady at 3.5s.
> >
> > I am pretty sure all checks are getting executed (at least,
> > all the ones that are enabled) eventually. Just more and
> > more slowly over time.
> > Clearly, some checks are being delayed by something or even
> > many things. What I'd like to do is to perhaps extend
> > nagiostats to gather information about why latency is at the
> > level it is, to see if we can't determine why Nagios is
> > waiting to run these checks.
> >
> > What should we be looking at, either in the event loop or
> > outside of it, to get a good overview of how what and why
> > nagios is doing what it's doing?
> >
> > We are thinking of adding counters to the different events
> > (both h

...[email truncated]...
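
For anyone reading this thread later: the idea above (leave the instrumentation off until the daemon is misbehaving, then switch it on from outside) could look roughly like the sketch below. This is only an illustration under assumptions, not Nagios or gprof code. Every name in it (prof_enabled, prof_section, prof_token, prof_begin, prof_end, prof_install_toggle) is made up for the example, and SIGUSR2 is an arbitrary choice of switch; check that it does not collide with signals the daemon already handles, or drive the flag from a custom external command instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; none of this is Nagios or gprof API. */

/* On/off switch, flipped from outside the process (here: by a signal). */
static volatile sig_atomic_t prof_enabled = 0;

/* Accumulated timings for one instrumented section of code. */
struct prof_section {
    const char *name;
    uint64_t calls;     /* number of timed passes */
    uint64_t total_ns;  /* sum of elapsed time */
    uint64_t max_ns;    /* slowest single pass */
};

/* Handed from prof_begin() to prof_end(); inactive when profiling is off. */
struct prof_token {
    int active;
    uint64_t start_ns;
};

/* Signal handler: only touches a sig_atomic_t, so it is async-signal-safe. */
static void prof_toggle_handler(int signo)
{
    (void)signo;
    prof_enabled = !prof_enabled;
}

/* Install SIGUSR2 as the toggle (assumed free). Returns 0 on success. */
int prof_install_toggle(void)
{
    struct sigaction sa;
    sa.sa_handler = prof_toggle_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Start timing a section; nearly free when profiling is switched off. */
struct prof_token prof_begin(void)
{
    struct prof_token t = { 0, 0 };
    if (prof_enabled) {
        t.active = 1;
        t.start_ns = now_ns();
    }
    return t;
}

/* Stop timing and fold the result into the section's counters. */
void prof_end(struct prof_section *s, struct prof_token t)
{
    assert(s != NULL);
    if (!t.active)
        return;
    uint64_t elapsed = now_ns() - t.start_ns;
    s->calls++;
    s->total_ns += elapsed;
    if (elapsed > s->max_ns)
        s->max_ns = elapsed;
}
```

The intended use would be to wrap only the parts of interest (say, one event type in the event loop) between prof_begin() and prof_end(), call prof_install_toggle() once at startup, and then send the process SIGUSR2 with kill when latency starts climbing. The counters live in ordinary per-process memory, so a forked child only updates its own copy and no per-fork profile files pile up; dumping the counters somewhere readable is left out of the sketch.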

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]