RE: [Nagios-devel] Help! I have tons of orphans!
Posted: Tue Mar 11, 2003 12:01 pm
I followed your directions below, and I do seem to be accumulating
orphans, or at least processes that aren't doing anything. After ten
minutes, my numbers stay pretty much pegged at 196 processes. It would
dip to 7 processes, then 9, then 61, then 91 (while staying in the
hundreds the rest of the time, except at the beginning). Looking at the
start time with ps axfu, I see processes as old as 17 minutes, with the
rest somewhere in the range from then to now. Also right after that I saw
all those older processes go away, and nagios wrote them off in the log as
orphans.
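
For anyone who wants to check process ages without reading through ps axfu by eye, here is a minimal sketch. The helper name older_than is made up for illustration and is not part of Nagios, and the live command assumes a procps-ng ps that supports the etimes column (elapsed seconds); older or non-Linux ps versions may not have it.

```shell
#!/bin/sh
# Sketch only: older_than is a hypothetical helper, not part of Nagios.
# It reads "PID ELAPSED_SECONDS COMMAND" lines on stdin and prints the PID
# and age of every process named $2 that is at least $1 seconds old.
older_than() {
    awk -v min="$1" -v name="$2" \
        '$3 == name && ($2 + 0) >= (min + 0) { print $1, $2 }'
}

# Live usage (assumes a procps-ng "ps" with the etimes column):
#   ps -e -o pid= -o etimes= -o comm= | older_than 600 nagios
```

With a threshold of 600, only processes that have been around for ten minutes or more are printed, which should separate short-lived check children from the ones that linger until they are written off.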

I also believe that this is an actual performance problem, as the "last
check" timestamp for my services looks similar to the age of the processes
themselves. Something else interesting is that after the older processes
went away, the timestamps were updated pretty soon afterwards. That makes
it look more like nagios isn't able to process their results during
certain periods than like my box isn't fast enough.

If this does turn out to be related to speed, I would be kinda surprised.
I was running netsaint on a sparc II 400 (which typically stays at 5-7
load average), and the "last check" number stays within a couple of
minutes.
I don't know if this will shed light, but enclosed is my nagios.cfg.
Thanks,
Geoff
On Tuesday 11 March 2003 11:38, Carroll, Jim P [Contractor] wrote:
> Are you sure that the processes are being orphaned?
>
> You do realize that the main Nagios process spawns quite a few children,
> depending on the number of checks being done, right?
>
> Try this:
>
> Stop Nagios. Completely. Make sure it's a graceful shutdown. Check
> for any remaining nagios processes. Kill them without mercy.
>
> Start Nagios. Run this script:
>
> while :
> do
>     ps -e | grep '[n]agios' | wc -l
>     sleep 1
> done
>
> (This matches on the command name shown by ps -e, so it assumes the
> Nagios binary is named 'nagios'.)
>
> Now, watch the count rise and fall. The count might start to climb, but
> it should always eventually return to 1, or at least close to 1 (taking
> into account race conditions).
>
> If, on the other hand, you've been watching it for 10 minutes and the
> count just seems to continue to climb, yes, you might be accumulating
> orphans. I suspect that's not the case, however. Yours is a concern
> many of us have experienced with Nagios at one time or another. It
> shouldn't be anything to fret about, unless you're genuinely
> experiencing performance problems.
>
> Let us know how that works out.
>
> jc
>
> > -----Original Message-----
> > From: Geoff Lovett [mailto:[email protected]]
> > Sent: Tuesday, March 11, 2003 11:24 AM
> > To: Jeremy T. Bouse
> > Cc: [email protected]
> > Subject: Re: [Nagios-devel] Help! I have tons of orphans!
> >
> >
> > I just turned it on. I should know in about an hour if it
> > works or not.
> >
> > Thanks,
> > Geoff
> >
> > On Tuesday 11 March 2003 11:12, Jeremy T. Bouse wrote:
> > > Have you tried turning on the obsess over services option to
> > > see if the problem persists? I've found some better performance with
> > > heavy testing with this option enabled...
> > >
> > >     Jeremy
> > >
> > > On Tue, Mar 11, 2003 at 10:40:34AM -0600, Geoff Lovett wrote:
> > > > > I am currently upgrading from Netsaint 0.0.7 to Nagios 1.0, and I
> > > > > am finding that after Nagios runs for an hour or so, lots of
> > > > > services start getting orphaned. The resources on this box aren't
> > > > > exhausted at all (that I can tell). I tried lowering the
> > > > > max_concurrent_checks from 306 to 200, which helped for a little
> > > > > while. Now it's exhibiting the same behaviour.
> > > >
> > > > > The specs for the box are Debian Linux, kernel 2.4.20, PIII 650,
> > > > > 250M. I don't go into swap and the load generally stays between
> > > > > 1 and 5. I am monitoring 620 services on 95 boxes.
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]