Re: [Nagios-devel] Unexplained nagios crashes
Posted: Tue Aug 28, 2007 1:13 pm
Duncan Ferguson wrote:
> On 27 Aug 2007, at 12:19, Andreas Ericsson wrote:
>
>> It's not much to go on, but if the crash happens again, do check
>> if it's the same chain of events for the same host that triggers it.
>
> We do have another core file that appears to have the same issue;
> I'll go through it again and see what I can find out.
>
It's possible that a sufficiently long delay f*cks up the threading
somehow, so even if it's not the same host, a similar-looking chain
of events is still interesting.
You'll have to 'up' through the various stack frames to be able to print
the function parameters passed to check_host_reachability() (or whatever
it's called. I'm much too tired to go check the code right now).
>
>> You could try upgrading to the very latest nagios-2-x-bugfixes
>> off of cvs. It has quite a few bugfixes. Looking at commit-messages,
>> I can't really say if this particular bug has been fixed though.
>
> We are upgrading from 2.8 to 2.9 tonight, but as you have said there
> is nothing in any of the recent commit messages that looks even
> vaguely related.
>
True that, but writing crystal clear commit messages isn't Ethan's
primary strength.
Anyways, some git fiddling shows this diffstat between 2.8 and 2.9:
 base/checks.c     | 13 ++++++++++-
 base/nagios.c     | 15 ++++---------
 base/nagiostats.c |  4 +-
 base/utils.c      | 30 ++++++++++++++++++++++++++-
Some of those changes are related to signal-handling, which has been
known to screw threaded programs over pretty thoroughly in the past,
especially those that fork() post-thread-creation.
Let's keep the hopes high, shall we?
In the meantime, you'll have to examine whatever coredumps you've
managed to salvage.
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]