[Nagios-devel] Continuing issues with retention file causing schedule/actions to
Posted: Wed Mar 08, 2006 12:13 pm
I'm continuing to have problems when the retention.dat file gets into a
state where the nagios process stops functioning properly. The problems
I've had in the past were increasing numbers of hosts or entire
hostgroups no longer executing their service checks, and now (today)
the event handler for one particular service has stopped being executed
(while all others continue to work).
In this and all previous cases, stopping nagios and moving the retention
file out of the way resolves the issue. Reloading or a hard stop/start
of nagios doesn't have any effect. There has never appeared to be
anything "wrong" with the retention file.
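For anyone wanting to reproduce the workaround while keeping the suspect file for later inspection, here is a minimal sketch in Python. The function name, the backup suffix and the example path are illustrative assumptions, not part of Nagios; it also assumes the nagios process has already been stopped by whatever means your installation uses.

```python
import shutil
import time
from pathlib import Path


def move_retention_aside(retention_path, suffix=None):
    """Rename the retention file so nagios starts without retained state.

    Hypothetical helper: returns the new path, or None if there was
    nothing to move. The caller must stop nagios before calling this.
    """
    src = Path(retention_path)
    if not src.exists():
        return None
    if suffix is None:
        # Timestamp the copy so repeated incidents do not overwrite each other.
        suffix = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{suffix}.bak")
    shutil.move(str(src), str(dst))
    return dst
```

Calling `move_retention_aside("/path/to/retention.dat")` (the path here is a placeholder; use the location set in your own main config) leaves a timestamped copy next to the original location. Keeping the file rather than deleting it means it can still be compared against a fresh one or passed along for debugging.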
The only problems with my installation are this one, the
all-too-frequent "premature end of script headers" errors in all the CGIs, and
the message "Warning: Size of service_message struct (528 bytes) is >
POSIX-guaranteed atomic write size (512 bytes)." due to compiling
for x86_64. That being said, these are serious enough: there are dozens of
daily "premature script header/Internal Server Error" failures wreaking havoc
with production, and the event handler failures are
extremely critical. The script header problem came into being
immediately upon upgrading from 2.0b6 to 2.0rc2+, and the
scheduling/retention problem has been present to varying degrees in
every 2.0b+ I've tried.
I would be happy to find that these are configuration/optimization issues on
my end that I can resolve, but my suspicion is that they are bugs. I will do
anything I can to help provide a debug testbed for identifying and tracking them
down. Attached is my main nagios config (objects are not included), and
I can provide any other data (object configs, logs, retention.dat, etc.)
privately if needed (security concerns).
Please let me know what I can do to help address this and find a resolution.
Regards,
/eli
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]