Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Mon Aug 14, 2023 5:48 am
by Butters
Hello guys, I'm looking for help with a problem where the status.dat and retention.dat files are not re-created after a Nagios service restart.

- running the pre-flight check (-v against nagios.cfg) reports everything OK, with no errors or warnings
- I turned on debug mode, and there are no warnings or errors in the nagios.debug log file
- Nagios Core was running absolutely fine, but after a restart I started seeing this issue
- permissions are OK on the path and on the files: I can touch the files as the nagios user
- I tried re-installing Nagios Core, both the same version and a newer one (cleaning up the Nagios path first), still no luck
- restarting the nagios service returns no error; the service restarts, but it does not re-create status.dat and retention.dat
- the service restart itself takes noticeably longer than it did before
- the paths for status.dat and retention.dat are configured correctly in nagios.cfg
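
Some of the checks in the list above can be scripted so they are easy to repeat after each restart. The following is only a minimal sketch: the helper names `parse_nagios_cfg` and `report_data_files` are hypothetical, and it assumes the main config uses plain `key=value` lines with `#` comments. It reads the `status_file` and `state_retention_file` directives and reports whether each file exists and whether its directory is writable by the user running the script.

```python
import os
from pathlib import Path


def parse_nagios_cfg(text):
    """Return {directive: value} for simple key=value lines (last one wins)."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives[key.strip()] = value.strip()
    return directives


def report_data_files(directives):
    """Describe status_file and state_retention_file as seen by the current user."""
    report = {}
    for name in ("status_file", "state_retention_file"):
        value = directives.get(name)
        if value is None:
            report[name] = "directive not set"
            continue
        path = Path(value)
        report[name] = {
            "path": str(path),
            "exists": path.exists(),
            # Creating a file needs write and search permission on its directory.
            "parent_writable": os.access(path.parent, os.W_OK | os.X_OK),
        }
    return report
```

To be meaningful, this would need to run as the nagios user (for example through `sudo -u nagios`), passing in the contents of your own nagios.cfg; `/usr/local/nagios/etc/nagios.cfg` is a common source-install location, but that path is an assumption and is not stated in this thread.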

There are no permission issues, and the pre-flight check does not return any error or warning. Of course, the GUI returns "Error: Could not read host and service status information!" when these two .dat files, which the CGI/PHP reads, are missing.

Thank you for any input.

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Mon Aug 14, 2023 10:08 am
by swolf
Hi @Butters, thanks for reaching out. Let's start by just looking at the retention.dat issue.

Since you have debugging turned on, can you post the part of the log starting at

xrddefault_save_state_information()
?

Also, is there anything in nagios.log referencing "retention"?

Lastly, I would make sure there's a temp_file directive in your nagios.cfg, and I'd verify whether there's a file with a name like nagios.tmpXXXXXX in that directory, and what the permissions are on that file. I could maybe see a situation where that temporary file gets incorrect permissions and then we fail to write to that file, stopping the retention/status data from being written altogether.
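
The temp-file check described above could be sketched roughly as follows. This is an illustration only: `find_temp_files` is a hypothetical helper, and it assumes the temporary files are named as the configured `temp_file` value plus a random suffix (the nagios.tmpXXXXXX pattern mentioned above). It lists each matching file with its permission string and owner.

```python
import glob
import os
import pwd
import stat


def find_temp_files(temp_file_value):
    """List (path, mode string, owner) for files named temp_file + any suffix."""
    results = []
    # glob.escape protects any special characters in the configured path.
    for path in sorted(glob.glob(glob.escape(temp_file_value) + "*")):
        info = os.stat(path)
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            # The UID has no passwd entry; fall back to the number.
            owner = str(info.st_uid)
        results.append((path, stat.filemode(info.st_mode), owner))
    return results
```

Pass it the value of the `temp_file` directive from your own nagios.cfg. A leftover file that is not owned by the nagios user, or not writable by it, would be worth mentioning in your reply.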

Let me know if that gets you anywhere interesting, and feel free to post any other comments/questions/concerns.

-Sebastian Wolf

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Mon Aug 14, 2023 12:01 pm
by Butters
I've identified that the issue is related to bug 861:

https://github.com/NagiosEnterprises/na ... issues/861

I don't want to say it, but it seems this bug fix may have been missed in the 4.4.10 release?

strace shows:

kill(15009, 0) = -1 ESRCH (No such process)
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f29b75d2a10) = 15318
close(3) = 0
close(3) = -1 EBADF (Bad file descriptor)

When I add the extra line

check_for_updates=0

to nagios.cfg, the nagios process is able to generate the status.dat and retention.dat files. Once again, I'm running 4.4.10, which should include the previous bug fixes.
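
For anyone who needs to apply the same workaround on several hosts, the config change can be sketched as a small text transformation. This is a hedged illustration, not an official tool: `set_directive` is a hypothetical helper that replaces an existing uncommented `name=value` line or appends one if none is present, and leaves commented-out lines alone.

```python
def set_directive(cfg_text, name, value):
    """Return cfg_text with name=value set, replacing an existing line or appending."""
    lines = cfg_text.splitlines()
    new_line = f"{name}={value}"
    replaced = False
    for i, raw in enumerate(lines):
        stripped = raw.strip()
        # Leave comments and non-directive lines untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == name:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Calling `set_directive(text, "check_for_updates", "0")` on the contents of nagios.cfg returns the modified text. Writing it back to disk and restarting the service are left to the reader, ideally after taking a backup of the original file.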

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Tue Aug 15, 2023 11:11 am
by swolf
That's interesting. The original bug there is that Core actually SIGSEGV's when trying to check for updates. That issue is definitely fixed.

If check_for_updates=0 fixes your problem, then there's certainly some follow-up work I'll need to do. Is there any chance I could have you DM me the full strace output?

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Wed Aug 16, 2023 12:32 am
by Butters
Hello Sebastian, shared the full strace over PM.

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Wed Aug 16, 2023 10:36 am
by swolf
Hi @Butters, acknowledging receipt of your strace log - thanks! I've also created an issue at https://github.com/NagiosEnterprises/na ... issues/927 to help track this.

Re: Nagios core 4.4.10 status.dat and retention.dat missing

Posted: Thu Apr 18, 2024 8:46 pm
by florencepugh
I recommend using tools like `strace` or `lsof` to trace system calls or check for open files respectively during Nagios startup.