Segfault during startup on Nagios Core - Related to "check_for_nagios_updates" call.
Posted: Fri Oct 24, 2025 4:27 pm
Today several of our Nagios daemons segfaulted without warning (all running RHEL 8.10 and Nagios Core 4.4.13). We restarted them immediately, but they all kept segfaulting. In /var/log/messages we found the following:
Oct 24 10:45:58 tucnag02 kernel: nagios[1345618]: segfault at 0 ip 0000000000444cbf sp 00007fffffffe550 error 4 in nagios[400000+aa000]
Oct 24 10:45:58 tucnag02 systemd-coredump[1345671]: Process 1345618 (nagios) of user 100 dumped core.#012#012Stack trace of thread 1345618:#012#0 0x0000000000444cbf query_update_api (nagios)#012#1 0x0000000000444e0d check_for_nagios_updates (nagios)#012#2 0x0000000000413df0 main (nagios)#012#3 0x00007ffff6d2a7e5 __libc_start_main (libc.so.6)#012#4 0x00000000004146be _start (nagios)
Following the stack trace in the systemd-coredump entry above, I traced the crash to the update check in the source code. It looks like Nagios checks for updates at startup and again periodically at run time, and the segfault seemed to come from this check. As a workaround, I set the following value in both the retention.dat and status.dat files:
last_update_check=<current epoch time>
Once that value was updated, the daemons started just fine, so I created a script and a cron job to periodically reset it to the current time. Is there a way to disable the update check via the config files? I don't understand why all of the instances failed today, since they had been running without issue for years. I also realize that we're fairly behind on the Nagios Core version, so I apologize if this has already been fixed in a newer release. Any assistance is much appreciated.
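For anyone who wants to try the same workaround, here is a minimal Python sketch of the kind of script described above. It is illustrative only and not the exact script in use: the function names (bump_last_update_check, bump_file) are made up for this example, and it assumes the value appears on its own line as last_update_check=<digits>, possibly indented.

```python
import re
import time

# Matches a line such as "last_update_check=1700000000", optionally indented.
# Assumption: the value is a plain integer epoch on its own line.
_PATTERN = re.compile(r"^([ \t]*last_update_check=)\d+[ \t]*$", re.MULTILINE)


def bump_last_update_check(text, now=None):
    """Return `text` with every last_update_check value set to `now`.

    `now` is epoch seconds; it defaults to the current time.
    """
    stamp = int(time.time()) if now is None else int(now)
    # A function replacement avoids ambiguity between "\1" and the digits.
    return _PATTERN.sub(lambda m: f"{m.group(1)}{stamp}", text)


def bump_file(path, now=None):
    """Rewrite `path` in place; return True if anything changed."""
    with open(path, "r", encoding="utf-8", errors="surrogateescape") as fh:
        original = fh.read()
    updated = bump_last_update_check(original, now)
    if updated == original:
        return False
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as fh:
        fh.write(updated)
    return True
```

Calling bump_file() on the retention.dat and status.dat paths for your install (the locations vary, so check the state_retention_file and status_file settings in nagios.cfg) reproduces the manual edit. Since the daemon may rewrite these files while it is running, it is probably safest to apply the change with Nagios stopped.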