Nagios segfault shortly after startup

Posted: Tue Jan 03, 2017 10:18 pm
by rlacasse
My Nagios system has been running fine for years using the downloaded VMware image. Yesterday, I upgraded to 5.4.0. Today, out of the blue, services stopped reporting. Upon investigation, the system is reporting that the nagios service isn't running. Restarting it from the Nagios XI interface has no effect.

I logged in to the backend and ran the manual startup command "/usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg". It starts up normally, runs for a few seconds, and then segfaults.

I ran the same command again under strace, and after many pages of output I got this at the end:

write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 519) = 519
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 522) = 522
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 553) = 553
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 500) = 500
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 488) = 488
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 533) = 533
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 535) = 535
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 528) = 528
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 520) = 520
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 520) = 520
write(20, "\n402:\n4=1483498445.664276\n174=vm"..., 508) = 508
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 481) = 481
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 483) = 483
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 508) = 508
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 511) = 511
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 477) = 477
write(20, "\n402:\n4=1483498445.664276\n174=ww"..., 482) = 482
write(20, "\n403:\n4=1483498445.664276\n220=AD"..., 109) = 109
write(20, "\n403:\n4=1483498445.664276\n220=AS"..., 281) = 281
write(20, "\n403:\n4=1483498445.664276\n220=Ba"..., 781) = 781
write(20, "\n403:\n4=1483498445.664276\n220=DR"..., 31828) = 31828
write(20, "\n403:\n4=1483498445.664276\n220=Me"..., 2738) = 2738
write(20, "\n403:\n4=1483498445.664276\n220=NC"..., 597) = 597
write(20, "\n403:\n4=1483498445.664276\n220=NF"..., 295) = 295
write(20, "\n403:\n4=1483498445.664276\n220=PB"..., 2023) = 2023
write(20, "\n403:\n4=1483498445.664276\n220=Po"..., 154) = 154
write(20, "\n403:\n4=1483498445.664276\n220=iC"..., 834) = 834
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
Please help!
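
For anyone reading a long strace log like the one above, a small script can condense the tail into "which calls, on which file descriptor, right before the crash". This is only a rough sketch, assuming the strace output was saved to a file with one call per line in the format shown in this post; the function name summarize_strace_tail and the file argument are made up for illustration and are not part of Nagios or strace.

```python
import re
import sys
from collections import Counter

# Matches strace lines such as: write(20, "\n402:..."..., 519) = 519
SYSCALL_RE = re.compile(r'^(?P<name>\w+)\((?P<fd>\d+),.*\)\s*=\s*(?P<ret>-?\d+)')
# Matches signal lines such as: --- SIGSEGV (Segmentation fault) @ 0 (0) ---
SIGNAL_RE = re.compile(r'^--- (?P<sig>SIG\w+)')


def summarize_strace_tail(lines, signal_name="SIGSEGV"):
    """Count syscalls per (name, fd) until the given signal line is reached.

    Returns (counts, last_call_line, crashed), where crashed is True only
    if a line reporting signal_name was found.
    """
    calls = Counter()
    last_call = None
    crashed = False
    for raw in lines:
        line = raw.strip()
        sig = SIGNAL_RE.match(line)
        if sig and sig.group("sig") == signal_name:
            crashed = True
            break
        call = SYSCALL_RE.match(line)
        if call:
            calls[(call.group("name"), int(call.group("fd")))] += 1
            last_call = line
    return calls, last_call, crashed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize.py saved_strace_output.txt
    with open(sys.argv[1], errors="replace") as handle:
        counts, last, crashed = summarize_strace_tail(handle)
    for (name, fd), n in counts.most_common(5):
        print(f"{name} on fd {fd}: {n} calls")
    print("last call before signal:", last)
    print("SIGSEGV seen:", crashed)
```

On the excerpt above this would show that every call in the tail is a write to file descriptor 20. It does not say what that descriptor is; the earlier part of the strace (the open or socket call that returned 20) would be needed for that.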

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 10:29 am
by rlacasse
The strace I posted is cryptic, but what I'm seeing outside of the strace is warnings about services, such as duplicates or missing notification emails, then host warnings, then the segfault. I'm not sure what the nagios process is trying to accomplish at this stage, so I don't know how to troubleshoot the issue further.

Any assistance is appreciated.

Thank you

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 11:10 am
by dwhitfield
If you look through your objects.cache, or any of your .cfg files, do you see duplicates?
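
If it helps, here is a rough sketch of one way to scan for duplicates automatically. It assumes the file uses "define <type> { ... }" blocks with one "key value" pair per line, as objects.cache normally does, and it treats a host as a duplicate when its host_name repeats and a service as a duplicate when its host_name plus service_description repeats. The function names and the command-line usage are made up for illustration. Run against raw .cfg files it will miss anything defined through templates or hostgroups, so treat the result as a hint only.

```python
import sys
from collections import Counter


def parse_objects(lines):
    """Yield (object_type, attributes) for each 'define <type> { ... }' block."""
    obj_type, attrs = None, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("define ") and line.endswith("{"):
            # Handles both "define service {" and "define service{"
            obj_type = line[len("define "):-1].strip()
            attrs = {}
        elif line == "}" and obj_type is not None:
            yield obj_type, attrs
            obj_type, attrs = None, None
        elif obj_type is not None:
            # Split on the first run of whitespace; values may contain spaces
            parts = line.split(None, 1)
            attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""


def find_duplicates(lines):
    """Return sorted keys of hosts and services that are defined more than once."""
    seen = Counter()
    for obj_type, attrs in parse_objects(lines):
        if obj_type == "host":
            seen[("host", attrs.get("host_name", ""))] += 1
        elif obj_type == "service":
            key = ("service", attrs.get("host_name", ""),
                   attrs.get("service_description", ""))
            seen[key] += 1
    return sorted(key for key, count in seen.items() if count > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_dupes.py /path/to/objects.cache
    with open(sys.argv[1], errors="replace") as handle:
        for key in find_duplicates(handle):
            print("duplicate:", " / ".join(key))
```

Intentional duplicates will show up in the output too, so the list still needs to be compared against what you expect to see.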

We can help look for duplicates, but to do that you'll need to PM me your Profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button towards the top. If for whatever reason you *cannot* download the profile, please put the output of View System Info (5.3.4+, Show Profile if older) in the thread (that will at least get us some info).

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.

UPDATE: Profile shared with techs.

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 1:40 pm
by rlacasse
I've attached the results of the System Profile.

Going through the configuration and the objects.cache, the only duplicates I found are expected and were present prior to the 5.4.0 upgrade.

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 2:16 pm
by dwhitfield
rlacasse wrote: what I'm seeing outside of the strace is the warnings about services, such as duplicates or missing notification emails and such, then host warnings, then segfault.
If you are seeing these in the GUI, could you post screenshots?

I'm noticing several files in your profile that have not been recently upgraded. Could you post your upgrade.log? Thanks!

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 4:20 pm
by rlacasse
I'm not seeing the warnings in the GUI, only when I look at the logs related to writing the configuration. As I've said, I know I have duplicates and they're on purpose.

Attached is the upgrade log.

Thank you,

Re: Nagios segfault shortly after startup

Posted: Wed Jan 04, 2017 4:44 pm
by dwhitfield
Your upgrade.log makes it look like you were upgrading from 5.1.8, but I've seen that in a couple of logs recently, which makes me think that might not be accurate. From what version were you upgrading? Could you roll back and then try upgrading to 5.3.4 (https://assets.nagios.com/downloads/nag ... 3.4.tar.gz)? Then you could upgrade to 5.4.0. I'm just wondering if we missed a bug in upgrading from earlier versions.

Re: Nagios segfault shortly after startup

Posted: Fri Jan 06, 2017 12:24 pm
by rlacasse
I've performed the recommended steps: reverted to 5.2.7, manually upgraded to 5.3.4, then used the GUI to upgrade to 5.4.

The upgrade was successful, as expected.

I've manually restarted the nagios service post-upgrade and didn't have any issues. I'll keep an eye on it for a few days to see if the problem recurs. Last time it took a little over 24 hours before there was any issue.

Thank you for your assistance.

Re: Nagios segfault shortly after startup

Posted: Fri Jan 06, 2017 12:27 pm
by dwhitfield
We'll be closed in 24 hours for the weekend, but definitely on Monday we can resume things if you run into issues.

Glad things look like they are working so far!

Re: Nagios segfault shortly after startup

Posted: Mon Jan 09, 2017 9:41 am
by rlacasse
No issues over the weekend, so that appears to have resolved the issue.

Many thanks!