Segfault Troubleshooting Tips

derekbrewer
Posts: 8
Joined: Thu Mar 15, 2012 3:18 pm

Segfault Troubleshooting Tips

Post by derekbrewer »

Over the years I've experienced some random segfaults with Nagios Core and I've struggled to figure out the best way to troubleshoot. Recently, I've had two incidents within 3 days on two separate servers, but in the past it was once every 6 months or so on just one of my servers.

What I'm really looking for is suggestions on how to debug or do a post mortem. My servers are running Nagios Core 4.3.4 on RHEL6.4. What I've done so far is set "daemon_dumps_core=1" in the nagios.cfg file, but it hasn't generated a core file yet. I have plans to update to RHEL7.5 and the latest Nagios Core, but I'm really looking for any other tips on how to track down what might be causing the issue. The other thing I've toyed around with is setting the "debug_level=-1" and "debug_verbosity=2", but I'm a little bit concerned with the size of the log file being generated. Plus, since it has only happened in the past every six months or so, that's potentially a long time to run debug mode. I guess maybe a better log rotation scheme is in the cards for me.

I'll keep digging at this and continue with my upgrade plans, but if anyone has other ideas on things to check, I'm all ears. Thank you for your time.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Segfault Troubleshooting Tips

Post by npolovenko »

Hello, @derekbrewer. Can you tell us a little bit more about your architecture? How many hosts and services are you monitoring? Are you using mod gearman?
Do you recall any changes made to the server in recent days that might've indirectly increased the number of segfaults?
Would you consider upgrading Core to the latest version 4.4.2? There were quite a few bug fixes and there is a good chance that your problem has been already resolved.
But to answer your question, I'd start the troubleshooting process by checking the nagios debug output and the syslog. The nagios debug log size can be adjusted in the nagios.cfg file:
max_debug_file_size=1000000
So you don't have to worry about it taking up too much space.
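As a rough illustration of the sizing point above, the sketch below parses nagios.cfg-style text and estimates the worst-case disk use of the debug log. It assumes, based on my reading of the Nagios Core documentation, that a debug file reaching max_debug_file_size is renamed with a .old extension and any earlier .old file is deleted, so roughly two files of that size can exist at once. The function names and the sample text are hypothetical; the directive values are the ones mentioned in this thread.

```python
def parse_nagios_cfg(text):
    """Return a dict of key=value directives, skipping comments and blanks.

    Repeated directives (e.g. cfg_file) are overwritten by the last value,
    which is fine for the single-valued settings inspected here.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def _to_int(settings, key):
    """Convert a directive to int, or return None if it is not set."""
    value = settings.get(key)
    return int(value) if value is not None else None


def debug_summary(settings):
    """Summarise the debug-related directives discussed in this thread."""
    size = _to_int(settings, "max_debug_file_size")
    return {
        "daemon_dumps_core": _to_int(settings, "daemon_dumps_core"),
        "debug_level": _to_int(settings, "debug_level"),
        "debug_verbosity": _to_int(settings, "debug_verbosity"),
        "max_debug_file_size": size,
        # Assumption: current debug file plus one rotated ".old" copy.
        "worst_case_bytes": 2 * size if size is not None else None,
    }


# Hypothetical sample using the values quoted in this thread.
SAMPLE = """\
# debug settings under discussion
daemon_dumps_core=1
debug_level=-1
debug_verbosity=2
max_debug_file_size=1000000
"""

if __name__ == "__main__":
    for name, value in debug_summary(parse_nagios_cfg(SAMPLE)).items():
        print(f"{name} = {value}")
```

To check a real server, read the actual nagios.cfg and pass its contents to parse_nagios_cfg in place of SAMPLE.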
derekbrewer
Posts: 8
Joined: Thu Mar 15, 2012 3:18 pm

Re: Segfault Troubleshooting Tips

Post by derekbrewer »

npolovenko, thanks for responding. We have various servers running at our sites, but the two most recent crashes happened on servers with 656/12161 hosts/services and 1591/29822 hosts/services respectively. We are not using mod gearman; however, we are using the livestatus portion of check_mk.

I don't recall any significant changes leading up to the crashes. However, there is a lot of automation, and other admins make normal day-to-day changes on a regular basis.

Yes, I have plans to upgrade to the latest version, but we're currently having issues getting it to work correctly with the latest version of check_mk, so that won't happen until early '19.

Thanks for the tip on the max debug file size. I played around with it for a while and I think I'm going to configure that for my servers so I can look through it if/when this happens again.

Any thoughts on how to get a core dump file if/when crashes do occur? My biggest concern right now is having everything I possibly can have in place, so I have the best chance of capturing the right information when this happens again.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Segfault Troubleshooting Tips

Post by tgriep »

I found a few notes on the daemon_dumps_core option.

Note: core dumps (if any) will go in $HOME if set, or the directory containing the nagios.log file, or "/".

Check all of those folders to see if the core dump file is there.

Also, depending on what caused the segfault, the system may not be able to create a dump file, so checking the debug log may be the only option.
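The directory check described above could be scripted along these lines. This is only a sketch: the helper names are hypothetical, the match on file names beginning with "core" is a loose guess at how the kernel names dump files, and the home directory and log path in the example are assumed values that need to be replaced with the real ones for the nagios user. On Linux, /proc/sys/kernel/core_pattern can redirect dumps elsewhere (for example to a crash-handling helper), so the script prints it as well.

```python
import os


def candidate_core_dirs(home, log_file):
    """List the places named in the note: $HOME, the log directory, then "/"."""
    dirs = []
    if home:
        dirs.append(home)
    dirs.append(os.path.dirname(log_file) or ".")
    dirs.append("/")
    unique = []
    for d in dirs:  # drop duplicates while keeping the search order
        if d not in unique:
            unique.append(d)
    return unique


def find_core_files(dirs):
    """Return regular files whose names start with "core" (loose match)."""
    found = []
    for d in dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue  # missing or unreadable directory
        for name in sorted(names):
            path = os.path.join(d, name)
            if name.startswith("core") and os.path.isfile(path):
                found.append(path)
    return found


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel core pattern, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed values: adjust to the nagios user's home and your log location.
    nagios_home = "/home/nagios"
    nagios_log = "/usr/local/nagios/var/nagios.log"
    print("core_pattern:", read_core_pattern())
    for core in find_core_files(candidate_core_dirs(nagios_home, nagios_log)):
        print("possible core file:", core)
```

If core_pattern starts with a pipe character, the dump is handed to another program instead of being written to one of these directories, so an empty result here does not by itself mean no dump was produced.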
Be sure to check out our Knowledgebase for helpful articles and solutions!
derekbrewer
Posts: 8
Joined: Thu Mar 15, 2012 3:18 pm

Re: Segfault Troubleshooting Tips

Post by derekbrewer »

Thanks for the tips and info. I now have my Nagios servers all configured for debug and I capped my log file size accordingly. I'm hoping I have everything in place if/when this happens again.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Segfault Troubleshooting Tips

Post by tgriep »

Thanks for the update. Let us know what you find.