Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 8:19 am
by Mortus
I am currently running Nagios Core 4.2.2 on a CentOS 5 VM. This has been operating without incident for months. Yet suddenly come Monday morning, it decided that it no longer wants to start. I have tried a wide variety of tips found around the web, but nothing seems to work. No changes were made to the VM or the configs. I even tried rolling back to a prior known-good snapshot. Any help would be appreciated.
Here's what I receive when I try to start the service:
Starting nagios:
(SciTE:30406): Gtk-WARNING **: cannot open display:
done.
I can log into the localhost/nagios site, but it shows as not running, and any attempt to access something other than the homepage greets me with: Error: Could not read object configuration data!
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 10:33 am
by dwhitfield
That's a really bizarre warning for Nagios. My first step would be uninstalling SciTE, followed by GTK if there is still an issue. Is there a specific reason you have GTK on your server?
I'm guessing at some point in your Core upgrade you ended up with two versions of Nagios installed, but at this point, that's just a guess. Let's take this one step at a time.
You say no changes were made to the VM. You don't have updates running via cron or anything like that?
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 10:58 am
by Mortus
No cron updates to my knowledge. I inherited this setup from a former colleague and have little-to-no experience with Linux, so all of this is still a learning process for me. So I wouldn't be able to tell you why GTK exists on it.
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 11:12 am
by dwhitfield
If SciTE was installed from a repo, then yum remove scite should work. If it was compiled from source, you'll need to ask on the SciTE forum/mailing list how to remove it.
Can you post the output of cat /var/log/yum.log | tail? That will tell us the last time yum ran updates.
Are you the only person that uses this server (I don't mean the Nagios webUI, but actually logging into the server)? If so, we can be a little more aggressive with removing things. If someone is actually using scite, they'll just need to move to vi, emacs, nano, or another text editor.
Did you do the upgrade to 4.2.2? I'm just trying to get a sense of how far back your knowledge of this system goes. Thanks!
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 11:38 am
by Mortus
Yes, I am the only one who actually connects to the server. This is actually the dev server for Nagios, and all changes get pushed out to the live box when I finish testing them. So it's not a big deal if we go aggressive as I have old snapshots of it to revert to if things go south.
For the yum.log, here is the output:
Jun 24 14:51:16 Updated: openssl-0.9.8e-40.el5_11.i686
Jun 24 14:51:19 Updated: openssl-devel-0.9.8e-40.el5_11.x86_64
Jun 24 14:51:21 Updated: openssl-devel-0.9.8e-40.el5_11.i386
Jun 24 14:51:28 Updated: tzdata-2016e-1.el5.x86_64
Aug 30 12:41:47 Updated: kernel-headers-2.6.18-411.el5.x86_64
Aug 30 12:41:51 Updated: tzdata-2016f-1.el5.x86_64
Aug 30 12:42:03 Installed: kernel-devel-2.6.18-411.el5.x86_64
Aug 30 12:42:20 Installed: kernel-2.6.18-411.el5.x86_64
Aug 30 12:42:21 Updated: httpd-2.2.3-92.el5.centos.x86_64
Aug 30 12:42:46 Updated: httpd-2.2.3-92.el5.centos.x86_64
Running your initial command gave me this:
Setting up Remove Process
No Match for argument: scite
No Packages marked for removal
And I did do the upgrade to 4.2.2. The version that was on it before I did the upgrade was 3.5.1. I upgraded both the dev and production boxes a couple of weeks ago and they have both been running fine since. The production box is still running fine, which adds to the confusion.
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 12:02 pm
by dwhitfield
I should have said this earlier in the thread, but now is a good time to think about moving to CentOS 7.
https://wiki.centos.org/FAQ/General#hea ... dde5b75e6d
CentOS-5 updates until March 31, 2017
As for the issue at hand, there have been plenty of updates since your last updates, so let's do a yum -y update. It's possible that will fix things, but in case not, here are instructions for removing SciTE:
https://github.com/LuaDist/scite/blob/master/README. If you run into problems with their removal instructions, you'll need to talk to them.
It looks like GTK is installed with graphviz. Can you post the output of yum erase gtk2? I just want to see if that is going to remove anything Nagios-related before you actually erase it. Of course, if you would rather take a snapshot first, run the removal, and just see what happens, that works too. If it blows up, just revert to the snapshot.
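If you want to see what depends on gtk2 without removing anything, rpm has a dry-run mode. This is only a sketch, assuming gtk2 came from an RPM package; preview_erase is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: dry-run removal of one or more RPM packages.
# "rpm -e --test" goes through the motions of an erase and reports
# failed dependencies, but does not actually uninstall anything.
preview_erase() {
    rpm -e --test "$@"
}

# Example call:
# preview_erase gtk2
```

If it prints a "Failed dependencies" list, those are the packages that would break or be pulled out along with gtk2.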
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 12:25 pm
by Mortus
You were correct on the updates.
Installed:
kernel.x86_64 0:2.6.18-416.el5 kernel-devel.x86_64 0:2.6.18-416.el5
Updated:
bind-libs.x86_64 30:9.3.6-25.P1.el5_11.11 bind-utils.x86_64 30:9.3.6-25.P1.el5_11.11 kernel-headers.x86_64 0:2.6.18-416.el5
nss.i386 0:3.21.3-2.el5_11 nss.x86_64 0:3.21.3-2.el5_11 nss-devel.x86_64 0:3.21.3-2.el5_11
nss-tools.x86_64 0:3.21.3-2.el5_11 tzdata.x86_64 0:2016i-1.el5
The erase has completed. When attempting to run the Nagios service start (command: service nagios start), I now receive this:
Starting nagios:SciTE: error while loading shared libraries: libgtk-x11-2.0.so.0: cannot open shared object file: No such file or directory
done.
Still unsure why it's trying to use scite for anything, since I only use it to edit configs before pushing them up to the git repository.
As far as updating to CentOS 7, I'd love to. However, I don't know precisely how Nagios was configured on here (lots of custom configs), so I wouldn't begin to know how to re-establish everything if it didn't all port over properly.
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 12:30 pm
by dwhitfield
How are you starting Nagios?
Can you try /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg and see if you get the same output?
If that doesn't work, can you post your /usr/local/nagios/etc/nagios.cfg?
Thanks!
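Note that -d starts Nagios as a daemon, so it normally prints nothing to the terminal. Running the binary with -v first does a pre-flight check of the config and prints any errors it finds. A minimal sketch, assuming the source-install paths used in this thread; verify_then_start is a made-up helper name.

```shell
# Paths default to the ones quoted in this thread; override them if yours differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: verify the config, then start the daemon only if
# the verification step exits successfully.
verify_then_start() {
    # -v runs the configuration check and prints any errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # -d starts Nagios in daemon mode
        "$NAGIOS_BIN" -d "$NAGIOS_CFG"
    else
        echo "Config verification failed; not starting Nagios." >&2
        return 1
    fi
}

# Example call:
# verify_then_start
```

Any errors from the -v step are usually more informative than the web UI's "Could not read object configuration data!" message.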
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 12:35 pm
by Mortus
I get no output when I run that. I've attached the cfg file here.
Re: Nagios suddenly refuses to start
Posted: Wed Nov 30, 2016 12:47 pm
by dwhitfield
Are you using systemctl start nagios or /etc/rc.d/init.d/nagios start or something else to start Nagios? I just want to look at the startup script to see why it is calling scite.
Are you modifying the configs on the server or pushing them from the desktop?
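A quick way to check whether the startup script itself mentions SciTE is to grep it. This is only a sketch: find_scite_refs is a made-up helper name, and the /etc/sysconfig/nagios path in the example is an assumption that may not exist on your box.

```shell
# Hypothetical helper: list every line that mentions SciTE in the given files.
find_scite_refs() {
    # -n prints line numbers, -i ignores case, -H always prints the file name
    grep -n -i -H 'scite' "$@" 2>/dev/null
}

# Example call (the second path is an assumption):
# find_scite_refs /etc/rc.d/init.d/nagios /etc/sysconfig/nagios
```

If that finds nothing, running the init script under sh -x (sh -x /etc/rc.d/init.d/nagios start) traces each command as it executes, which should show the point where SciTE gets launched.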