Nagios suddenly refuses to start

Mortus
Posts: 27
Joined: Tue Nov 15, 2016 10:34 am

Post by Mortus »

I am currently running Nagios Core 4.2.2 on a CentOS 5 VM. This has been operating without incident for months. Yet suddenly come Monday morning, it decided that it no longer wants to start. I have tried a wide variety of tips found around the web, but nothing seems to work. No changes were made to the VM or the configs. I even tried rolling back to a prior known-good snapshot. Any help would be appreciated.

Here's what I receive when I try to start the service:

Starting nagios:
(SciTE:30406): Gtk-WARNING **: cannot open display:
done.

I can log into the localhost/nagios site, but it shows as not running, and any attempt to access something other than the homepage greets me with: Error: Could not read object configuration data!
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

That's a really bizarre warning for Nagios. My first step would be uninstalling SciTE, followed by GTK if there is still an issue. Is there a specific reason you have GTK on your server?

I'm guessing at some point in your Core upgrade you ended up with two versions of Nagios installed, but at this point, that's just a guess. Let's take this one step at a time.

You say no changes were made to the VM. You don't have updates running via cron or anything like that?

Post by Mortus »

No cron updates to my knowledge. I inherited this setup from a former colleague and have little-to-no experience with Linux, so all of this is still a learning process for me. So I wouldn't be able to tell you why GTK exists on it.

Post by dwhitfield »

If SciTE was installed from a repo, then yum remove scite should work. If it was compiled from source, you'll need to ask on the SciTE forum/mailing list how to remove it.

Can you post the output of cat /var/log/yum.log | tail? That will tell us the last time yum ran updates.
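
For anyone following along, that check can be wrapped in a small helper. This is only a sketch: the function name last_yum_activity is made up for illustration, and /var/log/yum.log is simply the default path taken from the command above.

```shell
# Print the last N entries of a yum log.
# Defaults: /var/log/yum.log and 10 lines (the same as a plain "tail").
last_yum_activity() {
    log="${1:-/var/log/yum.log}"
    count="${2:-10}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    tail -n "$count" "$log"
}

# Usage: last_yum_activity            (defaults)
#        last_yum_activity /var/log/yum.log 20
```

The timestamps on the last few lines show when yum last installed or updated anything.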

Are you the only person that uses this server (I don't mean the Nagios webUI, but actually logging into the server)? If so, we can be a little more aggressive with removing things. If someone is actually using scite, they'll just need to move to vi, emacs, nano, or another text editor.

Did you do the upgrade to 4.2.2? I'm just trying to get a sense of how far back your knowledge of this system goes. Thanks!
Last edited by dwhitfield on Wed Nov 30, 2016 11:43 am, edited 1 time in total.
Reason: added clarity in my second sentence

Post by Mortus »

Yes, I am the only one who actually connects to the server. This is actually the dev server for Nagios, and all changes get pushed out to the live box when I finish testing them. So it's not a big deal if we go aggressive as I have old snapshots of it to revert to if things go south.

For the yum.log, here is the output:

Jun 24 14:51:16 Updated: openssl-0.9.8e-40.el5_11.i686
Jun 24 14:51:19 Updated: openssl-devel-0.9.8e-40.el5_11.x86_64
Jun 24 14:51:21 Updated: openssl-devel-0.9.8e-40.el5_11.i386
Jun 24 14:51:28 Updated: tzdata-2016e-1.el5.x86_64
Aug 30 12:41:47 Updated: kernel-headers-2.6.18-411.el5.x86_64
Aug 30 12:41:51 Updated: tzdata-2016f-1.el5.x86_64
Aug 30 12:42:03 Installed: kernel-devel-2.6.18-411.el5.x86_64
Aug 30 12:42:20 Installed: kernel-2.6.18-411.el5.x86_64
Aug 30 12:42:21 Updated: httpd-2.2.3-92.el5.centos.x86_64
Aug 30 12:42:46 Updated: httpd-2.2.3-92.el5.centos.x86_64

Running your initial command gave me this:

Setting up Remove Process
No Match for argument: scite
No Packages marked for removal


And I did do the upgrade to 4.2.2. The version that was on it before I did the upgrade was 3.5.1. I upgraded both the dev and production boxes a couple of weeks ago and they have both been running fine since. The production box is still running fine, which adds to the confusion.

Post by dwhitfield »

I should have said this earlier in the thread, but now is a good time to think about moving to CentOS 7. Per https://wiki.centos.org/FAQ/General#hea ... dde5b75e6d, CentOS 5 receives updates only until March 31, 2017.

As for the issue at hand, there have been plenty of updates released since your last update run, so let's do a yum -y update. It's possible that will fix things, but in case not, here are instructions for removing SciTE: https://github.com/LuaDist/scite/blob/master/README. If you run into problems with their removal instructions, you'll need to talk to them.

It looks like GTK is installed with graphviz. Can you post the output of yum erase gtk2, answering N at the confirmation prompt? I just want to see if that is going to remove anything Nagios-related before you actually erase it. Of course, if you'd rather take a snapshot first, then remove it and just see what happens, that works too. If it blows up, just revert to the snapshot.
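
If you'd rather preview the removal without changing anything, rpm has a dry-run mode (rpm -e --test) that goes through the motions and reports which installed packages depend on the one being erased. A rough sketch, where preview_erase and the RPM_BIN override are names made up for illustration:

```shell
# Dry-run removal of a package with rpm; nothing is uninstalled.
# RPM_BIN is a hypothetical override so a different rpm binary can be used.
preview_erase() {
    pkg="$1"
    rpm_bin="${RPM_BIN:-rpm}"
    if [ -z "$pkg" ]; then
        echo "usage: preview_erase PACKAGE" >&2
        return 2
    fi
    # --test: do not actually uninstall; failed dependencies are reported instead.
    "$rpm_bin" -e --test "$pkg"
}

# Usage: preview_erase gtk2
```

Any "is needed by (installed)" lines in the output name the packages that depend on gtk2, which is a reasonable hint at what yum would pull out along with it.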

Post by Mortus »

You were correct on the updates.

Installed:
kernel.x86_64 0:2.6.18-416.el5 kernel-devel.x86_64 0:2.6.18-416.el5

Updated:
bind-libs.x86_64 30:9.3.6-25.P1.el5_11.11 bind-utils.x86_64 30:9.3.6-25.P1.el5_11.11 kernel-headers.x86_64 0:2.6.18-416.el5
nss.i386 0:3.21.3-2.el5_11 nss.x86_64 0:3.21.3-2.el5_11 nss-devel.x86_64 0:3.21.3-2.el5_11
nss-tools.x86_64 0:3.21.3-2.el5_11 tzdata.x86_64 0:2016i-1.el5


The erase has completed. When attempting to run the Nagios service start (command: service nagios start), I now receive this:
Starting nagios:SciTE: error while loading shared libraries: libgtk-x11-2.0.so.0: cannot open shared object file: No such file or directory
done.

Still unsure why it's trying to use scite for anything, since I only use it to edit configs before pushing them up to the git repository.

As far as updating to CentOS 7, I'd love to. However, I don't know precisely how Nagios was configured on here (lots of custom configs), so I wouldn't begin to know how to re-establish everything if it didn't all port over properly.

Post by dwhitfield »

How are you starting Nagios?

Can you try /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg and see if you get the same output?

If that doesn't work, can you post your /usr/local/nagios/etc/nagios.cfg?
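
A pre-flight check may also be worth running first: the nagios binary's -v option verifies the configuration and exits without starting the daemon. A minimal sketch, assuming the paths from the command above as defaults (verify_nagios_config is a made-up helper name):

```shell
# Run the Nagios config check against a given binary and main config file.
# Defaults are the paths used earlier in this thread.
verify_nagios_config() {
    bin="${1:-/usr/local/nagios/bin/nagios}"
    cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "nagios binary not found: $bin" >&2
        return 2
    fi
    if [ ! -r "$cfg" ]; then
        echo "cannot read config: $cfg" >&2
        return 2
    fi
    # -v verifies the configuration and exits; it does not start the daemon.
    "$bin" -v "$cfg"
}

# Usage: verify_nagios_config
```

If the check reports errors, those are the first thing to fix; if it passes cleanly, the problem is more likely in how the service is being started.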

Thanks!

Post by Mortus »

I get no output when I run that. I've attached the cfg file here.
Attachments
nagios.cfg

Post by dwhitfield »

Are you using systemctl start nagios or /etc/rc.d/init.d/nagios start or something else to start Nagios? I just want to look at the startup script to see why it is calling scite.
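
In the meantime, a quick way to see whether a startup script mentions SciTE at all is a case-insensitive grep over it. This is just a sketch; find_refs is a made-up helper name, and the only path assumed is the init script mentioned above.

```shell
# Search one or more files for a pattern (case-insensitive),
# printing matches as file:line:text. Unreadable files are skipped.
find_refs() {
    pattern="$1"
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "skipping unreadable file: $f" >&2
            continue
        fi
        # The extra /dev/null argument makes grep print the file name
        # even when only one real file is searched.
        grep -n -i -- "$pattern" "$f" /dev/null || true
    done
    return 0
}

# Usage: find_refs scite /etc/rc.d/init.d/nagios
```

No output means the pattern does not appear in the files given, in which case the call to SciTE is coming from somewhere else.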

Are you modifying the configs on the server or pushing them from the desktop?