Could not read host and service status information

Posted: Wed Apr 15, 2015 2:35 pm
by crafael01
Hello. I have an instance of Nagios Core that was inherited by my team a couple of years ago. It has been working fine until this week. Yesterday there were alerts going out for a service that did not appear to exist in the web UI, but it was found in the configs. That particular service wasn't really needed, so I just commented it out and all was well.

However, today the same thing happened. A host that should have many services applied to it is showing no services at all, but alerts are still going out for them.

I restarted nagios, and now I am getting this from the web UI when I try to click on anything:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Whoops!

Error: Could not read host and service status information!

The most common cause of this error message (especially for new users), is the fact that Nagios is not actually running. If Nagios is indeed not running, this is a normal error message. It simply indicates that the CGIs could not obtain the current status of hosts and services that are being monitored. If you've just installed things, make sure you read the documentation on starting Nagios.

Some other things you should check in order to resolve this error include:

Check the Nagios log file for messages relating to startup or status data errors.
Always verify configuration options using the -v command-line option before starting or restarting Nagios!
Make sure you read the documentation on installing, configuring and running Nagios thoroughly before continuing. If all else fails, try sending a message to one of the mailing lists. More information can be found at http://www.nagios.org.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I found this issue already posted in a few places, but none of them seem to be the same situation. I am not running SELinux. I have tried running in daemon mode. In nagios.log I see this but not much else on startup:
[1429126364] Nagios 3.2.2 starting... (PID=16852)
[1429126364] Local time is Wed Apr 15 14:32:44 CDT 2015
[1429126364] LOG VERSION: 2.0
[1429126364] Finished daemonizing... (New PID=16853)
[1429126364] Warning: File '/dev/shm/host-perfdata' could not be opened - host performance data will not be written to file!
[1429126364] Warning: File '/dev/shm/service-perfdata' could not be opened - service performance data will not be written to file!
[1429126365] Error: Unable to open file '/dev/shm/status.dat' for writing: Bad file descriptor
[1429126365] Error: Unable to rename file '/var/nagios/nagios.tmpAyUGm6' to '/dev/shm/status.dat': Bad file descriptor
[1429126365] Error: Unable to update status data file '/dev/shm/status.dat': Bad file descriptor

Checks appear to be running fine in the background when I look at processes running under nagios user.

This is very urgent to my company, because although inherited, this is a critical enterprise instance. We are migrating it to a new XI instance in the same datacenter, but we need this working until that work is completed. Any assistance much appreciated.

Re: Could not read host and service status information

Posted: Wed Apr 15, 2015 2:37 pm
by crafael01
Forgot to add that verification of the config files shows 0 warnings and 0 errors.

Re: Could not read host and service status information

Posted: Wed Apr 15, 2015 3:55 pm
by abrist
status.dat is required for the CGIs to display data, and yours is not getting created:
crafael01 wrote:
[1429126365] Error: Unable to rename file '/var/nagios/nagios.tmpAyUGm6' to '/dev/shm/status.dat': Bad file descriptor
[1429126365] Error: Unable to update status data file '/dev/shm/status.dat': Bad file descriptor
It is unusual to write directly to /dev. You most likely have to create an shm, /tmp, or ramdisk mount somewhere else in your filesystem. Have you considered just using /tmp, or creating a new ramdisk and adding it to /etc/fstab?
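Before moving anything, it may be worth confirming whether the target directory is actually usable. On many Linux systems /dev/shm is itself a tmpfs mount, so a missing or read-only mount there could produce errors like the ones above. The sketch below is only an illustration: `check_dir` is a hypothetical helper name, and it tests the user who runs it, so it should be run as the nagios user to reflect what the daemon sees.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory exists and whether the
# current user can write to it. Run it as the nagios user to test what
# the daemon itself would be allowed to do.
check_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 2
    fi
}

# Example (not executed here): check_dir /dev/shm
```

If the directory is reported as missing or not writable, `df /dev/shm` shows whether anything is mounted there at all.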

Re: Could not read host and service status information

Posted: Wed Apr 15, 2015 4:54 pm
by crafael01
You're right that it is unusual to write to /dev. I'm not sure why it was set up like that. I see it in the config:
host_perfdata_file=/dev/shm/host-perfdata
service_perfdata_file=/dev/shm/service-perfdata

I'm not very familiar with what shm is. Is it as simple as creating a directory, changing permissions for nagios, placing that in fstab, and modifying the path in the nagios.cfg?

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 12:05 pm
by tgriep
Try running the following to search for those folders.

Code:

find / -name service-perfdata 
find / -name host-perfdata 

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 2:10 pm
by crafael01
The find commands returned nothing, so the directories must not exist anymore. Can I just create them as standard directories?

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 2:57 pm
by jolson
Firstly, if the service-perfdata and host-perfdata files weren't created when Nagios started, I don't think creating them will do us any good.

The errors here:

Code:

[1429126365] Error: Unable to open file '/dev/shm/status.dat' for writing: Bad file descriptor
[1429126365] Error: Unable to rename file '/var/nagios/nagios.tmpAyUGm6' to '/dev/shm/status.dat': Bad file descriptor
[1429126365] Error: Unable to update status data file '/dev/shm/status.dat': Bad file descriptor
Lead me to believe that something is wrong with either the permissions of the directory /dev/shm or the files themselves.

If you open up nagios.cfg (/usr/local/nagios/etc/nagios.cfg) you can define where the host-perfdata, service-perfdata, and status.dat files should exist. I suggest changing the directory to either a ramdisk (as abrist mentioned) or at least back to the default (/usr/local/nagios/var/host-perfdata, /usr/local/nagios/var/service-perfdata, and /usr/local/nagios/var/status.dat).
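To see which locations are currently configured before editing them, something like the following could help. It is a rough sketch: `show_nagios_paths` is a hypothetical helper name, and the default path is simply the one mentioned above. It also prints the temp file setting, since the log shows the temporary file being created under /var/nagios before the rename.

```shell
#!/bin/sh
# Hypothetical helper: list the file-location directives set in a Nagios
# main config file. Pass a different path if nagios.cfg lives elsewhere.
show_nagios_paths() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only uncommented lines that assign one of the four directives.
    grep -E '^[[:space:]]*(status_file|temp_file|host_perfdata_file|service_perfdata_file)=' "$cfg"
}

# Example (not executed here): show_nagios_paths /usr/local/nagios/etc/nagios.cfg
```

Every directory that shows up in the output needs to exist and be writable by the user Nagios runs as.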

Does that all make sense? After the above changes, you can restart Nagios to re-write those files:

Code:

service nagios restart
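After the restart, one way to confirm that the status file is really being rewritten is to check that it exists and has a recent modification time. This is only a sketch: `status_is_fresh` is a hypothetical helper name, and it assumes a `find` that supports `-mmin` (GNU and BSD versions do).

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE exists and was modified within the
# last MINUTES minutes (default 2). Relies on find's -mmin option.
status_is_fresh() {
    file="$1"
    minutes="${2:-2}"
    [ -f "$file" ] && [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]
}

# Example (not executed here):
#   status_is_fresh /usr/local/nagios/var/status.dat 2 && echo "status.dat is being updated"
```

If the check fails shortly after a restart, nagios.log is the next place to look for the reason.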

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 3:02 pm
by crafael01
Yep that makes sense. I will give that a shot and update later. Thanks!

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 3:03 pm
by jolson
Sounds great! I look forward to your response.

Re: Could not read host and service status information

Posted: Thu Apr 16, 2015 3:34 pm
by crafael01
I tried changing to the default. The /usr/local/nagios/var directory did not exist, so I created that first. I'm still getting the same errors, although for the new directory:

[1429216244] Warning: File '/usr/local/nagios/var/host-perfdata' could not be opened - host performance data will not be written to file!
[1429216244] Warning: File '/usr/local/nagios/var/service-perfdata' could not be opened - service performance data will not be written to file!
[1429216249] Error: Unable to open file '/usr/local/nagios/var/status.dat' for writing: Bad file descriptor
[1429216249] Error: Unable to rename file '/var/nagios/nagios.tmpo75drA' to '/usr/local/nagios/var/status.dat': Bad file descriptor
[1429216249] Error: Unable to update status data file '/usr/local/nagios/var/status.dat': Bad file descriptor
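Since the new directory was created by hand, one thing worth ruling out is that it, or one of its parents, is not owned by or writable for the user Nagios runs as. That is an assumption; nothing in the log confirms it. The sketch below uses a hypothetical helper, `path_perms`, to print the owner and mode of a path and each directory above it.

```shell
#!/bin/sh
# Hypothetical helper: print ownership and mode for a path and each of its
# parent directories, stopping before the filesystem root.
path_perms() {
    p="$1"
    while [ -n "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        # Show the entry itself (not its contents), or note that it is absent.
        ls -ld "$p" 2>/dev/null || echo "missing: $p"
        p=$(dirname "$p")
    done
}

# Example (not executed here):
#   path_perms /usr/local/nagios/var/status.dat
#   path_perms /var/nagios
```

Comparing the owner column against the user shown for the running nagios processes should make a mismatch easy to spot.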