'No data to display' for some hosts in Graph Explorer

crystal.then · Post by **crystal.then** » Mon Dec 11, 2017 2:22 am

Hi,

We have recently begun experiencing an issue where some services show "no data to display" when trying to graph them using Graph Explorer -> Scalable Performance Graph.

There are two main hosts for which we have noticed this behaviour (so far), and they have the following characteristics:

- Windows machines; monitored using NRPE Agent
- Run MS SQL and are clustered together
- Some services are able to be graphed correctly (CPU usage, memory usage)
- Drive usage services show 'No data to display' (C:, E: & F: Drive Usage)

For the erroring services we can see that data is being collected correctly; notifications are sent out when thresholds are breached and show the current status/usage of the service. So the data is there, but is not being picked up by the graphing tool for some services. The RRD performance graphs for the erroring services also do not show data.

Our Nagios setup:

- Nagios XI v5.4.8
- Running on a CentOS VM
- 461 monitored hosts
- 3100 monitored services

Let me know if you need any more information for troubleshooting. Thanks in advance!

Regards,
Matt

npolovenko · Post by **npolovenko** » Mon Dec 11, 2017 11:05 am

Hello, @crystal.then. I know you said that the data is being collected correctly, but I just want to clarify. If you go click on a "broken" service and navigate to the advanced tab, are you able to see RRD output next to the performance data table? Can you check all "broken" services this way and let us know?

rrd.png

You may delete the corresponding RRD and XML files for the broken services from /usr/local/nagios/share/perfdata/, or move them to a different directory for now. This will force Nagios to recreate the RRDs. You may also clean out the contents of /var/lib/mrtg/ entirely; it only holds temporary files.
After that, please give the system up to 30 minutes and check the services in question again. If the problem is still there, please share your system profile with us so we can go over every major log file.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and attach it to your next post, or upload it to the cloud storage of your choice and share a link with me in a PM.
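
The move-aside suggestion and the 30-minute check above could be scripted roughly as follows. This is a minimal sketch, not an official Nagios tool: it assumes the perfdata directory holds one subdirectory per host with `<service>.rrd` and `<service>.xml` files inside, and the function names, `BACKUP_DIR`, and the glob pattern are made up for illustration.

```shell
#!/bin/sh
# Sketch only. PERFDATA_DIR is the path from the post; BACKUP_DIR and both
# function names are hypothetical. Assumes a <host>/<service>.rrd layout.
PERFDATA_DIR="${PERFDATA_DIR:-/usr/local/nagios/share/perfdata}"
BACKUP_DIR="${BACKUP_DIR:-/tmp/perfdata-backup}"

# Move the .rrd and .xml files of matching services out of the way.
#   $1 = host directory name, $2 = glob for the service file name (no extension)
move_perfdata_aside() {
    host="$1"
    pattern="$2"
    mkdir -p "$BACKUP_DIR/$host" || return 1
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in "$PERFDATA_DIR/$host"/$pattern.rrd "$PERFDATA_DIR/$host"/$pattern.xml; do
        [ -e "$f" ] || continue          # glob matched nothing
        mv -- "$f" "$BACKUP_DIR/$host/" && echo "moved: $f"
    done
}

# List RRD files for a host that were modified within the last N minutes
# (default 30). No output means nothing has been recreated or updated yet.
recent_rrds() {
    host="$1"
    minutes="${2:-30}"
    find "$PERFDATA_DIR/$host" -name '*.rrd' -mmin "-$minutes"
}
```

For example, `move_perfdata_aside sqlhost01 '*Drive_Usage*'` followed half an hour later by `recent_rrds sqlhost01` (the host name `sqlhost01` is a placeholder). Moving the files rather than deleting them keeps the old history available if it turns out to be needed.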

crystal.then · Post by **crystal.then** » Tue Dec 12, 2017 1:32 am

Hi @npolovenko, thank you for your prompt response!

I've gone through the steps you mentioned, and have found that the performance data is not showing in the Advanced section of affected services. For one of the services I removed the xml and rrd files from the /usr/local/nagios/share/perfdata/ directory, but it has not been recreated (it has been well over 30 minutes since removing the file). I guess this is because performance data is not being correctly ingested. I reviewed the /var/lib/mrtg folder but the files do not seem relevant to the host and services in question, so I did not remove any files.

Please see attached my system profile. I hope this can shed some light on the issue.

Thanks again for your help so far. Let me know if you need any more information.

Regards,

Matt

npolovenko · Post by **npolovenko** » Tue Dec 12, 2017 11:42 am

Hi, @crystal.then.
1. Let's increase the NPCD timeout value:
Open the following config file:

nano /usr/local/nagios/etc/pnp/process_perfdata.cfg

and change:

TIMEOUT = 5

to

TIMEOUT = 40

2. Let's increase the load threshold:
Open the following config file:

nano /usr/local/nagios/etc/pnp/npcd.cfg

and change:

load_threshold = 10.0

to

load_threshold = 30.0

3. You can delete everything from /var/lib/mrtg/. Those are all temp files.
4. I've seen some log entries indicating that you have a few crashed DB tables. You can run the database repair script:

cd /usr/local/nagiosxi/scripts
./repair_databases.sh

*It may take a while for this script to finish since you have a large system.

5. Please run the following commands:

service nagios stop
killall -9 nagios
service nagios start
service crond restart
service npcd restart

6. (optional) Please increase the size of the root partition; your profile shows it is 86% used. It's not critical at this point, but it may cause problems in the future.
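
For anyone who prefers to script the two config edits and the disk check rather than use nano, here is a rough sketch. The helper names `set_cfg_value` and `root_usage_percent` are hypothetical, and the sketch assumes each setting sits on its own `KEY = value` line as in the snippets above. It keeps a `.bak` copy before rewriting the file.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical, not part of Nagios XI.

# Set "KEY = value" in a config file, keeping the original as FILE.bak.
#   $1 = config file, $2 = key, $3 = new value
# Assumes key and value contain no "|", "&" or regex metacharacters.
set_cfg_value() {
    file="$1"
    key="$2"
    value="$3"
    cp -p -- "$file" "$file.bak" || return 1
    # Rewrite only lines that start with the exact key followed by "=".
    sed "s|^\([[:space:]]*\)$key[[:space:]]*=.*|\1$key = $value|" "$file.bak" > "$file"
}

# Print the used percentage of the root filesystem as a bare number.
root_usage_percent() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage on the Nagios XI server (paths and values from the steps above):
#   set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg TIMEOUT 40
#   set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 30.0
#   root_usage_percent
```

The new values only take effect once the services have been restarted as described in the steps above.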

Nagios Support Forum
