Hi,
We have recently started seeing an issue where some services show "No data to display" when we try to graph them using Graph Explorer -> Scalable Performance Graph.
There are two main hosts for which we have noticed this behaviour (so far), and they have the following characteristics:
- Windows machines, monitored using the NRPE agent
- Run MS SQL and are clustered together
- Some services graph correctly (CPU usage, memory usage)
- Drive usage services show 'No data to display' (C:, E: & F: Drive Usage)
For the affected services, we can see that data is being collected: notifications are sent when thresholds are breached and show the current status/usage of the service. So the data is there, but the graphing tool is not picking it up for these services. The RRD performance graphs for the affected services also show no data.
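One possible explanation for notifications working while graphs stay empty: notifications only use the plugin's status text, whereas the RRD graphs are built from the performance data that a plugin prints after a "|" character in its output. If a check returns no performance data, there is nothing to graph. A minimal sketch for testing captured plugin output is shown here; the function name `has_perfdata` is hypothetical, and the host and NRPE command names in the example comment are placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a plugin's output carries performance data.
# Nagios plugin output has the form "STATUS TEXT | perfdata"; graphs are
# built only from the part after the "|".

# has_perfdata OUTPUT
# Succeeds (exit 0) if there is non-whitespace text after the first "|".
has_perfdata() {
  local output="$1"
  # No pipe at all means no performance data.
  [[ "$output" == *"|"* ]] || return 1
  # Keep everything after the first "|".
  local perf="${output#*"|"}"
  # Treat whitespace-only perfdata as missing.
  perf="${perf//[[:space:]]/}"
  [[ -n "$perf" ]]
}

# Example (placeholder host and command names; adjust to your setup):
#   out=$(/usr/local/nagios/libexec/check_nrpe -H SQLNODE1 -c check_drive_c)
#   has_perfdata "$out" && echo "perfdata present" || echo "no perfdata"
```

If a drive check's output has no "|" section, the graphs will stay empty no matter how often the RRD files are recreated.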
Our Nagios setup:
- Nagios XI v5.4.8
- Running on a CentOS VM
- 461 monitored hosts
- 3100 monitored services
Let me know if you need any more information for troubleshooting. Thanks in advance!
Regards,
Matt
'No data to display' for some hosts in Graph Explorer
-
crystal.then
-
npolovenko
- Support Tech
Re: 'No data to display' for some hosts in Graph Explorer
Hello, @crystal.then. I know you said that the data is being collected correctly, but I'd like to confirm. If you click on a "broken" service and go to the Advanced tab, can you see RRD output next to the performance data table? Could you check all of the "broken" services this way and let us know?
You can delete the corresponding RRD and XML files for the broken services from /usr/local/nagios/share/perfdata/, or move them to a different directory for now. This will force Nagios to recreate the RRD files. You can also clear out the entire contents of /var/lib/mrtg/; it's all temporary files.
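Moving the files, rather than deleting them, keeps a copy in case you need to restore the old history. A minimal sketch follows, assuming the layout `<perfdata root>/<host>/<service>.rrd` and `.xml`; the function name `backup_perfdata`, the backup location, and the host/service names in the example are hypothetical. Special characters in service names are usually replaced in the file names, so list the host's directory first to get the exact base name.

```shell
#!/usr/bin/env bash
# Sketch: move a service's RRD and XML files out of the perfdata tree so
# they are recreated on the next check, keeping the old copies.
# Assumed layout: <perfdata_root>/<host>/<service>.{rrd,xml}

# backup_perfdata PERFDATA_ROOT HOST SERVICE_BASENAME BACKUP_DIR
backup_perfdata() {
  local root="$1" host="$2" svc="$3" backup="$4"
  local dest="$backup/$host"
  mkdir -p "$dest" || return 1
  local moved=0 ext f
  for ext in rrd xml; do
    f="$root/$host/$svc.$ext"
    if [[ -f "$f" ]]; then
      mv -- "$f" "$dest/" || return 1
      moved=$((moved + 1))
    fi
  done
  echo "moved $moved file(s) to $dest"
  # Non-zero exit if nothing matched, so a mistyped name is noticed.
  [[ $moved -gt 0 ]]
}

# Example (hypothetical host, service and backup directory):
#   ls /usr/local/nagios/share/perfdata/SQLNODE1/
#   backup_perfdata /usr/local/nagios/share/perfdata SQLNODE1 C_Drive_Usage /root/perfdata-backup
```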
After that, please give the system up to 30 minutes and check back on the services in question. If the problem is still there, please share your system profile with us so we can go over every major log file.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and attach it to your next post, or upload it to the cloud storage of your choice and share a link with me in a PM.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
crystal.then
Re: 'No data to display' for some hosts in Graph Explorer
Hi @npolovenko, thank you for your prompt response!
I've gone through the steps you mentioned and found that the performance data is not showing in the Advanced tab of the affected services. For one of the services I removed the XML and RRD files from the /usr/local/nagios/share/perfdata/ directory, but they have not been recreated (it has been well over 30 minutes since I removed them). I guess this is because the performance data is not being ingested correctly. I reviewed the /var/lib/mrtg folder, but the files do not seem relevant to the hosts and services in question, so I did not remove any files.
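To check whether any RRD files for a host have been written recently (for example, within the 30-minute window mentioned above), a small `find` wrapper can help. This is a sketch; the function name `recent_rrds` and the host directory in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list RRD files directly inside one host's perfdata directory
# that were modified within the last N minutes.

# recent_rrds HOST_DIR MINUTES
recent_rrds() {
  local dir="$1" minutes="$2"
  # -mmin -N matches files modified less than N minutes ago.
  find "$dir" -maxdepth 1 -type f -name '*.rrd' -mmin "-$minutes" -print
}

# Example (hypothetical host directory):
#   recent_rrds /usr/local/nagios/share/perfdata/SQLNODE1 30
```

An empty result for a service whose files were moved away suggests that no performance data is reaching the graphing pipeline for it.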
Please see attached my system profile. I hope this can shed some light on the issue.
Thanks again for your help so far. Let me know if you need any more information.
Regards,
Matt
-
npolovenko
- Support Tech
Re: 'No data to display' for some hosts in Graph Explorer
Hi, @crystal.then.
1. Let's increase the NPCD timeout value:
Open the following config file:
```
nano /usr/local/nagios/etc/pnp/process_perfdata.cfg
```
and change:
```
TIMEOUT = 5
```
to:
```
TIMEOUT = 40
```
2. Let's increase the load threshold:
Open the following config file:
```
nano /usr/local/nagios/etc/pnp/npcd.cfg
```
and change:
```
load_threshold = 10.0
```
to:
```
load_threshold = 30.0
```
3. You can delete everything in /var/lib/mrtg/; those are all temporary files.
4. I've seen some log entries indicating that you have a few crashed DB tables. You can run the database repair script:
```
cd /usr/local/nagiosxi/scripts
./repair_databases.sh
```
*It may take a while for this script to finish since you have a large system.
5. Please run the following commands:
```
service nagios stop
killall -9 nagios
service nagios start
service crond restart
service npcd restart
```
7. (optional) Please increase the size of the root partition; it is reported as 86% used. It's not critical at this point, but it may cause problems in the future.
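Steps 1 and 2 edit two config files by hand in nano. If you prefer to make the same changes non-interactively, here is a minimal sketch. It assumes the settings appear as uncommented `KEY = VALUE` lines and that GNU sed is available (as on CentOS); the function name `set_cfg_value` is hypothetical. It keeps a `.bak` copy of each file before editing.

```shell
#!/usr/bin/env bash
# Sketch: set "KEY = VALUE" in a simple key/value config file such as
# process_perfdata.cfg or npcd.cfg, keeping a .bak copy of the original.
# Only uncommented lines starting with KEY followed by "=" are changed.
# KEY is used inside a regex; plain identifiers like TIMEOUT are fine.

# set_cfg_value FILE KEY VALUE
set_cfg_value() {
  local file="$1" key="$2" value="$3"
  # Fail loudly if the key is absent instead of silently doing nothing.
  grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" || {
    echo "key '$key' not found in $file" >&2
    return 1
  }
  cp -p -- "$file" "$file.bak" || return 1
  # GNU sed: -E for extended regex, -i to edit in place.
  sed -E -i "s|^([[:space:]]*${key}[[:space:]]*=).*|\1 ${value}|" "$file"
}

# Example, using the values from steps 1 and 2:
#   set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg TIMEOUT 40
#   set_cfg_value /usr/local/nagios/etc/pnp/npcd.cfg load_threshold 30.0
#   service npcd restart
```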