Missing Performance Graph Since 5.7.1 Upgrade

Posted: Wed Jul 01, 2020 4:25 pm
by NCATmax
Hello,

One of our service checks stopped recording and reporting performance data after we upgraded Nagios to 5.7.1. Performance data was graphed normally right up until the upgrade; since then, no data appears on the graph. The service check still runs normally and reports "OK".

The specific check uses the "check_wmi_plus.pl" plugin. I have attached an image of the performance graphs, as well as the service check in the CCM.

I am not sure where to even begin on troubleshooting this issue, but I will gladly provide any additional information that may be needed.

Thank you for your assistance.

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Thu Jul 02, 2020 11:24 am
by ssax
Please run through this KB article to debug:

https://support.nagios.com/kb/article.php?id=9

Please send me a copy of your system profile. You can download it from Admin > System Profile > Download Profile and upload it to the ticket by clicking the "choose item" link at the bottom of the menu. Make sure to wait until the file has finished uploading before clicking the Post Reply button.

Send me both RRD and XML files for this host/service from:
- Change HOSTNAME to the actual hostname

Code:

/usr/local/nagios/share/perfdata/HOSTNAME/
Thank you!

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Thu Jul 02, 2020 3:04 pm
by NCATmax
Thank you for your reply.

I worked through that knowledgebase article.

The output from the first two commands is shown below:

Code:

$ ls /usr/local/nagios/var/spool/perfdata/ | wc -l
2
$ ls /usr/local/nagios/var/spool/xidpe/ | wc -l
0
After increasing the logging level, I saw no errors or any indication of a problem in the /usr/local/nagios/var/perfdata.log file. I also saw no errors or any indication of a problem in the /usr/local/nagios/var/npcd.log file.

And the Nagios user account had not expired.
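
For anyone repeating these checks, they can be wrapped in a small helper. This is only a sketch: the function name `perfdata_health` is made up for illustration, the default path is the one quoted in this thread, and matching on the word "error" is an assumption about what a problem line in the logs would look like.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise the perfdata spool and log state.
# Usage: perfdata_health [nagios_var_dir]   (defaults to /usr/local/nagios/var)
perfdata_health() {
    local var_dir="${1:-/usr/local/nagios/var}"
    local d f count

    # A spool directory that keeps growing suggests files are not being processed.
    for d in spool/perfdata spool/xidpe; do
        if [ -d "$var_dir/$d" ]; then
            count=$(find "$var_dir/$d" -mindepth 1 -maxdepth 1 | wc -l)
            echo "$d: $((count)) file(s)"
        else
            echo "$d: missing"
        fi
    done

    # Count log lines mentioning "error" (case-insensitive); the pattern is a guess.
    for f in perfdata.log npcd.log; do
        if [ -f "$var_dir/$f" ]; then
            echo "$f: $(grep -ci 'error' "$var_dir/$f") line(s) matching 'error'"
        else
            echo "$f: missing"
        fi
    done
}
```

Account expiry is not covered by the helper; on most Linux systems it can be checked separately with `chage -l nagios`.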


I have attached our System Profile to this message.

I have also attached the .rrd and .xml files for this service, from the specific host shown in the first post.

Please let me know if I can provide any additional information.

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Thu Jul 02, 2020 5:19 pm
by ssax
Taken from the XML file:

Code:

-<RRD>
<RC>1</RC>
<TXT>/usr/local/nagios/share/perfdata/XXXXXXXX/Disk.rrd: found extra data on update argument: 438842359808:68.1:644241883136:2625594241024:59.7:4397910192128</TXT>
</RRD>
That means the check is most likely returning more metrics than the RRD has datasources, so the RRD can't be updated.
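
The mismatch can be confirmed by comparing how many datasources the RRD defines with how many values an update carries. A rough sketch, assuming `rrdtool` is installed and that `rrdtool info` prints one `ds[NAME].type` line per datasource; both function names are illustrative only:

```shell
#!/usr/bin/env bash
# Count datasources by reading `rrdtool info` output on stdin.
# Each datasource contributes exactly one `ds[NAME].type = "..."` line.
# Example: rrdtool info /usr/local/nagios/share/perfdata/HOSTNAME/Disk.rrd | count_rrd_ds
count_rrd_ds() {
    # grep exits non-zero when the count is 0, so mask that for callers.
    grep -c '^ds\[[^]]*\]\.type = ' || true
}

# Count the colon-separated values in an update string such as "TIMESTAMP:1:2:3",
# ignoring the leading timestamp field.
count_update_values() {
    local update="$1"
    awk -F: '{ print NF - 1 }' <<< "$update"
}
```

If the second number is larger than the first, the update carries more values than the RRD can store.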

Try renaming the RRD file in

Code:

/usr/local/nagios/share/perfdata/HOSTNAME
, then force a check, wait 15 minutes, then check to see if it starts graphing. The rrd will automatically be rebuilt on the next check.
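
The rename can be done from a shell. A minimal sketch, assuming the file is named `Disk.rrd` as in the XML excerpt above; the `rename_rrd` helper and the `.old-TIMESTAMP` suffix are invented for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move an RRD aside so it is recreated on the next check.
# Usage: rename_rrd /usr/local/nagios/share/perfdata/HOSTNAME/Disk.rrd
rename_rrd() {
    local rrd="$1"
    local backup
    [ -f "$rrd" ] || { echo "not found: $rrd" >&2; return 1; }
    backup="${rrd}.old-$(date +%Y%m%d%H%M%S)"
    # mv keeps the old data intact under the new name and prints that name.
    mv -- "$rrd" "$backup" && echo "$backup"
}
```

The old data stays in the renamed file, so nothing is lost by this step.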

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Mon Jul 06, 2020 11:10 am
by NCATmax
Your directions did cause the service to start graphing again. It appears that two additional drives are now being monitored.

However, all of the performance data before that change is (understandably) no longer able to be viewed.

The historical performance data is very important for us to have. Is there a way to preserve that information? (I understand that it is still in the renamed file, but I was hoping that the old and new data could both be made visible in the performance graph section.)

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Tue Jul 07, 2020 6:23 pm
by ssax
Please attach the new RRD file.

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Wed Jul 08, 2020 8:34 am
by NCATmax
I have attached the new RRD file to this message.

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Thu Jul 09, 2020 11:31 am
by ssax
Unfortunately, merging two RRDs with a different number of datasources doesn't work; they need to have the same number of datasources to be merged. It fails when I try to merge them with this tool:

https://exchange.nagios.org/directory/A ... ol/details

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Thu Jul 09, 2020 3:39 pm
by NCATmax
That is quite unfortunate.

This service monitors disk usage on one of our file servers. The service has data for at least the past year. After doing an upgrade to Nagios, the same service is now outputting additional performance data. And this makes all of the past data unusable.

Is there any recourse in this situation? That data can be fairly useful.

Re: Missing Performance Graph Since 5.7.1 Upgrade

Posted: Fri Jul 10, 2020 2:46 pm
by ssax
Here is a process you can use:

Follow this guide to install/set up the tool (ignore the title; it does what you want by adding extra datasources). It uses the NEW XML file and the OLD and NEW RRD files.

Code:

https://support.nagios.com/kb/article/nagios-xi-icmp-and-ping-checks-stopped-graphing-149.html
Then install this in Admin > Manage Components:

Code:

https://exchange.nagios.org/directory/Addons/Components/Performance-Data-Tool/details
Then make a new temporary directory:

Code:

mkdir /usr/local/nagios/tmp/TEMP_RRD_MERGE
chown -R apache.nagios /usr/local/nagios/tmp/TEMP_RRD_MERGE
Now put the NEW XML file and the OLD RRD file into /usr/local/nagios/tmp/TEMP_RRD_MERGE (make sure they are named the same except the extension).
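
Staging the two files could look like the sketch below. The `stage_for_fix` function and its arguments are hypothetical; it simply copies the NEW XML and the OLD RRD into the working directory under one shared base name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the NEW .xml and the OLD .rrd into a work directory
# so that both share the same base name (for example Disk.xml and Disk.rrd).
# Usage: stage_for_fix NEW_XML OLD_RRD WORK_DIR BASE_NAME
stage_for_fix() {
    local new_xml="$1" old_rrd="$2" work_dir="$3" base="$4"
    [ -f "$new_xml" ] && [ -f "$old_rrd" ] && [ -d "$work_dir" ] || {
        echo "usage: stage_for_fix NEW_XML OLD_RRD WORK_DIR BASE_NAME" >&2
        return 1
    }
    # Copies rather than moves, so the originals stay where they were.
    cp -- "$new_xml" "$work_dir/$base.xml" &&
    cp -- "$old_rrd" "$work_dir/$base.rrd"
}
```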

Run this after you've done that and follow the instructions:

Code:

cd /tmp
./fix_ds_quantity.sh -d /usr/local/nagios/tmp/TEMP_RRD_MERGE
It should say something like:
Are you sure? (y/n) y
Batch job confirmed by user.
Batch process started at Fri Jul 10 14:14:40 CDT 2020
Populating list of RRDs from the dircetory: /usr/local/nagios/tmp/TEMP_RRD_MERGE/
Backing up: /usr/local/nagios/tmp/TEMP_RRD_MERGE/Disk.rrd
Fixing permissions for file: /usr/local/nagios/tmp/TEMP_RRD_MERGE/Disk.rrd
/usr/local/nagios/tmp/TEMP_RRD_MERGE/Disk.rrd updated with 6 additional datasource(s)
Batch job finished at Fri Jul 10 14:14:48 CDT 2020.
A total of 1 file(s) were updated with a total of 6 datasource(s).
Now remove the XML file, leave the OLD (now updated) RRD file in there, and put the NEW RRD file into /usr/local/nagios/tmp/TEMP_RRD_MERGE (rename one of them if you need to).

Run this after you've put them in there:

Code:

chown -R apache.nagios /usr/local/nagios/tmp/TEMP_RRD_MERGE
Now go to Admin > Manage Components > Performance Data Tool > Options:
- Enter /usr/local/nagios/tmp/TEMP_RRD_MERGE
- Select "Folder containing Performance Data Files" from the Type dropdown
- Check the box next to it
- Click Apply Settings

Now click the Tools menu item in the very top menu bar:
- Under "Tools by Box293", click Performance Data Tool
- Click the Merge tab
- Select Merge Specific .rrd's
- Select the /usr/local/nagios/tmp/TEMP_RRD_MERGE from the Source Dropdown
- Select the OLD.rrd
- Select the /usr/local/nagios/tmp/TEMP_RRD_MERGE from the Destination Dropdown
- Select the NEW.rrd
- Click Perform Action
- Click the OK button on the popup and wait a few seconds and you'll see the merge operation show up below
- Click the Perform Merge button
- It should pop up a window showing the merge operation; you should see OK as the output once it's done.

Now take the NEW RRD file from /usr/local/nagios/tmp/TEMP_RRD_MERGE and replace the one (rename it so it matches exactly) in /usr/local/nagios/share/perfdata/YOURHOSTNAME/.
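
The final copy can be scripted along these lines. This is a sketch only: `install_merged_rrd` is a made-up name, and it keeps a copy of the live file before overwriting it, which the steps above do not require.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the live RRD with the merged one, keeping a backup.
# Usage: install_merged_rrd MERGED_RRD LIVE_RRD
install_merged_rrd() {
    local merged="$1" live="$2"
    [ -f "$merged" ] || { echo "not found: $merged" >&2; return 1; }
    # Keep the current live file so the change can be undone.
    if [ -f "$live" ]; then
        cp -p -- "$live" "$live.pre-merge" || return 1
    fi
    cp -- "$merged" "$live"
}
```

When the live file already exists, `cp` overwrites its contents in place, so its owner and permissions should stay as they were; it is still worth checking them afterwards.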

Now go check the graphs to see if they show the merged data as well as the new data.