
Missing graph data after restoring from snapshot

Posted: Thu Oct 31, 2019 9:57 am
by altsysrq
We had an issue yesterday where we decided to restore from the previous night's snapshot. This was because we had a service that appeared to have blank fields. Editing the service and saving would only create a new service with blank fields while saving the old service id with the altered fields. This is no longer my concern but may be relevant.

Now we are seeing that all graph history prior to the time of the restore is no longer present.
  • Is there a way to get this information back? Is there a way to merge that information with what has been recorded since the restore time?
  • How can we prevent a restore from breaking graphs?
Let me know if there is additional information I can provide to make this more helpful.
  • uname: Linux nagios.<mycompany>.com 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 16 14:19:51 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
  • cat /etc/redhat-release: Red Hat Enterprise Linux Server release 7.7 (Maipo) (up to date)
  • php --version: PHP 5.6.40 (cli) (built: Jan 12 2019 13:11:15)

Re: Missing graph data after restoring from snapshot

Posted: Thu Oct 31, 2019 11:02 am
by cdienger
What version of XI is this? I believe it should be making a copy of /usr/local/nagios/share/perfdata/ but will test to verify. If you have a backup of this directory then you can just copy it over.

There are tools available for merging rrd files:

https://exchange.nagios.org/directory/A ... ol/details

Re: Missing graph data after restoring from snapshot

Posted: Thu Oct 31, 2019 2:41 pm
by cdienger
Confirmed that backups should contain perfdata (at least on recent releases). /usr/local/nagiosxi/scripts/backup_xi.sh will contain:

Code:

echo "Backing up Nagios Core..."
tar czfp $mydir/nagios.tar.gz /usr/local/nagios
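One way to confirm that a particular backup actually holds performance data is to list the archive and look for the perfdata path. This is only a minimal sketch: it assumes GNU tar (which drops the leading "/" from member names), the archive path in the example is hypothetical, and `archive_has_perfdata` is an illustrative helper, not part of Nagios XI.

```shell
#!/bin/bash
## Report whether a nagios.tar.gz from a backup contains the perfdata directory.
## Assumption: GNU tar stored the members without the leading "/".

archive_has_perfdata() {
  local archive=$1
  ## "tar tzf" lists member names without extracting anything;
  ## grep's exit status tells us whether the perfdata path was seen
  tar tzf "$archive" | grep 'usr/local/nagios/share/perfdata/' > /dev/null
}

## Example (the archive location is hypothetical):
## if archive_has_perfdata /tmp/restore/nagios.tar.gz; then
##   echo "perfdata found in archive"
## else
##   echo "no perfdata in this archive"
## fi
```

The function only reads the archive, so it should be safe to run against a production backup.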

Re: Missing graph data after restoring from snapshot

Posted: Thu Oct 31, 2019 3:38 pm
by altsysrq
Thank you for the responses.

It does not appear that we have the perfdata directory in the snapshot. To be clear, in this instance I am referencing "Configuration Snapshots." This could be my issue. Additionally, our backup link displays the "Error: This component requires php-pecl-ssh2. You must run install.sh in /usr/local/nagiosxi/html/includes/components/scheduledbackups as root to use this component." message. I will probably have to address this with more research and a separate post/ticket.

I am also working with Nagios XI support on the topic of this thread. To reduce the amount of labor on my part explaining things I will be silent for a bit. I will post updates when/if it is resolved.

Re: Missing graph data after restoring from snapshot

Posted: Thu Oct 31, 2019 3:52 pm
by cdienger
Snapshots will not contain performance data information. The system may have been failing to write performance data for some reason before the restore, and importing the snapshot and restarting the service may have given it the kick it needed to start again.
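To see whether performance data is currently being written, one option is to count the RRD files that were modified recently. A rough sketch, assuming the perfdata directory mentioned earlier in the thread and an arbitrary 15-minute window; `recent_rrd_count` is a hypothetical helper name:

```shell
#!/bin/bash
## Count .rrd files under a directory that were modified in the last N minutes.

recent_rrd_count() {
  local dir=$1 minutes=$2
  ## -mmin -N matches files modified less than N minutes ago;
  ## tr strips the padding some wc implementations add
  find "$dir" -type f -name '*.rrd' -mmin "-$minutes" | wc -l | tr -d ' '
}

## Example:
## recent_rrd_count /usr/local/nagios/share/perfdata 15
```

A count of 0 on a system with active checks would suggest that perfdata processing has stalled.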

We'll continue to work through the ticket.

Re: Missing graph data after restoring from snapshot

Posted: Tue Nov 05, 2019 10:59 am
by altsysrq
Apologies for the delay. This thread can be considered resolved. See below for my best explanation of what happened and how I resolved it.

tl;dr
Perfdata directories were moved to a new host named Bad_Host. This was the name I chose when we had a service with a blank field in the name. We were unable to apply any configuration changes. I tried to name it, thinking that would help resolve the issue. We ended up restoring from a snapshot. The restore led us to believe that it was the cause of losing all the perfdata.

We were using a less favorable version of PHP (5.6). I am not certain this was the cause, but from my discussion with support it may have been. Upgrading to 7.2 per their instructions did fix the backup module within Nagios, allowing it to work and test successfully. However, the upgrade broke the Performance Data Tool that should have allowed us to merge the data, although the tool was not working under 5.6 either. Instructions for 7.2 can be found here: https://exchange.nagios.org/directory/A ... ol/details

Nagios support pointed out that we might be able to use the command line version of the Performance Data Tool, located in "/usr/local/nagiosxi/html/includes/components/performancedatatool/scripts/rrdmerge". This worked and I was able to create a looping bash script that merged the data we had from the Bad_Host directories. That script looked like this:

Code:

#!/bin/bash
## Loop for merging multiple RRD performance data files in Nagios
## RA:MBX:11/04/2019

## Make sure you back everything up before working with this script.

## Also an alternative: https://gist.github.com/arantius/2166343

## Command to use
rrd_merge=/usr/local/nagiosxi/html/includes/components/performancedatatool/scripts/rrdmerge
## Source directory
source1_dir=/tmp/perfdata
## Second source directory
## Ideally this should not be the destination but that will work
source2_dir=/usr/local/nagios/share/perfdata
## Destination directory
## In this instance it is the same as the second source
destin_dir=/usr/local/nagios/share/perfdata

## Loop over every host directory that has been prepared for merging
for node_dir in "$source1_dir"/*/
do
  node=$(basename "$node_dir")
  ## Loop over every rrd file inside the host directory; the file name is reused for the destination
  for rrd_file in "$node_dir"*.rrd
  do
    ## An unmatched glob stays literal, so skip anything that is not a real file
    [ -f "$rrd_file" ] || continue
    perfdata=$(basename "$rrd_file")
    ## Don't merge if there is not already something to merge to.
    ## Old services may have been removed and the rrd files left behind.
    if [ -f "$source2_dir/$node/$perfdata" ]; then
      ## echo first to test or pipe to file for reference later, or do a few at a time
      ## You may receive errors if the service was deleted and recreated under the same name. I have no confirmation of this but
      ## this could be a problem. I think this is indicated by the error: "RRDs have different number of DSes."
      echo "$rrd_merge $source1_dir/$node/$perfdata $source2_dir/$node/$perfdata $destin_dir/$node/$perfdata"
      ## "$rrd_merge" "$source1_dir/$node/$perfdata" "$source2_dir/$node/$perfdata" "$destin_dir/$node/$perfdata"
    fi
  done
done
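
The "RRDs have different number of DSes" error mentioned in the script comments could probably be anticipated by comparing the number of data sources in each pair of files before merging. A sketch, assuming `rrdtool` is installed and that `rrdtool info` prints one `ds[NAME].type` line per data source; `count_ds` is a hypothetical helper that reads that output on standard input, and the PING.rrd file name in the example is made up:

```shell
#!/bin/bash
## Count the data sources (DSes) described in "rrdtool info" output read from stdin.

count_ds() {
  ## Every DS has exactly one line of the form: ds[NAME].type = "GAUGE"
  grep -c '^ds\[[^]]*\]\.type = '
}

## Example (directories follow the script above; the file name is hypothetical):
## old=$(rrdtool info /tmp/perfdata/Bad_Host/PING.rrd | count_ds)
## new=$(rrdtool info /usr/local/nagios/share/perfdata/Bad_Host/PING.rrd | count_ds)
## [ "$old" -eq "$new" ] || echo "DS count differs: $old vs $new"
```

Pairs whose counts differ could then be skipped or logged instead of being handed to rrdmerge.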
If anyone has any insight as to how Nagios could have moved the performance data to Bad_Host let me know. If it is due to the PHP version we had I would like to get details on how that might be possible.

Aside from that I think this issue is resolved (albeit with a few questions that may go unanswered). Let me know if you have any questions.