Snapshots stopped

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia
Contact:

Snapshots stopped

Post by Fred Kroeger »

I'm running a 2-node NLS cluster (recently upgraded to NLS 1.4.0) and noticed that the snapshots stopped being generated on 2/1/16.
In the Snapshot Table (in the Backup & Maint screen) the names of all the existing snapshots (from 7/12/15 to 2/1/16) are showing as N/A, which I'm pretty sure was not the case last year.
I did notice at the start of the year that snapshots had been created that had a date of 2014 so something happened around then I believe. I have deleted these since then.
The only thing I have done differently from the doco is that the Repository is local to each node - I haven't set up a shared filesystem (NFS) as I don't have another server with enough disk to act as the Repository.
Having said that, it had been working fine for almost a month before it stopped.
Any tips on troubleshooting the snapshots?
regards... Fred
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Snapshots stopped

Post by Box293 »

Fred Kroeger wrote:The only thing I have done differently from the doco is that the Repository is local to each node - I haven't setup a shared filesystem (NFS) as I don't have another server with enough disk to act as the Repository.
This needs to be a shared repository, as either node can end up performing the job. I don't know the logic behind how it chooses the node. The snapshots stopping most likely coincided with a reboot of one of the nodes.

I'll get another tech to chime in on this too.

Re: Snapshots stopped

Post by Fred Kroeger »

Thanks Troy
Yes I understand why the shared filesystem is required, which is why I tried it first as a local repository. It appears to choose the master node to run the snapshots on.
As it was all working, I was happy to leave it as it was knowing that I could only restore a snapshot from a single node.
regards... Fred
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Snapshots stopped

Post by jolson »

I haven't setup a shared filesystem (NFS) as I don't have another server with enough disk to act as the Repository.
The logic regarding the node that 'picks up' the backup job is random - it can be either node. It's very likely that you have had many backup jobs fail due to this, and the one node with the backup system would 'catch up' when it randomly received the backup job. I highly recommend setting up NFS (even on one of your NLS nodes) so that either node can perform the backup properly - the system can throw spurious errors without proper backup mounts in place.

Now, your snapshots stopping could be related to this - or it might not be. I take it that the snapshots stopped working when you upgraded to 1.4.0? Try this:

Code:

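# Move the existing snapshot data aside so the next backup job starts clean
cd /your/backup/directory
mkdir oldbackups
# mv will complain that it cannot move oldbackups into itself; everything else is still moved
mv * oldbackups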
Then re-run your backup job from the GUI (Administration -> Command Subsystem -> Backup/Maintenance). When the backup job runs, a random node will pick it up and attempt to run it - hopefully your master node (or if you've set up a shared repository, either node is fine).

Let me know if this helps - occasionally the old backup system needs to be moved to make way for the new one - but not all of the time.
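For the shared-repository recommendation above, one way to confirm that a node really sees the backup directory over NFS is to look it up in the node's mount table. The following is only a minimal sketch: the function name `is_nfs_mount` is hypothetical, it assumes a Linux node where `/proc/mounts` is available, and it matches the exact mount point only (not subdirectories of it). On the node that exports the share the directory is local, so the check is only meaningful on the node that mounts it.

```shell
# Hypothetical helper (not part of NLS): succeed if the given mount point is
# listed with an NFS filesystem type in a mounts table.
# Table format (as in /proc/mounts): device mountpoint fstype options ...
is_nfs_mount() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"   # second argument allows testing against a sample file
    awk -v mp="$mount_point" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, `is_nfs_mount /your/backup/directory && echo shared` could be run on the node that mounts the share before re-running the backup job.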
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.

Re: Snapshots stopped

Post by Fred Kroeger »

Rather than test unsupported processes, I have now created an NFS share on one of the nodes and the other node has mounted it OK.
I moved the old backups as suggested - I can see the oldbackups folder on both nodes OK and the Snapshots table in the Backup/Maintenance screen now has no entries.
I ran the Backup/Maintenance command and the table now looks like the attached screenshot. What is strange is that there is an entry for 2015.01.07 - the server wasn't built until December 2015 - and only one index is listed as successful.
Looking at the Repository indices folder that was created, that rogue entry is shown in there now:

Code:

# ls -la repositories/indices/
total 180
drwxr-xr-x 45 nagios users 4096 Jan 18 10:00 .
drwxr-xr-x  4 nagios users 4096 Jan 18 10:09 ..
drwxr-xr-x  7 nagios users 4096 Jan 18 10:06 logstash-2015.01.07
drwxr-xr-x  7 nagios users 4096 Jan 18 10:04 logstash-2015.12.07
drwxr-xr-x  7 nagios users 4096 Jan 18 10:03 logstash-2015.12.08
drwxr-xr-x  7 nagios users 4096 Jan 18 10:04 logstash-2015.12.09
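A rogue entry like the one in the listing above can be picked out automatically by comparing each index date against a cutoff. This is a rough sketch only: the function name `stale_indices` and the cutoff value used in the example are assumptions, and it expects names in the `logstash-YYYY.MM.DD` form shown in the listing.

```shell
# Hypothetical helper: print index names (logstash-YYYY.MM.DD) dated before a cutoff.
# Reads one name per line on stdin; the cutoff is given as YYYY.MM.DD.
stale_indices() {
    cutoff="$1"
    while IFS= read -r name; do
        case "$name" in
            logstash-[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9])
                day="${name#logstash-}"
                # Strip the dots so both dates become YYYYMMDD integers that compare chronologically
                if [ "$(printf '%s' "$day" | tr -d .)" -lt "$(printf '%s' "$cutoff" | tr -d .)" ]; then
                    printf '%s\n' "$name"
                fi
                ;;
        esac
    done
}
```

For example, `ls repositories/indices/ | stale_indices 2015.12.01` would list any index directory older than an assumed build date of 1 December 2015; other directory entries are ignored.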
Snapshot.PNG

Re: Snapshots stopped

Post by jolson »

What is strange is that there is an entry for 2015.01.07 - server wasn't built until December 2015 and only one index is listed as successful.
Is there any chance that this index does exist on your server? Check the 'Administration -> Index Status' page and look for the index in question. If a machine that is sending logs has a date set in the past, it's possible that it is opening old indexes.

Jesse
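To illustrate why a sender with the wrong date opens an old index: assuming the default Logstash naming scheme (a daily `logstash-YYYY.MM.dd` index derived from the event timestamp in UTC), the index name follows directly from the timestamp on the log line. The sketch below shows that mapping; the function name `index_for_timestamp` is hypothetical and it relies on GNU `date` for the `-d` option.

```shell
# Sketch: derive the daily index name an event would land in, assuming the
# default Logstash pattern logstash-YYYY.MM.dd computed from the event time in UTC.
# Requires GNU date (for the -d option).
index_for_timestamp() {
    date -u -d "$1" '+logstash-%Y.%m.%d'
}
```

For example, `index_for_timestamp '2015-01-07 10:00:00 UTC'` gives `logstash-2015.01.07`, so a single event stamped with that date is enough to create or reopen that index.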

Re: Snapshots stopped

Post by Fred Kroeger »

Didn't think it could happen - but it got worse....
Yes there is definitely an index for 07/01/15
index.PNG
However... today in my Snapshot table I have two entries for every index, and last night's snapshot updated the index for 27/12/15?
Snapshot-2.PNG

Re: Snapshots stopped

Post by jolson »

It would be useful to look at the index that was opened in the past so that you may see which server is causing the problem. Use a custom time filter to travel back in time:
2016-01-19 09_17_42-Dashboard • Nagios Log Server.png
Your snapshot subsystem looks fine - we have moved to an incremental backup system, so that snapshots will always be taken of every index regardless of whether or not the data has changed.

The interface is a little messy now, which will be corrected in a future version. Thanks!

Re: Snapshots stopped

Post by Fred Kroeger »

You're right about the messy listing - I now have 3 listings, each containing all the indexes.
So back to the strange index - it's a syslog from the NLS server itself. It appears that there was a 6-minute window where the logs reported the wrong date. This would have coincided with when I upgraded NLS. I'm not sure how or why the date would get changed during the upgrade, but I will delete the index.
Thanks for getting me on the right track.

BTW - when I went to delete the index via the Index Status screen, I noticed a typo at the bottom of the screen - it should be "indices".
Capture.PNG
regards... Fred

Re: Snapshots stopped

Post by jolson »

Good call! I'll get that typo fixed up - thanks. Is this case good to close?