Logstash fails just before or during scheduled snapshot

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Logstash fails just before or during scheduled snapshot

Post by rferebee »

Hello,

We've encountered an issue where Logstash fails just before or during our scheduled snapshots, and as a result we're unable to complete them. In the most recent occurrence, Logstash failed about 2 hours into our snapshot window, and Log Server is acting as if it's still running a snapshot even though the Command Subsystem shows Job Status "Waiting" with a next run time of 1:30AM tomorrow.

Typically we schedule our snapshots to run at 22:30 every day. Recently, our snapshots seem to be running whenever they feel like it and won't complete until well into the following day. If logstash fails after our snapshot starts then the Command Subsystem will say that the snapshot started right when logstash failed.

Right now, my entire interface is locked up because I clicked on Snapshots & Maintenance, which usually only happens when a snapshot is still running. Our snapshots have been extremely sporadic for several weeks. We're at the point where we're losing data because the indexes in our snapshots no longer overlap.

I'm not sure what else I can look at to figure out what's going on. Thank you.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Logstash fails just before or during scheduled snapshot

Post by rferebee »

I should add some information about our setup:

2 VMs running in a cluster (LS1 and LS2)

6 CPUs per VM
64GB of RAM per VM
6TB of storage per VM

Our Log Server repository has 62TB of storage with 10TB free (we recently increased this storage space when we found out we could no longer create a snapshot).

Currently, each one of our snapshots is roughly 3TB. We snapshot 20 indexes at about 150GB per index.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Logstash fails just before or during scheduled snapshot

Post by cdienger »

It sounds like you may be running into the problem described in https://support.nagios.com/kb/article/n ... g-576.html. Go through the doc and make the changes to both NLS machines.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Logstash fails just before or during scheduled snapshot

Post by rferebee »

I will make the changes and update you accordingly. Thank you very much for the prompt reply!
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Logstash fails just before or during scheduled snapshot

Post by cdienger »

No problem! We'll be here (except after 2 today and the rest of the week :) ) waiting for the results
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Logstash fails just before or during scheduled snapshot

Post by rferebee »

Ok, so this seems to have resolved our issue.

I was wondering if you could elaborate on what those two variables control, though? We're getting snapshots, but each one now uses drastically less disk space for whatever reason. Each snapshot looks like it's only a few hundred GB since making this change.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Logstash fails just before or during scheduled snapshot

Post by cdienger »

Glad to hear. The memory option allocates more memory to the Logstash Java process (it starts Logstash with the Java option "-Xmx1000m"). The log file option allows the OS to open more files at a time. It's more likely that the memory option resolved the issue, and the large snapshots could be the result of incomplete snapshots. These will clear out on their own over time per the maintenance settings.
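
To make the two settings a bit more concrete, here is a minimal Python sketch. It is only an illustration, not something taken from the KB article: the helper names `parse_xmx_bytes` and `open_file_limits` are made up for this example. The first converts a "-Xmx" value into bytes, and the second reads the open-file limits of the process running the script (Unix only), which are not necessarily the limits the Logstash service runs with.

```python
import re
import resource  # standard library, available on Unix-like systems only


def parse_xmx_bytes(java_opts):
    """Return the maximum heap size in bytes from a -Xmx flag, or None if absent.

    Hypothetical helper for illustration; it handles the plain k/m/g suffixes.
    """
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    multiplier = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}[unit]
    return value * multiplier


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(parse_xmx_bytes("-Xmx1000m"))  # 1048576000 bytes, a little under 1 GiB
print(open_file_limits())            # values depend on the machine
```

If the heap value is too small, the Java process can run out of memory under load, which would fit with Logstash dying during the snapshot window.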
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Logstash fails just before or during scheduled snapshot

Post by rferebee »

Ok, thanks for the information.

It's my understanding that a snapshot is a backup of all the open indexes, up to whatever number the user specifies, in my case 20. If each of my indexes is 150GB, shouldn't my snapshot be around 3TB? Or is there some type of optimization going on that reduces the size of the indexes when not in use that I'm not aware of?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Logstash fails just before or during scheduled snapshot

Post by cdienger »

A snapshot is a diff from the previous snapshot.
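
As a rough illustration of why the sizes drop, here is a toy Python model using the numbers from this thread (20 indexes at about 150GB each). It assumes one new index per day and treats an index that is already in the repository as unchanged, which is a simplification; the function name `incremental_snapshot_sizes` is made up for the example.

```python
def incremental_snapshot_sizes(days, index_size_gb, indexes_per_snapshot):
    """Toy model of incremental snapshots (an assumption-laden sketch).

    Each day the snapshot covers the newest `indexes_per_snapshot` daily
    indexes, and only indexes not already in the repository are copied.
    """
    repository = set()   # indexes already stored in the snapshot repository
    copied_per_day = []
    for day in range(days):
        # Indexes are numbered by day; the window slides forward one per day.
        covered = range(day, day + indexes_per_snapshot)
        new_indexes = [i for i in covered if i not in repository]
        repository.update(new_indexes)
        copied_per_day.append(len(new_indexes) * index_size_gb)
    return copied_per_day


print(incremental_snapshot_sizes(days=3, index_size_gb=150, indexes_per_snapshot=20))
# [3000, 150, 150]
```

In this model only the first snapshot copies the full 3TB; each later one copies roughly one day's worth of new data, which lines up with snapshots of a few hundred GB.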
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Logstash fails just before or during scheduled snapshot

Post by rferebee »

Oh, so it's incremental. Gotcha.