Logstash fails just before or during scheduled snapshot

rferebee · Post by **rferebee** » Wed Nov 21, 2018 10:17 am

Hello,

We've encountered an issue where logstash is failing just before or during our scheduled snapshots, and we're unable to complete snapshots as a result. In the most recent occurrence, logstash failed about 2 hours into our snapshot window, and Log Server is acting like it's still running a snapshot even though the Command Subsystem shows Job Status "Waiting" with a next run time of 1:30AM tomorrow.

Typically we schedule our snapshots to run at 22:30 every day. Recently, our snapshots seem to be running whenever they feel like it and won't complete until well into the following day. If logstash fails after our snapshot starts then the Command Subsystem will say that the snapshot started right when logstash failed.

Right now, my entire interface is locked up because I clicked on Snapshots & Maintenance, which usually only happens when a snapshot is still running. This has been going on for several weeks and our snapshots have been extremely sporadic. We're at the point where we're losing data because our indexes are no longer overlapping in the snapshots.

I'm not sure what else I can look at to figure out what's going on. Thank you.

rferebee · Post by **rferebee** » Wed Nov 21, 2018 10:30 am

I should add some information about our setup:

2 VMs running in a cluster (LS1 and LS2)

6 CPUs per VM
64GBs of RAM per VM
6TB of storage per VM

Our Log Server repository has 62TB of storage with 10TB free (we recently increased this storage space when we found out we could no longer create a snapshot).

Currently, each one of our snapshots is roughly 3TB. We snapshot 20 indexes at about 150GB per index.

Post by **cdienger** » Wed Nov 21, 2018 11:10 am

It sounds like you may be running into the problem described in https://support.nagios.com/kb/article/n ... g-576.html. Go through the doc and make the changes to both NLS machines.

rferebee · Post by **rferebee** » Wed Nov 21, 2018 11:43 am

I will make the changes and update you accordingly. Thank you very much for the prompt reply!

Post by **cdienger** » Wed Nov 21, 2018 12:13 pm

No problem! We'll be here (except after 2 today and the rest of the week) waiting for the results.

rferebee · Post by **rferebee** » Mon Nov 26, 2018 10:56 am

Ok, so this seems to have resolved our issue.

I was wondering if you could elaborate on what those two variables control, though? It looks like we're getting snapshots, but each one has drastically reduced its disk consumption for whatever reason. Each snapshot looks like it's only a few hundred GBs since making this change.

Post by **cdienger** » Mon Nov 26, 2018 4:07 pm

Glad to hear. The memory option allocates more memory to the logstash java process (it starts logstash with the java option "-Xmx1000m"). The log file option allows the OS to open more files at a time. It's more likely the memory option resolved the issue, and the large snapshots could be the result of incomplete snapshots. These will clear out on their own over time per the maintenance settings.
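For anyone who wants to sanity-check those two settings on their own box, here is a minimal Python sketch. It is not part of Nagios Log Server; `parse_xmx` and `open_file_limits` are hypothetical helper names made up for illustration. The first converts a JVM heap option such as "-Xmx1000m" into bytes (the JVM treats k, m and g as multiples of 1024), and the second reads the open-file limits of the current process through the Unix-only `resource` module.

```python
import resource  # standard library, available on Unix-like systems only

# JVM size suffixes are binary multiples (k = 1024, m = 1024^2, g = 1024^3).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx(option: str) -> int:
    """Return the maximum heap size in bytes for an option like '-Xmx1000m'."""
    prefix = "-Xmx"
    if not option.startswith(prefix) or len(option) == len(prefix):
        raise ValueError(f"not a usable -Xmx option: {option!r}")
    value = option[len(prefix):]
    suffix = value[-1].lower()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # no suffix means the value is already in bytes


def open_file_limits() -> tuple:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("heap ceiling in bytes:", parse_xmx("-Xmx1000m"))
    print("open-file limits (soft, hard):", open_file_limits())
```

Note that `open_file_limits()` reports the limits of the Python process that runs it, not of the running logstash service, so treat it only as a quick look at what the shell or user account is allowed.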

rferebee · Post by **rferebee** » Tue Nov 27, 2018 10:58 am

Ok, thanks for the information.

It's my understanding that a snapshot is a backup of all the open indexes, up to whatever number the user specifies, in my case 20. If each of my indexes is 150GBs, shouldn't my snapshot be around 3TBs? Or is there some type of optimization going on which reduces the size of the indexes when not in use that I'm not aware of?

Post by **cdienger** » Tue Nov 27, 2018 3:02 pm

A snapshot is a diff from the previous snapshot.
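As a rough illustration of why the sizes drop, here is a toy Python model using the numbers from this thread (20 indexes at about 150GB each). It assumes each index is a single unchanging blob, which is a simplification, and the index names and the `gb_to_copy` helper are made up for the example.

```python
def gb_to_copy(already_in_repo: set, snapshot: dict) -> int:
    """Sum the sizes (in GB) of indexes that the repository does not hold yet."""
    return sum(size for name, size in snapshot.items()
               if name not in already_in_repo)


# Hypothetical daily indexes, 150 GB each, keeping the newest 20.
day1 = {f"logstash-day{n:02d}": 150 for n in range(1, 21)}   # day01..day20
day2 = {f"logstash-day{n:02d}": 150 for n in range(2, 22)}   # day02..day21

first_run = gb_to_copy(set(), day1)        # empty repository: everything is new
second_run = gb_to_copy(set(day1), day2)   # only day21 is missing from the repo

print(first_run)   # 3000
print(second_run)  # 150
```

In this simplified model the first snapshot copies the full 3TB, while the next one only adds the one index that was not in the repository yet. Real numbers will differ, since indexes that are still being written to change between runs.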

rferebee · Post by **rferebee** » Tue Nov 27, 2018 4:11 pm

Oh, so it's incremental. Gotcha.
