Hello,
We've encountered an issue where logstash is failing just before or during our scheduled snapshots, and we're unable to complete snapshots as a result. In the most recent occurrence, logstash failed about 2 hours into our snapshot window, and Log Server is acting like it's still running a snapshot even though the Command Subsystem shows Job Status "Waiting" with a next run time of 1:30AM tomorrow.
Typically we schedule our snapshots to run at 22:30 every day. Recently, our snapshots seem to be running whenever they feel like it and won't complete until well into the following day. If logstash fails after our snapshot starts then the Command Subsystem will say that the snapshot started right when logstash failed.
Right now, my entire interface is locked up because I clicked on Snapshots & Maintenance, which usually only happens when a snapshot is still running. This has been going on for several weeks, and our snapshots have been extremely sporadic. We're at the point where we're losing data because our indexes are no longer overlapping in the snapshots.
I'm not sure what else I can look at to figure out what's going on. Thank you.
Logstash fails just before or during scheduled snapshot
Re: Logstash fails just before or during scheduled snapshot
I should add some information about our setup:
2 VMs running in a cluster (LS1 and LS2)
6 CPUs per VM
64GB of RAM per VM
6TB of storage per VM
Our Log Server repository has 62TB of storage with 10TB free (we recently increased this storage space when we found out we could no longer create a snapshot).
Currently, each one of our snapshots is roughly 3TB. We snapshot 20 indexes at about 150GB per index.
Re: Logstash fails just before or during scheduled snapshot
It sounds like you may be running into the problem described in https://support.nagios.com/kb/article/n ... g-576.html. Go through the doc and make the changes to both NLS machines.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Logstash fails just before or during scheduled snapshot
I will make the changes and update you accordingly. Thank you very much for the prompt reply!
Re: Logstash fails just before or during scheduled snapshot
No problem! We'll be here (except after 2 today and the rest of the week) waiting for the results.
Re: Logstash fails just before or during scheduled snapshot
Ok, so this seems to have resolved our issue.
I was wondering if you could elaborate on what those two variables control, though? It looks like we're getting snapshots, but each one has drastically reduced its disk consumption for whatever reason. Each snapshot looks like it's only a few hundred GB since making this change.
Re: Logstash fails just before or during scheduled snapshot
Glad to hear. The memory option allocates more memory to the logstash java process (it starts logstash with the java option "-Xmx1000m"). The log file option allows the OS to open more files at a time. It's more likely the memory option resolved the issue, and the large snapshots could be the result of incomplete snapshots. These will clear out on their own over time per the maintenance settings.
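If you want to see what the open files limit looks like on a box, here is a minimal Python sketch using the standard `resource` module. Note the assumptions: it reports the limit for the process that runs the script (your shell or user), which may differ from the limit the logstash service actually starts with, and the helper names `open_file_limits` and `describe` are made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one a process actually hits; the hard limit is
    # the ceiling the soft limit can be raised to without extra privileges.
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Run it as the same user the service runs as to get the closest comparison; it does not change any limit, it only reads them.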
Re: Logstash fails just before or during scheduled snapshot
Ok, thanks for the information.
It's my understanding that a snapshot is a backup of all the open indexes up to whatever number the user specifies, in my case 20. If each of my indexes is 150GB, shouldn't my snapshot be around 3TB? Or is there some type of optimization going on that reduces the size of the indexes when not in use that I'm not aware of?
Re: Logstash fails just before or during scheduled snapshot
A snapshot is a diff from the previous snapshot.
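To put rough numbers on that, here is a toy Python model, not how the product computes anything internally. It assumes a snapshot only copies files that are not already in the repository, uses the 20 indexes at about 150GB each from this thread, and treats each index as a single file with a made-up name. The function `incremental_snapshot_size` is hypothetical.

```python
def incremental_snapshot_size(repo_files, index_files):
    """Return (size copied, updated repository) for one snapshot run.

    Both arguments map a file name to its size in GB. Only files whose
    names are not already in the repository are copied.
    """
    new_files = {
        name: size for name, size in index_files.items()
        if name not in repo_files
    }
    return sum(new_files.values()), {**repo_files, **new_files}


# Night 1: empty repository, 20 indexes at 150 GB each (hypothetical names).
night1 = {f"index-{i:02d}/seg0": 150 for i in range(20)}
copied1, repo = incremental_snapshot_size({}, night1)

# Night 2: the oldest index rolls off and one new index appears;
# the other 19 are unchanged and already in the repository.
night2 = {f"index-{i:02d}/seg0": 150 for i in range(1, 21)}
copied2, repo = incremental_snapshot_size(repo, night2)

print(copied1, copied2)  # 3000 150
```

In this model the first snapshot copies the full 3TB and each later one copies only what changed, which is in line with snapshots of a few hundred GB once the repository already holds the older indexes.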
Re: Logstash fails just before or during scheduled snapshot
Oh, so it's incremental. Gotcha.