Re: Cluster failure and UDP syslogs
Posted: Fri Jul 29, 2016 9:43 am
by rkennedy
How much data do you have incoming per day split between the 3 machines? Also, are you using local disks or NAS / SAN attached mounts?
I have a theory that you ended up hitting a file descriptor limit, which in turn caused the machine to fall out of sync with the cluster, and since resources became unavailable it didn't know what to do. It's hard to say for sure, though, since everything is working at this point.
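If you want to check the file descriptor theory on a node, here is a minimal Python sketch, assuming a Linux host where `/proc/<pid>/fd` and `/proc/<pid>/limits` are readable by the user running it. The helper names `parse_max_open_files` and `fd_usage` are made up for illustration and are not part of any product.

```python
import os


def parse_max_open_files(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft_limit = parse_max_open_files(fh.read())
    return open_fds, soft_limit
```

Calling `fd_usage(pid)` with the PID of the Elasticsearch or Logstash process returns how many descriptors are open next to the soft limit. A count sitting close to the limit would support the theory; a count far below it would not rule it out, since usage may only spike under load.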
Re: Cluster failure and UDP syslogs
Posted: Fri Jul 29, 2016 10:26 am
by CFT6Server
I think adjusting the limit is a good start. We have quite a large number of inputs and are probably pushing the limits a bit. Here are some details. Last night it looks like the inputs stopped again, although there are no logs on the Elasticsearch or Logstash side. (I am still looking through the nodes.) It just seems that certain nodes stop taking any logs. Cluster health in this case was still green, so it's slightly different, but I am guessing that's how it starts, perhaps? I also noticed that our local configurations are all gone. (This consistently happens after a crash.) So the local file input configurations are just nowhere to be found.
Overall statistics
index status.JPG
Indices: we should be doing anywhere from 160 to 200 GB or so per day on average. Anything less than that means logs are being dropped or something isn't working. Notice the 22nd to the 26th; that's where the cluster hard crashed.
indices.JPG
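The 160 to 200 GB per day expectation can be turned into a quick check for short days. This is only a sketch: the function name `low_volume_days` is hypothetical, and the sizes in `sample` are made-up illustration values, not the real figures from the screenshot.

```python
EXPECTED_MIN_GB = 160  # lower bound of the 160 to 200 GB/day range quoted above


def low_volume_days(daily_gb, minimum=EXPECTED_MIN_GB):
    """Return the dates whose index size is under the expected minimum, sorted."""
    return sorted(day for day, size in daily_gb.items() if size < minimum)


# Made-up sizes purely for illustration; these are NOT the real index sizes.
sample = {
    "2016.07.21": 182.0,
    "2016.07.22": 41.5,
    "2016.07.23": 0.0,
    "2016.07.27": 171.3,
}
print(low_volume_days(sample))
```

Any date the function returns is a day where logs were likely dropped or an input stopped, going by the rule of thumb above.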
Re: Cluster failure and UDP syslogs
Posted: Fri Jul 29, 2016 1:55 pm
by rkennedy
Increasing those limits won't hurt, and it will help us see whether that's the cause if this happens again.
CFT6Server wrote: I also noticed that our local configurations are all gone. (This consistently happens after a crash) So the local file input configurations is just no where to be found.
Which configurations are you referring to?
CFT6Server wrote: Indices (we should be doing anywhere from 160 to 200G or so average per day) anything less than that means logs are dropped or something isn't working. Notice 22nd to 26th, that's where the cluster hard crashed.
Can you post a screenshot of your backup & maintenance page(s) (all pages if they are different between machines)? With this much data, I have a feeling that's part of the culprit as well.
Another thought - is there a reason you're sending logs to only 3 of the 6 members?
Re: Cluster failure and UDP syslogs
Posted: Thu Aug 04, 2016 3:42 pm
by CFT6Server
The local configurations that are node specific. They don't seem to stick.
Our backup and maintenance settings are the same for all the nodes in the cluster.
backup and maintenance.JPG
We are only sending to 3 nodes because the other 3 were not going to be permanent when we first implemented this. However, since there isn't a native way to load balance the sources across all nodes, we are sending to nodes by source type, so each type goes to one node.
Re: Cluster failure and UDP syslogs
Posted: Thu Aug 04, 2016 4:43 pm
by rkennedy
CFT6Server wrote: The local configurations that are node specific. They don't seem to stick.
Could you please clarify which configuration you're talking about? Just trying to understand what part of the local configuration you're referring to.
Has increasing those limits stopped the error, or does it still persist?
Re: Cluster failure and UDP syslogs
Posted: Fri Aug 05, 2016 12:08 pm
by CFT6Server
These are the local configurations (per instance), where you can specify inputs specific to the local node.
CONFIG.JPG
I have not increased the file descriptors yet, but I have not seen any issues with the cluster thus far.
Re: Cluster failure and UDP syslogs
Posted: Fri Aug 05, 2016 1:01 pm
by scottwilkerson
CFT6Server wrote: (I am still looking through the nodes) Just seems that certain nodes just stop taking any logs.
I'm going to throw this into the mix: with this volume of data coming into 3 instances, you may want to bump up the heap allocation for Logstash by editing
change this
to something like this
then
Re: Cluster failure and UDP syslogs
Posted: Fri Aug 05, 2016 2:36 pm
by CFT6Server
Thanks. For our implementation, I have the LS heap set to 1024m, but I'll increase it. I edited the config in /etc/sysconfig/logstash.
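To confirm what heap value is actually in effect in that file, here is a minimal sketch. It assumes the heap is set through an `LS_HEAP_SIZE=...` line, which is an assumption about this install rather than something shown in the thread, and `read_heap_setting` is a hypothetical helper name.

```python
def read_heap_setting(config_text, key="LS_HEAP_SIZE"):
    """Return the last uncommented value assigned to key, or None if unset.

    The key name LS_HEAP_SIZE is an assumption; adjust it to match the file.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            # Later assignments override earlier ones, as in a sourced shell file.
            value = rest.strip().strip('"').strip("'")
    return value
```

Passing the contents of /etc/sysconfig/logstash to `read_heap_setting` returns a string such as `1024m`, or `None` when the line is missing or still commented out, in which case the service would be running with its default heap.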
Re: Cluster failure and UDP syslogs
Posted: Mon Aug 08, 2016 9:50 am
by rkennedy
Did that help, or are you still experiencing issues?
Re: Cluster failure and UDP syslogs
Posted: Thu Aug 18, 2016 10:26 am
by CFT6Server
We did not change the setting. The LS heap was already at 1024m.