Cluster failure and UDP syslogs
Re: Cluster failure and UDP syslogs
How much data do you have incoming per day split between the 3 machines? Also, are you using local disks or NAS / SAN attached mounts?
I have a theory that you hit a file descriptor limit, which in turn caused the machine to fall out of sync with the cluster, and since resources became unavailable it didn't know what to do. It's hard to say, though, since everything is working at this point.
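One way to test the file descriptor theory the next time a node stops taking logs is to compare a process's open descriptors against its soft limit. This is only a rough sketch: it assumes Linux with /proc available and bash, and the function names fd_percent and fd_usage are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a process's open file descriptors with its soft limit.
# Assumes Linux (/proc) and permission to read the target process's entries.

# Integer percentage of the limit in use; prints "n/a" when the limit
# is empty, zero, or non-numeric (for example "unlimited").
fd_percent() {
    local open="$1" limit="$2"
    case "$limit" in
        ''|*[!0-9]*) echo "n/a" ;;
        0) echo "n/a" ;;
        *) echo $(( open * 100 / limit )) ;;
    esac
}

# One summary line for a given PID.
fd_usage() {
    local pid="$1" open limit
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/${pid}/fd" 2>/dev/null | wc -l | tr -d ' ')
    # In the "Max open files" row, the 4th field is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/${pid}/limits" 2>/dev/null)
    echo "pid=${pid} open=${open} soft_limit=${limit:-unknown} used_pct=$(fd_percent "$open" "$limit")"
}
```

Something like `for pid in $(pgrep -f logstash); do fd_usage "$pid"; done` would then report on the matching processes (the process name pattern is an assumption; adjust it for elasticsearch as needed). A value close to 100 right before the inputs stop would support the theory.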
Former Nagios Employee
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Cluster failure and UDP syslogs
I think trying to adjust the limit is a good start. We have quite a large number of inputs and are probably pushing the limits a bit. Here are some details. Last night it looks like the inputs stopped again, although there are no logs on the elasticsearch or logstash side. (I am still looking through the nodes) It just seems that certain nodes stop taking any logs. Cluster health in this case was still green, so slightly different, but I am guessing that's how it starts, perhaps? I also noticed that our local configurations are all gone. (This consistently happens after a crash) So the local file input configurations are just nowhere to be found.
Overall statistics: Indices (we should be doing anywhere from 160 to 200G or so on average per day); anything less than that means logs are dropped or something isn't working. Notice the 22nd to the 26th, that's where the cluster hard crashed.
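For spotting the low days without reading the chart, a small filter over per-index sizes could flag any daily index under the expected volume. This is a sketch under assumptions: the function name flag_low_days is hypothetical, input is expected as "index-name size-in-bytes" lines on stdin, and the 160G floor is taken from the figure quoted above.

```shell
#!/usr/bin/env bash
# Sketch: print every index whose size is below a threshold given in GiB.
# Input lines on stdin: "<index name> <size in bytes>".

flag_low_days() {
    local threshold_gb="$1"
    awk -v min="$threshold_gb" '
        { gb = $2 / (1024 * 1024 * 1024) }          # bytes -> GiB
        gb < min { printf "%s %.1fG\n", $1, gb }    # only report short days
    '
}
```

The input could come from the Elasticsearch cat API, for example `curl -s 'localhost:9200/_cat/indices/logstash-*?h=index,store.size&bytes=b' | flag_low_days 160`. The host, port, and logstash-* index pattern are assumptions about this setup, and store.size includes replicas, so the threshold may need adjusting if the 160 to 200G figure refers to primary data only.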
Re: Cluster failure and UDP syslogs
Increasing those limits won't hurt, and it will help us out to see if that's the same case in the future.
Another thought - is there a reason you're sending logs to only 3 of the 6 members?
CFT6Server wrote: I also noticed that our local configurations are all gone. (This consistently happens after a crash) So the local file input configurations is just no where to be found.
Which configurations are you referring to?
CFT6Server wrote: Indices (we should be doing anywhere from 160 to 200G or so average per day) anything less than that means logs are dropped or something isn't working. Notice 22nd to 26th, that's where the cluster hard crashed.
Can you post a screenshot of your backup & maintenance page(s) (all pages if they are different between machines)? With this much data, I have a feeling that's part of the culprit as well.
Former Nagios Employee
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Cluster failure and UDP syslogs
The local configurations that are node specific. They don't seem to stick.
Our backup and maintenance settings are the same for all the nodes in the cluster. We are only sending to 3 nodes because the other 3 were not going to be permanent when we first implemented. However, since there isn't a native way to load balance the sources across all nodes, we are sending to nodes by source type. So one type goes to one node.
Re: Cluster failure and UDP syslogs
CFT6Server wrote: The local configurations that are node specific. They don't seem to stick.
Could you please clarify which configuration you're talking about? Just trying to understand what part of the local configuration you're referring to.
Has increasing those limits stopped the error, or does it still persist?
Former Nagios Employee
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Cluster failure and UDP syslogs
These are the local configurations (per instance), where you can specify inputs specific to the local node.
I have not increased the file descriptors yet, but I have not seen any issues with the cluster thus far.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Cluster failure and UDP syslogs
CFT6Server wrote: (I am still looking through the nodes) Just seems that certain nodes just stop taking any logs.
I'm going to throw this into the mix: with this volume of data coming into 3 instances, you may want to bump up the heap allocation for logstash by editing
change this
    #LS_HEAP_SIZE="256m"
to this
    LS_HEAP_SIZE="2048m"
and then restart logstash
    service logstash restart
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Cluster failure and UDP syslogs
Thanks. For our implementation, I have the LS heap set to 1024m, but I'll increase it. I edited the config in /etc/sysconfig/logstash.
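To confirm which heap value is actually active in that file (as opposed to a commented-out default), a quick check along these lines might help. It is only a sketch: the function name ls_heap_setting is hypothetical, and it assumes the file uses plain LS_HEAP_SIZE="..." assignments like the ones quoted earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch: print the active (uncommented) LS_HEAP_SIZE value from a
# sysconfig-style file. Prints nothing if only commented lines exist.

ls_heap_setting() {
    local file="${1:-/etc/sysconfig/logstash}"
    # Lines starting with "#" do not match; if the variable is assigned
    # more than once, the last assignment is the one reported.
    sed -n 's/^[[:space:]]*LS_HEAP_SIZE=//p' "$file" | tail -n 1 | tr -d '"'
}
```

Running `ls_heap_setting` with no argument reads /etc/sysconfig/logstash. Empty output would mean the setting is still commented out and logstash is running with whatever default it ships with.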
Re: Cluster failure and UDP syslogs
Did that help, or are you still experiencing issues?
Former Nagios Employee
-
CFT6Server
- Posts: 506
- Joined: Wed Apr 15, 2015 4:21 pm
Re: Cluster failure and UDP syslogs
We did not change the setting. The LS heap was already at 1024m.