Cluster failure and UDP syslogs

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Cluster failure and UDP syslogs

Post by rkennedy »

How much data do you have incoming per day split between the 3 machines? Also, are you using local disks or NAS / SAN attached mounts?

My theory is that you hit a file descriptor limit, which in turn caused the machine to fall out of sync with the cluster; once resources became unavailable, it didn't know what to do. It's hard to say, though, since everything is working at this point.
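One way to test the file descriptor theory is to compare how many descriptors a process has open against its soft limit. The sketch below is only an illustration for Linux hosts: the function name `fd_usage` is made up for this example, and it relies on `/proc/<pid>/fd` and `/proc/<pid>/limits` being readable.

```shell
#!/bin/sh
# Report open file descriptors versus the soft limit for a given PID.
# Assumption: a Linux host where /proc/<pid>/fd and /proc/<pid>/limits exist.
fd_usage() {
    pid="$1"
    # Count the entries in the process's fd directory (0 if unreadable).
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # The soft limit is the fourth field of the "Max open files" row.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    echo "pid=$pid open=$open limit=${limit:-unknown}"
}

# Example: inspect the current shell itself.
fd_usage "$$"
```

To look at the log server processes instead, something like `for p in $(pgrep -f logstash); do fd_usage "$p"; done` should work. The `logstash` pattern is an assumption about how the process appears on your nodes, and reading another user's `/proc` entries usually requires root.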
Former Nagios Employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Cluster failure and UDP syslogs

Post by CFT6Server »

I think adjusting the limit is a good start. We have quite a large number of inputs and are probably pushing the limits a bit. Here are some details. Last night it looks like the inputs stopped again, although there were no logs on the elasticsearch or logstash side. (I am still looking through the nodes.) It just seems that certain nodes stop taking any logs. Cluster health in this case was still green, so it's slightly different, but I am guessing that's perhaps how it starts? I also noticed that our local configurations are all gone. (This consistently happens after a crash.) The local file input configurations are just nowhere to be found.

Overall statistics
index status.JPG
Indices: we should be averaging anywhere from 160 to 200G or so per day; anything less than that means logs are being dropped or something isn't working. Notice the 22nd to the 26th; that's where the cluster hard crashed.
indices.JPG
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Cluster failure and UDP syslogs

Post by rkennedy »

Increasing those limits won't hurt, and it will help us see whether the same thing happens again in the future.
CFT6Server wrote: I also noticed that our local configurations are all gone. (This consistently happens after a crash.) The local file input configurations are just nowhere to be found.
Which configurations are you referring to?
CFT6Server wrote: Indices: we should be averaging anywhere from 160 to 200G or so per day; anything less than that means logs are being dropped or something isn't working. Notice the 22nd to the 26th; that's where the cluster hard crashed.
Can you post a screenshot of your backup & maintenance page(s) (all pages if they are different between machines)? With this much data, I have a feeling that's part of the culprit as well.

Another thought - is there a reason you're sending logs to only 3 of the 6 members?
Former Nagios Employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Cluster failure and UDP syslogs

Post by CFT6Server »

I mean the local configurations that are node-specific. They don't seem to stick.

Our backup and maintenance settings are the same for all the nodes in the cluster.
backup and maintenance.JPG
We are only sending to 3 nodes because the other 3 were not going to be permanent when we first implemented this. However, since there isn't a native way to load balance the sources across all nodes, we are sending to nodes by source type, so one type goes to one node.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Cluster failure and UDP syslogs

Post by rkennedy »

CFT6Server wrote: The local configurations that are node specific. They don't seem to stick.
Could you please clarify which configuration you're talking about? I'm just trying to understand which part of the local configuration you're referring to.

Has increasing those limits stopped the error from recurring, or has it persisted?
Former Nagios Employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Cluster failure and UDP syslogs

Post by CFT6Server »

These are the local configurations (per instance), where you can specify inputs specific to the local node.
CONFIG.JPG
I have not increased the file descriptor limit yet, but I have not seen any issues with the cluster thus far.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster failure and UDP syslogs

Post by scottwilkerson »

CFT6Server wrote: (I am still looking through the nodes) Just seems that certain nodes just stop taking any logs.
I'm going to throw this into the mix: with this volume of data coming into 3 instances, you may want to bump up the heap allocation for logstash by editing


change this

Code:

#LS_HEAP_SIZE="256m"
to something like this

Code:

LS_HEAP_SIZE="2048m"
then

Code:

service logstash restart
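Before restarting, it can help to confirm that the setting is actually active rather than still commented out. The following is a minimal sketch, not an official tool: the function name `ls_heap_size` is made up for this example, and the path `/etc/sysconfig/logstash` is the one mentioned elsewhere in this thread, so treat it as an assumption for other installs.

```shell
#!/bin/sh
# Print the active (uncommented) LS_HEAP_SIZE value from a sysconfig-style
# file, or "unset" when the line is missing, commented out, or unreadable.
ls_heap_size() {
    file="$1"
    # Keep only uncommented assignments; if several exist, report the last.
    value=$(grep -E '^[[:space:]]*LS_HEAP_SIZE=' "$file" 2>/dev/null \
        | tail -n 1 | cut -d= -f2 | tr -d '"')
    echo "${value:-unset}"
}

# Assumed path, taken from this thread; adjust for your install.
ls_heap_size /etc/sysconfig/logstash
```

With the commented-out default line shown above, this prints `unset`; once the line is uncommented and edited, it prints the configured value.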
Former Nagios employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Cluster failure and UDP syslogs

Post by CFT6Server »

Thanks. For our implementation, I have the LS heap set to 1024m, but I'll increase it. I edited the config in /etc/sysconfig/logstash.
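To double-check that a heap value in the config file is what the running process actually received, one option is to look for the `-Xmx` flag on the JVM command line. This is a rough sketch under the assumption that logstash's JVM is started with an `-Xmx` flag derived from the heap setting; the function name `xmx_from_args` is made up for this example.

```shell
#!/bin/sh
# Extract the -Xmx value from a JVM command line passed as one string.
# Prints "none" when no -Xmx flag is present; if several are present,
# the last one is reported.
xmx_from_args() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Xmx//p' | tail -n 1)
    echo "${value:-none}"
}

# Example with a made-up command line (not captured from a real node):
xmx_from_args "java -Xms256m -Xmx1024m -jar example.jar"
```

On a live node, the command line could be fed in with something like `xmx_from_args "$(ps -o args= -p "$pid")"`, where `$pid` is the logstash process ID you have looked up yourself.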
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Cluster failure and UDP syslogs

Post by rkennedy »

Did that help, or are you still experiencing issues?
Former Nagios Employee
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: Cluster failure and UDP syslogs

Post by CFT6Server »

We did not change the setting. The LS heap was already at 1024m.
Locked