I am trying to figure out why our cluster keeps hanging. I have been bumping the breaker limits in an attempt to work around some of the out-of-memory issues. We have 3 nodes, and each has 16GB of RAM.
I am seeing the following errors while tailing the logs, but I am not sure what they are or how to address them.
Are your nodes physically far apart? What is the average ping time from node to node? We recommend keeping nodes in the same physical location so as to minimize ping time. The timeout you're experiencing is typically due to nodes being unreachable.
If your nodes are in the same datacenter, we could try bumping up the timeout interval - but I'm afraid that this measure would only be a 'bandaid' and not a true solution.
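For a rough number on node-to-node latency, something like the sketch below could be run from each node against the others. It times a TCP connect rather than an ICMP ping, so it also shows whether the port is reachable at all. The function names are made up for illustration, and it assumes the nodes listen on Elasticsearch's default transport port, 9300; adjust if yours differs.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def summarize(host, port, samples=5):
    """Probe a host several times and summarize failures and connect times."""
    times = [tcp_connect_ms(host, port) for _ in range(samples)]
    ok = [t for t in times if t is not None]
    return {
        "host": host,
        "failures": samples - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

Calling `summarize("10.0.0.11", 9300)` (a placeholder address) for each peer should show near-zero failures and low, steady times on a healthy same-subnet cluster. Failures or large spikes would point toward the network rather than Elasticsearch itself.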
TwitsBlog Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
They are in the same datacenter and on the same subnet. Once I reboot all nodes and get them back up and running, this seems to subside. However, I think these are some of the errors that I see around the time the cluster falls over.
The only sign I usually get is visibly less activity on all nodes in XI. When I check the nodes, the web GUI just hangs, and that is usually when I find the cluster at red or a node has fallen off.
That's crummy (and surprising) - I'll suggest to the devs that this get put on there.
Also - just to be clear I'm not suggesting that there is a system problem, but we should try to isolate before we go digging too deep into what ES has going on. Maybe your vSphere performance tab can lend a hand?
Are there any messages that indicate Elasticsearch being killed? If so, you'll likely need to bump up the memory allocated to your instances.
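If it helps, here is a minimal sketch for scanning a system log for signs of the Linux OOM killer. The pattern covers the phrases the kernel commonly logs ("invoked oom-killer" and "Out of memory: Kill process" or "Killed process"), but the exact wording varies by kernel version, so treat it as a starting point. `find_oom_events` is a name made up for this example.

```python
import re

# Phrases the Linux kernel logs when the OOM killer runs.
# Wording differs between kernel versions, so both common forms are matched.
OOM_PATTERN = re.compile(
    r"invoked oom-killer"
    r"|Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_events(lines):
    """Return (line_number, process_name_or_None, text) for each OOM-related line."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = OOM_PATTERN.search(line)
        if match:
            # The name group is None for "invoked oom-killer" lines.
            events.append((number, match.group("name"), line.rstrip()))
    return events
```

You could feed it an open log file, for example `find_oom_events(open("/var/log/messages", errors="replace"))`, assuming that is where your distribution writes kernel messages. Since Elasticsearch runs on the JVM, a killed process named `java` would be the one to look for.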