
Additional node introduced to cluster - log collection drop

Posted: Wed Apr 17, 2019 4:11 pm
by rferebee
Hello,

We introduced an additional node to our cluster this morning around 8:30AM. Since then, our log collection appears to have dropped off severely. See the attached screenshot.

It looks like elasticsearch and logstash are running on all 3 nodes, so I'm not sure why we aren't collecting.

Is there anything I can do to troubleshoot this?

Re: Additional node introduced to cluster - log collection drop

Posted: Wed Apr 17, 2019 5:55 pm
by rferebee
Also, the cluster has been attempting to relocate 2 shards for the last 7 hours.

I'm wondering if something is configured incorrectly on the new node I introduced.
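Shard relocation can be watched directly from Elasticsearch's cat API. A minimal sketch, assuming the bundled Elasticsearch listens on the default HTTP port 9200:

```shell
# List any shards still marked RELOCATING; the list should shrink to
# nothing once the new node has caught up. Falls back to a message if
# nothing matches or the node is unreachable.
curl -s 'http://localhost:9200/_cat/shards?v' | grep -i RELOCATING \
  || echo "no relocating shards reported (or node unreachable on 9200)"
```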

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 9:08 am
by cdienger
Please PM a profile from all 3 machines so I may review the config and logs.

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 9:52 am
by rferebee
PM sent. Thank you!

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 10:09 am
by cdienger
The logs show a lot of attempts by the new node to contact the remote repository, all of which are failing. If you run a "df -h" on the nodes, you'll see that the two old nodes have a remote share mounted but the new node doesn't. Make sure the share is mounted on the new node, and at the same location - /nlsrepcc.
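A quick way to check this on each node (a sketch: /nlsrepcc is the mount point named above; the actual share source and mount options should be copied from a working node's /etc/fstab):

```shell
# Report whether anything is mounted at /nlsrepcc on this node.
# The last field of "df -h" output is the mount point.
MOUNT_POINT=/nlsrepcc
if df -h | awk '{print $NF}' | grep -qx "$MOUNT_POINT"; then
  echo "$MOUNT_POINT is mounted"
else
  echo "$MOUNT_POINT is NOT mounted - mount the same share here as on the old nodes"
fi
```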

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 10:33 am
by rferebee
Ok, I think I have it mounted now. Can you tell me which log you saw those connection attempts in, so I can make sure it's communicating properly?

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 11:21 am
by cdienger
Those would be the elasticsearch logs:

tail -f /var/log/elasticsearch/e4f9550c-f37c-417f-9cdc-283429a2a0a1.log
It showed frequent messages like:

[repositories             ] [29dbb5cc-f936-4f0e-8a41-26b2277c7083] failed to create repository [fs][NLSREPCC]
org.elasticsearch.common.inject.CreationException: Guice creation errors:
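Once the share is mounted on every node, the repository should register without the Guice error. One way to confirm from any node (a sketch, assuming the default Elasticsearch port 9200; NLSREPCC is the repository name from the log line above):

```shell
# List the registered snapshot repositories; NLSREPCC should appear
# with type "fs" once all nodes can reach the shared path.
curl -s 'http://localhost:9200/_snapshot/_all?pretty' \
  || echo "node unreachable on 9200"
```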

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 11:30 am
by rferebee
Ok, perfect thank you.

Was that the only potential issue you saw in reviewing our setup?

The other thing that worries me (it might be totally normal) is that one node always seems to be working harder than the others. For example, right now LSCC2 has its CPU completely maxed out while the other two nodes are sitting at around 25% load. Shouldn't the load be shared more evenly?

Also, I read something related to ELK about a month ago. Is there anything we need to worry about in terms of which node is the primary, or whether there's more than one primary at any given time? I think it was called "split brain".

And how do you recommend we do updates going forward? I've never managed a three-node cluster, so I'm not too sure how reboots are going to affect the system, etc. Eventually we need to upgrade the other two nodes to CentOS 7. Can we remove a node in order to perform the upgrade and then re-introduce it? We're going to have a total of four nodes pretty soon.

Re: Additional node introduced to cluster - log collection drop

Posted: Thu Apr 18, 2019 12:56 pm
by cdienger
That was the only one I came across. I wouldn't be too concerned about the CPU at this time; there is a lot of data being transferred, so high CPU is expected while the shards relocate. Split brain can occur if there are communication issues between nodes in the cluster, which is why we typically recommend all nodes be in the same physical location and plugged into the same switch if possible. And upgrading a 3- or 4-node cluster is similar to upgrading a 2-node cluster: upgrade one node, wait for it to finish, then upgrade the next, and so on - https://assets.nagios.com/downloads/nag ... Server.pdf.
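On the split-brain point specifically, the usual safeguard in the pre-7.x Elasticsearch versions of this era is a master quorum setting of (master-eligible nodes / 2) + 1, which is 2 for a three-node cluster. A sketch for checking it; the elasticsearch.yml path is an assumption and may differ on a Log Server install:

```shell
# Show the quorum setting if present. With 3 master-eligible nodes it
# should be 2, so a partitioned minority can never elect its own master.
# (Setting name applies to Elasticsearch before 7.x - verify your version.)
grep 'minimum_master_nodes' /etc/elasticsearch/elasticsearch.yml 2>/dev/null \
  || echo "not set (for 3 nodes, would add: discovery.zen.minimum_master_nodes: 2)"
```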

Re: Additional node introduced to cluster - log collection drop

Posted: Tue Apr 30, 2019 10:13 am
by rferebee
This can be closed. We've been running great for the last few weeks.