Additional node introduced to cluster - log collection drop
Hello,
We introduced an additional node to our cluster this morning around 8:30 AM. Since then, it appears our log collection has severely dropped off. See the attached screenshot.
It looks like elasticsearch and logstash are running on all 3 nodes, so I'm not sure why we aren't collecting.
Is there anything I can do to troubleshoot this?
Re: Additional node introduced to cluster - log collection drop
Also, the cluster has been attempting to relocate 2 shards for the last 7 hours.
I'm wondering if something is configured incorrectly on the new node I introduced.
Re: Additional node introduced to cluster - log collection drop
Please PM a profile from all 3 machines so I may review the config and logs.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Additional node introduced to cluster - log collection drop
PM sent. Thank you!
Re: Additional node introduced to cluster - log collection drop
The logs are showing a lot of attempts by the new node to contact the remote repo, but it is unable to reach it. If you run "df -h" on the nodes, you'll see that the two old nodes have a remote share mounted but the new node doesn't. Make sure that the share is mounted on the new node, and mounted at the same location - /nlsrepcc.
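For example, something along these lines on the new node - a minimal sketch assuming the share is an NFS export (the server name and export path below are placeholders; copy the actual source shown by "df -h" on one of the working nodes):

```shell
# Compare mounted filesystems across the nodes; the repo share
# should show up at /nlsrepcc on every one of them
df -h | grep nlsrepcc

# On the new node, mount the share at the same location as the others.
# "nfs-server:/export/nlsrepcc" is a placeholder source.
mkdir -p /nlsrepcc
mount -t nfs nfs-server:/export/nlsrepcc /nlsrepcc

# Persist the mount across reboots
echo 'nfs-server:/export/nlsrepcc  /nlsrepcc  nfs  defaults  0 0' >> /etc/fstab
```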
Re: Additional node introduced to cluster - log collection drop
Ok, I think I have it mounted now. Can you tell me which log you saw those connecting attempts in so I can make sure it's communicating properly?
Re: Additional node introduced to cluster - log collection drop
Those would be the elasticsearch logs:

tail -f /var/log/elasticsearch/e4f9550c-f37c-417f-9cdc-283429a2a0a1.log

It showed frequent messages like:

[repositories ] [29dbb5cc-f936-4f0e-8a41-26b2277c7083] failed to create repository [fs][NLSREPCC]
org.elasticsearch.common.inject.CreationException: Guice creation errors:
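Once the share is mounted, one way to confirm the new node can reach the repository is Elasticsearch's snapshot API - a sketch assuming Elasticsearch is listening on the default localhost:9200 and the repository name NLSREPCC from the log line above:

```shell
# Show the registered repository and its settings
curl -s 'localhost:9200/_snapshot/NLSREPCC?pretty'

# Ask the cluster to verify the repository; every node that can
# read/write the shared path is listed in the response
curl -s -XPOST 'localhost:9200/_snapshot/NLSREPCC/_verify?pretty'
```

If the new node is missing from the _verify response, the mount still isn't visible to it.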
Re: Additional node introduced to cluster - log collection drop
Ok, perfect thank you.
Was that the only potential issue you saw in reviewing our setup?
The other thing that worries me (it might be totally normal) is that one node always seems to be working harder than the others. For example, right now LSCC2 has its CPU totally maxed out while the other two nodes are sitting at around 25% load. Shouldn't the load be shared more evenly?
Also, I read something related to ELK about a month ago. Is there anything we need to worry about in terms of which node is the primary, or whether there can be more than one primary at any given time? I think it was called "split brain".
And how do you recommend we do updates going forward? I've never managed a three-node cluster, so I'm not too sure how reboots are going to affect the system, etc. Eventually we need to upgrade the other two nodes to CentOS 7. Can we remove a node in order to perform the upgrade and then re-introduce it? We're going to have a total of four nodes pretty soon.
Re: Additional node introduced to cluster - log collection drop
That was the only one I came across. I wouldn't be too concerned about the CPU at this time; there's a lot of data that needs to be transferred, so the CPU will be busy while that completes. Split brain can occur if there are communication issues between nodes in the cluster, which is why we typically recommend all nodes be in the same physical location and plugged into the same switch if possible. Upgrading a 3- or 4-node cluster is similar to upgrading a 2-node cluster - upgrade one node, wait for it to finish, then upgrade the next, and so on - https://assets.nagios.com/downloads/nag ... Server.pdf.
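On the split-brain point: for Elasticsearch versions before 7.x, the usual safeguard is to set discovery.zen.minimum_master_nodes in elasticsearch.yml on every node to a quorum of the master-eligible nodes, i.e. floor(N/2) + 1. A hedged sketch of the arithmetic (the config file path may differ on your install):

```shell
# Number of master-eligible nodes in the cluster
NODES=3

# Quorum: a strict majority of master-eligible nodes, floor(N/2) + 1
QUORUM=$(( NODES / 2 + 1 ))

# The line you'd put in elasticsearch.yml on every node
echo "discovery.zen.minimum_master_nodes: $QUORUM"
```

With 3 nodes this gives 2, so a lone partitioned node cannot elect itself master. When you go to 4 nodes, the quorum becomes 3, so remember to update the setting.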
Re: Additional node introduced to cluster - log collection drop
This can be closed. We've been running great for the last few weeks.