Additional node introduced to cluster - log collection drop

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Additional node introduced to cluster - log collection drop

Post by rferebee »

Hello,

We introduced an additional node to our cluster this morning around 8:30 AM. Since then, it appears our log collection has dropped off severely. See the attached screenshot.

It looks like elasticsearch and logstash are running on all 3 nodes, so I'm not sure why we aren't collecting.

Is there anything I can do to troubleshoot this?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Additional node introduced to cluster - log collection d

Post by rferebee »

Also, the cluster has been attempting to relocate 2 shards for the last 7 hours.

I'm wondering if something is configured incorrectly on the new node I introduced.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Additional node introduced to cluster - log collection d

Post by cdienger »

Please PM a profile from all 3 machines so I may review the config and logs.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Additional node introduced to cluster - log collection d

Post by rferebee »

PM sent. Thank you!
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Additional node introduced to cluster - log collection d

Post by cdienger »

The logs show a lot of attempts by the new node to contact the remote repo, but it is unable to reach it. If you run "df -h" on the nodes, you'll see that the two old nodes have a remote share but the new node doesn't. Make sure that the share is mounted on the new node and mounted to the same location - /nlsrepcc.
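
If it helps, here is a minimal sketch of that check that you could run on each node. It only looks at the "Mounted on" column of df output; the function name and the REPO_PATH variable are made up for illustration, and it assumes the mount point contains no spaces.

```shell
#!/bin/sh
# Sketch: report whether the repository share is mounted at the expected path.
# mount_listed and REPO_PATH are illustrative names, not part of Nagios Log Server.

# Reads `df -P` output on stdin and succeeds if the given path
# appears in the last column (the mount point).
mount_listed() {
    awk -v p="$1" 'NR > 1 && $NF == p { found = 1 } END { exit !found }'
}

# Default to the path mentioned above; override if yours differs.
REPO_PATH="${REPO_PATH:-/nlsrepcc}"

if df -P | mount_listed "$REPO_PATH"; then
    echo "$REPO_PATH is mounted"
else
    echo "$REPO_PATH is NOT mounted"
fi
```

Running it on all three nodes should give the same answer on each once the share is in place.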
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Additional node introduced to cluster - log collection d

Post by rferebee »

Ok, I think I have it mounted now. Can you tell me which log you saw those connection attempts in so I can make sure it's communicating properly?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Additional node introduced to cluster - log collection d

Post by cdienger »

Those would be the elasticsearch logs:

tail -f /var/log/elasticsearch/e4f9550c-f37c-417f-9cdc-283429a2a0a1.log
It showed frequent messages like:

[repositories             ] [29dbb5cc-f936-4f0e-8a41-26b2277c7083] failed to create repository [fs][NLSREPCC]
org.elasticsearch.common.inject.CreationException: Guice creation errors:
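
To see whether those messages have stopped after mounting the share, something like the sketch below could count them in a log file. The function name and the ES_LOG variable are hypothetical; point ES_LOG at your own elasticsearch log, since the file name differs per cluster.

```shell
#!/bin/sh
# Sketch: count "failed to create repository" lines in an elasticsearch log.
# count_repo_failures and ES_LOG are illustrative names.

count_repo_failures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps that from being treated as an error.
    grep -c 'failed to create repository' "$1" || true
}

# Set ES_LOG to the log file you want to inspect before running.
if [ -n "${ES_LOG:-}" ] && [ -r "$ES_LOG" ]; then
    echo "repository failures: $(count_repo_failures "$ES_LOG")"
else
    echo "set ES_LOG to a readable elasticsearch log file"
fi
```

If the count stops growing between two runs a few minutes apart, the new node is probably reaching the repository.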
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Additional node introduced to cluster - log collection d

Post by rferebee »

Ok, perfect, thank you.

Was that the only potential issue you saw in reviewing our setup?

The other thing that worries me (it might be totally normal) is that one node always seems to be working harder than the others. For example, right now LSCC2 has its CPU totally maxed out and the other two nodes are sitting at around 25% load. Shouldn't there be more of a resource share happening?

Also, I read something related to ELK about a month ago. Is there anything we need to worry about in terms of which node is the primary or if there's more than one primary at any given time? I think it was called "split brain".

And how do you recommend we do updates going forward? I've never managed a three node cluster, so I'm not too sure how reboots are going to affect the system, etc. Eventually we need to upgrade the other two nodes to CentOS 7. Can we remove a node in order to perform the upgrade and then re-introduce it? We're going to have a total of four nodes pretty soon.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Additional node introduced to cluster - log collection d

Post by cdienger »

That was the only one I came across. I wouldn't be too concerned about the CPU at this time: there's a lot of data that needs to be transferred, and that is keeping the CPU busy. Split brain can occur if there are communication issues between nodes in the cluster, which is why we typically recommend that all nodes be in the same physical location and plugged into the same switch if possible. Upgrading a 3 or 4 node cluster is similar to upgrading a 2 node cluster: upgrade one node, wait for it to finish, then upgrade the next, and so on - https://assets.nagios.com/downloads/nag ... Server.pdf.
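
One way to judge when a node has "finished" is to look at the Elasticsearch cluster health API before moving on to the next node. The sketch below is only an illustration: it assumes Elasticsearch answers on http://localhost:9200 on the node where you run it, and the function names and ES_URL variable are made up. Treating "green with 0 relocating shards" as the signal to continue is my assumption, not something the upgrade document states.

```shell
#!/bin/sh
# Sketch: read cluster status and relocating shard count from _cluster/health.
# cluster_status, relocating_shards and ES_URL are illustrative names.

# Both helpers read the health JSON on stdin and print one field.
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

relocating_shards() {
    sed -n 's/.*"relocating_shards"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Assumed address; change it if Elasticsearch listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# "|| true" keeps a failed or missing curl from aborting the script.
health=$(curl -s --max-time 5 "$ES_URL/_cluster/health" 2>/dev/null || true)

st=$(printf '%s\n' "$health" | cluster_status)
rel=$(printf '%s\n' "$health" | relocating_shards)
echo "status: ${st:-unknown}, relocating shards: ${rel:-unknown}"
```

The same output is also a quick way to watch shard relocation after adding a node.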
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Additional node introduced to cluster - log collection d

Post by rferebee »

This can be closed. We've been running great for the last few weeks.
Locked