Additional node introduced to cluster - log collection drop
Hello,
We introduced an additional node to our cluster this morning around 8:30 AM. Since then, it appears our log collection has severely dropped off. See the attached screenshot.
It looks like elasticsearch and logstash are running on all 3 nodes, so I'm not sure why we aren't collecting.
Is there anything I can do to troubleshoot this?
Re: Additional node introduced to cluster - log collection drop
Also, the cluster has been attempting to relocate 2 shards for the last 7 hours.
I'm wondering if something is configured incorrectly on the new node I introduced.
Re: Additional node introduced to cluster - log collection drop
Please PM a profile from all 3 machines so I may review the config and logs.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Additional node introduced to cluster - log collection drop
PM sent. Thank you!
Re: Additional node introduced to cluster - log collection drop
The logs are showing a lot of attempts by the new node to contact the remote repo, but it is unable to reach it. If you run "df -h" on the nodes, you'll see that the two old nodes have a remote share mounted but the new node doesn't. Make sure that the share is mounted on the new node, and mounted at the same location - /nlsrepcc.
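For example, something along these lines on the new node - a minimal sketch assuming the share is an NFS export (the server name and export path below are placeholders; copy the actual source shown by "df -h" on one of the working nodes):

```shell
# Compare mounted filesystems across the nodes; the repo share
# should show up at /nlsrepcc on every one of them
df -h | grep nlsrepcc

# On the new node, mount the share at the same location as the others.
# "nfs-server:/export/nlsrepcc" is a placeholder source.
mkdir -p /nlsrepcc
mount -t nfs nfs-server:/export/nlsrepcc /nlsrepcc

# Persist the mount across reboots
echo 'nfs-server:/export/nlsrepcc  /nlsrepcc  nfs  defaults  0 0' >> /etc/fstab
```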
Re: Additional node introduced to cluster - log collection drop
Ok, I think I have it mounted now. Can you tell me which log you saw those connecting attempts in so I can make sure it's communicating properly?
Re: Additional node introduced to cluster - log collection drop
Those would be the elasticsearch logs:

tail -f /var/log/elasticsearch/e4f9550c-f37c-417f-9cdc-283429a2a0a1.log

It showed frequent messages like:

[repositories ] [29dbb5cc-f936-4f0e-8a41-26b2277c7083] failed to create repository [fs][NLSREPCC]
org.elasticsearch.common.inject.CreationException: Guice creation errors:
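Once the share is mounted, one way to confirm the new node can reach the repository is Elasticsearch's snapshot API - a sketch assuming Elasticsearch is listening on the default localhost:9200 and the repository name NLSREPCC from the log line above:

```shell
# Show the registered repository and its settings
curl -s 'localhost:9200/_snapshot/NLSREPCC?pretty'

# Ask the cluster to verify the repository; every node that can
# read/write the shared path is listed in the response
curl -s -XPOST 'localhost:9200/_snapshot/NLSREPCC/_verify?pretty'
```

If the new node is missing from the _verify response, the mount still isn't visible to it.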
Re: Additional node introduced to cluster - log collection drop
Ok, perfect thank you.
Was that the only potential issue you saw in reviewing our setup?
The other thing that worries me (it might be totally normal) is that one node always seems to be working harder than the others. For example, right now LSCC2 has its CPU totally maxed out while the other two nodes are sitting at around 25% load. Shouldn't the load be shared more evenly?
Also, I read something related to ELK about a month ago. Is there anything we need to worry about in terms of which node is the primary, or whether there can be more than one primary at any given time? I think it was called "split brain".
And how do you recommend we do updates going forward? I've never managed a three-node cluster, so I'm not too sure how reboots are going to affect the system, etc. Eventually we need to upgrade the other two nodes to CentOS 7. Can we remove a node in order to perform the upgrade and then re-introduce it? We're going to have a total of four nodes pretty soon.
Re: Additional node introduced to cluster - log collection drop
That was the only one I came across. I wouldn't be too concerned about the CPU at this time; there's a lot of data that needs to be transferred, so the CPU will be busy while that completes. Split brain can occur if there are communication issues between nodes in the cluster, which is why we typically recommend all nodes be in the same physical location and plugged into the same switch if possible. Upgrading a 3- or 4-node cluster is similar to upgrading a 2-node cluster - upgrade one node, wait for it to finish, then upgrade the next, and so on - https://assets.nagios.com/downloads/nag ... Server.pdf.
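On the split-brain point: for Elasticsearch versions before 7.x, the usual safeguard is to set discovery.zen.minimum_master_nodes in elasticsearch.yml on every node to a quorum of the master-eligible nodes, i.e. floor(N/2) + 1. A hedged sketch of the arithmetic (the config file path may differ on your install):

```shell
# Number of master-eligible nodes in the cluster
NODES=3

# Quorum: a strict majority of master-eligible nodes, floor(N/2) + 1
QUORUM=$(( NODES / 2 + 1 ))

# The line you'd put in elasticsearch.yml on every node
echo "discovery.zen.minimum_master_nodes: $QUORUM"
```

With 3 nodes this gives 2, so a lone partitioned node cannot elect itself master. When you go to 4 nodes, the quorum becomes 3, so remember to update the setting.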
Re: Additional node introduced to cluster - log collection drop
This can be closed. We've been running great for the last few weeks.