So I'm in the middle of a building move, and I added my second NLS node, at the second office and connected over a VPN, to the cluster of my primary NLS node. I expected a large data transfer during the initial replication, and that completed successfully. However, I expected later replications to use much less bandwidth, and that does not seem to have been the case.
When a second bandwidth spike kicked off, I got worried because I didn't want to impact production, so I had the sysadmin disable the NIC on the secondary NLS instance in VMware. I also deleted the instance from the cluster, since I figured I'd just add it back once all the infrastructure has moved to the new building. But now data is missing from my graphs even though the indexes are open.
I'm posting to get answers to two questions: how do I get the data prior to 8/29 to show up in my queries/dashboards again, and how exactly do two nodes replicate data between each other?
As always, I appreciate any help I can get.
Missing data after removing instance from cluster
I like graphs...
Re: Missing data after removing instance from cluster
"how do I get the data prior to 8/29 to show up in my queries/dashboards again"
It's likely that at least part of this data exists on the second node that you had stood up. If you are willing to bring the second node back up to see whether your data reappears, that would be a worthwhile troubleshooting step.
I would also like to see the output of the following command:
Code:
curl 'localhost:9200/_cluster/health?level=indices&pretty'
"spell out exactly how two nodes replicate data between each other."
Nodes distribute and replicate data using shards, which are a logical concept. You can read about how shards are distributed here:
https://www.elastic.co/guide/en/elastic ... tally.html
Essentially, each index is split into shards, and those shards are distributed among all of the instances in your Nagios Log Server cluster. What likely happened is that *some shards* moved to your second server while others remained on your primary server.
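To check where each shard actually lives, Elasticsearch's `_cat/shards` endpoint prints one line per shard copy. The helper below is a minimal sketch for summarizing that listing; `shard_summary` is a hypothetical name (not part of Nagios Log Server), and it assumes the Elasticsearch 1.x column order without the `?v` header: index, shard, prirep, state, docs, store, ip, node. Unassigned copies only fill the first four columns.

```shell
#!/bin/sh
# Summarize shard placement from Elasticsearch _cat/shards output on stdin.
# Assumed columns (ES 1.x, no ?v header):
#   index shard prirep state docs store ip node
shard_summary() {
  awk '
    $4 == "UNASSIGNED" {
      # prirep is "p" for a primary copy and "r" for a replica copy
      if ($3 == "p") missing_primary[$1]++
      else unassigned_replicas++
      next
    }
    {
      # node name is field 8 onward, in case it contains spaces
      node = $8
      for (i = 9; i <= NF; i++) node = node " " $i
      per_node[node]++
    }
    END {
      for (n in per_node) printf "node %s holds %d shards\n", n, per_node[n]
      printf "unassigned replicas: %d\n", unassigned_replicas
      for (idx in missing_primary)
        printf "index %s is missing %d primary shard(s)\n", idx, missing_primary[idx]
    }'
}
```

Usage against a live instance would look like `curl -s 'localhost:9200/_cat/shards' | shard_summary`. The order of the printed lines is not guaranteed, because awk iterates its arrays in an unspecified order.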
One last question: what was the latency like between your NLS nodes over the VPN connection? High latency between instances is very dangerous for the cluster, and I'm wondering exactly what the conditions were like.
Thanks a ton!
Jesse
Re: Missing data after removing instance from cluster
Code:
[root@nagiosls ~]# curl 'localhost:9200/_cluster/health?level=indices&pretty'
{
"cluster_name" : "553c1f03-f76e-4910-a868-8c1e078ef969",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 161,
"active_shards" : 161,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 171,
"indices" : {
"logstash-2015.08.10" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"nagioslogserver" : {
"status" : "yellow",
"number_of_shards" : 1,
"number_of_replicas" : 1,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1
},
"logstash-2015.08.12" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.11" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.31" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.14" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.13" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.16" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.15" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.18" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.17" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.19" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.30" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"nagioslogserver_log" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.23" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.22" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.21" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.20" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.09" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.27" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.08" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.26" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.07" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.25" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.24" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.06" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.05" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.04" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.03" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.29" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.08.28" : {
"status" : "red",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10
},
"logstash-2015.08.02" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"logstash-2015.09.01" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},
"kibana-int" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
}
}
}
[root@nagiosls ~]#
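With dozens of indices, the one problem index is easy to miss in output like the above. A minimal sketch of a filter that prints only the red indices follows. `red_indices` is a hypothetical helper, and it relies on the pretty-printed layout shown above (an index name line ending in `" : {`, followed by that index's `"status"` line) rather than on a real JSON parser.

```shell
#!/bin/sh
# Print every index whose "status" is "red" in pretty-printed
# _cluster/health?level=indices output read from stdin.
red_indices() {
  awk '
    # Remember the most recent object name, e.g. "logstash-2015.08.28" : {
    /" : [{]$/ { split($0, q, "\""); name = q[2] }
    # The cluster-level status comes before any name is set, so it is skipped
    /"status" : "red"/ && name != "" { print name }
  '
}
```

On a live instance this would be run as `curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | red_indices`.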
Re: Missing data after removing instance from cluster
The only index that I'm seeing with a real problem is the one from 8/28:
Code:
"logstash-2015.08.28" : {
"status" : "red",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10
},
We'll see how your cluster reacts when the second server comes online. It's possible that the index will recover; it's also possible that it's corrupted. I hope it's the former!
"Latency is on average 23ms over the VPN connection"
This is an acceptable amount of latency. I would normally recommend that you get your servers in the same datacenter, but given that this setup is temporary I'm sure I don't have to tell you that.
Thanks, looking forward to your results!
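The totals at the top of the health output can be reconciled with a little arithmetic, assuming every index keeps `number_of_replicas: 1` and that a replica cannot be placed on the same instance as its primary, so none can be placed while only one data node is present. The counts are read off the output above: 30 healthy daily `logstash-*` indices with 5 primaries each, plus `nagioslogserver` with 1 primary and `nagioslogserver_log` and `kibana-int` with 5 each.

```shell
#!/bin/sh
# Reconcile active_shards and unassigned_shards from the health output.
healthy_daily=30                  # logstash-2015.08.02 .. 2015.09.01, minus 2015.08.28
primaries_per_daily=5
other_primaries=$((1 + 5 + 5))    # nagioslogserver + nagioslogserver_log + kibana-int

# Every healthy index has all of its primaries active on the one remaining node...
active=$((healthy_daily * primaries_per_daily + other_primaries))
# ...but none of their replicas can be placed next to those primaries.
unassigned_replicas=$active
# logstash-2015.08.28 lost all 5 primaries as well as their 5 replicas.
unassigned=$((unassigned_replicas + 2 * primaries_per_daily))

echo "active_shards=$active unassigned_shards=$unassigned"
# prints: active_shards=161 unassigned_shards=171
```

The result matches the reported 161 active and 171 unassigned shards, which is consistent with every other index being yellow only because its replicas have nowhere to go, while 2015.08.28 is red because none of its primaries are present.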
Re: Missing data after removing instance from cluster
How does the replication work? Is it constant or is it on a schedule?
Re: Missing data after removing instance from cluster
Replication takes place when:
1. New logs enter Nagios Log Server
2. A new instance joins a cluster
3. An instance is removed from a cluster
To expand on that:
1. When a new log enters Nagios Log Server, it is assigned to one of the five shards of that day's index. These five shards are distributed among your Nagios Log Server instances and are moved around dynamically as Elasticsearch deems necessary.
In addition to the 5 'Primary Shards', there are also 5 'Replica Shards', which are exact duplicates of your Primary Shards. These replicas are distributed among all of the instances in your cluster in such a way that two matching shards will never be on the same instance. All of this moving in the backend happens dynamically, not on a schedule.
[Example image of a 2-instance cluster]
2. When a new instance of Nagios Log Server joins your cluster, shards are redistributed so that the data and load are balanced between all of your nodes.
3. When an instance is removed from the cluster, a few things happen:
- Any Replica Shard without a matching Primary Shard is automatically promoted to a Primary Shard, and then a new Replica Shard is generated and distributed appropriately.
- Any Primary Shard without a matching Replica Shard has a new Replica Shard generated and distributed appropriately, provided another instance is available to hold it (a replica is never placed on the same instance as its primary).
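The removal rules above can be illustrated with a toy model. This is a POSIX shell/awk sketch, not how Elasticsearch is implemented; `remove_node`, the node names A and B, and the shard layout are all hypothetical. Shard 4 is deliberately given no replica to show what happens when the only copy of a shard lived on the instance that left: there is nothing to promote, so the primary stays unassigned and the index turns red.

```shell
#!/bin/sh
# Toy model (NOT Elasticsearch code) of one index losing an instance.
# Input: one line per shard, "<shard> <node>:<role> ...", where role p = primary
# and r = replica. Two-node model, so at most one copy survives the removal.
remove_node() {
  awk -v gone="$1" '
    {
      kept = ""
      for (i = 2; i <= NF; i++) {
        split($i, c, ":")
        if (c[1] != gone) { kept = c[1]; role = c[2] }
      }
      if (kept == "") {
        print "shard " $1 ": no copy left, primary unassigned (index goes red)"
        next
      }
      if (role == "r")
        print "shard " $1 ": replica on " kept " promoted to primary"
      else
        print "shard " $1 ": primary on " kept " stays primary"
      waiting++   # this shard now needs a new replica
    }
    END {
      printf "%d new replica(s) stay unassigned until another instance joins\n", waiting
    }'
}

# Hypothetical layout: shard 4 never got a replica before node B left.
remove_node B <<'EOF'
0 A:p B:r
1 B:p A:r
2 A:p B:r
3 B:p A:r
4 B:p
EOF
```

In a two-instance cluster, every surviving primary is left without a replica after the removal, because no second instance exists to hold one; in the model that shows up as the final "stay unassigned" line.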