This morning I went to log into Nagios Log Server and was unable to. I eventually worked out that it was set up on the floating IP, so if one instance fails it jumps to the other, but this is where the fun began. I was able to log into the individual instances by going directly to their https://servername/nagioslogserver URL rather than our DNS entry. When I logged into the servers I noticed they each had a different instance ID, and in the Instance Status window each server was missing the other, as if they had started to act independently and had left the cluster.
Anyway, I powered off the second instance and left the primary alone for 10 minutes. I then powered the second server back on and the two instances became one cluster again. But now I was faced with this mess:
Code:
[root@naglp01 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "8e96de2d-514c-4909-8b28-b596c70b50e0",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 59,
"active_shards" : 66,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 52,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}
Code:
[root@naglp01 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "8e96de2d-514c-4909-8b28-b596c70b50e0",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 55,
"active_shards" : 110,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}
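When watching a recovery like the one above, it can help to reduce the health output to the few fields that change. The following is only a rough sketch: `health_field` and `summarize_health` are hypothetical helper names, not part of Nagios Log Server or Elasticsearch, and the text matching assumes the one-field-per-line layout that `pretty=true` produces.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Nagios Log Server or Elasticsearch).
# They assume pretty-printed JSON with one "key" : value pair per line.

# Print the value of one top-level field from _cluster/health output on stdin.
health_field() {
    grep "\"$1\"" | head -n 1 |
        sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[",]//g' -e 's/[[:space:]]*$//'
}

# Read health JSON on stdin and print a one-line summary.
summarize_health() {
    json=$(cat)
    status=$(printf '%s\n' "$json" | health_field status)
    init=$(printf '%s\n' "$json" | health_field initializing_shards)
    unassigned=$(printf '%s\n' "$json" | health_field unassigned_shards)
    echo "status=$status initializing=$init unassigned=$unassigned"
}

# Example (needs a node listening on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' | summarize_health
```

Fed the first health output in this post, this would print `status=red initializing=4 unassigned=52`. Running it every minute or so shows whether the unassigned count is still dropping or has stalled.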
Code:
[root@naglp02 ~]# curl -s -XGET 'http://localhost:9200/_cat/shards?v' | grep -E 'UNASSIGNED|INITIALIZING'
nagioslogserver_history 2 p INITIALIZING 10.31.10.152 e6cd8034-67c9-4b5a-b913-3808cd5caf13
nagioslogserver_history 2 r UNASSIGNED
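With many indices, the same listing is easier to read as a count per index and state. This is a minimal sketch under the assumption that the `_cat/shards` columns are in the default order (index, shard, prirep, state, ...), as in the output above; `count_problem_shards` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count problem shards per index and state.
# Reads `_cat/shards` output on stdin; assumes the default column order
# index, shard, prirep, state, ... so the state is field 4.
count_problem_shards() {
    awk '$4 == "UNASSIGNED" || $4 == "INITIALIZING" { n[$1 " " $4]++ }
         END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Example (needs a node listening on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/shards' | count_problem_shards
```

For the two lines shown above it would report one INITIALIZING and one UNASSIGNED shard for nagioslogserver_history, which confirms that this is the only index still holding the cluster in red.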
Code:
curl -XDELETE 'http://localhost:9200/nagioslogserver_history/'