
nagioslogserver_history stuck INITIALIZING

Posted: Fri Feb 12, 2021 9:42 am
by danniiffxi
Hi,

So this morning I went to log into Nagios Log Server and was unable to. I eventually worked out that it was built on a floating IP, so if one instance fails it jumps to the other, but this is where the fun began. I was able to log into the individual instances by going directly to their https://servername/nagioslogserver URL rather than our DNS entry. When I logged into the servers I noticed they both had different instance IDs, and in the Instance Status window each server was missing the other, as if they had started to act independently and had left the cluster.

Anyway, I powered off the second instance and left the primary alone for 10 minutes. I then powered the second server back on and the instances became one cluster again. But now I was faced with this mess:

Code: Select all

[root@naglp01 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "8e96de2d-514c-4909-8b28-b596c70b50e0",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 59,
  "active_shards" : 66,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 52,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
I let Elasticsearch do its thing. I left it for a couple of hours and, sure enough, most of the unassigned shards had once again found their place.

Code: Select all

[root@naglp01 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "8e96de2d-514c-4909-8b28-b596c70b50e0",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 55,
  "active_shards" : 110,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
 
But it has now been a couple more hours and I am still stuck with the following, and am unsure how to proceed. The instance status is still red.

Code: Select all

[root@naglp02 ~]# curl -s -XGET http://localhost:9200/_cat/shards?v | egrep 'UNASSIGNED|INITIALIZING'
nagioslogserver_history 2     p      INITIALIZING                 10.31.10.152 e6cd8034-67c9-4b5a-b913-3808cd5caf13
nagioslogserver_history 2     r      UNASSIGNED
Any ideas? Do I just

Code: Select all

curl -XDELETE 'http://localhost:9200/nagioslogserver_history/'
or will that break it even more?

Re: nagioslogserver_history stuck INITIALIZING

Posted: Fri Feb 12, 2021 5:35 pm
by vtrac
Hi danniiffxi,
Is the IP "10.31.10.152" (below) correct for the "p" (primary) server?
You mentioned that it uses a floating IP, so that is why I ask.

Code: Select all

[root@naglp02 ~]# curl -s -XGET http://localhost:9200/_cat/shards?v | egrep 'UNASSIGNED|INITIALIZING'
nagioslogserver_history 2     p      INITIALIZING                 10.31.10.152 e6cd8034-67c9-4b5a-b913-3808cd5caf13
nagioslogserver_history 2     r      UNASSIGNED
Let's try taking down both Log Servers completely (wait a couple of minutes), then bringing up just the primary "p".

Run the command you used (below) and keep checking until ALL primary "p" shards are initialized and "STARTED" (assigned), then bring the replica "r" up.

Code: Select all

curl -XGET http://localhost:9200/_cat/shards?v | egrep 'UNASSIGNED|INITIALIZING'
Regards,
Vinh
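
For anyone repeating the steps above, the "check until all primary shards are STARTED" part could be scripted roughly as follows. This is only a sketch, not something posted in the thread: the function names `count_pending_primaries` and `wait_for_primaries`, the 30-second interval, and the default URL are assumptions for illustration, and it assumes the `_cat/shards` column order shown in the output above (index, shard, prirep, state).

```shell
#!/bin/sh
# Sketch only: helper names and defaults are hypothetical, not part of Nagios Log Server.

# Read `_cat/shards` output on stdin and print how many primary ("p")
# shards are in any state other than STARTED.
# Assumes columns: index shard prirep state ...
count_pending_primaries() {
    awk '$3 == "p" && $4 != "STARTED" { n++ } END { print n + 0 }'
}

# Poll Elasticsearch until no primary shard is pending.
# $1 = base URL (assumed default: http://localhost:9200)
# $2 = seconds between checks (assumed default: 30)
wait_for_primaries() {
    url="${1:-http://localhost:9200}"
    interval="${2:-30}"
    while :; do
        # -f makes curl fail on HTTP errors, so an unreachable node
        # is not mistaken for "nothing pending".
        if ! shards=$(curl -sf -XGET "$url/_cat/shards") || [ -z "$shards" ]; then
            echo "Elasticsearch not answering yet; retrying in ${interval}s"
            sleep "$interval"
            continue
        fi
        pending=$(printf '%s\n' "$shards" | count_pending_primaries)
        if [ "$pending" -eq 0 ]; then
            echo "All primary shards are STARTED"
            return 0
        fi
        echo "$pending primary shard(s) not yet STARTED; rechecking in ${interval}s"
        sleep "$interval"
    done
}
```

Run on the primary after it has been brought up alone, for example `wait_for_primaries http://localhost:9200 30`, and power on the replica server only once it reports that all primary shards are STARTED.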

Re: nagioslogserver_history stuck INITIALIZING

Posted: Mon Feb 15, 2021 6:17 am
by danniiffxi
Hi Vinh

Thanks, it is all working now. I did as you said, all shards on the Primary started in a few mins, I then powered on the secondary and the status went to Yellow and after a few hours it went from Yellow to Green.

Re: nagioslogserver_history stuck INITIALIZING

Posted: Mon Feb 15, 2021 9:32 am
by scottwilkerson
danniiffxi wrote:Hi Vinh

Thanks, it is all working now. I did as you said, all shards on the Primary started in a few mins, I then powered on the secondary and the status went to Yellow and after a few hours it went from Yellow to Green.
Great!

Locking thread