NLS goes to hung mode every 2 hours
Posted: Tue Apr 02, 2024 7:27 am
NLS goes into a hung state every 2 hours.
Currently we have 2 NLS instances in the cluster, each in a different datacenter.
But every 2 hours we see the following errors in the cluster log, /var/log/elasticsearch/<cluster.log>:
[2024-04-02 06:50:07,764][INFO ][discovery.zen ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] master_left [[a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug] [<<NLSnode2>>] [inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1}], reason [transport disconnected]
[2024-04-02 06:50:07,764][WARN ][discovery.zen ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] master left (reason = transport disconnected), current nodes: {[528b9c0b-d2fd-4a90-8889-2cd35ca64b70][mtBZ6nFMQu6fLAguPux8aw][<<NLSnode1 IP>>][inet[/<IP>:9300]]{max_local_storage_nodes=1},}
[2024-04-02 06:50:07,764][INFO ][cluster.service ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] removed {[a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug] [<<NLSnode2>>][inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-master_failed ([a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug] [<<NLSnode2>>] [inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1})
-------------------------------------------
Roughly every 2 hours 11 minutes we see the same error, showing that NLS node2 has left the cluster:
# grep master_left 90bdcc06-402e-429e-87e2-a1de1745ecc7.log
[2024-04-02 04:38:30,675][INFO ][discovery.zen ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] master_left [[a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug][<<NLSnode2 name>>][inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1}], reason [transport disconnected]
[2024-04-02 06:50:07,764][INFO ][discovery.zen ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] master_left [[a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug][<<NLSnode2 name>>][inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1}], reason [transport disconnected]
[2024-04-02 09:01:44,851][INFO ][discovery.zen ] [528b9c0b-d2fd-4a90-8889-2cd35ca64b70] master_left [[a40b0d7b-a79a-4c37-9225-264a71974bb9][CZ6PtUXlQ2a8OSb8GCy7Ug][<<NLSnode2 name>>][inet[/<<NLSnode2 IP>>:9300]]{max_local_storage_nodes=1}], reason [transport disconnected]
#
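To check the spacing exactly, the gap between consecutive master_left events can be computed straight from the log. This is a minimal sketch, assuming GNU grep, sed and date and the `[YYYY-MM-DD HH:MM:SS,mmm]` timestamp prefix shown above. `master_left_gaps` is a hypothetical helper name, and the timestamps are treated as UTC only so that a DST change cannot skew the subtraction.

```shell
# Hypothetical helper: print the time since the previous master_left event.
# Usage: master_left_gaps <cluster log>   (or pipe log lines on stdin)
master_left_gaps() {
  local prev="" ts now gap
  grep master_left "$@" |
    # Keep only the "YYYY-MM-DD HH:MM:SS" part of the leading [...,mmm] field.
    sed -E 's/^\[([0-9-]+ [0-9:]+),[0-9]+\].*/\1/' |
    while read -r ts; do
      now=$(TZ=UTC date -d "$ts" +%s)   # GNU date: timestamp -> epoch seconds
      if [ -n "$prev" ]; then
        gap=$((now - prev))
        printf '%s  +%dh %02dm %02ds\n' "$ts" $((gap / 3600)) $((gap % 3600 / 60)) $((gap % 60))
      fi
      prev=$now
    done
}
```

Run as `master_left_gaps 90bdcc06-402e-429e-87e2-a1de1745ecc7.log`; for the three master_left lines above it prints:

```
2024-04-02 06:50:07  +2h 11m 37s
2024-04-02 09:01:44  +2h 11m 37s
```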
-------------------------------------------
# cat /proc/sys/net/ipv4/tcp_keepalive_time /proc/sys/net/ipv4/tcp_keepalive_intvl /proc/sys/net/ipv4/tcp_keepalive_probes
7200
75
9
#
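These three values combine into a single detection time. Assuming the transport sockets have keepalive enabled, Linux (see tcp(7)) sends the first probe after `tcp_keepalive_time` seconds of idleness, then up to `tcp_keepalive_probes` probes `tcp_keepalive_intvl` seconds apart, and drops the connection if none is answered, i.e. after roughly time + intvl × probes seconds. The sketch below does that arithmetic; `keepalive_deadline` is a hypothetical helper that reads the same /proc files unless values are passed in.

```shell
# Hypothetical helper: approximate seconds before an idle TCP connection whose
# peer stopped answering is dropped: time + intvl * probes.
# Usage: keepalive_deadline [time intvl probes]   (defaults read from /proc)
keepalive_deadline() {
  local idle=${1:-$(cat /proc/sys/net/ipv4/tcp_keepalive_time)}
  local intvl=${2:-$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)}
  local probes=${3:-$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)}
  local total=$((idle + intvl * probes))
  printf '%d s = %dh %02dm %02ds\n' "$total" $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}
```

With the values above, `keepalive_deadline 7200 75 9` prints `7875 s = 2h 11m 15s`, which is 22 seconds short of the 2h 11m 37s gap between the master_left events.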