Cluster status Red after upgrade

Locked
aspeer
Posts: 4
Joined: Tue Jan 16, 2018 11:46 am

Cluster status Red after upgrade

Post by aspeer »

It looks like the cluster is failing to create new indexes. If I delete the broken index, the status goes yellow, but as soon as the next message is received it returns to red.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "3384d7f1-dc67-40b6-a18e-78ce15c78ec7",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 67,
"active_shards" : 67,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 77,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
}
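
For reference, "red" means at least one primary shard is unassigned, while "yellow" means all primaries are assigned but some replicas are not. When scripting this check, the status can be pulled out of the health output. A minimal sketch, assuming the pretty-printed JSON layout shown above; `health_status` is a hypothetical helper name, not part of Nagios Log Server or Elasticsearch:

```shell
# Extract the "status" value from _cluster/health JSON read on stdin.
# health_status is a hypothetical helper; it uses sed instead of a JSON
# parser so it runs on a minimal install.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example against a live node (-s hides the curl progress meter):
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status
```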


curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
logstash-2019.10.03 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 2 r UNASSIGNED NODE_LEFT
kibana-int 2 r UNASSIGNED NODE_LEFT
kibana-int 0 r UNASSIGNED NODE_LEFT
kibana-int 3 r UNASSIGNED NODE_LEFT
kibana-int 1 r UNASSIGNED NODE_LEFT
kibana-int 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.08 2 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 2 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 0 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 0 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 3 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 3 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 1 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 1 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 4 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 4 p UNASSIGNED INDEX_CREATED
logstash-2019.10.05 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 2 r UNASSIGNED NODE_LEFT
nagioslogserver 0 r UNASSIGNED NODE_LEFT
nagioslogserver_log 4 r UNASSIGNED NODE_LEFT
nagioslogserver_log 0 r UNASSIGNED NODE_LEFT
nagioslogserver_log 3 r UNASSIGNED NODE_LEFT
nagioslogserver_log 1 r UNASSIGNED NODE_LEFT
nagioslogserver_log 2 r UNASSIGNED NODE_LEFT
nagioslogserver_history 0 r UNASSIGNED NODE_LEFT



curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason
logstash-2019.10.03 2 p STARTED
logstash-2019.10.03 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 0 p STARTED
logstash-2019.10.03 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 3 p STARTED
logstash-2019.10.03 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 1 p STARTED
logstash-2019.10.03 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.03 4 p STARTED
logstash-2019.10.03 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 4 p STARTED
logstash-2019.10.04 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 0 p STARTED
logstash-2019.10.04 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 3 p STARTED
logstash-2019.10.04 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 1 p STARTED
logstash-2019.10.04 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.04 2 p STARTED
logstash-2019.10.04 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 2 p STARTED
logstash-2019.10.01 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 0 p STARTED
logstash-2019.10.01 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 3 p STARTED
logstash-2019.10.01 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 1 p STARTED
logstash-2019.10.01 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.01 4 p STARTED
logstash-2019.10.01 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 4 p STARTED
logstash-2019.09.27 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 0 p STARTED
logstash-2019.09.27 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 3 p STARTED
logstash-2019.09.27 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 1 p STARTED
logstash-2019.09.27 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.27 2 p STARTED
logstash-2019.09.27 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 4 p STARTED
logstash-2019.10.02 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 0 p STARTED
logstash-2019.10.02 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 3 p STARTED
logstash-2019.10.02 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 1 p STARTED
logstash-2019.10.02 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.02 2 p STARTED
logstash-2019.10.02 2 r UNASSIGNED NODE_LEFT
kibana-int 2 p STARTED
kibana-int 2 r UNASSIGNED NODE_LEFT
kibana-int 0 p STARTED
kibana-int 0 r UNASSIGNED NODE_LEFT
kibana-int 3 p STARTED
kibana-int 3 r UNASSIGNED NODE_LEFT
kibana-int 1 p STARTED
kibana-int 1 r UNASSIGNED NODE_LEFT
kibana-int 4 p STARTED
kibana-int 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 2 p STARTED
logstash-2019.09.28 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 0 p STARTED
logstash-2019.09.28 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 3 p STARTED
logstash-2019.09.28 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 1 p STARTED
logstash-2019.09.28 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.28 4 p STARTED
logstash-2019.09.28 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 4 p STARTED
logstash-2019.10.07 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 0 p STARTED
logstash-2019.10.07 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 3 p STARTED
logstash-2019.10.07 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 1 p STARTED
logstash-2019.10.07 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.07 2 p STARTED
logstash-2019.10.07 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.08 2 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 2 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 0 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 0 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 3 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 3 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 1 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 1 p UNASSIGNED INDEX_CREATED
logstash-2019.10.08 4 r UNASSIGNED INDEX_CREATED
logstash-2019.10.08 4 p UNASSIGNED INDEX_CREATED
logstash-2019.10.05 2 p STARTED
logstash-2019.10.05 2 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 0 p STARTED
logstash-2019.10.05 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 3 p STARTED
logstash-2019.10.05 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 1 p STARTED
logstash-2019.10.05 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.05 4 p STARTED
logstash-2019.10.05 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 4 p STARTED
logstash-2019.10.06 4 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 0 p STARTED
logstash-2019.10.06 0 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 3 p STARTED
logstash-2019.10.06 3 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 1 p STARTED
logstash-2019.10.06 1 r UNASSIGNED NODE_LEFT
logstash-2019.10.06 2 p STARTED
logstash-2019.10.06 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 4 p STARTED
logstash-2019.09.30 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 0 p STARTED
logstash-2019.09.30 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 3 p STARTED
logstash-2019.09.30 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 1 p STARTED
logstash-2019.09.30 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.30 2 p STARTED
logstash-2019.09.30 2 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 4 p STARTED
logstash-2019.09.29 4 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 0 p STARTED
logstash-2019.09.29 0 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 3 p STARTED
logstash-2019.09.29 3 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 1 p STARTED
logstash-2019.09.29 1 r UNASSIGNED NODE_LEFT
logstash-2019.09.29 2 p STARTED
logstash-2019.09.29 2 r UNASSIGNED NODE_LEFT
nagioslogserver 0 p STARTED
nagioslogserver 0 r UNASSIGNED NODE_LEFT
nagioslogserver_log 4 p STARTED
nagioslogserver_log 4 r UNASSIGNED NODE_LEFT
nagioslogserver_log 0 p STARTED
nagioslogserver_log 0 r UNASSIGNED NODE_LEFT
nagioslogserver_log 3 p STARTED
nagioslogserver_log 3 r UNASSIGNED NODE_LEFT
nagioslogserver_log 1 p STARTED
nagioslogserver_log 1 r UNASSIGNED NODE_LEFT
nagioslogserver_log 2 p STARTED
nagioslogserver_log 2 r UNASSIGNED NODE_LEFT
nagioslogserver_history 0 p STARTED
nagioslogserver_history 0 r UNASSIGNED NODE_LEFT
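
A long listing like this is easier to read when grouped by shard type and reason. A minimal sketch, assuming the same column order requested above (index, shard, prirep, state, unassigned.reason); `summarize_unassigned` is a hypothetical helper name:

```shell
# Count unassigned shards by type (p = primary, r = replica) and reason.
# Reads _cat/shards output on stdin with the columns:
#   index shard prirep state unassigned.reason
# summarize_unassigned is a hypothetical helper, not part of any tool.
summarize_unassigned() {
  awk '$4 == "UNASSIGNED" { count[$3 " " $5]++ }
       END { for (k in count) print count[k], k }' | sort -k2,2 -k3,3
}

# Example against a live node (-s hides the curl progress meter):
#   curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | summarize_unassigned
```

For the listing in this post, that grouping comes to 5 unassigned primaries (INDEX_CREATED), 5 unassigned replicas (INDEX_CREATED), and 67 unassigned replicas (NODE_LEFT), which adds up to the 77 unassigned shards reported by the health check. The unassigned primaries of logstash-2019.10.08 are what turn the status red; unassigned replicas alone would leave it yellow.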
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster status Red after upgrade

Post by scottwilkerson »

Let's restart Elasticsearch on each instance:

service elasticsearch restart

Then give it a bit of time to catch up.

If you still have issues, run the following on each instance and post the output:

df -h
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
aspeer
Posts: 4
Joined: Tue Jan 16, 2018 11:46 am

Re: Cluster status Red after upgrade

Post by aspeer »

How long should we wait for it to recover? I'm not seeing any difference about 15 minutes after the restart.

node1:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 25M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda3 105G 5.5G 94G 6% /
/dev/md0 1.8T 1.6T 161G 91% /store
/dev/sda2 976M 204M 705M 23% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
tmpfs 3.2G 0 3.2G 0% /run/user/41442
tmpfs 3.2G 0 3.2G 0% /run/user/15843

node2:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 25M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda3 105G 6.3G 93G 7% /
/dev/md0 1.8T 1.6T 170G 91% /store
/dev/sda2 976M 204M 705M 23% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
tmpfs 6.3G 0 6.3G 0% /run/user/15843
tmpfs 6.3G 0 6.3G 0% /run/user/41442

node3:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 3.2G 29G 10% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda4 104G 6.3G 93G 7% /
/dev/sda2 976M 197M 713M 22% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
store 1.8T 1.6T 167G 91% /store
tmpfs 6.3G 0 6.3G 0% /run/user/15843
tmpfs 6.3G 0 6.3G 0% /run/user/41442

node4:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 3.2G 29G 11% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda3 105G 5.6G 94G 6% /
/dev/sda2 976M 195M 715M 22% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
/dev/md0 1.8T 1.6T 166G 91% /store
tmpfs 6.3G 0 6.3G 0% /run/user/41442
tmpfs 6.3G 0 6.3G 0% /run/user/15843
aspeer
Posts: 4
Joined: Tue Jan 16, 2018 11:46 am

Re: Cluster status Red after upgrade

Post by aspeer »

How long are we talking for recovery after restarting? No change after about 20 minutes.

node1-4
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 25M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda3 105G 5.5G 94G 6% /
/dev/md0 1.8T 1.6T 161G 91% /store
/dev/sda2 976M 204M 705M 23% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
tmpfs 3.2G 0 3.2G 0% /run/user/41442
tmpfs 3.2G 0 3.2G 0% /run/user/15843

Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 25M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda3 105G 6.3G 93G 7% /
/dev/md0 1.8T 1.6T 170G 91% /store
/dev/sda2 976M 204M 705M 23% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
tmpfs 6.3G 0 6.3G 0% /run/user/15843
tmpfs 6.3G 0 6.3G 0% /run/user/41442

Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 3.2G 29G 10% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda4 104G 6.3G 93G 7% /
/dev/sda2 976M 197M 713M 22% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
store 1.8T 1.6T 167G 91% /store
tmpfs 6.3G 0 6.3G 0% /run/user/15843
tmpfs 6.3G 0 6.3G 0% /run/user/41442

Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 3.2G 29G 11% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda3 105G 5.6G 94G 6% /
/dev/sda2 976M 195M 715M 22% /boot
/dev/sda1 1022M 12M 1011M 2% /boot/efi
/dev/md0 1.8T 1.6T 166G 91% /store
tmpfs 6.3G 0 6.3G 0% /run/user/41442
tmpfs 6.3G 0 6.3G 0% /run/user/15843
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster status Red after upgrade

Post by scottwilkerson »

Are you storing active ES data in /store?

If so, the systems are over the high-water mark, and Elasticsearch will not allocate or move shards because the disks are too full:

store 1.8T 1.6T 167G 91% /store

If this is the case, you would need to add storage space.
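
To check every node the same way, the `df -h` output can be filtered for mount points at or above a chosen threshold. A minimal sketch; `over_watermark` is a hypothetical helper name, and the default of 85 is an assumption for illustration, so pass whatever watermark your cluster actually uses:

```shell
# Print mount points whose Use% is at or above a threshold.
# Reads `df -h` output on stdin (use `df -P` if long device names wrap).
# over_watermark is a hypothetical helper; the default threshold of 85 is
# an assumed value, not read from the cluster.
over_watermark() {
  threshold="${1:-85}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)                 # strip the trailing percent sign
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Example: df -h | over_watermark 85
```

Run against the node output posted earlier in this thread, only /store (91%) would be reported on each node.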
aspeer
Posts: 4
Joined: Tue Jan 16, 2018 11:46 am

Re: Cluster status Red after upgrade

Post by aspeer »

What's the high-water mark threshold? Does it default to 90%?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster status Red after upgrade

Post by scottwilkerson »

The default low watermark is 85% of the disk that the Elasticsearch data is located on; once a node is above it, no new shards are allocated to that node. The default high watermark is 90%.

https://support.nagios.com/kb/article.php?id=469
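
If storage cannot be added right away, the watermarks can be inspected and temporarily changed through the Elasticsearch cluster settings API. A minimal sketch, assuming a node listening on localhost:9200; `watermark_payload` is a hypothetical helper name, and the percentages in the example are placeholders rather than recommendations:

```shell
# Build the JSON body for a transient cluster-settings update that changes
# the disk watermarks. watermark_payload is a hypothetical helper; the two
# setting names are Elasticsearch's disk-based shard allocation settings.
watermark_payload() {
  low="$1"
  high="$2"
  printf '{"transient":{"cluster.routing.allocation.disk.watermark.low":"%s","cluster.routing.allocation.disk.watermark.high":"%s"}}\n' "$low" "$high"
}

# Inspect the current settings:
#   curl -s 'http://localhost:9200/_cluster/settings?pretty'
# Apply new values (example percentages only):
#   curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
#     -H 'Content-Type: application/json' -d "$(watermark_payload 93% 95%)"
```

A transient setting is lost on a full cluster restart, which suits a stopgap. Raising the watermarks only buys time on disks that are already 91% full.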