Cluster Status is Red

Post by **hcltech** » Thu Nov 14, 2019 9:01 am

I upgraded a few days ago and everything was green, but today I came in and the cluster status was red. I have tried rebooting both servers, which usually fixes this issue, but it is still showing red. Can someone please help?

Post by **mbellerue** » Thu Nov 14, 2019 2:48 pm

Can you run these commands on your Log Server and post the results?

Code:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl -XGET 'http://localhost:9200/_cat/shards?v'

Post by **hcltech** » Fri Nov 15, 2019 3:11 pm

setup@cl-p-nagioslog01:~$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "608ebbc0-afce-4301-816c-13da1488336a",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 11,
  "active_shards" : 11,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 21,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
setup@cl-p-nagioslog01:~$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index               shard prirep state      docs   store   ip        node
logstash-2019.11.14 4     p      UNASSIGNED
logstash-2019.11.14 4     r      UNASSIGNED
logstash-2019.11.14 0     p      UNASSIGNED
logstash-2019.11.14 0     r      UNASSIGNED
logstash-2019.11.14 3     p      UNASSIGNED
logstash-2019.11.14 3     r      UNASSIGNED
logstash-2019.11.14 1     p      UNASSIGNED
logstash-2019.11.14 1     r      UNASSIGNED
logstash-2019.11.14 2     p      UNASSIGNED
logstash-2019.11.14 2     r      UNASSIGNED
kibana-int          2     p      STARTED    4      42.7kb  127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
kibana-int          2     r      UNASSIGNED
kibana-int          0     p      STARTED    2      20.7kb  127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
kibana-int          0     r      UNASSIGNED
kibana-int          3     p      STARTED    4      31.6kb  127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
kibana-int          3     r      UNASSIGNED
kibana-int          1     p      STARTED    6      64kb    127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
kibana-int          1     r      UNASSIGNED
kibana-int          4     p      STARTED    0      144b    127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
kibana-int          4     r      UNASSIGNED
nagioslogserver     0     p      STARTED    2480   559.5kb 127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver     0     r      UNASSIGNED
nagioslogserver_log 4     p      STARTED    114670 9.5mb   127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver_log 4     r      UNASSIGNED
nagioslogserver_log 0     p      STARTED    115042 9.6mb   127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver_log 0     r      UNASSIGNED
nagioslogserver_log 3     p      STARTED    114608 9.5mb   127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver_log 3     r      UNASSIGNED
nagioslogserver_log 1     p      STARTED    115417 19.3mb  127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver_log 1     r      UNASSIGNED
nagioslogserver_log 2     p      STARTED    114901 19.2mb  127.0.1.1 6e80ef06-59ba-463f-ab23-0a254d47c2cd
nagioslogserver_log 2     r      UNASSIGNED
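
Note: Elasticsearch reports a cluster as red when at least one primary shard is unassigned, so the rows to look for in this listing are those with `p` in the prirep column and UNASSIGNED in the state column. A minimal sketch of how that filter could be automated follows. The function name `unassigned_primaries` is hypothetical, and it assumes the whitespace-separated column order shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server or Elasticsearch).
# Reads `_cat/shards` output on stdin and prints each index that has
# unassigned primary shards, followed by how many primaries are unassigned.
# Assumes the columns are: index shard prirep state ...
unassigned_primaries() {
    awk '$3 == "p" && $4 == "UNASSIGNED" { n[$1]++ }
         END { for (i in n) print i, n[i] }' | sort
}

# Usage against a live node (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/shards' | unassigned_primaries
```

Run against the listing above, this would single out logstash-2019.11.14, the only index whose primaries are unassigned. The unassigned replicas of the other indices do not by themselves make a cluster red.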

Post by **cdienger** » Fri Nov 15, 2019 4:19 pm

What does the disk space look like on this machine? Unassigned primary shards can be caused by hitting the low or high disk watermark (https://www.elastic.co/guide/en/elastic ... cator.html).

If you're using the NLS OVA, you can follow one of the guides at https://support.nagios.com/kb/article.p ... tegory=128 to increase the disk space. If this is a source install, the docs may still help, but the commands may differ depending on the OS, file system, and so on.
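
Note: the watermark check can be sketched as a small shell function. This is an illustration only: the function name `check_watermarks` is hypothetical, and the 85% and 90% defaults are Elasticsearch's default low and high disk watermarks, which may have been changed in a given installation. It assumes `df -P` style output with no spaces in the filesystem name, and only the filesystem holding the Elasticsearch data directory matters for shard allocation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server or Elasticsearch).
# Reads `df -P` output on stdin and reports filesystems whose usage
# meets the low or high threshold. Arguments: low percent, high percent.
# Defaults of 85 and 90 mirror Elasticsearch's default disk watermarks.
check_watermarks() {
    low=${1:-85}
    high=${2:-90}
    awk -v low="$low" -v high="$high" 'NR > 1 {
        use = $5               # e.g. "57%"
        sub(/%/, "", use)      # strip the percent sign
        use += 0               # force numeric comparison
        if (use >= high)      print "HIGH", $6, use "%"
        else if (use >= low)  print "LOW", $6, use "%"
    }'
}

# Usage on the log server:
#   df -P | check_watermarks
```

No output means no filesystem has reached either threshold.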

Post by **hcltech** » Mon Nov 18, 2019 9:07 am

setup@cl-p-nagioslog01:~$ df
Filesystem                                 1K-blocks  Used       Available Use% Mounted on
udev                                       4047608    0          4047608   0%   /dev
tmpfs                                      815840     744        815096    1%   /run
/dev/mapper/cl--p--nagioslog01--vg-root    514017592  16762060   471075132 4%   /
tmpfs                                      4079188    280        4078908   1%   /dev/shm
tmpfs                                      5120       0          5120      0%   /run/lock
tmpfs                                      4079188    0          4079188   0%   /sys/fs/cgroup
//gt-p-qnap01.hcl.internal/NagiosLogBackup 2138812336 1204441604 934370732 57%  /mnt/storage_repository
tmpfs                                      815836     0          815836    0%   /run/user/1000

Post by **cdienger** » Mon Nov 18, 2019 1:12 pm

Please PM me a profile from the system. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

Code:

/usr/local/nagioslogserver/scripts/profile.sh

This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may not be able to be uploaded through the system. This is usually due to the logs in the Logstash and/or Elasticsearch directories found in it. If it is too large, please open the profile, extract these directories/files and send them separately.

Post by **cdienger** » Tue Nov 19, 2019 2:58 pm

The cluster doesn't seem to recognize the nodes properly. Edit /usr/local/nagioslogserver/var/cluster_hosts on both servers and make sure that each file contains the IP of the other server as well as localhost. Then restart Elasticsearch on both machines:

Code:

service elasticsearch restart
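
Note: the cluster_hosts check described above can be sketched as a shell function. This is only a sketch under assumptions: the function name `hosts_file_ok` is hypothetical, the peer address 192.0.2.10 in the usage line is a placeholder for the other server's real IP, and it assumes the file lists one host per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server).
# Succeeds only if the given file contains both "localhost" and the
# peer address as whole lines (assumed format: one host per line).
hosts_file_ok() {
    file=$1
    peer=$2
    grep -qxF 'localhost' "$file" && grep -qxF "$peer" "$file"
}

# Usage on each server (192.0.2.10 is a placeholder for the other node's IP):
#   hosts_file_ok /usr/local/nagioslogserver/var/cluster_hosts 192.0.2.10 \
#       && echo "cluster_hosts looks complete" \
#       || echo "cluster_hosts is missing an entry"
```

The `-x` and `-F` flags make grep match whole lines literally, so a partial address such as 192.0.2.1 does not count as a match for 192.0.2.10.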

Post by **hcltech** » Tue Nov 19, 2019 3:05 pm

OK, I did that, but now I can't log in to the server. Any ideas?

Post by **cdienger** » Tue Nov 19, 2019 3:24 pm

Run:

Code:

curl 'localhost:9200/_cat/shards?pretty'

If it still shows all shards for logstash-2019.11.15 as unassigned, close it:

Code:

curl -XPOST 'localhost:9200/logstash-2019.11.15/_close'

Gather a profile from each machine if there are still any problems after this.
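
Note: after closing the index, the cluster status can be rechecked with the same health endpoint used earlier in the thread. Since closed indices should no longer count toward cluster health, the status would be expected to leave red once no open index has an unassigned primary. A minimal sketch for pulling just the status field out of the response follows. The function name `health_status` is hypothetical, and it assumes the response format shown earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server or Elasticsearch).
# Reads a `_cluster/health` JSON response on stdin and prints the value
# of its "status" field (green, yellow, or red).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage against a live node (assumes Elasticsearch listens on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_status
```

This uses a simple text match rather than a JSON parser, so it is only suitable for a quick check on the command line.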

Post by **hcltech** » Tue Nov 19, 2019 3:35 pm

There were shards for that index, so I closed it, and they are not showing up now. I restarted the servers after everything, and they have been up for 25 minutes now, but I only get the Waiting for Elasticsearch page.

Nagios Support Forum
