red cluster health on working cluster

benhank · Post by **benhank** » Wed May 13, 2015 11:58 am

I have 2 instances. One is the official VM. The second is a manual install on a clean machine. Both were installed on CentOS 6 (clean, no updates to the OS) with the previous version, then upgraded to the latest version of NLS.
My cluster seems to be working. By that I mean we have our environment set up to send logs to server P01, and the data is being replicated to T01, which is working. However, when I look at the cluster health, it's red.
I have run:

Code: Select all

service httpd stop
service logstash stop
service elasticsearch restart
service logstash start
service httpd start

I then went into

Code: Select all

/var/log/elasticsearch

and here is the logfile from my secondary machine:

c9dc126e-346d-4bfa-a30e-14b849c50ab5.log

and the primary:

c9dc126e-346d-4bfa-a30e-14b849c50ab5.log

Thanks in advance!

jolson · Post by **jolson** » Wed May 13, 2015 12:11 pm

This is normally an indication of failing indices. Let's take a look at your shard health. The output isn't pretty, but it gives us a good idea of what's going on:

Code: Select all

curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards'
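If the shard-level output is hard to scan, a shorter per-index view may help. This is only a sketch: it assumes the `_cat/indices` API and its `h` column selector are available on your Elasticsearch version, and `red_indices` is a hypothetical helper name, not something that ships with NLS.

```shell
# Hypothetical helper: read "health index" pairs on stdin and print
# the names of the indices whose health is red.
red_indices() {
  awk '$1 == "red" { print $2 }'
}

# Assumed usage against a local node (adjust host and port as needed):
#   curl -s 'http://localhost:9200/_cat/indices?h=health,index' | red_indices
```

Any index name this prints is a candidate for closer inspection before deciding what to do with it.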

benhank · Post by **benhank** » Wed May 13, 2015 12:24 pm

here we go:

Document.rtf

jolson · Post by **jolson** » Wed May 13, 2015 12:45 pm

It looks like you have a bad index by the name of logstash-2015.05.05. The course of action here will be to remove that index and see if your cluster health recovers. Keep in mind that all log data from that day will be lost - you can always restore if you have a backup present.

Code: Select all

curl -XDELETE 'http://localhost:9200/logstash-2015.05.05/'

How is your cluster health after that removal?
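One way to check, sketched here on the assumption that Elasticsearch is reachable on localhost:9200. `cluster_status` is a hypothetical helper that pulls the `status` field out of the plain `_cluster/health` response (without `level=shards`).

```shell
# Hypothetical helper: extract the top-level "status" value
# (green, yellow or red) from _cluster/health JSON read on stdin.
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Assumed usage against a local node:
#   curl -s 'http://localhost:9200/_cluster/health?pretty' | cluster_status
```

A result of `green` means all primary and replica shards are allocated; `yellow` means at least one replica is not allocated; `red` means at least one primary shard is not allocated.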

benhank · Post by **benhank** » Wed May 13, 2015 12:50 pm

Lean and green, my man, thanks! Any clue as to how that happened?

Better question:
I only deleted that on one machine. Shouldn't the other one detect that it was deleted and rebuild it?

jolson · Post by **jolson** » Wed May 13, 2015 1:02 pm

benhank wrote: I only deleted that on one machine. Shouldn't the other one detect that it was deleted and rebuild it?

Nope - the index contains both primary and replica shards, so all of the data is now gone, unfortunately. The delete API call that we used acts on the whole cluster, not just the node it was sent to.

There are many reasons why an index might become corrupted. Some of the more common ones are disk space filling up or shards being unable to initialize properly (for whatever reason). Typically, bad shards indicate data loss, so data was likely lost on the server somehow - the culprit is hard to pin down. I would get a backup schedule in place so that you have a way to recover your information if this were to happen again. Protect those bits!
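As a rough illustration of what such a backup could look like at the Elasticsearch level, here is a minimal sketch using the snapshot API with a shared-filesystem repository. Everything named here is an assumption: the repository name `nls_backup`, the path `/mnt/es_backups`, the `ES_URL` default, and the helper functions are hypothetical. Your NLS version may provide its own backup settings that are preferable to calling the API by hand, and newer Elasticsearch releases also require the path to be listed under `path.repo` in `elasticsearch.yml`.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"  # assumed node address
REPO="nls_backup"                          # hypothetical repository name
REPO_PATH="/mnt/es_backups"                # hypothetical path; must be reachable by every node

# Build a lowercase snapshot name from a date string, e.g. 2015.05.13.
snapshot_name() {
  printf 'snapshot-%s\n' "$1"
}

# Register a shared-filesystem ("fs") snapshot repository.
register_repo() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO" \
    -H 'Content-Type: application/json' \
    -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$REPO_PATH\"}}"
}

# Take a snapshot of all indices and wait until it completes.
take_snapshot() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$(snapshot_name "$1")?wait_for_completion=true"
}

# Assumed usage, e.g. from a daily cron job:
#   register_repo
#   take_snapshot "$(date +%Y.%m.%d)"
```

Running `df -h` on each node from the same job is a cheap way to watch for the disk-space problem mentioned above.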

benhank · Post by **benhank** » Wed May 13, 2015 2:26 pm

Thanks for the help and info! All set, my man!

jolson · Post by **jolson** » Wed May 13, 2015 2:29 pm

Glad I could help - I'll close this out. Please feel free to open an additional thread if you have further questions or issues. Thanks!

Nagios Support Forum
