Nagios Log Server - Understanding and Troubleshooting Yellow Cluster Health
Problem Description
Nagios Log Server is in a yellow health state. You can see the current cluster state by navigating to Admin > System > Cluster Status.
The cluster can be in one of three states:

Green: All primary and replica shards are active and assigned to instances.
Yellow: All data is available, but some replicas are not yet allocated (the cluster is fully functional).
Red: At least one primary shard is not active and allocated to an instance (the cluster is still partially functional).
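The same status can be read from the Elasticsearch REST API on any instance. The helpers below are a minimal sketch. They assume Elasticsearch answers on http://localhost:9200 (the address used by the watermark commands later in this article) and that curl is installed. The function names are hypothetical and are not part of Nagios Log Server.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking cluster health from the command line.
# Assumes Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the "status" field (green/yellow/red) from a _cluster/health JSON body.
# Uses grep/cut rather than jq so it works on a minimal install.
parse_health_status() {
  printf '%s\n' "$1" | grep -o '"status" *: *"[a-z]*"' | head -n 1 | cut -d'"' -f4
}

# Print the current cluster status as reported by this instance.
cluster_status() {
  parse_health_status "$(curl -s "$ES_URL/_cluster/health")"
}

# List shards whose state is UNASSIGNED (the replicas behind a yellow status).
list_unassigned_shards() {
  curl -s "$ES_URL/_cat/shards" | grep UNASSIGNED
}
```

When the cluster is in the state this article describes, cluster_status should print yellow. The list_unassigned_shards helper shows the replica shards (r in the prirep column) that are waiting to be allocated.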
Potential Causes
What can cause a shard to become unassigned or corrupt?
Troubleshooting Disk Space
Run the following command on EVERY instance in the cluster:

grep watermark /var/log/elasticsearch/*.log

We are looking for output like this:

[2016-02-15 03:20:31,927][INFO ][cluster.routing.allocation.decider]

This message tells us that more than 85% of the disk space has been used. Check how much disk space is available:

df -h

This produces output like the following:

Filesystem      Size  Used Avail Use% Mounted on

Here you can see that the rootfs has 87% of its disk space used, which confirms the problem.
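You can script the df check so that it compares usage with the 85% low watermark directly. This is a sketch that assumes you pass a path on the filesystem holding the Elasticsearch data. The data location varies by install, so supply your own path. The names disk_used_pct and check_watermark are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a filesystem's usage with a watermark percentage.

# Print the Use% of the filesystem containing the given path as a bare integer.
# df -P forces the POSIX one-line-per-filesystem format, so column 5 is Use%.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a usage percentage with a watermark (default 85); print ABOVE or OK.
# The low watermark is exceeded when usage is more than the threshold.
check_watermark() {
  local used="$1" threshold="${2:-85}"
  if [ "$used" -gt "$threshold" ]; then
    echo "ABOVE"
  else
    echo "OK"
  fi
}
```

In the rootfs example above, check_watermark "$(disk_used_pct /)" 85 would print ABOVE, because 87 is greater than 85.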
Resolving Disk Space
You have two options: add more disk space, or increase the low/high watermark.

Add More Disk Space
This is most likely the course of action you need to take. After you add the disk space, the cluster health may not return to green. If it does not, restart the elasticsearch service on that instance:

RHEL 7+ | CentOS 7+ | Debian | Ubuntu 16/18/20

systemctl restart elasticsearch.service
Wait about 5 minutes and the cluster health should return to green.
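Instead of waiting a fixed five minutes, you can poll the cluster health until it reports green. This sketch assumes Elasticsearch answers on http://localhost:9200 and that curl is installed. The names wait_for_green and fetch_local_health are hypothetical. The fetch command is passed in as an argument, so you can try the loop without a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll cluster health instead of waiting a fixed 5 minutes.
# Assumes Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print this instance's _cluster/health JSON.
fetch_local_health() {
  curl -s "$ES_URL/_cluster/health"
}

# wait_for_green FETCH_CMD [ATTEMPTS] [DELAY_SECONDS]
# Runs FETCH_CMD (which must print _cluster/health JSON) until "status" is green.
# Defaults: 30 attempts x 10 seconds, roughly the 5 minutes suggested above.
wait_for_green() {
  local fetch="$1" attempts="${2:-30}" delay="${3:-10}" i=1 status=""
  while [ "$i" -le "$attempts" ]; do
    status="$("$fetch" | grep -o '"status" *: *"[a-z]*"' | head -n 1 | cut -d'"' -f4)"
    if [ "$status" = "green" ]; then
      echo "green after $i check(s)"
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "still ${status:-unknown} after $attempts check(s)" >&2
  return 1
}
```

To use it, run wait_for_green fetch_local_health. The Elasticsearch _cluster/health endpoint also accepts wait_for_status=green and timeout parameters. These let the server do the waiting in a single request.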
If you want to move the data to a different location, see: Documentation - Changing Data Store Path
Increase The Low/High Watermark
By default, the low watermark is 85% and the high watermark is 90% of the disk that holds the Elasticsearch data. If you have a much larger disk, you may want to raise the low watermark to 90% or more.

Note: The watermark is a cluster-wide setting, so you only need to run these commands on one instance.

The command to adjust the LOW watermark is:

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "90%" } }'

The command to adjust the HIGH watermark is:

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.high" : "95%" } }'

The LOW watermark command returns output similar to the following:

{"acknowledged":true,"persistent":{"cluster":{"routing":{"allocation":{"disk":{"watermark":{"low":"90%"}}}}}},"transient":{}}
Then restart the elasticsearch service on that instance:
RHEL 7+ | CentOS 7+ | Debian | Ubuntu 16/18/20

systemctl restart elasticsearch.service
Wait about 5 minutes and the cluster health should return to green.
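You can also apply both watermark changes above in a single request and then read the result back. This is a minimal sketch that assumes Elasticsearch answers on http://localhost:9200 and that curl is installed. The function names are hypothetical. The low-below-high check is a sanity guard added here and does not come from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the watermark commands from this article.
# Assumes Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build one persistent-settings body that sets both watermarks (values in percent).
build_watermark_payload() {
  printf '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "%s%%", "cluster.routing.allocation.disk.watermark.high" : "%s%%" } }' "$1" "$2"
}

# set_watermarks LOW HIGH: sanity-check the values, then apply them in one request.
# The low < high <= 100 guard is our own check, not an Elasticsearch rule quoted here.
set_watermarks() {
  local low="$1" high="$2"
  if [ "$low" -ge "$high" ] || [ "$high" -gt 100 ]; then
    echo "expected low < high <= 100, got low=$low high=$high" >&2
    return 1
  fi
  curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(build_watermark_payload "$low" "$high")"
}

# Print the cluster settings so you can confirm the new values were stored.
show_cluster_settings() {
  curl -s "$ES_URL/_cluster/settings"
}
```

For example, set_watermarks 90 95 applies the same values as the two commands above. Afterwards, show_cluster_settings should list them under persistent. Elasticsearch 6.0 and later also require a -H 'Content-Type: application/json' header on these requests. The commands in this article omit that header, and so does this sketch.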
Final Thoughts
For any support-related questions, please visit the Nagios Support Forums at:
Posted by: tlea - Mon, Feb 15, 2016 at 8:12 PM.
Online URL: https://support.nagios.com/kb/article/nagios-log-server-understanding-and-troubleshooting-yellow-cluster-health-469.html