
Nagios Log Server - Understanding and Troubleshooting Yellow Cluster Health

Problem Description

Nagios Log Server is in a yellow health state. You can see the current cluster state by navigating to Admin > System > Cluster Status:



The cluster can be in one of three states:

Green: All primary and replica shards are active and assigned to instances.

Yellow: All data is available but some replicas are not yet allocated (cluster is fully functional).

Red: There is at least one primary shard that is not active and allocated to an instance (cluster is still partially functional).
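The same state can be read from the command line. The sketch below assumes Elasticsearch is listening on localhost:9200 (the address used by the commands later in this article) and queries the standard _cluster/health endpoint. The helper names describe_health and health_status are hypothetical; they only map the reported status to the definitions above.

```shell
# Hypothetical helper: translate a cluster status word into its meaning.
describe_health() {
    case "$1" in
        green)  echo "green: all primary and replica shards are active" ;;
        yellow) echo "yellow: all primaries are active, some replicas are unassigned" ;;
        red)    echo "red: at least one primary shard is not active" ;;
        *)      echo "unknown status: $1" ;;
    esac
}

# Extract the "status" value from _cluster/health JSON read on stdin.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (assumes Elasticsearch answers on localhost:9200):
#   describe_health "$(curl -s http://localhost:9200/_cluster/health | health_status)"
```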



Potential Causes

What can cause a shard to become unassigned/corrupt?

  1. Unexpected reboots or shutdowns - an unexpected reboot or shutdown of any instance in your cluster can cause a primary shard to become detached or corrupt. In most cases, Elasticsearch will recover from this problem on its own.

  2. Disk space fills up - if Nagios Log Server runs out of disk space, serious complications can occur. Typically this results in corrupt/unassigned shards.

    Note: Disk space will need to be increased, or existing Log Server data will need to be removed.

  3. Out of memory error - if Elasticsearch takes up too much system memory, the kernel's out-of-memory (OOM) killer could terminate ("reap") Elasticsearch. You will see an explicit message in /var/log/messages at the time this occurs. The sudden termination of Elasticsearch could cause corrupt/unassigned shards.

    Note: Memory will likely need to be increased on Nagios Log Server before restart - otherwise you risk Elasticsearch being reaped again.

  4. You only have one node in your Log Server cluster
    • Nagios Log Server is a cluster-based application and requires more than one node in the cluster for Log Server to see it as "healthy".

    • When there is only one node in the cluster:
      • The status will always be Yellow

      • Unassigned Shards will never be 0, because the replica shards are waiting to be assigned to another node in the cluster (which does not exist)

    • If you wish to deploy a single instance cluster please refer to the following documentation:
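Two quick checks can help narrow down which of the causes above applies. This is a sketch, not an official procedure: it assumes the kernel's out-of-memory messages land in /var/log/messages as described in cause 3, and that the _cat/shards endpoint is reachable on localhost:9200. The function names are made up for illustration.

```shell
# Print kernel OOM-killer lines from a log file (default: /var/log/messages).
find_oom_kills() {
    grep -iE 'out of memory|oom-killer' "${1:-/var/log/messages}"
}

# Read _cat/shards output on stdin and print only the unassigned shards.
# Column 4 of that output is the shard state.
list_unassigned() {
    awk '$4 == "UNASSIGNED"'
}

# Usage:
#   find_oom_kills
#   curl -s http://localhost:9200/_cat/shards | list_unassigned
```

On a single-node cluster, every replica shard is expected to show up in the second listing.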



Troubleshooting Disk Space

Run the following command on EVERY instance in the cluster:


grep watermark /var/log/elasticsearch/*.log


We are looking for output like this:

[2016-02-15 03:20:31,927][INFO ][cluster.routing.allocation.decider]
[84b9dd98-e004-43ee-b70a-a5e48f8482cc] low disk watermark [85%]
exceeded on [cP-M7p_XQCGj_lUYvKnWOw][3e2220f4-1a3b-437b-a939-cf269b8e785c]
free: 38.1gb[12.9%], replicas will not be assigned to this node


The message is telling us that this node has used more than 85% of its available disk space (only 12.9% is free), so Elasticsearch will not assign replica shards to it.

Check the amount of available disk space:

df -h


Which outputs something like this:

Filesystem            Size  Used Avail Use% Mounted on
rootfs                296G  255G   39G  87% /
devtmpfs              3.9G  148K  3.9G   1% /dev
tmpfs                 4.0G     0  4.0G   0% /dev/shm
/dev/sda1             296G  255G   39G  87% /


Here you can see that the root filesystem (/) has 87% of its disk space used, which is above the 85% low watermark and confirms the problem.
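The comparison can also be scripted. The sketch below reads df -P output (the POSIX format with one line per filesystem) and prints every mount whose usage is at or above a threshold; the default of 85 matches the low watermark in the log message above. over_threshold is a hypothetical helper name.

```shell
# Print "mount usage%" for each filesystem at or above the threshold (default 85).
# Expects `df -P` output on stdin; column 5 is the usage, column 6 the mount point.
over_threshold() {
    awk -v limit="${1:-85}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "87%" -> "87"
        if (use + 0 >= limit + 0) print $6, use "%"
    }'
}

# Usage:
#   df -P | over_threshold 85
```

Only the filesystem that holds the Elasticsearch data matters for the watermark, so other mounts in the output can be ignored.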



Resolving Disk Space

You have two options:

Add more disk space

This is most likely the course of action you need to take. Once you've added the disk space, if the cluster health does not return to green, restart the elasticsearch service on that instance:


RHEL 7+ | CentOS 7+ | Debian | Ubuntu 16/18/20

systemctl restart elasticsearch.service


Wait about 5 minutes and the cluster health should return to green.
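Rather than waiting a fixed time, you can poll until the status changes. This is a minimal sketch, assuming Elasticsearch listens on localhost:9200; wait_for_green and fetch_status are illustrative names, and the defaults of 30 attempts at 10 seconds apiece are arbitrary values that add up to the five minutes mentioned above.

```shell
# Fetch the current cluster status word (green, yellow or red).
# In _cat/health output the status is the fourth column.
fetch_status() {
    curl -s http://localhost:9200/_cat/health | awk '{ print $4 }'
}

# Poll until the status is green.
# Arguments: attempts (default 30), delay in seconds (default 10).
# Returns 0 once green is seen, 1 if the attempts run out.
wait_for_green() {
    attempts="${1:-30}"
    delay="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(fetch_status)" = "green" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_green && echo "cluster is green" || echo "still not green"
```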


This documentation will help if you want to move the data location:

Documentation - Changing Data Store Path


Increase The Low/High Watermark

The default low watermark is set to 85% of the disk that the Elasticsearch data is located on, and the default high watermark is 90%. If you have a much larger disk, you may want to increase these, for example to 90% and 95%.

Note: The watermark is a cluster-wide setting.

The command to adjust the LOW watermark is:

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "90%" } }'

The command to adjust the HIGH watermark is:

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.high" : "95%" } }'

Which will output similar to the following:



Then restart the elasticsearch service on that instance:


RHEL 7+ | CentOS 7+ | Debian | Ubuntu 16/18/20

systemctl restart elasticsearch.service


Wait about 5 minutes and the cluster health should return to green.
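As an alternative to the two separate commands, both watermarks can be sent in a single settings request. The helper below is a hypothetical sketch: it only checks that the two values are whole-number percentages with the low one below the high one, then prints the JSON body. It assumes the percentage form of the settings used in this article and Elasticsearch on localhost:9200.

```shell
# Print a JSON body that sets both watermarks, or fail if the values look wrong.
# Arguments: low and high as whole-number percentages without the % sign.
watermark_payload() {
    low="$1"
    high="$2"
    for v in "$low" "$high"; do
        case "$v" in
            ''|*[!0-9]*) echo "watermarks must be whole numbers" >&2; return 1 ;;
        esac
    done
    if [ "$low" -ge "$high" ] || [ "$high" -gt 100 ]; then
        echo "expected low < high <= 100" >&2
        return 1
    fi
    printf '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "%s%%", "cluster.routing.allocation.disk.watermark.high" : "%s%%" } }\n' "$low" "$high"
}

# Usage:
#   curl -s -XPUT http://localhost:9200/_cluster/settings -d "$(watermark_payload 90 95)"
```

Validating first avoids storing a persistent setting in which the low threshold is at or above the high one.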



Final Thoughts

For any support related questions please visit the Nagios Support Forums at:

