Monitoring Nagios Log Server

tmcdonald · Post by **tmcdonald** » Wed Jun 24, 2015 1:18 pm

Newborns get in for free!

stecino · Post by **stecino** » Wed Jun 24, 2015 1:25 pm

jolson wrote:By default, Nagios Log Server won't allow you to query that information from the outside. Your best bet is to use a plugin like NRPE to perform local queries.

For instance, the following will return proper java results if you run it on Nagios Log Server:
Code: Select all
curl -XGET localhost:9200/_nodes/jvm?pretty
Basic health check:
Code: Select all
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
These queries don't work from the outside for security purposes.

You could easily use a plugin like NRPE to launch these queries locally - if that's something you're interested in getting data on. Otherwise, I can recommend the following plugin: https://github.com/anchor/nagios-plugin-elasticsearch

Yes, I can use NRPE to get the JSON and parse it. Are there any other URIs I could use? For example, which checks do you use for the Elasticsearch and Logstash status, where you report green or red? Is the status from 'http://localhost:9200/_cluster/health?pretty=true' the indicator?
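A minimal sketch of a check that NRPE could launch locally, assuming Python 3 is available on the Log Server host. Only the health URL comes from this thread; the function names and the mapping of green/yellow/red to OK/WARNING/CRITICAL are illustrative assumptions, not something Nagios Log Server ships.

```python
import json
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed mapping from Elasticsearch cluster status to Nagios state.
STATUS_TO_STATE = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def evaluate_health(health):
    """Return (exit_code, message) for a parsed _cluster/health document."""
    status = health.get("status")
    code = STATUS_TO_STATE.get(status, UNKNOWN)
    message = "%s - cluster status is %s, %s unassigned shards" % (
        LABELS[code], status, health.get("unassigned_shards", "?"))
    return code, message


def fetch_health(url="http://localhost:9200/_cluster/health"):
    """Query the local Elasticsearch instance; must run on the Log Server host."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    """Print one status line and return the exit code a Nagios plugin would use."""
    try:
        code, message = evaluate_health(fetch_health())
    except Exception as exc:  # connection refused, timeout, invalid JSON
        code, message = UNKNOWN, "UNKNOWN - could not query cluster health: %s" % exc
    print(message)
    return code
```

To use it as a plugin, end the script with `sys.exit(main())` (after `import sys`) so that NRPE receives the exit code.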

jolson · Post by **jolson** » Wed Jun 24, 2015 1:41 pm

It might be easier for you to use check_procs or similar to monitor Elasticsearch and Logstash, but if you're set on using the API, the call is as follows:

Code: Select all

http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=logstash&token=xxxxxxx

Where your token is the token used by your NLS user - you can find it by using a developer console and watching the token sent while pressing 'restart' on logstash. I'd be wary about sending this token across your network from Nagios - so you may be better off using NRPE.
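As a small illustration, the URL above can be assembled so that the token and subsystem are properly escaped. This is only a sketch: `build_status_url` is a hypothetical helper, and since the response format is not shown in this thread, the example stops at building the request URL.

```python
from urllib.parse import urlencode


def build_status_url(host, subsystem, token):
    """Build a status URL of the form shown above for one subsystem."""
    # urlencode escapes any special characters in the token.
    query = urlencode({"subsystem": subsystem, "token": token})
    return "http://%s/nagioslogserver/index.php/api/system/status?%s" % (host, query)
```

Keeping the token in a file readable only by the monitoring user, rather than in a command definition, would limit where it is exposed.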

[Attached screenshot: System Status page of Nagios Log Server, viewed in Firefox Developer Edition]

Post by **eloyd** » Wed Jun 24, 2015 1:56 pm

Hint of September's talk: We're using check_procs to check for running processes via NRPE....

jolson · Post by **jolson** » Wed Jun 24, 2015 3:23 pm

eloyd wrote:Hint of September's talk: We're using check_procs to check for running processes via NRPE....

I believe that is the correct way to go about monitoring the processes. It allows for future expansion (monitor other processes/use different plugins on the same server), and would likely be more reliable than using the API calls.

stecino · Post by **stecino** » Mon Jun 29, 2015 1:44 pm

Just to give you an update: I set up the monitors using NRPE as well as the API with the token. Everything is working great and I'm getting all the data I need. It actually caught a status change on the cluster.

So my question is the following: all 4 nodes in my cluster are up and the instance statuses are green, but the cluster status is yellow.

I see this:

Active Primary Shards 161
Active Shards 207
Relocating Shards 0
Initializing Shards 8
Unassigned Shards 107

Is this due to the unassigned shards? How is the warning state determined?

jolson · Post by **jolson** » Mon Jun 29, 2015 2:15 pm

stecino wrote:Is this due to the unassigned shards? How is the warning state determined?

This is certainly due to the unassigned shards. Cluster health states are described as follows:

green
All primary and replica shards are allocated. Your cluster is 100% operational.

yellow
All primary shards are allocated, but at least one replica is missing. No data is missing, so search results will still be complete. However, your high availability is compromised to some degree. If more shards disappear, you might lose data. Think of yellow as a warning that should prompt investigation.

red
At least one primary shard (and all of its replicas) is missing. This means that you are missing data: searches will return partial results, and indexing into that shard will return an exception.
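The three states can be approximated from the per-index shard counters. The sketch below is an illustration of the rule as described above, not Elasticsearch's own implementation; the field names are those returned by the cluster health API at the indices level, and `expected_color` is a hypothetical helper.

```python
def expected_color(index):
    """Derive green/yellow/red from the shard counters of one index entry."""
    primaries = index["number_of_shards"]
    copies_per_primary = 1 + index["number_of_replicas"]
    # A missing primary means data is unavailable.
    if index["active_primary_shards"] < primaries:
        return "red"
    # All primaries are active, but at least one replica is not.
    if index["active_shards"] < primaries * copies_per_primary:
        return "yellow"
    return "green"
```

For example, an index with 5 shards, 1 replica, 5 active primaries and 7 active shards evaluates to yellow, because 10 active shards would be needed for green.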

Let's take a look at your cluster health in more detail. Please run the following on your CLI and return the results to us:

Code: Select all

curl 'localhost:9200/_cluster/health?level=indices&pretty'

stecino · Post by **stecino** » Mon Jun 29, 2015 3:27 pm

Code: Select all

{
  "cluster_name" : "xxxxxxxxxxxxxxxxxxxxxxxxxx",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 161,
  "active_shards" : 264,
  "relocating_shards" : 0,
  "initializing_shards" : 8,
  "unassigned_shards" : 50,
  "indices" : {
    "nagioslogserver" : {
      "status" : "yellow",
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 1
    },
    "logstash-2015.06.19" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 9,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 1
    },
    "logstash-2015.06.28" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5
    },
    "nagioslogserver_log" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.29" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    },
    "logstash-2015.06.26" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5
    },
    "logstash-2015.06.27" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 4
    },
    "logstash-2015.06.24" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    },
    "logstash-2015.06.25" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5
    },
    "logstash-2015.06.22" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    },
    "logstash-2015.06.23" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5
    },
    "logstash-2015.06.03" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.20" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 2,
      "unassigned_shards" : 2
    },
    "logstash-2015.06.02" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.21" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 4
    },
    "logstash-2015.06.01" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.07" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.06" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.05" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.04" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.09" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.08" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.15" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 1,
      "unassigned_shards" : 2
    },
    "logstash-2015.06.16" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 8,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 2
    },
    "logstash-2015.06.17" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 3
    },
    "logstash-2015.06.18" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 8,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 2
    },
    "logstash-2015.06.11" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.05.31" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.12" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.13" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 7,
      "relocating_shards" : 0,
      "initializing_shards" : 3,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.14" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 8,
      "relocating_shards" : 0,
      "initializing_shards" : 2,
      "unassigned_shards" : 0
    },
    "kibana-int" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.06.10" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }
  }
}
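Output of this size is easier to read when reduced to the indices that are not green. A minimal sketch, assuming the JSON above has been parsed into a dictionary (for example with `json.loads`); `non_green_indices` is a hypothetical helper name.

```python
def non_green_indices(health):
    """List (name, status, unassigned, initializing) for each non-green index.

    Rows are sorted with the most unassigned shards first, then by name.
    """
    rows = [
        (name, info["status"], info["unassigned_shards"], info["initializing_shards"])
        for name, info in health.get("indices", {}).items()
        if info["status"] != "green"
    ]
    return sorted(rows, key=lambda row: (-row[2], row[0]))
```

Printing one row per line gives a short list to compare between runs while the cluster recovers.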

jolson · Post by **jolson** » Mon Jun 29, 2015 3:52 pm

"initializing_shards" : 8,
"unassigned_shards" : 50,

The issue lies in the numbers above. I notice that the unassigned shard count has decreased from the figures you posted earlier:

Initializing Shards 8
Unassigned Shards 107

This is a good sign. It means that the shards without homes are being assigned to instances of Nagios Log Server properly. Keep an eye on your cluster - if the health isn't green in a day or so, I want you to show us another capture of the index status:

Code: Select all

curl 'localhost:9200/_cluster/health?level=indices&pretty'

For now, as long as the 'unassigned shards' number is going down, we're on track to a green cluster state. Ultimately this means that your shards are moving between your instances for load balance and availability purposes - this movement takes some time.
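The "is the number going down" test can be automated by comparing two health snapshots taken some time apart. This is a sketch under assumptions: `recovery_trend` and its three labels are hypothetical, and counting initializing shards together with unassigned ones as "pending" is a choice made for this example.

```python
def recovery_trend(previous, current):
    """Compare pending (unassigned + initializing) shards of two health snapshots."""
    def pending(health):
        return health["unassigned_shards"] + health["initializing_shards"]

    before, after = pending(previous), pending(current)
    if after == 0:
        return "recovered"
    if after < before:
        return "recovering"
    return "stalled"
```

With the two sets of figures from this thread (107 unassigned and 8 initializing earlier, 50 unassigned and 8 initializing later), the result is "recovering".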

stecino · Post by **stecino** » Mon Jun 29, 2015 4:10 pm

jolson wrote:
"initializing_shards" : 8,
"unassigned_shards" : 50,
The issue lies in the numbers above. I notice that this number has decreased from the metric you posted earlier:
Initializing Shards 8
Unassigned Shards 107
This is a good sign. It means that the shards without homes are being assigned to instances of Nagios Log Server properly. Keep an eye on your cluster - if the health isn't green in a day or so, I want you to show us another capture of the index status:
Code: Select all
curl 'localhost:9200/_cluster/health?level=indices&pretty'
For now, as long as the 'unassigned shards' number is going down, we're on track to a green cluster state. Ultimately this means that your shards are moving between your instances for load balance and availability purposes - this movement takes some time.

Got it, thanks. Do you know what triggers this behavior, and how can I prevent it?

Nagios Support Forum
