Nagios Support Forum

Posted: **Tue Jul 14, 2015 11:35 am**

Hi,

Background:
I added a second node and it worked fine. Then I started changing the Elasticsearch config and broke it to the point where it would not recover, so I ended up deleting everything and reinstalling from scratch. I was able to add the new node and it shows up in the GUI, but the GUI shows both Elasticsearch and Logstash as down. Checking the machine itself shows that both services are running fine.

[Attached screenshot: status.PNG]

I am able to access the web UI from either hostname, and both show the same story. If nag2 (the new node) really thinks Elasticsearch is down, wouldn't it take me to an error page? I tried restarting (just the services, and separately the whole box) with no luck. Also note the missing "delete" (garbage can) icon in the Actions column. In the System Status section of the admin screen, the Instance drop-down only offers nag1 (the original node), and the same goes for the Per Instance (Advanced) section, regardless of which hostname I access the GUI from.

I'm not ruling out that I broke something when deleting and reinstalling, and I'm willing to do it again. I just want to make sure I'm not missing anything.

EDIT: After restarting Elasticsearch, I do get this error:

Code:

[root@schpnag2 ~]# service elasticsearch restart
Stopping elasticsearch:                                    [  OK  ]
Starting elasticsearch:                                    [  OK  ]
[root@schpnag2 ~]# Exception in thread ">output" org.elasticsearch.client.transport.NoNodeAvailableException: No node available
        at org.elasticsearch.client.transport.TransportClientNodesService.execute(org/elasticsearch/client/transport/TransportClientNodesService.java:219)
        at org.elasticsearch.client.transport.support.InternalTransportIndicesAdminClient.execute(org/elasticsearch/client/transport/support/InternalTransportIndicesAdminClient.java:85)
        at org.elasticsearch.client.support.AbstractIndicesAdminClient.getTemplates(org/elasticsearch/client/support/AbstractIndicesAdminClient.java:544)
        at org.elasticsearch.action.admin.indices.template.get.GetIndexTemplatesRequestBuilder.doExecute(org/elasticsearch/action/admin/indices/template/get/GetIndexTemplatesRequestBuilder.java:41)
        at org.elasticsearch.action.ActionRequestBuilder.execute(org/elasticsearch/action/ActionRequestBuilder.java:85)
        at org.elasticsearch.action.ActionRequestBuilder.execute(org/elasticsearch/action/ActionRequestBuilder.java:59)
        at org.elasticsearch.action.ActionRequestBuilder.get(org/elasticsearch/action/ActionRequestBuilder.java:67)
        at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:606)
        at RUBY.template_exists?(/usr/local/nagioslogserver/logstash/lib/logstash/outputs/elasticsearch/protocol.rb:231)
        at RUBY.template_install(/usr/local/nagioslogserver/logstash/lib/logstash/outputs/elasticsearch/protocol.rb:21)
        at RUBY.register(/usr/local/nagioslogserver/logstash/lib/logstash/outputs/elasticsearch.rb:259)
        at org.jruby.RubyArray.each(org/jruby/RubyArray.java:1613)
        at RUBY.outputworker(/usr/local/nagioslogserver/logstash/lib/logstash/pipeline.rb:220)
        at RUBY.start_outputs(/usr/local/nagioslogserver/logstash/lib/logstash/pipeline.rb:152)
        at java.lang.Thread.run(java/lang/Thread.java:745)

Posted: **Tue Jul 14, 2015 4:26 pm**

Please run the following on both nodes:

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts
cat /usr/local/nagioslogserver/var/cluster_uuid
cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf

From the above we should be able to tell where things might be going wrong.
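The same comparison can be done with a small script instead of by eye. The sketch below defines a hypothetical helper, check_nls_node, which is not part of Nagios Log Server. It assumes the file layout shown in this thread (including the node_uuid file in the same var directory, mentioned later on) and the cluster => '...' / node_name => '...' line format of 999_outputs.conf.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios Log Server): check that the
# Logstash output file agrees with the UUID files kept in the var directory.
# Arguments default to the paths used in this thread.
check_nls_node() {
    var_dir=${1:-/usr/local/nagioslogserver/var}
    conf=${2:-/usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf}
    rc=0

    # Values recorded on disk (tr drops the newline and any stray whitespace)
    want_cluster=$(tr -d '[:space:]' < "$var_dir/cluster_uuid")
    want_node=$(tr -d '[:space:]' < "$var_dir/node_uuid")

    # Values actually passed to the Logstash elasticsearch output
    got_cluster=$(sed -n "s/^[[:space:]]*cluster => '\([^']*\)'.*/\1/p" "$conf")
    got_node=$(sed -n "s/^[[:space:]]*node_name => '\([^']*\)'.*/\1/p" "$conf")

    if [ "$got_cluster" != "$want_cluster" ]; then
        echo "MISMATCH cluster: cluster_uuid=$want_cluster, 999_outputs.conf=$got_cluster"
        rc=1
    fi
    if [ "$got_node" != "$want_node" ]; then
        echo "MISMATCH node_name: node_uuid=$want_node, 999_outputs.conf=$got_node"
        rc=1
    fi

    # On a multi-node cluster, cluster_hosts should list more than localhost:
    # grep -v succeeds only if some line is neither blank nor "localhost"
    if ! grep -qv -e '^[[:space:]]*$' -e '^[[:space:]]*localhost[[:space:]]*$' \
            "$var_dir/cluster_hosts"; then
        echo "WARNING: cluster_hosts lists only localhost"
        rc=1
    fi

    if [ "$rc" -eq 0 ]; then
        echo "OK"
    fi
    return "$rc"
}
```

Run it on each node with the default paths (check_nls_node) or with explicit ones (check_nls_node /path/to/var /path/to/999_outputs.conf). It only compares local files; it does not query Elasticsearch.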

Posted: **Tue Jul 14, 2015 4:45 pm**

Node 1

Code:

[root@schpnag1 ~]# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
192.168.1.175
192.168.1.249

[root@schpnag1 ~]# cat /usr/local/nagioslogserver/var/cluster_uuid
4f703585-84ab-40e0-9ff9-f72c904bdc38

[root@schpnag1 ~]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 04 Mar 2015 15:27:23 -0600
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'
        host => 'localhost'
        index_type => '%{type}'
        node_name => ''
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#


Node 2

Code:

[root@schpnag2 ~]# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost

schpnag1
[root@schpnag2 ~]# cat /usr/local/nagioslogserver/var/cluster_uuid
4f703585-84ab-40e0-9ff9-f72c904bdc38

[root@schpnag2 ~]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Tue, 14 Jul 2015 13:55:36 -0500
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => 'c3003aac-586b-4be2-8581-e73938592447'
        host => 'localhost'
        index_type => '%{type}'
        node_name => '843eb4bb-fb4a-4166-9f69-a1cfd529a18d'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

So no big surprise there: something about how the cluster was set up for node 2 seems to be off.

Posted: **Tue Jul 14, 2015 4:50 pm**

Quote:
[root@schpnag2 ~]# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost

This throws me off. Please modify this file on node 2 so that, like your other node, it lists localhost, the local IP, and the IP of the other node:

cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
192.168.1.175
192.168.1.249

The 'cluster' values in 999_outputs.conf are also interesting, since they differ between the two nodes:

Node 1:
output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'

Node 2:
output {
    elasticsearch {
        cluster => 'c3003aac-586b-4be2-8581-e73938592447'

Make sure the 'cluster' field in this file on node 2 is set to the cluster UUID (in this case, the proper UUID is likely 4f703585-84ab-40e0-9ff9-f72c904bdc38, which is the value in cluster_uuid on both nodes).

After changing those two things, you should see the Web GUI respond more politely.
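On node 2, both changes could be applied with something like the sketch below. fix_cluster_settings is a hypothetical helper written for this example; it assumes GNU sed (for in-place editing with -i), keeps a .bak copy of the output file, and takes the cluster_hosts entries as arguments. Because the generated file's header says it will be overwritten, it is worth re-checking the file after Nagios Log Server regenerates its configuration.

```shell
#!/bin/sh
# Hypothetical helper: apply the two suggested changes on one node.
#   $1 = var directory, $2 = 999_outputs.conf, remaining args = cluster_hosts entries
# Assumes GNU sed for -i (in-place edit).
fix_cluster_settings() {
    var_dir=$1
    conf=$2
    shift 2

    # Read the cluster UUID first so nothing is changed if it is missing
    uuid=$(tr -d '[:space:]' < "$var_dir/cluster_uuid")
    if [ -z "$uuid" ]; then
        echo "cluster_uuid is empty or missing in $var_dir" >&2
        return 1
    fi

    # 1. Rewrite cluster_hosts with one entry per line
    printf '%s\n' "$@" > "$var_dir/cluster_hosts"

    # 2. Point the Logstash output at the UUID stored in cluster_uuid
    cp "$conf" "$conf.bak"    # keep the original for comparison
    sed -i "s/^\([[:space:]]*cluster => \)'[^']*'/\1'$uuid'/" "$conf"
}
```

For the addresses in this thread, the call would be: fix_cluster_settings /usr/local/nagioslogserver/var /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf localhost 192.168.1.175 192.168.1.249, followed by a Logstash restart.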

Posted: **Wed Jul 15, 2015 10:04 am**

Made the changes, and did a full system restart. No dice.

NODE 2 (after the changes):

Code:

[root@schpnag2 ~]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Tue, 14 Jul 2015 13:55:36 -0500
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'
        host => 'localhost'
        index_type => '%{type}'
        node_name => '843eb4bb-fb4a-4166-9f69-a1cfd529a18d'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Code:

[root@schpnag2 ~]# cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
192.168.1.175
192.168.1.249

One thing I didn't notice earlier is that on NODE 1 the node_name field is blank:

Code:

[root@schpnag1 ~]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 04 Mar 2015 15:27:23 -0600
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'
        host => 'localhost'
        index_type => '%{type}'
        node_name => ''
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Posted: **Wed Jul 15, 2015 10:15 am**

Quote:
One thing I didn't notice earlier is that on NODE 1 the node_name field is blank

Good catch. Make sure the node_name field matches what is in the node_uuid file; node_name should be set to the output of:

Code:

cat /usr/local/nagioslogserver/var/node_uuid

After making that change, restart Logstash on node 1. Does the behavior change?
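The node_name edit can be scripted the same way. set_node_name below is a hypothetical helper; it assumes GNU sed and the node_name => '...' line format shown in this thread, and it refuses to write an empty value, since an empty node_name is exactly what was found on node 1.

```shell
#!/bin/sh
# Hypothetical helper: copy the contents of node_uuid into the node_name
# setting of the Logstash output file. Assumes GNU sed for -i.
set_node_name() {
    var_dir=${1:-/usr/local/nagioslogserver/var}
    conf=${2:-/usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf}

    node_uuid=$(tr -d '[:space:]' < "$var_dir/node_uuid")
    # Do not write an empty node_name (the state found on node 1)
    if [ -z "$node_uuid" ]; then
        echo "node_uuid is empty or missing in $var_dir" >&2
        return 1
    fi

    sed -i "s/^\([[:space:]]*node_name => \)'[^']*'/\1'$node_uuid'/" "$conf"
}
```

On node 1, running set_node_name with no arguments uses the default paths; Logstash still needs a restart afterwards.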

Posted: **Wed Jul 15, 2015 10:30 am**

I fixed node_name on NODE 1 so it matches the node UUID. As far as I can tell, all three files are now consistent between the two nodes, and both nodes have been rebooted. Still no go. What other variables might be causing this?

Code:

[root@schpnag1 ~]# cat /usr/local/nagioslogserver/logstash/etc/conf.d/999_outputs.conf
#
# Logstash Configuration File
# Dynamically created by Nagios Log Server
#
# DO NOT EDIT THIS FILE. IT WILL BE OVERWRITTEN.
#
# Created Wed, 04 Mar 2015 15:27:23 -0600
#

#
# Required output for Nagios Log Server
#

output {
    elasticsearch {
        cluster => '4f703585-84ab-40e0-9ff9-f72c904bdc38'
        host => 'localhost'
        index_type => '%{type}'
        node_name => 'ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b'
        protocol => 'transport'
        workers => 4
    }
}

#
# Global outputs
#



#
# Local outputs
#

Posted: **Wed Jul 15, 2015 10:48 am**

I can't think of a reason this might be happening. Let's collect some additional information.

Run the following on both nodes:

Code:

curl 'localhost:9200/_cat/master?v'

Run the following on one node:

Code:

curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_cat/pending_tasks?v'
curl -XGET 'localhost:9200/_cat/recovery?v'
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
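The health call returns pretty-printed JSON. To compare it quickly between nodes, a filter like the hypothetical health_summary below keeps only the cluster name, status, node counts and unassigned shard count. It is a line-based sed sketch, not a JSON parser, and it assumes the one-field-per-line layout that ?pretty=true produces.

```shell
#!/bin/sh
# Hypothetical helper: reduce pretty-printed _cluster/health JSON on stdin
# to a few key=value lines. Relies on one field per line (?pretty=true).
health_summary() {
    sed -nE \
        -e 's/^[[:space:]]*"(cluster_name|status)" : "([^"]*)".*/\1=\2/p' \
        -e 's/^[[:space:]]*"(number_of_nodes|number_of_data_nodes|unassigned_shards)" : ([0-9]+).*/\1=\2/p'
}
```

Usage on each node: curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_summary. Since cluster health is reported cluster-wide, the two summaries would be expected to agree if both nodes really belong to the same cluster.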

Posted: **Wed Jul 15, 2015 10:54 am**

The _cat/master output is the same on both nodes:

Code:

[root@schpnag1 ~]# curl 'localhost:9200/_cat/master?v'
id                     host     ip            node
qF8dekxASSKDwhE39PDwjg schpnag1 192.168.1.175 ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b

[root@schpnag2 ~]# curl 'localhost:9200/_cat/master?v'
id                     host     ip            node
qF8dekxASSKDwhE39PDwjg schpnag1 192.168.1.175 ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b

Code:

[root@schpnag1 ~]# curl 'localhost:9200/_cat/nodes?v'
host     ip            heap.percent ram.percent load node.role master name
schpnag1 192.168.1.175           36          61 0.52 d         *      ea9ddcd0-c0a5-4d5d-a802-e741d9c51a5b
schpnag2 127.0.0.1               37          60 0.06 d         m      843eb4bb-fb4a-4166-9f69-a1cfd529a18d

[root@schpnag1 ~]# curl 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source

[root@schpnag1 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "4f703585-84ab-40e0-9ff9-f72c904bdc38",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 726,
  "active_shards" : 1451,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1
}

The _cat/recovery output was too long to paste into the message, so I added it as an attachment.

Posted: **Wed Jul 15, 2015 12:18 pm**

Any interesting data from the following logs on either node?

Code:

cat /var/log/elasticsearch/*.log
cat /var/log/logstash/logstash.log
tail -n20 /var/log/httpd/error_log
tail -n20 /var/log/httpd/access_log
tail -f /usr/local/nagioslogserver/var/jobs.log
tail -f /usr/local/nagioslogserver/var/poller.log

Please note that poller.log and jobs.log need to be tailed for a few minutes before they show useful output.
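To skim the non-streaming logs on both nodes, a hypothetical scan_logs helper can print the most recent error-like lines from each file. The match pattern (error, exception, warn or fail, case-insensitive) is a rough heuristic chosen for this sketch, so it can both miss relevant lines and include noise.

```shell
#!/bin/sh
# Hypothetical helper: print the last 20 error-like lines from each log
# given as an argument. The pattern is a heuristic, not an official list.
scan_logs() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "== $f (not readable)"
            continue
        fi
        echo "== $f"
        grep -iE 'error|exception|warn|fail' "$f" | tail -n 20
    done
}
```

For example: scan_logs /var/log/elasticsearch/*.log /var/log/logstash/logstash.log /var/log/httpd/error_log. jobs.log and poller.log still need to be followed with tail -f for a few minutes.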

Adding a node, Elasticsearch and Logstash down in GUI