Cluster 2nd Node OFF

teirekos · Post by **teirekos** » Mon Mar 02, 2015 4:56 am

I have one cluster with 2 nodes. After 5-6 days of normal operation, my 2nd node stops being part of the cluster.

I've noticed that memory consumption is high (the node has 16 GB of RAM, 8 GB of which is allocated to heap memory).

After a reboot memory looks like this:

Code: Select all

[root@NagiosLogServer2 elasticsearch]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081       8783       7297          0         61       3758
-/+ buffers/cache:       4963      11117
Swap:          255          0        255

5 days later:
[root@NagiosLogServer2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15891        189          0        134       1063
-/+ buffers/cache:      14693       1387
Swap:          255         71        184

Any hints?
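
A note on reading the two snapshots above: in this older `free` layout, the `-/+ buffers/cache` row is `used` minus `buffers` minus `cached`, so memory held by processes grew from roughly 4963 MB to roughly 14693 MB over the five days. A minimal sketch for pulling that figure out of `free -m` output, assuming the same column layout as shown here (newer versions of `free` print different columns); the function name `app_used_mb` is hypothetical:

```shell
# Hypothetical helper: read `free -m` output on stdin and print the memory
# used by processes in MB (used - buffers - cached).
# Assumes the older layout from this thread:
#   Mem: total used free shared buffers cached
app_used_mb() {
    awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Possible usage on a system with the same `free` layout:
#   free -m | app_used_mb
```

Logging this value periodically (for example from cron) would show whether the growth is steady or sudden.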

scottwilkerson · Post by **scottwilkerson** » Mon Mar 02, 2015 8:57 am

What version of Log Server is this running?

Also, when the system has all this memory in use, can you run the following and post memory.txt so we can see what is using the memory?

Code: Select all

ps aux > /tmp/memory.txt
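
If the full listing is hard to read, a sketch like the following may help narrow it down. It assumes the procps version of `ps` (for `--sort`) and the standard `ps aux` column order, where column 6 is RSS in KB; the function names `top_mem` and `sum_rss_mb` are hypothetical:

```shell
# Hypothetical helper: show the N processes with the largest resident memory
# (default 10), plus the header line. Assumes procps `ps` with --sort support.
top_mem() {
    ps aux --sort=-rss | head -n "$(( ${1:-10} + 1 ))"
}

# Hypothetical helper: read `ps aux` output on stdin and print the sum of the
# RSS column (column 6, in KB) converted to whole MB.
sum_rss_mb() {
    awk 'NR > 1 { kb += $6 } END { printf "%d\n", kb / 1024 }'
}

# Possible usage:
#   top_mem 15
#   sum_rss_mb < /tmp/memory.txt
```

The RSS total counts shared pages once per process, so it can overstate real usage; it is only a rough figure to compare against the `used` value from `free -m`.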

teirekos · Post by **teirekos** » Mon Mar 02, 2015 9:07 am

I'm running the latest version, 1.3.

The cluster is "broken" right now, so I'm attaching memory.txt and the free -m output.

Code: Select all

[root@NagiosLogServer2 tmp]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15878        202          0        136        927
-/+ buffers/cache:      14814       1266
Swap:          255         88        167

scottwilkerson · Post by **scottwilkerson** » Mon Mar 02, 2015 9:24 am

Hmmm, the ps aux output suggests that less than 50% of memory is being used...

Can you also run the following?

Code: Select all

grep http /usr/local/nagioslogserver/etc/conf.d/*

teirekos · Post by **teirekos** » Mon Mar 02, 2015 9:33 am

The following path does not exist.
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/etc/conf.d/*
grep: /usr/local/nagioslogserver/etc/conf.d/*: No such file or directory

If you mean the path below, there is no http entry there:
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/logstash/etc/conf.d

I'm also attaching a screenshot from Instance Status, where memory used is 92%.

tmcdonald · Post by **tmcdonald** » Mon Mar 02, 2015 6:18 pm

Can we get a tail of your logstash logs?

Code: Select all

tail /var/log/logstash/logstash.log

teirekos · Post by **teirekos** » Tue Mar 03, 2015 9:02 am

[root@NagiosLogServer2 yum.repos.d]# tail /var/log/logstash/logstash.log
log4j, [2015-03-03T14:01:05.561] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:10.563] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:15.565] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:20.566] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:25.568] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:30.569] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:35.571] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:40.573] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:45.574] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:50.576] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...

jolson · Post by **jolson** » Tue Mar 03, 2015 3:31 pm

That last error message looks a little suspect. Just to be sure there's not an obvious error, can you please run the following commands on both of your nodes and return the output:

Code: Select all

cat /usr/local/nagioslogserver/var/cluster_uuid

Code: Select all

cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name

Thank you.

teirekos · Post by **teirekos** » Wed Mar 04, 2015 2:34 am

Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965

Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch
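
For comparison, the cluster ID named in the logstash warnings earlier in the thread (688cc8f8-067d-46d7-8e7d-0856a5267c32) is not the same as the cluster_uuid value reported here on both nodes. Whether those two values are expected to match is not established in this thread. A small sketch for listing the distinct cluster IDs that appear in the log, assuming the warning format shown above; the function name `log_cluster_ids` is hypothetical:

```shell
# Hypothetical helper: read logstash log lines on stdin and print each distinct
# ID that appears as "Cluster [<id>]" in the transport warnings.
log_cluster_ids() {
    sed -n 's/.*Cluster \[\([^]]*\)].*/\1/p' | sort -u
}

# Possible usage, with the log path used earlier in the thread:
#   log_cluster_ids < /var/log/logstash/logstash.log
#   cat /usr/local/nagioslogserver/var/cluster_uuid
```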

cmerchant · Post by **cmerchant** » Wed Mar 04, 2015 2:46 pm

Could you show us the output of the following local cluster queries (on each node) from the command line:

Code: Select all

curl -XGET 'http://127.0.0.1:9200/?pretty'
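
Alongside the root query above, Elasticsearch also exposes `_cluster/health` and `_cat/nodes` endpoints, which report the cluster status and the nodes each instance can currently see. A sketch under the assumption that the HTTP API listens on port 9200 as in the command above; the helper names `es_url` and `es_cluster_check` are hypothetical:

```shell
# Hypothetical helper: build the URL for an Elasticsearch HTTP endpoint.
#   $1 = host, $2 = API path with query string (no leading slash)
es_url() {
    printf 'http://%s:9200/%s\n' "$1" "$2"
}

# Hypothetical helper: print cluster health and the node list as seen by one
# node. Defaults to the local node.
es_cluster_check() {
    host="${1:-127.0.0.1}"
    curl -s -XGET "$(es_url "$host" '_cluster/health?pretty')"
    curl -s -XGET "$(es_url "$host" '_cat/nodes?v')"
}

# Possible usage, run on each node:
#   es_cluster_check
```

Running it on both nodes and comparing the node counts would show whether each node still sees the other.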
