
Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 4:56 am
by teirekos
I have one cluster with 2 nodes. After 5-6 days of normal operation, my 2nd node stops being part of the cluster.

I've noticed that memory consumption is high, even though the node has 16 GB of RAM, 8 GB of which is allocated to heap memory.

After a reboot memory looks like this:

Code: Select all

[root@NagiosLogServer2 elasticsearch]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081       8783       7297          0         61       3758
-/+ buffers/cache:       4963      11117
Swap:          255          0        255

5 days later:
[root@NagiosLogServer2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15891        189          0        134       1063
-/+ buffers/cache:      14693       1387
Swap:          255         71        184
 
Any hints?

Re: Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 8:57 am
by scottwilkerson
What version of Log Server is this running?

Also, when the system has all this memory in use, can you run the following and post memory.txt so we can see what is using the memory?

Code: Select all

ps aux > /tmp/memory.txt
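To get a quick total from that file, something like the following could help. It is only a sketch: the function name `summarize_ps_memory` is made up for illustration, and it assumes the standard `ps aux` column layout (%MEM in column 4, RSS in KiB in column 6). Shared pages are counted once per process, so treat the totals as a rough upper bound.

```shell
# Sum the %MEM and RSS columns of a saved `ps aux` listing.
# Assumes the usual header line and column order of `ps aux`.
summarize_ps_memory() {
  awk 'NR > 1 { rss += $6; pct += $4 }
       END { printf "rss_mib=%d mem_pct=%.1f\n", rss / 1024, pct }' "$1"
}

# Usage (file name taken from the command above):
#   summarize_ps_memory /tmp/memory.txt
```

If the totals are far below what `free -m` reports as used, the memory is probably being held outside ordinary process RSS, which is worth knowing before digging further.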

Re: Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 9:07 am
by teirekos
I'm running the latest version, 1.3.

The cluster is "broken" right now, so
I'm attaching memory.txt and the free -m output.

Code: Select all

[root@NagiosLogServer2 tmp]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15878        202          0        136        927
-/+ buffers/cache:      14814       1266
Swap:          255         88        167

Re: Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 9:24 am
by scottwilkerson
Hmmm, the ps aux output makes it look like less than 50% of memory is being used...

Can you also run the following:

Code: Select all

grep http /usr/local/nagioslogserver/etc/conf.d/*

Re: Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 9:33 am
by teirekos
The following path does not exist.
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/etc/conf.d/*
grep: /usr/local/nagioslogserver/etc/conf.d/*: No such file or directory

If you mean the path below, there is no http entry there:
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/logstash/etc/conf.d

I'm also attaching a screenshot from Instance Status, where memory used is 92%.

Re: Cluster 2nd Node OFF

Posted: Mon Mar 02, 2015 6:18 pm
by tmcdonald
Can we get a tail of your logstash logs?

Code: Select all

tail /var/log/logstash/logstash.log

Re: Cluster 2nd Node OFF

Posted: Tue Mar 03, 2015 9:02 am
by teirekos
[root@NagiosLogServer2 yum.repos.d]# tail /var/log/logstash/logstash.log
log4j, [2015-03-03T14:01:05.561] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:10.563] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:15.565] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:20.566] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:25.568] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:30.569] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:35.571] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:40.573] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:45.574] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:50.576] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...

Re: Cluster 2nd Node OFF

Posted: Tue Mar 03, 2015 3:31 pm
by jolson
That last error message looks a little suspect. Just to be sure there's not an obvious error, can you please run the following commands on both of your nodes and return the output:

Code: Select all

cat /usr/local/nagioslogserver/var/cluster_uuid

Code: Select all

cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name

Thank you.
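If it helps, the cluster ID that the logstash warnings mention can be pulled out of the log and compared by eye with the cluster_uuid file. This is a rough sketch: the function name `log_cluster_ids` is hypothetical, and it assumes the warning lines keep the "not part of the cluster Cluster [...]" wording shown in the log tail above.

```shell
# Print the distinct cluster IDs named in "not part of the cluster" warnings.
log_cluster_ids() {
  sed -n 's/.*not part of the cluster Cluster \[\([^]]*\)].*/\1/p' "$1" | sort -u
}

# Usage (paths as used earlier in this thread):
#   log_cluster_ids /var/log/logstash/logstash.log
#   cat /usr/local/nagioslogserver/var/cluster_uuid
```

Running both on each node shows whether the ID in the warnings and the ID in cluster_uuid agree.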

Re: Cluster 2nd Node OFF

Posted: Wed Mar 04, 2015 2:34 am
by teirekos
Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965

Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch

Re: Cluster 2nd Node OFF

Posted: Wed Mar 04, 2015 2:46 pm
by cmerchant
Could you show us the output of the following local cluster queries, run on each node from the command line:

Code: Select all

curl -XGET 'http://127.0.0.1:9200/?pretty'
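As an optional extra, Elasticsearch's `_cluster/health` endpoint reports how many nodes each instance currently sees. The sketch below is not one of the queries requested above, and the helper name `node_count` is made up for illustration. It reads the health JSON on stdin and prints the `number_of_nodes` value.

```shell
# Extract the number_of_nodes value from cluster-health JSON read on stdin.
node_count() {
  sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage, on each node:
#   curl -s -XGET 'http://127.0.0.1:9200/_cluster/health?pretty' | node_count
```

On a healthy two-node cluster both nodes should print the same count. Different counts would suggest the nodes do not currently see each other.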