Cluster 2nd Node OFF

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Cluster 2nd Node OFF

Post by teirekos »

I have one cluster with 2 nodes. After 5-6 days of normal operation, my 2nd node stops being part of the cluster.

I've noticed that memory consumption is high (the node has 16 GB of RAM, 8 GB of which is allocated to the heap).

After a reboot, memory looks like this:

Code: Select all

[root@NagiosLogServer2 elasticsearch]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081       8783       7297          0         61       3758
-/+ buffers/cache:       4963      11117
Swap:          255          0        255

5 days later:
[root@NagiosLogServer2 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15891        189          0        134       1063
-/+ buffers/cache:      14693       1387
Swap:          255         71        184
 
Any hints?
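For reference, the `-/+ buffers/cache` row is `used` minus `buffers` minus `cached`. That is the memory held by processes rather than by the page cache. Between the two snapshots it grows from 4963 MB to 14693 MB, so the growth is not just cache. Below is a minimal sketch that computes the same figure. The helper name `app_used_mb` is hypothetical, and the sketch assumes the older `free` layout shown here, with separate `buffers` and `cached` columns. Newer procps releases merge those columns, so the field numbers would need adjusting.

```shell
#!/usr/bin/env bash
# app_used_mb (hypothetical helper): read `free -m` output on stdin and
# print the memory used by processes, i.e. used - buffers - cached, in MB.
# Assumes the older procps layout where the "Mem:" row is:
#   Mem: total used free shared buffers cached
app_used_mb() {
  awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Example: free -m | app_used_mb
```

On the two snapshots above this returns 4964 and 14694. That is within 1 MB of the row `free` prints, presumably due to rounding.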
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster 2nd Node OFF

Post by scottwilkerson »

What version of Log Server is this running?

Also, while the memory is in this state, can you run the following and post memory.txt so we can see what is using it?

Code: Select all

ps aux > /tmp/memory.txt
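In `ps aux` output the sixth column, RSS, is each process's resident memory in KB. Sorting on that column lists the biggest consumers first. Below is a small sketch using a hypothetical helper, `top_rss`. It works on live output or on the saved /tmp/memory.txt.

```shell
#!/usr/bin/env bash
# top_rss (hypothetical helper): read `ps aux` output on stdin and print
# the N processes (default 5) with the largest resident set size.
# In `ps aux` output, field 6 is RSS in kilobytes.
top_rss() {
  local n="${1:-5}"
  # Skip the header line, sort numerically on field 6 (largest first),
  # and keep the first N lines.
  tail -n +2 | sort -k6,6nr | head -n "$n"
}

# Examples:
#   ps aux | top_rss 10
#   top_rss 10 < /tmp/memory.txt
```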
Former Nagios employee
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

I'm running the latest version, 1.3.

The cluster is "broken" now, so I've attached memory.txt. Here is the free -m output:

Code: Select all

[root@NagiosLogServer2 tmp]# free -m
             total       used       free     shared    buffers     cached
Mem:         16081      15878        202          0        136        927
-/+ buffers/cache:      14814       1266
Swap:          255         88        167
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster 2nd Node OFF

Post by scottwilkerson »

Hmmm, the ps aux output shows processes using less than 50% of memory...
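One way to check that estimate is to total the `%MEM` column (field 4) of the ps aux output. Shared pages are counted in every process that maps them, so the sum can overstate process memory. Below is a sketch using a hypothetical helper, `sum_mem_pct`.

```shell
#!/usr/bin/env bash
# sum_mem_pct (hypothetical helper): read `ps aux` output on stdin and
# print the sum of the %MEM column (field 4), skipping the header line.
# Shared memory is counted once per process that maps it, so the total
# can be higher than real usage.
sum_mem_pct() {
  awk 'NR > 1 { total += $4 } END { printf "%.1f\n", total }'
}

# Example: sum_mem_pct < /tmp/memory.txt
```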

Can you also run the following:

Code: Select all

grep http /usr/local/nagioslogserver/etc/conf.d/*
Former Nagios employee
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

The following path does not exist.
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/etc/conf.d/*
grep: /usr/local/nagioslogserver/etc/conf.d/*: No such file or directory

If you meant the path below, there is no http entry there:
[root@NagiosLogServer2 elasticsearch]# grep http /usr/local/nagioslogserver/logstash/etc/conf.d
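Note that when `grep` is given a directory without `-r`, it does not search the files inside it. GNU grep typically just reports `Is a directory`. An empty result from the command above therefore does not show that the conf.d files lack an `http` entry. A recursive search covers every file. Below is a sketch using a hypothetical helper, `grep_http_conf`, whose default directory is the path from this post.

```shell
#!/usr/bin/env bash
# grep_http_conf (hypothetical helper): search every file under a Logstash
# conf.d directory for "http". The default is the path reported in this
# thread; pass another directory as the first argument if yours differs.
grep_http_conf() {
  local dir="${1:-/usr/local/nagioslogserver/logstash/etc/conf.d}"
  # -r descends into the directory, -n prints line numbers,
  # -H always prints the file name in front of each match.
  grep -rnH 'http' "$dir"
}

# Example: grep_http_conf
```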

I've also attached a screenshot from Instance Status, where memory used is 92%.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Cluster 2nd Node OFF

Post by tmcdonald »

Can we get a tail of your logstash logs?

Code: Select all

tail /var/log/logstash/logstash.log
Former Nagios employee
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

[root@NagiosLogServer2 yum.repos.d]# tail /var/log/logstash/logstash.log
log4j, [2015-03-03T14:01:05.561] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:10.563] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:15.565] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:20.566] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:25.568] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:30.569] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:35.571] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:40.573] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:45.574] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
log4j, [2015-03-03T14:01:50.576] WARN: org.elasticsearch.client.transport: [845bc07c-ed91-4920-8e23-747c9cc699f5] node [#transport#-1][inet[localhost/127.0.0.1:9300]] not part of the cluster Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32], ignoring...
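These warnings repeat every five seconds. The first and last occurrences in the retained log therefore give a rough window for when the Logstash client started failing to find its cluster. Below is a sketch using a hypothetical helper, `warning_window`. It assumes every warning starts with the `log4j, [timestamp]` prefix seen above.

```shell
#!/usr/bin/env bash
# warning_window (hypothetical helper): count the "not part of the cluster"
# warnings in a Logstash log and print the first and last timestamps.
# Assumes lines start with "log4j, [timestamp]" as in this thread.
warning_window() {
  local log="${1:-/var/log/logstash/logstash.log}"
  grep 'not part of the cluster' "$log" \
    | sed -n 's/^log4j, \[\([^]]*\)].*/\1/p' \
    | awk 'NR == 1 { first = $0 } { last = $0 }
           END { if (NR > 0) printf "%d warnings, first %s, last %s\n", NR, first, last }'
}

# Example: warning_window
```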
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Cluster 2nd Node OFF

Post by jolson »

That last error message looks a little suspect. Just to be sure there's not an obvious error, can you please run the following commands on both of your nodes and return the output:

Code: Select all

cat /usr/local/nagioslogserver/var/cluster_uuid

Code: Select all

cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
Thank you.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Cluster 2nd Node OFF

Post by teirekos »

Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b249934-e049-4f18-96ed-db395faae965

Node A
[root@NagiosLogServer /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch
Node B
[root@NagiosLogServer2 /]# cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml |grep cluster.name
cluster.name: nagios_elasticsearch
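Both nodes report the same cluster_uuid (2b249934-e049-4f18-96ed-db395faae965). However, the Logstash warnings earlier in the thread name Cluster [688cc8f8-067d-46d7-8e7d-0856a5267c32]. Below is a sketch that puts the two values side by side on one node. It assumes the bracketed value in each warning is the cluster name the Logstash client is trying to join. The helper name `compare_cluster_ids` is hypothetical, and the default paths are the ones used in this thread.

```shell
#!/usr/bin/env bash
# compare_cluster_ids (hypothetical helper): print the node's cluster UUID
# and each distinct cluster name found in Logstash "not part of the
# cluster" warnings, marking whether they match.
compare_cluster_ids() {
  local uuid_file="${1:-/usr/local/nagioslogserver/var/cluster_uuid}"
  local log="${2:-/var/log/logstash/logstash.log}"
  local uuid
  uuid=$(tr -d '[:space:]' < "$uuid_file")
  echo "cluster_uuid: $uuid"
  # Pull out the value inside "Cluster [...]" from each warning.
  grep -o 'not part of the cluster Cluster \[[^]]*]' "$log" \
    | sed 's/.*Cluster \[\(.*\)]$/\1/' | sort -u \
    | while read -r name; do
        if [ "$name" = "$uuid" ]; then
          echo "log cluster:  $name (matches)"
        else
          echo "log cluster:  $name (DIFFERS)"
        fi
      done
}

# Example (on the node that logged the warnings): compare_cluster_ids
```

A `DIFFERS` line only shows that the two values disagree, not why.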
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: Cluster 2nd Node OFF

Post by cmerchant »

Could you run the following query on each node from the command line and show us the output:

Code: Select all

curl -XGET 'http://127.0.0.1:9200/?pretty'
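Elasticsearch's `_cluster/health` endpoint also reports `number_of_nodes`, which shows directly whether each node still sees the other. Below is a sketch that assumes Elasticsearch listens on port 9200, as in the command above. `es_health` and `health_summary` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# es_health (hypothetical helper): query a node's cluster health as
# pretty-printed JSON. Defaults to the local node on port 9200.
es_health() {
  curl -s "http://${1:-127.0.0.1}:9200/_cluster/health?pretty"
}

# health_summary (hypothetical helper): read pretty-printed
# _cluster/health JSON on stdin and keep only the fields relevant to
# a node dropping out of the cluster.
health_summary() {
  grep -E '"(cluster_name|status|number_of_nodes|number_of_data_nodes)"'
}

# Example (run on each node): es_health | health_summary
```

If a node reports fewer nodes than expected (for example 1 in this two-node setup), it is no longer seeing the other node.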
Locked