Nagios Support Forum

Posted: **Tue May 12, 2015 8:06 am**

Seeing this. It looks like data is being replicated, but the two nodes show as offline to each other. The cluster status is green, hence the confusion. I tried restarting the browser, clearing its cache, and re-applying the configuration, with no change.

newcluster.PNG

SystemStatus.PNG

Under System Status, only the first instance, nagilgp01, is listed. Again, that is the case on both servers.

Posted: **Tue May 12, 2015 9:19 am**

Please run the following commands on both of your Nagios Log Server (NLS) nodes:

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts
sestatus
curl 'localhost:9200/_cat/master?v'
tail -f /usr/local/nagioslogserver/var/poller.log

What browser are you using?

Posted: **Tue May 12, 2015 10:25 am**

Here you go. Running Firefox 37.0.2

Code:

 cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
10.0.103.180
10.136.132.107

Code:

[nagios@nagilgp01 ~]$ sestatus
SELinux status:                 disabled

Code:

[nagios@nagilgp01 ~]$ curl 'localhost:9200/_cat/master?v'
id                     host      ip             node
dYVMjW-KTPavYvFoNiRaTA nagilgp02 10.136.132.107 11fe29cc-9353-4cc1-a368-14a0b6977937

Code:

Updating Cluster Hosts File
Updating Elasticsearch with instance...
Updating Cluster Hosts File
Updating Elasticsearch with instance...
Updating Cluster Hosts File
Updating Elasticsearch with instance...
Updating Cluster Hosts File
Updating Elasticsearch with instance...
Finished Polling.

Posted: **Tue May 12, 2015 10:29 am**

Could you please run the commands on your other node as well?

Posted: **Tue May 12, 2015 11:23 am**

Here you go. Also the first command is leading me to believe we do have an issue.

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts
localhost

Code:

SELinux status:                 disabled

Code:

-bash-4.1$ curl 'localhost:9200/_cat/master?v'
id                     host      ip             node
dYVMjW-KTPavYvFoNiRaTA nagilgp02 10.136.132.107 11fe29cc-9353-4cc1-a368-14a0b6977937

Code:

-bash-4.1$ tail -f /usr/local/nagioslogserver/var/poller.log
tail: cannot open `/usr/local/nagioslogserver/var/poller.log' for reading: No such file or directory
tail: no files remaining
-bash-4.1$ cd /usr/local/nagioslogserver/
-bash-4.1$ ll
total 32
drwxr-xr-x 7 nagios nagios 4096 May 11 14:39 elasticsearch
drwxrwxr-x 2 nagios nagios 4096 May 11 14:39 etc
drwxr-xr-x 9 nagios nagios 4096 May 11 14:39 logstash
drwxrwxr-x 2 nagios nagios 4096 May 11 14:39 mibs
drwxrwxr-x 2 nagios nagios 4096 May 11 14:39 scripts
drwxrwxr-x 2 nagios nagios 4096 May 11 14:39 snapshots
drwxrwxr-x 3 nagios nagios 4096 May 11 23:03 tmp
drwxrwxr-x 2 nagios nagios 4096 May 11 23:02 var
-bash-4.1$ cd var
-bash-4.1$ ll
total 12
-rwxrwxr-x 1 nagios nagios 34 May 11 23:02 cluster_hosts
-rw-rw-r-- 1 nagios nagios 36 May 11 23:02 cluster_uuid
-rw-rw-r-- 1 nagios nagios 37 May 11 23:02 node_uuid
-bash-4.1$
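Both nodes return the same `_cat/master` row, which suggests they have joined a single Elasticsearch cluster and agree on nagilgp02 as master. A minimal sketch for comparing that output between nodes follows. The function name `master_host` is hypothetical, and it assumes the two-line text format shown in the posts above.

```shell
# Hypothetical helper: read `_cat/master?v` output on stdin and print the
# "host" column of the first data row (the line after the header).
master_host() {
    awk 'NR == 2 { print $2 }'
}

# Example usage (run on each node, then compare the two results):
#   curl -s 'localhost:9200/_cat/master?v' | master_host
```

If the two nodes print different hosts, or one prints nothing, the nodes likely do not share a cluster and the `cluster_hosts` file is worth checking first.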

Posted: **Tue May 12, 2015 11:27 am**

Quote: "Here you go. Also the first command is leading me to believe we do have an issue."

You are correct - please add the following to the 'cluster_hosts' file on node 2 (the one you posted second):

Code:

echo "10.0.103.180" >> /usr/local/nagioslogserver/var/cluster_hosts
echo "10.136.132.107" >> /usr/local/nagioslogserver/var/cluster_hosts
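Because `>>` appends unconditionally, running those two commands twice would leave duplicate entries in the file. A minimal sketch of an append that skips lines already present is shown here. The function name `add_cluster_host` is hypothetical and not part of Nagios Log Server, and it assumes the file ends with a newline.

```shell
# Hypothetical helper: append a host to a cluster_hosts-style file only if
# that exact line is not already present.
#   $1 = path to the file, $2 = host name or IP address to add
add_cluster_host() {
    # -x matches the whole line, -F treats the host as a literal string,
    # -q suppresses output; the append runs only when grep finds no match.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example usage, with the path and addresses from this thread:
#   add_cluster_host /usr/local/nagioslogserver/var/cluster_hosts 10.0.103.180
#   add_cluster_host /usr/local/nagioslogserver/var/cluster_hosts 10.136.132.107
```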

After adding that information, please re-run your tests to see whether or not that helped. If your issue isn't resolved, restart elasticsearch on both nodes:

Code:

service elasticsearch restart

Thanks!

Posted: **Tue May 12, 2015 1:49 pm**

Still having the same issue: nagilgp02 shows red for Elasticsearch and Logstash, and when I go to select an instance only 01 is listed. All the files are as they should be per your changes.

I do notice there is no poller.log on the 02 server.

Any ideas?

Posted: **Tue May 12, 2015 2:16 pm**

Maybe this is working as expected. I am able to log into nagilgp02.dcri.duke.net just fine and look at all the events and dashboards as they were defined on lgp01 before.

This is the instance status screen. According to ps -ef, Elasticsearch and Logstash are both running:

Code:

bash-4.1$ ps -ef | grep -i elastic
nagios    1519     1  4 13:46 ?        00:03:52 /usr/bin/java -Xms7975m -Xmx7975m -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Des.cluster.name=907e60a9-dc29-411e-96e8-2dfe503e0867 -Des.node.name=11fe29cc-9353-4cc1-a368-14a0b6977937 -Des.discovery.zen.ping.unicast.hosts=localhost,10.0.103.180,10.136.132.107 -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.3.2.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
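The long command line above includes the `-Des.discovery.zen.ping.unicast.hosts` setting, which shows the host list Elasticsearch was actually started with. A minimal sketch for pulling that list out of `ps` output follows. The function name `unicast_hosts` is hypothetical, and it assumes a `grep` that supports `-o`.

```shell
# Hypothetical helper: read process-listing text on stdin and print the
# hosts from -Des.discovery.zen.ping.unicast.hosts, one per line.
unicast_hosts() {
    # `--` stops option parsing so the pattern may begin with a dash.
    grep -o -- '-Des\.discovery\.zen\.ping\.unicast\.hosts=[^ ]*' \
        | head -n 1 \
        | cut -d= -f2 \
        | tr ',' '\n'
}

# Example usage:
#   ps -ef | unicast_hosts
```

If the list printed on a node differs from its `cluster_hosts` file, Elasticsearch on that node was probably started before the file was edited.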

NLS-InstanceOverview.png

Posted: **Tue May 12, 2015 2:30 pm**

It's interesting that elasticsearch isn't being detected properly from the GUI.

Does any functionality seem impacted? I'd like to see a screenshot of your 'Cluster Status' page.

As far as detection is concerned, please perform the following procedure:

Log into Node 1 and navigate to 'Administration -> System Status'. Select the instance that you're on. Are all of the buttons showing green? Select the other instance and report what displays.

Log into Node 2 and navigate to 'Administration -> System Status'. Select the instance that you're on. Are all of the buttons showing green? Select the other instance and report what displays.

My assumption is that if you're logged into Node 1, you can't see the status of Node 2 - and vice versa. I would like you to confirm this.
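To separate a GUI detection problem from a real cluster problem, it may help to ask Elasticsearch directly which nodes it sees. The sketch below assumes the `_cat/nodes` endpoint with the `h=host` column selector is available on this Elasticsearch version, and that the host names reported match nagilgp01 and nagilgp02. The function name `all_nodes_present` is hypothetical.

```shell
# Hypothetical helper: read one host name per line on stdin (for example
# from `_cat/nodes?h=host`) and succeed only if every expected host appears.
all_nodes_present() {
    # Keep only the first field of each line so column padding is ignored.
    seen=$(awk '{ print $1 }')
    for expected in "$@"; do
        printf '%s\n' "$seen" | grep -qxF -- "$expected" || return 1
    done
    return 0
}

# Example usage:
#   curl -s 'localhost:9200/_cat/nodes?h=host' \
#       | all_nodes_present nagilgp01 nagilgp02 && echo "both nodes joined"
```

If this succeeds on both nodes while System Status still lists only one instance, the cluster itself is likely healthy and the issue is limited to instance detection in the interface.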

Posted: **Wed May 13, 2015 8:42 am**

Actually, on both nagilgp01 and 02 under Administration --> System Status they only show nagilgp01.

This is from the nagilgp02 host. nagilgp01 is the only option available, and the same is true when I am on the 01 server.

nls-systemstatus.png

Here is the Cluster Status page you requested. I think it is working normally as I can use the 02 web gui just fine. I just cannot control 01 from 02 and vice versa.

NSL-ClusterStatus.png

Added second NLS server to cluster but...
