NLS CLuster and Alerting

Posted: Tue Jul 28, 2015 12:38 pm
by Mitchell
We are using an NLS cluster (2 nodes) with the default PHP alert mechanism, and we are seeing alerts from both nodes. Is there any way to receive the alerts from only one host rather than from both? How do clustering and alerting work together?

Thanks,
Ramesh

Re: NLS CLuster and Alerting

Posted: Tue Jul 28, 2015 1:25 pm
by jolson
Ramesh,

The alerting mechanism should only ever fire on a single instance. If your alerts are firing from both of your instances, there's likely a configuration problem somewhere. Could you run the following commands for me please?

Run this on both nodes:

Code:

curl 'localhost:9200/_cat/master?v'

Code:

cat /usr/local/nagioslogserver/var/cluster_uuid

Code:

tail -n20 /var/log/cron

Run this on a single node:

Code:

curl 'localhost:9200/_cat/nodes?v'
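
If it helps when comparing the per-node results, here is a minimal sketch of a side-by-side check. It assumes you saved each node's output to a local file; the file names and the `same_value` helper are made up for illustration and are not part of Nagios Log Server. Whitespace is ignored because `cluster_uuid` may not end with a newline.

```shell
# Succeeds when two saved outputs hold the same value, ignoring all whitespace.
# Both arguments are paths to files (hypothetical names in the usage below).
same_value() {
  [ "$(tr -d '[:space:]' < "$1")" = "$(tr -d '[:space:]' < "$2")" ]
}

# Usage, assuming each node's cluster_uuid was copied to the current directory:
#   same_value node1.cluster_uuid node2.cluster_uuid && echo "UUIDs match"
```

The same helper could be pointed at the saved `_cat/master` output from each node to see whether both nodes report the same master.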

Re: NLS CLuster and Alerting

Posted: Tue Jul 28, 2015 1:45 pm
by Mitchell
Here is the output for Node 1
[root@node1 ~]# curl 'localhost:9200/_cat/master?v'
id host ip node
b73hsHGESAiXyMrzQqxlyg node2.mitchell.com 172.24.25.136 00796953-fd9f-4953-b9d8-913a6b61f4da
[root@node1 ~]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b3cb2e4-7688-4e6d-a72d-44350fea9b83
[root@node1 ~]# tail -n20 /var/log/cron
Jul 28 11:29:01 node1 CROND[5401]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:29:01 node1 CROND[5403]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:30:01 node1 CROND[5628]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:30:01 node1 CROND[5630]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:31:01 node1 CROND[5957]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:31:01 node1 CROND[5958]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:32:01 node1 CROND[6148]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:32:01 node1 CROND[6149]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:33:01 node1 CROND[6338]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:33:01 node1 CROND[6339]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:34:01 node1 CROND[6528]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:34:01 node1 CROND[6529]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:35:01 node1 CROND[6754]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:35:01 node1 CROND[6755]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:36:01 node1 CROND[6945]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:36:01 node1 CROND[6946]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:37:01 node1 CROND[7135]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:37:01 node1 CROND[7136]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:38:01 node1 CROND[7353]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:38:01 node1 CROND[7354]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
[root@node1 ~]# curl 'localhost:9200/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
node1.mitchell.com 172.24.25.135 46 75 0.12 d m 41a07432-8d31-4259-a3d5-9ba9c0379bad
node2.mitchell.com 172.24.25.136 70 85 0.00 d * 00796953-fd9f-4953-b9d8-913a6b61f4da
[root@node1 ~]#

Here is the output for node 2

[root@node2 ~]# curl 'localhost:9200/_cat/master?v'
id host ip node
b73hsHGESAiXyMrzQqxlyg node2.mitchell.com 172.24.25.136 00796953-fd9f-4953-b9d8-913a6b61f4da
[root@node2 ~]# cat /usr/local/nagioslogserver/var/cluster_uuid
2b3cb2e4-7688-4e6d-a72d-44350fea9b83[root@node2 ~]#
[root@node2 ~]# tail -n20 /var/log/cron
Jul 28 11:27:01 node2 CROND[17314]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:27:01 node2 CROND[17315]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:28:01 node2 CROND[17406]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:28:01 node2 CROND[17407]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:29:01 node2 CROND[17496]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:29:01 node2 CROND[17497]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:30:01 node2 CROND[17587]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:30:01 node2 CROND[17588]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:31:01 node2 CROND[17807]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:31:01 node2 CROND[17808]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:32:01 node2 CROND[17913]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:32:01 node2 CROND[17914]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:33:01 node2 CROND[18008]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:33:01 node2 CROND[18009]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:34:01 node2 CROND[18101]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:34:01 node2 CROND[18102]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:35:01 node2 CROND[18192]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:35:01 node2 CROND[18193]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
Jul 28 11:36:01 node2 CROND[18283]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1)
Jul 28 11:36:01 node2 CROND[18284]: (nagios) CMD (/usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log 2>&1)
[root@node2 ~]#

Thanks
Ramesh

Re: NLS CLuster and Alerting

Posted: Tue Jul 28, 2015 4:05 pm
by Mitchell
I ran the commands you provided and sent the output. What do these commands do? Are they supposed to fix the issue?

Re: NLS CLuster and Alerting

Posted: Tue Jul 28, 2015 5:29 pm
by jolson
No - the commands that I had you run were for information gathering purposes. The output looks good, so I'm doing some investigation on my lab boxes - so far I cannot get your problem to reproduce. I will update you tomorrow with the result of my testing (I'll be running further tests tomorrow morning).

Re: NLS CLuster and Alerting

Posted: Wed Jul 29, 2015 1:32 pm
by Mitchell
Any update on this?

I was looking at the output of the command below (I believe it gives information about the master node). Does it say that node 2 is the master node? If so, when I run curl 'localhost:9200/_cat/nodes?v', the output seems to show node1 as the master node. Am I reading this correctly?
curl 'localhost:9200/_cat/master?v'
id host ip node
b73hsHGESAiXyMrzQqxlyg node2.mitchell.com 172.24.25.136 00796953-fd9f-4953-b9d8-913a6b61f4da

[root@node2 ~]# curl 'localhost:9200/_cat/master?v'
id host ip node
b73hsHGESAiXyMrzQqxlyg node2.mitchell.com 172.24.25.136 00796953-fd9f-4953-b9d8-913a6b61f4da

[root@node1 ~]# curl 'localhost:9200/_cat/nodes?v'
host ip heap.percent ram.percent load node.role master name
node1.mitchell.com 172.24.25.135 46 75 0.12 d m 41a07432-8d31-4259-a3d5-9ba9c0379bad
node2.mitchell.com 172.24.25.136 70 85 0.00 d * 00796953-fd9f-4953-b9d8-913a6b61f4da

Thanks,
Ramesh

Re: NLS CLuster and Alerting

Posted: Wed Jul 29, 2015 2:23 pm
by jolson
The master difference you have noted is default behavior in Nagios Log Server, and it is something that I've noticed on my own working clusters as well - I do not think it's related to this problem.
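
A side note on reading that column: as far as I know, in the `_cat/nodes` output the `master` column shows `*` for the elected master and `m` for a node that is only master-eligible, so both outputs above would point at node2. Below is a minimal sketch for pulling the elected master out of that output. It assumes the header row shown in this thread, and the function name `elected_master` is made up for illustration.

```shell
# Print the host of the elected master from `_cat/nodes?v` output read on stdin.
# Assumes a header row that contains a "master" column, as in the output above.
elected_master() {
  awk '
    NR == 1 {
      # Locate the "master" column by name rather than by fixed position.
      for (i = 1; i <= NF; i++) if ($i == "master") col = i
      next
    }
    # "*" marks the elected master; "m" rows are skipped.
    col && $col == "*" { print $1 }
  '
}

# Usage (on any node):
#   curl -s 'localhost:9200/_cat/nodes?v' | elected_master
```

Run against the output posted earlier, this should print the same host that `_cat/master` reports.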

After my testing this morning, I am seeing only one node pick up the email job (using PHP Mailer as you have specified).

What I would like you to do is this:

Start a follow-tail on your mail log on every node in your cluster:

Code:

tail -f /var/log/maillog

Send a test email:
Access the NLS web GUI and navigate to 'Administration -> Mail Settings', then press the 'Test Mail' button.

Do you see mail log entries on all of your Nagios Log Server instances?

Next, create a new alert with the following settings:
(Attached screenshot: 2015-07-29 14_21_23-Alerting • Nagios Log Server - Firefox Developer Edition.png)
Now force the alert to run - do you get 3 emails with appropriate entries in all three of your instances?

I'm wondering if this has to do with DNS - do you use DNS to access your NLS cluster? If so, try creating the alert while accessing one of your instances via IP address instead. Is the behavior the same?

I would also like a screenshot of your 'Administration -> Global Settings' page. Specifically, do you have the 'Cluster Hostname' set?

Re: NLS CLuster and Alerting

Posted: Wed Jul 29, 2015 5:34 pm
by Mitchell
We access the NLS GUI through each node's own URL, for example:
http://node1lxv/nagioslogserver/index.php/admin
http://node2lxv/nagioslogserver/index.php/admin

We don't get duplicate alerts; the alerts sent from each host are unique. My initial question was that we are getting alerts from both nodes, but those were not duplicate alerts.

I pressed the Test Mail button at each of the URLs above, and we got one email from each node. When we run an alert, the email comes from that specific node: if I run the alerts configured on node1, the alert email comes from node1, and if I run the alerts from node2, the alert email comes from node2.

Here is the global setting page on each node
Node 1
Cluster Host Name : node1lxv
Node2
Cluster Host Name : node2lxv

Thanks,
Ramesh

Re: NLS CLuster and Alerting

Posted: Wed Jul 29, 2015 5:38 pm
by Mitchell
Correction to the Global Settings page.
Here are the Global Settings on each node:
Node 1
Cluster Host Name : node1lxv
Node2
Cluster Host Name : node1lxv

Thanks,
Ramesh

Re: NLS CLuster and Alerting

Posted: Thu Jul 30, 2015 9:53 am
by jolson
Mitchell wrote: "we access the NLS GUI by going to individual node URL example , and we don't get duplicate alerts , alert sent from each host are unique alerts , my initial question was we are getting alert from both the nodes but those were not duplicate alerts"

I'm sorry - I did not understand until now.

The answer to your question is that when an alert goes out, it is assigned to one random node in your cluster and sent from that node. There is currently no way to restrict that mailing to a single node. Would you like me to put in a feature request for this function?