
Re: Load balancing NLS nodes

Posted: Fri Jun 19, 2015 5:46 pm
by stecino
jolson wrote: Say you have a DNS name that resolves to all of your NLS boxes - nls1.nagios.local resolves to 192.168.1.1 -or- 192.168.1.2 -or- 192.168.1.3.

At this point, you would set the 'cluster hostname' to 'nls1.nagios.local'. This will configure all of the nodes in your cluster to be aware of that DNS name - it will also set your alert messages to point to nls1.nagios.local instead of the default 127.0.0.1.

Does that make sense?
Also, I have set up the VIP; it's listening on port 5544 and the profile is set to UDP. I updated one of the log sources to point to the VIP. The datagram stats for Received and Transmitted show that there is traffic, but packets don't seem to be going out, as no more events are being recorded for that host. Something is definitely being passed.
Are the packets from this host showing up on any of the NLS nodes? You can run a tcpdump to verify:

yum install tcpdump
tcpdump -n host 192.168.x.x and dst port 5544
Where 192.168.x.x is the IP address of the sending host. You should run the above commands on each NLS node and see whether any of them are receiving traffic from the host in question.
This is a tcpdump from the log source to the VIP that fronts the NLS nodes in the cluster:

# tcpdump -n host 10.yy.yy.yy and dst port 5544
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:29:53.182778 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 80
15:29:54.976334 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 80
15:29:55.286766 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 87
15:29:57.646546 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 74
15:29:57.646692 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 100
15:29:57.646988 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 74
15:29:57.647391 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 100
15:29:57.647466 IP 10.xx.xx.xx.34037 > 10.yy.yy.yy.5544: UDP, length 173
15:29:57.967540 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 222
15:29:57.967667 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 217
15:29:57.967949 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 224
15:29:57.968152 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 289
15:29:57.968252 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 188
15:29:57.968517 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 233
15:29:57.968753 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 164
15:29:57.968930 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 134
15:29:57.969090 IP 10.xx.xx.xx.52595 > 10.yy.yy.yy.5544: UDP, length 225

I am seeing some traffic coming in through a SNATed IP, but the log source is the right one (mylogsource). The only issue I see is that this server is very chatty, and I was getting a lot more messages before, when it was forwarding directly to one of the cluster nodes.

2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:18 DEBUG com.nnn.nns.queue.job.dao.mybatis.mapper.JobProfileMapper.retreiveRescheduledJobs - ==> Parameters: mylogsource
2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:18 DEBUG org.springframework.transaction.support.TransactionSynchronizationManager - Retrieved value [org.springframework.jdbc.datasource.ConnectionHolder@698c5b] for key [jdbc:oracle:oci:@JQAPP] bound to th... mylogsource
2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:18 DEBUG org.springframework.jdbc.datasource.DataSourceTransactionManager - Switching JDBC Connection [oracle.jdbc.driver.T2CConnection@129e0c4] to manual commit mylogsource
2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:17 DEBUG org.springframework.beans.factory.support.DefaultListableBeanFactory - Returning cached instance of singleton bean 'amqpTemplate' mylogsource
2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:18 DEBUG org.springframework.jdbc.datasource.DataSourceUtils - Changing isolation level of JDBC Connection [oracle.jdbc.driver.T2CConnection@129e0c4] to 8 mylogsource
2015-06-19T04:16:27.000Z zz.zz.zz.zz.201 nnsjq_log 18 Jun 2015 17:16:17 DEBUG org.springframework.beans.factory.support.DefaultListableBeanFactory - Returning cached instance of singleton bean 'jqConfig' mylogsource

Re: Load balancing NLS nodes

Posted: Mon Jun 22, 2015 9:29 am
by jolson
Since we know that you were getting more messages previously, it would be a good idea to run a tcpdump on that chatty host. This way we can see exactly what it's sending out. If we see more logs going out than we see coming in on the other side, the issue is in the network.

Let's do a tcpdump on your remote host:

tcpdump -n host 10.x.x.x and dst port 5544
Where 10.x.x.x is the IP address of Nagios Log Server.
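
To make the sender-versus-receiver comparison less manual, the two captures can be saved as text and tallied with a short script. This is only a minimal sketch: it assumes the one-line-per-packet `tcpdump -n` format shown earlier in this thread, and the function name `summarize_capture` is made up for illustration.

```python
import re
from collections import Counter

# Matches tcpdump -n lines of the form:
#   15:29:53.182778 IP 10.0.0.1.34037 > 10.0.0.2.5544: UDP, length 80
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): UDP, length (?P<len>\d+)"
)

def summarize_capture(lines):
    """Return (datagram_count, total_payload_bytes, per_source_counts)."""
    count = 0
    total_bytes = 0
    per_source = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Skip tcpdump banner lines and anything that is not a UDP packet line.
            continue
        count += 1
        total_bytes += int(match.group("len"))
        per_source[match.group("src")] += 1
    return count, total_bytes, per_source
```

Run it over the text saved on the sending host and over the text saved on each NLS node for the same time window. If the sender's count is clearly higher than the sum across the nodes, datagrams are being lost somewhere in between.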

Some other thoughts:
Since UDP is being used to transmit these packets, it's possible that some network device is dropping them without your knowledge.
Are you positive that the logs aren't showing up on the dashboard under another IP or anything of that nature?
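
One rough way to check for silent UDP loss is to send a known number of numbered test datagrams at the VIP and then see which sequence numbers actually arrive (for example by searching for them on the dashboard). The sketch below is an assumption-laden illustration, not an NLS feature: the `lbprobe seq=N` payload format and both function names are hypothetical, and it assumes the input on port 5544 accepts plain text over UDP.

```python
import socket
import time

def send_probes(host, port, count, delay=0.01, tag="lbprobe"):
    """Send `count` numbered UDP datagrams to host:port and return the number sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            # Hypothetical payload format; each datagram carries its own sequence number.
            payload = f"{tag} seq={seq}".encode("ascii")
            sock.sendto(payload, (host, port))
            # A small pause keeps the probe from overrunning buffers on its own.
            time.sleep(delay)
    return count

def missing_sequences(received, count):
    """Return the sorted sequence numbers in range(count) that never arrived."""
    return sorted(set(range(count)) - set(received))
```

For instance, after `send_probes("10.x.x.x", 5544, 1000)` with the VIP address filled in, collect the `seq=` values that show up in NLS and pass them to `missing_sequences` to see how many were dropped along the way.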

Re: Load balancing NLS nodes

Posted: Mon Jun 22, 2015 4:13 pm
by stecino
jolson wrote: Since we know that you were getting more messages previously, it would be a good idea to run a tcpdump on that chatty host. This way we can see exactly what it's sending out. If we see more logs going out than we see coming in on the other side, the issue is in the network.

Let's do a tcpdump on your remote host:

tcpdump -n host 10.x.x.x and dst port 5544
Where 10.x.x.x is the IP address of Nagios Log Server.

Some other thoughts:
Since UDP is being used to transmit these packets, it's possible that some network device is dropping them without your knowledge.
Are you positive that the logs aren't showing up on the dashboard under another IP or anything of that nature?
I have opened a case with F5. I have provided them with all the details: a qkview, a tcpdump, and some logs from NLS. We will see what they say.

Re: Load balancing NLS nodes

Posted: Mon Jun 22, 2015 4:31 pm
by jolson
stecino,

Sounds good - let us know.

Re: Load balancing NLS nodes

Posted: Fri Jul 17, 2015 10:55 am
by mike4vr
I am also very much interested to find out how you got this to work with the F5, if you do. We faced the same issue and ended up letting DNS handle the load balancing. Not ideal, by any means.
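
For anyone going the DNS route, it can help to confirm which addresses the cluster name resolves to from a given log source. A minimal sketch using only the Python standard library follows; the helper name `resolve_all` is made up, and 5544 is simply the port used earlier in this thread.

```python
import socket

def resolve_all(name, port=5544):
    """Return the sorted, de-duplicated IP addresses that `name` resolves to."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_all("nls1.nagios.local")` on a log source should list every NLS node address that the name maps to. If only one address comes back, that client will not spread its traffic across the cluster.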

Re: Load balancing NLS nodes

Posted: Fri Jul 17, 2015 11:08 am
by jolson
Agreed. Any follow-up here, stecino?