Re: TCP/UDP ports stop responding

Posted: Mon Aug 08, 2016 2:00 pm
by rkennedy

Code:

[root@localhost tmp]# cat output.txt | grep logstash | wc -l
340
[root@localhost tmp]# cat output2.txt | grep logstash | wc -l
350

[root@localhost tmp]# cat output.txt | wc -l
4136
[root@localhost tmp]# cat output2.txt | wc -l
6924

[root@localhost tmp]# cat output.txt | grep 'java' | wc -l
1821
[root@localhost tmp]# cat output2.txt | grep 'java' | wc -l
3158

It looks like the number of logstash connections hasn't increased much (340 to 350), which is a good sign. I believe this is fine. If this number keeps increasing until you hit the 65k cap, though, we'll need to figure out what isn't closing properly. Time will be the true test. Would you mind posting an update tomorrow morning so I can start comparing?
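The comparison above can be made repeatable with a small helper. This is a hypothetical sketch: the function names `count_matches` and `compare_snapshots` are illustrative and not part of Nagios Log Server. It uses `grep -c`, which counts matching lines the same way `cat file | grep pattern | wc -l` does. Source it into a shell and point it at two saved lsof dumps.

```shell
# Hypothetical helpers for comparing two saved lsof dumps.

# count_matches FILE PATTERN
# Prints the number of lines in FILE that contain PATTERN.
count_matches() {
    # grep -c prints "0" and exits 1 when nothing matches; "|| true"
    # keeps that from being treated as an error by callers.
    grep -c -- "$2" "$1" || true
}

# compare_snapshots OLD NEW PATTERN
# Prints "PATTERN: OLD_COUNT -> NEW_COUNT (delta +N)".
compare_snapshots() {
    local old new
    old=$(count_matches "$1" "$3")
    new=$(count_matches "$2" "$3")
    printf '%s: %d -> %d (delta %+d)\n' "$3" "$old" "$new" "$((new - old))"
}
```

With the two dumps above, `compare_snapshots output.txt output2.txt java` would print `java: 1821 -> 3158 (delta +1337)`.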

Re: TCP/UDP ports stop responding

Posted: Mon Aug 08, 2016 2:03 pm
by jspink
Sure can. I'll be spending time on it anyway; it seems nxlog is crashing on a bunch of our servers, and we're troubleshooting that.

Thanks for the assist.

Re: TCP/UDP ports stop responding

Posted: Mon Aug 08, 2016 2:28 pm
by jspink
The issue just occurred: we dropped to fewer than 50 hosts reporting. This time, applying the configuration did not bring it back, and we ended up having to reboot each node in the cluster.
Attached are lsof outputs from before and after the reboot:

output3.txt (before)
output4.txt (after)

Re: TCP/UDP ports stop responding

Posted: Mon Aug 08, 2016 4:01 pm
by rkennedy
Could you also attach the logstash.log file(s)?

Re: TCP/UDP ports stop responding

Posted: Tue Aug 09, 2016 10:43 am
by jspink
rkennedy wrote: Could you also attach the logstash.log file(s)?
logstash.log tail:
logstash.log-20160809.txt
For good measure, here is another lsof output:
output5.txt

Re: TCP/UDP ports stop responding

Posted: Tue Aug 09, 2016 4:56 pm
by rkennedy
Based on your latest output, I have a feeling this has to do with logstash not closing out open files properly.

Code:

[root@localhost tmp]# cat output5.txt | grep 'java' | wc -l
25800

My guess is that the count climbs to 65k, a reboot is needed, and the cycle repeats. I can do some more testing with a 10-node cluster tomorrow. For reference, when you had 8 nodes, was everything working as expected?
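One way to watch how close the process is getting to that cap is to compare the entries in /proc/PID/fd against the "Max open files" soft limit in /proc/PID/limits. This is a minimal sketch that assumes the 65k figure is the per-process open-files limit (it could instead be the ephemeral port range), and `fd_usage` is a hypothetical helper name. Reading another process's fd directory typically requires root.

```shell
# fd_usage PID [PROC_ROOT]
# Hypothetical helper: reports how many descriptors PID has open versus
# its soft "Max open files" limit. PROC_ROOT defaults to /proc and exists
# only so the function can be pointed at a copy of that layout.
fd_usage() {
    local pid=$1 root=${2:-/proc} open limit
    # Each entry in /proc/PID/fd is one open descriptor (files and sockets).
    # The outer $(( )) strips any padding wc adds.
    open=$(( $(find "$root/$pid/fd" -mindepth 1 -maxdepth 1 | wc -l) ))
    # The 4th field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "$root/$pid/limits")
    case $limit in
        # Non-numeric (e.g. "unlimited"), missing, or zero: skip the percentage.
        ''|0|*[!0-9]*) echo "open=$open limit=$limit" ;;
        *) echo "open=$open limit=$limit ($(( 100 * open / limit ))%)" ;;
    esac
}
```

Usage, assuming `pgrep -f logstash` matches the logstash JVM on the node: `fd_usage "$(pgrep -f logstash | head -n 1)"`.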

Re: TCP/UDP ports stop responding

Posted: Wed Aug 10, 2016 7:25 am
by jspink
rkennedy wrote: Going to guess that it's getting to 65k, reboot is needed, and repeat. I can do some more testing with a 10 node cluster tomorrow. For reference, when you had 8 nodes, was everything working as expected?
With 8, and even 9, nodes we didn't have this issue; it started once we added the 10th. Might it make sense to pull one node while you do your testing, to see if that helps?
While you can build 10 nodes, I'm guessing our volume of inbound logs won't be easy to replicate.

Re: TCP/UDP ports stop responding

Posted: Wed Aug 10, 2016 12:32 pm
by rkennedy
Yes, please scale down to 9 nodes for the time being, since that will get you to a stable point.

We are going to do some testing in-house, which should help answer a few questions. While we won't have the same volume of logs you do, it should still be possible to see whether files are not closing properly, because the number of open files would keep increasing. @mcapra is spinning up the cluster now, so we should have more information in a few days.
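Since the signal here is a count that keeps rising rather than any single value, a sampling loop plus a simple trend check is one way to test for it. A sketch under stated assumptions: `sample_fds` and `trend` are hypothetical names, samples are stored as "epoch-seconds count" lines, the Linux /proc layout is assumed, and "growing" is taken to mean the count never decreases and ends above the first sample. Run the sampler as root, since `kill -0` fails with a permission error for processes owned by another user.

```shell
# sample_fds PID INTERVAL OUTFILE
# Hypothetical sampler: appends "epoch-seconds open-fd-count" to OUTFILE
# every INTERVAL seconds for as long as PID is alive.
sample_fds() {
    local pid=$1 interval=$2 out=$3 count
    while kill -0 "$pid" 2>/dev/null; do
        count=$(( $(find "/proc/$pid/fd" -mindepth 1 -maxdepth 1 2>/dev/null | wc -l) ))
        printf '%s %d\n' "$(date +%s)" "$count" >> "$out"
        sleep "$interval"
    done
}

# trend FILE
# Prints "growing" if the count column never decreases and the last
# sample is above the first; otherwise prints "not-growing".
trend() {
    awk 'NR == 1 { first = $2; prev = $2; mono = 1; next }
         { if ($2 < prev) mono = 0; prev = $2 }
         END {
             result = (mono && prev > first) ? "growing" : "not-growing"
             print result
         }' "$1"
}
```

For example, `sample_fds "$(pgrep -f logstash | head -n 1)" 300 /tmp/logstash_fds.txt &` in the background, then `trend /tmp/logstash_fds.txt` after a day or so (the output path is only an illustration).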

Re: TCP/UDP ports stop responding

Posted: Wed Aug 10, 2016 2:33 pm
by jspink
There are 9 nodes in the cluster now; I'll wait for your testing results.

Re: TCP/UDP ports stop responding

Posted: Thu Aug 11, 2016 9:13 am
by rkennedy
We will get back to you early next week about this. Let us know if anything weird happens in the meantime.