TCP/UDP ports stop responding

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: TCP/UDP ports stop responding

Post by rkennedy »

Code:

[root@localhost tmp]# cat output.txt | grep logstash | wc -l
340
[root@localhost tmp]# cat output2.txt | grep logstash | wc -l
350

[root@localhost tmp]# cat output.txt | wc -l
4136
[root@localhost tmp]# cat output2.txt | wc -l
6924

[root@localhost tmp]# cat output.txt | grep 'java' | wc -l
1821
[root@localhost tmp]# cat output2.txt | grep 'java' | wc -l
3158
It looks like the number of logstash connections hasn't increased much (340 to 350), which is a good sign, and I believe this is fine. If this number keeps increasing until you hit the 65k cap, though, we'll need to figure out what isn't closing properly. Time will be the true test. Would you mind posting an update tomorrow morning so I can start comparing?
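The `grep 'java'` counts above match the word anywhere on a line, including file paths, so they mix several processes together. A per-command breakdown of two snapshots can show which process actually grew. This is a minimal sketch, assuming output.txt and output2.txt are plain `lsof` output with the default header line and COMMAND as the first column; the `lsof_diff` function name is hypothetical.

```shell
# Hypothetical helper: compare two saved lsof snapshots per COMMAND.
# Assumes each file is plain `lsof` output: line 1 is the header and
# column 1 is COMMAND. Prints: command, before, after, growth.
lsof_diff() {
  awk '
    FNR == 1 { next }                          # skip each file header
    FILENAME == ARGV[1] { before[$1]++; next } # first file: "before" counts
    { after[$1]++ }                            # second file: "after" counts
    END {
      for (c in before) seen[c] = 1
      for (c in after)  seen[c] = 1
      for (c in seen)
        printf "%-12s %8d %8d %8d\n", c, before[c], after[c], after[c] - before[c]
    }' "$1" "$2" | sort -k4,4nr                # largest growth first
}
```

Usage: `lsof_diff output.txt output2.txt`. Keying on the COMMAND column rather than grepping whole lines avoids counting, for example, a non-java process that merely has a path containing "java" open.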
Former Nagios Employee
jspink
Posts: 43
Joined: Wed Nov 25, 2015 3:27 pm

Re: TCP/UDP ports stop responding

Post by jspink »

Sure can. I'll be spending time on it anyway; it seems a bunch of our servers are crashing their nxlog services, and we're troubleshooting that.

Thanks for the assist.
Nagios Log Server: 10 Instances - 3,916,302,797 documents last check in 180 shards
jspink
Posts: 43
Joined: Wed Nov 25, 2015 3:27 pm

Re: TCP/UDP ports stop responding

Post by jspink »

The issue just occurred: the number of hosts reporting dropped to fewer than 50. This time, applying the config did not bring it back, and I ended up having to reboot each node in the cluster.
Attached is the lsof output from before and after the reboot:

output3.txt (before)
output4.txt (after)
rkennedy

Re: TCP/UDP ports stop responding

Post by rkennedy »

Could you also attach the logstash.log file(s)?
jspink

Re: TCP/UDP ports stop responding

Post by jspink »

rkennedy wrote: Could you also attach the logstash.log file(s)?
Here is the tail of logstash.log:
logstash.log-20160809.txt
And, for good measure, another lsof output:
output5.txt
rkennedy

Re: TCP/UDP ports stop responding

Post by rkennedy »

Based on your latest output, I have a feeling this has to do with logstash not closing its files and connections properly.

Code:

[root@localhost tmp]# cat output5.txt | grep 'java' | wc -l
25800
My guess is that it climbs to 65k, a reboot is needed, and the cycle repeats. I can do some more testing with a 10-node cluster tomorrow. For reference, when you had 8 nodes, was everything working as expected?
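Counting lines of full `lsof` output can overstate how many descriptors one process holds, since lsof also lists memory-mapped files and, depending on the lsof version, may print one row per thread. A more direct check is to read the process's descriptor directory and its limit from /proc. This is a minimal sketch, assuming a Linux host and that the "65k cap" is the process's "Max open files" soft limit, which the thread does not confirm; `fd_usage` is a hypothetical name.

```shell
# Hypothetical helper: show a process's open-descriptor count next to its
# "Max open files" soft limit. Assumes Linux /proc; run as root (or as the
# process owner) to read another user's /proc/<pid>/fd.
fd_usage() {
  pid=$1
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  open=$(ls /proc/"$pid"/fd | wc -l)
  # /proc/<pid>/limits line: "Max open files  <soft>  <hard>  files"
  limit=$(awk '/^Max open files/ { print $4 }' /proc/"$pid"/limits)
  echo "pid=$pid open=$open limit=$limit"
}
```

Usage, assuming the logstash JVM's command line contains the string "logstash": `fd_usage "$(pgrep -o -f logstash)"`. If `open` approaches `limit` over time, that would support the "not closing properly" theory more directly than grep counts.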
jspink

Re: TCP/UDP ports stop responding

Post by jspink »

rkennedy wrote: Going to guess that it's getting to 65k, reboot is needed, and repeat. I can do some more testing with a 10 node cluster tomorrow. For reference, when you had 8 nodes, was everything working as expected?
With 8, and even 9, nodes we didn't have this issue; the problems seem to have started once we added the 10th. Might it make sense to pull one node while you are doing your testing, to see if that helps?
You can build 10 nodes, but I'm guessing our volume of inbound logs won't be easy to replicate.
rkennedy

Re: TCP/UDP ports stop responding

Post by rkennedy »

Yes, please scale down to 9 nodes for the time being, since that will get you back to a stable point.

We are going to do some testing in-house, which should help answer a few questions. While we won't have the same volume of logs you do, we should still be able to see whether files are not being closed properly, since in that case the count will keep increasing. @mcapra is spinning up the cluster now, so we should have more information in a few days.
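Since the sign of a leak is a count that keeps climbing over days, sampling it on a schedule makes the trend easy to compare between posts. This is a minimal sketch, assuming a Linux host and a known PID; `log_fd_count` and `fd_trend` are hypothetical helper names, and the interval and log path are placeholders.

```shell
# Hypothetical helper: append "<timestamp> <open-fd count>" for a PID to a
# log file, taking <samples> samples spaced <interval> seconds apart.
# Assumes Linux /proc.
log_fd_count() {
  pid=$1; interval=$2; samples=$3; logfile=$4
  i=0
  while [ "$i" -lt "$samples" ]; do
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" \
      "$(ls /proc/"$pid"/fd | wc -l)" >> "$logfile"
    i=$((i + 1))
    # Sleep only between samples, not after the last one.
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
  done
}

# Hypothetical helper: summarize the first and last sampled counts.
fd_trend() {
  awk 'NR == 1 { first = $2 } { last = $2 }
       END { printf "first=%d last=%d growth=%d\n", first, last, last - first }' "$1"
}
```

For example, `log_fd_count "$(pgrep -o -f logstash)" 3600 72 /var/tmp/logstash-fd.log` would sample hourly for three days, and `fd_trend /var/tmp/logstash-fd.log` then reports the net growth. A count that stays roughly flat suggests descriptors are being released; steady growth points toward the leak suspected here.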
jspink

Re: TCP/UDP ports stop responding

Post by jspink »

There are 9 nodes in the cluster now. I'll wait for your testing results.
rkennedy

Re: TCP/UDP ports stop responding

Post by rkennedy »

We will get back to you early next week about this. Let us know if anything weird happens in the meantime.
Locked