Log collection just stops after a period of time
I recently migrated my NLS from Ubuntu 20.04 LTS to Ubuntu 24.04 LTS. Migration went ok and no worries there. However now after the migration I'm facing a problem where the log server just stops collecting logs. After reboot it starts to work again, but then after a random period of time it stops again. Any ideas where to start looking for the solution?
I attached a screenshot where this behaviour can be seen.
Re: Log collection just stops after a period of time
Check the status of logstash when the problem happens, then restart logstash.
Re: Log collection just stops after a period of time
I can see the following in the syslog:
rsyslogd: cannot connect to localhost:5544: Connection refused
Re: Log collection just stops after a period of time
I am wondering if the issue isn't that something else is running on the same port. It seems like that has been the culprit for other Log Server users who have reported similar issues in the past. You could try running the following when you see that logs have stopped collecting:
netstat -ltnp | grep -w ':5544'
You may need to install net-tools first if that isn't already on this server, but it is installed with Log Server so it should be present.
Re: Log collection just stops after a period of time
It would also be worth checking the logstash logs in /usr/local/nagioslogserver/logstash/logs to see if they contain anything relevant around the time logs stopped being collected. It may also be worth checking the system logs to see if something like the out-of-memory killer terminated the opensearch service around that time.
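As a rough sketch of those two checks, the shell functions below count "syslog listener died" events per listener address in a Logstash log file and list kernel OOM-killer messages. The function names are made up for illustration, and the exact log file name under /usr/local/nagioslogserver/logstash/logs is an assumption, so point it at whatever file is actually there.

```shell
#!/bin/sh
# Count "syslog listener died" events per listener address in a Logstash log.
# Usage: listener_deaths /path/to/logstash-log-file
listener_deaths() {
    grep 'syslog listener died' "$1" \
        | grep -o ':address=>"[^"]*"' \
        | sort | uniq -c
}

# Print kernel messages that usually accompany the OOM killer.
# Prints nothing (and returns non-zero) when no such message is found.
oom_events() {
    journalctl -k --no-pager 2>/dev/null \
        | grep -iE 'out of memory|oom-killer|killed process'
}
```

For example, `listener_deaths /usr/local/nagioslogserver/logstash/logs/logstash.log` (file name assumed) followed by `oom_events`, then compare the timestamps with the moment collection stopped.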
Please let us know if you have any other questions or concerns.
-Jason
Re: Log collection just stops after a period of time
It just stops listening on port 5544. "netstat -tulpn | grep 5544" gives empty output.
I can see the following in the logs at the time when the logs stop collecting:
{:timestamp=>"2024-11-24T04:45:47.613000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:5544", :exception=>#<SocketError: problem when accepting>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:174:in `accept'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:155:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}
Also the log is filled with the following:
{:timestamp=>"2024-11-22T21:59:58.549000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:514", :exception=>#<SocketError: initialize: name or service not known>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:126:in `initialize'", "org/jruby/RubyIO.java:871:in `new'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:152:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}
And then in the syslog:
2024-11-24T06:45:56.529917+02:00 logsrv-24 rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
2024-11-24T06:45:56.530036+02:00 logsrv-24 rsyslogd[1037]: rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
Any other logs to check?
Re: Log collection just stops after a period of time
You may just want to set a cron job to restart logstash at midnight and move on. If you want to get clever, you might create a script that checks whether the port is open and restarts logstash if it is not. Same with elasticsearch. When I set the crons, my troubles disappeared.
Both are java which has always been a disaster.
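A minimal sketch of the port-check idea above, assuming the syslog input listens on TCP 5544 and the service is called logstash.service, as elsewhere in this thread. The function names and the logstash-watchdog log tag are hypothetical. It relies on `netstat -ltn` from net-tools, which this thread says is installed with Log Server.

```shell
#!/bin/sh
# Reads `netstat -ltn` output on stdin; succeeds if the local address
# column (field 4) ends in ":PORT", i.e. something listens on that port.
has_listener() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Restart Logstash when nothing is listening on the given TCP port.
check_and_restart() {
    if netstat -ltn | has_listener "$1"; then
        return 0
    fi
    logger -t logstash-watchdog "nothing listening on port $1, restarting logstash"
    systemctl restart logstash.service
}

# Run the check only when a port is passed, e.g. `sh logstash-watchdog.sh 5544`.
if [ "$#" -gt 0 ]; then
    check_and_restart "$1"
fi
```

Saved as, say, /usr/local/bin/logstash-watchdog.sh (a made-up path), it could be run from root's crontab every few minutes: `*/5 * * * * sh /usr/local/bin/logstash-watchdog.sh 5544`. This only works around the listener dying; it does not explain why it dies.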
Re: Log collection just stops after a period of time
I think the next step is to try tuning Logstash a little bit, as this may help with it crashing.
Try modifying this config file (I usually recommend copying the config file before changing it, in case anything goes wrong):
/etc/init.d/logstash
Change the following config options to the listed values:
LS_HEAP_SIZE="1000m"
LS_OPEN_FILES=65535
Restart Logstash:
systemctl daemon-reload
systemctl restart logstash.service
If this does not resolve the issue, you could try @kg2857's approach of setting a cron job to restart Logstash daily. You can also send in a system profile which will help with further diagnosis, either to me on this forum via DM, or by opening a case with our Support department:
https://answerhub.nagios.com/support/login
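To go with the tuning steps above, here is a hedged sketch of two helpers. One takes a timestamped copy of the config file before editing. The other reads the soft open-files limit from a process's /proc/PID/limits, so you can check whether LS_OPEN_FILES took effect after the restart. The helper names and the backup directory are assumptions, not part of Log Server.

```shell
#!/bin/sh
# Copy FILE into DESTDIR with a timestamp suffix, preserving mode and times.
# Usage: backup_config /etc/init.d/logstash /root
backup_config() {
    cp -p "$1" "$2/$(basename "$1").$(date +%Y%m%d%H%M%S).bak"
}

# Reads a /proc/PID/limits file on stdin and prints the soft
# "Max open files" value (the fourth whitespace-separated field).
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}
```

Before editing, run `backup_config /etc/init.d/logstash /root`. After the restart, run `soft_nofile_limit < /proc/PID/limits`, replacing PID with the process ID of the Logstash Java process (`pgrep -f logstash` is one way to find it, though it may match more than one process). If the new value was picked up, this should print 65535.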
Re: Log collection just stops after a period of time
Hi @elpakko,
I appreciate you sending in your system profile. It looks like your syslog listener is set up to listen on port 514, which requires additional configuration steps to function. I would suggest taking a look at this documentation and ensuring you have Logstash set up to run as the root user rather than the nagios user for this port.
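For background, on Linux a process normally needs root (or the CAP_NET_BIND_SERVICE capability) to bind ports below 1024, which is why 514 behaves differently from 5544. A small sketch for checking both sides of that, with hypothetical function names:

```shell
#!/bin/sh
# Succeeds if PORT is in the range that normally requires root to bind (1-1023).
is_privileged_port() {
    [ "$1" -ge 1 ] && [ "$1" -lt 1024 ]
}

# Print the user that owns process PID, e.g. the Logstash Java process.
process_owner() {
    ps -o user= -p "$1"
}
```

`is_privileged_port 514 && echo "needs root"` prints "needs root", and `process_owner PID` with the Logstash process ID shows whether it is currently running as nagios or root.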
It does seem strange that the listener dies at this regular interval. I'm wondering if it is listening at all -- possibly it fills up a system specific log location for a week and then dies when it hits a limit and requires a restart. If you filter your incoming log data for 0.0.0.0, are you seeing logs coming from the NLS system itself or only the external inputs you are configured to listen for?