Log collection just stops after a period of time

Posted: Fri Nov 15, 2024 12:19 am
by elpakko
I recently migrated my NLS from Ubuntu 20.04 LTS to Ubuntu 24.04 LTS. The migration itself went fine. However, since the migration the log server periodically stops collecting logs. After a reboot it starts working again, but after a random period of time it stops again. Any ideas where to start looking for the solution?

I attached a screenshot where this behaviour can be seen.

Re: Log collection just stops after a period of time

Posted: Fri Nov 15, 2024 12:37 am
by kg2857
Check the status of logstash when the problem happens, then restart logstash.

Re: Log collection just stops after a period of time

Posted: Fri Nov 15, 2024 12:39 am
by elpakko
I can see the following in the syslog:

rsyslogd: cannot connect to localhost:5544: Connection refused

Re: Log collection just stops after a period of time

Posted: Mon Nov 18, 2024 1:05 am
by elpakko
kg2857 wrote (Fri Nov 15, 2024 12:37 am): "Check the status of logstash when the problem happens, then restart logstash."
Logstash is running when the problem occurs. Restarting the service will help, but it will stop collecting logs again after a few days.

Re: Log collection just stops after a period of time

Posted: Tue Nov 19, 2024 1:18 pm
by jsimon
I am wondering if the issue isn't that something else is running on the same port. It seems like that has been the culprit for other Log Server users who have reported similar issues in the past. You could try running the following, when you see that logs have stopped collecting:

netstat -ltnp | grep -w ':5544'
You may need to install net-tools first if that isn't already on this server, but it is installed with Log Server so it should be present.

Re: Log collection just stops after a period of time

Posted: Wed Nov 20, 2024 2:36 pm
by jmichaelson
It would also be worth checking the logstash logs in /usr/local/nagioslogserver/logstash/logs to see if they contain anything relevant around the time logs stopped being collected. It may also be worth checking the system logs to see if something like an out of memory killer has terminated the opensearch service around that time.
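
A minimal sketch of the out-of-memory check mentioned above, assuming the kernel messages are reachable through journalctl or /var/log/syslog. The function name find_oom_events is made up for illustration, and the matched phrases are the usual kernel OOM killer wording, which may differ between kernel versions.

```shell
# Hypothetical helper: keep only lines on stdin that look like
# kernel OOM killer messages (wording is an assumption, see above).
find_oom_events() {
    grep -iE 'out of memory|oom-killer|killed process'
}

# Example usage (uncomment to run against this host's logs):
# journalctl -k --since "2024-11-24" | find_oom_events
# find_oom_events < /var/log/syslog
```

If this prints nothing for the period in question, the OOM killer is probably not what stopped the service.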

Re: Log collection just stops after a period of time

Posted: Mon Nov 25, 2024 1:01 am
by elpakko
It just stops listening on port 5544. "netstat -tulpn | grep 5544" gives empty output.

I can see the following in the logs at the time when the logs stop collecting:

{:timestamp=>"2024-11-24T04:45:47.613000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:5544", :exception=>#<SocketError: problem when accepting>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:174:in `accept'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:155:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}

Also the log is filled with the following:

{:timestamp=>"2024-11-22T21:59:58.549000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:514", :exception=>#<SocketError: initialize: name or service not known>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:126:in `initialize'", "org/jruby/RubyIO.java:871:in `new'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:152:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}

And then in the syslog:

2024-11-24T06:45:56.529917+02:00 logsrv-24 rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
2024-11-24T06:45:56.530036+02:00 logsrv-24 rsyslogd[1037]: rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]

Any other logs to check?

Re: Log collection just stops after a period of time

Posted: Mon Nov 25, 2024 9:59 pm
by kg2857
You may just want to set a cron job to restart logstash at midnight and move on. If you want to get clever, you might create a script that checks whether the port is open and restarts logstash if it is not. Same with elasticsearch. When I set the crons, my troubles disappeared.
Both are Java, which has always been a disaster.
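
A rough sketch of such a port-check script, assuming netstat is installed and Logstash runs as logstash.service (the unit name used elsewhere in this thread). The function names and the "run" argument are made up for illustration. This is a workaround, not a fix for the underlying listener failure.

```shell
#!/bin/sh
# Watchdog sketch: restart Logstash when nothing listens on the syslog port.

# Reads `netstat -ltn` output on stdin; succeeds if some socket is in
# LISTEN state with a local address ending in ":<port>".
has_listener() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Checks the given TCP port and restarts Logstash if it is not listening.
check_and_restart() {
    port="$1"
    if netstat -ltn 2>/dev/null | has_listener "$port"; then
        echo "port $port is listening; nothing to do"
    else
        echo "port $port is not listening; restarting logstash"
        systemctl restart logstash.service
    fi
}

# Only act when called as: <script> run [port]   (port defaults to 5544)
case "${1:-}" in
    run) check_and_restart "${2:-5544}" ;;
esac
```

Saved under a path of your choosing and made executable, it could be called from root's crontab every few minutes with the "run" argument.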

Re: Log collection just stops after a period of time

Posted: Tue Nov 26, 2024 10:36 am
by jsimon
I think the next step is to try tuning Logstash a little bit, as this may help with it crashing.

Try modifying this config file: (usually I recommend copying the config file before changing it in case anything goes wrong)

/etc/init.d/logstash
Change the following config options to the listed values:

LS_HEAP_SIZE="1000m"
LS_OPEN_FILES=65535
Restart Logstash:

systemctl daemon-reload
systemctl restart logstash.service
If this does not resolve the issue, you could try @kg2857's approach of setting a cron job to restart Logstash daily. You can also send in a system profile which will help with further diagnosis, either to me on this forum via DM, or by opening a case with our Support department:
https://answerhub.nagios.com/support/login
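
One way to see whether the LS_OPEN_FILES change took effect after the restart, and how close Logstash is to the limit, is to compare its open file descriptors with the soft limit under /proc. This is only a sketch: the function name fd_usage is hypothetical, it assumes a Linux /proc filesystem, and it must run as root or as the user that owns the Logstash process.

```shell
# Hypothetical helper: print the open file descriptor count and the
# soft "Max open files" limit for a given PID, read from /proc.
fd_usage() {
    pid="$1"
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid open_fds=$open soft_limit=$limit"
}

# Example usage (assumes the process command line contains "logstash"):
# fd_usage "$(pgrep -f logstash | head -n 1)"
```

If open_fds climbs toward soft_limit in the days before the listener dies, descriptor exhaustion is a plausible cause of the "problem when accepting" error, though that is a guess.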

Re: Log collection just stops after a period of time

Posted: Mon Dec 02, 2024 12:18 pm
by jsimon
Hi @elpakko,

I appreciate you sending in your system profile. It looks like your syslog listener is set up to listen on port 514, which requires additional configuration steps to function. I would suggest taking a look at this documentation and ensuring you have Logstash set up to run as the root user rather than the nagios user for this port.

It does seem strange that the listener dies at this regular interval. I'm wondering if it is listening at all -- possibly it fills up a system specific log location for a week and then dies when it hits a limit and requires a restart. If you filter your incoming log data for 0.0.0.0, are you seeing logs coming from the NLS system itself or only the external inputs you are configured to listen for?