Log collection just stops after a period of time

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
elpakko
Posts: 14
Joined: Fri Mar 22, 2019 1:07 am

Log collection just stops after a period of time

Post by elpakko »

I recently migrated my NLS from Ubuntu 20.04 LTS to Ubuntu 24.04 LTS. The migration itself went fine. Since then, however, the log server intermittently stops collecting logs. After a reboot it works again, but after a random period of time it stops once more. Any ideas where to start looking for the solution?

I attached a screenshot where this behaviour can be seen.
kg2857
Posts: 490
Joined: Wed Apr 12, 2023 5:48 pm

Re: Log collection just stops after a period of time

Post by kg2857 »

Check the status of logstash when the problem happens, then restart logstash.
elpakko
Posts: 14
Joined: Fri Mar 22, 2019 1:07 am

Re: Log collection just stops after a period of time

Post by elpakko »

I can see the following in the syslog:

rsyslogd: cannot connect to localhost:5544: Connection refused
elpakko
Posts: 14
Joined: Fri Mar 22, 2019 1:07 am

Re: Log collection just stops after a period of time

Post by elpakko »

kg2857 wrote: Fri Nov 15, 2024 12:37 am Check the status of logstash when the problem happens, then restart logstash.
Logstash is running when the problem occurs. Restarting the service helps, but it stops collecting logs again after a few days.
jsimon
Posts: 339
Joined: Wed Aug 23, 2023 11:27 am

Re: Log collection just stops after a period of time

Post by jsimon »

I am wondering if the issue isn't that something else is running on the same port. It seems like that has been the culprit for other Log Server users who have reported similar issues in the past. You could try running the following, when you see that logs have stopped collecting:

netstat -ltnp | grep -w ':5544'
You may need to install net-tools first if that isn't already on this server, but it is installed with Log Server so it should be present.
jmichaelson
Posts: 375
Joined: Wed Aug 23, 2023 1:02 pm

Re: Log collection just stops after a period of time

Post by jmichaelson »

It would also be worth checking the Logstash logs in /usr/local/nagioslogserver/logstash/logs to see if they contain anything relevant around the time logs stopped being collected. It may also be worth checking the system logs to see if something like the out-of-memory killer terminated the opensearch service around that time.
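
The out-of-memory check mentioned above could be sketched roughly like this. It assumes a systemd host where kernel messages are readable through `journalctl -k`; the helper name `oom_events` is made up for illustration, and the match patterns are the usual kernel OOM phrases rather than anything confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: filter kernel log lines on stdin down to OOM-killer
# events. Matches the common "Out of memory: Killed process ..." lines as
# well as "invoked oom-killer" and "oom-kill:" lines.
oom_events() {
    grep -Ei 'out of memory|oom-kill'
}

# Assumed usage (run as root; adjust the time window as needed):
#   journalctl -k --since "7 days ago" --no-pager | oom_events
```

If this prints a line naming a java process shortly before collection stopped, memory pressure is worth investigating; empty output only means no such kernel message was found in that window.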
Please let us know if you have any other questions or concerns.

-Jason
elpakko
Posts: 14
Joined: Fri Mar 22, 2019 1:07 am

Re: Log collection just stops after a period of time

Post by elpakko »

It just stops listening on port 5544. "netstat -tulpn | grep 5544" gives empty output.

I can see the following in the Logstash log at the time log collection stops:

{:timestamp=>"2024-11-24T04:45:47.613000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:5544", :exception=>#<SocketError: problem when accepting>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:174:in `accept'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:155:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}

Also the log is filled with the following:

{:timestamp=>"2024-11-22T21:59:58.549000+0000", :message=>"syslog listener died", :protocol=>:tcp, :address=>"0.0.0.0:514", :exception=>#<SocketError: initialize: name or service not known>, :backtrace=>["org/jruby/ext/socket/RubyTCPServer.java:126:in `initialize'", "org/jruby/RubyIO.java:871:in `new'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:152:in `tcp_listener'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:117:in `server'", "/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-syslog-2.0.5/lib/logstash/inputs/syslog.rb:101:in `run'"], :level=>:warn}
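
To see how often each listener dies, entries like the two above can be tallied per address. This is only a sketch: `listener_deaths` is a hypothetical helper, it assumes every entry sits on a single line in the format shown, and the log file name in the usage comment is an assumption (only the directory is given in this thread).

```shell
#!/bin/sh
# Hypothetical helper: read Logstash log lines on stdin and count
# "syslog listener died" events per listening address, busiest first.
listener_deaths() {
    grep 'syslog listener died' |
        # Keep only the value of the :address=>"..." field.
        sed -n 's/.*:address=>"\([^"]*\)".*/\1/p' |
        sort | uniq -c | sort -rn |
        # Normalise uniq's padded output to "<count> <address>".
        awk '{ print $1, $2 }'
}

# Assumed usage (the file name logstash.log is a guess):
#   listener_deaths < /usr/local/nagioslogserver/logstash/logs/logstash.log
```

A large count for 0.0.0.0:514 next to a small one for 0.0.0.0:5544 would suggest the two errors are separate problems rather than one.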

And then in the syslog:

2024-11-24T06:45:56.529917+02:00 logsrv-24 rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]
2024-11-24T06:45:56.530036+02:00 logsrv-24 rsyslogd[1037]: rsyslogd: omfwd: remote server at localhost:5544 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2312.0 try https://www.rsyslog.com/e/2027 ]

Any other logs to check?
kg2857
Posts: 490
Joined: Wed Apr 12, 2023 5:48 pm

Re: Log collection just stops after a period of time

Post by kg2857 »

You may just want to set a cron job to restart Logstash at midnight and move on. If you want to get clever, you could write a script that checks whether the port is open and restarts Logstash if it isn't. Same with Elasticsearch. When I set the crons, my troubles disappeared.
Both are Java, which has always been a disaster.
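
A minimal sketch of such a port-check script, not a tested fix. It assumes `ss` (iproute2), `logger`, and `systemctl` are available and that the unit is called `logstash.service` as elsewhere in this thread; the function names, the `PORT` variable, and the `--run` switch are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart Logstash when nothing listens on PORT.
PORT="${PORT:-5544}"

# Succeeds if `ss -ltn` style output on stdin contains a LISTEN socket whose
# local address (4th column) ends in ":<port>".
has_listener() {
    awk -v suffix=":$1" '
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }
    '
}

# Restart Logstash only when the listener is missing, and leave a syslog note.
check_and_restart() {
    if ss -ltn | has_listener "$PORT"; then
        return 0
    fi
    logger -t logstash-watchdog "no listener on port $PORT, restarting logstash"
    systemctl restart logstash.service
}

# Nothing is restarted unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

Run from root's crontab every few minutes with `--run`, this restarts Logstash only when the listener is actually gone, instead of unconditionally at midnight. It treats the symptom and does not explain why the listener dies.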
jsimon
Posts: 339
Joined: Wed Aug 23, 2023 11:27 am

Re: Log collection just stops after a period of time

Post by jsimon »

I think the next step is to try tuning Logstash a little bit, as this may help with it crashing.

Try modifying this config file (I usually recommend copying it first in case anything goes wrong):

/etc/init.d/logstash
Change the following config options to the listed values:

LS_HEAP_SIZE="1000m"
LS_OPEN_FILES=65535
Restart Logstash:

systemctl daemon-reload
systemctl restart logstash.service
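
After the restart it may be worth confirming that the new open-files limit actually reached the running process. The sketch below reads the Linux `/proc/<pid>/limits` table; `max_open_files` is a hypothetical helper, and locating the process with `pgrep -o -f logstash` is an assumption about how it appears in the process list.

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/<pid>/limits table on stdin and print the
# soft limit for open files (4th whitespace-separated field of that row).
max_open_files() {
    awk '/^Max open files/ { print $4 }'
}

# Assumed usage (pgrep -o picks the oldest matching process):
#   pid=$(pgrep -o -f logstash)
#   max_open_files < "/proc/$pid/limits"
#   ls "/proc/$pid/fd" | wc -l    # descriptors currently in use
```

If the printed limit is still the old value, the change was not picked up. If the number of descriptors in use climbs toward the limit over several days, that would be consistent with the listener failing in `accept`, though nothing in this thread confirms that is the cause.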
If this does not resolve the issue, you could try @kg2857's approach of setting a cron job to restart Logstash daily. You can also send in a system profile which will help with further diagnosis, either to me on this forum via DM, or by opening a case with our Support department:
https://answerhub.nagios.com/support/login
jsimon
Posts: 339
Joined: Wed Aug 23, 2023 11:27 am

Re: Log collection just stops after a period of time

Post by jsimon »

Hi @elpakko,

I appreciate you sending in your system profile. It looks like your syslog listener is set up to listen on port 514, which requires additional configuration steps to function. I would suggest taking a look at this documentation and ensuring you have Logstash set up to run as the root user rather than the nagios user for this port.
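
For background on why 514 differs from 5544: on Linux, binding a port below the unprivileged-port threshold (1024 by default, exposed as the `net.ipv4.ip_unprivileged_port_start` sysctl) needs root or the `CAP_NET_BIND_SERVICE` capability. A small sketch of that check, with `is_privileged_port` being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if PORT lies below the unprivileged-port
# threshold. The threshold is taken from the optional second argument, else
# from the kernel sysctl file if readable, else the traditional 1024.
is_privileged_port() {
    port="$1"
    start="${2:-}"
    if [ -z "$start" ]; then
        if [ -r /proc/sys/net/ipv4/ip_unprivileged_port_start ]; then
            start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start)
        else
            start=1024
        fi
    fi
    [ "$port" -lt "$start" ]
}

# Assumed usage:
#   is_privileged_port 514  && echo "514 needs root or CAP_NET_BIND_SERVICE"
#   is_privileged_port 5544 || echo "5544 can be bound by an unprivileged user"
```

With the default threshold, 514 falls in the privileged range and 5544 does not, which is why only the 514 input needs the extra setup.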

It does seem strange that the listener dies at this regular interval. I'm wondering if it is listening at all -- possibly it fills up a system specific log location for a week and then dies when it hits a limit and requires a restart. If you filter your incoming log data for 0.0.0.0, are you seeing logs coming from the NLS system itself or only the external inputs you are configured to listen for?