Long reboots on Ubuntu after installing NLS
After a fresh install of NLS on Ubuntu 20 LTS, every shutdown or reboot of the machine takes a very long time.
Every single shutdown/reboot gets stuck on "A stop job is running for LSB: Logstash" until the 5-minute timeout ends, which makes downtimes much longer than necessary.
I've googled a bit and it seems to be a somewhat common issue with Logstash (not responding to the service stop request because some background task is still running), but since Logstash is installed by your install.sh script, I guess something could be adjusted to avoid this (or at least reduce reboot times).
Some people suggest setting a shorter TimeoutStopSec in /etc/systemd/system/logstash.service, but that file doesn't exist. There are multiple logstash.service files in other directories and I'm not sure which is the right one.
What do you recommend to fix this?
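A hedged sketch of the TimeoutStopSec idea mentioned above: `systemctl cat logstash.service` prints the unit file systemd actually loaded, and a drop-in file can override TimeoutStopSec without editing that file. The 30-second value and the override.conf file name are assumptions, not values taken from install.sh. The snippet writes to a scratch directory unless DROPIN_DIR is set; on a real host the directory would be /etc/systemd/system/logstash.service.d, followed by `systemctl daemon-reload`.

```shell
# Sketch only: write a systemd drop-in that shortens the stop timeout.
# DROPIN_DIR defaults to a scratch location so nothing on the host is changed.
# Assumption: on the real machine this would be
#   /etc/systemd/system/logstash.service.d
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)/logstash.service.d}"
mkdir -p "$DROPIN_DIR"

# 30 seconds is an illustrative value, not a vendor recommendation.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
TimeoutStopSec=30
EOF

echo "wrote $DROPIN_DIR/override.conf:"
cat "$DROPIN_DIR/override.conf"
```

A shorter timeout only makes systemd give up sooner; it does not explain why Logstash fails to stop on its own.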
Re: Long reboots on Ubuntu after installing NLS
Hello @mlabbepg
Thanks for reaching out about this issue.
I want to find out whether the logstash or elasticsearch services are hanging for a particular reason. Please send the output of:
Code: Select all
journalctl -u logstash.service
And
Code: Select all
journalctl -u elasticsearch.service
Please also provide the following:
Code: Select all
ps -ef | grep logstash
Code: Select all
java --version
Thanks,
Perry
Re: Long reboots on Ubuntu after installing NLS
Sorry for the delay, I didn't get notified on your reply.
Here are the requested logs.
journalctl -u logstash.service :
Code: Select all
-- Reboot --
Sep 10 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 10 02:06:57 syslog-mgmt logstash[829]: * Starting Logstash Daemon
Sep 10 02:06:57 syslog-mgmt logstash[829]: /etc/init.d/logstash: invalid arguments
Sep 10 02:06:57 syslog-mgmt logstash[829]: ...done.
Sep 10 02:06:57 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 23 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 23 02:00:01 syslog-mgmt logstash[3045427]: * Stopping Logstash Daemon
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 23 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 23 02:07:00 syslog-mgmt logstash[839]: * Starting Logstash Daemon
Sep 23 02:07:00 syslog-mgmt logstash[839]: /etc/init.d/logstash: invalid arguments
Sep 23 02:07:00 syslog-mgmt logstash[839]: ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 29 02:00:00 syslog-mgmt logstash[1442046]: * Stopping Logstash Daemon
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 29 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 29 02:06:56 syslog-mgmt logstash[840]: * Starting Logstash Daemon
Sep 29 02:06:56 syslog-mgmt logstash[840]: /etc/init.d/logstash: invalid arguments
Sep 29 02:06:56 syslog-mgmt logstash[840]: ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 01 10:48:19 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 01 10:48:19 syslog-mgmt logstash[588184]: * Stopping Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588184]: ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: logstash.service: Succeeded.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 01 10:48:26 syslog-mgmt logstash[588211]: * Starting Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588211]: /etc/init.d/logstash: invalid arguments
Oct 01 10:48:26 syslog-mgmt logstash[588211]: ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 06 15:59:22 syslog-mgmt logstash[1804807]: * Stopping Logstash Daemon
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Oct 06 16:04:22 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 06 16:06:16 syslog-mgmt logstash[862]: * Starting Logstash Daemon
Oct 06 16:06:16 syslog-mgmt logstash[862]: /etc/init.d/logstash: invalid arguments
Oct 06 16:06:16 syslog-mgmt logstash[862]: ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Logstash.
journalctl -u elasticsearch.service :
Code: Select all
-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]: * Starting Elasticsearch Server
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]: ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 28 10:46:04 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 28 10:46:04 syslog-mgmt elasticsearch[1294617]: * Stopping Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294617]: ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]: * Starting Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]: ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 29 02:00:00 syslog-mgmt elasticsearch[1442038]: * Stopping Elasticsearch Server
Sep 29 02:00:01 syslog-mgmt elasticsearch[1442038]: ...done.
Sep 29 02:00:01 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 29 02:00:01 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]: * Starting Elasticsearch Server
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]: ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Oct 06 15:59:22 syslog-mgmt elasticsearch[1804797]: * Stopping Elasticsearch Server
Oct 06 15:59:23 syslog-mgmt elasticsearch[1804797]: ...done.
Oct 06 15:59:23 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Oct 06 15:59:23 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]: * Starting Elasticsearch Server
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]: ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
ps -ef | grep logstash :
Code: Select all
root 1081 1 17 16:06 ? 00:00:53 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
pgadmin 2589 1789 0 16:11 pts/0 00:00:00 grep --color=auto logstash
java -version :
Code: Select all
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
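To make the pattern in the logstash journal easier to see, the stop durations can be computed from the "Stopping" and "Stopped" lines. This is a minimal sketch: it uses four lines copied from the journal output above as sample input, assumes GNU date (for the -d option), and the function names are made up for illustration.

```shell
# Sample input: lines copied from the logstash journal above.
journal_sample() {
cat <<'EOF'
Sep 23 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 23 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
Oct 01 10:48:19 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 01 10:48:26 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
EOF
}

# Print how long each stop took, in seconds.
# Assumes GNU date; timestamps are treated as UTC so DST cannot skew the difference.
stop_durations() {
  start=""
  while read -r mon day clock _host _unit rest; do
    case "$rest" in
      "Stopping LSB: Logstash...")
        start=$(date -u -d "$mon $day $clock" +%s) ;;
      "Stopped LSB: Logstash.")
        if [ -n "$start" ]; then
          end=$(date -u -d "$mon $day $clock" +%s)
          echo "$mon $day $clock stop took $((end - start))s"
          start=""
        fi ;;
    esac
  done
}

journal_sample | stop_durations
```

On a live system the same function could read from `journalctl -u logstash.service` instead of the sample. With the sample input, the timed-out stop shows 301s and the clean manual restart shows 7s.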
Re: Long reboots on Ubuntu after installing NLS
Hello @mlabbepg
Thanks for following up. I see what you mean: logstash hangs for around 5 minutes when stopping. Let's see if we can get more information on the logstash service restart.
Code: Select all
systemctl restart logstash && journalctl -fexu logstash.service > /tmp/results.txt &
Please review and send the results.txt
Perry
Re: Long reboots on Ubuntu after installing NLS
Here is the log file.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
Hello @mlabbepg
Thanks for the results.txt, which did not provide more detail than we would have expected.
We know that Logstash was hanging during the following time range, and we want to know what logstash.log is reporting:
Sep 10 02:00:00 syslog-mgmt logstash[1797412]: * Stopping Logstash Daemon
Sep 10 02:05:00 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Please let us know what you find in '/var/log/logstash/logstash.log' and '/var/log/elasticsearch/nagios_elasticsearch.log/log/elasticsearch/nagios_elasticsearch.log' from 02:00:00 to 02:05:00 on September 10th.
Thanks,
Perry
Re: Long reboots on Ubuntu after installing NLS
Hi,
Here is the requested logstash logfile.
There is no /var/log/elasticsearch/nagios_elasticsearch.log file/folder (looks like you made a cut & paste error too).
sudo find / -name nagios_elasticsearch.log found nothing either.
I believe the logfile on our system is named /var/log/elasticsearch/8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log instead.
Code: Select all
sudo ls -l /var/log/elasticsearch
total 144
-rw-r--r-- 1 nagios nagios 0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_indexing_slowlog.log
-rw-r--r-- 1 nagios nagios 0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_search_slowlog.log
-rw-r--r-- 1 nagios nagios 13841 Oct 8 09:27 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log
-rw-r--r-- 1 nagios nagios 5766 Oct 7 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.1.gz
-rw-r--r-- 1 nagios nagios 18764 Oct 6 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.2.gz
-rw-r--r-- 1 nagios nagios 17570 Oct 5 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.3.gz
-rw-r--r-- 1 nagios nagios 16505 Oct 4 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.4.gz
-rw-r--r-- 1 nagios nagios 16498 Oct 3 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.5.gz
-rw-r--r-- 1 nagios nagios 17398 Oct 2 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.6.gz
-rw-r--r-- 1 nagios nagios 17711 Oct 1 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.7.gz
The Elasticsearch log file from September 10th is gone already.
I'm adding both logs for the Oct 07 (09:17:57 to 09:22:57) timeout event to have a complete trace.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
Hello @mlabbepg
Thanks for following up with the 'elasticsearch' and 'logstash' logs.
We see the following in the logstash messages:
message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!"
Around the same time, elasticsearch states:
[WARN ][snapshots.......failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException:...... Failed to perform snapshot (index files)
Let's find out about your 'snapshots':
Code: Select all
curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
Sample output:
{
"nameofyourrepository" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : "/your/store/logs/for/repositories/"
}
Let's run:
Code: Select all
curl -X PUT "localhost:9200/_snapshot/<nameofyourrepository>/snapshot_test?wait_for_completion=true&pretty"
Sample output:
"snapshot" : {
"snapshot" : "snapshot_test",
"version_id" : 1070699,
"version" : "1.7.6",
"indices" : [ "kibana-int", "logstash-2021.10.09", "logstash-2021.10.06", "logstash-2021.10.10", "logstash-2021.10.05", "logstash-2021.10.07", "nagioslogserver", "logstash-2021.09.30", "logstash-2021.10.11", "logstash-2021.10.02", "logstash-2021.10.04", "logstash-2021.10.01", "nagioslogserver_log", "logstash-2021.10.08", "logstash-2021.10.03", "nagioslogserver_history" ],
"state" : "SUCCESS",
"start_time" : "2021-10-11T16:19:23.820Z",
"start_time_in_millis" : 1633969163820,
"end_time" : "2021-10-11T16:19:28.218Z",
"end_time_in_millis" : 1633969168218,
"duration_in_millis" : 4398,
"failures" : [ ],
"shards" : {
"total" : 72,
"failed" : 0,
"successful" : 72
Let me know the results and follow up with a copy of the Nagios Log Server System Profile:
Code: Select all
/usr/local/nagioslogserver/scripts/profile.sh
It will generate a file called system-profile.tar.gz in /tmp. If the file is too big to send, please use the split command to chunk it into smaller pieces:
Code: Select all
split -b 50M thenameofthesystemprofiletargzfilehere splitup
Send 'splitupa', 'splitupb', etc. in separate private messages [PM].
Thanks,
Perry
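For the split step, a minimal sketch of how the chunks are produced and put back together on the receiving side. It runs in a scratch directory on a small dummy file standing in for system-profile.tar.gz, and uses a 100000-byte chunk size instead of 50M purely to keep the demo small; the rejoined.tar.gz name is made up for illustration.

```shell
# Scratch directory with a dummy archive standing in for the system profile.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
head -c 250000 /dev/urandom > system-profile.tar.gz

# Same idea as "split -b 50M <file> splitup", with a small chunk size for the demo.
split -b 100000 system-profile.tar.gz splitup
ls splitup*

# Receiving side: concatenate the chunks in name order, then compare byte for byte.
cat splitup* > rejoined.tar.gz
cmp system-profile.tar.gz rejoined.tar.gz && echo "chunks reassemble cleanly"
```

Note that split uses two-letter suffixes by default, so the pieces are named splitupaa, splitupab, and so on.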
Re: Long reboots on Ubuntu after installing NLS
Here's the result:
Code: Select all
curl -X PUT "localhost:9200/_snapshot/sdb/snapshot_test?wait_for_completion=true&pretty"
{
"snapshot" : {
"snapshot" : "snapshot_test",
"version_id" : 1070699,
"version" : "1.7.6",
"indices" : [ "logstash-2021.10.09", "logstash-2021.10.08", "logstash-2021.09.22", "logstash-2021.10.05", "logstash-2021.09.13", "logstash-2021.09.25", "logstash-2021.10.07", "logstash-2021.10.01", "logstash-2021.09.19", "logstash-2021.09.17", "logstash-2021.10.03", "logstash-2021.09.24", "logstash-2021.09.14", "logstash-2021.09.18", "kibana-int", "logstash-2021.09.16", "nagioslogserver", "logstash-2021.10.12", "logstash-2021.09.20", "logstash-2021.10.10", "logstash-2021.10.02", "logstash-2021.09.21", "logstash-2021.09.26", "logstash-2021.09.29", "logstash-2021.09.23", "logstash-2021.10.11", "logstash-2021.09.15", "logstash-2021.10.04", "logstash-2021.09.30", "logstash-2021.10.06", "logstash-2021.09.27", "nagioslogserver_log", "logstash-2021.09.28" ],
"state" : "SUCCESS",
"start_time" : "2021-10-12T15:35:05.511Z",
"start_time_in_millis" : 1634052905511,
"end_time" : "2021-10-12T15:35:12.634Z",
"end_time_in_millis" : 1634052912634,
"duration_in_millis" : 7123,
"failures" : [ ],
"shards" : {
"total" : 161,
"failed" : 0,
"successful" : 161
}
}
}
-EDIT-
Copy of the Nagios Log Server System Profile sent via PM.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
@mlabbepg,
Thanks for following up with the Profile. After review, we see that the instance is running out of resources while going through and stopping services. Let's have you increase the RAM and the HTTP content length setting in:
/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
Open the file with a text editor, and find the following setting:
# http.max_content_length: 100mb
Uncomment the line and change it to 500:
http.max_content_length: 500mb
Save the file and run the following commands:
service elasticsearch restart
service logstash restart
service httpd restart
Thanks,
Perry
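The edit described above can also be sketched with sed. This is a hedged example, not the vendor's procedure: it works on a scratch file seeded with the commented-out line quoted in the post, and keeps a .bak backup. Pointing it at the real elasticsearch.yml is left to the reader.

```shell
# Scratch file standing in for elasticsearch.yml, seeded with the line quoted above.
conf="$(mktemp)"
printf '%s\n' '# http.max_content_length: 100mb' > "$conf"

# Uncomment the setting and raise it to 500mb; -i.bak leaves a backup beside the file.
sed -i.bak -E 's/^#[[:space:]]*(http\.max_content_length:).*/\1 500mb/' "$conf"

cat "$conf"
```

On Ubuntu the Apache service is normally named apache2 rather than httpd, so the last restart command in the post may need adjusting there.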