Long reboots on Ubuntu after installing NLS
Posted: Fri Oct 01, 2021 10:38 am
by mlabbepg
After a fresh install of NLS on Ubuntu 20.04 LTS, every shutdown or reboot of the machine takes a very long time.
Each shutdown/reboot gets stuck on "A stop job is running for LSB: Logstash" until the 5-minute timeout expires, which makes downtime much longer than necessary.
I've googled a bit, and this seems to be a fairly common issue with Logstash (it doesn't respond to the stop request because some background task is still running). Since Logstash is installed by your install.sh script, though, I guess something could be adjusted to avoid this, or at least to shorten reboot times.
Some people suggest setting a shorter TimeoutStopSec in /etc/systemd/system/logstash.service, but that file doesn't exist. There are several logstash.service files in other directories, and I'm not sure which one is the right one.
What do you recommend to fix this?
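The "LSB:" prefix in the unit description suggests that logstash.service is not a native unit file: systemd-sysv-generator builds it at boot from /etc/init.d/logstash and places it under /run/systemd/generator.late/, which would explain why /etc/systemd/system/logstash.service does not exist. 'systemctl cat logstash.service' prints the file actually in use, together with any drop-ins. Generated units normally carry TimeoutSec=5min, which appears to be where the 5-minute wait comes from. A drop-in under /etc/systemd/system/logstash.service.d/ can override that without touching the generated file ('sudo systemctl edit logstash.service' creates one interactively). The sketch below writes such a drop-in; the helper name, the stop-timeout.conf file name and the 90s value are illustrative assumptions. A shorter timeout only limits the wait: systemd still terminates Logstash when it expires, just sooner, and the underlying hang remains.

```shell
#!/usr/bin/env bash
# Sketch: cap the stop timeout of the generated "LSB: Logstash" unit with a
# drop-in file instead of editing any of the existing logstash.service files.
# Hypothetical helper; the drop-in file name "stop-timeout.conf" and the
# timeout value are example choices, not NLS defaults.

write_stop_timeout_override() {
    local unit="$1"                              # e.g. logstash.service
    local timeout="$2"                           # e.g. 90s
    local unit_dir="${3:-/etc/systemd/system}"   # admin override directory
    local dropin_dir="${unit_dir}/${unit}.d"

    mkdir -p "$dropin_dir" || return 1
    # Drop-ins are read after the main unit file, so this TimeoutStopSec
    # takes precedence over the generator's TimeoutSec for stop operations.
    printf '[Service]\nTimeoutStopSec=%s\n' "$timeout" > "${dropin_dir}/stop-timeout.conf" || return 1
    printf '%s\n' "${dropin_dir}/stop-timeout.conf"
}
```

After defining the function in a root shell, running write_stop_timeout_override logstash.service 90s followed by systemctl daemon-reload should make the drop-in show up in 'systemctl cat logstash.service'.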
Re: Long reboots on Ubuntu after installing NLS
Posted: Mon Oct 04, 2021 3:00 pm
by pbroste
Hello @mlabbepg
Thanks for reaching out about this issue.
I want to find out whether the logstash or elasticsearch services are hanging for a particular reason.
Please send the output of:
Code: Select all
journalctl -u elasticsearch.service
Please also provide the following:
Thanks,
Perry
Re: Long reboots on Ubuntu after installing NLS
Posted: Wed Oct 06, 2021 3:19 pm
by mlabbepg
Sorry for the delay, I didn't get notified of your reply.
Here are the requested logs.
journalctl -u logstash.service :
Code: Select all
-- Reboot --
Sep 10 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 10 02:06:57 syslog-mgmt logstash[829]: * Starting Logstash Daemon
Sep 10 02:06:57 syslog-mgmt logstash[829]: /etc/init.d/logstash: invalid arguments
Sep 10 02:06:57 syslog-mgmt logstash[829]: ...done.
Sep 10 02:06:57 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 23 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 23 02:00:01 syslog-mgmt logstash[3045427]: * Stopping Logstash Daemon
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 23 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 23 02:07:00 syslog-mgmt logstash[839]: * Starting Logstash Daemon
Sep 23 02:07:00 syslog-mgmt logstash[839]: /etc/init.d/logstash: invalid arguments
Sep 23 02:07:00 syslog-mgmt logstash[839]: ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 29 02:00:00 syslog-mgmt logstash[1442046]: * Stopping Logstash Daemon
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 29 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 29 02:06:56 syslog-mgmt logstash[840]: * Starting Logstash Daemon
Sep 29 02:06:56 syslog-mgmt logstash[840]: /etc/init.d/logstash: invalid arguments
Sep 29 02:06:56 syslog-mgmt logstash[840]: ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 01 10:48:19 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 01 10:48:19 syslog-mgmt logstash[588184]: * Stopping Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588184]: ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: logstash.service: Succeeded.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 01 10:48:26 syslog-mgmt logstash[588211]: * Starting Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588211]: /etc/init.d/logstash: invalid arguments
Oct 01 10:48:26 syslog-mgmt logstash[588211]: ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 06 15:59:22 syslog-mgmt logstash[1804807]: * Stopping Logstash Daemon
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Oct 06 16:04:22 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 06 16:06:16 syslog-mgmt logstash[862]: * Starting Logstash Daemon
Oct 06 16:06:16 syslog-mgmt logstash[862]: /etc/init.d/logstash: invalid arguments
Oct 06 16:06:16 syslog-mgmt logstash[862]: ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Logstash.
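Comparing the Stopping/Stopped timestamps in a journal like this by hand gets tedious. A small awk filter can list each stop with its duration and whether systemd reported a timeout. This is a sketch assuming journalctl's default short output (month, day, HH:MM:SS, host, ...) as pasted above; stop_durations is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report how long each "Stopping ... / Stopped ..." pair took in
# journalctl's default short output, and whether systemd hit the timeout.
# Hypothetical helper name; only the clock time (HH:MM:SS) is compared.

stop_durations() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }

        # systemd begins stopping the unit: remember when.
        /systemd\[1\]: Stopping / { start = secs($3); when = $1 " " $2 " " $3; result = "ok"; next }

        # systemd gave up waiting (Failed with result timeout).
        /Failed with result .timeout./ { result = "timeout" }

        # systemd considers the unit stopped: print the elapsed time.
        /systemd\[1\]: Stopped / && when != "" {
            d = secs($3) - start
            if (d < 0) d += 86400              # the stop crossed midnight
            printf "%s  %4ds  %s\n", when, d, result
            when = ""
        }
    '
}
```

For example: journalctl -u logstash.service --no-pager | stop_durations. On the log above it should report about 300 seconds with "timeout" for the Sep 23, Sep 29 and Oct 06 stops, and 7 seconds with "ok" for the Oct 01 restart. Only clock times are compared, so a stop lasting longer than 24 hours would be misreported.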
journalctl -u elasticsearch.service :
Code: Select all
-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]: * Starting Elasticsearch Server
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]: ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 28 10:46:04 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 28 10:46:04 syslog-mgmt elasticsearch[1294617]: * Stopping Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294617]: ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]: * Starting Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]: ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 29 02:00:00 syslog-mgmt elasticsearch[1442038]: * Stopping Elasticsearch Server
Sep 29 02:00:01 syslog-mgmt elasticsearch[1442038]: ...done.
Sep 29 02:00:01 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 29 02:00:01 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]: * Starting Elasticsearch Server
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]: ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Oct 06 15:59:22 syslog-mgmt elasticsearch[1804797]: * Stopping Elasticsearch Server
Oct 06 15:59:23 syslog-mgmt elasticsearch[1804797]: ...done.
Oct 06 15:59:23 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Oct 06 15:59:23 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]: * Starting Elasticsearch Server
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]: ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
ps -ef | grep logstash :
Code: Select all
root 1081 1 17 16:06 ? 00:00:53 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
pgadmin 2589 1789 0 16:11 pts/0 00:00:00 grep --color=auto logstash
java -version :
Code: Select all
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
Re: Long reboots on Ubuntu after installing NLS
Posted: Thu Oct 07, 2021 1:23 pm
by pbroste
Hello @mlabbepg
Thanks for following up. I see what you mean: logstash is hanging for around 5 minutes. Let's see if we can get more information from a logstash service restart:
Code: Select all
systemctl restart logstash && journalctl -fexu logstash.service > /tmp/results.txt &
Please review and send /tmp/results.txt.
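As written, the trailing & puts the whole 'restart && journalctl -f' list in the background, and because of -f the journalctl process keeps following the journal and appending to results.txt until it is killed. A bounded variant records the time first, waits for the restart, then saves only the entries written since that moment. This is a sketch with a hypothetical helper name (capture_restart_log); it reuses /tmp/results.txt as the default output and needs root.

```shell
#!/usr/bin/env bash
# Sketch: restart a unit and save only the journal entries written since the
# restart began. Unlike "journalctl -f", this returns on its own.
# Hypothetical helper; must run as root. The default output path reuses
# /tmp/results.txt from the post.

capture_restart_log() {
    local unit="${1:-logstash.service}"
    local out="${2:-/tmp/results.txt}"
    local since rc

    since=$(date '+%Y-%m-%d %H:%M:%S')   # a format journalctl --since accepts
    systemctl restart "$unit"            # blocks until stop and start finish or time out
    rc=$?
    journalctl -u "$unit" --since "$since" --no-pager -o short-precise > "$out"
    printf 'restart exit status %s, journal saved to %s\n' "$rc" "$out"
    return "$rc"
}
```

When logstash hangs, the restart still takes up to the stop timeout, but the command then returns by itself and the file covers exactly that stop and start.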
Perry
Re: Long reboots on Ubuntu after installing NLS
Posted: Thu Oct 07, 2021 2:14 pm
by mlabbepg
Here is the log file.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
Posted: Fri Oct 08, 2021 10:01 am
by pbroste
Hello @mlabbepg
Thanks for results.txt; it did not provide more detail than we expected.
We know that Logstash was hanging during the following time range, and we want to know what logstash.log reports for it:
Sep 10 02:00:00 syslog-mgmt logstash[1797412]: * Stopping Logstash Daemon
Sep 10 02:05:00 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Please let us know what you find in '/var/log/logstash/logstash.log' and '/var/log/elasticsearch/nagios_elasticsearch.log/log/elasticsearch/nagios_elasticsearch.log' from 02:00:00 to 02:05:00 on September 10th.
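Pulling a time window out of these logs by hand is error-prone, especially once they are rotated and gzipped. The sketch below prints the lines whose timestamp falls in a given range, reading plain and .gz files alike. It assumes each entry carries a 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DDTHH:MM:SS' timestamp (assumed formats: '[2021-10-07 09:17:57,123]' style in Elasticsearch 1.x logs and ':timestamp=>"2021-09-10T02:00:01..."' style in Logstash 1.x logs); log_window is a hypothetical helper, and time zone offsets are ignored.

```shell
#!/usr/bin/env bash
# Sketch: print the log lines whose timestamp falls in [from, to], reading
# plain or gzip-rotated files. Hypothetical helper. Assumes entries contain a
# "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS" timestamp near the start of
# the line; time zone offsets are ignored.

log_window() {
    local from="$1" to="$2"
    shift 2
    # gzip -cdf decompresses .gz files and passes plain files through unchanged.
    gzip -cdf -- "$@" | awk -v from="$from" -v to="$to" '
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][T ][0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            ts = substr($0, RSTART, RLENGTH)
            sub(/T/, " ", ts)
            inwin = (ts >= from && ts <= to)   # plain string comparison
        }
        inwin                                  # lines without a timestamp follow the previous entry
    '
}
```

For example: log_window '2021-09-10 02:00:00' '2021-09-10 02:05:00' /var/log/logstash/logstash.log. Lines without a timestamp, such as Java stack traces, are kept or dropped together with the entry they follow.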
Thanks,
Perry
Re: Long reboots on Ubuntu after installing NLS
Posted: Fri Oct 08, 2021 11:19 am
by mlabbepg
Hi,
Here is the requested logstash log file.
logstash_2021-09-10.zip
There is no /var/log/elasticsearch/nagios_elasticsearch.log file or folder (looks like you made a cut & paste error too).
sudo find / -name nagios_elasticsearch.log found nothing either.
I believe the log file on our system is named /var/log/elasticsearch/8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log instead.
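Elasticsearch 1.x's default logging configuration names the main log file after cluster.name (${path.logs}/${cluster.name}.log), so a UUID-style file name suggests that Nagios Log Server uses a UUID as its cluster name. The node's root endpoint (curl -s localhost:9200/) reports cluster_name, so the file can be located from it. A sketch with a hypothetical helper (es_log_path); the sed-based JSON parsing is a shortcut that assumes a plain "cluster_name" string, and the default directory /var/log/elasticsearch is taken from the listing below.

```shell
#!/usr/bin/env bash
# Sketch: derive the main Elasticsearch log file name from the cluster name
# reported by the node's root endpoint. Hypothetical helper; the sed-based
# JSON extraction is a shortcut that assumes a simple "cluster_name" string.

es_log_path() {
    local log_dir="${1:-/var/log/elasticsearch}" name
    name=$(sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    if [ -z "$name" ]; then
        echo "cluster_name not found in input" >&2
        return 1
    fi
    printf '%s/%s.log\n' "$log_dir" "$name"
}
```

For example: curl -s localhost:9200/ | es_log_path. If the cluster name is indeed 8f463c53-c40a-4dcd-8e1d-87f3e6de8421, this prints the same .log file that appears in the listing.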
Code: Select all
sudo ls -l /var/log/elasticsearch
total 144
-rw-r--r-- 1 nagios nagios 0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_indexing_slowlog.log
-rw-r--r-- 1 nagios nagios 0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_search_slowlog.log
-rw-r--r-- 1 nagios nagios 13841 Oct 8 09:27 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log
-rw-r--r-- 1 nagios nagios 5766 Oct 7 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.1.gz
-rw-r--r-- 1 nagios nagios 18764 Oct 6 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.2.gz
-rw-r--r-- 1 nagios nagios 17570 Oct 5 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.3.gz
-rw-r--r-- 1 nagios nagios 16505 Oct 4 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.4.gz
-rw-r--r-- 1 nagios nagios 16498 Oct 3 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.5.gz
-rw-r--r-- 1 nagios nagios 17398 Oct 2 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.6.gz
-rw-r--r-- 1 nagios nagios 17711 Oct 1 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.7.gz
The Elasticsearch log file from September 10th is already gone.
I'm adding both logs for the Oct 07 (09:17:57 to 09:22:57) timeout event to have a complete trace.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
Posted: Mon Oct 11, 2021 11:37 am
by pbroste
Hello @mlabbepg
Thanks for following up with the 'elasticsearch' and 'logstash' logs.
The logstash log contains messages stating:
message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!"
Around the same time, elasticsearch reports:
[WARN ][snapshots.......failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException:...... Failed to perform snapshot (index files)
Let's find out about your 'snapshots':
Code: Select all
curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
Sample output:
{
  "nameofyourrepository" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/your/store/logs/for/repositories/"
    }
  }
}
Let's run:
Code: Select all
curl -X PUT "localhost:9200/_snapshot/<nameofyourrepository>/snapshot_test?wait_for_completion=true&pretty"
Sample output:
{
  "snapshot" : {
    "snapshot" : "snapshot_test",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "kibana-int", "logstash-2021.10.09", "logstash-2021.10.06", "logstash-2021.10.10", "logstash-2021.10.05", "logstash-2021.10.07", "nagioslogserver", "logstash-2021.09.30", "logstash-2021.10.11", "logstash-2021.10.02", "logstash-2021.10.04", "logstash-2021.10.01", "nagioslogserver_log", "logstash-2021.10.08", "logstash-2021.10.03", "nagioslogserver_history" ],
    "state" : "SUCCESS",
    "start_time" : "2021-10-11T16:19:23.820Z",
    "start_time_in_millis" : 1633969163820,
    "end_time" : "2021-10-11T16:19:28.218Z",
    "end_time_in_millis" : 1633969168218,
    "duration_in_millis" : 4398,
    "failures" : [ ],
    "shards" : {
      "total" : 72,
      "failed" : 0,
      "successful" : 72
    }
  }
}
Let me know the results and follow up with a copy of the Nagios Log Server System Profile:
Code: Select all
/usr/local/nagioslogserver/scripts/profile.sh
It will generate a file called system-profile.tar.gz in /tmp. If the file is too big to send, please use the split command to break it into smaller pieces:
Code: Select all
split -b 50M thenameofthesystemprofiletargzfilehere splitup
Send the pieces (splitupaa, splitupab, etc.) in separate private messages (PMs).
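split names the pieces with two-letter suffixes (splitupaa, splitupab, ...), and the shell sorts them in that same order, so concatenating them restores the original archive. A sketch of both sides, with hypothetical helper names (split_file, join_pieces); the defaults mirror the 50M size and splitup prefix used above.

```shell
#!/usr/bin/env bash
# Sketch: split a file into fixed-size pieces and rebuild it on the other end.
# Hypothetical helper names; defaults mirror the split command in the post.

# Split FILE into SIZE pieces named PREFIXaa, PREFIXab, ... and list them.
split_file() {
    local file="$1" size="${2:-50M}" prefix="${3:-splitup}"
    split -b "$size" -- "$file" "$prefix" || return 1
    ls -1 -- "$prefix"??          # two-letter suffixes, listed in split order
}

# Concatenate PREFIXaa, PREFIXab, ... back into OUT. The glob expands in
# sorted order, which matches the order split wrote the pieces in.
join_pieces() {
    local prefix="$1" out="$2"
    cat -- "$prefix"?? > "$out"
}
```

For example: split_file /tmp/system-profile.tar.gz 50M splitup on the sending side, and join_pieces splitup system-profile.tar.gz on the receiving side. Comparing sha256sum output on both ends is a cheap way to confirm nothing was lost in transit.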
Thanks,
Perry
Re: Long reboots on Ubuntu after installing NLS
Posted: Tue Oct 12, 2021 10:55 am
by mlabbepg
Here's the result:
Code: Select all
curl -X PUT "localhost:9200/_snapshot/sdb/snapshot_test?wait_for_completion=true&pretty"
{
  "snapshot" : {
    "snapshot" : "snapshot_test",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "logstash-2021.10.09", "logstash-2021.10.08", "logstash-2021.09.22", "logstash-2021.10.05", "logstash-2021.09.13", "logstash-2021.09.25", "logstash-2021.10.07", "logstash-2021.10.01", "logstash-2021.09.19", "logstash-2021.09.17", "logstash-2021.10.03", "logstash-2021.09.24", "logstash-2021.09.14", "logstash-2021.09.18", "kibana-int", "logstash-2021.09.16", "nagioslogserver", "logstash-2021.10.12", "logstash-2021.09.20", "logstash-2021.10.10", "logstash-2021.10.02", "logstash-2021.09.21", "logstash-2021.09.26", "logstash-2021.09.29", "logstash-2021.09.23", "logstash-2021.10.11", "logstash-2021.09.15", "logstash-2021.10.04", "logstash-2021.09.30", "logstash-2021.10.06", "logstash-2021.09.27", "nagioslogserver_log", "logstash-2021.09.28" ],
    "state" : "SUCCESS",
    "start_time" : "2021-10-12T15:35:05.511Z",
    "start_time_in_millis" : 1634052905511,
    "end_time" : "2021-10-12T15:35:12.634Z",
    "end_time_in_millis" : 1634052912634,
    "duration_in_millis" : 7123,
    "failures" : [ ],
    "shards" : {
      "total" : 161,
      "failed" : 0,
      "successful" : 161
    }
  }
}
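Rather than scanning the index list, a snapshot response like this can be checked for the two fields that matter: "state" and the failed shard count. A sketch with a hypothetical helper (snapshot_ok); the sed extraction assumes the pretty-printed layout requested with &pretty (one key per line), and jq would be more robust where it is installed.

```shell
#!/usr/bin/env bash
# Sketch: summarize a pretty-printed snapshot response and succeed only if
# state is SUCCESS and no shard failed. Hypothetical helper; the sed
# extraction assumes the "&pretty" layout (one key per line).

snapshot_ok() {
    local json state failed
    json=$(cat)
    state=$(printf '%s\n' "$json" | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1)
    failed=$(printf '%s\n' "$json" | sed -n 's/.*"failed"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
    printf 'state=%s failed_shards=%s\n' "${state:-unknown}" "${failed:-unknown}"
    # Missing fields count as failure.
    [ "$state" = "SUCCESS" ] && [ "${failed:-1}" -eq 0 ]
}
```

For example: curl -s -X PUT "localhost:9200/_snapshot/sdb/snapshot_test2?wait_for_completion=true&pretty" | snapshot_ok. A new snapshot name is needed for each run, because Elasticsearch refuses to create a second snapshot with a name that already exists in the repository.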
-EDIT-
A copy of the Nagios Log Server System Profile was sent via PM.
Thanks.
Re: Long reboots on Ubuntu after installing NLS
Posted: Wed Oct 13, 2021 9:47 am
by pbroste
@mlabbepg,
Thanks for following up with the Profile. After reviewing it, we see that the instance is running out of resources while stopping services. Please increase the RAM, and increase the HTTP content length setting in:
/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
Open the file with a text editor, and find the following setting:
# http.max_content_length: 100mb
Uncomment the line and change the value to 500mb:
http.max_content_length: 500mb
Save the file and run the following commands (on Ubuntu, the Apache service is named apache2 rather than httpd):
service elasticsearch restart
service logstash restart
service apache2 restart
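The elasticsearch.yml edit can also be scripted, which helps when several instances need the same change. A sketch assuming GNU sed (the Ubuntu default) and the commented '# http.max_content_length: 100mb' line shown above; set_max_content_length is a hypothetical helper. It keeps a .bak copy of the file and appends the setting if no matching line exists.

```shell
#!/usr/bin/env bash
# Sketch: set http.max_content_length in elasticsearch.yml, uncommenting the
# line if needed and appending it if it is missing. Hypothetical helper;
# assumes GNU sed and must run as root. Keeps a .bak copy of the original.

set_max_content_length() {
    local conf="${1:-/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml}"
    local value="${2:-500mb}"

    cp -p "$conf" "${conf}.bak" || return 1
    # Matches "http.max_content_length: ...", "#http.max_content_length: ..."
    # and "# http.max_content_length: ..." and rewrites it as an active setting.
    sed -i -E "s/^[[:space:]]*#?[[:space:]]*http\.max_content_length:.*/http.max_content_length: ${value}/" "$conf" || return 1
    if ! grep -q "^http\.max_content_length: ${value}\$" "$conf"; then
        printf 'http.max_content_length: %s\n' "$value" >> "$conf"
    fi
}
```

Run it as root, then restart the services listed above so that Elasticsearch picks up the new value.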
Thanks,
Perry