Long reboots on Ubuntu after installing NLS

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
mlabbepg
Posts: 30
Joined: Fri Apr 16, 2021 1:10 pm

Long reboots on Ubuntu after installing NLS

Post by mlabbepg »

After a fresh install of NLS on Ubuntu 20.04 LTS, every shutdown or reboot of the machine takes a very long time.

Every single shutdown/reboot gets stuck on "A stop job is running for LSB: Logstash" until the 5-minute timeout expires, which makes downtime much longer than necessary.

I've googled a bit and it seems to be a fairly common issue with Logstash (it doesn't respond to the service stop request because some background task is still running). Since Logstash is installed by your install.sh script, I guess something could be adjusted to avoid this, or at least to reduce reboot times.

Some people suggest setting a shorter TimeoutStopSec time in /etc/systemd/system/logstash.service, but that file doesn't exist. There are multiple logstash.service files in other dirs and I'm not sure which is the right one.

What do you recommend to fix this?
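
For reference, the TimeoutStopSec idea mentioned above does not need an existing /etc/systemd/system/logstash.service file, because systemd also reads drop-in files from a <unit>.d directory. The following is only a minimal sketch, not something shipped by the NLS installer: the helper name write_stop_timeout_dropin, the override.conf file name, and the 30-second value are assumptions, and it assumes the drop-in is honored for the unit that systemd generates from /etc/init.d/logstash. Note that a shorter timeout only limits the wait; it does not explain why Logstash fails to stop.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that shortens the stop timeout.
#   $1 = unit name, $2 = timeout in seconds,
#   $3 = base directory (defaults to /etc/systemd/system; pass a scratch
#        directory for a dry run).
write_stop_timeout_dropin() {
    unit="$1"
    seconds="$2"
    base="${3:-/etc/systemd/system}"
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "$dir/override.conf" || return 1
    # Print the path of the file that was written.
    printf '%s\n' "$dir/override.conf"
}

# Example (as root). The value 30 is an assumption, pick what suits you:
#   write_stop_timeout_dropin logstash.service 30
#   systemctl daemon-reload
#   systemctl show -p TimeoutStopUSec logstash.service
```

Running systemctl cat logstash.service beforehand should show which unit file (or generated unit) systemd is actually using.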
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Long reboots on Ubuntu after installing NLS

Post by pbroste »

Hello @mlabbepg

Thanks for reaching out about this issue.

I'd like to find out whether the logstash or elasticsearch services are hanging for a particular reason. Please send the output of:

Code: Select all

journalctl -u logstash.service
And

Code: Select all

journalctl -u elasticsearch.service
Please also provide the following:

Code: Select all

ps -ef | grep logstash

Code: Select all

java -version

Thanks,
Perry
mlabbepg
Posts: 30
Joined: Fri Apr 16, 2021 1:10 pm

Re: Long reboots on Ubuntu after installing NLS

Post by mlabbepg »

Sorry for the delay, I didn't get notified of your reply.

Here are the requested logs.


journalctl -u logstash.service :

Code: Select all

-- Reboot --
Sep 10 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 10 02:06:57 syslog-mgmt logstash[829]:  * Starting Logstash Daemon
Sep 10 02:06:57 syslog-mgmt logstash[829]: /etc/init.d/logstash: invalid arguments
Sep 10 02:06:57 syslog-mgmt logstash[829]:    ...done.
Sep 10 02:06:57 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 23 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 23 02:00:01 syslog-mgmt logstash[3045427]:  * Stopping Logstash Daemon
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 23 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 23 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 23 02:07:00 syslog-mgmt logstash[839]:  * Starting Logstash Daemon
Sep 23 02:07:00 syslog-mgmt logstash[839]: /etc/init.d/logstash: invalid arguments
Sep 23 02:07:00 syslog-mgmt logstash[839]:    ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Logstash.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Sep 29 02:00:00 syslog-mgmt logstash[1442046]:  * Stopping Logstash Daemon
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Sep 29 02:05:01 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Sep 29 02:05:01 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Sep 29 02:06:56 syslog-mgmt logstash[840]:  * Starting Logstash Daemon
Sep 29 02:06:56 syslog-mgmt logstash[840]: /etc/init.d/logstash: invalid arguments
Sep 29 02:06:56 syslog-mgmt logstash[840]:    ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 01 10:48:19 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 01 10:48:19 syslog-mgmt logstash[588184]:  * Stopping Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588184]:    ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: logstash.service: Succeeded.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 01 10:48:26 syslog-mgmt logstash[588211]:  * Starting Logstash Daemon
Oct 01 10:48:26 syslog-mgmt logstash[588211]: /etc/init.d/logstash: invalid arguments
Oct 01 10:48:26 syslog-mgmt logstash[588211]:    ...done.
Oct 01 10:48:26 syslog-mgmt systemd[1]: Started LSB: Logstash.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Logstash...
Oct 06 15:59:22 syslog-mgmt logstash[1804807]:  * Stopping Logstash Daemon
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Control process exited, code=killed, status=15/TERM
Oct 06 16:04:22 syslog-mgmt systemd[1]: logstash.service: Failed with result 'timeout'.
Oct 06 16:04:22 syslog-mgmt systemd[1]: Stopped LSB: Logstash.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Logstash...
Oct 06 16:06:16 syslog-mgmt logstash[862]:  * Starting Logstash Daemon
Oct 06 16:06:16 syslog-mgmt logstash[862]: /etc/init.d/logstash: invalid arguments
Oct 06 16:06:16 syslog-mgmt logstash[862]:    ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Logstash.
journalctl -u elasticsearch.service :

Code: Select all

-- Reboot --
Sep 23 02:06:59 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]:  * Starting Elasticsearch Server
Sep 23 02:07:00 syslog-mgmt elasticsearch[834]:    ...done.
Sep 23 02:07:00 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 28 10:46:04 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 28 10:46:04 syslog-mgmt elasticsearch[1294617]:  * Stopping Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294617]:    ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]:  * Starting Elasticsearch Server
Sep 28 10:46:05 syslog-mgmt elasticsearch[1294639]:    ...done.
Sep 28 10:46:05 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Sep 29 02:00:00 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Sep 29 02:00:00 syslog-mgmt elasticsearch[1442038]:  * Stopping Elasticsearch Server
Sep 29 02:00:01 syslog-mgmt elasticsearch[1442038]:    ...done.
Sep 29 02:00:01 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Sep 29 02:00:01 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Sep 29 02:06:56 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]:  * Starting Elasticsearch Server
Sep 29 02:06:56 syslog-mgmt elasticsearch[836]:    ...done.
Sep 29 02:06:56 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
Oct 06 15:59:22 syslog-mgmt systemd[1]: Stopping LSB: Starts elasticsearch...
Oct 06 15:59:22 syslog-mgmt elasticsearch[1804797]:  * Stopping Elasticsearch Server
Oct 06 15:59:23 syslog-mgmt elasticsearch[1804797]:    ...done.
Oct 06 15:59:23 syslog-mgmt systemd[1]: elasticsearch.service: Succeeded.
Oct 06 15:59:23 syslog-mgmt systemd[1]: Stopped LSB: Starts elasticsearch.
-- Reboot --
Oct 06 16:06:16 syslog-mgmt systemd[1]: Starting LSB: Starts elasticsearch...
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]:  * Starting Elasticsearch Server
Oct 06 16:06:16 syslog-mgmt elasticsearch[856]:    ...done.
Oct 06 16:06:16 syslog-mgmt systemd[1]: Started LSB: Starts elasticsearch.
ps -ef | grep logstash :

Code: Select all

root        1081       1 17 16:06 ?        00:00:53 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
pgadmin     2589    1789  0 16:11 pts/0    00:00:00 grep --color=auto logstash
java -version :

Code: Select all

openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
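
The pattern in the Logstash journal above (most stop attempts run into the 5-minute timeout, one finishes in a few seconds) can be summarized with a small filter. This is just a sketch; the function name summarize_stops is made up, and it relies on the exact message strings seen in this journal output.

```shell
#!/bin/sh
# Hypothetical helper: read `journalctl -u logstash.service` output on stdin
# and print one line per stop attempt: the time the stop began, followed by
# "timeout" if systemd had to terminate it, or "clean" otherwise.
summarize_stops() {
    awk '
        /Stopping LSB: Logstash/ { start = $1 " " $2 " " $3; result = "clean" }
        /Stopping timed out/     { result = "timeout" }
        /Stopped LSB: Logstash/  { print start, result }
    '
}

# Example:
#   journalctl -u logstash.service --no-pager | summarize_stops
```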
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Long reboots on Ubuntu after installing NLS

Post by pbroste »

Hello @mlabbepg

Thanks for following up. I see what you mean: Logstash hangs for about 5 minutes when it is stopped. Let's see if we can get more information from a restart of the logstash service.

Code: Select all

systemctl restart logstash && journalctl -fexu logstash.service > /tmp/results.txt &
Please review and send the results.txt
Perry
mlabbepg
Posts: 30
Joined: Fri Apr 16, 2021 1:10 pm

Re: Long reboots on Ubuntu after installing NLS

Post by mlabbepg »

Here is the log file.

Thanks.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Long reboots on Ubuntu after installing NLS

Post by pbroste »

Hello @mlabbepg

Thanks for the results.txt. Unfortunately, it did not provide any more detail than we already had.

We know that Logstash was hanging during the following time range, and we want to know what logstash.log reports for it:
Sep 10 02:00:00 syslog-mgmt logstash[1797412]: * Stopping Logstash Daemon
Sep 10 02:05:00 syslog-mgmt systemd[1]: logstash.service: Stopping timed out. Terminating.
Please let us know what you find in '/var/log/logstash/logstash.log' and '/var/log/elasticsearch/nagios_elasticsearch.log/log/elasticsearch/nagios_elasticsearch.log' from 02:00:00 to 02:05:00 on September 10th.

Thanks,
Perry
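
A time window like the one requested above can be cut out of a log file with a short awk filter. This is a sketch under assumptions: the function name log_window is made up, and it assumes each relevant line carries a timestamp of the form YYYY-MM-DDTHH:MM:SS (as Logstash log entries are assumed to) or YYYY-MM-DD HH:MM:SS (as Elasticsearch log entries are assumed to). Lines without a timestamp, such as stack trace continuations, are skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the lines on stdin whose first timestamp lies
# within [from, to]. Both bounds are given as YYYY-MM-DDTHH:MM:SS.
log_window() {
    awk -v from="$1" -v to="$2" '
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][T ][0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            ts = substr($0, RSTART, RLENGTH)
            sub(/ /, "T", ts)   # normalize "date time" to "dateTtime"
            if (ts >= from && ts <= to) print
        }
    '
}

# Example:
#   log_window 2021-09-10T02:00:00 2021-09-10T02:05:00 < /var/log/logstash/logstash.log
```

For the systemd journal itself, journalctl -u logstash.service --since "2021-09-10 02:00:00" --until "2021-09-10 02:05:00" does the same job without any extra tooling.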
mlabbepg
Posts: 30
Joined: Fri Apr 16, 2021 1:10 pm

Re: Long reboots on Ubuntu after installing NLS

Post by mlabbepg »

Hi,

Here is the requested logstash logfile.
logstash_2021-09-10.zip
There is no /var/log/elasticsearch/nagios_elasticsearch.log file or folder (it looks like there was a cut-and-paste error in the path, too).

sudo find / -name nagios_elasticsearch.log found nothing either.

I believe the logfile on our system is named /var/log/elasticsearch/8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log instead.

Code: Select all

sudo ls -l /var/log/elasticsearch
total 144
-rw-r--r-- 1 nagios nagios     0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_indexing_slowlog.log
-rw-r--r-- 1 nagios nagios     0 May 15 11:45 8f463c53-c40a-4dcd-8e1d-87f3e6de8421_index_search_slowlog.log
-rw-r--r-- 1 nagios nagios 13841 Oct  8 09:27 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log
-rw-r--r-- 1 nagios nagios  5766 Oct  7 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.1.gz
-rw-r--r-- 1 nagios nagios 18764 Oct  6 20:00 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.2.gz
-rw-r--r-- 1 nagios nagios 17570 Oct  5 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.3.gz
-rw-r--r-- 1 nagios nagios 16505 Oct  4 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.4.gz
-rw-r--r-- 1 nagios nagios 16498 Oct  3 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.5.gz
-rw-r--r-- 1 nagios nagios 17398 Oct  2 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.6.gz
-rw-r--r-- 1 nagios nagios 17711 Oct  1 23:59 8f463c53-c40a-4dcd-8e1d-87f3e6de8421.log.7.gz
The Elasticsearch log file from September 10th has already been rotated out.
I'm adding both logs for the Oct 07 (09:17:57 to 09:22:57) timeout event to have a complete trace.

Thanks.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Long reboots on Ubuntu after installing NLS

Post by pbroste »

Hello @mlabbepg

Thanks for following up with the 'elasticsearch' and 'logstash' logs.

The logstash log contains messages stating:
message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"http://localhost:9200\"]', but Elasticsearch appears to be unreachable or down!"
Around the same time, elasticsearch reports:
[WARN ][snapshots.......failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException:...... Failed to perform snapshot (index files)
Let's find out about your 'snapshots':

Code: Select all

curl -X GET "localhost:9200/_snapshot/_all?pretty=true"
Sample output:
{
  "nameofyourrepository" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/your/store/logs/for/repositories/"
    }
  }
}
Let's run:

Code: Select all

curl -X PUT "localhost:9200/_snapshot/<nameofyourrepository>/snapshot_test?wait_for_completion=true&pretty"
Sample output:
{
  "snapshot" : {
    "snapshot" : "snapshot_test",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "kibana-int", "logstash-2021.10.09", "logstash-2021.10.06", "logstash-2021.10.10", "logstash-2021.10.05", "logstash-2021.10.07", "nagioslogserver", "logstash-2021.09.30", "logstash-2021.10.11", "logstash-2021.10.02", "logstash-2021.10.04", "logstash-2021.10.01", "nagioslogserver_log", "logstash-2021.10.08", "logstash-2021.10.03", "nagioslogserver_history" ],
    "state" : "SUCCESS",
    "start_time" : "2021-10-11T16:19:23.820Z",
    "start_time_in_millis" : 1633969163820,
    "end_time" : "2021-10-11T16:19:28.218Z",
    "end_time_in_millis" : 1633969168218,
    "duration_in_millis" : 4398,
    "failures" : [ ],
    "shards" : {
      "total" : 72,
      "failed" : 0,
      "successful" : 72
    }
  }
}
Let me know the results and follow up with a copy of the Nagios Log Server System Profile:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh
It will generate a file called system-profile.tar.gz in /tmp. If the file is too big to send, please use the split command to break it into smaller pieces:

Code: Select all

split -b 50M thenameofthesystemprofiletargzfilehere splitup
Send the resulting pieces (splitupaa, splitupab, and so on) in separate private messages (PMs).

Thanks,
Perry
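
Before sending split pieces, it can be worth confirming that they reassemble into the original archive. A minimal sketch, assuming a POSIX shell with split and cmp available; the function name split_and_verify is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a file into chunks of the given size and check
# that the chunks, concatenated in name order, are byte-identical to it.
#   $1 = file to split, $2 = chunk size (as accepted by split -b), $3 = prefix
split_and_verify() {
    file="$1"
    size="$2"
    prefix="$3"
    split -b "$size" "$file" "$prefix" || return 1
    # The shell expands the glob in sorted order (aa, ab, ac, ...),
    # which is the order split wrote the chunks in.
    cat "$prefix"* | cmp -s - "$file"
}

# Example:
#   cd /tmp && split_and_verify system-profile.tar.gz 50M splitup && echo "pieces OK"
```

The recipient can rebuild the archive with cat splitup* > system-profile.tar.gz.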
mlabbepg
Posts: 30
Joined: Fri Apr 16, 2021 1:10 pm

Re: Long reboots on Ubuntu after installing NLS

Post by mlabbepg »

Here's the result:

Code: Select all

curl -X PUT "localhost:9200/_snapshot/sdb/snapshot_test?wait_for_completion=true&pretty"
{
  "snapshot" : {
    "snapshot" : "snapshot_test",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "logstash-2021.10.09", "logstash-2021.10.08", "logstash-2021.09.22", "logstash-2021.10.05", "logstash-2021.09.13", "logstash-2021.09.25", "logstash-2021.10.07", "logstash-2021.10.01", "logstash-2021.09.19", "logstash-2021.09.17", "logstash-2021.10.03", "logstash-2021.09.24", "logstash-2021.09.14", "logstash-2021.09.18", "kibana-int", "logstash-2021.09.16", "nagioslogserver", "logstash-2021.10.12", "logstash-2021.09.20", "logstash-2021.10.10", "logstash-2021.10.02", "logstash-2021.09.21", "logstash-2021.09.26", "logstash-2021.09.29", "logstash-2021.09.23", "logstash-2021.10.11", "logstash-2021.09.15", "logstash-2021.10.04", "logstash-2021.09.30", "logstash-2021.10.06", "logstash-2021.09.27", "nagioslogserver_log", "logstash-2021.09.28" ],
    "state" : "SUCCESS",
    "start_time" : "2021-10-12T15:35:05.511Z",
    "start_time_in_millis" : 1634052905511,
    "end_time" : "2021-10-12T15:35:12.634Z",
    "end_time_in_millis" : 1634052912634,
    "duration_in_millis" : 7123,
    "failures" : [ ],
    "shards" : {
      "total" : 161,
      "failed" : 0,
      "successful" : 161
    }
  }
}
-EDIT-
A copy of the Nagios Log Server System Profile has been sent via PM.

Thanks.
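
When a test like this is repeated, the state can be pulled out of the response instead of reading the whole JSON by eye. This is a rough sketch that assumes the pretty-printed layout shown above (one "state" key per response) and avoids any JSON tooling; the function name snapshot_state is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "state" field found in a
# pretty-printed snapshot API response read from stdin (for example SUCCESS).
snapshot_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Example, using the repository name "sdb" from the command above:
#   curl -s -X PUT "localhost:9200/_snapshot/sdb/snapshot_test?wait_for_completion=true&pretty" | snapshot_state
#
# The test snapshot stays in the repository until it is deleted:
#   curl -X DELETE "localhost:9200/_snapshot/sdb/snapshot_test"
```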
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Long reboots on Ubuntu after installing NLS

Post by pbroste »

@mlabbepg,
Thanks for following up with the profile. After review, we see that the instance is running out of resources while it is stopping services. Please increase the RAM on the machine, and raise the HTTP content length setting in:
/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
Open the file with a text editor, and find the following setting:
# http.max_content_length: 100mb
Uncomment the line and change the value to 500mb:
http.max_content_length: 500mb
Save the file and run the following commands:
service elasticsearch restart
service logstash restart
service apache2 restart
Thanks,
Perry
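
The manual edit described above can also be scripted. The following is only a sketch of that one change, not an official NLS procedure: the function name set_max_content_length is made up, and it assumes the setting appears on its own line, commented or not, as in the stock elasticsearch.yml.

```shell
#!/bin/sh
# Hypothetical helper: uncomment http.max_content_length in the given
# elasticsearch.yml and set it to the given value. The original file is
# kept next to it with a .bak suffix.
set_max_content_length() {
    file="$1"
    value="$2"
    cp "$file" "$file.bak" || return 1
    sed "s/^#*[[:space:]]*http\.max_content_length:.*/http.max_content_length: $value/" \
        "$file.bak" > "$file"
}

# Example (as root), followed by the service restarts listed above:
#   set_max_content_length /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml 500mb
```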