Logstash process exited, but running?
Posted: Thu Mar 23, 2017 10:17 am
Greetings,
I recently migrated my NLS clusters from RHEL 6.8 to 7.3 and updated to NLS 1.4.4 while I was at it, and I'm seeing weird behavior on a couple of them. Long story short, the dashboard shows Logstash as running and NLS seems to be behaving correctly. On the server itself, however, nothing indicates that the logstash service is running, and there is no active java process for it (verified with top -u nagios).
On my bad node I see this:
The dashboard is happy --
Elasticsearch is alright...
And it's not just the behavior of logstash, because on another node everything is alright...
As far as I can tell (and I should, since I built the damn things), the configuration between the nodes is identical. As I stated, the cluster appears to be working fine, but I'm trying to set up my infrastructure monitoring to alert me when elasticsearch or logstash dies, and that's really hard to do if it can't tell they are alive in the first place.
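For the monitoring goal described above, one option is to look for the Logstash JVM in the process list instead of trusting the service status. This is only a rough sketch: the function names are made up for illustration, and the match pattern is an assumption taken from the java command line that a healthy node shows, so it may need adjusting on other installs.

```shell
#!/bin/sh
# Hypothetical Nagios-style check (names are illustrative, not part of NLS).

# Reads process command lines on stdin, one per line, and succeeds if any
# of them looks like the Logstash JVM. The pattern is an assumption based
# on the java command line seen on a healthy node.
logstash_jvm_present() {
    grep -q 'java.*logstash/runner\.rb agent'
}

# Prints a plugin-style status line and returns the matching exit code
# (0 = OK, 2 = CRITICAL). Also reads the process list on stdin.
check_logstash() {
    if logstash_jvm_present; then
        echo "OK - logstash JVM found"
        return 0
    fi
    echo "CRITICAL - no logstash JVM found"
    return 2
}
```

On a node it could be fed from ps, for example `ps -ww -u nagios -o args= | check_logstash`. Keeping the process list on stdin means the matching logic can be tried against pasted output without touching a live service.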
Logstash status on the bad node (schpnag1):
[root@schpnag1 ~]# systemctl -l status logstash
● logstash.service - LSB: Logstash
Loaded: loaded (/etc/rc.d/init.d/logstash; bad; vendor preset: disabled)
Active: active (exited) since Thu 2017-03-23 09:58:31 CDT; 1min 49s ago
Docs: man:systemd-sysv-generator(8)
Process: 19321 ExecStop=/etc/rc.d/init.d/logstash stop (code=exited, status=0/SUCCESS)
Process: 19330 ExecStart=/etc/rc.d/init.d/logstash start (code=exited, status=0/SUCCESS)
Mar 23 09:58:31 schpnag1 systemd[1]: Starting LSB: Logstash...
Mar 23 09:58:31 schpnag1 runuser[19336]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
Mar 23 09:58:31 schpnag1 logstash[19330]: Starting Logstash Daemon: [ OK ]
Mar 23 09:58:31 schpnag1 systemd[1]: Started LSB: Logstash.
Mar 23 09:58:47 schpnag1 runuser[19336]: pam_unix(runuser:session): session closed for user nagios
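The "active (exited)" state in the output above is the part a monitoring check can key on. As far as I understand it, systemd can keep an init-script (LSB) unit marked active after the start script returns 0 even when no process is left behind. A minimal sketch, assuming the check reads the `SubState=` line printed by `systemctl show -p SubState logstash`; the function name and the messages are hypothetical:

```shell
#!/bin/sh
# Maps a "SubState=..." line, as printed by `systemctl show -p SubState UNIT`,
# to a Nagios-style status line and exit code (0 = OK, 2 = CRITICAL).
# Treating "exited" as CRITICAL is an assumption that suits a daemon such as
# logstash; a one-shot service would need the opposite mapping.
substate_to_status() {
    state=${1#SubState=}   # strip the "SubState=" prefix
    case "$state" in
        running)
            echo "OK - service has a live process"
            return 0 ;;
        exited)
            echo "CRITICAL - start script succeeded but no process remains"
            return 2 ;;
        *)
            echo "CRITICAL - substate is $state"
            return 2 ;;
    esac
}
```

Usage would look like `substate_to_status "$(systemctl show -p SubState logstash)"`.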
Elasticsearch status on the same node (schpnag1):
[root@schpnag1 ~]# systemctl -l status elasticsearch
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (running) since Tue 2017-03-07 09:42:52 CST; 2 weeks 1 days ago
Docs: man:systemd-sysv-generator(8)
Process: 15378 ExecStop=/etc/rc.d/init.d/elasticsearch stop (code=exited, status=0/SUCCESS)
Process: 15944 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/elasticsearch.service
└─15964 java -Xms32125m -Xmx32125m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=4f703585-84ab-40e0-9ff9-f72c904bdc38 -Des.node.name=dd72d9be-b8a7-484b-9f56-dcd2374c36e8 -Des.discovery.zen.ping.unicast.hosts=localhost,,schpnag2 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.6.0.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/nagios/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
Mar 07 09:42:52 schpnag1 systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
Mar 07 09:42:52 schpnag1 runuser[15955]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
Mar 07 09:42:52 schpnag1 runuser[15955]: pam_unix(runuser:session): session closed for user nagios
Mar 07 09:42:52 schpnag1 elasticsearch[15944]: Starting elasticsearch: [ OK ]
Mar 07 09:42:52 schpnag1 systemd[1]: Started LSB: This service manages the elasticsearch daemon.
Logstash status on the good node (schpnag11):
[root@schpnag11 ~]# systemctl -l status logstash
● logstash.service - LSB: Logstash
Loaded: loaded (/etc/rc.d/init.d/logstash; bad; vendor preset: disabled)
Active: active (running) since Tue 2017-02-21 15:09:16 CST; 4 weeks 1 days ago
Docs: man:systemd-sysv-generator(8)
Process: 17753 ExecStop=/etc/rc.d/init.d/logstash stop (code=exited, status=0/SUCCESS)
Process: 17762 ExecStart=/etc/rc.d/init.d/logstash start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/logstash.service
├─17768 runuser -s /bin/sh -c exec /usr/local/nagioslogserver/logstash/bin/logstash agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4 nagios
└─17770 java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
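The healthy node lists two PIDs under its CGroup line, while the bad node's status shows no CGroup section at all. A check could test that directly by reading the unit's cgroup.procs file. This is a sketch under assumptions: the function name is hypothetical, and the path in the usage note assumes the cgroup v1 layout I would expect on RHEL 7, so verify it on your own nodes first.

```shell
#!/bin/sh
# Succeeds if the given cgroup.procs file lists at least one PID.
# Files in cgroupfs report a size of 0, so the content is read with grep
# rather than tested with `test -s`. A missing file (no cgroup for the
# unit) counts as "no processes".
cgroup_has_procs() {
    grep -q '[0-9]' "$1" 2>/dev/null
}
```

Usage, with the assumed path: `cgroup_has_procs /sys/fs/cgroup/systemd/system.slice/logstash.service/cgroup.procs || echo "CRITICAL - logstash cgroup is empty"`.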