ES dead on all 3 nodes of the cluster

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

ES dead on all 3 nodes of the cluster

Post by gormank »

I found that ES wasn't running on 2 of the 3 NLS hosts. I restarted it on both, but it didn't stay up. I then made the mistake of restarting ES on the 1st host, and of course it dies there as well.
I'm getting the following on all NLS nodes and ES dies after about 10 seconds when started.
I searched the forum, KB and internet and see very little about this message in general.
NLS is running the current version on RHEL 7.8.
Any suggestions for reviving ES?

Code: Select all

264) Error injecting constructor, java.lang.IllegalStateException: This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.
  at org.elasticsearch.river.routing.RiversRouter.<init>(Unknown Source)
  while locating org.elasticsearch.river.routing.RiversRouter
    for parameter 3 at org.elasticsearch.river.RiversManager.<init>(Unknown Source)
  while locating org.elasticsearch.river.RiversManager
Caused by: java.lang.IllegalStateException: This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.
        at org.elasticsearch.common.inject.internal.ConstructionContext$DelegatingInvocationHandler.invoke(ConstructionContext.java:102)
        at com.sun.proxy.$Proxy12.add(Unknown Source)
        at org.elasticsearch.river.routing.RiversRouter.<init>(RiversRouter.java:82)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
        at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
        at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
        at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
        at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
        at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
        at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
        at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
        at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
        at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
        at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
        at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
        at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
        at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
        at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:210)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)

264 errors
        at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
        at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
        at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
        at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93)
        at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70)
        at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59)
        at org.elasticsearch.node.internal.InternalNode.<init>(InternalNode.java:210)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:77)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:245)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: ES dead on all 3 nodes of the cluster

Post by pbroste »

Hello @gormank

Thanks for reaching out and providing the details. We are not quite sure what is going on, so we will need some additional information.

Let's get the following details:

Java:
  • java -version
Also collect any interesting Elasticsearch logs and send them to me by private message. The command below writes logs.tar.gz to /tmp/:
  • Code: Select all

    tar -cvzf /tmp/logs.tar.gz /var/log/elasticsearch/*.log /usr/local/nagioslogserver/var/*.log
Thanks,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: ES dead on all 3 nodes of the cluster

Post by gormank »

Hi,
Here's the java version. I'll grab the log info and PM it...

# java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: ES dead on all 3 nodes of the cluster

Post by pbroste »

Hello @gormank

Thanks for following up and providing the logs.

After review, we see log messages about failing to resolve IPv6 addresses. With that, I am wondering whether there have been changes or updates to the network in your environment (e.g. proxy, IPv4 to IPv6, DNS, firewall, SELinux or other security applications, etc.).

Code: Select all

Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: Failed to resolve address for [2001:4888:a00:.......ff2:0:b02]
Also, we want to verify that the date, time, and timezone are configured the same across the OS and the configs.

Code: Select all

timezone check:

php -r 'echo date("D M j G:i:s T Y")."\n";'

mysql ....SELECT 
@@GLOBAL.time_zone,
 @@SESSION.time_zone;'
date
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
grep "date.timezone =" /etc/php.ini
grep date.timezone /etc/php.ini
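The PHP part of this check can also be scripted so the same comparison runs on every node. This is a minimal sketch, assuming the setting lives in /etc/php.ini and that /etc/localtime is a symlink into a zoneinfo directory, as in the commands above. The function names `php_ini_timezone` and `system_timezone` are hypothetical and only illustrate the idea.

```shell
# Print the active (uncommented) date.timezone value from a php.ini file.
# php_ini_timezone is a hypothetical helper name, not part of NLS or PHP.
php_ini_timezone() {
    # $1: path to php.ini
    sed -n 's/^[[:space:]]*date\.timezone[[:space:]]*=[[:space:]]*//p' "$1" |
        sed 's/[[:space:]]*$//; s/^"\(.*\)"$/\1/' | tail -n 1
}

# Print the zone name that a localtime symlink points to.
# Assumes the link target contains ".../zoneinfo/<Zone>".
system_timezone() {
    # $1: path to the symlink (defaults to /etc/localtime)
    readlink "${1:-/etc/localtime}" | sed 's|.*/zoneinfo/||'
}
```

A possible use on each node: `[ "$(php_ini_timezone /etc/php.ini)" = "$(system_timezone)" ] || echo "timezone mismatch"`.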
Please let us know what you find so we can continue to assist with further troubleshooting.

Thanks,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: ES dead on all 3 nodes of the cluster

Post by gormank »

I saw the TZ was wrong in one php.ini file, but it defaulted to UTC. I updated that.
The SQL seems to be malformed, so I got nothing but an error from that.
The date and time are correct, the address is resolvable, and SELinux is disabled.

[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ ip a | grep 2001:4888:a00
inet6 2001:4888:a00:3154:f0:ff2:0:b01/64 scope global noprefixroute
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ nslookup 2001:4888:a00:3154:f0:ff2:0:b01
1.0.b.0.0.0.0.0.2.f.f.0.0.f.0.0.4.5.1.3.0.0.a.0.8.8.8.4.1.0.0.2.ip6.arpa name = sandcaykhsc-v-sweslog-01.iotsc.cdsapps.com.
1.0.b.0.0.0.0.0.2.f.f.0.0.f.0.0.4.5.1.3.0.0.a.0.8.8.8.4.1.0.0.2.ip6.arpa name = syslog-sand.iotsc.cdsapps.com.

[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ nslookup 2001:4888:a00:3154:f0:ff2:0:b02
2.0.b.0.0.0.0.0.2.f.f.0.0.f.0.0.4.5.1.3.0.0.a.0.8.8.8.4.1.0.0.2.ip6.arpa name = sandcaykhsc-v-sweslog-02.iotsc.cdsapps.com.
2.0.b.0.0.0.0.0.2.f.f.0.0.f.0.0.4.5.1.3.0.0.a.0.8.8.8.4.1.0.0.2.ip6.arpa name = syslog-sand.iotsc.cdsapps.com.

[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ nslookup 2001:4888:a00:3154:f0:ff2:0:b03
3.0.b.0.0.0.0.0.2.f.f.0.0.f.0.0.4.5.1.3.0.0.a.0.8.8.8.4.1.0.0.2.ip6.arpa name = sandcaykhsc-v-sweslog-03.iotsc.cdsapps.com.

[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ sestatus
SELinux status: disabled
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ php -r 'echo date("D M j G:i:s T Y")."\n";'
Mon Aug 16 18:28:06 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ date
Mon Aug 16 18:28:11 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ mysql ....SELECT
-sh: mysql: command not found
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ @@GLOBAL.time_zone,
-sh: @@GLOBAL.time_zone,: command not found
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ @@SESSION.time_zone;'
> ^C
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ ls -l /etc/localtime
lrwxrwxrwx 1 root root 25 Mar 31 16:38 /etc/localtime -> ../usr/share/zoneinfo/UTC
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ php -r 'echo date("D M j G:i:s T Y")."\n";'
Mon Aug 16 18:29:36 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ grep "date.timezone =" /etc/php.ini
date.timezone = UTC
[STAGING gormke1@sandcaykhsc-v-sweslog-01 ~]$ grep date.timezone /etc/php.ini
; http://php.net/date.timezone
date.timezone = UTC

[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ ls -l /etc/localtime
lrwxrwxrwx 1 root root 25 Mar 31 16:38 /etc/localtime -> ../usr/share/zoneinfo/UTC
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ php -r 'echo date("D M j G:i:s T Y")."\n";'
PHP Warning: date(): Invalid date.timezone value 'n/a', we selected the timezone 'UTC' for now. in Command line code on line 1
Mon Aug 16 18:29:34 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ php -r 'echo date("D M j G:i:s T Y")."\n";'
PHP Warning: date(): Invalid date.timezone value 'n/a', we selected the timezone 'UTC' for now. in Command line code on line 1
Mon Aug 16 18:29:46 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ grep "date.timezone =" /etc/php.ini
date.timezone = n/a
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ grep date.timezone /etc/php.ini
; http://php.net/date.timezone
date.timezone = n/a
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ ll /etc/php.ini
-rw-r--r--. 1 root root 64949 Dec 10 2018 /etc/php.ini
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ sudo vi /etc/php.ini
[STAGING gormke1@sandcaykhsc-v-sweslog-02 ~]$ grep date.timezone /etc/php.ini
; http://php.net/date.timezone
date.timezone = UTC


[STAGING gormke1@sandcaykhsc-v-sweslog-03 ~]$ ls -l /etc/localtime
lrwxrwxrwx. 1 root root 25 Jul 28 2020 /etc/localtime -> ../usr/share/zoneinfo/UTC
[STAGING gormke1@sandcaykhsc-v-sweslog-03 ~]$ php -r 'echo date("D M j G:i:s T Y")."\n";'
Mon Aug 16 18:29:32 UTC 2021
[STAGING gormke1@sandcaykhsc-v-sweslog-03 ~]$ grep "date.timezone =" /etc/php.ini
date.timezone = UTC
[STAGING gormke1@sandcaykhsc-v-sweslog-03 ~]$ grep date.timezone /etc/php.ini
; http://php.net/date.timezone
date.timezone = UTC
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: ES dead on all 3 nodes of the cluster

Post by gormank »

I was getting some out-of-resources messages, so I restarted the hosts. ES still wouldn't run, but when I removed the other hosts from each cluster_hosts file, ES ran on each host. If I add the hosts back to the file and restart ES, it dies.
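Since ES only dies when the other nodes are listed in cluster_hosts, it may help to list each entry in that file and compare it with the address in the "Failed to resolve address" message. This is a minimal sketch. The path /usr/local/nagioslogserver/var/cluster_hosts and the one-entry-per-line layout are assumptions, and `classify_cluster_hosts` is a hypothetical helper name.

```shell
# List each entry in a cluster_hosts-style file and label it as an IPv6
# literal, an IPv4 literal, or a hostname.
# classify_cluster_hosts is a hypothetical helper, not an NLS tool.
classify_cluster_hosts() {
    # $1: path to the hosts file (one entry per line is an assumption)
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines.
        [ -n "$entry" ] || continue
        case "$entry" in
            *:*:*)     kind=ipv6 ;;      # two or more colons
            *[!0-9.]*) kind=hostname ;;  # anything besides digits and dots
            *)         kind=ipv4 ;;
        esac
        printf '%s %s\n' "$kind" "$entry"
    done < "$1"
}
```

For example, `classify_cluster_hosts /usr/local/nagioslogserver/var/cluster_hosts` (assumed path). Entries labelled as hostnames could then be checked one at a time with `getent hosts <name>`.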
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: ES dead on all 3 nodes of the cluster

Post by pbroste »

Hello @gormank

Thanks for providing the details. We would like to get the System Profile from your environment.

Please run the following script from the command line:

Code: Select all

/usr/local/nagioslogserver/scripts/profile.sh
This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may be too big to upload through the system, usually because of the logs in the Logstash and/or Elasticsearch directories it contains. If it is too large, please split the compressed archive and send each part (split -b xxxM /tmp/system-profile.tar.gz sysprofile-part).
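If the archive does have to be split, the parts can be checked before sending by reassembling them and comparing the result against the original. This is a minimal sketch using `split`, `cat` and `cmp`. The helper name `split_and_verify` is hypothetical, and the part prefix must not match any other files in the same directory.

```shell
# Split an archive into fixed-size parts, then confirm that concatenating
# the parts in name order reproduces the original byte for byte.
# split_and_verify is a hypothetical helper name.
split_and_verify() {
    # $1: archive path, $2: part size for split -b, $3: part prefix
    split -b "$2" "$1" "$3" || return 1
    # split names parts <prefix>aa, <prefix>ab, ...; the glob sorts them in order.
    cat "$3"* | cmp -s - "$1"
}
```

On the receiving side the parts would be rejoined the same way, for example `cat sysprofile-part* > system-profile.tar.gz`.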

Please send via private message.

Thanks,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: ES dead on all 3 nodes of the cluster

Post by gormank »

I sent the PMs w/ system profiles but for the moment they're in the sent folder.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: ES dead on all 3 nodes of the cluster

Post by pbroste »

Hello @gormank

Thanks for sending along the System Profile.

In review, we see that Logstash is complaining about "Too many open files", and when that exception breaks the chain, the elasticsearch service stops.

There are a few things to verify and then adjust. First, take a look at the number of files opened by the Logstash process:

Find the pid:

Code: Select all

ps -aux | grep -Ei 'logstash' | grep -Ei 'java'
Find current count:

Code: Select all

lsof -p <logstashpid> | wc -l
or

Code: Select all

lsof | grep -Ei 'logstash' | wc -l
Check out the status and memory details under /proc/ for the Logstash PID.

Verify file handle count is not out of control:

Code: Select all

ls -al /proc/<logstashpid>/fd | wc -l
Check on the limits:

Code: Select all

cat /proc/<logstashpid>/limits
or

Code: Select all

ulimit -Sn
and

Code: Select all

ulimit -Hn
Check on status:

Code: Select all

cat /proc/<logstashpid>/status
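The per-process checks above can be combined into one small helper that reads /proc directly. This is a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the Logstash process's entries. The function names are hypothetical, and the PID would be the one found with ps.

```shell
# Count the file descriptors listed in a /proc/<PID>/fd directory.
# ls -A omits "." and ".." and prints no "total" line, so every line is one fd.
fd_count() {
    # $1: path to a /proc/<PID>/fd directory
    ls -A "$1" | wc -l | tr -d ' '
}

# Extract the soft "Max open files" limit from a /proc/<PID>/limits file.
soft_nofile_limit() {
    # $1: path to a /proc/<PID>/limits file
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print "<open fds> <soft limit>" for a given PID.
fd_headroom() {
    # $1: PID
    printf '%s %s\n' "$(fd_count "/proc/$1/fd")" "$(soft_nofile_limit "/proc/$1/limits")"
}
```

Calling `fd_headroom <logstashpid>` prints the two numbers side by side, which makes it easier to see how close the process is to its limit.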
I understand that you removed host(s) from monitoring to see if elasticsearch would continue to run. When you add the host again you may want to watch the real-time counts.

Code: Select all

watch -n <intervaltimetowait> 'lsof -p <logstashpid> | wc -l'
Once you have determined the status and the numbers you can then increase the limit. Here is an external web source that others have referenced:
Finally verify everything is working with curl:

Code: Select all

curl -XGET 'http://localhost:9200/_nodes?os=true&process=true&pretty=true'
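To see whether the nodes have actually rejoined one cluster, the `_cluster/health` endpoint can be queried in the same way. This is a minimal sketch. The helper name `es_health_field` is hypothetical, and the text-based JSON parsing is only meant for a quick look at simple fields, not as robust tooling.

```shell
# Pull a single top-level field out of _cluster/health JSON read from stdin.
# es_health_field is a hypothetical helper name.
es_health_field() {
    # $1: field name, e.g. status or number_of_nodes
    # Strip whitespace, put each key/value pair on its own line, then
    # print the value (with or without surrounding quotes) for the field.
    tr -d ' \n' | tr ',{}' '\n\n\n' |
        sed -n "s/^\"$1\":\"\{0,1\}\([^\"]*\)\"\{0,1\}\$/\1/p" | head -n 1
}
```

For example, `curl -s 'http://localhost:9200/_cluster/health' | es_health_field number_of_nodes` should report 3 once all three nodes have joined.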
Thanks,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: ES dead on all 3 nodes of the cluster

Post by gormank »

Hmm, that seems strange since my indexes are only kept open for 14 days.
I see the syslog messages about open files.
I'm looking at the open files link you sent. I think I'll stop logstash and see if that helps ES form its cluster.
The system that sends log data is being rebuilt, so that might be the issue. Maybe we're being flooded w/ messages.

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# ps -aux | grep -Ei 'logstash' | grep -Ei 'java'
nagios 1374 1.8 1.1 5188484 735364 ? SNsl Aug16 51:51 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# lsof -p 1374 | wc -l
596

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# lsof | grep -Ei 'logstash' | wc -l
294110

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# ls -al /proc/1374/fd |wc -l
533

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# cat /proc/1374/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             4096                 256973               processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256973               256973               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# ulimit -Sn
1024

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# ulimit -Hn
4096

[STAGING root@sandcaykhsc-v-sweslog-01 logstash]# cat /proc/1374/status
Name: java
Umask: 0022
State: S (sleeping)
Tgid: 1374
Ngid: 0
Pid: 1374
PPid: 1334
TracerPid: 0
Uid: 1922 1922 1922 1922
Gid: 1922 1922 1922 1922
FDSize: 1024
Groups: 48 1922 1923
VmPeak: 5192788 kB
VmSize: 5192788 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 735804 kB
VmRSS: 735716 kB
RssAnon: 718956 kB
RssFile: 16760 kB
RssShmem: 0 kB
VmData: 5021340 kB
VmStk: 132 kB
VmExe: 4 kB
VmLib: 18268 kB
VmPTE: 3560 kB
VmSwap: 0 kB
Threads: 489
SigQ: 0/256973
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000002
SigCgt: 2000000181005ccd
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: vulnerable
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 35
nonvoluntary_ctxt_switches: 146
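
A note on reading the numbers above: `ls -al /proc/1374/fd | wc -l` also counts the `total` line and the `.` and `..` entries, so 533 lines correspond to 530 descriptors, against a limit of 16384 for this process. The much larger 294110 from the unfiltered `lsof` may be inflated because that form can list the same open files once per thread (the process has 489 threads) and because the grep matches any line containing "logstash". That interpretation is an assumption and is not confirmed anywhere in this thread. A minimal sketch of the headroom arithmetic, with a hypothetical helper name:

```shell
# Integer percentage of the open-files limit currently in use.
# fd_usage_pct is a hypothetical helper name.
fd_usage_pct() {
    # $1: descriptor count, $2: limit
    echo $(( $1 * 100 / $2 ))
}
```

With the values above, `fd_usage_pct 530 16384` gives 3, i.e. roughly 3% of the limit at the moment the sample was taken.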
Locked