Elastic search issue in primary cluster

Posted: Fri May 19, 2017 6:26 am
by slumbard
I installed 3 nodes as a cluster yesterday and started sending syslog from 2400 devices. For some reason the dashboards were empty this morning, and the primary cluster's status was red. I followed the article at https://support.nagios.com/kb/article.php?id=90 but the problem has since gotten even worse. I'm new to Nagios Log Server and don't know what the issue is. Can someone help? My customer number is 39059.

Some of the error messages:

{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}

===================================

[root@a-sesklnglsprd1 ~]# service elasticsearch status
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (exited) since Fri 2017-05-19 11:05:36 GMT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 4969 ExecStop=/etc/rc.d/init.d/elasticsearch stop (code=exited, status=0/SUCCESS)
Process: 4991 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)

May 19 11:05:35 a-sesklnglsprd1.astrazeneca.net systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session closed for user nagios
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: Starting elasticsearch: /bin/java: error while loading shared libraries: libjli.so: cannot open shared object file:... directory
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: [ OK ]
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net systemd[1]: Started LSB: This service manages the elasticsearch daemon.
Hint: Some lines were ellipsized, use -l to show in full.

===========================================

[root@a-sesklnglsprd1 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl: (7) Failed connect to localhost:9200; Connection refused

Re: Elastic search issue in primary cluster

Posted: Fri May 19, 2017 9:06 am
by slumbard
Now my other clusters are also down!

Someone please help ASAP!

Re: Elastic search issue in primary cluster

Posted: Fri May 19, 2017 9:21 am
by slumbard
Error message

Re: Elastic search issue in primary cluster

Posted: Fri May 19, 2017 9:22 am
by tmcdonald
I see you have also opened an email ticket for this, so we will be locking this thread and continuing in the email ticket. In the future, please open only one or the other so we do not duplicate our efforts.