Elastic search issue in primary cluster

Locked
slumbard
Posts: 3
Joined: Thu May 18, 2017 4:34 am

Elastic search issue in primary cluster

Post by slumbard »

I installed 3 nodes as a cluster yesterday and started sending syslog from 2400 devices. For some reason the dashboards were empty this morning, and the primary cluster's status was red. I followed this article: https://support.nagios.com/kb/article.php?id=90 , but the problem has gotten even worse. I'm new to Nagios Log Server and don't know what the issue is. Can someone help? My customer number is 39059.

Some of the error messages:

{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
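To gauge how often the 503 retries above occur, a minimal sketch along these lines could count them in a log file. The function name `count_503_retries` is hypothetical, and the log file path is left as an argument because the thread does not say where these messages were written.

```shell
# Hypothetical helper: count "retrying failed action ... 503" warnings.
# $1 is the path to the log file to scan (not stated in the thread).
count_503_retries() {
    # -c prints the number of matching lines, -F treats the pattern as a fixed string
    grep -cF 'retrying failed action with response code: 503' "$1"
}
```

A count that keeps climbing suggests the indexer is still being refused by Elasticsearch, which fits a cluster that is down or red.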

===================================

[root@a-sesklnglsprd1 ~]# service elasticsearch status
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (exited) since Fri 2017-05-19 11:05:36 GMT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 4969 ExecStop=/etc/rc.d/init.d/elasticsearch stop (code=exited, status=0/SUCCESS)
Process: 4991 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)

May 19 11:05:35 a-sesklnglsprd1.astrazeneca.net systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session closed for user nagios
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: Starting elasticsearch: /bin/java: error while loading shared libraries: libjli.so: cannot open shared object file:... directory
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: [ OK ]
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net systemd[1]: Started LSB: This service manages the elasticsearch daemon.
Hint: Some lines were ellipsized, use -l to show in full.
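The `libjli.so` line in the status output above suggests the Java binary cannot load one of its shared libraries, so Elasticsearch never actually starts even though the init script reports OK. One way to look into this is to list unresolved libraries with `ldd`. The sketch below is an illustration only: the helper name `missing_libs` is hypothetical, and whether `ldd` reports the same library on this system is an assumption.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
    # ldd marks unresolved libraries as "<name> => not found"
    awk '/=> not found/ { print $1 }'
}
```

It could be used as `ldd "$(readlink -f /bin/java)" | missing_libs`, where `/bin/java` is the path shown in the error message. An empty result would mean `ldd` resolved every library for that binary.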

===========================================

[root@a-sesklnglsprd1 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl: (7) Failed connect to localhost:9200; Connection refused
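The health check above fails before it gets any answer, which is a different situation from a cluster that responds with a red status. A rough sketch that tells the two apart might look like this. The function names `classify_health` and `check_cluster` are hypothetical, and the `sed` extraction assumes the response contains a `"status"` field as in the usual `_cluster/health` output.

```shell
# Hypothetical helper: classify the outcome of a cluster health query.
# $1 is the curl exit status, $2 is the response body (may be empty).
classify_health() {
    rc=$1
    body=$2
    if [ "$rc" -eq 7 ]; then
        # curl exit code 7: could not connect (e.g. "Connection refused")
        echo "unreachable"
        return
    fi
    if [ "$rc" -ne 0 ]; then
        echo "curl-error"
        return
    fi
    # Pull out the value of the "status" field without requiring jq.
    status=$(printf '%s\n' "$body" |
        sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' |
        head -n 1)
    echo "${status:-unknown}"
}

# Hypothetical wrapper: query the health endpoint and classify the result.
# The default URL is the one used in the post above.
check_cluster() {
    url=${1:-http://localhost:9200/_cluster/health}
    body=$(curl -s --max-time 5 "$url")
    classify_health "$?" "$body"
}
```

Running `check_cluster` on each node would print `unreachable` when Elasticsearch is not listening at all, as in the output above, and `green`, `yellow`, or `red` when it is up and reporting a status.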
slumbard
Posts: 3
Joined: Thu May 18, 2017 4:34 am

Re: Elastic search issue in primary cluster

Post by slumbard »

Now my other clusters are also down!

Someone please help ASAP!
slumbard
Posts: 3
Joined: Thu May 18, 2017 4:34 am

Re: Elastic search issue in primary cluster

Post by slumbard »

Error message
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Elastic search issue in primary cluster

Post by tmcdonald »

I see you have also opened an email ticket for this, so we will be locking this thread and continuing in the email ticket. In the future please only open one or the other so we are not duplicating our efforts.
Former Nagios employee