Elasticsearch issue in primary cluster
Posted: Fri May 19, 2017 6:26 am
I installed 3 nodes as a cluster yesterday and started sending syslog from 2400 devices. For some reason the dashboards were empty this morning, and the primary cluster's status was red. I followed this article: https://support.nagios.com/kb/article.php?id=90 , but the problem has only gotten worse. I'm new to Nagios Log Server and don't know what the issue is. Can someone help? My customer number is 39059.
Some of the error messages:
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2017-05-19T03:35:22.284000+0000", :message=>"retrying failed action with response code: 503", :level=>:warn}
===================================
[root@a-sesklnglsprd1 ~]# service elasticsearch status
● elasticsearch.service - LSB: This service manages the elasticsearch daemon
Loaded: loaded (/etc/rc.d/init.d/elasticsearch; bad; vendor preset: disabled)
Active: active (exited) since Fri 2017-05-19 11:05:36 GMT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 4969 ExecStop=/etc/rc.d/init.d/elasticsearch stop (code=exited, status=0/SUCCESS)
Process: 4991 ExecStart=/etc/rc.d/init.d/elasticsearch start (code=exited, status=0/SUCCESS)
May 19 11:05:35 a-sesklnglsprd1.astrazeneca.net systemd[1]: Starting LSB: This service manages the elasticsearch daemon...
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session opened for user nagios by (uid=0)
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net runuser[5013]: pam_unix(runuser:session): session closed for user nagios
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: Starting elasticsearch: /bin/java: error while loading shared libraries: libjli.so: cannot open shared object file:... directory
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net elasticsearch[4991]: [ OK ]
May 19 11:05:36 a-sesklnglsprd1.astrazeneca.net systemd[1]: Started LSB: This service manages the elasticsearch daemon.
Hint: Some lines were ellipsized, use -l to show in full.
===========================================
[root@a-sesklnglsprd1 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
curl: (7) Failed connect to localhost:9200; Connection refused