Nagios Log Server is showing RED in it's status
Posted: Thu May 04, 2017 2:43 pm
by srinivasmandalika
Hello,
We are facing the following problems with our Nagios Log Server:
1. Status of Instance is showing RED
2. All the Shards in Health are showing as unassigned
3. Users cannot log in to Nagios through the URL
Screenshots of the above are attached.
This all started when our disk filled up. We increased the disk space and rebooted the machine, but the problems persisted. We followed all the steps in the document
https://support.nagios.com/kb/article.php?id=90 ... As we do not have any backups, we did not run the index deletion command, and as there are almost 1822 shards, I am not sure whether I can run the reassignment command for all of them at once.
Please let me know if there is anything else we need to do.
Thanks!
Srinivas Mandalika
Re: Nagios Log Server is showing RED in it's status
Posted: Thu May 04, 2017 3:17 pm
by mcapra
Can you send over the contents of your Elasticsearch logs? Please send all files in this path (if possible):
Can you also share the full outputs of the following commands executed from the CLI of your Nagios Log Server machine:
Code:
df -h
curl -s localhost:9200/_cat/shards
curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_cat/master?v'
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards&pretty'
curl -XGET 'localhost:9200/_nodes/jvm?pretty'
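With well over a thousand shards, the raw `_cat/shards` listing is hard to read at a glance. As a rough sketch, the output can be summarized as a count per shard state. This assumes the default column order (index, shard, prirep, state, ...); `count_shard_states` is a hypothetical helper name, not part of Nagios Log Server or Elasticsearch.

```shell
# Summarize `_cat/shards` output (read from stdin) as a count per shard state.
# Assumes the default column layout, where column 4 is the state
# (for example STARTED or UNASSIGNED).
count_shard_states() {
  awk 'NF >= 4 { count[$4]++ } END { for (s in count) print s, count[s] }' | sort
}

# Example, assuming Elasticsearch is listening on localhost:9200:
# curl -s 'localhost:9200/_cat/shards' | count_shard_states
```

If every shard is unassigned, as described in the first post, the summary would contain only an UNASSIGNED line.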
Re: Nagios Log Server is showing RED in it's status
Posted: Fri May 05, 2017 5:04 am
by srinivasmandalika
I am unable to upload files above a certain size (I think the limit is 2 or 3 MB, and no more than 3 files). I have uploaded everything I can, including the output of the given commands.
Let me know if there is any other way to send you the files.
Let me know if you need any further information.
Thanks!
Srinivas Mandalika
Re: Nagios Log Server is showing RED in it's status
Posted: Fri May 05, 2017 10:46 am
by dwhitfield
Although space will still be a concern, you can get them all into one file with tar -zcvf /tmp/supporttar.tar.gz /var/log/elasticsearch.
If that is too large, PM me, and I'll have you email it. If it is too large for email, you can put it on Dropbox or some other file sharing location; once we have downloaded it, you can delete it.
Re: Nagios Log Server is showing RED in it's status
Posted: Fri May 05, 2017 2:40 pm
by srinivasmandalika
I have uploaded it to Jumpshare. Please find the link below:
http://jmp.sh/L8cUMN4
Thanks!
Srinivas Mandalika
Re: Nagios Log Server is showing RED in it's status
Posted: Mon May 08, 2017 9:33 am
by mcapra
It looks like this machine may be running out of space:
Code:
TranslogException[[logstash-2017.05.03][0] failed to create new translog file]; nested: IOException[No space left on device]; ]]
Can you try freeing up some space on this machine or increasing the disk size? To free up some space, you can delete old indices with the following command (Elasticsearch will need to be running):
DISCLAIMER: THIS WILL DELETE SOME DATA AND IT WILL NOT BE RECOVERABLE
Code:
curator delete indices --older-than 60 --time-unit days --timestring %Y.%m.%d
Replace the 60 part of that command with whatever is appropriate for your environment. In the above example, I am deleting any data older than 60 days.
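Because the deletion cannot be undone, it may be worth listing which indices fall before the cutoff first. A minimal sketch, assuming daily indices whose names end in a YYYY.MM.DD date (as in logstash-2017.05.03 from the log line above) and GNU date for the example; `indices_before` is a hypothetical helper, not a curator feature.

```shell
# Print index names (read from stdin, one per line) whose trailing
# YYYY.MM.DD date is earlier than the cutoff given as $1.
# Zero-padded dates in this format sort correctly as plain strings.
indices_before() {
  awk -v cutoff="$1" '
    match($1, /[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) {
      # Appending "" forces a string comparison rather than a numeric one.
      if ((substr($1, RSTART, RLENGTH) "") < (cutoff "")) print $1
    }'
}

# Example (assumes GNU date and Elasticsearch on localhost:9200):
# cutoff=$(date -d '60 days ago' +%Y.%m.%d)
# curl -s 'localhost:9200/_cat/indices?h=index' | indices_before "$cutoff"
```

This only prints names and deletes nothing; index names without a trailing date are ignored.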
Re: Nagios Log Server is showing RED in it's status
Posted: Mon May 08, 2017 12:26 pm
by srinivasmandalika
I did what you said, but I still see the status as RED.
Srinivas Mandalika
Re: Nagios Log Server is showing RED in it's status
Posted: Mon May 08, 2017 12:28 pm
by srinivasmandalika
Free space is now around 138 GB.
Re: Nagios Log Server is showing RED in it's status
Posted: Mon May 08, 2017 12:31 pm
by mcapra
Did you restart the Elasticsearch service afterwards on each Nagios Log Server instance?
Can I get fresh outputs of all the following commands:
Code:
service elasticsearch restart
df -h
curl -s localhost:9200/_cat/shards
curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_cat/master?v'
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards&pretty'
curl -XGET 'localhost:9200/_nodes/jvm?pretty'
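To check quickly whether the cluster has left RED after the restart, the top-level status can be pulled out of the health output. This is only a sketch: it assumes the cluster-level "status" field appears before any per-index status in the JSON, and `cluster_status` is a hypothetical helper name.

```shell
# Extract the first "status" value from cluster health JSON read from stdin.
# Assumes the cluster-level status precedes the per-index entries.
cluster_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | head -n 1 \
    | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Example, assuming Elasticsearch is listening on localhost:9200:
# curl -s 'localhost:9200/_cluster/health?pretty' | cluster_status
```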
Re: Nagios Log Server is showing RED in it's status
Posted: Tue May 09, 2017 9:06 am
by srinivasmandalika
Please find the output in the attached file.
Srinivas Mandalika