
Filesystem full

Posted: Wed Apr 15, 2015 1:40 am
by teirekos
I have the following disk usage on my server:

[root@NagiosLogServer elasticsearch]# df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
rootfs           99G   98G    30M  100%  /
devtmpfs        7.9G  152K   7.9G    1%  /dev
tmpfs           7.9G     0   7.9G    0%  /dev/shm
/dev/sda1        99G   98G    30M  100%  /
/dev/sdb         60G   30G    27G   53%  /NLSBackup


I ran a find for large files (find / -size +100000000c > files.txt), and the results (txt attached) show that some files are huge for certain dates under:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices/logstash-2015.04.*
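A per-index summary makes the usage easier to see than listing individual files (a sketch; the wildcard stands in for the cluster UUID in the path above):

```shell
# Summarize per-index disk usage under the Elasticsearch data directory,
# largest first; the path matches the one from the find results.
du -sh /usr/local/nagioslogserver/elasticsearch/data/*/nodes/0/indices/* \
  | sort -rh | head -20
```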
Of course, with the filesystem at 100%, I cannot access the GUI, only the CLI.
Any hints?

Thanx a lot

Re: Filesystem full

Posted: Wed Apr 15, 2015 9:21 am
by jolson
Your best option will likely be to expand the size of the disk that NLS is running on - is that a possibility?

If not, we can delete indices safely through elasticsearch with the following curl command:

Code:

curl -XDELETE 'http://localhost:9200/indexname/'
Once you have enough free space, I recommend accessing the GUI and setting up stricter 'Backup and Maintenance' settings:
(screenshot attached: Capture.PNG)
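Before deleting anything, you can list the indices and their on-disk sizes; a sketch against the ES 1.x _cat API (the index name in the delete is just an example, pick the oldest one you can spare):

```shell
# List all indices with document counts and on-disk size.
curl 'http://localhost:9200/_cat/indices?v'

# Then delete an old logstash index by name (example name shown).
curl -XDELETE 'http://localhost:9200/logstash-2015.04.01/'
```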

Re: Filesystem full

Posted: Fri Apr 17, 2015 9:28 am
by teirekos
Expanding the hard disk is difficult at the moment.

I got the following:
[root@NagiosLogServer ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=indices'
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}[root@NagiosLogServer ~]#
[root@NagiosLogServer ~]# curl -XGET localhost:9200/_cluster/health
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}[root@NagiosLogServer ~]#

Then I manually deleted a few(!) logstash directories (on both nodes of the 2-node cluster) under:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices
and rebooted.

So from 100% I am now at 93%, but the exception above persists. Both services are up.

Re: Filesystem full

Posted: Fri Apr 17, 2015 9:46 am
by jolson
Are you able to run any curl queries on your cluster?

Code:

# See master:
curl 'localhost:9200/_cat/master?v'
# See nodes:
curl 'localhost:9200/_cat/nodes?v'
# See shard health:
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards'
# See shard status:
curl -XGET 'http://localhost:9200/_cat/shards'
Does everything appear to be working alright?

Re: Filesystem full

Posted: Mon Apr 20, 2015 5:48 am
by teirekos
All the curl commands were failing with:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

So what I did was change discovery.zen.minimum_master_nodes in /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml from 2 to 1. With this I managed to access the GUI. Then I saw many exceptions in the elasticsearch log, so I deleted the "problematic" indices. So at least one node seems to be operational.
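The change, roughly, in elasticsearch.yml (the line ships commented out with a # prefix; on a two-node cluster, 2 is the split-brain-safe value, so 1 is only an emergency measure):

```yaml
# /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
# Minimum number of master-eligible nodes that must be visible
# before a master is elected; temporary emergency value:
discovery.zen.minimum_master_nodes: 1
```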
Now when I set discovery.zen.minimum_master_nodes back from 1 to 2, I get the following:

[root@NagiosLogServer elasticsearch]# tail -f 2b249934-e049-4f18-96ed-db395faae965.log
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.net.UnknownHostException: NagiosLogServer: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 81 more
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] initialized
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] starting ...
[2015-04-20 13:16:10,724][INFO ][transport ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.1.11.10:9300]}
[2015-04-20 13:16:10,728][INFO ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] 2b249934-e049-4f18-96ed-db395faae965/ms2E2EBkTmiO2E4zV-Mb9A



[2015-04-20 13:16:40,731][WARN ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] waited for 30s and no initial state was set by the discovery
[2015-04-20 13:16:40,736][INFO ][http ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-04-20 13:16:40,737][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] started
[2015-04-20 13:16:40,797][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:40,813][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,715][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,716][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:47,036][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:17:01,992][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry

So I cannot reestablish the cluster.
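The trace above also shows "UnknownHostException: NagiosLogServer: Name or service not known", so the node apparently cannot resolve its own hostname. A quick check (a sketch; 10.1.11.10 is the publish_address from the log above):

```shell
# Check whether this machine's hostname resolves locally.
hostname
getent hosts "$(hostname)" || echo "hostname does not resolve"

# If it does not resolve, map it in /etc/hosts, e.g.:
#   10.1.11.10   NagiosLogServer
```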

Re: Filesystem full

Posted: Mon Apr 20, 2015 7:37 am
by myriad
I know that I will have this issue soon. How do I proactively expand the drive? (I'm using your appliance image, which would be awesome if it were thin-provisioned to a much larger drive!)

Re: Filesystem full

Posted: Mon Apr 20, 2015 9:11 am
by jolson
Since you have the appliance image, you can expand the virtual disk. You will then need to expand the partition via fdisk or similar, and finally grow the filesystem (likely ext4).

Expansions are normally safe, but any procedure of this nature should be preceded by a full backup of the system.

First, expand the disk in your VM Hypervisor.

Second, follow this guide to extend your partition: https://access.redhat.com/documentation ... -part.html

Last, expand your ext4 or similar filesystem: https://access.redhat.com/documentation ... 4grow.html
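Roughly, those two guides boil down to the following (a sketch assuming /dev/sda1 is the root partition, as in the df output earlier; device names on your system may differ, and back up first):

```shell
# 1. After enlarging the virtual disk in the hypervisor, grow the partition.
#    With classic fdisk this means deleting and recreating the partition
#    with the same start sector and a larger end sector (data is kept).
fdisk /dev/sda

# 2. Make the kernel re-read the partition table (or reboot).
partprobe /dev/sda

# 3. Grow the ext4 filesystem online to fill the enlarged partition.
resize2fs /dev/sda1
```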

Let me know if you need any additional help. Thanks!


Jesse

Re: Filesystem full

Posted: Tue Apr 21, 2015 8:23 am
by teirekos
Can you pls have a look at my post prior to myriad's ...
Thanx.

teirekos (post owner)

Re: Filesystem full

Posted: Tue Apr 21, 2015 9:11 am
by scottwilkerson
teirekos wrote:Can you pls have a look at my post prior to myriad's ...
Thanx.

teirekos (post owner)
Did you do this on both cluster hosts? Also, can you show the output of the following from both servers?

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts

Re: Filesystem full

Posted: Tue Apr 21, 2015 9:12 am
by scottwilkerson
myriad wrote:I know that I will have this issue soon. How do I proactively expand the drive? (I'm using your appliance image, which would be awesome if it were thin-provisioned to a much larger drive!)
Also, see this
http://library.nagios.com/library/produ ... store-path