
Filesystem full

Posted: Wed Apr 15, 2015 1:40 am
by teirekos
I have the following disk usage on my server:

[root@NagiosLogServer elasticsearch]# df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
rootfs           99G   98G    30M  100%  /
devtmpfs        7.9G  152K   7.9G    1%  /dev
tmpfs           7.9G     0   7.9G    0%  /dev/shm
/dev/sda1        99G   98G    30M  100%  /
/dev/sdb         60G   30G    27G   53%  /NLSBackup


I ran a find for large files (find / -size +100000000c > files.txt), and the results (txt attached) show that some files are huge for certain dates under:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices/logstash-2015.04.*
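A per-index summary makes the usage easier to see than listing individual files (a sketch; the wildcard stands in for the cluster UUID in the path above):

```shell
# Summarize per-index disk usage under the Elasticsearch data directory,
# largest first; the path matches the one from the find results.
du -sh /usr/local/nagioslogserver/elasticsearch/data/*/nodes/0/indices/* \
  | sort -rh | head -20
```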
Of course, with the filesystem at 100%, I cannot access the GUI, only the CLI.
Any hints?

Thanx a lot

Re: Filesystem full

Posted: Wed Apr 15, 2015 9:21 am
by jolson
Your best option will likely be to expand the size of the disk that NLS is running on - is that a possibility?

If not, we can delete indices safely through elasticsearch with the following curl command:

Code:

curl -XDELETE 'http://localhost:9200/indexname/'
Once you have enough free space, I recommend accessing the GUI and setting up stricter 'Backup and Maintenance' settings:
(screenshot attached: Capture.PNG)
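Before deleting anything, you can list the indices and their on-disk sizes; a sketch against the ES 1.x _cat API (the index name in the delete is just an example, pick the oldest one you can spare):

```shell
# List all indices with document counts and on-disk size.
curl 'http://localhost:9200/_cat/indices?v'

# Then delete an old logstash index by name (example name shown).
curl -XDELETE 'http://localhost:9200/logstash-2015.04.01/'
```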

Re: Filesystem full

Posted: Fri Apr 17, 2015 9:28 am
by teirekos
Expanding the hard disk is difficult at the moment.

I got the following:
[root@NagiosLogServer ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=indices'
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}[root@NagiosLogServer ~]#
[root@NagiosLogServer ~]# curl -XGET localhost:9200/_cluster/health
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}[root@NagiosLogServer ~]#

Then I manually deleted a few(!) logstash directories (on both nodes of the 2-node cluster) under:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices
and rebooted.

So from 100% I am now at 93%, but the exception above persists. Both services are up.

Re: Filesystem full

Posted: Fri Apr 17, 2015 9:46 am
by jolson
Are you able to run any curl queries on your cluster?

Code:

# See master:
curl 'localhost:9200/_cat/master?v'
# See nodes:
curl 'localhost:9200/_cat/nodes?v'
# See shard health:
curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards'
# See shard status:
curl -XGET 'http://localhost:9200/_cat/shards'
Does everything appear to be working alright?

Re: Filesystem full

Posted: Mon Apr 20, 2015 5:48 am
by teirekos
All the curl commands were failing with:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

So what I did was change discovery.zen.minimum_master_nodes in /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml from 2 to 1. With this I managed to access the GUI. Then I saw many exceptions in the elasticsearch log, so I deleted the "problematic" indices. So at least one node seems to be operational.
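The change, roughly, in elasticsearch.yml (the line ships commented out with a # prefix; on a two-node cluster, 2 is the split-brain-safe value, so 1 is only an emergency measure):

```yaml
# /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
# Minimum number of master-eligible nodes that must be visible
# before a master is elected; temporary emergency value:
discovery.zen.minimum_master_nodes: 1
```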
Now when I set discovery.zen.minimum_master_nodes back from 1 to 2, I get the following:

[root@NagiosLogServer elasticsearch]# tail -f 2b249934-e049-4f18-96ed-db395faae965.log
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.net.UnknownHostException: NagiosLogServer: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 81 more
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] initialized
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] starting ...
[2015-04-20 13:16:10,724][INFO ][transport ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.1.11.10:9300]}
[2015-04-20 13:16:10,728][INFO ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] 2b249934-e049-4f18-96ed-db395faae965/ms2E2EBkTmiO2E4zV-Mb9A



[2015-04-20 13:16:40,731][WARN ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] waited for 30s and no initial state was set by the discovery
[2015-04-20 13:16:40,736][INFO ][http ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-04-20 13:16:40,737][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] started
[2015-04-20 13:16:40,797][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:40,813][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,715][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,716][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:47,036][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:17:01,992][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry

So I cannot reestablish the cluster.
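The trace above also shows "UnknownHostException: NagiosLogServer: Name or service not known", so the node apparently cannot resolve its own hostname. A quick check (a sketch; 10.1.11.10 is the publish_address from the log above):

```shell
# Check whether this machine's hostname resolves locally.
hostname
getent hosts "$(hostname)" || echo "hostname does not resolve"

# If it does not resolve, map it in /etc/hosts, e.g.:
#   10.1.11.10   NagiosLogServer
```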

Re: Filesystem full

Posted: Mon Apr 20, 2015 7:37 am
by myriad
I know that I will have this issue soon. How do I proactively expand the drive? (I'm using your appliance image, which would be awesome if it were thin-provisioned to a much larger drive!)

Re: Filesystem full

Posted: Mon Apr 20, 2015 9:11 am
by jolson
Since you have the appliance image, you can expand the virtual disk. You will then need to expand the partition via fdisk or similar, and finally grow the filesystem (likely ext4).

Expansions are normally safe, but any procedure of this nature should be preceded by a full backup of the system.

First, expand the disk in your VM Hypervisor.

Second, follow this guide to extend your partition: https://access.redhat.com/documentation ... -part.html

Last, expand your ext4 or similar filesystem: https://access.redhat.com/documentation ... 4grow.html
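Roughly, those two guides boil down to the following (a sketch assuming /dev/sda1 is the root partition, as in the df output earlier; device names on your system may differ, and back up first):

```shell
# 1. After enlarging the virtual disk in the hypervisor, grow the partition.
#    With classic fdisk this means deleting and recreating the partition
#    with the same start sector and a larger end sector (data is kept).
fdisk /dev/sda

# 2. Make the kernel re-read the partition table (or reboot).
partprobe /dev/sda

# 3. Grow the ext4 filesystem online to fill the enlarged partition.
resize2fs /dev/sda1
```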

Let me know if you need any additional help. Thanks!


Jesse

Re: Filesystem full

Posted: Tue Apr 21, 2015 8:23 am
by teirekos
Can you pls have a look at my post prior to myriad's ...
Thanx.

teirekos (post owner)

Re: Filesystem full

Posted: Tue Apr 21, 2015 9:11 am
by scottwilkerson
teirekos wrote:Can you pls have a look at my post prior to myriad's ...
Thanx.

teirekos (post owner)
Did you do this on both cluster hosts? Also, can you show the output of the following from both servers?

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts

Re: Filesystem full

Posted: Tue Apr 21, 2015 9:12 am
by scottwilkerson
myriad wrote:I know that I will have this issue soon. How do I proactively expand the drive? (I'm using your appliance image, which would be awesome if it were thin-provisioned to a much larger drive!)
Also, see this
http://library.nagios.com/library/produ ... store-path