Filesystem full

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Filesystem full

Post by teirekos »

This is the current state of my server:

[root@NagiosLogServer elasticsearch]# df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           99G   98G   30M 100% /
devtmpfs        7.9G  152K  7.9G   1% /dev
tmpfs           7.9G     0  7.9G   0% /dev/shm
/dev/sda1        99G   98G   30M 100% /
/dev/sdb         60G   30G   27G  53% /NLSBackup


I ran a find command (find / -size +100000000c > files.txt) to look for large files, and the results (txt attached) show that the index files for some dates are huge:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices/logstash-2015.04.*
Of course, now that the filesystem is at 100%, I cannot access the GUI, only the CLI.
Any hints?

Thanx a lot
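As a complement to the find command above, here is a minimal sketch that ranks the entries under a directory by disk usage, which can make it easier to see which daily index directories take the most space. The function name largest_entries is made up for illustration, and the default of 10 entries is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: list the N largest entries directly under a directory.
# Output format is "<size in KiB><TAB><path>", largest first.
# largest_entries is a hypothetical helper name, not part of Nagios Log Server.
largest_entries() {
    dir="$1"
    count="${2:-10}"
    # -x stays on one filesystem, -s summarizes each entry, -k reports KiB.
    du -xsk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

It could be pointed at the indices directory from the post, for example: largest_entries /usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices 10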
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Filesystem full

Post by jolson »

Your best option will likely be to expand the size of the disk that NLS is running on - is that a possibility?

If not, we can safely delete indices through Elasticsearch with the following curl command (replace indexname with the name of the index to remove):

Code: Select all

curl -XDELETE 'http://localhost:9200/indexname/'
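A minimal sketch of how the deletion could be narrowed to old indices, assuming the daily indices are named logstash-YYYY.MM.DD (as the data path earlier in the thread suggests) and that Elasticsearch answers on localhost:9200. The function names and the ES_URL variable are hypothetical; the script only prints the DELETE commands so they can be reviewed before anything is removed.

```shell
#!/bin/sh
# Sketch: choose daily logstash indices older than a cutoff date and print
# (not run) one DELETE command per index.
# Assumptions: index names look like logstash-YYYY.MM.DD; ES_URL is a
# hypothetical variable defaulting to the address used in the thread.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read index names on stdin (first field of each line) and print those
# strictly older than the cutoff, given as YYYY.MM.DD.
indices_older_than() {
    cutoff_num=$(printf '%s' "$1" | tr -d '.')
    while read -r name rest; do
        case "$name" in
            logstash-[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9])
                # Drop the dots so dates compare as integers (20150401 < 20150410).
                day_num=$(printf '%s' "${name#logstash-}" | tr -d '.')
                if [ "$day_num" -lt "$cutoff_num" ]; then
                    printf '%s\n' "$name"
                fi
                ;;
        esac
    done
}

# Read index names on stdin and print the matching curl DELETE commands.
print_delete_commands() {
    while read -r name rest; do
        printf "curl -XDELETE '%s/%s/'\n" "$ES_URL" "$name"
    done
}
```

For example, curl -s 'http://localhost:9200/_cat/indices?h=index' | indices_older_than 2015.04.10 | print_delete_commands would list candidate commands, provided the cat API on your Elasticsearch version supports the h parameter and the cluster has an elected master.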

Once you have enough free space, I recommend accessing the GUI and setting up stricter 'Backup and Maintenance' settings, as shown in the attached screenshot (Capture.PNG).
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Filesystem full

Post by teirekos »

Expanding the hard disk is difficult at the moment.

I got the following:
[root@NagiosLogServer ~]# curl -XGET 'http://localhost:9200/_cluster/health?level=indices'
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}
[root@NagiosLogServer ~]# curl -XGET localhost:9200/_cluster/health
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

Then I manually deleted a few(!) logstash directories on both nodes (it is a 2-node cluster) under:
/usr/local/nagioslogserver/elasticsearch/data/2b249934-e049-4f18-96ed-db395faae965/nodes/0/indices
and rebooted.

So I have gone from 100% to 93%, but the exception above persists. Both services are up.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Filesystem full

Post by jolson »

Are you able to run any curl queries on your cluster?

Code: Select all

see master: curl 'localhost:9200/_cat/master?v'
see nodes: curl 'localhost:9200/_cat/nodes?v'
see shard health: curl -XGET 'http://localhost:9200/_cluster/health/*?level=shards'
see shard status: curl -XGET http://localhost:9200/_cat/shards

Does everything appear to be working alright?
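The checks above could be run in one pass with something like the sketch below. It is only an illustration: classify_response and run_checks are made-up names, the 40-second timeout is an arbitrary choice, and the classification simply looks for the error text that appears elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: run the health checks from the post and label each response.
# classify_response and run_checks are hypothetical helper names.

# Label one Elasticsearch response body.
classify_response() {
    case "$1" in
        *MasterNotDiscoveredException*) echo "no-master" ;;
        *'"error"'*)                    echo "error" ;;
        "")                             echo "no-response" ;;
        *)                              echo "ok" ;;
    esac
}

# Query each endpoint on localhost:9200 and print "<endpoint> <label>".
run_checks() {
    for path in '_cat/master?v' '_cat/nodes?v' '_cluster/health' '_cat/shards'; do
        body=$(curl -s --max-time 40 "http://localhost:9200/$path")
        printf '%-20s %s\n' "$path" "$(classify_response "$body")"
    done
}
```

Calling run_checks on each node prints one line per endpoint; a column of "no-master" results would match the 503 responses reported in this thread.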
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Filesystem full

Post by teirekos »

All the curl commands were failing with:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}

So what I did was change "# discovery.zen.minimum_master_nodes:" in "/usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml" from 2 to 1. With this I managed to access the GUI. I then saw many exceptions in the Elasticsearch log, so I deleted the "problematic" indices. So at least one node seems to be operational.
Now when I change "# discovery.zen.minimum_master_nodes:" back from 1 to 2, I get:

[root@NagiosLogServer elasticsearch]# tail -f 2b249934-e049-4f18-96ed-db395faae965.log
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70)
at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
Caused by: java.net.UnknownHostException: NagiosLogServer: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 81 more
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] initialized
[2015-04-20 13:16:10,577][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] starting ...
[2015-04-20 13:16:10,724][INFO ][transport ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.1.11.10:9300]}
[2015-04-20 13:16:10,728][INFO ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] 2b249934-e049-4f18-96ed-db395faae965/ms2E2EBkTmiO2E4zV-Mb9A



[2015-04-20 13:16:40,731][WARN ][discovery ] [1048634e-2f8f-4ec5-9432-edba342d51dd] waited for 30s and no initial state was set by the discovery
[2015-04-20 13:16:40,736][INFO ][http ] [1048634e-2f8f-4ec5-9432-edba342d51dd] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-04-20 13:16:40,737][INFO ][node ] [1048634e-2f8f-4ec5-9432-edba342d51dd] started
[2015-04-20 13:16:40,797][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:40,813][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,715][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:41,716][DEBUG][action.admin.indices.create] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:16:47,036][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry
[2015-04-20 13:17:01,992][DEBUG][action.admin.cluster.state] [1048634e-2f8f-4ec5-9432-edba342d51dd] no known master node, scheduling a retry

So I cannot re-establish the cluster.
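For context on the setting discussed above: the usual majority rule for zen discovery is floor(N / 2) + 1 master-eligible nodes, which gives 2 for a two-node cluster. With the value at 2, a node that cannot reach its peer will keep waiting for a master, which is consistent with the "no known master node" lines in the log. The sketch below computes that value and reads the active setting from a config file; both function names are hypothetical, and it assumes the setting is written on one line in elasticsearch.yml. Note that a line starting with "#" is a YAML comment and is not an active setting.

```shell
#!/bin/sh
# Sketch: majority quorum and the currently configured value.
# quorum and configured_minimum are hypothetical helper names.

# Majority of N master-eligible nodes: floor(N / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Print the active (uncommented) discovery.zen.minimum_master_nodes value
# from the given config file; prints nothing if it is absent or commented out.
configured_minimum() {
    sed -n 's/^[[:space:]]*discovery\.zen\.minimum_master_nodes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}
```

For example, comparing quorum 2 with configured_minimum /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml on each node would show whether both nodes agree on the setting.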
myriad
Posts: 26
Joined: Tue Dec 02, 2014 1:29 pm

Re: Filesystem full

Post by myriad »

I know that I will have this issue soon. How do I proactively expand the drive? (I am using your appliance image, which would be awesome if it were thin provisioned to a much larger drive!)
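One way to notice the problem before the disk fills up is a small usage check run from cron. The sketch below is only an illustration: the function names are made up, and the 80% default threshold is an arbitrary assumption rather than a Nagios recommendation.

```shell
#!/bin/sh
# Sketch: warn when a filesystem crosses a usage threshold.
# All function names and the 80% default are hypothetical choices.

# Read `df -P` output on stdin and print the Use% of the first filesystem
# as a bare integer (field 5 of the POSIX output format).
parse_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print WARNING if used >= threshold, otherwise OK.
usage_state() {
    if [ "$1" -ge "$2" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Check the filesystem holding a path against a threshold (default 80).
check_path() {
    used=$(df -P "$1" | parse_use_percent)
    echo "$(usage_state "$used" "${2:-80}"): $1 is ${used}% full"
}
```

Running check_path / 80 prints a single status line that could be mailed or logged by whatever scheduler is already in use.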
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Filesystem full

Post by jolson »

Since you have the appliance image, you will be able to expand the virtual disk. You will also need to expand the partition via fdisk or similar, and then expand the filesystem (likely ext4).

Expansions are normally safe, but any procedure of this nature needs to be preceded by a backup of the system.

First, expand the disk in your VM Hypervisor.

Second, follow this guide to extend your partition: https://access.redhat.com/documentation ... -part.html

Last, expand your ext4 or similar filesystem: https://access.redhat.com/documentation ... 4grow.html
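The order of the steps above can be sketched as a dry run that only prints what would be done. This assumes an ext4 filesystem sitting directly on a partition with no LVM in between, as with /dev/sda1 in the df output earlier in the thread; your layout may differ. The function name and its two arguments are hypothetical, and the partition step is left to the linked guide because it is interactive.

```shell
#!/bin/sh
# Sketch: print (never execute) the order of operations for growing an
# ext4 filesystem that sits directly on a partition.
# print_grow_plan, disk and part are hypothetical names, e.g. /dev/sda and 1.
print_grow_plan() {
    disk="$1"
    part="$2"
    echo "# 1. Back up the system first"
    echo "# 2. Enlarge the virtual disk in the hypervisor"
    echo "# 3. Grow the partition interactively (see the partition guide)"
    echo "fdisk $disk"
    echo "# 4. Grow the ext4 filesystem to fill the partition"
    echo "resize2fs ${disk}${part}"
}
```

For example, print_grow_plan /dev/sda 1 prints the plan for the root partition shown earlier in the thread without touching the disk.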

Let me know if you need any additional help. Thanks!


Jesse
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Filesystem full

Post by teirekos »

Can you pls have a look at my post prior to myriads ...
Thanx.

teirekos (post owner)
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Filesystem full

Post by scottwilkerson »

teirekos wrote:Can you pls have a look at my post prior to myriads ...
Thanx.

teirekos (post owner)

Did you do this on both cluster hosts? Also, can you show the output of the following from both servers:

Code: Select all

cat /usr/local/nagioslogserver/var/cluster_hosts
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Filesystem full

Post by scottwilkerson »

myriad wrote:I know that I will have this issue soon. how do I proactively expand the drive out? (using your appliance image. - which would be awesome if it was thin provisioned to a much larger drive!)

Also, see this:
http://library.nagios.com/library/produ ... store-path