Filesystem full

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
teirekos
Posts: 110
Joined: Wed Nov 26, 2014 6:06 am

Re: Filesystem full

Post by teirekos »

On my first node, the cluster_hosts file contains only the IP of the current node; on the second node, the file is empty.

Tue Apr 21 17:19:27 EEST 2015
[root@NagiosLogServer var]# cat /usr/local/nagioslogserver/var/cluster_hosts
10.1.11.10
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Filesystem full

Post by jolson »

teirekos,

The cluster_hosts file should contain the IP addresses of both nodes in the cluster. For instance, on one of my nodes:

Code:

cat /usr/local/nagioslogserver/var/cluster_hosts
localhost
192.168.1.2 (same box as localhost)
192.168.1.3
What happens if you manually set these to your cluster hosts, and restart elasticsearch?
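For reference, a minimal sketch of what "manually set these" could look like. The helper name write_cluster_hosts is made up for illustration and is not part of Nagios Log Server, the addresses are the ones from my example above, and the restart command assumes the stock init script is named elasticsearch:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server): overwrite a
# cluster_hosts file with one host per line.
write_cluster_hosts() {
    target=$1
    shift
    # Writing in place (rather than replacing the file) keeps the
    # existing owner and mode of the target file.
    printf '%s\n' "$@" > "$target"
}

# Example usage (addresses are placeholders, use your own node IPs):
# write_cluster_hosts /usr/local/nagioslogserver/var/cluster_hosts \
#     localhost 192.168.1.2 192.168.1.3
# service elasticsearch restart   # assumed service name
```

Do this on each node, then check the file again after a minute or two to see whether the entries are still there.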
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.

Re: Filesystem full

Post by teirekos »

1) Yes, it worked. It is strange, though, that those files were missing the correct IPs or were even empty.
2) I've also put "discovery.zen.minimum_master_nodes: 2" back on both nodes.
3) Before that, though, the filesystem on my second node was at 100% again because of two huge Elasticsearch log files. I guess we have to handle log rotation ourselves?

4) Below you can see that ownership and permissions differ between the two nodes. This is the case for many files and folders. The first server came from an old OVF release with a few upgrades applied; the second node was deployed from the 1.3 OVF.

[root@NagiosLogServer var]# ls -ltr
total 20
-rwxrwxr-x 1 nagios nagios 37 Nov 10 16:44 cluster_uuid
-rwxrwxr-x 1 nagios nagios 37 Nov 10 16:44 node_uuid
-rwxrwxr-x. 1 nagios nagios 136 Apr 21 17:45 poller.log
-rwxrwxr-x. 1 nagios nagios 21 Apr 21 17:45 cluster_hosts
-rwxrwxr-x. 1 nagios nagios 152 Apr 21 17:45 jobs.log


[root@NagiosLogServer2 var]# ls -ltr
total 16
-rw-rw-r-- 1 nagios nagios 36 Feb 13 22:24 cluster_uuid
-rw-rw-r-- 1 nagios nagios 37 Feb 13 22:24 node_uuid
-rwxrwxr-x 1 nagios nagios 0 Apr 20 22:18 cluster_hosts~
-rw-r--r--. 1 nagios users 0 Apr 21 17:45 jobs.log
-rwxrwxr-x. 1 nagios nagios 21 Apr 21 17:45 cluster_hosts
-rw-r--r--. 1 nagios users 222 Apr 21 17:45 poller.log

So I guess at some point I will reinstall the "first" node, hoping I remember everything I've done so far. Is there a migration procedure?

Re: Filesystem full

Post by jolson »

The cluster_hosts file is truncated roughly once per minute by the poller cron job:

Code:

cat /etc/cron.d/nagioslogserver
* * * * * nagios /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller.log 2>&1
This cron job ensures that all instances in your cluster have all of the proper hosts in their cluster_hosts file. When your filesystem filled up, my guess is that the file was truncated and could not be written to.
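To illustrate the failure mode I am guessing at, here is a miniature reproduction of "truncate, then fail to write". This is only a sketch: the real poller is PHP and its write logic is not shown in this thread, so the function name and the file contents below are placeholders.

```shell
#!/bin/sh
# Hypothetical demo: a file is emptied when it is opened for writing,
# so a writer that fails afterwards leaves it at zero bytes.
demo_truncate_then_fail() {
    f=$1
    printf '10.1.11.10\n' > "$f"   # file starts with one host entry
    # The shell truncates "$f" before the command runs. `false` writes
    # nothing and exits non-zero, standing in for a write that fails
    # on a full filesystem.
    false > "$f"
    wc -c < "$f" | tr -d ' '       # print the size in bytes afterwards
}

# Example usage:
# demo_truncate_then_fail /tmp/cluster_hosts.demo
```

The point is only that the old contents are gone as soon as the file is opened for writing, whether or not the new contents ever arrive.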
teirekos wrote: "Before that though the filesystem in my second node was again 100%, because of 2 huge elasticsearch logfiles. I guess the log rotation we have to do it ourselves?"
You do not have to do this yourself - the log rotation policy is specified in 'Backup and Maintenance':
[Attached screenshot: Backup & Maintenance page, Nagios Log Server]
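If you want to confirm which files are actually filling the disk before changing those settings, something along these lines should work. The largest_files helper is a name I made up for this sketch, and /var/log/elasticsearch is my assumption for the log location; adjust it if your install differs.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest files under a directory.
largest_files() {
    dir=$1
    count=${2:-10}
    # du -k prints the size in KiB followed by the path; sort the
    # result numerically with the largest entries first.
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Example usage (log path is an assumption):
# df -h /
# largest_files /var/log/elasticsearch 5
```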
teirekos wrote: "Below you will see that ownerships and permissions are different. This is the case in many files and folders. The first server was from an old ovf release with a few upgrades, the second node was ovf with 1.3."
I do not think that the ownership differences you are seeing will be a problem. I am on the latest release, and all files are owned by 'nagios:nagios' with permissions of '775'.
teirekos wrote: "So I guess at some point I am planning to reinstall the "first" node again hoping to remember all the things I've done so far. Is there any migration procedure?"
Why are you planning to reinstall the first node? Replacing it is a matter of adding another node to the cluster and then shutting down the dysfunctional node. One of the new nodes will be elected master, and you can then remove the dysfunctional node from the cluster via the web GUI.

See the following for more details about adding another node: http://assets.nagios.com/downloads/nagi ... luster.pdf
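Before shutting the old node down, it is worth checking that the new node has actually joined. Elasticsearch exposes this through its _cluster/health endpoint; the sketch below assumes it is listening on localhost:9200. The health_field helper is hypothetical and does naive text matching rather than real JSON parsing, so treat it as a convenience only.

```shell
#!/bin/sh
# Hypothetical helper: extract one top-level field from the JSON
# returned by the _cluster/health endpoint. JSON is read from stdin.
health_field() {
    # Strip spaces, quotes and newlines, split on commas and braces,
    # then print the value of the line that starts with "<field>:".
    tr -d ' "\n' | tr ',{}' '\n\n\n' | sed -n "s/^$1://p"
}

# Example usage (host and port are assumptions):
# curl -s 'http://localhost:9200/_cluster/health' | health_field status
# curl -s 'http://localhost:9200/_cluster/health' | health_field number_of_nodes
```

Once number_of_nodes includes the new node and the status has settled, you can stop the old node and remove it in the web GUI.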

Re: Filesystem full

Post by teirekos »

OK, I will create a new node, shut down the "problematic" one, and assign its IP to the new one.
Thanks a lot for your help!
Please close the thread.

Re: Filesystem full

Post by jolson »

Will do - please feel free to open another thread if you need additional help or have further questions. Thanks!
Locked