Freaky disappearing /usr/local/nagios/var on Parallels VM

Posted: Wed Aug 17, 2016 1:26 pm
by rexconsulting
I guess Parallels is not supported, but it was working great as a test system on an old MacPro until the MacPro rebooted, which caused the VMs to pause (gracefully, it appeared). Now when the VM comes up, /usr/local/nagios/var looks like it has been recreated, since only a few pid files live there after the reboot. The subdirectories /usr/local/nagios/var/spool and /usr/local/nagios/var/rw are gone, so neither nagios nor the graphing engine will start.

I am working with Parallels support on this. They have asked for screenshots....

It's very perplexing. I can restore a snapshot from before the reboot and it is OK, but if I then reboot it, it comes up with the /usr/local/nagios/var contents nuked again.

What could be nuking my var? It's so freaky. It's a test system, so not a huge deal. Maybe VirtualBox will behave better? Or Xen?

thanks

CP

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Wed Aug 17, 2016 4:09 pm
by tmcdonald
Yea, sort of a tricky spot for us. The tech side of me says a properly-coded VM hypervisor should not cause this, but the slightly more jaded tech side of me says it's a wonder computers are able to run at all. From a support perspective, I would say to compare the permissions on those directories against a working server. It might be that something got created with the wrong user or permissions and the nagios user can't do its job.
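
A minimal sketch of that comparison, assuming GNU find (its -printf option) is available on both servers. The function name perm_listing and the output file name are made up for illustration; nothing here is specific to Nagios.

```shell
#!/bin/sh
# Print mode, owner, group and relative path for everything under a directory,
# sorted by path so listings from two servers can be diffed line by line.
# Assumes GNU find, since -printf is not part of POSIX find.
perm_listing() {
    find "$1" -printf '%m %u %g %P\n' | sort -k4
}

# Hypothetical usage: run on the broken and the working server, then diff.
# perm_listing /usr/local/nagios/var > /tmp/var-perms-$(hostname).txt
```

Any line that differs between the two listings points at a path whose owner, group, or mode is worth a closer look. A path that appears in only one listing is a missing directory or file.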

The reboot thing definitely scares me a bit; it sounds almost like disk corruption.

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Thu Jan 05, 2017 11:07 pm
by rexconsulting
Hello again.

I have seen this recur, this time on my shiny new lab Xen server. I am pretty sure it is LVM partition corruption of some sort, though I'm not sure why fsck doesn't catch it (the guest instance running XI is on CentOS 7 with a single EXT4-fs partition, xvda1).

This time I am pretty sure that the guest VM running XI suffered a hard reset (long story - involving some human error - too late a night in the lab last night).

I'm going to set this guest instance/LVM partition aside and check into it later, since I have no good answers for this at the moment and have to jump to TNGT(*). The symptoms are worth noting here, though, and are exactly the same: I re-create the missing directory with "mkdir /usr/local/nagios/var/spool" and then it's OK until the next reboot (even a graceful one). When it comes back up, without any system error, nagios fails to start because /usr/local/nagios/var/spool is again missing.
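
The workaround above can be sketched roughly as follows. This is only a stopgap and does not address whatever removes the directories. The function name ensure_var_dirs is made up, and the list of subdirectories (spool, rw) is simply the two reported missing earlier in this thread.

```shell
#!/bin/sh
# Stopgap sketch: report which expected subdirectories are missing under a
# Nagios var directory and re-create them. Does not fix the root cause.
# Ownership and modes are NOT set here; copy them from a working server.
ensure_var_dirs() {
    base=$1
    for sub in spool rw; do
        if [ -d "$base/$sub" ]; then
            echo "ok: $base/$sub"
        else
            echo "missing: $base/$sub (re-creating)"
            mkdir -p "$base/$sub"
        fi
    done
}

# Hypothetical usage, as root, before starting nagios:
# ensure_var_dirs /usr/local/nagios/var
```

The "missing" lines double as a record of which directories vanished on a given boot, which may help when comparing reboots.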

I'd like to think there is a nice easy fix for this, but maybe a hard reset on VM images can cause irreparable corruption. Or maybe something else is going on here. I am not going to jump to conclusions while there is a lot still unknown here.

thanks
CP

* The Next Great Thing

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Fri Jan 06, 2017 10:39 am
by rkennedy
Is SELinux running, by chance? I've seen odd problems similar to this occur with Core because of it. Could you PM over a system profile for us to look at, so we can see if anything sticks out?

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Mon Jan 09, 2017 8:04 pm
by rexconsulting
Rats! It just happened again, and this time after just a "reboot" of the VM... no crash or anything. When I "Download Profile", I get these files (see below). Which do you want to see?

grasshop:profile cp$ ls -ltr
total 312
-rw-r--r--@ 1 cp staff 6515 Jan 9 17:03 systemlog.txt
-rw-r--r--@ 1 cp staff 10968 Jan 9 17:03 psaef.txt
-rw-r--r--@ 1 cp staff 3565 Jan 9 17:03 profile.txt
-rw-r--r--@ 1 cp staff 94 Jan 9 17:03 perfdata.txt
-rw-r--r--@ 1 cp staff 932 Jan 9 17:03 npcd.txt
-rw-r--r--@ 1 cp staff 92 Jan 9 17:03 nagios.txt
-rw-r--r--@ 1 cp staff 13255 Jan 9 17:03 memorybyprocess.txt
-rw-r--r--@ 1 cp staff 6259 Jan 9 17:03 mariadblog.txt
-rw-r--r--@ 1 cp staff 787 Jan 9 17:03 filesystem.txt
-rw-r--r--@ 1 cp staff 337 Jan 9 17:03 eventman.txt
-rw-r--r--@ 1 cp staff 31 Jan 9 17:03 database_host.txt
-rwxr-xr-x@ 1 cp staff 8296 Jan 9 17:03 config.inc.php
-rw-r--r--@ 1 cp staff 82 Jan 9 17:03 cmdsubsys.txt
-rw-r--r--@ 1 cp staff 12175 Jan 9 17:03 apacheerrors.txt
-rw-r--r--@ 1 cp staff 11357 Jan 9 17:03 top.txt
-rw-r--r--@ 1 cp staff 701 Jan 9 17:03 mrtg.tar.gz
-rw-r--r--@ 1 cp staff 650 Jan 9 17:03 ip_addr.txt
-rw-r--r--@ 1 cp staff 178 Jan 9 17:03 File_Counts.txt
-rw-r--r--@ 1 cp staff 31162 Jan 9 17:03 1484007496.tar.gz

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Mon Jan 09, 2017 8:06 pm
by rexconsulting
SELinux is disabled:

[@xi01 ~]$ sudo sestatus -v
SELinux status: disabled

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Tue Jan 10, 2017 9:25 am
by rkennedy
All of them - please PM over the zip. With the issue you're having we'll need to investigate what the possibilities are.

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Tue Jan 10, 2017 1:12 pm
by rexconsulting
OK. Please note the relation to ticket 2017010910000226, opened with XI official support. I will attach the files to that ticket, as I prefer not to share them with the world.
Does that work?
thanks
CP

Re: Freaky disappearing /usr/local/nagios/var on Parallels V

Posted: Tue Jan 10, 2017 1:17 pm
by dwhitfield
Locking due to ticket.