
Weird behavior after performing repair database script

Posted: Mon Mar 05, 2018 2:24 pm
by Berto
When I tried to log in to Nagios XI yesterday, I received an error stating that the database was corrupted and that I should run the following:

/usr/local/nagiosxi/scripts/repair_databases.sh

I ran that script, and once it finished I was able to log in. I also noticed that a previous issue we had been seeing (data not being graphed) seemed to be fixed. Today, however, we found that /var was completely full, and when we looked into what had filled it, we found that /var/lib/mysql/nagiosxi/ had generated enough data in less than 24 hours to fill it up. We have also noticed that opening the Hosts tab under Configure > CCM returns an HTTP 500 error, and trying to apply changes to a service now returns the error "Backend login to the Core Config Manager failed."

This started after running that script, but I'm not sure whether what we're seeing is just a symptom of a much bigger underlying issue.
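To confirm what is consuming /var, one quick check is to list the largest entries under the suspect directory. The shell function below is a minimal sketch, not a tool shipped with Nagios XI; the name `largest_entries` is hypothetical, and it assumes only the standard `du`, `sort` and `head` utilities.

```shell
#!/bin/sh
# largest_entries DIR [N]
# Hypothetical helper: print the N largest entries (default 10) directly
# under DIR, biggest first, with sizes in kilobytes as reported by du.
largest_entries() {
    dir=$1
    count=${2:-10}
    # du -sk prints one "size<TAB>path" total per entry;
    # sort numerically in reverse so the largest comes first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, running `largest_entries /var/lib/mysql/nagiosxi 10` as root would show which files in the nagiosxi database directory are taking up the most space.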

Re: Weird behavior after performing repair database script

Posted: Mon Mar 05, 2018 3:27 pm
by npolovenko
Hello, @Berto. There's another script in /usr/local/nagiosxi/scripts/ folder that you can run:

Code:

./reset_config_perms.sh
It resets all of the configuration permissions. In case this is actually a symptom of a bigger issue, though, I'd like to check your system profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file, upload it to a cloud storage service of your choice, and share the link with me via PM. After you do that, please post a reply in this thread to bring it back up in the support queue.
Thank you.

Re: Weird behavior after performing repair database script

Posted: Tue Mar 06, 2018 10:27 am
by Berto
I have sent you a PM, npolovenko, with the link to profile.zip.

Re: Weird behavior after performing repair database script

Posted: Tue Mar 06, 2018 3:43 pm
by npolovenko
@Berto, I did receive the file, but unfortunately it appears to be corrupted. The FTP server could be at fault. You could use Google Drive instead to upload the profile and create a public download link.

Re: Weird behavior after performing repair database script

Posted: Fri Mar 09, 2018 9:40 am
by Berto
I believe the profile is being corrupted when it is downloaded: I've tried different methods of getting it to you, but each time I test whether you'll be able to open the files, it reports that the archive is corrupted. I also tried running the reset_config_perms.sh script, and afterwards I now see all of the red items shown in the screenshot on the admin page.
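One way to narrow down where the corruption happens is to compare a checksum of profile.zip on the machine that downloaded it with a checksum of the copy after it has been uploaded and downloaded again. If they differ, the transfer is at fault; if they match but `unzip -t profile.zip` still reports errors, the archive was likely already damaged before it was shared. The sketch below uses the POSIX `cksum` utility; the function name `same_file_contents` is hypothetical.

```shell
#!/bin/sh
# same_file_contents FILE_A FILE_B
# Hypothetical helper: exit 0 if both files have the same CRC and byte
# count according to POSIX cksum, 1 if they differ, 2 if either is unreadable.
same_file_contents() {
    # Reading from stdin keeps the file name out of cksum's output,
    # so only the checksum and the size are compared.
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}
```

Usage: `same_file_contents profile.zip profile-redownloaded.zip && echo identical`, where the second file name is just an example.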

Re: Weird behavior after performing repair database script

Posted: Fri Mar 09, 2018 11:08 am
by scottwilkerson
This usually comes down to one of a few things, but ultimately it means the cron jobs are not running.

It could be that the nagios user account is deactivated or expired:

Code:

chage -l nagios
Or the permissions on the directory the cron jobs write their logs to:

Code:

ls -la /usr/local/nagios
Or a missing cron.d file:

Code:

cat /etc/cron.d/nagiosxi
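The third check can go a step beyond reading the file. In /etc/cron.d files the sixth field names the user a job runs as, so counting non-comment lines whose sixth field is `nagios` shows how many jobs are scheduled for that account. This is a minimal sketch: the function name `count_cron_jobs` is hypothetical, and it assumes the five-field time format (no `@hourly`-style shortcuts). Variable assignments such as `MAILTO=` have too few fields and are skipped.

```shell
#!/bin/sh
# count_cron_jobs FILE USER
# Hypothetical helper: print how many entries in a cron.d-style FILE
# (minute hour day-of-month month day-of-week user command...) run as USER.
count_cron_jobs() {
    awk -v user="$2" '
        /^[ \t]*#/ { next }            # skip comment lines
        NF >= 7 && $6 == user { n++ }  # five time fields, user, command
        END { print n + 0 }
    ' "$1"
}
```

Usage: `count_cron_jobs /etc/cron.d/nagiosxi nagios`.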

Re: Weird behavior after performing repair database script

Posted: Mon Mar 12, 2018 1:20 pm
by Berto
Here is the output of those commands. I also checked the cron logs and didn't see anything out of the ordinary; I've attached the log.


# chage -l nagios
Last password change : Jul 13, 2016
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7


# ls -la /usr/local/nagios
total 36
drwxr-xr-x 9 root root 4096 Jan 6 2016 .
drwxr-xr-x. 16 root root 4096 Jan 6 2016 ..
drwxrwxr-x 2 nagios nagios 4096 Apr 19 2017 bin
drwsrwsr-x 7 apache nagios 4096 Mar 1 17:49 etc
drwxr-xr-x 2 root root 4096 Jan 6 2016 include
drwxrwsr-x 2 apache nagios 4096 Nov 29 18:09 libexec
drwxrwxr-x 2 nagios nagios 4096 Feb 12 2017 sbin
drwxrwxr-x 18 nagios nagios 4096 Feb 12 2017 share
drwxrwxr-x 6 nagios nagios 4096 Mar 12 14:10 var


# cat /etc/cron.d/nagiosxi
0 7 * * * root /root/scripts/autopostgresqlbackup > /dev/null 2>&1

* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php > /usr/local/nagiosxi/var/event_handler.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1
*/5 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1
01 * * * * nagios /usr/local/nagiosxi/cron/recurringdowntime.pl > /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
*/5 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php > /usr/local/nagiosxi/var/deadpool.log 2>&1
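The account check from the first command can also be scripted by pulling a single field out of the `chage -l` output. This is a sketch under two assumptions: the output is in English (hence `LC_ALL=C` in the usage line), and each line has the "label : value" layout shown above. The function name `chage_field` is hypothetical.

```shell
#!/bin/sh
# chage_field LABEL
# Hypothetical helper: read `chage -l` output on stdin and print the value
# of the line whose label (the text before the first colon) equals LABEL.
chage_field() {
    awk -F':' -v want="$1" '
        {
            label = $1
            sub(/[ \t]+$/, "", label)          # drop padding before the colon
            if (label == want) {
                value = substr($0, index($0, ":") + 1)
                sub(/^[ \t]+/, "", value)      # drop padding after the colon
                print value
            }
        }
    '
}
```

Usage (as root): `LC_ALL=C chage -l nagios | chage_field 'Account expires'`, which for the account shown above would print `never`.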

Re: Weird behavior after performing repair database script

Posted: Mon Mar 12, 2018 1:50 pm
by scottwilkerson
Sorry, three more commands, please:

Code:

df -h
ls -la /usr/local/nagiosxi/
ls -la /usr/local/nagiosxi/var
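Of the three commands above, `df` is the quickest way to spot a full filesystem. As a sketch, the function below (hypothetical name `full_filesystems`) reads `df -P` output, whose POSIX format keeps each filesystem on one line, and prints the mount point and usage of every filesystem at or above a given percentage. It assumes mount points contain no spaces.

```shell
#!/bin/sh
# full_filesystems THRESHOLD
# Hypothetical helper: read `df -P` output on stdin and print
# "mountpoint capacity" for every filesystem at or above THRESHOLD percent.
full_filesystems() {
    awk -v limit="$1" '
        NR == 1 { next }                 # skip the header line
        {
            pct = $(NF - 1)              # capacity column, e.g. "100%"
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0) print $NF, $(NF - 1)
        }
    '
}
```

Usage: `df -P | full_filesystems 90`.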

Re: Weird behavior after performing repair database script

Posted: Tue Mar 13, 2018 9:57 am
by Berto
Here is the additional info.

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-root 37G 17G 19G 49% /
tmpfs 4.9G 72K 4.9G 1% /dev/shm
/dev/sda1 239M 72M 154M 32% /boot
/dev/mapper/vg00-cv 9.6G 22M 9.0G 1% /cv
/dev/mapper/vg00-tmp 9.6G 109M 9.0G 2% /tmp
/dev/mapper/vg00-var 45G 43G 61M 100% /var


# ls -la /usr/local/nagiosxi/
total 76
drwxr-xr-x 10 nagios nagios 4096 Jan 6 2016 .
drwxr-xr-x. 16 root root 4096 Jan 6 2016 ..
drwxr-xr-x 2 nagios nagios 4096 Feb 12 2017 cron
drwxr-xr-x 3 nagios nagios 4096 Jan 6 2016 etc
drwxr-xr-x 19 nagios nagios 4096 Dec 13 10:43 html
drwxr-xr-x 3 nagios nagios 4096 Jan 6 2016 nom
drwxr-xr-x 2 nagios nagios 4096 Mar 12 10:43 scripts
drwsrwsr-x 2 nagios nagios 4096 Mar 12 12:00 tmp
drwxr-xr-x 2 nagios nagios 4096 Jan 15 11:15 tools
drwxr-xr-x 5 nagios nagios 36864 Mar 13 10:51 var


[root@lnsvr0370 ~]# ls -la /usr/local/nagiosxi/var
total 165120
drwxr-xr-x 5 nagios nagios 36864 Mar 13 10:51 .
drwxr-xr-x 10 nagios nagios 4096 Jan 6 2016 ..
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 cleaner.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 cmdsubsys.log
drwsrwsr-x 2 apache nagios 4096 Mar 8 18:20 components
-rw-r--r-- 1 nagios nagios 8 Mar 13 09:03 corelog.data
-rw-r--r-- 1 nagios nagios 24805 Mar 13 09:03 corelog.diff
-rw-r--r-- 1 nagios nagios 0 Mar 13 10:25 dbmaint.lock
-rw-r--r-- 1 nagios nagios 66 Mar 13 10:50 dbmaint.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:50 deadpool.log
-rw-r--r-- 1 nagios nagios 0 Mar 13 10:51 event_handler.lock
-rw-r--r-- 1 nagios nagios 72 Mar 13 10:52 event_handler.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 eventman.log
-rw-r--r-- 1 nagios nagios 1501 Mar 13 10:52 feedproc.log
-rw-r--r-- 1 nagios nagios 0 Mar 11 03:16 load_url.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 nom.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 perfdataproc.log
-rw-r--r-- 1 nagios nagios 401516 Mar 13 10:01 recurringdowntime.log
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 reportengine.log
drwxr-xr-x 2 nagios nagios 4096 Mar 8 17:06 subsys
-rw-r--r-- 1 nagios nagios 1797 Mar 13 10:52 sysstat.log
drwxr-xr-x 2 nagios nagios 4096 Jan 6 2016 upgrades
-rw-r--r-- 1 nagios nagios 12187 Feb 12 2017 xi-sys.cfg
-rw-r--r-- 1 nagios nagios 37 Feb 12 2017 xi-uuid
-rw-r--r-- 1 nagios nagios 196 Feb 12 2017 xiversion
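Each job in /etc/cron.d/nagiosxi redirects its output with `>` to a log in /usr/local/nagiosxi/var, so each run rewrites its log. The modification times of those logs are therefore a rough indicator of whether the jobs are firing. The sketch below (hypothetical name `stale_logs`) lists `.log` files in a directory that have not been modified for more than N minutes. It is only a heuristic: jobs on slower schedules, such as the five-minute jobs or the hourly recurringdowntime.pl, will legitimately have older logs.

```shell
#!/bin/sh
# stale_logs DIR MINUTES
# Hypothetical helper: list *.log files directly inside DIR whose last
# modification was more than MINUTES minutes ago, sorted by name.
stale_logs() {
    find "$1" -maxdepth 1 -type f -name '*.log' -mmin +"$2" | sort
}
```

For example, `stale_logs /usr/local/nagiosxi/var 5` would list any log in that directory that has not been rewritten in the last five minutes.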

Re: Weird behavior after performing repair database script

Posted: Tue Mar 13, 2018 10:19 am
by scottwilkerson
Can you go to Admin > System Profile and PM me or another staff member your profile.zip?

Thanks