
XI logs taking up too much space

Posted: Thu Aug 16, 2018 12:51 am
by vazudevan
The nom.log and sysstat.log files are taking up a huge amount of space, and there appears to be no log rotation or purging.

Code:

[root@nagios var]# ls -lh nom.log sysstat.log
-rw-r--r-- 1 nagios nagios 22G Aug 16 01:47 nom.log
-rw-r--r-- 1 nagios nagios 26G Aug 16 01:47 sysstat.log
Also, we are seeing a lot of these errors in nom.log:

Code:

chown: changing ownership of ‘/var/nagiosramdisk/spool/checkresults/XXXXXX’: Operation not permitted
followed by:

Code:

Password: su: Authentication failure
ERROR: Could not create or update '/usr/local/nagios/var/nagios.configtest'
Config test failed.  Checkpoint aborted.
These errors occur every minute or so.
1. What is causing these errors?
2. How do we correct them?
3. How can we keep these logs under control?
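To see which logs have grown out of hand before deciding what to do with them, a quick size check helps. This is only a sketch: the function name big_logs is hypothetical, the default directory /usr/local/nagiosxi/var is taken from the reply below, and the threshold uses find(1) -size units (k, M, G).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list *.log files directly under DIR larger than LIMIT.
# LIMIT uses find(1) -size units, e.g. 1G, 500M, 1k.
# Assumption: the XI logs live in /usr/local/nagiosxi/var.
big_logs() {
  local dir=${1:-/usr/local/nagiosxi/var}
  local limit=${2:-1G}
  # -maxdepth 1: only look at files directly in $dir, not subdirectories
  find "$dir" -maxdepth 1 -type f -name '*.log' -size +"$limit" -print
}
```

Usage: `big_logs /usr/local/nagiosxi/var 1G` prints one path per oversized log.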

Re: XI logs taking up too much space

Posted: Thu Aug 16, 2018 11:24 am
by cdienger
This looks like a permissions or write problem. First, feel free to remove the logs with "rm /usr/local/nagiosxi/var/sysstat.log" and "rm /usr/local/nagiosxi/var/nom.log".
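If you would rather keep some history than delete the logs outright, the sketch below copies a log to a compressed file and then empties it in place. Truncating rather than deleting keeps the file's owner and mode, and frees the space immediately even if a process still has the file open (a deleted file's space is only released once every open handle is closed). rotate_log is a hypothetical helper; for a permanent setup, logrotate's copytruncate directive does the same job on a schedule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep one gzip-compressed copy of LOG as LOG.1.gz,
# then empty LOG in place (the same idea as logrotate's "copytruncate").
# Lines appended between the copy and the truncate can be lost.
rotate_log() {
  local log=$1
  [ -f "$log" ] || { echo "rotate_log: no such file: $log" >&2; return 1; }
  # Overwrites any previous LOG.1.gz; stop before truncating if gzip fails
  gzip -c -- "$log" > "$log.1.gz" || return 1
  : > "$log"   # truncate in place; owner and mode are unchanged
}
```

Usage: `rotate_log /usr/local/nagiosxi/var/nom.log`. On a 22G file, gzip needs time and enough free space for the compressed copy.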

Then run "mount" to check that /var/nagiosramdisk is mounted read-write (rw). The entry should look something like:

tmpfs on /var/nagiosramdisk type tmpfs (rw,size=100m)
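The same check can be scripted instead of read by eye. This sketch (hypothetical mount_is_rw) reads /proc/mounts, whose fourth field holds the comma-separated mount options, and succeeds only if that list contains rw. On util-linux systems, `findmnt -no OPTIONS /var/nagiosramdisk` prints the same option list directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if MOUNTPOINT is listed in a /proc/mounts-style
# table with the "rw" option. Fields: device mountpoint fstype options dump pass.
# The table is a parameter so the function can be tried on sample input.
mount_is_rw() {
  local mp=$1 table=${2:-/proc/mounts}
  awk -v mp="$mp" '
    $2 == mp {
      found = 0    # if listed more than once, the last (topmost) entry decides
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "rw") found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$table"
}
```

Usage: `mount_is_rw /var/nagiosramdisk && echo "ramdisk is read-write"`.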

/etc/sudoers should also contain entries like the following:

User_Alias NAGIOSXI=nagios
User_Alias NAGIOSXIWEB=apache
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios status
NAGIOSXI ALL = NOPASSWD:/etc/init.d/nagios checkconfig
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/ndo2db status
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd start
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd stop
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd restart
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd reload
NAGIOSXI ALL = NOPASSWD:/etc/init.d/npcd status
NAGIOSXI ALL = NOPASSWD:/usr/bin/php /usr/local/nagiosxi/html/includes/components/autodiscovery/scripts/autodiscover_new.php *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/upgrade_to_latest.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/change_timezone.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/reset_config_perms.sh
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_ssl_config.sh *
NAGIOSXI ALL = NOPASSWD:/usr/local/nagiosxi/scripts/backup_xi.sh *
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/messages
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/httpd/error_log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/tail -100 /var/log/mysqld.log
NAGIOSXIWEB ALL = NOPASSWD:/usr/bin/php /usr/local/nagiosxi/html/includes/components/autodiscovery/scripts/autodiscover_new.php *
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
NAGIOSXIWEB ALL = NOPASSWD:/etc/init.d/snmptt restart
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/repair_databases.sh
NAGIOSXIWEB ALL = NOPASSWD:/usr/local/nagiosxi/scripts/manage_services.sh *
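To compare a live sudoers file against the list above without reading every line by eye, a whole-line comparison is enough for a first pass. missing_sudo_rules and the expected-rules file are hypothetical: save the rules above to a file such as expected.txt and pass it as the second argument. Because the match is exact, a rule with different spacing shows up as missing. As root, `sudo -l -U nagios` lists the rules sudo actually applies to the nagios user, including any from included files such as /etc/sudoers.d, which this comparison would not see.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each rule in EXPECTED that does not appear verbatim
# as a whole line in SUDOERS. Review hits by eye; edit sudoers only via visudo.
missing_sudo_rules() {
  local sudoers=$1 expected=$2
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from $sudoers
  grep -Fxv -f "$sudoers" -- "$expected"
  # grep exits 1 when nothing is missing; treat only status 2 (error) as failure
  [ $? -le 1 ]
}
```

Usage (as root): `missing_sudo_rules /etc/sudoers expected.txt`. No output means every expected rule is present.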


Finally, check whether the nagios account has expired with "chage -l nagios", and if needed update it with "chage -I -1 -m 0 -M 99999 -E -1 nagios".
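When reading the chage output, the fields that matter here are the password and account expiry lines. As a sketch, the hypothetical filter below prints only those expiry-related fields whose value is not "never". It assumes the English field labels printed by shadow-utils chage, hence LC_ALL=C in the usage line.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `chage -l USER` output on stdin and print the
# expiry-related fields whose value is not "never". No output means nothing
# is set to expire. Assumes chage's English labels ("Label<tabs>: value").
expiring_fields() {
  awk -F': ' '
    /^(Password expires|Password inactive|Account expires)[ \t]*:/ {
      label = $1
      sub(/[ \t]+$/, "", label)     # drop the tab padding before the colon
      if ($2 != "never") print label ": " $2
    }'
}
```

Usage (as root): `LC_ALL=C chage -l nagios | expiring_fields`.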

Re: XI logs taking up too much space

Posted: Tue Aug 21, 2018 4:28 am
by vazudevan
Thank you for the details. We checked these and verified the permissions; the sudoers entries are exactly the same, and everything appears to be fine. With the ramdisk, the nom and sysstat logs grew to several GB very quickly.

We reverted from the ramdisk to local disks, and things seem to have settled down now. Those errors no longer appear in nom.log. Only one error still repeats on every cycle of the nom cron job:

Code:

Password: su: Authentication failure
ERROR: Could not create or update '/usr/local/nagios/var/nagios.configtest'
Config test failed.  Checkpoint aborted.

Re: XI logs taking up too much space

Posted: Tue Aug 21, 2018 1:48 pm
by cdienger
The block of code that is causing that error is in /etc/init.d/nagios:

Code:

        if ! su $NagiosUser -c "touch $NagiosCfgtestFile"; then
                echo "ERROR: Could not create or update '$NagiosCfgtestFile'"
                exit 8
        fi
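For context on the "Password: su: Authentication failure" lines: su normally skips its password prompt only when the caller is root (via pam_rootok on typical setups). If this block ever runs under a non-root account, for example from a cron job running as nagios, su asks for a password that nobody can type and fails, even when switching to the same user. That is a plausible reading of the errors, not a confirmed diagnosis. The hypothetical run_as guard below (not code from the Nagios init script) shows the distinction.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the Nagios init script: run CMD as USER,
# calling su only when it can succeed without a password prompt.
run_as() {
  local user=$1 cmd=$2
  if [ "$(id -un)" = "$user" ]; then
    sh -c "$cmd"              # already the target user: no switch needed
  elif [ "$(id -u)" -eq 0 ]; then
    su "$user" -c "$cmd"      # root: su normally does not prompt for a password
  else
    echo "run_as: $(id -un) cannot switch to $user without a password" >&2
    return 1
  fi
}
```

Usage: `run_as nagios "touch /usr/local/nagios/var/nagios.configtest"`.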
Do you get anything interesting if you run the following?

su nagios -c "touch /usr/local/nagiosxi/var/nom.log"

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 10:10 am
by vazudevan
It did not make a difference.

As stated, this error is seen in nom.log and recurs exactly every minute. It's not coming from the init script; I believe it's coming from the cron job instead:

Code:

* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php >> /usr/local/nagiosxi/var/nom.log 2>&1
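The redirection at the end of this crontab line is why the errors pile up in nom.log: `>> ... 2>&1` appends nom.php's standard output to the file and then points standard error at the same place, so whatever the checkpoint step prints on either stream is added every minute, and nothing in the entry itself trims the file. A small demonstration, using a hypothetical function name:

```shell
#!/usr/bin/env bash
# Demonstrates the redirection used in the cron entry: ">> LOG 2>&1" appends
# stdout to LOG, then sends stderr to the same place, so both streams
# accumulate in LOG on every run.
demo_cron_redirect() {
  local log=$1
  { echo "stdout line"; echo "stderr line" >&2; } >> "$log" 2>&1
}
```

Order matters: writing `2>&1 >> LOG` instead would leave stderr wherever the job's original stdout pointed (for cron, usually the job's mail) rather than in the log.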

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 11:41 am
by cdienger
Sorry - typo on my end. The command should have been:

su nagios -c "touch /usr/local/nagios/var/nagios.configtest"

This is the command that is automatically running and producing the error. By running it manually at the command line I'm not expecting it to "do" anything really besides hopefully giving us a clue as to why it isn't working.
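Since the point of running touch by hand is to find out why it fails, the usual causes can also be checked explicitly. why_cant_touch is a hypothetical diagnostic and only approximates touch's rules; it must be run from a shell belonging to the user in question (for example after `sudo -u nagios bash`), because root passes every -w test and would always see "ok".

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: roughly explain why the current user could not
# create or update FILE with touch. Prints "ok" when touch should succeed.
why_cant_touch() {
  local file=$1 dir
  dir=$(dirname -- "$file")
  if [ ! -d "$dir" ]; then
    echo "parent directory missing: $dir"
  elif [ -e "$file" ] && [ ! -w "$file" ]; then
    echo "file not writable: $file"
  elif [ ! -e "$file" ] && { [ ! -w "$dir" ] || [ ! -x "$dir" ]; }; then
    echo "cannot create files in: $dir"
  else
    echo "ok"
  fi
}
```

Usage (as nagios): `why_cant_touch /usr/local/nagios/var/nagios.configtest`.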

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 11:53 am
by lmiltchev
You should see the error message "Config test failed. Checkpoint aborted." when configuration verification fails. What output do you see after running the following command from the CLI?

Code:

/usr/local/nagiosxi/scripts/nom_create_nagioscore_checkpoint_cond.sh

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 1:56 pm
by vazudevan

Code:

/usr/local/nagiosxi/scripts/nom_create_nagioscore_checkpoint_cond.sh
OK.
RESETTING PERMS
/usr/local/nagiosxi/nom/checkpoints/nagioscore ~
tar: Removing leading `/' from member names
~
Config test passed.  Checkpoint created.

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 2:02 pm
by scottwilkerson
Please show the output of the following:

Code:

ls -al /usr/local/nagios/var/nagios.configtest
If it is owned by anything other than nagios, run:

Code:

chown nagios /usr/local/nagios/var/nagios.configtest
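The ownership check can be scripted roughly as follows. check_owner is a hypothetical helper that relies on GNU stat's `-c %U` format; it distinguishes a missing file from one owned by the wrong account, since only the latter calls for chown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether FILE exists and is owned by USER
# (default: nagios). Prints "missing", "ok", or "owned by <name>".
check_owner() {
  local file=$1 want=${2:-nagios} have
  if [ ! -e "$file" ]; then
    echo "missing"
    return 1
  fi
  have=$(stat -c %U -- "$file")   # GNU stat: owner's user name
  if [ "$have" = "$want" ]; then
    echo "ok"
  else
    echo "owned by $have"
    return 1
  fi
}
```

A "missing" result is not an ownership problem: provided nagios can write to /usr/local/nagios/var, touch run as nagios would simply create the file, so chown only applies when the helper reports "owned by ...".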

Re: XI logs taking up too much space

Posted: Fri Aug 24, 2018 2:06 pm
by vazudevan

Code:

[root@phlprcnagnxi001 etc]# ls -l /usr/local/nagios | grep var
drwxr-xr-x  6 nagios nagios 4096 Aug 24 15:04 var
[root@phlprcnagnxi001 etc]# ls -al /usr/local/nagios/var/nagios.configtest
ls: cannot access /usr/local/nagios/var/nagios.configtest: No such file or directory
[root@phlprcnagnxi001 etc]#