Preventative Maintenance

Posted: Tue Aug 11, 2015 4:11 pm
by toodaly
I'm working in a relatively large environment (~2000 hosts and ~7000 services reported by 40 remote Nagios XI sites). Every now and then, /usr/local/nagios/var/spool/checkresults fills up with files on the Nagios XI server that receives the passive results from the remote sites, until Nagios (at least the UI) crashes. From what I've read on these forums, this happens when something gets corrupted in one of the databases or a service stops running. What would be the repercussions of a cron job that runs a script with the lines below every night at midnight on the server that collects the passive data? Any other recommendations for doing some sort of preventative maintenance?

#!/bin/bash
# Stop Nagios XI and its supporting services before touching spool files or tables
service nagiosxi stop
service nagios stop
service ndo2db stop
service npcd stop
service mysqld stop

# Remove checkresults older than 1 hour
find /usr/local/nagios/var/spool/checkresults -maxdepth 1 -type f -mmin +60 -delete

# Remove perfdata older than 1 hour
find /usr/local/nagios/var/spool/perfdata -maxdepth 1 -type f -mmin +60 -delete

# Repair databases
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
/usr/local/nagiosxi/scripts/repairmysql.sh mysql

# Bring everything back up in dependency order
service mysqld start
service npcd start
service ndo2db start
# Give ndo2db 10 seconds to come up before Nagios starts writing to it
sleep 10
service nagios start
service nagiosxi start

I'm using Nagios XI 2012R2.9 on a RHEL 6 (64-bit) VM. I can't upgrade, since this is the version the program requirements were verified against. Another thread mentioned a cron job (/usr/local/nagiosxi/cron/dbmaint.php), but it isn't scheduled as a cron job in my installation. I only have sysstat.php, eventman.php, perfdataproc.php, and freedproc.php running. I do see dbmaint.php in /usr/local/nagiosxi/cron, though.
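One quick way to confirm whether dbmaint.php is scheduled anywhere is to grep the usual crontab locations. This is a minimal sketch, not a Nagios XI tool: the helper name find_cron_jobs is hypothetical, and the default paths assume a stock RHEL 6 layout (/etc/crontab, /etc/cron.d, /var/spool/cron).

```shell
#!/bin/bash
# find_cron_jobs PATTERN [PATH...]   (hypothetical helper name)
# Prints every line in the given crontab files/directories that contains PATTERN,
# prefixed with the file it came from. With no PATHs it searches the stock
# RHEL 6 locations; reading /var/spool/cron requires root.
find_cron_jobs() {
  local pattern="$1"; shift
  local paths=("$@")
  if (( ${#paths[@]} == 0 )); then
    paths=(/etc/crontab /etc/cron.d /var/spool/cron)
  fi
  # -r: recurse into directories, -H: show the file name, -F: treat PATTERN as a fixed string.
  # Errors such as missing or unreadable paths are discarded.
  grep -rHF -- "$pattern" "${paths[@]}" 2>/dev/null
}

# Usage (as root): find_cron_jobs dbmaint.php
```

Empty output means no crontab in those locations mentions dbmaint.php; it says nothing about whether the script is launched some other way.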

Re: Preventative Maintenance

Posted: Wed Aug 12, 2015 1:35 pm
by jdalrymple
I think you've pretty well covered all the bases. Here is what I would add:

If it were my system, I would want to know what is actually going wrong. Of course, if the problem is non-transient, your suggested maintenance won't stop it from recurring; you'll still stop getting check results between runs. We've certainly seen the symptoms you describe before, but they can have any number of causes, and without more data it's difficult to troubleshoot. If whatever is causing the symptom in your case can be cured, though, I would prefer to do a check on the file age then execute an event handler when things go wonky. That assumes, of course, that you can create a recipe to repair your specific (repeating) wonkiness.
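A file-age check along those lines could look like the minimal sketch below. The name check_spool_age and its WARN/CRIT thresholds (in minutes) are hypothetical; it relies on GNU find's -printf '%T@' (available on RHEL 6) and returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# check_spool_age DIR WARN_MIN CRIT_MIN   (hypothetical plugin name and arguments)
# Reports the age, in minutes, of the oldest regular file directly inside DIR and
# returns the standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_spool_age() {
  local dir="$1" warn="$2" crit="$3"
  if [[ ! -d "$dir" ]]; then
    echo "UNKNOWN - $dir is not a directory"
    return 3
  fi

  # GNU find: %T@ prints each file's mtime as epoch seconds; the smallest value is the oldest file.
  local oldest count
  oldest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\n' | sort -n | head -n 1)
  count=$(find "$dir" -maxdepth 1 -type f | wc -l)
  if [[ -z "$oldest" ]]; then
    echo "OK - $dir is empty"
    return 0
  fi

  # Strip the fractional seconds before doing integer arithmetic.
  local age_min=$(( ( $(date +%s) - ${oldest%.*} ) / 60 ))
  local msg="oldest of $count files in $dir is ${age_min} min old"
  if (( age_min >= crit )); then
    echo "CRITICAL - $msg"; return 2
  elif (( age_min >= warn )); then
    echo "WARNING - $msg"; return 1
  fi
  echo "OK - $msg"
  return 0
}

# Run only when executed directly with arguments, e.g. from a Nagios command definition.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 3 ]]; then
  check_spool_age "$1" "$2" "$3"
  exit $?
fi
```

Saved as, say, check_spool_age.sh and made executable, it would be invoked as check_spool_age.sh /usr/local/nagios/var/spool/checkresults 10 30. The 10 and 30 minute thresholds are placeholders to tune against how quickly your spool normally drains.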

dbmaint.php's primary purpose (I'm fairly certain) is pruning the NDO database. I won't lie: I have no idea how that was handled before dbmaint.php existed; maybe it wasn't.

Re: Preventative Maintenance

Posted: Wed Aug 12, 2015 5:20 pm
by toodaly
jdalrymple wrote:I would prefer to do a check on the file age then execute an event handler when things go wonky
What type of event handler did you have in mind? Should I save any logs (/var/log/messages)? /usr/local/nagios/var/nagios.log gets archived once per day.

Re: Preventative Maintenance

Posted: Thu Aug 13, 2015 12:04 pm
by jdalrymple
toodaly wrote:What type of event handler did you have in mind?
jdalrymple wrote:a recipe to repair your specific (repeating) wonkiness
Nagios log rotation takes care of itself. Let Linux deal with the stuff in /var/log; we generally don't care about much of what lands in /var/log/messages unless you're having NRPE issues.
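To make the "recipe" idea concrete: a Nagios event handler is just a script that Nagios runs when a service changes state, with whatever macros its command definition passes. The sketch below assumes a command definition that passes $SERVICESTATE$ and $SERVICESTATETYPE$ as the first two arguments, and it acts only on a confirmed (HARD) CRITICAL. repair_recipe is a hypothetical placeholder for whatever actually fixes your repeating problem; this is not an official Nagios XI script.

```shell
#!/bin/bash
# Hypothetical event handler. Assumes a Nagios command definition whose command_line ends in
#   ... $SERVICESTATE$ $SERVICESTATETYPE$
# so that $1 is OK/WARNING/CRITICAL/UNKNOWN and $2 is SOFT or HARD.

# Placeholder: replace the body with the commands that fix YOUR repeating problem.
repair_recipe() {
  echo "repair_recipe: no recipe configured yet"
}

handle_event() {
  local state="$1" type="$2"
  # Act only once the problem is confirmed (HARD CRITICAL); soft states and recoveries are ignored.
  if [[ "$state" == "CRITICAL" && "$type" == "HARD" ]]; then
    repair_recipe
    return
  fi
  echo "no action for $state/$type"
}

# Run only when Nagios executes the script with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 2 ]]; then
  handle_event "$1" "$2"
fi
```

Event handlers run with the privileges of the user Nagios runs as, so any recipe step that restarts services would need sudo rights for that user.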