Preventative Maintenance

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
toodaly
Posts: 63
Joined: Wed Jun 19, 2013 3:39 pm

Preventative Maintenance

Post by toodaly »

I’m working in a relatively large environment (~2000 hosts, ~7000 services from 40 remote Nagios XI sites). Every now and then, on the Nagios XI server that receives passive results from the remote sites, the /usr/local/nagios/var/spool/checkresults directory fills up with files to the point that Nagios (or at least the UI) crashes. From what I've read on these forums, the cause is that something got corrupted in one of the databases or a service stopped running. What would be the repercussions of creating a cron job that runs a script with the lines below every night at midnight on the server that collects passive data? Are there any other recommendations for this sort of preventative maintenance?

service nagiosxi stop
service nagios stop
service ndo2db stop
service npcd stop
service mysqld stop

# Remove checkresults older than 1 hour
for NAME in `find /usr/local/nagios/var/spool/checkresults -maxdepth 1 -type f -mmin +60`; do rm -f "$NAME"; done

# Remove perfdata older than 1 hour
for NAME in `find /usr/local/nagios/var/spool/perfdata -maxdepth 1 -type f -mmin +60`; do rm -f "$NAME"; done

# Repair databases
/usr/local/nagiosxi/scripts/repairmysql.sh nagios
/usr/local/nagiosxi/scripts/repairmysql.sh mysql

service mysqld start
service npcd start
service ndo2db start
# Sleep for 10 seconds to ensure ndo2db is up
sleep 10
service nagios start
service nagiosxi start

I'm using Nagios XI version 2012R2.9 on a RHEL 6 (64-bit) VM. I can't upgrade since this is the version that the program requirements were verified against. Another thread mentioned a cron job (/usr/local/nagiosxi/cron/dbmaint.php), but it does not show up as a cron job in my installation. I only have sysstat.php, eventman.php, perfdataproc.php, and feedproc.php running. I do see dbmaint.php in /usr/local/nagiosxi/cron though.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Preventative Maintenance

Post by jdalrymple »

I think you've pretty well covered all the bases. Here is what I would add:

If it were my system, I would want to know what is going wrong. And of course, if the problem is non-transient, you're just going to stop getting check results anyway with your suggested maintenance methods. We've certainly seen the symptoms you've described before, but they can occur for any number of reasons, and without more data it's difficult to troubleshoot. If whatever occurrence is causing the symptom in your case can be cured, though, I would prefer to do a check on the file age then execute an event handler when things go wonky. That is, of course, if you can create a recipe to repair your specific (repeating) wonkiness.
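As a rough sketch of the "check on the file age" idea, something like the following could serve as a plugin. The script name, the default thresholds (100 and 1000 stale files), and the output wording are assumptions made for illustration and are not anything shipped with Nagios XI. Only the spool path and the 60-minute age come from the original post.

```shell
#!/bin/sh
# Hypothetical plugin: check_spool_age.sh (name and defaults are assumptions).
# Counts regular files in a spool directory that are older than a given
# number of minutes and maps the count to the usual plugin exit codes:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

count_stale_files() {
    # $1 = directory, $2 = age in minutes
    find "$1" -maxdepth 1 -type f -mmin +"$2" | wc -l | tr -d ' '
}

check_spool() {
    dir=${1:-/usr/local/nagios/var/spool/checkresults}
    age=${2:-60}      # minutes, same cutoff as the cleanup script
    warn=${3:-100}    # assumed warning threshold (stale file count)
    crit=${4:-1000}   # assumed critical threshold (stale file count)

    if [ ! -d "$dir" ]; then
        echo "SPOOL UNKNOWN - $dir is not a directory"
        return 3
    fi

    stale=$(count_stale_files "$dir" "$age")
    if [ "$stale" -ge "$crit" ]; then
        echo "SPOOL CRITICAL - $stale files older than $age min in $dir | stale=$stale"
        return 2
    elif [ "$stale" -ge "$warn" ]; then
        echo "SPOOL WARNING - $stale files older than $age min in $dir | stale=$stale"
        return 1
    fi
    echo "SPOOL OK - $stale files older than $age min in $dir | stale=$stale"
    return 0
}

# Act as a plugin only when arguments are supplied on the command line.
if [ "$#" -gt 0 ]; then
    check_spool "$@"
    exit $?
fi
```

It would be called as, for example, `check_spool_age.sh /usr/local/nagios/var/spool/checkresults 60 100 1000`. The thresholds would need tuning against what a healthy spool looks like in your environment.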

dbmaint.php's primary purpose (I'm fairly certain) is to prune the ndo db. I won't lie, I have no knowledge of how that process was handled prior to dbmaint.php - maybe it wasn't.
toodaly
Posts: 63
Joined: Wed Jun 19, 2013 3:39 pm

Re: Preventative Maintenance

Post by toodaly »

jdalrymple wrote:I would prefer to do a check on the file age then execute an event handler when things go wonky
What type of event handler did you have in mind? Should I save any logs (/var/log/messages)? /usr/local/nagios/var/nagios.log gets archived once per day.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Preventative Maintenance

Post by jdalrymple »

toodaly wrote:What type of event handler did you have in mind?
jdalrymple wrote:a recipe to repair your specific (repeating) wonkiness
Nagios log rotation takes care of itself. Let Linux deal with the stuff in /var/log - we generally don't care about much that lands in /var/log/messages unless you're having nrpe issues.
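To make the event handler idea a little more concrete, here is a minimal sketch. The script name and the SPOOL_DIR and MAX_AGE_MIN variables are hypothetical, and the purge step is simply borrowed from the cleanup lines in the first post as a stand-in for whatever the real repair recipe turns out to be. It assumes the handler is passed the $SERVICESTATE$ and $SERVICESTATETYPE$ macros as its first two arguments.

```shell
#!/bin/sh
# Hypothetical event handler: purge_spool_handler.sh (name is an assumption).
# Expects $SERVICESTATE$ as $1 and $SERVICESTATETYPE$ as $2.

should_repair() {
    # Act only once the problem is confirmed: CRITICAL and HARD.
    [ "$1" = "CRITICAL" ] && [ "$2" = "HARD" ]
}

purge_stale() {
    # $1 = spool directory, $2 = age in minutes
    # Placeholder recipe: delete regular files older than the cutoff.
    find "$1" -maxdepth 1 -type f -mmin +"$2" -exec rm -f {} +
}

# Run as a handler only when the state arguments are supplied.
if [ "$#" -ge 2 ]; then
    if should_repair "$1" "$2"; then
        purge_stale "${SPOOL_DIR:-/usr/local/nagios/var/spool/checkresults}" \
                    "${MAX_AGE_MIN:-60}"
    fi
    exit 0
fi
```

It would be wired up through a command definition that passes the two macros, then referenced from the event_handler directive of the service that watches the spool. Event handlers run as the user Nagios runs as, so a recipe that restarts services would need appropriate sudo rules, which are not shown here.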