On one of our customers' boxes, the load spikes about every 7 hours. The Nagios instance has 675 services and 203 hosts. I grabbed some logs during the spikes, and the checks appear to take the same amount of time to execute during a spike as they do when the system isn't spiking. I also grabbed the top processes during a spike: they are httpd and php, but those are the same top processes as when the system isn't spiking.
top - 14:48:01 up 115 days, 22:19, 5 users, load average: 9.15, 5.65, 4.52
Tasks: 225 total, 1 running, 222 sleeping, 2 stopped, 0 zombie
Cpu(s): 4.9%us, 1.0%sy, 0.0%ni, 94.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12198496k total, 11412836k used, 785660k free, 424584k buffers
Swap: 15727608k total, 35768k used, 15691840k free, 8165896k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
616 nagios 20 0 224m 28m 7872 S 26.9 0.2 0:00.15 php
16464 apache 20 0 435m 25m 5216 S 25.0 0.2 3:55.87 httpd
620 nagios 20 0 217m 22m 8184 S 17.3 0.2 0:00.11 php
626 nagios 20 0 217m 21m 7864 S 17.3 0.2 0:00.10 php
611 nagios 20 0 217m 22m 7820 S 15.4 0.2 0:00.10 php
618 nagios 20 0 217m 21m 7816 S 15.4 0.2 0:00.10 php
2013 mysql 20 0 2194m 29m 4240 S 1.9 0.2 1197:26 mysqld
1 root 20 0 19356 1276 1060 S 0.0 0.0 0:54.41 init
I know there are some nagiosxi cron jobs occurring, but I'm not sure whether these are causing the spike or not. Have you seen this before? Do you have any ideas what might be causing the spikes?
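One way to pin down whether the spikes really recur every ~7 hours, and to line them up against cron run times, is to log the load average at a fixed interval and look at the gaps between spike starts. The following is only a minimal Python sketch under assumptions: the function names, the threshold, and the one-minute sampling interval are all hypothetical and not taken from this system.

```python
import os
import time


def find_spike_starts(samples, threshold):
    """Return the timestamps at which the load first rises above threshold.

    samples is a list of (unix_timestamp, one_minute_load) tuples in
    chronological order. Only the rising edge of each spike is reported.
    """
    starts = []
    above = False
    for ts, load in samples:
        if load > threshold and not above:
            starts.append(ts)
        above = load > threshold
    return starts


def spike_gaps_hours(starts):
    """Return the gaps, in hours, between consecutive spike starts."""
    return [(b - a) / 3600.0 for a, b in zip(starts, starts[1:])]


def sample_load(count, interval_seconds=60):
    """Collect `count` samples of the 1-minute load average (Unix only).

    The interval is an assumed value; adjust it to taste.
    """
    samples = []
    for _ in range(count):
        samples.append((time.time(), os.getloadavg()[0]))
        time.sleep(interval_seconds)
    return samples
```

Feeding a day or two of samples through `find_spike_starts` and `spike_gaps_hours` should show whether the gaps are steady (which would point at something periodic) or drift around.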
Thanks,
Load on nagios box spikes every ~7 hours.
Re: Load on nagios box spikes every ~7 hours.
Are there any scheduled reports around the same time?
Let's check that the nagiosxi cron file is still the default:
Code:
cat /etc/cron.d/nagiosxi
Re: Load on nagios box spikes every ~7 hours.
No scheduled reports. They don't seem to happen at the same time every day.
Is it theoretically possible for all checks to be scheduled at the same time every so often if there are different check_intervals in use? The data I gathered seemed to show the checks weren't all scheduled at the same time, so it didn't appear to be a scheduling issue. Just throwing that out there.
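On the theoretical side of that question: if checks with different intervals ever did start from the same instant, the combined pattern would repeat every least common multiple of the intervals. A minimal Python sketch of that arithmetic follows. The interval values are made up for illustration and are not taken from this system, and Nagios normally spreads check start times out, so this only shows the period of a possible overlap, not that one actually occurs.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def realignment_period(intervals_minutes):
    """Minutes until checks with these intervals line up again,
    assuming they all started at the same moment."""
    return reduce(lcm, intervals_minutes)


# Hypothetical check_interval values, in minutes (not from this system).
example_intervals = [5, 15, 60]
print(realignment_period(example_intervals))
```

With the made-up intervals above the pattern repeats every 60 minutes. A single odd interval stretches it a lot: `realignment_period([5, 7, 60])` is 420 minutes.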
Code:
# /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
# Backup MySQL & PostgreSQL Databases
0 7 * * * root /root/scripts/automysqlbackup
0 8 * * * root /root/scripts/autopostgresqlbackup
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1
*/5 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1
* * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1
01 * * * * nagios /usr/local/nagiosxi/cron/recurringdowntime.pl > /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
*/5 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php > /usr/local/nagiosxi/var/deadpool.log 2>&1
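To make a fragment like this easier to compare by eye, here is a minimal Python sketch that splits each `/etc/cron.d`-style line into schedule, user, and command, and then picks out the jobs that do not run every minute. The function names are hypothetical. Note that it cannot see any longer-period work that an every-minute script decides to do internally.

```python
def parse_cron_d(text):
    """Parse /etc/cron.d-style lines into (schedule, user, command) tuples.

    Comment lines, blank lines, and lines with fewer than seven fields
    (for example environment assignments) are skipped.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue
        schedule = " ".join(fields[:5])
        jobs.append((schedule, fields[5], fields[6]))
    return jobs


def not_every_minute(jobs):
    """Return only the jobs whose schedule is not '* * * * *'."""
    return [job for job in jobs if job[0] != "* * * * *"]
```

Usage would be along the lines of `not_every_minute(parse_cron_d(open("/etc/cron.d/nagiosxi").read()))`, which for the file above would leave the two backup jobs, `dbmaint.php`, `recurringdowntime.pl`, and `deadpool.php`.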
slansing (Posts: 7698, Joined: Mon Apr 23, 2012 4:28 pm, Location: Travelling through time and space...)
Re: Load on nagios box spikes every ~7 hours.
Are they perhaps applying configuration or creating backups around this time? Scheduling conflicts are theoretically possible, but not very probable. Are you able to correlate this theory with the Nagios logs or system logs? I have seen this happen when a client removed the retention.dat file and restarted Nagios, which caused all checks to reset.
Re: Load on nagios box spikes every ~7 hours.
I've found that the "Current Load" service didn't start alerting until more services were added. The spikes were happening before that, but they weren't getting as high. Adding more services seems to aggravate it.
Configuration isn't being applied around the spikes: the Nagios process has been running unchanged for weeks while they keep happening. I did apply configuration today, though, and a spike happened shortly afterwards.
I gathered logs by running a script that would grab the execution time, last run time and the next run time of every service from mk-livestatus every 3 minutes. The execution times were the same as when the system wasn't busy. And the scheduled times seem to be pretty sequential, leading me to believe that they weren't all scheduled at the same time.
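For anyone wanting to reproduce that kind of sampling, a minimal Python sketch of a Livestatus query for those three values could look like the following. It is not the script that was actually used. The socket path is an assumption and depends on how mk-livestatus was installed, and the function names are hypothetical. The column names are the standard ones from the Livestatus `services` table.

```python
import json
import socket

# Assumed socket location; check the broker_module line in nagios.cfg.
LIVESTATUS_SOCKET = "/usr/local/nagios/var/rw/live"

COLUMNS = ("host_name", "description", "execution_time", "last_check", "next_check")


def build_query():
    """Build a Livestatus query for per-service timing data."""
    return (
        "GET services\n"
        "Columns: " + " ".join(COLUMNS) + "\n"
        "OutputFormat: json\n"
        "\n"
    )


def parse_rows(raw):
    """Turn the JSON list-of-lists reply into a list of dicts."""
    return [dict(zip(COLUMNS, row)) for row in json.loads(raw)]


def query_livestatus(path=LIVESTATUS_SOCKET):
    """Send the query over the Unix socket and return the parsed rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(build_query().encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_rows(b"".join(chunks).decode("utf-8"))
```

Calling `query_livestatus()` every 3 minutes and sorting the rows by `next_check` is one way to see whether scheduled times bunch up before a spike.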
Looking at another system (124 services and 43 hosts), I see the same spike behavior happening. Maybe this is pretty normal?
slansing
Re: Load on nagios box spikes every ~7 hours.
I would say that yes, it is normal. If this is impacting the system in a negative way, then we should figure out how to lessen it or disperse it more evenly. This kind of spike is generally caused by backups and by things such as the nom snapshot cron, which actually creates that backup. I observe this on two of my internal test boxes, and have noticed it from as far back as 2 years ago or so.
Re: Load on nagios box spikes every ~7 hours.
OK, thanks for your help. In this case I'm guessing it's not the nom, because its interval is every 24 hours. There must be another job in there doing it, maybe dbmaint's events and commands, as their intervals are around 8 hours. I'll go back to the customer with this information.
Thanks again for your time.
Re: Load on nagios box spikes every ~7 hours.
OK, niebais. Let us know if you hear anything back from the customer.