Very high CPU and RAM utilization

oneida
Posts: 15
Joined: Tue Sep 25, 2012 1:11 pm
Location: New York

Post by oneida »

Hi there,

I'm currently building a Nagios XI server from the ground up (VM, CentOS 6.3). I'm running the latest stable release of Nagios XI and the OS is up to date. As part of this project, I've been consolidating hosts and services that are currently monitored on two separate Nagios Core servers. I have added about 350 hosts and 600 services to Nagios XI and am seeing very high CPU and memory utilization.

Memory:
Of the 2GB allocated to the server, over 1700MB is in use and the system is swapping.

CPU:
I have been receiving very frequent CPU utilization alerts, and when I log on to the server to investigate, I see several processes intermittently consuming a large share of resources. Can anyone help me troubleshoot what's going on?

Our setup is completely stock -- created straight from the VM.

Code:

top - 08:50:46 up 1 day,  1:16,  2 users,  load average: 7.89, 6.00, 5.87
Tasks: 158 total,  16 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.8%us,  3.8%sy,  0.0%ni,  0.0%id, 80.8%wa,  0.0%hi,  9.6%si,  0.0%st
Mem:   1918812k total,  1650208k used,   268604k free,    61448k buffers
Swap:   262136k total,    31452k used,   230684k free,   772084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32346 nagios    20   0 28760 3412  388 R 77.6  0.2   0:02.94 nagios
   16 root      20   0     0    0    0 R  6.3  0.0  66:58.55 kblockd/0
32367 nagios    20   0  172m 5808 2804 R  3.2  0.3   0:00.13 process_perfdat
  417 root      20   0     0    0    0 R  1.4  0.0  23:37.97 jbd2/dm-1-8
32026 postgres  20   0  210m 5180 3696 R  1.4  0.3   0:00.08 postmaster
32011 postgres  20   0  210m 5164 3668 R  0.6  0.3   0:00.88 postmaster
 5928 apache    20   0  435m  26m 4552 R  0.3  1.4  14:00.62 httpd
 6156 apache    20   0  436m  27m 4532 R  0.3  1.4  13:50.23 httpd
31474 nagios    20   0 28764 3968  952 R  0.3  0.2   0:02.40 nagios
32329 root      20   0 15032 1276  928 R  0.3  0.1   0:00.03 top
    1 root      20   0 19360 1328 1040 S  0.0  0.1   0:02.57 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      20   0     0    0    0 R  0.0  0.0   0:09.11 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0  26:16.96 watchdog/0
    7 root      20   0     0    0    0 S  0.0  0.0   1:42.93 events/0
    8 root      20   0     0    0    0 S  0.0  0.0   0:00.00 cgroup
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
   10 root      20   0     0    0    0 S  0.0  0.0   0:00.00 netns
   11 root      20   0     0    0    0 S  0.0  0.0   0:00.00 async/mgr
   12 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pm

High CPU processes:
mysqld
postmaster
kblockd (>16%)
php
httpd (>40%)
flush-253:1
jbd2
process_perfdat (>13%)
watchdog (>17%)
vmtoolsd (>58%)
nagios (>77%)

Code:

Linux 2.6.32-279.14.1.el6.x86_64 (RST-NAGIOSXI-1)       11/17/2012      _x86_64_        (1 CPU)

12:00:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:04 AM     all     50.68      0.00      7.37      5.32      0.00     36.63
12:20:02 AM     all     40.41      0.00      5.93      2.04      0.00     51.62
12:30:01 AM     all     39.75      0.00      5.98      3.27      0.00     51.00
12:40:04 AM     all     38.13      0.00      5.74      1.44      0.00     54.69
12:50:06 AM     all     41.65      0.00      5.99      2.99      0.00     49.37
01:00:01 AM     all     40.68      0.00      6.15      1.47      0.00     51.70
01:10:01 AM     all     39.11      0.00      5.62      3.79      0.00     51.47
01:20:01 AM     all     38.88      0.00      5.86      3.12      0.00     52.15
01:30:04 AM     all     43.41      0.00      6.33      3.08      0.00     47.18
01:40:02 AM     all     40.44      0.00      5.95      1.81      0.00     51.80
01:50:03 AM     all     40.65      0.00      6.24      3.60      0.00     49.52
02:00:02 AM     all     43.88      0.00      6.47      1.39      0.00     48.26
02:10:01 AM     all     41.08      0.00      6.20      1.97      0.00     50.74
02:20:01 AM     all     36.91      0.00      5.52      1.05      0.00     56.52
02:30:01 AM     all     35.73      0.00      5.41      0.89      0.00     57.97
02:40:01 AM     all     35.64      0.00      5.41      1.14      0.00     57.81
02:50:01 AM     all     36.89      0.00      5.81      1.28      0.00     56.02
03:00:01 AM     all     40.58      0.00      6.02      1.99      0.00     51.41
03:10:03 AM     all     40.91      0.00      5.98      1.25      0.00     51.87
03:20:02 AM     all     39.63      0.00      5.91      1.67      0.00     52.79
03:30:01 AM     all     41.21      0.06      6.17     15.58      0.00     36.99
03:40:02 AM     all     50.04      0.08      7.85     34.98      0.00      7.04

03:40:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:50:01 AM     all     43.64      0.00      6.61      2.35      0.00     47.41
04:00:01 AM     all     41.31      0.00      6.41      1.90      0.00     50.38
04:10:01 AM     all     36.92      0.00      5.76      3.00      0.00     54.32
04:20:01 AM     all     38.18      0.00      5.80      3.45      0.00     52.57
04:30:01 AM     all     35.51      0.00      5.43      2.38      0.00     56.68
04:40:03 AM     all     31.15      0.00      5.02      1.49      0.00     62.34
04:50:01 AM     all     32.41      0.00      5.29      1.45      0.00     60.85
05:00:01 AM     all     30.06      0.00      4.79      0.42      0.00     64.72
05:10:01 AM     all     29.38      0.00      4.95      0.89      0.00     64.78
05:20:01 AM     all     27.83      0.00      4.44      0.29      0.00     67.44
05:30:01 AM     all     28.27      0.00      4.45      0.42      0.00     66.86
05:40:01 AM     all     27.81      0.00      4.51      0.43      0.00     67.25
05:50:01 AM     all     27.72      0.00      4.59      0.49      0.00     67.21
06:00:01 AM     all     27.91      0.00      4.46      0.39      0.00     67.24
06:10:02 AM     all     27.77      0.00      4.72      0.89      0.00     66.61
06:20:01 AM     all     27.87      0.00      4.57      0.73      0.00     66.83
06:30:01 AM     all     27.41      0.00      4.34      0.37      0.00     67.88
06:40:01 AM     all     27.70      0.00      4.51      0.39      0.00     67.41
06:50:01 AM     all     27.77      0.00      4.60      0.54      0.00     67.08
07:00:01 AM     all     27.37      0.00      4.30      0.33      0.00     68.00
07:10:01 AM     all     30.25      0.00      4.87      2.12      0.00     62.76
07:20:01 AM     all     29.09      0.00      4.67      0.58      0.00     65.66

07:20:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
07:30:01 AM     all     31.21      0.00      5.09      0.82      0.00     62.87
07:40:01 AM     all     32.87      0.00      5.39      0.79      0.00     60.94
07:50:01 AM     all     34.48      0.00      5.52      1.70      0.00     58.30
08:00:01 AM     all     32.41      0.00      5.27      1.71      0.00     60.61
08:10:01 AM     all     34.35      0.00      5.50      2.20      0.00     57.94
08:20:01 AM     all     37.07      0.00      5.90      1.33      0.00     55.70
08:30:02 AM     all     42.39      0.00      6.69      6.71      0.00     44.21
08:40:01 AM     all     44.53      0.00      6.73      3.09      0.00     45.65
08:50:02 AM     all     54.54      0.00      7.09      6.01      0.00     32.36
09:00:01 AM     all     34.86      0.00      5.47      1.57      0.00     58.10
09:10:01 AM     all     26.92      0.00      4.42      0.32      0.00     68.34
Average:        all     34.92      0.00      5.42      2.26      0.00     57.39

Code:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
10  0  31452 205068  64320 826280    0    0     7   555  496  234 32  5 61  2  0

Linux 2.6.32-279.14.1.el6.x86_64 (RST-NAGIOSXI-1)       11/17/2012      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          31.98    0.00    5.00    2.12    0.00   60.90

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              49.94        13.69      1110.86     964440   78235346
dm-0              0.18         0.47         0.98      33104      68696
dm-1            139.62        12.89      1109.89     908026   78166648
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very high CPU and RAM utilization

Post by scottwilkerson »

Your first top output shows 80.8%wa.

That is pretty severe IO wait time.

Not sure if another system is using the same disk, but you could benefit from a RAM disk:
http://assets.nagios.com/downloads/nagi ... p#boosting
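
To put a number on the IO wait without eyeballing top, something like the sketch below could help. It pulls the `%wa` field out of a top summary line in the format shown in the first post and compares it against a threshold. The function name `iowait_check` and the default threshold of 20 are illustrative assumptions, not Nagios recommendations.

```shell
#!/bin/sh
# Sketch: read a top "Cpu(s):" summary line on stdin and report whether the
# I/O wait share (%wa) is at or above a threshold.
# iowait_check and the default threshold of 20 are hypothetical choices.
iowait_check() {
    threshold="${1:-20}"
    # Capture the number immediately before "%wa", e.g. "80.8" from "80.8%wa".
    wa=$(sed -n 's/.*[ ,]\([0-9][0-9.]*\)%wa.*/\1/p' | head -n 1)
    if [ -z "$wa" ]; then
        echo "no %wa field found"
        return 2
    fi
    # awk handles the floating-point comparison that plain sh cannot.
    if awk -v wa="$wa" -v t="$threshold" 'BEGIN { exit !(wa >= t) }'; then
        echo "HIGH iowait: ${wa}%"
        return 1
    fi
    echo "OK iowait: ${wa}%"
}

# Live usage (not run here): top -bn1 | grep 'Cpu(s)' | iowait_check 20
```

Newer versions of top print the field as `80.8 wa` instead of `80.8%wa`, so the pattern would need adjusting there.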
Former Nagios employee
oneida
Posts: 15
Joined: Tue Sep 25, 2012 1:11 pm
Location: New York

Re: Very high CPU and RAM utilization

Post by oneida »

Thank you very much for your assistance. I have implemented a RAM disk as suggested for objects.cache, status.dat, the temp path, check results, and performance data.
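
For anyone following along, the relocation described above might look roughly like the output of this sketch. It only prints `nagios.cfg` overrides for review and changes nothing on disk. The directive names are standard Nagios Core main-config options; the mount point `/var/nagiosramdisk`, the file and directory names under it, the tmpfs size, and the function name `ramdisk_overrides` are assumptions for illustration, so check them against your own install.

```shell
#!/bin/sh
# Sketch: print nagios.cfg overrides that point volatile files at a RAM disk.
# Nothing is written or mounted; the output is meant to be reviewed by hand.
# /var/nagiosramdisk and the file names are assumed, not taken from the thread.
ramdisk_overrides() {
    dir="${1:-/var/nagiosramdisk}"
    cat <<EOF
object_cache_file=$dir/objects.cache
status_file=$dir/status.dat
temp_path=$dir/tmp
check_result_path=$dir/spool/checkresults
host_perfdata_file=$dir/host-perfdata
service_perfdata_file=$dir/service-perfdata
EOF
}

# The RAM disk itself would be a tmpfs mount, for example (as root, not run
# here; the 256m size is an arbitrary placeholder):
#   mount -t tmpfs -o size=256m tmpfs /var/nagiosramdisk
```

Because tmpfs contents vanish on reboot, the subdirectories have to be recreated at boot before Nagios starts.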

RAM utilization seems to have improved (at least from the results I'm looking at now); it's only using about 1GB at the moment. CPU and IO wait still tend to spike quite a bit, though.

CPU utilization seems to spike the most for the httpd and postmaster processes every few seconds:

Code:

top - 15:02:40 up 34 min,  3 users,  load average: 1.26, 1.10, 0.75
Tasks: 170 total,   1 running, 169 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.3%us,  0.7%sy,  0.0%ni, 98.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3921112k total,   957368k used,  2963744k free,    35120k buffers
Swap:  1048568k total,        0k used,  1048568k free,   210244k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1426 root      20   0     0    0    0 S  0.7  0.0   0:15.97 flush-253:1
  439 root      20   0     0    0    0 S  0.3  0.0   0:02.66 jbd2/dm-1-8
 1428 root      20   0  165m 3768 2908 S  0.3  0.1   0:04.59 vmtoolsd
 1861 mysql     20   0 1152m  50m 4596 S  0.3  1.3   0:43.02 mysqld
 1908 postgres  20   0  208m 1276  564 S  0.3  0.0   0:00.68 postmaster
 2118 nagios    20   0 32580 5688  960 S  0.3  0.1   0:11.95 nagios
12840 root      20   0 15032 1308  932 R  0.3  0.0   0:00.49 top
    1 root      20   0 19360 1552 1232 S  0.0  0.0   0:01.60 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0

As for IO wait, it also spikes to fairly high values intermittently. It is infrequent, but when it happens it seems to be tied to the process_perfdat process. Moving its writes to the RAM disk seems to have helped a bit, though.

Any ideas/suggestions for what's going on with the CPU spikes?
Last edited by oneida on Tue Nov 20, 2012 3:02 pm, edited 1 time in total.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very high CPU and RAM utilization

Post by scottwilkerson »

This definitely looks better.

Another thing you could do to reduce the load from the httpd process is install an opcode cache, which avoids recompiling PHP scripts on every page load.

To try this, run:

Code:

yum install php-pecl-apc -y
service httpd restart
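
To confirm the cache actually loaded after the restart, one option is to look for it in PHP's module list. This is a minimal sketch, assuming the command-line `php` binary reads the same configuration as Apache's PHP module (which is not guaranteed on every setup). The helper name `has_php_module` is made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if the named module appears in a `php -m` style listing
# supplied on stdin (one module name per line). has_php_module is hypothetical.
has_php_module() {
    # -i: ignore case, -q: quiet, -x: match the whole line, -F: fixed string
    grep -iqxF -- "$1"
}

# Live usage (not run here):
#   php -m | has_php_module apc && echo "APC is loaded"
```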
oneida
Posts: 15
Joined: Tue Sep 25, 2012 1:11 pm
Location: New York

Re: Very high CPU and RAM utilization

Post by oneida »

Thank you for the assistance. To address the CPU, RAM, disk capacity, and disk IO concerns, I have:

- Created a RAM disk
- Installed rrdcached
- Installed opcode cache
- Temporarily disabled logging in the PHP and HTTP configs (timezone errors had filled the disk to 100%) and deleted the large log files
- Set the timezone properly in the PHP config
- Increased memory to 4GB
- Increased swap to 1GB
- Increased disk capacity to 100GB
- Added an additional CPU core

While IO occasionally jumps pretty high and CPU hits around 50%, this is still a big improvement over what I was seeing before. I guess with this many host/service checks and constant database reads/writes, high IO and CPU utilization is to be expected.

Code:

top - 15:02:40 up 34 min,  3 users,  load average: 1.26, 1.10, 0.75
Tasks: 170 total,   1 running, 169 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.7%us,  0.7%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.3%us,  0.7%sy,  0.0%ni, 98.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3921112k total,   957368k used,  2963744k free,    35120k buffers
Swap:  1048568k total,        0k used,  1048568k free,   210244k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1426 root      20   0     0    0    0 S  0.7  0.0   0:15.97 flush-253:1
  439 root      20   0     0    0    0 S  0.3  0.0   0:02.66 jbd2/dm-1-8
 1428 root      20   0  165m 3768 2908 S  0.3  0.1   0:04.59 vmtoolsd
 1861 mysql     20   0 1152m  50m 4596 S  0.3  1.3   0:43.02 mysqld
 1908 postgres  20   0  208m 1276  564 S  0.3  0.0   0:00.68 postmaster
 2118 nagios    20   0 32580 5688  960 S  0.3  0.1   0:11.95 nagios
12840 root      20   0 15032 1308  932 R  0.3  0.0   0:00.49 top
    1 root      20   0 19360 1552 1232 S  0.0  0.0   0:01.60 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Very high CPU and RAM utilization

Post by slansing »

Looks good. You will continue to see a bump in load whenever Nagios reaps and processes check results; this is normal.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Very high CPU and RAM utilization

Post by mguthrie »

I would also suggest a manual vacuum on PostgreSQL. If the tables start getting cluttered, it can eat up a lot of CPU:
http://support.nagios.com/wiki/index.ph ... .22_in_log
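
A manual vacuum could be scripted along these lines. This is a sketch only: the database name and role `nagiosxi` are assumptions about a default Nagios XI install, and `vacuum_db` is a hypothetical wrapper, so follow the linked wiki page for the supported procedure.

```shell
#!/bin/sh
# Sketch: run a plain VACUUM ANALYZE against one PostgreSQL database via psql.
# vacuum_db is a hypothetical helper; "nagiosxi" as both database and role
# name is an assumption, not something stated in this thread.
vacuum_db() {
    db="${1:-nagiosxi}"
    role="${2:-nagiosxi}"
    # -U: role to connect as, -d: database, -c: run one command and exit
    psql -U "$role" -d "$db" -c 'VACUUM ANALYZE;'
}

# Live usage (not run here): vacuum_db nagiosxi nagiosxi
```

Plain `VACUUM` can run alongside normal reads and writes, whereas `VACUUM FULL` takes an exclusive lock on each table while it works, so the plain form is the gentler choice on a live monitoring server.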