
System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 8:53 am
by Symfoni
We have a CentOS installation (version 5.5, 32-bit) with Nagios XI (version 2009R1.3G) installed.
The machine it runs on has a dual-core Intel Xeon 3065 at 2.33GHz and 4GB of RAM.
I installed Nagios XI following the step-by-step guide, and got no errors other than ones caused by my own typos and the like.

Currently, there are 71 hosts with 644 services being monitored, including localhost.
Most, if not all, checks "came in the box" and are not homegrown.

'uptime' gives "load average: 28.06, 27.04, 25.53".
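As a rough yardstick, the Linux load average counts tasks that are running, waiting for a CPU, or blocked in uninterruptible (usually disk) I/O, so dividing it by the number of CPUs gives an approximate queue length per CPU. The helper below is a hypothetical sketch (the name `load_per_cpu` is made up) that reads `uptime` output on stdin. Assuming the kernel sees the Xeon 3065 as two CPUs, a 1-minute load of 28.06 works out to about 14 tasks per CPU, far above the roughly 1 per CPU of a comfortably loaded box.

```shell
# Hypothetical helper: divide the 1-minute load average reported by
# `uptime` (read from stdin) by the CPU count passed as $1.
load_per_cpu() {
  ncpu=$1
  awk -v n="$ncpu" -F'load average: ' '
    NF > 1 {
      split($2, avg, ", ")   # avg[1] = 1-min, avg[2] = 5-min, avg[3] = 15-min
      printf "%.2f\n", avg[1] / n
    }'
}
```

Usage: `uptime | load_per_cpu "$(grep -c '^processor' /proc/cpuinfo)"`. Because disk waits count toward the load, a high value can also point at I/O contention rather than pure CPU work.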

'free -m' shows 3927 MB total, 3008 MB used and 919 MB free. And that is after writing '3' to /proc/sys/vm/drop_caches (it was '0') to free up unused pagecache, dentries and inode caches.
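With the procps `free` shipped on systems of that era, the `used` figure on the `Mem:` line counts buffers and page cache as used; the `-/+ buffers/cache` line underneath is the better measure of what processes actually hold. A minimal sketch of the same arithmetic straight from `/proc/meminfo`, where the function name `app_used_mb` is hypothetical:

```shell
# Hypothetical helper: memory held by processes, in MB, computed as
# MemTotal - MemFree - Buffers - Cached from /proc/meminfo text on stdin.
# The kB values are divided by 1024 and truncated to whole MB.
app_used_mb() {
  awk '
    /^MemTotal:/ { total = $2 }
    /^MemFree:/  { free = $2 }
    /^Buffers:/  { buffers = $2 }
    /^Cached:/   { cached = $2 }   # anchored, so SwapCached: is not matched
    END { print int((total - free - buffers - cached) / 1024) }'
}
```

Usage: `app_used_mb < /proc/meminfo`. If this stays near 3 GB right after dropping caches, the memory is genuinely held by processes (for example many database backends) rather than by the page cache.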

'top' names the command 'postmaster', run by user 'postgres', as the main culprit. Going by a screenshot taken a minute ago, 'postmaster' accounts for 12 of the 16 most CPU-hungry processes: 3 of them at a little over 10% CPU, 3 at a little over 15% and 1 at over 20%. The busiest postmaster process has at times been around 40%, and may have gone higher when I wasn't watching. The exact percentages and the number of postmaster processes near the top vary, but there are always at least 3 postmaster processes among the top 5 by CPU%, and several times I've seen postmaster account for the top 8 to 10 processes.
That said, 'mysqld' (user mysql) and 'httpd' (user apache) also use a fair amount of CPU.
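To put a single number on how much CPU each program is taking, one option is to sum %CPU per command name instead of counting rows in `top`. A minimal sketch with a hypothetical function name; note that `ps` reports %CPU as CPU time divided by the time the process has been running, so its figures are smoother than the instantaneous ones `top` shows.

```shell
# Hypothetical helper: total %CPU per command name, highest first.
# Expects `ps -eo pcpu,comm` output (including its header line) on stdin;
# only the first word of the command name is used as the key.
cpu_by_command() {
  awk 'NR > 1 { total[$2] += $1 }
       END { for (cmd in total) printf "%.1f %s\n", total[cmd], cmd }' |
    LC_ALL=C sort -rn
}
```

Usage: `ps -eo pcpu,comm | cpu_by_command | head`. Comparing the postmaster, mysqld and httpd totals this way makes before/after checks easier than reading screenshots.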

The web console is painfully slow at times and merely very slow the rest of the time, and applying changes can take several minutes.
SSH-ing into the box can be quite sluggish, but is not quite a nightmare.

I've had a look at the FAQ at http://support.nagios.com/wiki/index.php/Nagios_XI:FAQs and can't seem to find anything quite fitting our problems. I've also had a look at http://nagios.sourceforge.net/docs/3_0/tuning.html, but couldn't find anything there related to postgres tuning.

Is there an easily fixable reason for postgres using up so much cpu?

Re: System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 10:28 am
by mguthrie
That definitely seems a bit unusual. On much larger installations, mysql is usually the #1 CPU consumer, followed by apache. The machine you're using should be plenty for the load you have. Does restarting postgres, or the server itself, change anything?

Code:

service postgresql restart

Re: System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 11:14 am
by Symfoni
I had already tried rebooting the machine before, with no improvement.

As you suggested, I tried restarting the postgresql service, but the postmaster processes are again using huge amounts of CPU.
Rebooting didn't help this time either.

I could be wrong, but looking at the output of 'top', httpd and mysqld seem to have climbed slightly in CPU%, though they're still nowhere near the postmaster processes.

Re: System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 11:43 am
by tonyyarusso
Looking for more detailed info, but you could try this in the meantime:
Stop all of the following daemons:
  • nagiosxi
  • npcd
  • ndo2db
  • nagios
  • mysqld
  • postgresql
  • httpd
Start up postgresql again, but not the others. Take a look at the CPU usage. Is it high still? Wait ten or fifteen minutes and look again. Is it the same?
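The isolation test above can be scripted roughly as follows. This is a sketch, assuming each daemon in the list has an init script of the same name usable with `service` and that it is run as root; `isolate_postgres` is a made-up name.

```shell
# Hypothetical helper: stop the Nagios XI stack in the order listed above,
# then start PostgreSQL on its own. A daemon that is already stopped just
# reports so (as npcd did); the loop carries on regardless.
isolate_postgres() {
  for svc in nagiosxi npcd ndo2db nagios mysqld postgresql httpd; do
    service "$svc" stop
  done
  service postgresql start
}
```

After running it, watch 'top' for ten or fifteen minutes as described; if postmaster stays quiet, the load is coming from its clients rather than from PostgreSQL's own background processes.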

Re: System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 12:31 pm
by Symfoni
Stopping the nagiosxi daemon gave no output, and stopping npcd gave "NPCD was not running." The others reported that the daemons had been stopped, and I couldn't see anything with those names, or similar ones, in 'top' anymore.

Moments after starting the postgresql daemon up again, these lines appeared in my 'watch "ps ax|grep post"':
2646 ? S 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
2648 ? S 0:00 postgres: logger process
2650 ? S 0:00 postgres: writer process
2651 ? S 0:00 postgres: stats buffer process
2652 ? S 0:00 postgres: stats collector process

Those 5 processes didn't budge over about 30 minutes, though I did catch a glimpse of a process very similar to the "/usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data" one before it went away again. The only difference I could see was that it had "-p 5434" instead of "-p 5432".

I also had 'top' running in another SSH session, and postmaster was nowhere to be seen except once, when it popped up at the top of the list, but this time using less than 1% CPU, barely beating 'top' itself.

Re: System is slow, CPU usage skyhigh

Posted: Thu Jan 13, 2011 1:47 pm
by tonyyarusso
Okay, that's a good start. Now, try starting things one by one, in order, with a few minutes between each in which you again monitor 'top' to see if anything interesting happens.
  • httpd
  • mysqld
  • nagios
  • ndo2db
  • npcd
  • nagiosxi
Did anything significantly catch your eye in that process? If so, on which step(s)?
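A sketch of the step-by-step restart, assuming the same init-script names and root privileges; `start_stepwise` is a made-up name, and five minutes is just one reading of "a few minutes". It takes one batch-mode `top` snapshot per step rather than watching continuously, so short spikes can be missed.

```shell
# Hypothetical helper: start the stack one daemon at a time, in the order
# suggested above, wait, then print the first lines of a single batch-mode
# `top` snapshot. $1 is the pause in seconds (default 300).
start_stepwise() {
  pause=${1:-300}
  for svc in httpd mysqld nagios ndo2db npcd nagiosxi; do
    service "$svc" start
    sleep "$pause"
    echo "=== $svc started ${pause}s ago ==="
    top -b -n 1 | head -n 20
  done
}
```

Redirecting the output to a file (for example `start_stepwise > steps.log`) keeps a record of which step changed the picture.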

If nothing did, then we may need to look into some scheduled maintenance task configuration for you.

Re: System is slow, CPU usage skyhigh

Posted: Fri Jan 14, 2011 8:11 am
by Symfoni
I stopped the services in the list from your first post in this topic, in the order listed, except for postgresql, which I didn't stop at all.
The number of postmaster processes didn't change much as I stopped the first 5 services, though the number of SELECT/UPDATE operations shown in my 'watch "ps ax|grep post"' dropped to almost zero. The biggest change came when I stopped httpd: postmaster disappeared from 'top' completely, and 'watch "ps ax|grep post"' went down to the basic 5 lines, the same ones I listed in my last post.

When starting the services up again, starting httpd left 'top' showing between 2 and 10 httpd processes among the top CPU% consumers, each using between 1.0% and 9.0% CPU.
'watch "ps ax|grep post"' got ten new lines of output, all similar to the one below apart from the PID and, I assume, the port number:
16598 ? S 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(38589) idle

Starting mysqld sent a few postmaster processes into the 'top' output. Postmaster immediately started hogging the top places: a random screenshot I took had postmaster in the top 5 places, using between 11.8% and 45.45% CPU, and there could be anywhere from 2 to 10 postmaster processes near the top.
It also added a few more lines to 'watch "ps ax|grep post"', similar to the one I pasted above but actually doing work and using CPU time:
18526 ? S 0:21 postgres: nagiosxi nagiosxi 127.0.0.1(48751) UPDATE
18535 ? R 0:22 postgres: nagiosxi nagiosxi 127.0.0.1(48753) SELECT
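With many connections, eyeballing 'ps ax|grep post' gets tedious. A sketch (the function name is hypothetical) that counts client backends by the activity shown at the end of their process title, such as idle, SELECT or UPDATE, while skipping the postmaster itself, the logger/writer/stats processes and the grep. It assumes titles of the form shown in the lines above.

```shell
# Hypothetical helper: count PostgreSQL client backends by activity.
# Reads `ps ax` output on stdin and assumes titles of the form
#   postgres: USER DATABASE HOST(PORT) ACTIVITY...
# Auxiliary processes such as "postgres: writer process" have fewer
# fields after "postgres:" and are skipped.
tally_pg_backends() {
  awk '{
    for (i = 1; i <= NF; i++) if ($i == "postgres:") break
    if (i > NF || NF - i < 4) next      # not a client backend line
    activity = $(i + 4)
    for (j = i + 5; j <= NF; j++) activity = activity " " $j
    count[activity]++
  }
  END { for (a in count) print count[a], a }' | LC_ALL=C sort -k2
}
```

Usage: `ps ax | tally_pg_backends`. A high count of non-idle backends right after starting a given daemon points at that daemon's clients as the source of the load.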

After starting ndo2db I couldn't see much difference in the 'top' output (a few more postmaster processes seemed to appear, but that could be coincidence), but 'watch "ps ax|grep post"' did show more lines like the two in the previous paragraph.
Starting npcd also seems to have pushed up a couple of postmaster processes, but there was no huge difference here either.
The same goes for starting nagiosxi.

To me it seems that starting httpd and especially mysqld had the most significant results.

Perhaps a little off-topic, but a few days ago, while googling for ideas about what might be causing the postmaster CPU usage, I came across a mailing list post that mentioned problems between certain versions of PHP and PostgreSQL that could result in extreme CPU usage. Could this be the same problem? I can't remember the specifics of the post, but I can find it again if it would help.

Re: System is slow, CPU usage skyhigh

Posted: Fri Jan 14, 2011 10:21 am
by mguthrie
Go ahead and send us that link if you come across it. Those things are always helpful to know about.

How many users would you say actively use the XI interface? The nagios monitoring engine should still be able to run without httpd if you wanted to do a performance comparison. Most of the tables related to NagiosXI are used more for the web interface than anything else. Postgres stores a lot of the session, user data, dashlets, etc for the web interface. MySQL is the DB used for ndoutils and the Core Config Manager (ndoutils being the bigger performance grabber).

Re: System is slow, CPU usage skyhigh

Posted: Mon Jan 17, 2011 8:10 am
by Symfoni
The page about the CPU problem with PHP and PostgreSQL was from 2001, not 2010 as I'd previously read it, so it's too outdated to be of any help. Sorry about that!

There are only a couple of users logged in, but they have a few Firefox windows open. We have 6 windows open on a network status display in the office (no automatic browser refresh, only the refresh built into the Nagios XI web UI), and then maybe two or three people with a couple of pages open.
In total, about 10-15 pages from the nagiosxi web UI at most.

We're still in the process of adding all our hosts to be monitored by the nagiosxi installation, and then there's the tweaking of what services to monitor and which to notify on and such, not to mention using it for actually seeing what's going on in our network, so we use the web UI most days. Taking the web UI down permanently would not be a viable solution for us.

As far as I can tell, the monitoring and reporting work fine; we get notified when things go down or reach a critical limit.
The web UI, however, is much too slow. That is the only big problem we've encountered so far.

Re: System is slow, CPU usage skyhigh

Posted: Mon Jan 17, 2011 10:43 am
by mguthrie
Symfoni wrote: "Taking the web UI down permanently would not be a viable solution for us."
Certainly not, we were simply hoping to isolate where the bulk of the CPU usage was coming from.

We're going to run some tests to try to recreate an issue like this. We'll keep you posted on what we find.

Just out of curiosity, are you using group dashlets quite a bit?