High CPU
I too can't seem to stop the localhost Current Load check from being flagged on Nagios XI 2011R1.2. I have about 300 hosts and 1250 services being monitored. We keep throwing more RAM and CPU at it, but it just sucks it all up, basically says 'thanks', and then triggers the current load error again.
We currently have 4 processors and 8GB of RAM in a virtual machine feeding this XI environment. We currently have roughly 300 hosts and 1300 services.
Any thoughts?
Re: High CPU
Could you post the relevant info regarding your setup here?
Thank you.
From XI Staff To Our Customers,
In order to give you the best support possible, we ask that you submit your support requests using the following guidelines. These guidelines are intended to reduce the resolution time for your issues.
For all support requests, we need to know:
- Linux distribution and version?
- 32 or 64bit?
- VMware image or manual install of XI?
- Are there special configurations on your system, e.g., is Gnome installed? Are you using a proxy? Are you using SSL?
- **If you are encountering multiple issues that may not be related, start a thread for each issue
For Installation Issues:
- Above information
- For Redhat installs, you need to be registered with the RHN (Redhat Network) in order to have full access to their repos. XI will not be able to install correctly without full repo access since several critical packages depend on this.
- Any error output noticed during installation, and what scripts were being run when errors were noticed.
- Verify that both mysql and postgresql are installed and running after the "3-dbservers" script. Send us the output from the following commands:
Code: Select all
service mysqld restart
service postgresql restart
Re: High CPU
We have the same issue. Our host load will report critical sometimes, and the SMTP service for localhost will show a socket timeout. Prior to the 2011 install we didn't have this issue.
Relevant information:
NagiosXI 2011R1.1
Linux 2.6.18-194.11.3.el5 #1 SMP Mon Aug 30 16:23:24 EDT 2010 i686 i686 i386 GNU/Linux, installed from
32 Bit
VMware image downloaded from the NagiosXI downloads
No special configs, gnome, no proxy, no SSL; basically a default install
38 hosts
180 services
Re: High CPU
mtkaschools,
Here's a Doc on understanding what affects XI performance:
http://library.nagios.com/library/produ ... erformance
It should probably be noted that mysql is fairly opportunistic with RAM: it will cache as much memory as it can (up to about 90-95%) if that memory is not already in use. Not all of it is actually being used; think of it as being on standby as needed. CPU load is mostly affected by the factors mentioned in the Doc above.
r.jaynes,
Your issue may be different, since you don't have a very large number of checks running. Can you give us more detail on where you're seeing the SMTP timeout, what the 15-minute CPU load average is, and how much CPU power you have?
Re: High CPU
The SMTP timeout shows up under the localhost services in NagiosXI, as a critical status. Currently the 15-minute load average is 3.06. The VM is running on VMware ESX 4.0, and has 2048 MHz assigned to it with 512 MB of RAM.
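A load average is easier to judge once it is divided by the number of CPUs available to the VM. The sketch below is only an illustration of that arithmetic: `load_per_cpu` is a hypothetical helper name, and the single-vCPU figure in the example is an assumption, since the thread does not state how many vCPUs the VM has.

```shell
#!/bin/sh
# load_per_cpu LOAD NCPU: print the load average divided by the CPU count,
# rounded to two decimals. (Hypothetical helper, not part of Nagios XI.)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example with the 15-minute figure quoted above, ASSUMING one vCPU.
load_per_cpu 3.06 1

# On a live Linux box the inputs could be read like this (not run here):
#   load15=$(cut -d' ' -f3 /proc/loadavg)
#   ncpu=$(grep -c '^processor' /proc/cpuinfo)
#   load_per_cpu "$load15" "$ncpu"
```

A result that stays well above 1.00 suggests more runnable work than the CPUs can absorb, which would be consistent with the Current Load check going critical.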
Re: High CPU
I'll have to do some hunting to see if there were any relevant updates that might affect SMTP on your system. Your CPU load does seem high for that number of checks. I'd like to have you try a few things.
Let's make sure there are no corrupted tables in mysql:
http://library.nagios.com/library/produ ... i-database
Let's keep the tables trimmed so they don't bog down the system:
http://library.nagios.com/library/produ ... timization
We made some important updates and bug fixes in 2011R1.2, so I definitely recommend upgrading to that when you're able.
Restart the server.
Let me know if you see any changes on your system.
Re: High CPU
Thank you for the help. This morning I've done the following:
1) Upgraded to 2011R1.2
2) Rebooted the VM
3) Repaired the tables per the PDF you linked (did not truncate yet)
4) Checked the values for the Performance->Database settings. They are all default and match the guide, except for the one in the guide that says "Repair Interval: 0". Our default value for this is "Optimize Interval: 60". Is that the same thing?
5) Noticed that VMware tools had not been configured. Ran the config utility, let it install the vmxnet driver, etc.
6) Upgraded the version of VMware tools to the latest version (successful upgrade, no issues)
7) Went back to the tables repair guide, truncated the two tables listed, and reran the repair script
Currently NagiosXI is sitting at 3.27 1-min, 3.38 5-min, 3.33 15-min.
While writing this post, I noticed the load average go high:
Code: Select all
Host       Service       Status    Duration  Attempt  Last Check           Status Information
localhost  Current Load  Critical  10m 36s   4/4      2011-05-06 11:48:58  CRITICAL - load average: 10.09, 11.08, 7.94
           SMTP          Critical  5m 8s     5/5      2011-05-06 11:48:32  Connection refused
Re: High CPU
One more thing: I'm not sure if this is relevant, but the user "postgres" currently has 26 running processes for the command "postmaster".
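For anyone wanting to track that process count over time rather than reading it off top, a minimal sketch follows. `count_command` is a hypothetical helper that counts matching command names from standard input; the sample input is made up for illustration. PostgreSQL starts a separate server process for each client connection, so the count may largely reflect how many connections are open.

```shell
#!/bin/sh
# count_command NAME: read command names on stdin (one per line) and print
# how many lines equal NAME exactly. (Hypothetical helper for illustration.)
count_command() {
    awk -v name="$1" '$0 == name { n++ } END { print n + 0 }'
}

# Made-up sample input, just to show the behaviour.
printf 'postmaster\npostmaster\nsshd\n' | count_command postmaster

# On the monitoring server itself (not run here):
#   ps -u postgres -o comm= | count_command postmaster
```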
Re: High CPU
We had another user report something similar. I had him try the following, and it seemed to have positive results. This is pulled from the following thread (solution towards the end).
http://support.nagios.com/forum/viewtop ... =16&t=1494
Let's try running some queries against the postgresql database and see if anything stalls out. I suspect there is damage somewhere in the postgres database, but it's hard to say for sure. As of yet we haven't had this issue reported by anyone else and we haven't been able to replicate it, so it's hard to pinpoint exactly. Try running the queries below, and take note of any error messages, or of any query that takes more than 2 or 3 seconds.
Code: Select all
psql nagiosxi nagiosxi
\d
select count(*) from xi_commands;
select count(*) from xi_events;
select count(*) from xi_meta;
select count(*) from xi_options;
select count(*) from xi_sysstat;
select count(*) from xi_usermeta;
select count(*) from xi_users;
The maintenance and cleaning commands are below; you can try running these as well. You'll get some warnings about not having permissions to some of the built-in postgres tables (those are normal), but post any error messages that might imply table damage or corruption.
Code: Select all
vacuum;
vacuum analyze;
vacuum full;
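The count queries above differ only in the table name, so they can be generated rather than typed one by one. This is a rough sketch, not an official procedure: `count_queries` and `XI_TABLES` are names invented here, and the table list is simply copied from the queries in this post.

```shell
#!/bin/sh
# Table names copied from the queries listed above.
XI_TABLES="xi_commands xi_events xi_meta xi_options xi_sysstat xi_usermeta xi_users"

# count_queries TABLE...: print one "select count(*)" statement per table.
# (Hypothetical helper for illustration.)
count_queries() {
    for t in "$@"; do
        printf 'select count(*) from %s;\n' "$t"
    done
}

# XI_TABLES is left unquoted on purpose so it splits into separate words.
count_queries $XI_TABLES

# To feed the statements to psql with per-statement timing (not run here):
#   { printf '%s\n' '\timing'; count_queries $XI_TABLES; } | psql nagiosxi nagiosxi
```

With psql's `\timing` switched on, each statement reports its own elapsed time, which makes a query that exceeds the 2 or 3 second mark easy to spot.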
Re: High CPU
I ran all of the commands manually, and nothing seemed to take too long, including the vacuum commands (I did receive the warnings as you noted). Next, I ran all of the commands through "time". The three code blocks below show, in order, an example invocation, the output of the select commands, and the output of the vacuum commands (second run).
Running the vacuum commands a second time was visibly faster than the first time. There are still 26 "postmaster" processes listed in top, and currently my load average is "load average: 4.83, 4.19, 3.83".
Code: Select all
time psql nagiosxi nagiosxi -c "select count(*) from xi_users;"
Code: Select all
[root@monitor mail]# time psql nagiosxi nagiosxi -c "\d"
List of relations
Schema | Name | Type | Owner
--------+-----------------------------+----------+----------
public | if_command_id_seq | sequence | nagiosxi
public | if_meta_id_seq | sequence | nagiosxi
public | if_option_id_seq | sequence | nagiosxi
public | if_sysstat_id_seq | sequence | nagiosxi
public | if_user_id_seq | sequence | nagiosxi
public | if_usermeta_id_seq | sequence | nagiosxi
public | xi_commands | table | nagiosxi
public | xi_commands_command_id_seq | sequence | nagiosxi
public | xi_events | table | nagiosxi
public | xi_events_event_id_seq | sequence | nagiosxi
public | xi_meta | table | nagiosxi
public | xi_meta_meta_id_seq | sequence | nagiosxi
public | xi_options | table | nagiosxi
public | xi_options_option_id_seq | sequence | nagiosxi
public | xi_sysstat | table | nagiosxi
public | xi_sysstat_sysstat_id_seq | sequence | nagiosxi
public | xi_usermeta | table | nagiosxi
public | xi_usermeta_usermeta_id_seq | sequence | nagiosxi
public | xi_users | table | nagiosxi
public | xi_users_user_id_seq | sequence | nagiosxi
(20 rows)
real 0m0.036s
user 0m0.004s
sys 0m0.012s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_commands;"
count
-------
1
(1 row)
real 0m0.284s
user 0m0.003s
sys 0m0.011s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_events;"
count
-------
845
(1 row)
real 0m0.041s
user 0m0.006s
sys 0m0.011s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_meta;"
count
-------
877
(1 row)
real 0m0.037s
user 0m0.004s
sys 0m0.014s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_options;"
count
-------
37
(1 row)
real 0m0.029s
user 0m0.006s
sys 0m0.009s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_sysstat;"
count
-------
16
(1 row)
real 0m0.029s
user 0m0.005s
sys 0m0.012s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_usermeta;"
count
-------
250
(1 row)
real 0m0.113s
user 0m0.005s
sys 0m0.011s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "select count(*) from xi_users;"
count
-------
9
(1 row)
real 0m0.026s
user 0m0.003s
sys 0m0.010s
Code: Select all
[root@monitor mail]# time psql nagiosxi nagiosxi -c "vacuum;"
WARNING: skipping "pg_authid" --- only table or database owner can vacuum it
WARNING: skipping "pg_tablespace" --- only table or database owner can vacuum it
WARNING: skipping "pg_pltemplate" --- only table or database owner can vacuum it
WARNING: skipping "pg_shdepend" --- only table or database owner can vacuum it
WARNING: skipping "pg_auth_members" --- only table or database owner can vacuum it
WARNING: skipping "pg_database" --- only table or database owner can vacuum it
VACUUM
real 0m0.783s
user 0m0.005s
sys 0m0.008s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "vacuum analyze;"
WARNING: skipping "pg_authid" --- only table or database owner can vacuum it
WARNING: skipping "pg_tablespace" --- only table or database owner can vacuum it
WARNING: skipping "pg_pltemplate" --- only table or database owner can vacuum it
WARNING: skipping "pg_shdepend" --- only table or database owner can vacuum it
WARNING: skipping "pg_auth_members" --- only table or database owner can vacuum it
WARNING: skipping "pg_database" --- only table or database owner can vacuum it
VACUUM
real 0m0.573s
user 0m0.006s
sys 0m0.011s
[root@monitor mail]# time psql nagiosxi nagiosxi -c "vacuum full;"
WARNING: skipping "pg_authid" --- only table or database owner can vacuum it
WARNING: skipping "pg_tablespace" --- only table or database owner can vacuum it
WARNING: skipping "pg_pltemplate" --- only table or database owner can vacuum it
WARNING: skipping "pg_shdepend" --- only table or database owner can vacuum it
WARNING: skipping "pg_auth_members" --- only table or database owner can vacuum it
WARNING: skipping "pg_database" --- only table or database owner can vacuum it
VACUUM
real 0m0.206s
user 0m0.002s
sys 0m0.015s
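Checking each "real" value against the 2 to 3 second guideline by eye gets tedious, so here is a small sketch of how that comparison might be scripted. `real_to_seconds` and `is_slow` are hypothetical helper names, and the 2-second limit is just the lower end of the range mentioned earlier in the thread.

```shell
#!/bin/sh
# real_to_seconds VALUE: convert a "real" figure such as 0m0.284s to seconds.
real_to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# is_slow SECONDS LIMIT: succeed (exit 0) when SECONDS is greater than LIMIT.
is_slow() {
    awk -v s="$1" -v limit="$2" 'BEGIN { exit !(s + 0 > limit + 0) }'
}

# Example using the "real" time of the first vacuum run shown above.
secs=$(real_to_seconds 0m0.783s)
if is_slow "$secs" 2; then
    echo "$secs s: slow"
else
    echo "$secs s: ok"
fi
```

None of the timings posted above come close to the limit, which matches the observation that nothing seemed to take too long.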