Nagios Support Forum

Posted: **Mon Feb 08, 2010 5:07 pm**

I'm currently evaluating Nagios XI (2009R1.1E) and have seen a few issues; I'm not sure if it's just my setup or something else.

1) When viewing the "Tactical Overview" and clicking on a problem, nothing shows up. If I then click on the "Status Summary" desklet I can get to the problem resource.
2) If I leave my Nagios Core configs in etc/static, my several hundred hosts and several thousand services appear and seem ok. If I instead import the configs, only about 3/4 show up.
3) Most troubling: I removed the VM, reinstalled, and tried using the configuration wizards to create my configs from scratch. I've added only 30 switches (nothing else), and the load on the box is steady at 30 (system CPU is above 80%).

I set use_large_installation_tweaks=1 and restarted, but it appeared to have no effect. Suggestions? What do I need to check?

Posted: **Wed Feb 10, 2010 11:41 pm**

1. There was a bug with the Tactical Overview screen links that was fixed in the most recent update (2009R1.1F). Install the update and see if that resolves the issues you noticed.

2. Do the services show up eventually? They may not show up immediately if they haven't been checked yet. You can verify the total number of hosts/services that Nagios Core is monitoring by running the following command at the prompt:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

3. Run "top" for a bit and see what's taking up the most CPU. A few components could be using cycles: MRTG, RRDtool, MySQL, Postgres, and PHP (the web interface).
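For point 2, if you want to compare the counts between the etc/static setup and the imported configs, the pre-flight output can be parsed rather than read by eye. This is a minimal Python sketch that assumes the verification output contains summary lines such as `Checked 8 services.` and `Checked 1 hosts.` (the exact wording may differ between Nagios Core versions); `object_counts` and `run_preflight` are hypothetical helper names, and the default paths are the ones from the command above.

```python
import re
import subprocess

# Assumed format of the summary lines printed by `nagios -v`,
# e.g. "Checked 8 services." or "Checked 1 host groups."
CHECKED_RE = re.compile(r"Checked\s+(\d+)\s+([A-Za-z ]+?)\.")


def object_counts(verify_output):
    """Map object type (e.g. 'hosts', 'services') to the count Nagios reported."""
    return {kind: int(number) for number, kind in CHECKED_RE.findall(verify_output)}


def run_preflight(nagios_bin="/usr/local/nagios/bin/nagios",
                  cfg="/usr/local/nagios/etc/nagios.cfg"):
    """Run the pre-flight check and return its stdout (hypothetical helper)."""
    result = subprocess.run([nagios_bin, "-v", cfg], capture_output=True, text=True)
    return result.stdout
```

For example, `object_counts(run_preflight()).get("services")` gives the service count Nagios Core reports for the active configuration, which you can compare against what the web interface shows.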

Posted: **Mon Feb 22, 2010 10:46 am**

Is this possible?
How?
I put this on a machine with 4 CPUs and 2048 MB of memory, but the load is too high; sometimes it reaches 11!!!

Thanks.

Posted: **Wed Feb 24, 2010 2:07 pm**

Your load is (11 11 11)?

What are the process names? Can you post the top output?

I always find it amusing that an idle computer is reported nowhere near as often as a computer that is doing something. 11 is not very bad; I'd be alarmed if it were above 512.

Posted: **Wed Feb 24, 2010 2:16 pm**

The top processes are:
user: nagios, command: npcd
user: mysql, command: mysqld
user: apache, command: httpd

And a load above 512??? OMG!!!!
11 is very, very high!

Thanks.

Posted: **Wed Feb 24, 2010 2:28 pm**

If there are no HTTP requests, then it seems like Apache (perhaps PHP) is fishy. I'd kill Apache and restart it. Why was PHP hung, and what was it trying to accomplish?

A load of 11 simply means that, averaged over a given time frame, 11 processes and threads had work for the CPU. Having 11 applications running sounds like very normal operation to me; I'd say it would be an issue if you had 11 programs idle. It does not mean that there were 11 applications that could not run, and the scheduler is more than capable of making sure 11 processes are taken care of.

Posted: **Wed Feb 24, 2010 3:32 pm**

I think your concept is wrong.
A load of 11 means that 11 processes are waiting to be executed, not that 11 processes are being executed.
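For reference, on Linux the load average counts tasks that are running or waiting for a CPU, plus tasks in uninterruptible sleep (usually waiting on disk I/O), and it is an exponentially damped moving average rather than a plain count. The kernel samples roughly every 5 seconds using fixed-point arithmetic; the following Python sketch is a floating-point simplification of that calculation (the function name is hypothetical). It shows why a sustained 11 runnable tasks reads as only about 7 after one minute and approaches 11 over several minutes.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (the kernel uses roughly 5 s)


def damped_load(runnable_samples, window=60.0, start=0.0):
    """Return the load-average value after each sample.

    runnable_samples: running + waiting (+ uninterruptible) tasks seen at each sample.
    window: averaging window in seconds (60, 300 or 900 for the 1/5/15-minute values).
    start: load value before the first sample.
    """
    decay = math.exp(-SAMPLE_INTERVAL / window)
    load = start
    history = []
    for n in runnable_samples:
        # New value = old value decayed + current sample weighted by (1 - decay).
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history
```

Twelve samples of 11 tasks (one minute, starting from idle) give `damped_load([11] * 12)[-1]` = 11 × (1 − e⁻¹) ≈ 6.95, so the 1-minute figure lags behind what is actually in the run queue, and it likewise decays gradually after the work stops.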

The recommended load in an environment is less than 1.

That is true, and the Nagios XI server statistics show numbers above 8 in red.

I haven't changed any settings in the Apache and PHP config files, so the problem must be something else.
My company is changing environments and we are migrating hosts and checks.
We have more than 3400 checks; so far we have 840 checks in Nagios XI, and the load is very high.
This problem is a concern for us.
Does anyone have any ideas?

Thanks.

Posted: **Fri Feb 26, 2010 12:49 am**

The biggest issue is that the load average is not directly tied to something worth our attention, the way CPU temperature or the RPM gauge on a car is. High values are simply an indicator; by themselves they are meaningless. They indicate that you !!might!! have something needlessly wasting CPU time. If you can't find a process doing so, you should pat your server on the back for being such a hard worker and let it get back to making you money. Moving some of the load to another server is an option, but these days the reverse is happening and server load is being consolidated onto a single machine.

The /smallest/ load-average window is 1 minute. In that time, did only 1 out of the 11 jobs in the run queue actually run? During that minute, what were the 11 processes? Were they the same processes for the whole minute, or did several CPU-heavy jobs finish and different jobs replace them in the run queue during that time?
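One way to answer "what were the 11 processes?" is to sample the tasks that are currently counted toward the load. This is a minimal, Linux-only Python sketch that assumes the standard `/proc/<pid>/stat` layout (PID, command name in parentheses, then a one-letter state) and lists processes in state `R` (running or runnable) or `D` (uninterruptible sleep). The function names are hypothetical, and it does not descend into `/proc/<pid>/task`, so individual threads are not listed.

```python
import os

# Task states that Linux counts toward the load average:
# R = running or runnable, D = uninterruptible sleep (usually disk I/O).
LOAD_STATES = {"R", "D"}


def parse_stat(stat_text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces or ')',
    # so split at the last ')' instead of on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, command = head.partition(" (")
    state = tail.split()[0]
    return int(pid_text), command, state


def load_contributors(proc_root="/proc"):
    """List (pid, command, state) for processes currently in state R or D.

    Only top-level processes are scanned; threads under /proc/<pid>/task are not.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "loadavg"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state = parse_stat(stat_file.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state in LOAD_STATES:
            found.append((pid, command, state))
    return found
```

Calling `load_contributors()` every few seconds over a minute shows whether the same PIDs keep appearing (a few long-running consumers) or whether the run queue is made up of many short-lived jobs.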

The biggest question is: aside from a longer login time, did you notice any performance problems during your shell session?

One thing to watch out for is an ever-climbing load average, though I've never seen one. This is where the CPU load is somehow self-perpetuating. Once again, there is going to be some process or chain of processes causing it.

It's wasteful if the value is not above one for most of the day; think of CPU time as a human resource. Can you afford to have an employee with nothing to do for most of the day?

On most systems, several hundred applications can run over the span of a whole minute, and the Linux kernel might be able to scale to many, many more. You did say this was a quad-core, so per core the load average is not 11 but closer to 3 (11 / 4 = 2.75). 3 is a very manageable number for a single core.
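The per-core arithmetic above is easy to script. This is a minimal Python sketch (the function name is hypothetical) that divides the three load averages by the number of logical CPUs; on a Unix-like system you can feed it `os.getloadavg()` and `os.cpu_count()`, both from the standard library. Hyper-threaded cores count as separate logical CPUs.

```python
def per_cpu_load(loadavg, cpus):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    # os.cpu_count() can return None, so reject missing or non-positive counts.
    if not cpus or cpus < 1:
        raise ValueError("cpus must be a positive integer")
    return tuple(round(value / cpus, 2) for value in loadavg)
```

For the quad-core box discussed here, `per_cpu_load((11, 11, 11), 4)` returns `(2.75, 2.75, 2.75)`.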

This whole CPU and memory bean counting is never about high or low; it's all about proper allocation of resources. If resources are not being wasted needlessly, then feel good about using the next minute's CPU time for a job that was in the run queue this minute (it's not as if we humans can detect a delay of a few hundred clock cycles at 1 GHz), and let memory live in slower disk storage (most of the time, only a small amount of memory is actively used).

Posted: **Fri Jun 11, 2010 12:38 pm**

OK,
I've been running into a really bad problem over the last couple of days. We've been getting a really high load average on our Nagios XI system, and we only have 113 monitors up and running right now. Here's what I see happening on the server:
Load average: 12.29, 6.56, 4.47
and
24129 apache 15 0 39280 17m 5004 S 7.3 1.7 0:05.57 httpd
24133 apache 15 0 36776 17m 4808 S 7.3 1.7 0:04.40 httpd
24274 apache 15 0 36836 17m 4740 S 7.3 1.7 0:05.98 httpd
24304 apache 15 0 36852 17m 4764 S 7.3 1.7 0:04.24 httpd
24131 apache 15 0 36876 17m 4884 S 6.9 1.7 0:05.94 httpd
24134 apache 15 0 36924 17m 4808 S 6.9 1.7 0:05.75 httpd
24127 apache 15 0 36852 17m 4764 S 6.3 1.7 0:05.23 httpd
24336 apache 15 0 36700 17m 4764 S 5.0 1.7 0:05.08 httpd
24130 apache 17 0 39228 17m 4972 S 4.6 1.7 0:06.51 httpd
25703 apache 18 0 8304 3224 2444 R 4.6 0.3 0:00.14 rrdtool
24128 apache 15 0 36852 17m 4752 S 3.6 1.7 0:03.58 httpd
24132 apache 15 0 36924 17m 4840 S 3.0 1.7 0:05.21 httpd
24138 apache 15 0 36820 17m 4808 S 3.0 1.7 0:05.83 httpd
25700 nagios 19 0 27388 12m 5096 R 3.0 1.2 0:00.09 php
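Each httpd worker in a listing like this looks modest on its own; summing %CPU per command shows the combined picture. This is a minimal Python sketch that assumes the default procps `top` column order (`PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND`) and a snapshot taken non-interactively with `top -b -n 1`; `cpu_by_command` is a hypothetical name.

```python
from collections import defaultdict


def cpu_by_command(top_lines):
    """Sum top's %CPU column per command name, largest first.

    Assumes the default procps column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    totals = defaultdict(float)
    for line in top_lines:
        fields = line.split()
        # Process rows start with a numeric PID; skip headers and summary lines.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        command = " ".join(fields[11:])
        totals[command] += float(fields[8])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Applied to the listing above, the twelve httpd workers add up to about 68.5% CPU, against 4.6% for rrdtool and 3.0% for php.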

Using Apache's server-status handler (mod_status), I noticed the following:
0-0 24127 0/57/57 W 8.85 1 0 0.0 0.30 0.30 10.35.42.180 something.myserver.net GET /nagiosxi//includes/components/perfdata/perfdata.php?cmd=ge
1-0 24128 0/48/48 _ 8.78 0 1430 0.0 0.32 0.32 10.35.42.180 something.myserver.net GET /nagiosxi//includes/components/perfdata/perfdata.php?cmd=ge
2-0 24129 0/62/62 W 10.19 0 0 0.0 0.43 0.43 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
3-0 24130 0/71/71 _ 11.63 0 1056 0.0 0.49 0.49 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
4-0 24131 0/61/61 W 9.75 1 0 0.0 0.43 0.43 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
5-0 24132 0/60/60 W 9.82 0 0 0.0 0.34 0.34 127.0.0.1 something.myserver.net GET /nagios/pnp/index.php?host=nixon.nuskin.net&display=image&s
6-0 24133 0/51/51 _ 8.35 1 1056 0.0 0.23 0.23 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
7-0 24134 0/60/60 W 8.56 0 0 0.0 0.38 0.38 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
8-0 24138 0/69/69 W 12.70 0 0 0.0 0.48 0.48 10.35.42.98 something.myserver.net GET /server-status HTTP/1.1
9-0 24274 0/69/69 W 11.20 1 0 0.0 0.43 0.43 10.35.42.180 something.myserver.net GET /nagiosxi//includes/components/perfdata/perfdata.php?cmd=ge
10-0 24304 0/54/54 _ 8.88 0 395 0.0 0.30 0.30 127.0.0.1 something.myserver.net GET /nagios/pnp/index.php?host=nixon.nuskin.net&display=image&s
11-0 24336 0/61/61 W 10.07 1 0 0.0 0.37 0.37 10.35.42.180 something.myserver.net GET /nagiosxi/ajaxhelper.php?cmd=getxicoreajax&opts=%7B%22func%
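To confirm which script is keeping the Apache workers busy, scoreboard rows like these can be tallied by request path. This is a minimal Python sketch assuming the Apache 2.2 `server-status` column layout seen in the paste (Srv, PID, Acc, M, CPU, SS, Req, Conn, Child, Slot, Client, VHost, Request); the function name is hypothetical. In the M (mode) column, `W` means the worker is sending a reply and `_` means it is waiting for a connection, in which case the Request column shows the last request it served.

```python
from collections import Counter


def requests_by_script(status_rows, busy_only=False):
    """Count mod_status scoreboard rows per requested script, most frequent first.

    Assumes the Apache 2.2 ExtendedStatus columns:
    Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
    where Request itself is "METHOD PATH [PROTOCOL]".
    """
    counts = Counter()
    for row in status_rows:
        fields = row.split()
        if len(fields) < 14:
            continue  # not a scoreboard row (or no request recorded)
        mode, path = fields[3], fields[13]
        # "W" = sending reply; "_" = idle, showing the last request it served.
        if busy_only and mode != "W":
            continue
        counts[path.split("?", 1)[0]] += 1  # drop the query string
    return counts.most_common()
```

Run on the twelve rows above, `/nagiosxi/ajaxhelper.php` accounts for 6 of them, ahead of perfdata.php (3), pnp/index.php (2) and /server-status (1).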

Why is the load average so high? Is there something I can do to lower it? It looks like the ajaxhelper.php script is the problem.
PHP Version: PHP 5.1.6
Zend Engine v2.1.0

Posted: **Fri Jun 11, 2010 3:06 pm**

OK, here's what I've found so far.
The problem has to do with the web GUI served by Apache. If you stay on a "detail" page for any of the services in the GUI, you see several processes running at the same time. Our load drops to almost nothing when we close our browsers. That being said, is there some way I can move the Apache service over to another server?
