Interesting Load on Nagios Server
Posted: Mon Apr 29, 2013 11:08 am
Hello everyone,
We're ramping up our Nagios XI deployment. Our performance is fine. Nagios is running on a VM with 4 cores and 4GB of RAM.
Here is our current number of checks. Ignore the unhandled/critical issues; we haven't tweaked our thresholds yet. The majority of the service checks are some sort of WMI query via check_wmi_plus.pl.

We're looking at decreasing the interval between service checks (that is, checking more often) and were wondering what performance impact that would have on the system. Here are our current system load graphs:
Localhost: Current_Load (12 hours):

Localhost: Current_Load (3 days):

This looks like some sort of garbage collection or scheduled "clean up" task. Can anyone explain why the graphs look like this? I can always throw more CPU at it if necessary. From what I've read, you may start to see performance issues once the load average is greater than the number of cores.
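To put rough numbers on these questions, here is a minimal Python sketch, assuming a Linux host where os.getloadavg() is available. It compares the load average to the core count and estimates how the average check rate scales with the check interval. The function names and the 1,200-check figure are hypothetical and not taken from our deployment. Also note that on Linux the load average counts processes waiting on I/O as well as runnable ones, so a value above the core count does not always mean the CPU itself is the bottleneck.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Load average divided by core count.

    Above 1.0 means that, on average, more tasks were runnable
    (or blocked on I/O, on Linux) than there are cores.
    """
    return load / cores


def checks_per_second(num_checks: int, interval_minutes: float) -> float:
    """Average scheduled check rate if every check runs once per interval."""
    return num_checks / (interval_minutes * 60.0)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    one, five, fifteen = os.getloadavg()  # Unix-only API
    for label, load in (("1m", one), ("5m", five), ("15m", fifteen)):
        print(f"{label} load {load:.2f} -> {load_per_core(load, cores):.2f} per core")

    # Hypothetical example: halving the interval doubles the check rate.
    print(checks_per_second(1200, 5), checks_per_second(1200, 2.5))  # 4.0 8.0
```

Because each check_wmi_plus.pl run starts its own Perl process, a reasonable (but unverified) assumption is that the load from checks grows roughly in proportion to the check rate, so it is worth re-checking the load graphs after any interval change.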
I was also curious whether there is a way to "parallelize" the check tasks. I have looked at the Nagios Performance writeup, which says to enable the "large environment tweaks" variable (use_large_installation_tweaks in nagios.cfg), but it already appears to be set. I'm using the Nagios XI Enterprise OVA VM download. Is there any way to gather more information about the check queue, or to see it visually? I would like to be able to see upcoming checks when needed.
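Nagios Core writes its current scheduling state to its status file, including a next_check timestamp for each service, so one way to peek at the check queue is to read that file directly (the classic Nagios Core web interface also has a "Scheduling Queue" page showing similar information). The following is a minimal Python sketch. It assumes the status file lives at /usr/local/nagios/var/status.dat; that path is an assumption, and the status_file setting in nagios.cfg gives the real location. The helper names are hypothetical. Negative values in the output are checks that are already overdue, which is roughly what check latency looks like from the outside.

```python
import time
from pathlib import Path

# Assumed default location; confirm via status_file in nagios.cfg.
STATUS_DAT = Path("/usr/local/nagios/var/status.dat")


def parse_service_status(text: str) -> list[dict[str, str]]:
    """Collect key=value pairs from each 'servicestatus { ... }' block."""
    blocks: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            current = {}
        elif line == "}" and current is not None:
            blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def upcoming_checks(blocks, now: int, limit: int = 10):
    """Return (seconds_until_due, host, service) for active checks, soonest first."""
    rows = []
    for block in blocks:
        if block.get("active_checks_enabled", "1") != "1":
            continue  # skip services with active checks disabled
        next_check = int(block.get("next_check", "0"))
        if next_check == 0:
            continue  # not scheduled
        rows.append((next_check - now,
                     block.get("host_name", "?"),
                     block.get("service_description", "?")))
    rows.sort()
    return rows[:limit]


if __name__ == "__main__":
    if STATUS_DAT.exists():
        blocks = parse_service_status(STATUS_DAT.read_text(errors="replace"))
        for delta, host, service in upcoming_checks(blocks, int(time.time())):
            print(f"{delta:>6}s  {host} / {service}")
```

Running it periodically (for example, under watch) would give a crude live view of what the scheduler plans to run next.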
Thanks,
Smark
Edit: Added additional info about our deployment.