Performance and Crash


Re: Performance and Crash

Post by rocheryderm

Well, folks... I made a breakthrough last night.

It occurred to me that my vSphere cluster had been getting more loaded in the past few months, so... on a lark...

I updated the CPU "Shares" from Normal to High.

The difference was almost immediate.

I don't exactly understand the relationship or how this could impact Nagios in the way it has... but since I made that change last night I have not received a single false notification from Nagios XI. CPU usage now averages 75%.

This morning, I went a step further and set a vSphere CPU reservation for the VM based on its average utilization (in this case 5000 MHz).

Average load on the server dropped even further.

I would really love to understand why this setting made such a dramatic difference to the timeouts Nagios was experiencing.

In any case, thank you for trying to help with this and reviewing my configuration. I think I'm on the road to recovery.

4768 services, 407 hosts. Most service checks run at 10-minute intervals; host checks run at 5-minute intervals.
4 cores. CPU Shares set to High. CPU reservation 5000 MHz.
Total CPU utilization is averaging around 50%.
The GUI responds well.

Re: Performance and Crash

Post by ssax

Please see here:

https://geek-university.com/vmware-esxi ... explained/

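It's essentially QoS (Quality of Service) for VM resources: shares set a VM's relative priority when the host's CPU is contended, so High means the VM gets a larger slice than VMs left at Normal.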
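
To put rough numbers on that, here is a minimal Python sketch of how shares divide CPU under contention. It is a simplified model, not how the ESXi scheduler is actually implemented: it assumes every VM is demanding CPU at the same time and ignores reservations and limits. The per-vCPU values (Low 500, Normal 1000, High 2000) are the vSphere presets; the 24000 MHz host and the two neighbour VMs `vm_a` and `vm_b` are made up for illustration.

```python
# Per-vCPU share values behind the vSphere Low/Normal/High presets.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}


def contended_split(vms, host_mhz):
    """Split host_mhz among VMs in proportion to their total shares.

    vms maps a VM name to (share level, vCPU count).
    Simplified: assumes all VMs want CPU at once, no reservations or limits.
    """
    totals = {
        name: SHARES_PER_VCPU[level] * vcpus
        for name, (level, vcpus) in vms.items()
    }
    all_shares = sum(totals.values())
    return {name: host_mhz * s / all_shares for name, s in totals.items()}


# Hypothetical host capacity and neighbour VMs, for illustration only.
host_mhz = 24000

before = contended_split(
    {"nagios": ("normal", 4), "vm_a": ("normal", 4), "vm_b": ("normal", 4)},
    host_mhz,
)
after = contended_split(
    {"nagios": ("high", 4), "vm_a": ("normal", 4), "vm_b": ("normal", 4)},
    host_mhz,
)

print("Normal shares:", before["nagios"], "MHz")
print("High shares:  ", after["nagios"], "MHz")
```

In this made-up case the Nagios VM's slice under contention goes from a third of the host to half of it. Shares only matter while the host is contended; a reservation, like the 5000 MHz one above, guarantees that amount regardless of what the other VMs are doing.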