Re: Performance and Crash

Posted: Fri Aug 02, 2019 11:09 am
by rocheryderm
Well folks... I made a breakthrough last night.

It occurred to me that my vSphere cluster had been getting more loaded in the past few months, so... on a lark...

I updated the CPU "Shares" from Normal to High.

The difference was almost immediate.

I don't fully understand the relationship, or how this could affect Nagios the way it has... but since I made that change last night, I have not received a single false notification from Nagios XI. CPU usage now averages 75%.

This morning, I made a further change: a vSphere CPU reservation sized to the average utilization (in this case 5000 MHz).

Average load on the server dropped even further.

I would really love to understand why this setting made such a dramatic difference to the Nagios timeouts.

In any case, thank you for trying to help with this and reviewing my configuration. I think I'm on the road to recovery.

4768 services, 407 hosts. Most service checks run at 10-minute intervals; host checks run at 5-minute intervals.
4 cores. CPU Shares High. CPU reservation 5000 MHz.
Total CPU utilization averaging around 50%.
GUI responds well.
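
For a rough sense of scale, the figures above can be turned into an approximate check rate. This is only a back-of-the-envelope sketch: it assumes every service check runs at the 10-minute interval (the post says "most"), that checks are spread evenly over time, and it ignores retries. The function name is made up for illustration.

```python
def checks_per_second(count, interval_minutes):
    """Average checks per second if `count` checks each run once per interval."""
    return count / (interval_minutes * 60)

# Figures from the post; the "all services at 10 minutes" part is an assumption.
service_rate = checks_per_second(4768, 10)   # roughly 7.9 per second
host_rate = checks_per_second(407, 5)        # roughly 1.4 per second
total_rate = service_rate + host_rate        # roughly 9.3 per second

print(f"services: {service_rate:.2f}/s, hosts: {host_rate:.2f}/s, total: {total_rate:.2f}/s")
```

Each of those checks typically forks a plugin process, so a VM that has to wait for CPU time on a busy host could plausibly see checks run past their timeout, which would fit the false notifications described earlier.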

Re: Performance and Crash

Posted: Fri Aug 02, 2019 11:33 am
by ssax
Please see here:

https://geek-university.com/vmware-esxi ... explained/

It's essentially QoS (Quality of Service) for VM resources: a High setting gives the VM a larger proportional share of CPU when the host is under contention.
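
To make the proportional idea concrete, here is a minimal sketch of how shares divide CPU when every VM on a host wants more than is available. The per-vCPU values (Low 500, Normal 1000, High 2000) are the vSphere presets; the 20000 MHz host, the three neighbouring VMs, and the function name are hypothetical, and the model deliberately ignores reservations, limits, and idle VMs.

```python
# vSphere preset share values per vCPU.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}

def contended_cpu_mhz(vms, host_capacity_mhz):
    """Split host CPU among VMs in proportion to their shares.

    vms maps a VM name to (vcpu_count, share_level).
    Simplified: assumes every VM is demanding CPU at the same time.
    """
    shares = {name: vcpus * SHARES_PER_VCPU[level]
              for name, (vcpus, level) in vms.items()}
    total = sum(shares.values())
    return {name: host_capacity_mhz * s / total for name, s in shares.items()}

# Hypothetical neighbours on the same host, all 4 vCPU at Normal.
neighbours = {"vm-a": (4, "normal"), "vm-b": (4, "normal"), "vm-c": (4, "normal")}

before = contended_cpu_mhz({"nagios": (4, "normal"), **neighbours}, 20000)
after = contended_cpu_mhz({"nagios": (4, "high"), **neighbours}, 20000)

print(before["nagios"], after["nagios"])
```

In this made-up scenario the Nagios VM's slice under contention goes from 5000 MHz to 8000 MHz. Shares only matter while the host is contended; a reservation, by contrast, guarantees a minimum regardless of what the other VMs are doing.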