user: 75%
system: 18.5%
iowait: 0.32%
idle: 6.02%
If I add user and system together, that comes to roughly 93.5% CPU usage. I set our service check to:
check_cpu_stats.sh -w 85 -c 95
So according to my reasoning, we should have gotten warning alerts.
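To spell out that reasoning, here is a rough sketch of the comparison I expected the check to make. The `classify_busy` helper is purely hypothetical (my own name, not anything from `check_cpu_stats.sh`), and it assumes "busy" means user + system compared against the `-w` and `-c` values:

```shell
#!/bin/sh
# Hypothetical helper, NOT part of check_cpu_stats.sh: classify a CPU "busy"
# percentage (assumed here to be user + system) against the thresholds.
classify_busy() {
    # $1 = user %, $2 = system %, $3 = warning threshold, $4 = critical threshold
    awk -v u="$1" -v s="$2" -v w="$3" -v c="$4" 'BEGIN {
        busy = u + s
        if (busy >= c)      state = "CRITICAL"
        else if (busy >= w) state = "WARNING"
        else                state = "OK"
        printf "%s busy=%.1f%%\n", state, busy
    }'
}

# The figures from above with -w 85 -c 95
classify_busy 75 18.5 85 95    # prints: WARNING busy=93.5%
```

If the check only compares the 0.32% IOWait figure against the same thresholds, it would stay OK no matter how high user and system go, which would match what we saw.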
When I look at the code that ships with the Nagios XI agent, it appears to be measuring IOWait only. Can anyone confirm that?
Secondly, why would we want to alert on IOWait rather than on the combination of SYSTEM and USER for a busy CPU? If those stats are very high but IOWait is low, does that not cause any problems? I would have thought the check would measure all of them and take them all into account.
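For illustration, this is roughly how I would expect a check to get both figures on Linux, from two samples of the aggregate `cpu` line in `/proc/stat`. This is only a sketch and not the plugin's actual code: `cpu_breakdown` is a hypothetical name, and the counter values in the example are made up to resemble the percentages above.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the plugin's code: derive user+system and iowait
# percentages from two "cpu" lines sampled from /proc/stat.
# Field order on Linux: cpu user nice system idle iowait irq softirq steal ...
cpu_breakdown() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) prev[i] = $i }
        NR == 2 {
            total = 0
            # Sum user..steal only (fields 2-9); guest time is already
            # counted inside user, so later fields are skipped.
            for (i = 2; i <= NF && i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total <= 0) { print "no ticks elapsed"; exit 1 }
            printf "busy=%.1f%% iowait=%.1f%%\n", 100 * (d[2] + d[4]) / total, 100 * d[6] / total
        }'
}

# Made-up counters for illustration (1000 ticks between samples)
cpu_breakdown "cpu 0 0 0 0 0 0 0 0" "cpu 750 0 185 60 3 2 0 0"
```

On a live host the two lines would come from something like `grep '^cpu ' /proc/stat`, taken a few seconds apart. A check built this way could alert on either number, or on both.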
Can anyone provide an explanation? We provide monthly usage graphs to one of our clients, and the question is going to be "If the CPU was that busy, why didn't you get any alerts?"