CPU Load does not work properly
Posted: Mon Apr 27, 2015 4:38 am
by cloudcom
Hi,
I ran a .bat file on my server that consumes CPU, for testing.
Although CPU usage has been at 99%-100% for 30 minutes, Nagios shows the CPU load as 'OK' for that server.
When I click immediate check, it shows "critical" for a short time. After the next check (1 minute later) it shows the status "OK" again. When I click immediate check again, it still shows "OK".
But in fact, CPU usage is still 99%-100% on the server.
So I cannot see a critical alarm in the Latest Alerts menu.
Here are some screenshots. Please note the check times:
cpu2.png
cpu3.png
Re: CPU Load does not work properly
Posted: Mon Apr 27, 2015 9:20 am
by cloudcom
I expected the CPU usage graph to be a flat line, because CPU usage was at 99% during the test.
But the host graph looks like this:
cpu4.png
Re: CPU Load does not work properly
Posted: Mon Apr 27, 2015 2:20 pm
by jdalrymple
WMI or NSCP?
When monitoring Windows hosts in the past, I've found monitoring CPU usage via WMI to be terrifically unreliable and unpredictable. This is unfortunately a shortcoming of WMI, not Nagios - Nagios just interprets the results it gets back.
I haven't spent a lot of time monitoring CPU with NSCP; all I can say is that, in my experience, it would be tough to do a worse job than WMI.
Re: CPU Load does not work properly
Posted: Tue Apr 28, 2015 3:43 am
by cloudcom
I use NSCP. The check command uses the "check_nt" plugin.
This is the default when I add a server via the Windows Server wizard.
So what is the best way to check CPU usage?
Re: CPU Load does not work properly
Posted: Tue Apr 28, 2015 9:25 am
by jdalrymple
Generally speaking, NSCP and check_nt give pretty good results. I'd say you are on the right track. My earlier post was simply to indicate that WMI has yielded crummy results (specifically for CPU usage - it's great for many other things that aren't as transient) and that I'd avoid it as your agent for this check. That said, you might want to check your version of NSClient and update to the most current. It's a pretty 'dynamic' piece of software: bugs come and go, but the developer is generally pretty quick to respond when he sees them. It sounds like you've found one. If you update to the most current version, that may just fix it; if it still misbehaves, we can debug and possibly file a bug with him.
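For readers following along, here is a minimal sketch of what a check_nt CPU check looks like on the command line, written as a small Python helper that only builds the argument list. The host address, the 5-minute window, the 80/90 thresholds and port 12489 (the usual NSClient++ listener port) are assumptions for illustration and are not taken from this thread; the helper name is hypothetical.

```python
def build_check_nt_cpuload(host, port=12489, minutes=5, warn=80, crit=90, password=None):
    """Return the argv list for a check_nt CPULOAD query.

    All defaults are illustrative assumptions, not values from the thread.
    CPULOAD takes its parameters through -l as "<minutes>,<warn>,<crit>".
    """
    if not 0 <= warn <= crit <= 100:
        raise ValueError("expected 0 <= warn <= crit <= 100")
    argv = [
        "check_nt",
        "-H", host,          # address of the Windows host running NSClient
        "-p", str(port),     # port the agent listens on
        "-v", "CPULOAD",     # variable to query
        "-l", f"{minutes},{warn},{crit}",
    ]
    if password is not None:
        argv += ["-s", password]  # only needed if the agent has a password set
    return argv


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(" ".join(build_check_nt_cpuload("192.0.2.10")))
```

Running the same command by hand from the Nagios server while the CPU is pinned is a quick way to see what the agent reports, independent of the web interface.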
Re: CPU Load does not work properly
Posted: Wed Apr 29, 2015 6:02 am
by cloudcom
I realized that if CPU usage is fixed at 99%, Nagios shows its status as "cpu load 1% Ok".
But if CPU usage fluctuates, for example between 94% and 99%, Nagios shows the correct result.
I wonder whether this issue is caused by the server's own behaviour, because CPU usage never goes up to 100%.
It stays fixed at 99%, and then Nagios gives the wrong result.
Re: CPU Load does not work properly
Posted: Wed Apr 29, 2015 12:58 pm
by jolson
Nagios is simply reading the results back from NSCP; it's possible but unlikely that there's a calculation issue on the Nagios side of things. Whatever the issue is, it seems that NSCP is handing the wrong results off to Nagios when the CPU usage is static - you could likely verify this with a tcpdump on the Nagios side.
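To illustrate why a wrong number from the agent produces a wrong status, here is a rough sketch of the kind of threshold logic a plugin applies to the value it receives. The sample output strings, the 80/90 thresholds, the `>=` comparisons and the function names are assumptions for illustration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_cpu_percent(plugin_output):
    """Extract the first 'NN%' value from a plugin output line, or None."""
    match = re.search(r"(\d+)\s*%", plugin_output)
    return int(match.group(1)) if match else None


def classify(percent, warn=80, crit=90):
    """Map a CPU percentage to a Nagios state (thresholds are assumed)."""
    if percent is None:
        return UNKNOWN
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Assumed sample lines, modelled on the "cpu load 1%" status quoted above.
    for line in ("CPU Load 1% (5 min average)", "CPU Load 99% (5 min average)"):
        print(line, "->", classify(parse_cpu_percent(line)))
```

If the agent reports 1% while the server is actually at 99%, this logic correctly returns OK for the number it was given, which is why the agent is the first place to look.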
Would you mind posting the batch script you are using? I would like to attempt reproduction. In the meantime, if you take jdalrymple's advice and upgrade to the latest version of NSClient, it may yield better results.
Another thing you could do is attempt using the 'check_nrpe' plugin to communicate with NSClient - I doubt it would yield different results, but it's entirely possible.
Let us know - thanks!
Re: CPU Load does not work properly
Posted: Wed May 06, 2015 4:57 am
by cloudcom
Hi,
The problem was solved after installing the latest version of NSClient.
Thanks for the help.