vmware plugin spanking my cpu
I have just set up 16 esx hosts for checks and the check_esx3.pl is using all my cpu time...
Re: vmware plugin spanking my cpu
just to add some meaningful notes...
I added my ESX hosts to be checked and all hell broke loose... the XI web interface would freeze and plugins would time out.
So I gave it 2 GB of RAM and 2 vCPUs, up from 512 MB of RAM and 1 vCPU.
Made no difference: it ate all the resources and still needed more CPU. The check_esx3.pl is not fit for production use.
Re: vmware plugin spanking my cpu
Seems this has already been discussed... http://communities.vmware.com/thread/21 ... 5&tstart=0
Re: vmware plugin spanking my cpu
I have uploaded the check_esx3.pl from the last page of the thread on the VMware forum... seems a lot better.
With the one in the wizard a VMFS check took over 8 seconds; with the new one it took about 2. Loads of info in that thread that goes way over my head.
The version I am using is attached.
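For anyone who wants to repeat that before/after comparison, here is a minimal Python sketch that times a command over a few runs. The helper name `time_command` is made up for illustration, and the stand-in command just starts and exits a Python interpreter; swap in whatever check_esx3.pl command line your own setup uses.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3, timeout=60):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # The exit status is ignored on purpose; only elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command (an assumption) so the sketch runs anywhere;
# replace it with the real plugin invocation to compare two versions.
stand_in = [sys.executable, "-c", "pass"]
for number, seconds in enumerate(time_command(stand_in), start=1):
    print(f"run {number}: {seconds:.2f}s")
```

Running it once against the wizard's plugin and once against the replacement gives a like-for-like number, provided nothing else is hammering the box at the time.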
Re: vmware plugin spanking my cpu
OK, cancel all that... got all my ESX hosts back under monitoring and it has ground to a halt again. Maybe you should put something in place to just check the vCenter server, or the datacenter?
Re: vmware plugin spanking my cpu
I almost didn't want to read the rest of your posts after reading the first one. We saw this in testing, but our hardware was overworked with so many people testing the same plugin against the same test ESX box, so we decided it wouldn't be an issue that more hardware couldn't solve; and since running ESX essentially implies you can get hold of big, bulky machines, we assumed it would be fine. The other reason I discounted the issue was the context-switch penalty of monitoring ESX while running under ESX. It doesn't make sense to attempt to monitor ESX from within itself, as you're going to end up with either a hung Nagios or a working ESX.
From the sound of it that's what you are trying to do. It would be better to run the plugin from another machine. I could go into great detail as to why, but just think of it as a beach ball with all your VMs in a circle, where ESX is just another member of this circle... Even if you are right next to ESX, ESX would still be a full circle away from you. Each plugin instance makes several HTTPS transactions (at least 3, but there are AJAX calls too, so I'd say the number could be more like 7), each made up of 50 or so ACK/NAKs. The ball has to make ~50 * ~5 trips around this circle, and this takes unusually long because the two threads passing the data need a full context switch and, I'm sure, a cache flush each time. This is not the case for OS-level processes or threads, because they are optimized for faster IPC: the write call knows it's talking to a corresponding read call, and in many instances the transfer can be completed with zero moves. With this setup each host puts data into the network out buffer as if it were going to another device that would be totally asynchronous; instead the transaction is synchronous and not optimized at all. That said, using the paravirtualized drivers would help, but they are still optimized for passing data to a real NIC.
Perhaps this is still an issue even from another machine; in that case it's totally on VMware to make a more efficient API. If you trace the process you'll see that an awful lot of time is spent in what I'd consider code obfuscation, but it's really just loading what seems to be 100 different Perl files. Attempting to use ePN required rebuilding Nagios Core and didn't show any real benefit.
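To put rough numbers on the beach ball picture, here is a back-of-envelope sketch in Python. The 50 packets per transaction and the 3 to 7 transactions are the estimates from this post; the per-trip costs are purely hypothetical placeholders, not anything measured, so treat the output as an illustration of how the trips multiply rather than a benchmark.

```python
# Back-of-envelope estimate of the network round trips behind one plugin run.
# 50 packets per transaction and 3 to 7 transactions are the post's estimates;
# the per-trip costs are hypothetical placeholders, not measured values.

PACKETS_PER_TRANSACTION = 50


def round_trips(transactions, packets=PACKETS_PER_TRANSACTION):
    """Total trips 'around the circle' for one plugin run."""
    return transactions * packets


def seconds_per_check(transactions, per_trip_seconds):
    """Time spent purely on trips, ignoring Perl start-up and API work."""
    return round_trips(transactions) * per_trip_seconds


for transactions in (3, 5, 7):
    trips = round_trips(transactions)
    cheap = seconds_per_check(transactions, 0.0001)  # assumed 0.1 ms per trip
    costly = seconds_per_check(transactions, 0.005)  # assumed 5 ms per trip
    print(f"{transactions} transactions: {trips} trips, "
          f"{cheap:.3f}s to {costly:.3f}s")
```

The point is only that the per-trip cost gets multiplied a few hundred times per check, and then again by every host and service being checked.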
Re: vmware plugin spanking my cpu
Did you read the thread on the VMware forums? It goes into check_esx3.pl and why it is not optimised for the latest version of the API...
Hmm.. so "it doesn't work" is what I got from all that.
Not going to get into monitoring VMware from within VMware.. but it is perfectly viable to want to do so. PM me if you wish to discuss the reasons.
Let's not give up though! I am asking some ex-colleagues who monitor ESX from a full-blown Nagios install; will see what they come back with.
In the meantime, are you saying I need another Nagios server just for the ESX plugin checks? If so, will the license I have purchased cover that? I hope so.
Re: vmware plugin spanking my cpu
Spoke to my friend who had this issue with Nagios Core and he said the underlying problem for him was a DNS issue.. he couldn't expand on it as he didn't fix it though...
Re: vmware plugin spanking my cpu
Thank you for that information. Reverse DNS "timeouts" are often an issue, both for security-conscious applications and for anyone else who feels like logging names instead of numbers. At the very least, have a DNS proxy/server reply with a server failure if you can't afford to expose any information.
Our DNS replies promptly with NXDOMAIN, so I don't feel this is an issue for us. Though I will ask about internal RDNS.
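If you want to check whether reverse lookups are slow from your own Nagios box, a small Python sketch like the following times a single PTR lookup. The function name `time_reverse_lookup` is made up for this example, and 127.0.0.1 is only a placeholder; put in the addresses of your ESX hosts.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (seconds, result) for one reverse DNS lookup of ip_address."""
    start = time.perf_counter()
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        result = hostname
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        result = f"lookup failed: {exc}"
    return time.perf_counter() - start, result


# Placeholder address (an assumption); replace with your ESX host IPs.
for address in ["127.0.0.1"]:
    seconds, result = time_reverse_lookup(address)
    print(f"{address}: {result} ({seconds:.3f}s)")
```

A prompt failure is fine here; what hurts is a lookup that sits for several seconds before giving up, since that delay would be paid on every check that triggers one.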
Re: vmware plugin spanking my cpu
I don't know if this will help any, and this thread is somewhat old, but I am having a similar issue. I'm running Nagios XI on a dedicated box with CentOS 5: an Intel Core 2 Quad Q6600 with 3 GB of DDR2 and 2x 1 TB WD Black drives. I have it set up to monitor a few switches and a few hosts, as well as 6 VM hosts along with all of their guest OSes, which is around 40+ guests. I am also seeing the CPU spikes, but it is not slowing down. Looking in top, I will sometimes have 3-4 check_esx3.pl processes, each taking anywhere from 10-30% CPU. It's usually only for a brief second; it does spike here and there, but it does not cause any slowdowns. There are also a lot of httpd requests now, but the most I have seen out of them is 7% CPU, and most are under 1 percent.

