More box293_check_vmware fun
We ran into another snag after we picked this ball back up this morning.
We have a vCenter cluster that has a large number of hosts, so consequently, each and every check type (vCenter, hosts, datastores, guests) times out.
Where can I change the timeout for these checks?
Re: More box293_check_vmware fun
Perhaps box293 will correct me if I am wrong... 
I opened the plugin in a text editor and found out that the "default" timeout is 60 seconds. You could probably increase this value as much as needed (around line 287).
Code: Select all
timeout => {
type => '=i',
help => 'Specify the time a check is allowed to execute for. 60 seconds by default.',
required => 0,
default => 60,
}
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: More box293_check_vmware fun
lmiltchev wrote: I opened the plugin in a text editor and found out that the "default" timeout is 60 seconds. You could probably increase these value as much as it is needed (around line 287)
Actually tried that - increased it to 300, but it still times out.
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: More box293_check_vmware fun
I believe you just have to add --timeout to the command. Something like this:
Code: Select all
check_by_ssh -E 1 -t 90 -l vi-admin -H xx.xx.xx.xx -C "~/box293_check_vmware.pl --concurrent_checks 400 --timeout 90 --check Host_CPU_Info --server xx.xx.xx.xx --host xx.xx.xx.xx"
The first "-t 90" is for check_by_ssh, and "--timeout 90" is for box293_check_vmware.pl.
Re: More box293_check_vmware fun
snapon_admin wrote: I believe you just have to add --timeout to the command.
But which command would that need to be added to? All 41 of them?
UPDATE: Actually added the timeout to a couple of the commands (set to 300 seconds), still no joy.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: More box293_check_vmware fun
Probably two or three things going in here....
How many checks have you created using the wizard?
I suspect that your vMA appliance may not have enough memory and CPUs. I would up it to 8GB and 2 CPUs. If the vMA does not have enough resources then almost all checks will time out. Use the "top" command while logged onto the vMA as vi-admin to observe CPU and memory usage.
Next, let's remove Nagios from the equation altogether. We'll execute some checks on the vMA box directly and see how long they take.
- ssh to the vMA as vi-admin
Here's an example check. I've added a super long timeout so we can find out how long it runs for:
Code: Select all
Command: ~/box293_check_vmware.pl --server vcenter.box293.local --check Guest_Snapshot --guest centos07 --timeout 600
Output: OK: ['centos07' (Notes: Before Starting) (Age: 40)]
Now we can time how long it takes by simply starting the command with "time":
Code: Select all
Command: time ~/box293_check_vmware.pl --server vcenter.box293.local --check Guest_Snapshot --guest centos07 --timeout 600
Output: OK: ['centos07' (Notes: Before Starting) (Age: 40)]
real 0m4.194s
user 0m3.964s
sys 0m0.068s
So you can see that this took 4.194 seconds.
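If you want the elapsed time as a plain number (for comparing several checks, say), something along these lines might help. This is only a rough sketch: time_check is a made-up helper name, not part of the plugin, and it only counts whole seconds. The demo uses "sleep 1" as a stand-in; on the vMA you would pass the plugin command instead.

```shell
#!/usr/bin/env bash
# time_check: hypothetical helper, not part of box293_check_vmware.
# Runs the given command, discards its output, and prints the
# elapsed wall-clock time in whole seconds.
time_check() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Stand-in command so the sketch runs anywhere. On the vMA you would use e.g.:
# time_check ~/box293_check_vmware.pl --server vcenter.box293.local --check Guest_Snapshot --guest centos07 --timeout 600
elapsed=$(time_check sleep 1)
echo "check took about ${elapsed}s"
```

Because it rounds to whole seconds, the built-in "time" shown above is still the better choice for short checks.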
LASTLY ... if you need a timeout greater than 60 then READ THIS:
By default, Nagios Core has a timeout of 60 seconds, as defined in /usr/local/nagios/etc/nagios.cfg:
service_check_timeout=60
If you need checks to run longer than 60 seconds then you need to change nagios.cfg and restart the Nagios service. In Nagios XI you can do this by:
- CCM
- Advanced
- Nagios Core Main Config
- Change service_check_timeout=
- Click Save
- Click Apply Configuration
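To confirm the value that actually ended up in nagios.cfg after applying the configuration, a quick check like the one below may be useful. It is a sketch only: get_service_check_timeout is a made-up helper name, the default path is the one quoted above, and the demo runs against a throwaway file so it does not touch a real install.

```shell
#!/usr/bin/env bash
# get_service_check_timeout: hypothetical helper name.
# Prints the numeric value of service_check_timeout from a Nagios
# main config file. If the directive appears more than once, this
# prints the last one it finds in the file.
get_service_check_timeout() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    sed -n 's/^service_check_timeout=\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1
}

# Demo against a temporary file (stand-in for the real nagios.cfg).
demo_cfg=$(mktemp)
printf '%s\n' '# sample config' 'service_check_timeout=60' > "$demo_cfg"
get_service_check_timeout "$demo_cfg"
rm -f "$demo_cfg"
```

On the Nagios XI host you would call it with no argument, or pass the path if your install lives somewhere else.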
Finally, answers to some of the other posts in the thread.
lmiltchev wrote: I opened the plugin in a text editor and found out that the "default" timeout is 60 seconds. You could probably increase these value as much as it is needed (around line 287)
Yes this would work, however --timeout is defined in the commands by default at 90, so the value passed by the command overrides whatever default you change in the script.
highness wrote: But which command would that need to be added to? All 41 of them?
Yes and no. There are 41 commands because the wizard was designed for some flexibility.
When looking at a service definition that is timing out, take note of the check command; this is the one that needs to be configured.
Box293 wrote: --timeout is defined in the commands by default at 90
Don't ask me to give a valid reason why I chose 90 in the commands when Nagios uses 60 by default.
Let us know how you go and if any of this resolved your issues. This is some good troubleshooting I need to add to the manual.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: More box293_check_vmware fun
Box293 wrote: I suspect that your vMA appliance may not have enough memory and CPUs. I would up it to 8GB and 2 CPUs.
Boosted the vMA box to 2 CPUs and 8GB. I can now get the "Retrieve vCenter Objects" (which returns 23 clusters) to complete, but the rest (hosts, datastores, guests) still time out.
Box293 wrote: We'll execute some checks on the vMA box directly and see how long they take.
Here are a couple of checks I ran:
Code: Select all
vi-admin@vma01:~> time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --check Guest_Snapshot --guest guest1.host.com --timeout 600; time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --check Guest_Snapshot --guest guest2.host.com --timeout 600
OK: No snapshots found
real 0m8.982s
user 0m6.420s
sys 0m0.080s
OK: No snapshots found
real 0m7.938s
user 0m6.648s
sys 0m0.128s
- Box293
Re: More box293_check_vmware fun
OK I've got a better idea of what's happening now ... I need to do some more investigating and I'll reply back in a couple of hours.
Re: More box293_check_vmware fun
Thanks for looking into this, Troy!
- Box293
Re: More box293_check_vmware fun
OK So I didn't realise your problem was with the wizard ... this is somewhat simpler to resolve.
The wizard runs some "special" checks to perform the "Retrieve xxx Objects". The wizard has a hard coded timeout of 90 seconds. I will give you a command to change the timeout in the wizard, but first we need to know how long these "special" checks take to run.
When you execute these checks to time how long they take, a LOT of garbled text will be displayed ... we are only interested in the final time it takes to run the check.
ssh to the vMA as vi-admin
Type these four commands:
Code: Select all
time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --timeout 600 --check List_Datastores
time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --timeout 600 --check List_Guests
time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --timeout 600 --check List_Hosts
time ~/box293_check_vmware.pl --server 10.YYY.YYY.YYY --timeout 600 --check List_vCenter_Objects
From the results, whichever one is the longest, double that number. I am going to use the number 300 in the next step.
Establish an ssh session to your Nagios XI Host
Type this command:
Code: Select all
sed -i 's/-t 90/-t 300/g' /usr/local/nagiosxi/html/includes/configwizards/vmwarevirtualizationwizard/vmwarevirtualizationwizard_misc.php
Now go and run the configuration wizard, everything should work fine now.
Let us know how you go.
Don't worry about all the talk in the other posts about changing the check commands, I had the wrong idea as to what was happening.
Thanks for your patience so far helping us resolve this problem. I will take away what you have experienced here and include a timeout option in the next version of the wizard (probably out in a couple of months).
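For anyone following along later, the "double the longest run" step and the wizard edit could be scripted roughly as below. Treat it as a sketch under assumptions: suggest_timeout and apply_wizard_timeout are made-up helper names, the timings are invented numbers rather than real results, and the demo edits a temporary stand-in file, not the real wizard file. The sed expression is the same substitution as in the post, with a .bak backup added.

```shell
#!/usr/bin/env bash
# suggest_timeout: hypothetical helper. Given whole-second timings,
# prints double the largest one.
suggest_timeout() {
    local max=0 t
    for t in "$@"; do
        if (( t > max )); then
            max=$t
        fi
    done
    echo $(( max * 2 ))
}

# apply_wizard_timeout: hypothetical helper. Replaces every "-t 90"
# in FILE with "-t SECONDS", keeping a backup copy as FILE.bak.
apply_wizard_timeout() {
    local file=$1 secs=$2
    sed -i.bak "s/-t 90/-t ${secs}/g" "$file"
}

# Invented timings for the four List_* checks, rounded up to whole seconds.
new_timeout=$(suggest_timeout 42 118 97 65)
echo "suggested timeout: ${new_timeout}"

# Demo on a throwaway file with a stand-in line (not the real wizard contents).
demo=$(mktemp)
echo 'example command -t 90 example args' > "$demo"
apply_wizard_timeout "$demo" "$new_timeout"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On the Nagios XI host you would point apply_wizard_timeout at the vmwarevirtualizationwizard_misc.php path from the post, and the .bak copy gives you a way back if the wizard misbehaves afterwards.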