Probably two or three things going on here....
How many checks have you created using the wizard?
I suspect that your vMA appliance may not have enough memory and CPUs. I would up it to 8GB and 2 CPUs. If the vMA does not have enough resources then almost all checks will time out. Use the "top" command while logged onto the vMA as vi-admin to observe CPU and memory usage.
Next, let's remove Nagios from the equation altogether. We'll execute some checks on the vMA box directly and see how long they take.
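As a quick non-interactive complement to "top", here is a minimal sketch that prints how many CPUs and how much memory the appliance actually sees. The helper names cpu_count and mem_total_mb are made up for this example, and it assumes the usual Linux /proc files are present on the vMA.

```shell
# Hypothetical helpers (not part of the vMA or the plugin).
# Each takes an optional file path so it can be tried against a copy of the file.

# Count the "processor" entries the kernel reports.
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Convert the MemTotal value (reported in kB) to whole megabytes.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Only report when the /proc files are readable (i.e. on a Linux box).
if [ -r /proc/cpuinfo ] && [ -r /proc/meminfo ]; then
    echo "CPUs: $(cpu_count)"
    echo "Memory (MB): $(mem_total_mb)"
fi
```

If the numbers are well below 2 CPUs and 8GB, that lines up with the suspicion above.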
- ssh to the vMA as vi-admin
Here's an example check:
Command:
~/box293_check_vmware.pl --server vcenter.box293.local --check Guest_Snapshot --guest centos07 --timeout 600
Output:
OK: ['centos07' (Notes: Before Starting) (Age: 40)]
I've added a super long timeout so we can find out how long it runs for.
Now we can "time how long it takes" by simply starting the command with time:
Command:
time ~/box293_check_vmware.pl --server vcenter.box293.local --check Guest_Snapshot --guest centos07 --timeout 600
Output:
OK: ['centos07' (Notes: Before Starting) (Age: 40)]
real 0m4.194s
user 0m3.964s
sys 0m0.068s
So you can see that this took 4.194 seconds.
So with that information you now roughly know what --timeout value to use in the commands.
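To turn a measured run time into a --timeout value, you could do something like the sketch below. The suggest_timeout helper and the safety factor of 3 are my own assumptions for illustration, not something the plugin or the wizard provides; pick whatever headroom suits your environment.

```shell
# Hypothetical helper: multiply the measured "real" seconds by a safety
# factor (default 3, an arbitrary assumption) and round up to a whole second.
suggest_timeout() {
    awk -v t="$1" -v f="${2:-3}" 'BEGIN {
        v = t * f          # measured time with headroom
        c = int(v)         # truncate ...
        if (c < v) c++     # ... then round up if anything was cut off
        print c
    }'
}

# Using the 4.194 seconds measured above:
suggest_timeout 4.194
```

It is worth running the timed check a few times, including when the vMA is busy, and feeding the slowest result in.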
LASTLY ... if you need a timeout greater than 60 then READ THIS:
By default, Nagios Core has a timeout of 60 seconds for service checks, as defined in /usr/local/nagios/etc/nagios.cfg:
service_check_timeout=60
If you need checks to run longer than 60 seconds then you need to change nagios.cfg and restart the Nagios service. In Nagios XI you can do this by:
CCM
Advanced
Nagios Core Main Config
Change service_check_timeout=
Click Save
Click Apply Configuration
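To confirm what value is actually in effect before and after the change, a small sketch like this reads it straight from the file. The get_service_check_timeout helper is made up for this example; the path is the one mentioned above.

```shell
# Hypothetical helper: print the last uncommented service_check_timeout
# value found in the given nagios.cfg.
get_service_check_timeout() {
    awk -F= '
        /^[[:space:]]*service_check_timeout[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)   # strip stray whitespace
            v = $2
        }
        END { if (v != "") print v }
    ' "$1"
}

cfg=/usr/local/nagios/etc/nagios.cfg
# Only read the file if it exists on this machine.
if [ -r "$cfg" ]; then
    echo "service_check_timeout is $(get_service_check_timeout "$cfg")"
fi
```

Whatever --timeout you pass to the plugin needs to fit inside this value, otherwise Nagios will kill the check first.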
Finally, answers to some of the other posts in the thread.
lmiltchev wrote:I opened the plugin in a text editor and found out that the "default" timeout is 60 seconds. You could probably increase these value as much as it is needed (around line 287)
Yes, this would work in principle. However, --timeout is defined in the commands by default at 90, and a value passed on the command line takes precedence, so changing the default in the script would have no effect on those commands.
highness wrote:But which command would that need to be added to? All 41 of them?
Yes and no. There are 41 commands because the wizard was designed for some flexibility.
When looking at a service definition that is timing out, take note of the check command; this is the one that needs to be configured.
Box293 wrote:--timeout is defined in the commands by default at 90
Don't ask me to give a valid reason why I chose 90 in the commands when Nagios uses 60 by default.
Let us know how you go and if any of this resolved your issues. This is some good troubleshooting I need to add to the manual.