check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 1:41 am
by rajasegar
XI 2012R2.9 Enterprise Edition
RHEL 6.5
Offloaded DB
Firefox 30.0
Output from command line in XI server, not truncated
[nagios@nagiosprodxi1 cfgprep]$ /usr/local/nagios/libexec/check_esx3.pl -H 10.10.10.10 -u rrrrr -p ppppppp -l VMFS -w 70% -c 90%
ESX3 CRITICAL - storages : datastore1=280270.00 MB (99.80%), ADT1-TransportNode01-1-VNX7500=146616.00 MB (7.16%), ADT1-TransportNode01-2-VNX7500=224456.00 MB (14.62%), ADT1-TransportNode02-VNX7500=445628.00 MB (21.76%), ADT1-TransportNode03-VNX7500=445626.00 MB (21.76%), ADT1-ControlNode-VNX7500=384112.00 MB (18.76%) | datastore1=99.80%;70;90 ADT1-TransportNode01-1-VNX7500=7.16%;70;90 ADT1-TransportNode01-2-VNX7500=14.62%;70;90 ADT1-TransportNode02-VNX7500=21.76%;70;90 ADT1-TransportNode03-VNX7500=21.76%;70;90 ADT1-ControlNode-VNX7500=18.76%;70;90
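For reference, a plugin line like the one above is split at the first `|` into status text and performance data, and the two parts are stored in separate columns (output and perfdata). The following is a minimal sketch for measuring each part; the function name `plugin_output_lengths` is made up for illustration, and which part ends up cut off depends on the column sizes in the database.

```shell
#!/bin/bash
# Hypothetical helper (not part of check_esx3.pl or Nagios XI):
# print the character count of the text part and the perfdata part
# of a single plugin output line.
plugin_output_lengths() {
    local line=$1
    local text=${line%%|*}          # everything before the first '|'
    local perf=
    case $line in
        *'|'*) perf=${line#*|} ;;   # everything after the first '|'
    esac
    text=${text% }                  # drop one space next to the separator
    perf=${perf# }
    printf 'output=%d perfdata=%d\n' "${#text}" "${#perf}"
}

# Example, reusing the command from this post:
# plugin_output_lengths "$(/usr/local/nagios/libexec/check_esx3.pl -H 10.10.10.10 -u rrrrr -p ppppppp -l VMFS -w 70% -c 90%)"
```

Comparing the two numbers against the column definitions should show which field is being cut.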
Truncated Output in XI screen
325-06-2014 02-36-22 PM.png
Please advise how to fix this issue. Thanks.
Another problem is the very high CPU load.
When check_esx3.pl runs, it hogs the CPU.
25-06-2014 05-48-59 PM.png
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 9:39 am
by slansing
You can use '-h' on that plugin to get the full help output to sculpt your command:
Code:
/usr/local/nagios/libexec/check_esx3.pl -h
It looks like just passing VMFS gives you all of the datastore info. There should be a way to narrow that down; I'll have to look around a bit. Something else you may want to take a look at is Box293's vMA hook-in VMware monitoring plugin:
http://exchange.nagios.org/directory/Pl ... re/details
The esx3 plugin does take quite a bit of resources, depending on how long you are querying the system and for how much data. What type of hardware does your XI server have under it? You mentioned that you had a lot of users, which will also increase the load depending on what they are doing (running reports, etc.).
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 3:29 pm
by Box293
To fix the truncated output we need to increase the size of the tables in Nagios XI.
Execute the following commands at the CLI:
Code:
echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hoststatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_servicechecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hostchecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
The next time the check is executed the output will not be truncated in the XI interface.
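To confirm that the change took effect, the column definitions can be read back from information_schema. This is only a sketch: the helper name `column_type_query` is made up, and it assumes the same database name and credentials as the commands above.

```shell
#!/bin/bash
# Hypothetical helper: print a query that lists the current type of the
# output, long_output and perfdata columns in the four tables named above.
column_type_query() {
    cat <<'SQL'
SELECT table_name, column_name, column_type
FROM information_schema.columns
WHERE table_schema = 'nagios'
  AND table_name IN ('nagios_servicestatus', 'nagios_hoststatus',
                     'nagios_servicechecks', 'nagios_hostchecks')
  AND column_name IN ('output', 'long_output', 'perfdata')
ORDER BY table_name, column_name;
SQL
}

# Usage (assumption: same login as the alter commands):
# column_type_query | mysql -pnagiosxi
```

Any row that still shows the old type points at a column that was not altered.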
As for the high CPU load, I recommend doing what slansing said and implementing the box293_check_vmware plugin / solution, as it offloads the plugin to a vMA appliance.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 6:52 pm
by rajasegar
I can hardcode the datastores, but that requires the change management process to be perfect end to end, which it is not.
My DB server is offloaded. XI is using 8 cores and 8 GB RAM, and storage is on RAID 10.
Most of the time it is 70 - 80% idle when check_esx3 is not running.
There are only 2 users currently. It is not fully deployed yet.
I will take a look at the plugin you recommended. Thanks.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 6:55 pm
by rajasegar
Thanks for the assistance. I will check out the plugin.
BTW, what is a vMA appliance? Sorry, I'm not well versed in this.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 7:32 pm
by Box293
rajasegar wrote:BTW what is a vMA appliance?
vMA = VMware vSphere Management Assistant.
This is a free VM provided by VMware to manage ESXi and ESX servers (and vCenter). ESXi is a locked-down OS, so VMware created vMA to allow you to do a lot of things to ESXi servers remotely.
The plugin simply leverages the VMware SDK, which runs on Perl, so using the vMA as a base to run the plugin on makes things a lot easier, as it already has everything installed.
rajasegar wrote:Most of the time it is 70 - 80% idle when the check_esx3 is not running.
When you run certain checks with your existing check_esx3 plugin, the SDK obtains a heck of a lot of information from vCenter because it is querying all of the datastores, which means it uses a lot more CPU and memory to get the job done. box293_check_vmware was written to be as efficient as possible, only obtaining the information it needs using the VMware SDK. However, even with this efficient design, it still needs to be offloaded to vMA, as there is too much CPU and memory overhead, which can at times cripple a Nagios server.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 7:50 pm
by rajasegar
1) Can your plugin be run on the Nagios server? It will take a very long time to go through the Security people, set up a VM, etc.
2) Will querying an individual datastore reduce the load?
Thanks.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 8:07 pm
by Box293
rajasegar wrote:1) Can your plugin be run on the Nagios server?
It's possible; however, there will be no support. I specifically designed this plugin to run from a vMA appliance. I've killed Nagios XI servers doing a small amount of VMware monitoring directly from the XI server, so I don't want to be responsible for creating something that could bring down a Nagios server.
rajasegar wrote:It will take a very long time to go through the Security people, setup VM etc etc.
Let me put a scenario to you. All the air conditioners in the server room stop working. Temperature goes up. Nagios happens to be overloaded with VMware checks and the temperature monitoring checks do not happen and alerts do not get sent out. In a matter of 15 minutes all of your equipment shuts down due to overheating.
How are you going to explain that to your boss when you knew there was a solution available that could have prevented this?
rajasegar wrote:2) Will querying individual datastore reduce the load?
It's possible; however, you would need to go and read the code in check_esx3.
Honestly, it's not about one particular check. When multiple VMware checks are running at the same time, they have a chain-reaction type of effect. This is why offloading it makes sense.
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Wed Jun 25, 2014 8:14 pm
by rajasegar
Let me see if I can do something about concurrency with dependencies, and I will check out monitoring locally in Dev.
The vMA option, as mentioned, will take some time.
Thanks
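One rough way to approach the concurrency side is a small wrapper that makes the VMware checks take turns. This is only a sketch, not something from either plugin: the function name `run_serialized`, the lock file path and the wait time are all made up, and it assumes `flock` from util-linux is installed on the XI server.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of Nagios XI or either plugin): run a check
# command while holding an exclusive lock, so wrapped checks run one at a time.
# Assumes flock(1) from util-linux is available.
run_serialized() {
    local lock_file=$1 wait_secs=$2
    shift 2
    (
        # Wait up to wait_secs for the lock held on file descriptor 9.
        if ! flock -w "$wait_secs" 9; then
            echo "UNKNOWN - could not get $lock_file within ${wait_secs}s"
            exit 3   # Nagios UNKNOWN
        fi
        "$@"         # the subshell exits with the check's own status
    ) 9>"$lock_file"
}

# Example (lock path and timeout are placeholders):
# run_serialized /tmp/vmware_checks.lock 30 /usr/local/nagios/libexec/check_esx3.pl -H 10.10.10.10 -u rrrrr -p ppppppp -l VMFS -w 70% -c 90%
```

Time spent waiting for the lock counts toward how long each check takes, so check timeouts would need to allow for it. This only spreads the load out; it does not reduce the work each check does.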
Re: check_esx3.pl output truncated in Nagios/ CPU Load
Posted: Thu Jun 26, 2014 1:53 pm
by scottwilkerson
Box293 wrote:To fix the truncated output we need to increase the size of the tables in Nagios XI.
Execute the following commands at the CLI:
Code:
echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hoststatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_servicechecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hostchecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
The next time the check is executed the output will not be truncated in the XI interface.
As for the high CPU load, I recommend what slansing said and implement the
box293_check_vmware plugin / solution as it offloads the plugin to a vMA appliance.
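As an FYI, this code does have a slight bug in it: on the last three lines, the long_output and perfdata statements still point at nagios_servicestatus instead of the table named at the start of the line. It should read: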
Code:
echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hoststatus modify output varchar(65535) not null;alter table nagios_hoststatus modify long_output varchar(65535) not null;alter table nagios_hoststatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_servicechecks modify output varchar(65535) not null;alter table nagios_servicechecks modify long_output varchar(65535) not null;alter table nagios_servicechecks modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
echo "use nagios;alter table nagios_hostchecks modify output varchar(65535) not null;alter table nagios_hostchecks modify long_output varchar(65535) not null;alter table nagios_hostchecks modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
Or, slightly cleaner:
Code:
echo "
alter table nagios_servicestatus modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hoststatus modify output varchar(65535) not null, modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_servicechecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hostchecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
" | mysql -pnagiosxi nagios
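Since the original bug came from copying the table name by hand, another option is to generate the statements from one template. A minimal sketch, with a made-up function name `widen_output_sql`; the table list and column sizes are taken from the commands above.

```shell
#!/bin/bash
# Sketch: build one ALTER TABLE statement per table from a single template,
# so the table name cannot drift between clauses.
widen_output_sql() {
    local table
    for table in nagios_servicestatus nagios_hoststatus nagios_servicechecks nagios_hostchecks; do
        printf 'alter table %s modify output varchar(65535) not null, modify long_output varchar(65535) not null, modify perfdata varchar(65535) not null;\n' "$table"
    done
}

# Review the generated SQL first, then pipe it to mysql as above:
# widen_output_sql
# widen_output_sql | mysql -pnagiosxi nagios
```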