check_esx3.pl output truncated in Nagios/ CPU Load

rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

check_esx3.pl output truncated in Nagios/ CPU Load

Post by rajasegar »

XI 2012R2.9 Enterprise Edition
RHEL 6.5
Offloaded DB
Firefox 30.0

Output from the command line on the XI server (not truncated):
[nagios@nagiosprodxi1 cfgprep]$ /usr/local/nagios/libexec/check_esx3.pl -H 10.10.10.10 -u rrrrr -p ppppppp -l VMFS -w 70% -c 90%
ESX3 CRITICAL - storages : datastore1=280270.00 MB (99.80%), ADT1-TransportNode01-1-VNX7500=146616.00 MB (7.16%), ADT1-TransportNode01-2-VNX7500=224456.00 MB (14.62%), ADT1-TransportNode02-VNX7500=445628.00 MB (21.76%), ADT1-TransportNode03-VNX7500=445626.00 MB (21.76%), ADT1-ControlNode-VNX7500=384112.00 MB (18.76%) | datastore1=99.80%;70;90 ADT1-TransportNode01-1-VNX7500=7.16%;70;90 ADT1-TransportNode01-2-VNX7500=14.62%;70;90 ADT1-TransportNode02-VNX7500=21.76%;70;90 ADT1-TransportNode03-VNX7500=21.76%;70;90 ADT1-ControlNode-VNX7500=18.76%;70;90


Truncated output on the XI screen:
[Attachment: 25-06-2014 02-36-22 PM.png]
Please advise how to fix this issue. Thanks

Another problem is the very high CPU load.
When check_esx3.pl runs, it hogs the CPU.
25-06-2014 05-48-59 PM.png
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by slansing »

You can use '-h' on that plugin to get the full help output to sculpt your command:

Code:

/usr/local/nagios/libexec/check_esx3.pl -h
It looks like passing only VMFS returns information for every datastore. There should be a way to narrow that down; I'll have to look around a bit. Something else you may want to take a look at is Box293's vMA-based VMware monitoring plugin:

http://exchange.nagios.org/directory/Pl ... re/details

The esx3 plugin does consume quite a lot of resources, depending on how long it queries the system and how much data it requests. What type of hardware does your XI server run on? You mentioned that you have a lot of users, which will also increase the load depending on what they are doing (running reports, etc.).
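One possible way to narrow the check down, offered as an assumption rather than a confirmed fix: many builds of check_esx3.pl accept a sub-select option (`-s`) that, combined with `-l VMFS`, names a single datastore. Confirm this against the `-h` output on your own system first. The helper name `build_vmfs_checks` and the `ESX_*` variables are made up for illustration, and the function only prints the command lines so they can be reviewed before anything is run.

```shell
#!/bin/bash
# Sketch: print one check_esx3.pl command line per datastore (dry run only).
# ASSUMPTION: this build of check_esx3.pl supports "-s <datastore>" together
# with "-l VMFS"; confirm with "check_esx3.pl -h" before relying on it.

PLUGIN="/usr/local/nagios/libexec/check_esx3.pl"
ESX_HOST="${ESX_HOST:-10.10.10.10}"   # host taken from the original post
ESX_USER="${ESX_USER:-rrrrr}"         # placeholder credentials from the post
ESX_PASS="${ESX_PASS:-ppppppp}"

# Hypothetical helper: one command line per datastore name given as argument.
build_vmfs_checks() {
    local ds
    for ds in "$@"; do
        printf '%s -H %s -u %s -p %s -l VMFS -s %s -w 70%% -c 90%%\n' \
            "$PLUGIN" "$ESX_HOST" "$ESX_USER" "$ESX_PASS" "$ds"
    done
}

build_vmfs_checks datastore1 ADT1-ControlNode-VNX7500
```

Each printed line could become its own service check, which keeps the output of every check short. The trade-off is that the datastore names are then hardcoded in the configuration.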
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by Box293 »

To fix the truncated output we need to increase the size of the output columns in the Nagios XI database tables.
Execute the following commands at the CLI:

Code:

    echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

    echo "use nagios;alter table nagios_hoststatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

    echo "use nagios;alter table nagios_servicechecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

    echo "use nagios;alter table nagios_hostchecks modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
The next time the check is executed the output will not be truncated in the XI interface.
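To see which part of a plugin's output is actually too long for its column, the line can be split at the first `|` into the text part and the performance data part and each measured separately. The following is a minimal sketch; the function names are made up for illustration, and `COLUMN_LIMIT=255` is only an assumed default, so check the real column sizes with `SHOW COLUMNS FROM nagios_servicestatus;` in MySQL.

```shell
#!/bin/bash
# Sketch: measure the text and perfdata parts of a plugin's output line.
# ASSUMPTION: COLUMN_LIMIT=255 is only an illustrative default; check the real
# column sizes with "SHOW COLUMNS FROM nagios_servicestatus;" in MySQL.

COLUMN_LIMIT="${COLUMN_LIMIT:-255}"

# Text before the first "|" (the check output), trailing whitespace trimmed.
output_part() {
    local line="$1"
    local text="${line%%|*}"
    printf '%s' "${text%"${text##*[![:space:]]}"}"
}

# Text after the first "|" (performance data); empty if there is no pipe.
perfdata_part() {
    local line="$1" perf
    case "$line" in
        *"|"*) perf="${line#*|}"
               printf '%s' "${perf#"${perf%%[![:space:]]*}"}" ;;
        *)     printf '' ;;
    esac
}

# Print both lengths; return non-zero if either part exceeds COLUMN_LIMIT.
report_lengths() {
    local line="$1" out perf
    out="$(output_part "$line")"
    perf="$(perfdata_part "$line")"
    printf 'output=%d perfdata=%d limit=%d\n' "${#out}" "${#perf}" "$COLUMN_LIMIT"
    [ "${#out}" -le "$COLUMN_LIMIT" ] && [ "${#perf}" -le "$COLUMN_LIMIT" ]
}

report_lengths "ESX3 CRITICAL - storages : datastore1=280270.00 MB (99.80%) | datastore1=99.80%;70;90"
```

In practice the argument would be the captured output of the real check, for example `report_lengths "$(/usr/local/nagios/libexec/check_esx3.pl ...)"` with the same options used on the command line.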

As for the high CPU load, I second slansing's recommendation: implement the box293_check_vmware plugin / solution, as it offloads the plugin execution to a vMA appliance.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by rajasegar »

I can hardcode the datastores, but that requires the change management process to be perfect end to end, which it is not.

My DB server is offloaded. XI is using 8 cores and 8 GB RAM, and storage is on RAID 10.
Most of the time it is 70 - 80% idle when the check_esx3 is not running.
There are only 2 users currently. It is not fully deployed yet.

I will take a look at the plugin you recommended. Thanks
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by rajasegar »

Thanks for the assistance. I will check out the plugin.
BTW what is a vMA appliance? Sorry not well versed in this.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by Box293 »

rajasegar wrote:BTW what is a vMA appliance?
vMA = VMware vSphere Management Assistant.
This is a free VM provided by VMware to manage ESXi and ESX servers (and vCenter). ESXi is a locked down OS, so VMware created vMA to allow you to do a lot of stuff to ESXi servers remotely.

The plugin simply leverages the VMware SDK, which runs on Perl, so using the vMA as a base to run the plugin on makes it a lot easier, as it already has everything installed.
rajasegar wrote:Most of the time it is 70 - 80% idle when the check_esx3 is not running.
When you run certain checks with your existing check_esx3 plugin, the SDK is obtaining a heck of a lot of information from vCenter as it is querying all of the datastores, which means it uses a lot more CPU and memory to get the job done. box293_check_vmware was written to be as efficient as possible, only obtaining the information it needs using the VMware SDK. However, even with this efficient design, it still needs to be offloaded to vMA, as there is too much CPU and memory overhead, which can at times cripple a Nagios server.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by rajasegar »

1) Can your plugin be run on the Nagios server? It will take a very long time to go through the Security people, setup VM etc etc.

2) Will querying individual datastore reduce the load?

Thanks.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by Box293 »

rajasegar wrote:1) Can your plugin be run on the Nagios server?
It's possible, however there will be no support. I specifically designed this plugin to run from a vMA appliance. I've killed Nagios XI servers doing a small amount of VMware monitoring directly from the XI server, so I don't want to be responsible for creating something that could bring down a Nagios server.
rajasegar wrote:It will take a very long time to go through the Security people, setup VM etc etc.
Let me put a scenario to you. All the air conditioners in the server room stop working. Temperature goes up. Nagios happens to be overloaded with VMware checks and the temperature monitoring checks do not happen and alerts do not get sent out. In a matter of 15 minutes all of your equipment shuts down due to overheating.

How are you going to explain that to your boss when you knew there was a solution available that could have prevented this?
rajasegar wrote:2) Will querying individual datastore reduce the load?
It's possible, however you would need to go and read the code in check_esx3.
Honestly, it's not about one particular check, but when multiple VMware checks are running at the same time, they have a chain-reaction type of effect. This is why offloading it makes sense.
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by rajasegar »

Let me see if I can do something about concurrency with dependencies, and I will check out the monitoring locally in Dev.
The vMA setup, as mentioned, will take some time.

Thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_esx3.pl output truncated in Nagios/ CPU Load

Post by scottwilkerson »

Box293 wrote:To fix the truncated output we need to increase the size of the tables in Nagios XI.
As an FYI, the commands in Box293's post above have a slight bug: in the second, third and fourth commands, the long_output and perfdata statements still alter nagios_servicestatus rather than the table named at the start of the command. They should read:

Code:

echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null;alter table nagios_servicestatus modify long_output varchar(65535) not null;alter table nagios_servicestatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

echo "use nagios;alter table nagios_hoststatus modify output varchar(65535) not null;alter table nagios_hoststatus modify long_output varchar(65535) not null;alter table nagios_hoststatus modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

echo "use nagios;alter table nagios_servicechecks modify output varchar(65535) not null;alter table nagios_servicechecks modify long_output varchar(65535) not null;alter table nagios_servicechecks modify perfdata varchar(65535) not null;" | mysql -pnagiosxi

echo "use nagios;alter table nagios_hostchecks modify output varchar(65535) not null;alter table nagios_hostchecks modify long_output varchar(65535) not null;alter table nagios_hostchecks modify perfdata varchar(65535) not null;" | mysql -pnagiosxi
Or, slightly cleaner:

Code:

echo "
alter table nagios_servicestatus modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hoststatus modify output varchar(65535) not null, modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_servicechecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hostchecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
" | mysql -pnagiosxi nagios
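The copy-and-paste slip in the earlier commands can be avoided by generating the statements from a list, so that each table name is written only once. The following is a minimal sketch of that idea; the function name `generate_alter_sql` and the two list variables are made up for illustration. It only prints the SQL so it can be reviewed before being piped to mysql.

```shell
#!/bin/bash
# Sketch: generate the four ALTER TABLE statements from a table list so that
# each table name appears exactly once per statement.

TABLES="nagios_servicestatus nagios_hoststatus nagios_servicechecks nagios_hostchecks"
COLUMNS_TO_WIDEN="output long_output perfdata"

# Hypothetical helper: prints one ALTER TABLE statement per table.
generate_alter_sql() {
    local table column clauses
    for table in $TABLES; do
        clauses=""
        for column in $COLUMNS_TO_WIDEN; do
            # Join the per-column clauses with ", " (no leading separator).
            clauses="${clauses:+$clauses, }modify $column varchar(65535) not null"
        done
        printf 'alter table %s %s;\n' "$table" "$clauses"
    done
}

# Review the SQL first; pipe it to mysql only once it looks right:
#   generate_alter_sql | mysql -pnagiosxi nagios
generate_alter_sql
```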
Former Nagios employee