Check disk performance

Post by **junkertf** » Thu Mar 29, 2018 4:03 am

Hello Support,

I plan to implement a disk performance monitoring plugin in our XI environment. I found many disk performance monitoring plugins on exchange.nagios.org, but they are all of the same type: the disk name must be passed in an ARG field.

At first sight the nicest one is

https://exchange.nagios.org/directory/P ... 16/details

The reason I ask is that we have a large number of RAC environments (>10), each with many multipath disks (over 100 per RAC) whose performance must be monitored.

My question: is there a method or plugin that can monitor all disk metrics on a server by default (for example tps, wKB/s, rKB/s, await), or must I configure every disk-metric monitor by hand?

Thank you, best regards,

Ferenc

Post by **lmiltchev** » Thu Mar 29, 2018 10:29 am

junkertf wrote: "Is there a method or plugin that can monitor all disk metrics on a server by default (for example tps, wKB/s, rKB/s, await), or must I configure every disk-metric monitor by hand?"

I am not sure if such a plugin exists. However, if you want to combine many different checks into one, you could use the check_multi plugin:

https://exchange.nagios.org/directory/P ... i/details?

You can set up all of your disk checks in a config file, e.g. multi.cfg:

command [ tps_per_s ] = <plugin> <options>
command [ wKB_per_s ] = <plugin> <options>
command [ rKB_per_s ] = <plugin> <options>
command [ another_check ] = <different plugin> <options>

and run the following command on the Nagios XI server:

/usr/local/nagios/libexec/check_multi -f multi.cfg

Once you have verified that your check works from the command line, you can create a new command and a new service in Nagios XI by following the steps outlined in the document below:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf

Also, if you have many similar or identical servers to monitor, you could set up one of them first. Add all of the disk checks and make sure they work as expected. Next, clone this host along with its services as many times as you want via the Bulk Host Cloning And Import wizard:

https://assets.nagios.com/downloads/nag ... Wizard.pdf

Have you tried using NCPA? With NCPA, you can monitor various disk metrics, Windows performance counters, etc.

Hope this helps.

Post by **junkertf** » Tue Apr 03, 2018 1:38 am

Hello,

Thank you for the update!
I will read up on this and run some checks as you described.

Best regards,

Ferenc

Post by **scottwilkerson** » Tue Apr 03, 2018 9:44 am

Let us know if you run into trouble.

Post by **junkertf** » Mon Apr 09, 2018 8:04 am

Hello,

A short status report on this thread.

check_multi looks like a good starting point at first sight, but it is not a good long-term solution.
The reason is that the number of disks can change over time (it increases when new storage disks are attached and decreases when disks are detached), so the graph would have to be changed from time to time. The disk checks therefore have to be set up disk by disk.

Currently I have a script that creates a list of the disks (local and multipath), and I also have a plugin that checks the necessary metrics (the one from Nagios Exchange linked previously; it produces await, tps, KBw/sec and KBr/sec). I think it is preferable to have these metrics per disk.
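A disk list like the one mentioned above could be built roughly as follows. This is only a sketch, not the script used here: it assumes that whole disks appear in the third column of /proc/diskstats as sdX or dm-N, and the function name list_disks is made up for illustration. Note that dm-N also covers LVM volumes, so a real script would probably need to filter further (for example against the multipath device list).

```shell
#!/bin/sh
# Sketch: print whole-disk names (sdX and dm-N) from /proc/diskstats-style input.
# Assumption: the third whitespace-separated field is the device name.
# Partitions such as sda1 and devices such as loop0 are filtered out.
list_disks() {
    awk '$3 ~ /^(sd[a-z]+|dm-[0-9]+)$/ { print $3 }' | sort -u
}

# Typical use on a Linux host (the output path is an assumption):
#   list_disks < /proc/diskstats > /var/tmp/disklist.txt
```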

My problem is how to configure the checks dynamically according to the disk names (and their count), in such a way that I do not have to reconfigure XI every time.

Could a solution be to use check_multi with a dynamically created config file on the local server (for example, one generated by a cron script)? That is, check_multi would point to a list of all disk checks generated by the cron script. But that way I would get one graph with many disk metrics rather than one graph per disk. Is the root of the problem understandable?
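To make the idea concrete, a cron job could regenerate the check_multi config from the disk list along these lines. This is a hedged sketch: the `command [ name ] = ...` line format is the one shown earlier in the thread, but the plugin path, its -d option and the function name gen_multi_cfg are placeholders. The tag is sanitised conservatively because it is not certain which characters check_multi accepts in a tag.

```shell
#!/bin/sh
# Sketch: turn a disk list (one name per line on stdin) into check_multi
# "command [ name ] = ..." lines. PLUGIN and its -d option are placeholders.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_diskperf}"

gen_multi_cfg() {
    while IFS= read -r disk; do
        [ -n "$disk" ] || continue                            # skip empty lines
        tag=$(printf '%s' "$disk" | tr -c 'A-Za-z0-9' '_')    # dm-0 becomes dm_0
        printf 'command [ diskperf_%s ] = %s -d %s\n' "$tag" "$PLUGIN" "$disk"
    done
}

# Example cron usage (both paths are assumptions):
#   gen_multi_cfg < /var/tmp/disklist.txt > /usr/local/nagios/etc/multi.cfg
```

This only solves the regeneration part; all results still end up in a single service, so the one-graph limitation described above remains.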

The second problem is the timing of the checks. Tests show that in one of our environments a single multipath (dm-xx) disk check takes about 5 seconds. On the previously mentioned server, where we have 70 multipath disks, the all-disk check therefore runs for more than 5 minutes. So I can collect performance counters at 5-minute intervals at best (provided no check runs longer), and then the same checks have to be started again.
(We actually have an environment with more than 150 multipath disks.) Would the performance check then have to produce data only every 10 minutes so that it does not overlap with itself?
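The arithmetic behind this can be written down as a small sketch. It assumes the checks run strictly one after another and that every disk takes the roughly 5 seconds measured above; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate how long a serial run over all disks takes.
# Arguments: number of disks, seconds per disk check (about 5 s in the tests above).
serial_runtime() {
    echo $(( $1 * $2 ))
}

# Smallest whole number of minutes that fits one complete run.
min_interval_minutes() {
    echo $(( ($1 * $2 + 59) / 60 ))
}

serial_runtime 70 5          # 350 seconds, more than a 300-second interval
min_interval_minutes 70 5    # 6
min_interval_minutes 150 5   # 13
```

Under these assumptions, 150 disks need about 750 seconds per run, so even a 10-minute interval would overlap with the previous run.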

Thank you, best regards,

Ferenc

Post by **lmiltchev** » Mon Apr 09, 2018 11:55 am

After reading your last post, I tend to think that check_multi is not going to be a good solution for you at all. With so many disk checks combined into one, you would run into timing issues. Also, if you need per-disk graphs, you will need separate checks.

Any automation (monitoring dynamic environments) would require some coding, and it would be more involved.

As of now, you could try utilizing the REST API functionality in Nagios XI. From the Nagios XI web UI, go to Help > REST API Docs > Config Reference, and see the example of how to create a new service object. Perhaps you could create a script (e.g. run on a cron) that utilizes the REST API syntax, and adds the "new" disk checks to Nagios XI.

Note: In Nagios XI 5.5, which should be out sometime this summer, there is going to be an option to auto-configure Unconfigured Objects. So, if you had a script (e.g. run on a cron) on the remote machine that created the disk checks, ran them, and sent the passive check results to Nagios XI, the "new" services would get automatically added to XI. This functionality is not available yet, so you can either wait for the release of Nagios XI 5.5 or try using the REST API.

Hope this helps.

Post by **junkertf** » Tue Apr 10, 2018 7:51 am

We are almost there...

I have a concept, and the REST API is a good starting point.

Locally on a remote server:
1) A first cron script creates a list file of the (local and multipath) disks.
2) A second cron script reads the list file line by line, skipping the disks that were configured before (those lines contain a "DONE" pattern). It then creates a check .sh script for the nagios user and writes the matching check lines to a separate storage-perf.cfg file in the NRPE config directory (which is included via include_dir), so the local checks for local and multipath disks can also work with dynamic warning and critical limits.

At this stage the local node is prepared to run the check scripts.

3) The same second script POSTs a service-creation request to the XI server via the REST API.
Example:

curl -k -XPOST 'https://nagiosinfrauat.host.net/nagiosx ... W&pretty=1' -d 'host_name=db45.hu.host.com&service_description=DISK Performance&use=BB_NIX_LINUX_HW_Local-DISKperf_SvcTmpl&check_command=check_xi_by_ssh_DISKchk\!check_diskperf_cciss-c0d0&applyconfig=1'

but I get an error:
{
"error": "Missing required variables",
"missing": [
"max_check_attempts",
"check_interval",
"retry_interval",
"check_period",
"notification_interval",
"notification_period",
"contacts OR contact_groups"
]
}

So the trouble is that I cannot use the "use" directive in the POST request, for example to point to a service template, so I have to specify every directive that I could otherwise declare in a service template (a template would make it much more comfortable to maintain the necessary parameters globally over time).

4) Lastly, the second script marks the disk line in the list file as configured with a DONE label.
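The local part of these steps could look roughly like the sketch below. It is an illustration under assumptions, not the actual scripts: the plugin path, its -d/-w/-c options, the default thresholds and the function names are placeholders, and the list file is assumed to hold one disk name per line with " DONE" appended once a disk is configured. The command name follows the check_diskperf_<disk> pattern from the example above.

```shell
#!/bin/sh
# Sketch: process a list file where configured disks carry a trailing " DONE".
# PLUGIN, its options and the thresholds are placeholders, not a real plugin's CLI.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_diskperf}"

# Print the disks that are not yet marked DONE.
pending_disks() {
    grep -v 'DONE' "$1" | awk 'NF { print $1 }'
}

# Print one NRPE command definition for a disk (arguments: disk, warn, crit).
nrpe_line() {
    printf 'command[check_diskperf_%s]=%s -d %s -w %s -c %s\n' \
        "$1" "$PLUGIN" "$1" "${2:-80}" "${3:-95}"
}

# Append " DONE" to the line that consists of exactly this disk name.
mark_done() {
    tmp="$1.tmp.$$"
    awk -v d="$2" '$0 == d { print $0 " DONE"; next } { print }' "$1" > "$tmp" &&
        mv "$tmp" "$1"
}

# Example loop (file locations are assumptions):
#   for d in $(pending_disks /var/tmp/disklist.txt); do
#       nrpe_line "$d" >> /usr/local/nagios/etc/nrpe/storage-perf.cfg
#       # ... POST the service to XI here, then:
#       mark_done /var/tmp/disklist.txt "$d"
#   done
```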

Any suggestions?

regards,

Ferenc

Post by **lmiltchev** » Tue Apr 10, 2018 9:13 am

It's a good starting point. You can append the "use" directive to your POST command in order to inherit the missing variables. You don't need to specify each one individually. The trick is to also add "&force=1".

Simple example:

curl -XPOST "http://192.168.5.151/nagiosxi/api/v1/config/service?apikey=LTltbjobR0X3V5ViDIitYaI8hjsjoFBaOcWYukamF7oAsD8lhJRvSPWq8I3PjTf7&pretty=1" -d "host_name=localhost&service_description=TEST-API-SERVICE&use=xiwizard_generic_service&contacts=nagiosadmin&force=1&applyconfig=1"

Note: You need to be sure that your template contains all of the required directives, so that your configuration won't fail.

Let us know if this got you through the "error": "Missing required variables" error.
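For use in a cron script, the same call could be wrapped in a small function, sketched below. The endpoint path and the use, force and applyconfig parameters are taken from the example above; XI_URL, XI_APIKEY and the function names are placeholders. Values are sent as-is, as in the examples in this thread; for values with special characters, curl's --data-urlencode option may be the safer choice.

```shell
#!/bin/sh
# Sketch: build the form body for the service-creation call and send it with curl.
# XI_URL (e.g. http://<xi-server>/nagiosxi) and XI_APIKEY must be set by the caller.

# Arguments: host name, service description, template name, check command.
service_body() {
    printf 'host_name=%s&service_description=%s&use=%s&check_command=%s&force=1&applyconfig=1' \
        "$1" "$2" "$3" "$4"
}

# Send the request; network I/O happens only when this function is called.
create_service() {
    curl -s -XPOST "${XI_URL}/api/v1/config/service?apikey=${XI_APIKEY}&pretty=1" \
        -d "$(service_body "$@")"
}

# Example call (all values are hypothetical):
#   create_service db45 "DISKperf_dm-0" my_disk_template "check_diskperf_dm-0"
```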

Post by **junkertf** » Tue May 29, 2018 6:20 am

Hello,

It's a late comeback.

I created a group of scripts that use the check_iostst_v110 plugin from exchange.nagios.org. The scripts can check the physical disks attached to the Linux host and create, modify or delete the necessary monitors via the API.
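Deciding which monitors to create and which to delete comes down to comparing two lists, roughly as in this sketch. It is an illustration, not the actual scripts: both inputs are assumed to be files with one disk name per line, the function names are made up, and the behaviour with an empty list file relies on GNU grep treating an empty pattern file as matching nothing.

```shell
#!/bin/sh
# Sketch: compare the disks present now with the disks already monitored.
# Arguments for both functions: current-disks file, monitored-disks file.

# Disks that exist on the host but have no monitor yet.
disks_to_add() {
    grep -Fxv -f "$2" "$1"
}

# Disks that still have a monitor but are no longer attached.
disks_to_remove() {
    grep -Fxv -f "$1" "$2"
}

# Example (file names are assumptions):
#   disks_to_add /var/tmp/disklist.txt /var/tmp/monitored.txt
```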

It is fairly certain that the long-term solution is NCPA, because the method I created needs huge resources on the client side. On our RAC environment with more than 50 disks it is not a good solution; for example, it needs long run times.

In summary, this case can be closed. Many thanks for all the good advice; we will surely use much of it in the long term!

Best regards,

Ferenc

Post by **scottwilkerson** » Tue May 29, 2018 8:39 am

Good to hear it is resolved!

Nagios Support Forum
