Capture Performance Data

rajasegar · Post by **rajasegar** » Tue Sep 30, 2014 6:10 pm

Nagios XI 2014R1.2
RHEL 6.5
Mod-Gearman

If I wanted to capture all the performance data of hosts and services as it is generated and pipe it to a text file, what would be the best way to go about this?
I just don't want to mess around with reading and exporting RRD files.

Thanks

sreinhardt · Post by **sreinhardt** » Wed Oct 01, 2014 1:55 pm

We do not do this or recommend it in any way. This is one of the few cases where, unless you have an absolute need and RRDs cannot do what you are looking for, we really don't want you to make this change. There are one or two places where you could alter a command before the data is put into the RRD, which might let you capture it. However, and this is a huge one, doing so will cause massive I/O load and data usage on your system, with completely untested results both in normal running and during upgrades. We are talking at least 2-4 times the size of your RRDs in storage, as none of this data is compressed or averaged, and it will likely be stored as text, which is much less space-efficient. What is the issue with rrdtool? It provides some great options for extracting data and works extremely well in scripts.

rajasegar · Post by **rajasegar** » Wed Oct 01, 2014 6:07 pm

I have almost 10,000 RRD files. Won't processing them have the same I/O impact?
Not to mention the hassle of processing them in the first place.

Please advise where to modify the commands. I/O issues and storage can be easily handled.
We will explore this in our dev environment to see if it is worthwhile.

Does anyone have any routines to output RRD to CSV text data?

abrist · Post by **abrist** » Thu Oct 02, 2014 10:05 am

At 10k RRDs, with the standard retention and steps, you will use a very large amount of data. 1/5 of the RRD size is used for "yesterday". In plain text, this single day of data uses 580k per check on average. So at 10k checks, we could assume a minimum of 5 GB of disk usage a day.
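
As a quick check of that arithmetic, here is a small sketch. It only uses the two figures quoted above (580k per check per day, 10k checks); the decimal KB-to-GB conversion and the function name are assumptions for illustration.

```python
# Rough daily disk estimate from the figures quoted in the post above.
KB_PER_CHECK_PER_DAY = 580   # plain-text size of one day of data for one check
CHECKS = 10_000              # roughly one RRD per check


def daily_usage_gb(kb_per_check: float, checks: int, kb_per_gb: int = 1_000_000) -> float:
    """Return the estimated plain-text volume per day in GB (decimal units assumed)."""
    return kb_per_check * checks / kb_per_gb


estimate = daily_usage_gb(KB_PER_CHECK_PER_DAY, CHECKS)
print(f"{estimate:.1f} GB/day")
```

That works out to about 5.8 GB a day, which is in line with the "minimum of 5 GB" figure.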

abrist · Post by **abrist** » Thu Oct 02, 2014 10:11 am

rajasegar wrote:Does anyone have any routines to output RRD to CSV text data?

Take a look at rrd2csv:
https://code.google.com/p/rrd2csv/
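
If you would rather script it yourself, the text output of `rrdtool fetch` is simple enough to turn into CSV. The following is only a sketch: it assumes the usual layout of that output (a header line with the data-source names, then `timestamp: value value ...` lines, with `nan` for unknown values), and the function name and sample text are made up for illustration.

```python
import csv
import io
import math


def fetch_output_to_csv(text: str) -> str:
    """Convert text shaped like `rrdtool fetch` output into CSV text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    names = lines[0].split()  # header line: data-source names
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp"] + names)
    for ln in lines[1:]:
        stamp, sep, rest = ln.partition(":")
        if not sep:
            continue  # not a "timestamp: values" line
        values = []
        for raw in rest.split():
            number = float(raw)  # float() also accepts "nan" and "-nan"
            values.append("" if math.isnan(number) else repr(number))
        writer.writerow([stamp.strip()] + values)
    return out.getvalue()


# Made-up sample in the assumed layout, with two data sources named "1" and "2".
sample = (
    "                          1                   2\n"
    "\n"
    "1412345400: 2.0000000000e+00 2.0000000000e+01\n"
    "1412345700: -nan -nan\n"
)
csv_text = fetch_output_to_csv(sample)
```

The input text would come from something like `rrdtool fetch some.rrd AVERAGE --start -1d`, run once per RRD file.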

rajasegar · Post by **rajasegar** » Thu Oct 02, 2014 6:14 pm

Storage is not an issue. I/O is also not an issue. We have a separate RAID 10 group to use.
Please advise how to go about this. Thanks.

At the same time we will explore RRD to CSV as a backup option.

abrist · Post by **abrist** » Fri Oct 03, 2014 9:22 am

You will need to either add some commands (like duplicating the move as a copy to an additional folder not watched by the reaper) or create a script that performs the original move as well as your own commands to do what you will with the data. The commands you will need to alter are process-host-perfdata-file-bulk and process-service-perfdata-file-bulk. The commands below are the defaults, for reference; you will need to implement your desired changes at this point in the check-result parsing.

Code:

define command {
       command_name                             process-host-perfdata-file-bulk
       command_line                             /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
}
define command {
       command_name                             process-service-perfdata-file-bulk
       command_line                             /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
}
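
A wrapper script of the kind described above might look roughly like the sketch below. This is untested against XI and not a supported change; the function name, the `.tmp` suffix, and the idea of passing the archive directory as a third argument are all assumptions for illustration. It copies the bulk file aside first, then performs the same move the default command does, and it never lets a failed copy block the move.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: keep a copy of the bulk perfdata file, then do the original move."""
import os
import shutil
import sys
from pathlib import Path


def archive_then_move(src: str, spool_dest: str, archive_dir: str) -> bool:
    """Copy src into archive_dir, then move src to spool_dest as the default command does."""
    src_path = Path(src)
    if not src_path.exists():
        return False  # nothing to process this cycle
    try:
        archive = Path(archive_dir)
        archive.mkdir(parents=True, exist_ok=True)
        final = archive / Path(spool_dest).name
        partial = final.with_name(final.name + ".tmp")
        shutil.copy2(src_path, partial)
        os.replace(partial, final)  # rename so readers never see a half-written file
    except OSError as exc:
        # A failed archive copy must not stop the move the XI spooler depends on.
        print(f"archive copy failed: {exc}", file=sys.stderr)
    shutil.move(str(src_path), spool_dest)
    return True


# Only act when called with the three expected paths: source, spool target, archive dir.
if __name__ == "__main__" and len(sys.argv) == 4:
    archive_then_move(sys.argv[1], sys.argv[2], sys.argv[3])
```

The command_line would then call the script with the same two paths the default `/bin/mv` uses, plus an archive directory of your choosing that the reaper does not watch.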

The format of those files resembles:

Code:

DATATYPE::SERVICEPERFDATA       TIMET::1412345626       HOSTNAME::localhost     SERVICEDESC::Current Users      SERVICEPERFDATA::users=2;20;50;0
        SERVICECHECKCOMMAND::check_local_users!20!50    HOSTSTATE::UP   HOSTSTATETYPE::HARD     SERVICESTATE::OK        SERVICESTATETYPE::HARD
SERVICEOUTPUT::USERS OK - 2 users currently logged in
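
For anyone parsing these files, here is a minimal sketch. It assumes each record is a single line of tab-separated `KEY::value` fields, as the sample above suggests, and that the perfdata field holds space-separated `label=value;warn;crit;min;max` items; both function names are made up for illustration.

```python
import shlex


def parse_perfdata_line(line: str) -> dict[str, str]:
    """Split one tab-separated record of KEY::value fields into a dict."""
    fields = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("::")
        if sep:  # skip anything that is not a KEY::value pair
            fields[key.strip()] = value.strip()
    return fields


def parse_metrics(perfdata: str) -> dict[str, str]:
    """Map each label to its raw value (unit still attached), dropping thresholds."""
    metrics = {}
    for item in shlex.split(perfdata):  # shlex handles 'quoted labels' with spaces
        label, sep, rest = item.partition("=")
        if sep:
            metrics[label] = rest.split(";")[0]
    return metrics


# The sample record from above, joined back into one tab-separated line.
sample = "\t".join([
    "DATATYPE::SERVICEPERFDATA",
    "TIMET::1412345626",
    "HOSTNAME::localhost",
    "SERVICEDESC::Current Users",
    "SERVICEPERFDATA::users=2;20;50;0",
    "SERVICESTATE::OK",
])
record = parse_perfdata_line(sample)
metrics = parse_metrics(record["SERVICEPERFDATA"])
```

Here `record["HOSTNAME"]` is `localhost` and `metrics` is `{"users": "2"}`.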

Do understand that if your drive fills up because of this, or if you break performance data graphing with a bad command, you will technically have an unsupported change, and support will most likely just revert the commands. It will be your responsibility to make sure your changes do not interfere with the perfdata spooler in XI. This will increase I/O on the XI server, so be prepared. Also, if you just add a copy command, understand that within 3 hours you will have so many files in the target directory that the folder will be un-stat()able. You will therefore need another script running from cron to process those files as they get copied.
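
The cron-side cleanup could be sketched along these lines. It is only an illustration under assumptions: the function name, the CSV column choice, and the rule of skipping names ending in `.tmp` (on the guess that whatever writes the copies uses that suffix for in-progress files) are not from XI. It appends every record to one CSV file and deletes each copied file once it has been read, so the directory never grows large.

```python
import csv
from pathlib import Path


def drain_to_csv(copy_dir: str, csv_path: str) -> int:
    """Append records from every copied perfdata file to csv_path, then delete the file."""
    files = sorted(
        p for p in Path(copy_dir).iterdir()
        if p.is_file() and not p.name.endswith(".tmp")
    )
    rows = 0
    with open(csv_path, "a", newline="") as out:
        writer = csv.writer(out)
        for path in files:
            with open(path, errors="replace") as handle:
                for line in handle:
                    tokens = line.rstrip("\n").split("\t")
                    rec = dict(t.split("::", 1) for t in tokens if "::" in t)
                    if not rec:
                        continue
                    writer.writerow([
                        rec.get("TIMET", ""),
                        rec.get("HOSTNAME", ""),
                        rec.get("SERVICEDESC", ""),
                        # Service files carry SERVICEPERFDATA, host files HOSTPERFDATA.
                        rec.get("SERVICEPERFDATA", rec.get("HOSTPERFDATA", "")),
                    ])
                    rows += 1
            out.flush()    # make sure the rows are written before the source goes away
            path.unlink()  # remove the processed copy so the directory stays small
    return rows
```

Run from cron every minute or so, this should keep the file count in the copy directory small. The CSV file itself still grows without bound, so it would need rotating separately.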

rajasegar · Post by **rajasegar** » Sat Oct 04, 2014 7:52 pm

Noted on your comments.
Thanks

tmcdonald · Post by **tmcdonald** » Mon Oct 06, 2014 9:08 am

Closing thread.

Nagios Support Forum
