Capture Performance Data
Posted: Tue Sep 30, 2014 6:10 pm
by rajasegar
NagiosXI 2014R1.2
RHEL 6.5
Mod gearman
If I wanted to capture all the performance data of hosts and services as it is generated and pipe it to a text file, what would be the best way to go about this?
I just don't want to mess around with reading and exporting RRD files.
Thanks
Re: Capture Performance Data
Posted: Wed Oct 01, 2014 1:55 pm
by sreinhardt
We do not do this or suggest it in any way. This is one of the few cases where, unless you have an absolute need and RRDs cannot do what you are looking for, we really don't want you to make this change. There are one or two places where you could alter a command before the data is put into the RRD, which might let you capture it. However, and this is a huge caveat, this will cause massive I/O load and disk usage on your system, and the results are completely untested, both in normal operation and during upgrades. Expect at least 2-4 times the size of your RRDs in storage, as none of this data is compressed or averaged, and it will likely be stored in a text format, which is far less compact. What is the issue with rrdtool? It provides some great options for extracting data and works extremely well in scripts.
Re: Capture Performance Data
Posted: Wed Oct 01, 2014 6:07 pm
by rajasegar
sreinhardt wrote:We do not do this or suggest it in any way, this is one of the few times where unless you have an absolute need and rrds cannot do what you are looking for, we really don't want you to make this change. […]
I have almost 10,000 RRD files. Won't processing them have the same I/O impact?
Not to mention the hassle of processing them in the first place.
Please advise where to modify the commands. I/O and storage can be handled easily.
We will explore this in a dev environment to see if it is worthwhile.
Does anyone have any routines to output RRD to CSV text data?
Re: Capture Performance Data
Posted: Thu Oct 02, 2014 10:05 am
by abrist
At 10k RRDs, with the standard retention and steps, you will use a very large amount of disk. Roughly 1/5 of the RRD size is used for "yesterday". In plain text, that single day of data uses 580 KB per check on average, so at 10k checks you can assume a minimum of roughly 5 GB of disk usage per day.
Re: Capture Performance Data
Posted: Thu Oct 02, 2014 10:11 am
by abrist
rajasegar wrote:Does anyone have any routines to output RRD to CSV text data?
Take a look at rrd2csv:
https://code.google.com/p/rrd2csv/
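If rrd2csv does not fit, a plain `rrdtool fetch` pipeline can do a similar job. A rough sketch, assuming rrdtool is installed; the RRD path in the comment is only an example, and the function itself merely reformats fetch output ("timestamp: value" lines) into CSV:

```shell
#!/bin/sh
# Convert `rrdtool fetch` output to CSV. The fetch command is shown in the
# usage comment; this function only reformats its stdout, so it can be
# tried without a real RRD file.
fetch_to_csv() {
    # Usage (example path):
    #   rrdtool fetch /usr/local/nagios/share/perfdata/localhost/Current_Users.rrd \
    #       AVERAGE --start -1d | fetch_to_csv
    awk '/^[0-9]+: / {
        sub(":", "", $1)              # strip the colon from the timestamp
        line = $1
        for (i = 2; i <= NF; i++)     # join all data-source values
            line = line "," $i
        print line
    }'
}
```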
Re: Capture Performance Data
Posted: Thu Oct 02, 2014 6:14 pm
by rajasegar
abrist wrote:At 10k rrds, with the standard retention and steps, you will use a very large amount of data. […]
Storage is not an issue, and neither is I/O; we have a separate RAID 10 group to use.
Please advise how to go about this. Thanks.
In the meantime we will explore the RRD-to-CSV route as a backup option.
Re: Capture Performance Data
Posted: Fri Oct 03, 2014 9:22 am
by abrist
You will need to either add some commands (for example, duplicating the move as a copy into an additional folder not watched by the reaper) or create a script that performs the original move as well as whatever you want to do with the data. The commands you will need to alter are process-host-perfdata-file-bulk and process-service-perfdata-file-bulk. (The commands below are the defaults, for reference; this is the point in the check-result parsing where you would implement your desired changes.)
Code: Select all
define command {
    command_name    process-host-perfdata-file-bulk
    command_line    /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
}
define command {
    command_name    process-service-perfdata-file-bulk
    command_line    /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
}
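One way to implement the script option is a small wrapper that the command_line calls instead of `/bin/mv` directly: it keeps a copy of the perfdata file in an archive directory of your choosing, then performs the original move. This is only a sketch; the function name and archive path are assumptions, not anything XI ships with.

```shell
#!/bin/sh
# Hypothetical wrapper for the bulk perfdata commands: copy the perfdata
# file for your own pipeline, then do the original move into the XI spool.
archive_and_move() {
    src="$1"        # e.g. /usr/local/nagios/var/service-perfdata
    dest="$2"       # e.g. /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
    archive="$3"    # a directory NOT watched by the perfdata reaper
    mkdir -p "$archive"
    cp "$src" "$archive/$(basename "$dest")"   # extra copy for later processing
    mv "$src" "$dest"                          # original behaviour, unchanged
}
```

The command_line would then invoke this script with the same source and destination paths the default `/bin/mv` uses.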
The format of those files resembles:
Code: Select all
DATATYPE::SERVICEPERFDATA TIMET::1412345626 HOSTNAME::localhost SERVICEDESC::Current Users SERVICEPERFDATA::users=2;20;50;0
SERVICECHECKCOMMAND::check_local_users!20!50 HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD
SERVICEOUTPUT::USERS OK - 2 users currently logged in
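Each record in those files is a single tab-separated line of KEY::VALUE pairs (wrapped above for display). A minimal sketch of turning records into CSV; which columns to keep is just an example choice:

```shell
#!/bin/sh
# Split each tab-separated perfdata record into KEY::VALUE pairs and emit
# a few selected fields as CSV. Reads the file(s) given as arguments, or
# stdin if none.
perfdata_to_csv() {
    awk -F'\t' '{
        split("", f)                              # clear the pair map per record
        for (i = 1; i <= NF; i++) {
            n = index($i, "::")                   # KEY::VALUE boundary
            f[substr($i, 1, n - 1)] = substr($i, n + 2)
        }
        print f["TIMET"] "," f["HOSTNAME"] "," f["SERVICEDESC"] "," f["SERVICEPERFDATA"]
    }' "$@"
}
```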
Do understand that if your drive fills up because of this, or if you break performance-data graphing with a bad command, you will technically have an unsupported change, and support will most likely just revert the commands. It is your responsibility to make sure your changes do not interfere with the perfdata spooler in XI. This will increase I/O on the XI server, so be prepared. Also, if you just add a copy command, within about 3 hours you will have so many files in the target directory that the folder becomes effectively un-stat()able. So you will need another script running from cron to process those files as they are copied.
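For the cron side, a minimal sketch that drains the archive directory into one growing text file; the paths and single-file layout are assumptions, adjust to taste:

```shell
#!/bin/sh
# Append each archived perfdata file to one aggregate text file, then
# delete it, so the archive directory never accumulates enough entries
# to become unmanageable.
drain_archive() {
    dir="$1"    # e.g. /usr/local/nagios/var/perfdata-archive
    out="$2"    # e.g. /var/log/nagios-perfdata.txt
    for f in "$dir"/*.perfdata.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        cat "$f" >> "$out" && rm -f "$f" # only delete after a successful append
    done
}
```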
Re: Capture Performance Data
Posted: Sat Oct 04, 2014 7:52 pm
by rajasegar
abrist wrote:You will need to either add some commands … So you will need to have another script running on a cron to process those files as they get copied.
Noted on your comments.
Thanks
Re: Capture Performance Data
Posted: Mon Oct 06, 2014 9:08 am
by tmcdonald
Closing thread.