Graphs Not working Sorta

Posted: Tue Oct 01, 2013 11:42 am
by masmith
My problem:

I have a few systems with so many filesystems that the performance data overflows the buffer.

I have made the following changes, but they do not seem to help.

Edit /include/nagios.h
#define MAX_PLUGIN_OUTPUT_LENGTH 32768

Edit /include/common.h

#define MAX_INPUT_BUFFER 32768
#define MAX_EXTERNAL_COMMAND_LENGTH 32768


Does anyone have a suggestion as to why this did not solve the problem?


Below is a copy of the performance data that is being sent to service-perfdata. The output is cut off at /data/oracl; several more filesystems should be reported after it.

DATATYPE::SERVICEPERFDATA TIMET::1380645136 HOSTNAME::coldevap1-jpch SERVICEDESC::Check CPU Stats SERVICEPERFDATA::cpu_user=0%;70;90; cpu_sys=1%;70;90; cpu_iowait=0%;70;90; cpu_idle=99%; SERVICECHECKCOMMAND::check_nrpe!check_cpu!""!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::CPU OK : user=0% system=1% iowait=0% idle=99%
DATATYPE::SERVICEPERFDATA TIMET::1380645145 HOSTNAME::coloradw1 SERVICEDESC::Check Disks via NRPE SERVICEPERFDATA::/=1994MB;5442;5744;0;6047 /dev/shm=1MB;21720;22927;0;24134 /boot=48MB;88;93;0;98 /home=484MB;4081;4308;0;4535 /nsa/oracle=4888MB;9071;9575;0;10079 /usr=1823MB;2806;2962;0;3118 /var=509MB;1813;1914;0;2015 /nsa/java=1010MB;1359;1435;0;1511 /data/oracle/nsadwp/data01=48758MB;72570;76602;0;80634 /data/oracle/nsadwp/data02=34843MB;54427;57451;0;60475 /data/oracle/nsadwp/data03=37596MB;54427;57451;0;60475 /data/oracle/nsadwp/data04=21832MB;54427;57451;0;60475 /data/oracle/nsadwp/data05=2227MB;54427;57451;0;60475 /data/oracle/nsadwp/indx01=35909MB;45356;47876;0;50396 /data/oracle/nsadwp/indx02=21150MB;45356;47876;0;50396 /data/oracle/nsadwp/indx03=21145MB;45356;47876;0;50396 /data/oracle/nsadwp/indx04=21620MB;45356;47876;0;50396 /data/oracle/nsadwp/indx05=11785MB;45356;47876;0;50396 /data/oracle/nsadwp/sys01=2615MB;18142;19150;0;20158 /data/oracle/nsadwp/sys02=151MB;4535;4787;0;5039 /data/oracle/nsadwp/arch=51571MB;181427;191506;0;201586 /data/oracle/nsadwp/backup=50937MB;226784;239383;0;251983 /data/oracl SERVICECHECKCOMMAND::check_nrpe!check_disk_all!""!!!!!! HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD SERVICEOUTPUT::DISK OK
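One way to confirm where a line like the one above gets cut off is to pull out the SERVICEPERFDATA field and look for tokens that are not complete label=value pairs. The following is only a minimal sketch: the function names are hypothetical, the sample line is shortened from the output above, and it assumes labels contain no spaces or quotes.

```python
import re

# A complete perfdata token looks like label=value[UOM];warn;crit;min;max.
# Assumption: labels contain no spaces or quotes, as in the output above.
TOKEN = re.compile(r"^[^=\s]+=-?[\d.]+[A-Za-z%]*(;[^;\s]*){0,4}$")

def extract_perfdata(line):
    """Return the text between SERVICEPERFDATA:: and the next FIELD:: marker."""
    match = re.search(r"SERVICEPERFDATA::(.*?)\s+[A-Z]+::", line)
    return match.group(1) if match else ""

def find_incomplete_tokens(perfdata):
    """List tokens that do not parse as label=value, e.g. a cut-off path."""
    return [tok for tok in perfdata.split() if not TOKEN.match(tok)]

# Shortened version of the truncated line from this post.
sample = (
    "DATATYPE::SERVICEPERFDATA TIMET::1380645145 HOSTNAME::coloradw1 "
    "SERVICEDESC::Check Disks via NRPE "
    "SERVICEPERFDATA::/=1994MB;5442;5744;0;6047 /boot=48MB;88;93;0;98 "
    "/data/oracl SERVICECHECKCOMMAND::check_nrpe!check_disk_all "
    "HOSTSTATE::UP"
)
perfdata = extract_perfdata(sample)
print("perfdata length:", len(perfdata))
print("incomplete tokens:", find_incomplete_tokens(perfdata))
```

Running the same functions over the full perfdata log would show the length at which the output stops, which can then be compared with the buffer sizes configured at each point in the chain.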

Re: Graphs Not working Sorta

Posted: Wed Oct 02, 2013 8:23 am
by slansing
Have you tried cutting out the performance data that you do not normally use? Or have you tried adding multiple services, with each service reporting a section of your performance data, so that no single one gets truncated?

Re: Graphs Not working Sorta

Posted: Mon Oct 07, 2013 1:27 pm
by masmith
I am sure the mentioned approaches would work, but then it becomes a management overhead.

Since this is a disk check, I would like to have a single check that covers all the filesystems. That works on all servers except a few. I would like to address the issue so that if I add more servers in the future that have a large number of filesystems, I don't have to do anything but assign them to my service group and forget about them.

Re: Graphs Not working Sorta

Posted: Mon Oct 07, 2013 2:53 pm
by sreinhardt
So to clarify, are you looking to do a single check that checks all of your servers for various disks and combines the performance data? Alternatively, are you looking to write a single check that can be applied to multiple servers via separate checks and performance data collections?

Re: Graphs Not working Sorta

Posted: Mon Oct 07, 2013 4:00 pm
by masmith
I am using the standard plugin:

check_disk v1.4.16 (nagios-plugins 1.4.16)
Copyright (c) 1999 Ethan Galstad <[email protected]>
Copyright (c) 1999-2008 Nagios Plugin Development Team
<[email protected]>

I have the check already in place and it works fine for 100-plus servers. I have a few that happen to have a lot more filesystems on them, and the output is larger than what Nagios will accept, so it is truncating that output.

So when I dug into the issue, I found that the buffer was capped.

This is directly from the code:

NOTE: Plugin length is artificially capped at 8k to prevent runaway plugins from returning MBs/GBs of data
back to Nagios. If you increase the 8k cap by modifying this value, make sure you also increase the value
of MAX_EXTERNAL_COMMAND_LENGTH in common.h to allow for passive checks results received through the external
command file. EG 10/19/07

So I increased it to allow more data to be passed, and it did not make a difference. That is why I posted: to see if one of the developers could explain why the change I made did not solve my problem.

It is more of a Nagios core issue than it is about the service checks and how they are set up. I understand there are many ways to get around this, but I hate having exceptions for just a few servers.

Re: Graphs Not working Sorta

Posted: Tue Oct 08, 2013 12:53 pm
by sreinhardt
Well, this was initially raised from 4k to 8k specifically for XI. Altering it further is possible; however, we will not support this, as there are many unknown issues that could come up. Specifically, with NRPE and the plugin that sends NRPE commands, there could be buffer overflows in places where bounds checking is still limited to 8k while you would now be sending much more data. In addition, you would run into many issues with RRD files, as they would have to increase in track size, which is not possible without recreating them entirely. So while you changed the plugin or agent that is sending this data, there are several other places that are prone to issues after this change. In short, the change did not make a difference because more than just that one location would need to be modified to send/accept the increased amount.
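The point about several places enforcing their own limit can be illustrated with a small sketch: the output that reaches the graphs is bounded by the smallest buffer along the path, so raising only one of them changes nothing. The hop names and sizes below are hypothetical. In particular, the 1024-byte NRPE figure is an assumption based on the default `MAX_PACKETBUFFER_LENGTH` in the NRPE 2.x `common.h` and should be checked against your own sources. The truncated perfdata in the first post is roughly 1,000 characters long, far below 8k, which would be consistent with a smaller limit somewhere before Nagios, though nothing in this thread confirms which hop is responsible.

```python
# Hypothetical buffer sizes in bytes for each hop a check result passes through.
# The 1024 for NRPE is an assumption (default MAX_PACKETBUFFER_LENGTH in the
# NRPE 2.x common.h); verify every number against your own installation.
HOPS = {
    "NRPE daemon packet buffer": 1024,
    "check_nrpe plugin packet buffer": 1024,
    "Nagios MAX_PLUGIN_OUTPUT_LENGTH": 32768,
}

def effective_limit(hops):
    """Return (hop name, size) for the smallest buffer in the chain."""
    name = min(hops, key=hops.get)
    return name, hops[name]

def survives(output_length, hops):
    """True if output of this length fits every buffer (one byte kept for NUL)."""
    return output_length <= effective_limit(hops)[1] - 1

name, size = effective_limit(HOPS)
print(f"bottleneck: {name} ({size} bytes)")
print("900 chars fit:", survives(900, HOPS))
print("1500 chars fit:", survives(1500, HOPS))
```

Under these assumed numbers, raising the Nagios-side value alone leaves the bottleneck untouched, which matches the behaviour described in this thread.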

At this point, unfortunately, I have to tell you that the only supported option is to logically split the disk checks into two or more checks. As mentioned, I would suggest breaking them into checks that are logical for that device, such as tmpfs, network-mounted, and local disks, or application-specific mounts/disks.

Sorry to be a buzz kill.
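The suggestion above to split the disk checks into logical groups could be planned with something like the following sketch. The group names and path prefixes are hypothetical, chosen only to match the mount points listed in the first post; it is not an official Nagios tool.

```python
from collections import defaultdict

# Hypothetical grouping rules: (group name, path prefix). First match wins;
# anything unmatched falls into "local". Adjust to your own layout.
RULES = [
    ("oracle_data", "/data/oracle"),
    ("oracle_app", "/nsa"),
    ("tmpfs", "/dev/shm"),
]

def group_for(mount):
    """Pick the group whose prefix matches the mount point."""
    for name, prefix in RULES:
        if mount == prefix or mount.startswith(prefix + "/"):
            return name
    return "local"

def split_mounts(mounts):
    """Map each group name to the mount points one check would cover."""
    groups = defaultdict(list)
    for mount in mounts:
        groups[group_for(mount)].append(mount)
    return dict(groups)

# A subset of the mount points from the first post.
mounts = ["/", "/dev/shm", "/boot", "/home", "/nsa/oracle", "/usr", "/var",
          "/nsa/java", "/data/oracle/nsadwp/data01", "/data/oracle/nsadwp/arch"]
for name, members in split_mounts(mounts).items():
    print(name, members)
```

Because the rules are prefix-based rather than per-host, the same set of grouped services could in principle be assigned to every server, with each group's mount points passed to its own check_disk call (for example through its `-p` path option). Servers with few filesystems would simply report short output for each group.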