Re: check_multi support

Posted: Mon Nov 28, 2011 12:24 pm
by mguthrie
Not currently. This remains on our TODO list, but we haven't gotten to it yet.

Re: check_multi support

Posted: Mon Nov 28, 2011 2:53 pm
by mguthrie
In the send_nrdp.php file, try changing lines 403-405 from this:

Code: Select all

               
            while (!feof($fp)) {
                $response .= fgets($fp, 128);
            }
To this:

Code: Select all

           
            while (!feof($fp)) {
                $response .= fgets($fp);
            }
Let us know if that resolves the issue and we'll patch the send script.

Re: check_multi support

Posted: Tue Nov 29, 2011 2:42 am
by SDohmen
From what I can see, nothing has changed. The output of the Dell script is as follows:

Code: Select all

OK - System: PowerEdge R510 II, SN: *******, 32 GB ram (8 dimms), 1 logical drives, 6 physical drivesbr/----- BIOS=1.5.3 10/25/2010, iDRAC6=1.54br/----- Ctrl 0 [PERC H700 Integrated]: Fw=12.10.0-0025, Dr=00.00.04.27-SL1br/----- Encl 0:0:0 [Backplane]: Fw=
And the performance data is as follows:

Code: Select all

T0_System_Board_Ambient=21C;45;50 W2_System_Board_System_Level=196W;0;0 A0_PS_1_Current=0.4A;0;0 A1_PS_2_Current=0.4A;0;0 V19_PS_1_Voltage=224V;0;0 V20_PS_2_Voltage=224V;0;0 F0_System_Board_FAN_MOD_1A=3240rpm;0;0 F1_System_Board_FAN_MOD_2A=3240rpm;0;0 F2_
What I find strange is that the script output and the performance data have both been separated, and only part of the complete message is sent for each. The complete message is as follows:

Code: Select all

OK - System: 'PowerEdge R510 II', SN: '*******', 32 GB ram (8 dimms), 1 logical drives, 6 physical drives
----- BIOS='1.5.3 10/25/2010', iDRAC6='1.54'
----- Ctrl 0 [PERC H700 Integrated]: Fw='12.10.0-0025', Dr='00.00.04.27-SL1'
----- Encl 0:0:0 [Backplane]: Fw='1.10'
----- Encl 0:1:0 [Backplane]: Fw='1.10'
----- OpenManage Server Administrator (OMSA) version: '6.4.0'

Code: Select all

T0_System_Board_Ambient=21C;45;50 W2_System_Board_System_Level=196W;0;0 A0_PS_1_Current=0.4A;0;0 A1_PS_2_Current=0.4A;0;0 V19_PS_1_Voltage=224V;0;0 V20_PS_2_Voltage=224V;0;0 F0_System_Board_FAN_MOD_1A=3240rpm;0;0 F1_System_Board_FAN_MOD_2A=3240rpm;0;0 F2_System_Board_FAN_MOD_4A=3360rpm;0;0 F3_System_Board_FAN_MOD_5A=3240rpm;0;0 F4_System_Board_FAN_MOD_3A=3240rpm;0;0
I do notice the br/ markers that are being sent as well, but I doubt they are the cause, since the output stops somewhere in the middle of a line. The performance data doesn't even have line breaks.

Re: check_multi support

Posted: Tue Nov 29, 2011 2:26 pm
by mguthrie
Yeah, it's still cutting off at exactly 255 bytes. I'll keep looking and see if I can find out where the memory cap is.
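
To narrow down where a cap like this applies, one option is to compare the string the plugin produced with the string that arrived at each hop. The following is only a rough Python sketch, not part of NRDP or Nagios: the function name `describe_truncation` and the default cap of 255 are assumptions made for illustration.

```python
def describe_truncation(full: str, received: str, cap: int = 255) -> dict:
    """Compare what was sent with what arrived, counting bytes rather than characters."""
    full_b = full.encode("utf-8")
    recv_b = received.encode("utf-8")
    return {
        "full_bytes": len(full_b),
        "received_bytes": len(recv_b),
        # True when the received text is an unmodified leading part of the original
        "is_prefix": full_b.startswith(recv_b),
        # True when the received text stops exactly at the suspected cap
        "hits_cap": len(recv_b) == cap and len(full_b) > cap,
    }


if __name__ == "__main__":
    sent = "x" * 400          # stand-in for a long plugin output
    arrived = sent[:255]      # stand-in for what shows up on the other side
    print(describe_truncation(sent, arrived))
```

Running this against the output captured on the slave and again on the central server would show at which hop the length first drops to the cap. Note that `is_prefix` will be False if characters such as quotes or angle brackets are stripped in transit, as they appear to be in the output posted earlier.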

Re: check_multi support

Posted: Tue Nov 29, 2011 2:47 pm
by SDohmen
Is there perhaps a way to check where it's being capped? For example, does it happen on the slave or on the central server?

Re: check_multi support

Posted: Wed Nov 30, 2011 5:48 pm
by mguthrie
Can you send me the entire output string that you're attempting to pass exactly as it's being passed to NRDP?

Re: check_multi support

Posted: Thu Dec 01, 2011 5:43 am
by SDohmen
OK, let's see.

The output you requested is below. This is the entry found in nagios.log after the service check was processed. Since I don't see any NRDP debugging information, I can't show you that.

Code: Select all

[1322694000] CURRENT SERVICE STATE: defauwes-xen01;adm-hardware;OK;HARD;1;OK - System: 'PowerEdge R510 II', SN: '*******', 32 GB ram (8 dimms), 1 logical drives, 6 physical drives<br/>----- BIOS='1.5.3 10/25/2010', iDRAC6='1.54'<br/>----- Ctrl 0 [PERC H700 Integrated]: Fw='12.10.0-0025', Dr='00.00.04.27-SL1'<br/>----- Encl 0:0:0 [Backplane]: Fw='1.10'<br/>----- Encl 0:1:0 [Backplane]: Fw='1.10'<br/>----- OpenManage Server Administrator (OMSA) version: '6.4.0'
For some reason I don't see any log entry for the performance data, which is also being transferred from the same check. I have added a screenshot showing this data. If there is a way to get debugging information from NRDP in the log, please let me know how I can make it visible.

Re: check_multi support

Posted: Thu Dec 01, 2011 11:08 am
by mguthrie

Code: Select all

For some reason I don't see any log entry for the performance data, which is also being transferred from the same check. I have added a screenshot showing this data. If there is a way to get debugging information from NRDP in the log, please let me know how I can make it visible.
Nagios doesn't handle the performance data directly; that gets processed by the PNP daemon (NPCD). The logs for those are:

/usr/local/nagios/var/perfdata.log
/usr/local/nagios/var/npcd.log

As for the plugin output: after some testing, and as your screenshot confirms, this isn't an issue with NRDP chopping off the data. Nagios is receiving the full output, and it can be viewed by looking at the service in Nagios Core. If it's not producing performance graphs, then the performance data has a syntax issue, or there could be a problem with the graph template.

You can check an rrd to see if it has any data in it with the following commands:

Code: Select all

cd /usr/local/nagios/share/perfdata/<hostname>
rrdtool fetch <rrdfile>.rrd AVERAGE
If you're getting all "nan"s for the data, then it's a syntax issue.
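
For a quick syntax check of a performance data string before it reaches PNP, something like the Python sketch below could help. It assumes the usual `label=value[UOM];warn;crit;min;max` layout and is deliberately loose about units; the names `PERF_ITEM` and `check_perfdata` are made up for this example, and it does not handle quoted labels that contain spaces.

```python
import re

# One perfdata item: label=value[UOM][;warn][;crit][;min][;max]
# Assumption: the unit is any run of letters or '%'; this is looser than a strict parser.
PERF_ITEM = re.compile(
    r"^(?P<label>'[^'=]+'|[^\s'=]+)="
    r"(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[A-Za-z%]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?$"
)


def check_perfdata(perfdata: str):
    """Split on whitespace and sort the items into well-formed and malformed lists."""
    ok, bad = [], []
    for token in perfdata.split():
        (ok if PERF_ITEM.match(token) else bad).append(token)
    return ok, bad


if __name__ == "__main__":
    # The last item is cut off mid-label, like the truncated string earlier in the thread.
    ok, bad = check_perfdata("T0_System_Board_Ambient=21C;45;50 A0_PS_1_Current=0.4A;0;0 F2_")
    print("ok:", ok)
    print("bad:", bad)
```

A string that was cut off partway through an item, such as one ending in `F2_`, would land in the malformed list.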

It would have been helpful to know that you were able to see the full plugin output in the Core interface; it would have saved some debugging time ;)

Re: check_multi support

Posted: Thu Dec 01, 2011 1:14 pm
by SDohmen
I just checked the RRD file of the host and I see all the data coming through. At the end, however, I see 2 lines with nan nan nan nan.

Code: Select all

1322765520: 2.2000000000e+01 1.8200000000e+02 4.0000000000e-01 4.0000000000e-01 2.2800000000e+02 2.2800000000e+02 3.3600000000e+03 3.2400000000e+03 3.3448000000e+03 3.2400000000e+03 3.2400000000e+03
1322765580: 2.2000000000e+01 1.8200000000e+02 4.0000000000e-01 4.0000000000e-01 2.2800000000e+02 2.2800000000e+02 3.3600000000e+03 3.2400000000e+03 3.3448000000e+03 3.2400000000e+03 3.2400000000e+03
1322765640: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
1322765700: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
I just checked the output, but I can't see any character that might break it. See the post below for a best guess.
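
To tell "all nan" apart from "a couple of nan rows at the end", the `rrdtool fetch` output can be tallied with a small script. This is only a sketch: the function name `summarize_fetch` is invented here, and it assumes each data row looks like `timestamp: value value ...` as in the listing above. As far as I understand, a few trailing nan rows can simply be the most recent intervals that have not been filled in yet, so they are not necessarily a sign of a syntax problem.

```python
import math


def summarize_fetch(lines):
    """Count rows that contain data and rows that are entirely nan."""
    data_rows = nan_rows = 0
    for line in lines:
        ts, sep, rest = line.partition(":")
        # Skip the header line and anything that does not start with a numeric timestamp
        if not sep or not ts.strip().isdigit():
            continue
        values = [float(v) for v in rest.split()]  # float() accepts "nan" and "-nan"
        if not values:
            continue
        if all(math.isnan(v) for v in values):
            nan_rows += 1
        else:
            data_rows += 1
    return data_rows, nan_rows


if __name__ == "__main__":
    sample = [
        "1322765580: 2.2000000000e+01 1.8200000000e+02",
        "1322765640: -nan -nan",
        "1322765700: -nan -nan",
    ]
    print(summarize_fetch(sample))
```

The output of `rrdtool fetch <rrdfile>.rrd AVERAGE` can be saved to a file and passed in line by line.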

Re: check_multi support

Posted: Thu Dec 01, 2011 1:52 pm
by SDohmen
I think i have some more info.

I was browsing all the logs when I came across nagios.log. There I found all the checks that were passed to the central server from all slaves. Because of this, I decided to search for the data that was being cut off, and I found the following:

Code: Select all

[1322694000] CURRENT SERVICE STATE: defauwes-xen01;adm-hardware;OK;HARD;1;OK - System: PowerEdge R510 II, SN: *******, 32 GB ram (8 dimms), 1 logical drives, 6 physical drivesbr/----- BIOS=1.5.3 10/25/2010, iDRAC6=1.54br/----- Ctrl 0 [PERC H700 Integrated]: Fw=12.10.0-0025, Dr=00.00.04.27-SL1br/----- Encl 0:0:0 [Backplane]: Fw=1.10br/----- Encl 0:1:0 [Backplane]: Fw=1.10br/----- OpenManage Server Administrator (OMSA) version: 6.4.0
The screenshot for this service is:
check1.PNG
Another example is the following:

Code: Select all

[1322694000] CURRENT SERVICE STATE: defauwes-ts002;events;CRITICAL;HARD;3;check_multi CRITICAL - 102 plugins checked, 1 critical (event_id_1074), 51 unknown (event_id_1501, event_id_1699, event_id_2001, event_id_2013, event_id_2019, event_id_2020, event_id_2025, event_id_2026, event_id_2027, event_id_2507, event_id_2511, event_id_3002, event_id_3003, event_id_3005, event_id_3020, event_id_3055, event_id_3056, event_id_3057, event_id_3096, event_id_4003, event_id_4005, event_id_4006, event_id_4102, event_id_4103, event_id_4105, event_id_4198, event_id_4199, event_id_4226, event_id_4319, event_id_5148, event_id_5149, event_id_5782, event_id_5789, event_id_6000, event_id_6008, event_id_7000, event_id_7016, event_id_7022, event_id_7023, event_id_7024, event_id_7025, event_id_7027, event_id_7031, event_id_7033, event_id_7034, event_id_7037, event_id_7038, event_id_12290, event_id_12302, event_id_12305, event_id_16387), 50 ok
In this case it's a multi check that sends massive amounts of data, which according to the log are being received but are not shown on the page. A screenshot for this is as follows:
check2.PNG

Seeing all this, I would guess that the central server can't show the complete output because it's being limited to 255 characters. Is there a way to change this? We have multiple checks that need to show several thousand characters.