check_multi support

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: check_multi support

Post by mguthrie

Not currently. It remains on our TODO list, but we haven't gotten to it yet.
Re: check_multi support

Post by mguthrie

In the send_nrdp.php file, try changing lines 403-405 from this:

Code:

while (!feof($fp)) {
    $response .= fgets($fp, 128);
}
To this:

Code:

while (!feof($fp)) {
    $response .= fgets($fp);
}
Let us know if that resolves the issue and we'll patch the send script.
SDohmen
Posts: 240
Joined: Thu Jun 30, 2011 4:14 am

Re: check_multi support

Post by SDohmen

From what I can see, nothing has changed. The output of the Dell script is as follows:

Code:

OK - System: PowerEdge R510 II, SN: *******, 32 GB ram (8 dimms), 1 logical drives, 6 physical drivesbr/----- BIOS=1.5.3 10/25/2010, iDRAC6=1.54br/----- Ctrl 0 [PERC H700 Integrated]: Fw=12.10.0-0025, Dr=00.00.04.27-SL1br/----- Encl 0:0:0 [Backplane]: Fw=
And the performance data is as follows:

Code:

T0_System_Board_Ambient=21C;45;50 W2_System_Board_System_Level=196W;0;0 A0_PS_1_Current=0.4A;0;0 A1_PS_2_Current=0.4A;0;0 V19_PS_1_Voltage=224V;0;0 V20_PS_2_Voltage=224V;0;0 F0_System_Board_FAN_MOD_1A=3240rpm;0;0 F1_System_Board_FAN_MOD_2A=3240rpm;0;0 F2_
What I find strange is that the script output and the performance data have both been cut off, so only part of the complete message gets through. The complete message is as follows:

Code:

OK - System: 'PowerEdge R510 II', SN: '*******', 32 GB ram (8 dimms), 1 logical drives, 6 physical drives
----- BIOS='1.5.3 10/25/2010', iDRAC6='1.54'
----- Ctrl 0 [PERC H700 Integrated]: Fw='12.10.0-0025', Dr='00.00.04.27-SL1'
----- Encl 0:0:0 [Backplane]: Fw='1.10'
----- Encl 0:1:0 [Backplane]: Fw='1.10'
----- OpenManage Server Administrator (OMSA) version: '6.4.0'

Code:

T0_System_Board_Ambient=21C;45;50 W2_System_Board_System_Level=196W;0;0 A0_PS_1_Current=0.4A;0;0 A1_PS_2_Current=0.4A;0;0 V19_PS_1_Voltage=224V;0;0 V20_PS_2_Voltage=224V;0;0 F0_System_Board_FAN_MOD_1A=3240rpm;0;0 F1_System_Board_FAN_MOD_2A=3240rpm;0;0 F2_System_Board_FAN_MOD_4A=3360rpm;0;0 F3_System_Board_FAN_MOD_5A=3240rpm;0;0 F4_System_Board_FAN_MOD_3A=3240rpm;0;0
I do notice that the br/ markers are being sent as well, but I doubt they are the cause, since the output stops in the middle of a line. The performance data doesn't even contain line breaks.
Re: check_multi support

Post by mguthrie

Yeah, it's still cutting off at exactly 255 bytes. I'll keep looking and see if I can find out where the memory cap is.
Re: check_multi support

Post by SDohmen

Is there a way to check where it's being capped? For example, does it happen on the slave or on the central server?
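One way to narrow that down, sketched here as an assumption rather than an existing Nagios or NRDP feature: run a throwaway test check whose output is padded with position markers, then compare what shows up on the slave, in nagios.log on the central server, and in the web interface. Each marker `|NNNNNNNNN` ends exactly at character offset NNNNNNNNN, so the last complete marker tells you where the text was cut. The function name `marker_output` and its default length of 1000 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical test helper: print "OK - " followed by 10-character markers
# "|NNNNNNNNN", where NNNNNNNNN is the offset at which that marker ends.
# The result is cut to exactly the requested length (default 1000).
marker_output() {
    local len=${1:-1000} out="OK - " marker
    while [ "${#out}" -lt "$len" ]; do
        # offset of the marker's last character once it has been appended
        printf -v marker '|%09d' "$(( ${#out} + 10 ))"
        out+=$marker
    done
    printf '%s\n' "${out:0:len}"
}
```

Used as the body of a test plugin (for example `marker_output 2000; exit 0`, where exit code 0 means OK), output that is cut at 255 characters would end in `|000000255`.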
Re: check_multi support

Post by mguthrie

Can you send me the entire output string that you're attempting to pass exactly as it's being passed to NRDP?
Re: check_multi support

Post by SDohmen

OK, let's see.

The output you requested is below. This is the entry found in nagios.log after the service check result was passed in. Since I don't see any NRDP debugging information, I can't show you that.

Code:

[1322694000] CURRENT SERVICE STATE: defauwes-xen01;adm-hardware;OK;HARD;1;OK - System: 'PowerEdge R510 II', SN: '*******', 32 GB ram (8 dimms), 1 logical drives, 6 physical drives<br/>----- BIOS='1.5.3 10/25/2010', iDRAC6='1.54'<br/>----- Ctrl 0 [PERC H700 Integrated]: Fw='12.10.0-0025', Dr='00.00.04.27-SL1'<br/>----- Encl 0:0:0 [Backplane]: Fw='1.10'<br/>----- Encl 0:1:0 [Backplane]: Fw='1.10'<br/>----- OpenManage Server Administrator (OMSA) version: '6.4.0'
For some reason I don't see any log entry for the performance data, which is also being transferred by the same check. I have added a screenshot showing this data. If there is a way to get NRDP debugging information into the log, please let me know how to enable it.
Re: check_multi support

Post by mguthrie

SDohmen wrote:

For some reason I don't see any log entry for the performance data, which is also being transferred by the same check. I have added a screenshot showing this data. If there is a way to get NRDP debugging information into the log, please let me know how to enable it.
Nagios doesn't handle the performance data directly; it gets processed by the PNP daemon (NPCD). The logs for those are:

/usr/local/nagios/var/perfdata.log
/usr/local/nagios/var/npcd.log

As for the plugin output: after some testing, confirmed by your screenshot as well, this isn't an issue with NRDP chopping off the data. Nagios is receiving the full output, and it can be viewed by looking at the service in Nagios Core. If it's not producing performance graphs, then either the performance data has a syntax issue or there is a problem with the graph template.
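To hunt for a syntax issue in the performance data, a rough first pass is to check each whitespace-separated item against the `label=value[UOM];warn;crit;min;max` layout described in the Nagios plugin guidelines. The helper below is a simplified sketch, not something shipped with Nagios or PNP: `check_perfdata` is a hypothetical name, it assumes labels contain no spaces or quotes, and it does not validate the threshold fields, so a pass here does not guarantee that PNP will graph the data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag performance-data items that do not match the
# simplified layout label=value[UOM];warn;crit;min;max.
# Assumes labels contain no spaces or quotes; threshold fields are not checked.
check_perfdata() {
    local re='^[^=]+=-?[0-9]+(\.[0-9]+)?[A-Za-z%]*(;[^;]*){0,4}$'
    local -a items
    local item bad=0
    # read -a splits on whitespace without glob expansion
    read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        if [[ $item =~ $re ]]; then
            printf 'ok   %s\n' "$item"
        else
            printf 'BAD  %s\n' "$item"
            bad=1
        fi
    done
    return "$bad"   # non-zero if any item failed to match
}
```

For example, the cut-off item `F2_` at the end of the truncated performance data posted earlier would be reported as BAD, while items such as `A0_PS_1_Current=0.4A;0;0` pass.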

You can check whether an RRD file contains any data with the following commands:

Code:

cd /usr/local/nagios/share/perfdata/<hostname>
rrdtool fetch <rrdfile>.rrd AVERAGE
If you're getting all "nan"s for the data, then it's a syntax issue.
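When the fetch output is long, a small filter can separate rows that are nan for every data source from rows that hold values. The sketch below is a hypothetical helper (`nan_rows` is not an rrdtool command) that reads `rrdtool fetch` output on standard input; it assumes data rows start with a numeric timestamp followed by a colon, and skips the data-source header line. A couple of nan rows at the very end may simply be intervals that have not been written yet, so the summary count is more telling than any single row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rrdtool fetch ... AVERAGE` output on stdin,
# print the timestamp of every row whose values are all nan, then a summary.
nan_rows() {
    awk '
        # Data rows look like "1322765640: v1 v2 ..."; the data-source header
        # and the blank line after it do not match and are skipped.
        $1 ~ /^[0-9]+:$/ && NF > 1 {
            total++
            all_nan = 1
            for (i = 2; i <= NF; i++) {
                if (tolower($i) !~ /^[-+]?nan$/) { all_nan = 0; break }
            }
            if (all_nan) {
                nan_count++
                ts = $1
                sub(/:$/, "", ts)
                print "all-nan row at " ts
            }
        }
        END { printf "%d of %d rows are all nan\n", nan_count, total }
    '
}
```

Usage: `rrdtool fetch <rrdfile>.rrd AVERAGE | nan_rows`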

It would have been helpful to know that you could see the full plugin output in the Core interface; it would have saved some debugging time ;)
Re: check_multi support

Post by SDohmen

I just checked the RRD file for the host and I can see all the data coming through. At the end, however, I see 2 lines of nan values.

Code:

1322765520: 2.2000000000e+01 1.8200000000e+02 4.0000000000e-01 4.0000000000e-01 2.2800000000e+02 2.2800000000e+02 3.3600000000e+03 3.2400000000e+03 3.3448000000e+03 3.2400000000e+03 3.2400000000e+03
1322765580: 2.2000000000e+01 1.8200000000e+02 4.0000000000e-01 4.0000000000e-01 2.2800000000e+02 2.2800000000e+02 3.3600000000e+03 3.2400000000e+03 3.3448000000e+03 3.2400000000e+03 3.2400000000e+03
1322765640: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
1322765700: -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan -nan
I checked the output, but I can't see any character that might break it. See the post below for my best guess.
Last edited by SDohmen on Thu Dec 01, 2011 1:58 pm, edited 4 times in total.
Re: check_multi support

Post by SDohmen

I think I have some more info.

While browsing the logs I came across nagios.log, which contains all the checks passed to the central server from all the slaves. So I searched it for the data that was being cut off and found the following:

Code:

[1322694000] CURRENT SERVICE STATE: defauwes-xen01;adm-hardware;OK;HARD;1;OK - System: PowerEdge R510 II, SN: *******, 32 GB ram (8 dimms), 1 logical drives, 6 physical drivesbr/----- BIOS=1.5.3 10/25/2010, iDRAC6=1.54br/----- Ctrl 0 [PERC H700 Integrated]: Fw=12.10.0-0025, Dr=00.00.04.27-SL1br/----- Encl 0:0:0 [Backplane]: Fw=1.10br/----- Encl 0:1:0 [Backplane]: Fw=1.10br/----- OpenManage Server Administrator (OMSA) version: 6.4.0
The screenshot for this service is check1.PNG.
Another example is the following:

Code:

[1322694000] CURRENT SERVICE STATE: defauwes-ts002;events;CRITICAL;HARD;3;check_multi CRITICAL - 102 plugins checked, 1 critical (event_id_1074), 51 unknown (event_id_1501, event_id_1699, event_id_2001, event_id_2013, event_id_2019, event_id_2020, event_id_2025, event_id_2026, event_id_2027, event_id_2507, event_id_2511, event_id_3002, event_id_3003, event_id_3005, event_id_3020, event_id_3055, event_id_3056, event_id_3057, event_id_3096, event_id_4003, event_id_4005, event_id_4006, event_id_4102, event_id_4103, event_id_4105, event_id_4198, event_id_4199, event_id_4226, event_id_4319, event_id_5148, event_id_5149, event_id_5782, event_id_5789, event_id_6000, event_id_6008, event_id_7000, event_id_7016, event_id_7022, event_id_7023, event_id_7024, event_id_7025, event_id_7027, event_id_7031, event_id_7033, event_id_7034, event_id_7037, event_id_7038, event_id_12290, event_id_12302, event_id_12305, event_id_16387), 50 ok
In this case it's a check_multi check that sends a large amount of data, which according to the log is being received but is not shown on the page. The screenshot for this is check2.PNG.

Given all this, my guess is that the central server can't show the complete output because it is limited to 255 characters. Is there a way to change this? We have multiple checks with several thousand characters of output to show.
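One way to test the 255-character theory is to measure the output length that nagios.log actually recorded, since the log lines above hold more than the page displays. The helper below is a hypothetical sketch (`state_output_len` is not a Nagios tool); it assumes the `CURRENT SERVICE STATE: host;service;state;state type;attempt;output` layout seen in those lines and takes the log path as an argument rather than assuming where nagios.log lives. For plain ASCII output the reported length equals the byte count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one host/service, print the timestamp and the
# length of the plugin output recorded in each "CURRENT SERVICE STATE"
# line of the given nagios.log file.
state_output_len() {
    local host=$1 svc=$2 log=$3
    awk -v key="CURRENT SERVICE STATE: $host;$svc;" '
        { p = index($0, key) }
        p > 0 {
            rest = substr($0, p + length(key))      # "OK;HARD;1;<output>"
            # Drop state, state type and attempt; the output itself may contain ";".
            for (i = 0; i < 3; i++) rest = substr(rest, index(rest, ";") + 1)
            ts = substr($0, 2, index($0, "]") - 2)  # "[1322694000] ..." -> 1322694000
            print ts, length(rest)
        }
    ' "$log"
}
```

If it reports lengths well above 255 for a service whose page shows less, that would point to the display side rather than to NRDP or the log.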