UCS disk failed alert for our infra.

informatica
Posts: 99
Joined: Thu Jan 28, 2021 9:55 pm

Re: UCS disk failed alert for our infra.

Post by informatica »

I think we are talking about Cisco UCS health, not a Windows/Linux device.
Could you please recheck my post?

Hi, thanks.

I am now able to get the help.
We want to implement this in production.
We are building the plugin with: go build check_cisco_ucs.go
If something goes wrong in production because of this go build check_cisco_ucs.go step, how do we roll back?

But in this script I can't see any parameter for monitoring the physical drives on a UCS device. Could you please help us with this?

I also don't see any check for CPU, memory, or hardware health. Could you please help with this too?

Is there any other plugin that reports only the physical drive, CPU, memory, and hardware health details?
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: UCS disk failed alert for our infra.

Post by vtrac »

Hi,
Since this script (module) came from the Nagios Exchange, we do not support it.
I have tried very hard to help you out, but we don't have a UCS disk here for me to even test the script.

As for rolling back: since you used "yum" to install "golang-bin", just use "yum" to remove it.

To delete:

yum erase golang-bin
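
One note on the rollback: "go build check_cisco_ucs.go" only writes a "check_cisco_ucs" binary into the current directory, and erasing "golang-bin" removes the Go compiler, not that binary. Below is a minimal sketch of an install and rollback for the binary itself. The function names and the ".bak" suffix are hypothetical, and the plugin directory in the usage note is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the function names and the ".bak" suffix are
# illustrative and are not part of Nagios XI or check_cisco_ucs.

# Copy a freshly built plugin into the plugin directory, keeping a backup
# of any file it replaces.
install_plugin() {
    local src="$1" dest_dir="$2"
    local dest="$dest_dir/$(basename "$src")"
    if [ -e "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp -p "$src" "$dest" && chmod 755 "$dest"
}

# Undo install_plugin: restore the backup if there is one, otherwise
# just remove the plugin.
rollback_plugin() {
    local name="$1" dest_dir="$2"
    local dest="$dest_dir/$name"
    if [ -e "$dest.bak" ]; then
        mv -f "$dest.bak" "$dest"
    else
        rm -f "$dest"
    fi
}
```

Usage would be something like "install_plugin ./check_cisco_ucs /usr/local/nagios/libexec" to deploy and "rollback_plugin check_cisco_ucs /usr/local/nagios/libexec" to undo it, assuming that is where your plugins live.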

Here are usage examples I found on the web:

Cisco UCS rack server via CIMC:

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageVirtualDrive -a "raidLevel vdStatus health" -e Optimal -u admin -p pls_change
OK - Cisco UCS storageVirtualDrive (raidLevel,vdStatus,health) RAID 10,Optimal,Good (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u admin -p pls_change
OK - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber) 1,Online,6XP4QRVQ 2,Online,6XP4QS1G 3,Online,6XP4RT6A 4,Online,6XP4RT8V (4 of 4 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t dn -q sys/rack-unit-1/indicator-led-4 -o equipmentIndicatorLed -a "id color name" -e green -u admin -p pls_change
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name) 4,green,LED_FAN_STATUS (1 of 1 ok)

$ ./check_cisco_ucs -H 10.1.1.235 -t dn -q sys/rack-unit-1/indicator-led-4 -a "id color name" -e "green" -u admin -p pls_change -o equipmentIndicatorLed -M 1.2
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name)
4,green,LED_HLTH_STATUS (1 of 1 ok)

Cisco UCS Manager:

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q equipmentPsu -a "id model operState serial" -e operable -u admin -p pls_change
CRIT - Cisco UCS equipmentPsu (id,model,operState,serial) 1,UCS-PSU-6248UP-AC,operable,POG164371G8 2,UCS-PSU-6248UP-AC,operable,POG1643721D 1,UCS-PSU-6248UP-AC,operable,POG164371C5 2,UCS-PSU-6248UP-AC,operable,POG1643721S 1,UCSB-PSU-2500ACPL,operable,AZS16210FFA 2,UCSB-PSU-2500ACPL,operable,AZS16210FH3 3,UCSB-PSU-2500ACPL,operable,AZS16210FH2 4,,removed (7 of 8 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t dn -q sys/switch-B/slot-1/switch-ether/port-1 -o etherPIo -a operState -e up -u admin -p pls_change
OK - Cisco UCS sys/switch-B/slot-1/switch-ether/port-1 (operState) up (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q faultInst -a "code severity ack" -e "cleared,no|cleared,yes|info,no|info,yes|warning,no|warning,yes|yes|^$" -z true -u admin -p pls_change
OK - Cisco UCS faultInst (code,severity,ack) (0 of 0 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q faultInst -a "code rn descr" -z -F -u admin -p pls_change -s true -f "wcard:descr:^Log capacity.*"
OK - Cisco UCS faultInst (code,rn,descr)
F0461,,Log capacity on Management Controller on server 1/4 is very-low
F0461,,Log capacity on Management Controller on server 1/1 is very-low (0 of 2 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q equipmentPsuStats -a "dn outputPower ambientTempAvg timeCollected" -z -F -u admin -p pls_change -s true -f gt:ambientTempAvg:24
OK - Cisco UCS equipmentPsuStats (dn,outputPower,ambientTempAvg,timeCollected)
sys/chassis-3/psu-3/stats,374.696991,24.307692,2018-11-20T07:57:19.396
sys/chassis-2/psu-4/stats,300.200012,25.666668,2018-11-20T07:57:42.627 (0 of 2 ok)


Regards,
Vinh
informatica
Posts: 99
Joined: Thu Jan 28, 2021 9:55 pm

Re: UCS disk failed alert for our infra.

Post by informatica »

Hi team,

We ran the check against the physical drives managed by UCS Manager and got the output below, but we don't understand what it is telling us. Could you please help us monitor only the physical drives?

[root@ittestnagiosxi toolsadmin]# ./check_cisco_ucs -H XXXX -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u nagiosadmin -p 'XXXX'
CRIT - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber)
1
2
1
2
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
1
2
1
2
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
2
1
1
2
2
1
2
1
2
1
2
1
2
1
2
1
1
2
1
2
2
1
1
2
2
1
2
1
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
1
2
2
1
2
1
1
2
1
2
1
2
2
1
1
2
1
2
1
2
1
2
2
1
1
2
1
2
1
2
2
1
2
1
1
2
1
2
2
1
2
1
2
1
2
1
2
1
1
2
1
2
1
2
2
1
2
1
1
2
2
1
2
1
1
2
2
1
1
2
1
2
2
1
2
1
2
1
1
2
1
2
3
4
5
6
7
8
2
1
1
2
1
2
3
2
1
2
1
2
1
2
1
2
1
1
2
3
4
5
6
7
8 (0 of 203 ok)
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: UCS disk failed alert for our infra.

Post by vtrac »

Hi,
I looked at the "check_cisco_ucs" web page. Is your UCS system in the list below of platforms the script was tested with?
1. UCSC-C240-M3S server and CIMC firmware version 1.5(1f).24
2. Cisco UCS Manager version 2.1(1e) and UCSB-B22-M3 blade center
3. Cisco UCS Manager version 2.2(1b) and UCSB-B200-M3
4. UCSC-C220-M4S server and CIMC firmware version 2.0(4c).36
5. UCS C240 M4S and CIMC firmware version 3.0(3a)
6. Cisco UCS Manager version 3.2(3g)
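
About the output you posted: the header line lists three attributes (id,pdStatus,driveSerialNumber), but every row contains only the id, so it looks like nothing can match "-e Online" and the plugin reports 0 of 203 ok. One possible cause, which I cannot confirm without a UCS system, is that the storageLocalDisk class in UCS Manager does not use the CIMC attribute names pdStatus and driveSerialNumber. Names such as operability and serial are what I would try first, but treat them as an assumption and check them against your UCS Manager object model. The sketch below is a hypothetical helper (incomplete_rows is not part of the plugin) that counts rows with fewer filled-in values than the header has attributes. It assumes the multi-line output format from your post and values that contain no commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of check_cisco_ucs.
# Reads plugin output on stdin and reports how many rows have fewer
# non-empty values than the attribute list in the header line.
incomplete_rows() {
    awk '
        { sub(/[ \t\r]+$/, "") }    # ignore trailing whitespace
        NR == 1 {
            # Header ends with "(attr1,attr2,...)": count the attributes.
            if (match($0, /\([^()]*\)$/))
                want = split(substr($0, RSTART + 1, RLENGTH - 2), names, ",")
            next
        }
        {
            # Drop the "(N of M ok)" summary that follows the last row.
            sub(/ *\([0-9]+ of [0-9]+ ok\)$/, "")
            if ($0 == "") next
            rows++
            filled = 0
            n = split($0, field, ",")
            for (i = 1; i <= n; i++)
                if (field[i] != "") filled++
            if (filled < want) bad++
        }
        END { printf "%d of %d rows incomplete\n", bad, rows }
    '
}
```

You would pipe the plugin output into it, for example "./check_cisco_ucs ... | incomplete_rows". If it flags every row, the attribute names passed with -a are the first thing to look at.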



Another thing you could try is to download the MIB files from your UCS provider and use SNMP instead.

If you decide to try SNMP, please first ask your UCS provider how to install and set up an SNMP agent on your UCS machine.

Once the SNMP agent is running on your UCS machine, you can import those MIB files into Nagios XI and use the SNMP wizard to set up monitoring of your UCS machine based on the MIB files provided.
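
If you go the SNMP route, a first step could be to walk the Cisco UCS subtree and see which objects your system actually returns. This is a minimal sketch, assuming net-snmp's snmpwalk is installed on the Nagios XI server, SNMP v2c is enabled on the UCS side, and the UCS objects live under 1.3.6.1.4.1.9.9.719 (CISCO-UNIFIED-COMPUTING-MIB). The walk_ucs name and the SNMPWALK override are hypothetical. For physical drives I would expect the relevant module to be CISCO-UNIFIED-COMPUTING-STORAGE-MIB, but please confirm that with Cisco.

```shell
#!/usr/bin/env bash
# Assumption: Cisco UCS objects are published under this enterprise subtree.
UCS_ENTERPRISE_OID=1.3.6.1.4.1.9.9.719

# Hypothetical wrapper: walk the UCS subtree on a host using SNMP v2c.
# Set SNMPWALK to use a different snmpwalk binary (or a stub for testing).
walk_ucs() {
    if [ "$#" -ne 2 ]; then
        echo "usage: walk_ucs HOST COMMUNITY" >&2
        return 2
    fi
    local host="$1" community="$2"
    "${SNMPWALK:-snmpwalk}" -v2c -c "$community" "$host" "$UCS_ENTERPRISE_OID"
}
```

For example, "walk_ucs 10.18.64.10 public > ucs_walk.txt" would save the walk to a file that you can search for the disk-related entries. The address and community string here are placeholders.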


Regards,
Vinh
informatica
Posts: 99
Joined: Thu Jan 28, 2021 9:55 pm

Re: UCS disk failed alert for our infra.

Post by informatica »

We are using version 4.0. Are you saying this plugin does not support monitoring physical drives?

OK, I can see the list of MIB files here. How can I tell which MIB file is suitable for monitoring physical drives? Could you please help?

ftp://ftp.cisco.com/pub/mibs/supportlis ... tlist.html
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: UCS disk failed alert for our infra.

Post by vtrac »

Hi,
We do not have a UCS disk here for me to even test the script.

Please experiment with the script and see what output you get, or search the web for more info.

You could also try contacting the owner of "check_cisco_ucs" for more help, as we DO NOT support any scripts/modules from Nagios Exchange.

I have spent many hours researching on the web to help out, but there is only so much I can do with no UCS disk to test.

As for the MIB, if you want to try SNMP, then please contact Cisco for a recommendation.


Regards,
Vinh
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: UCS disk failed alert for our infra.

Post by ssax »

Locking thread: ticket received. We will continue support through the ticket.
Locked