UCS disk failed alert for our infra.

Post by **informatica** » Mon Mar 01, 2021 1:14 am

I think we are talking about Cisco UCS health, not a Windows/Linux device.
Could you please recheck my post?

Hi, thanks.

I am now able to get the help output.
We want to implement this in production.
We are building the plugin with: go build check_cisco_ucs.go
If something goes wrong in production because of this go build check_cisco_ucs.go, how do we roll back?

However, I can't see any parameter in this script for monitoring the physical drives on a UCS device. Could you please help us with this?

I also don't see any checks for CPU, memory, or hardware health. Could you please help with this too?

Is there any other plugin that reports only the physical drive, CPU, memory, and hardware health details?

Post by **vtrac** » Mon Mar 01, 2021 5:18 pm

Hi,
Since this script (module) came from the Nagios Exchange, we do not support it.
I have tried very hard to help you out, but we don't have a UCS disk here for me to even test the script.

As for rolling back: since you used "yum" to install "golang-bin", just use "yum" to remove it.

To delete:

yum erase golang-bin
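
A note on what the build step itself touches: "go build check_cisco_ucs.go" writes an executable named "check_cisco_ucs" into the current directory (and fills Go's build cache), so erasing "golang-bin" removes the compiler but leaves any already-built plugin binary where it is. Below is a minimal sketch of a reversible deployment. It assumes the plugin directory is /usr/local/nagios/libexec (override PLUGIN_DIR if yours differs); the helper names install_plugin and rollback_plugin are hypothetical and not part of Nagios or the plugin.

```shell
#!/bin/sh
# Hypothetical helpers: install a freshly built plugin while keeping a backup
# of the previous copy, and restore that backup if something goes wrong.
# PLUGIN_DIR is an assumption; override it for your installation.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

install_plugin() {
    # $1 = path to the binary produced by "go build check_cisco_ucs.go"
    src="$1"
    dest="$PLUGIN_DIR/$(basename "$src")"
    # Keep the previous version (if any) so the change can be undone.
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp -p "$src" "$dest" && chmod 755 "$dest"
}

rollback_plugin() {
    # $1 = plugin file name, e.g. check_cisco_ucs
    dest="$PLUGIN_DIR/$1"
    if [ -f "$dest.bak" ]; then
        mv -f "$dest.bak" "$dest"   # restore the previous version
    else
        rm -f "$dest"               # nothing to restore: remove the new file
    fi
}
```

Typical use would be "install_plugin ./check_cisco_ucs" after the build and "rollback_plugin check_cisco_ucs" to undo it. Building on a test machine and copying only the binary to production would also avoid installing the compiler there at all.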

Here are usage examples I found on the web:

Cisco UCS rack server via CIMC:

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageVirtualDrive -a "raidLevel vdStatus health" -e Optimal -u admin -p pls_change
OK - Cisco UCS storageVirtualDrive (raidLevel,vdStatus,health) RAID 10,Optimal,Good (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u admin -p pls_change
OK - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber) 1,Online,6XP4QRVQ 2,Online,6XP4QS1G 3,Online,6XP4RT6A 4,Online,6XP4RT8V (4 of 4 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q equipmentPsu -a "id model operState serial" -e operable -u admin -p pls_change
CRIT - Cisco UCS equipmentPsu (id,model,operState,serial) 1,UCS-PSU-6248UP-AC,operable,POG164371G8 2,UCS-PSU-6248UP-AC,operable,POG1643721D 1,UCS-PSU-6248UP-AC,operable,POG164371C5 2,UCS-PSU-6248UP-AC,operable,POG1643721S 1,UCSB-PSU-2500ACPL,operable,AZS16210FFA 2,UCSB-PSU-2500ACPL,operable,AZS16210FH3 3,UCSB-PSU-2500ACPL,operable,AZS16210FH2 4,,removed (7 of 8 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t dn -q sys/rack-unit-1/indicator-led-4 -o equipmentIndicatorLed -a "id color name" -e green -u admin -p pls_change
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name) 4,green,LED_FAN_STATUS (1 of 1 ok)

$ ./check_cisco_ucs -H 10.1.1.235 -t dn -q sys/rack-unit-1/indicator-led-4 -a "id color name" -e "green" -u admin -p pls_change -o equipmentIndicatorLed -M 1.2
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name)
4,green,LED_HLTH_STATUS (1 of 1 ok)

Cisco UCS Manager:

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q equipmentPsu -a "id model operState serial" -e operable -u admin -p pls_change
CRIT - Cisco UCS equipmentPsu (id,model,operState,serial) 1,UCS-PSU-6248UP-AC,operable,POG164371G8 2,UCS-PSU-6248UP-AC,operable,POG1643721D 1,UCS-PSU-6248UP-AC,operable,POG164371C5 2,UCS-PSU-6248UP-AC,operable,POG1643721S 1,UCSB-PSU-2500ACPL,operable,AZS16210FFA 2,UCSB-PSU-2500ACPL,operable,AZS16210FH3 3,UCSB-PSU-2500ACPL,operable,AZS16210FH2 4,,removed (7 of 8 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t dn -q sys/switch-B/slot-1/switch-ether/port-1 -o etherPIo -a operState -e up -u admin -p pls_change
OK - Cisco UCS sys/switch-B/slot-1/switch-ether/port-1 (operState) up (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q faultInst -a "code severity ack" -e "cleared,no|cleared,yes|info,no|info,yes|warning,no|warning,yes|yes|^$" -z true -u admin -p pls_change
OK - Cisco UCS faultInst (code,severity,ack) (0 of 0 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q faultInst -a "code rn descr" -z -F -u admin -p pls_change -s true -f "wcard:descr:^Log capacity.*"
OK - Cisco UCS faultInst (code,rn,descr)
F0461,,Log capacity on Management Controller on server 1/4 is very-low
F0461,,Log capacity on Management Controller on server 1/1 is very-low (0 of 2 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q equipmentPsuStats -a "dn outputPower ambientTempAvg timeCollected" -z -F -u admin -p pls_change -s true -f gt:ambientTempAvg:24
OK - Cisco UCS equipmentPsuStats (dn,outputPower,ambientTempAvg,timeCollected)
sys/chassis-3/psu-3/stats,374.696991,24.307692,2018-11-20T07:57:19.396
sys/chassis-2/psu-4/stats,300.200012,25.666668,2018-11-20T07:57:42.627 (0 of 2 ok)
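
Every result above ends with a summary of the form "(N of M ok)". For anyone post-processing captured plugin output in a wrapper script, here is a small sketch that extracts those two counts. The function name ucs_summary is hypothetical and not part of check_cisco_ucs, and the output format is assumed to match the examples shown above.

```shell
#!/bin/sh
# ucs_summary: read check_cisco_ucs output on stdin and print "<ok> <total>"
# taken from the trailing "(N of M ok)" summary. Prints nothing and returns a
# non-zero status if no summary is found.
# Hypothetical helper, not part of the plugin.
ucs_summary() {
    sed -n 's/.*(\([0-9][0-9]*\) of \([0-9][0-9]*\) ok).*/\1 \2/p' \
        | tail -n 1 \
        | grep .
}
```

For example, piping the equipmentPsu result shown above into ucs_summary would print the two numbers from its "(7 of 8 ok)" summary.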

Regards,
Vinh

Post by **informatica** » Tue Mar 02, 2021 6:02 am

Hi Team,

We tried to run the check for the physical drives that are managed by UCS Manager. We are getting the output below and do not understand what it is telling us. Could you please help us monitor only the physical drives?

[root@ittestnagiosxi toolsadmin]# ./check_cisco_ucs -H XXXX -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u nagiosadmin -p 'XXXX'
CRIT - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber)
1
2
1
2
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
1
2
1
2
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
2
1
1
2
2
1
2
1
2
1
2
1
2
1
2
1
1
2
1
2
2
1
1
2
2
1
2
1
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
1
2
2
1
2
1
1
2
1
2
1
2
2
1
1
2
1
2
1
2
1
2
2
1
1
2
1
2
1
2
2
1
2
1
1
2
1
2
2
1
2
1
2
1
2
1
2
1
1
2
1
2
1
2
2
1
2
1
1
2
2
1
2
1
1
2
2
1
1
2
1
2
2
1
2
1
2
1
1
2
1
2
3
4
5
6
7
8
2
1
1
2
1
2
3
2
1
2
1
2
1
2
1
2
1
1
2
3
4
5
6
7
8 (0 of 203 ok)
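
In the output above every record is a bare id, whereas the rack-server (CIMC) example earlier in the thread shows records such as "1,Online,6XP4QRVQ". One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that the storageLocalDisk objects returned by UCS Manager do not carry attributes named pdStatus and driveSerialNumber, so those fields come back empty and nothing can match "-e Online". A small sketch for counting such bare records in captured output follows; count_bare_records is a hypothetical helper name, not part of the plugin.

```shell
#!/bin/sh
# count_bare_records: read check_cisco_ucs output on stdin and print how many
# record lines consist of a bare number only (an id with no other attribute
# values). The trailing "(N of M ok)" summary is stripped before matching.
# Hypothetical helper, not part of the plugin.
count_bare_records() {
    sed 's/ *([0-9][0-9]* of [0-9][0-9]* ok) *$//' \
        | { grep -c '^[0-9][0-9]*$' || true; }
}
```

If the count equals the total in the summary line, the requested attributes are probably not the right ones for this class on UCS Manager. The attribute names that do exist would need to be taken from Cisco's UCS Manager object model documentation; names such as operability and serial are likely candidates but are unverified here.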

Post by **vtrac** » Tue Mar 02, 2021 3:51 pm

Hi,
I looked at the "check_cisco_ucs" web page. Is your UCS system in the list below? These are the systems the script was tested with:
1. UCSC-C240-M3S server and CIMC firmware version 1.5(1f).24
2. Cisco UCS Manager version 2.1(1e) and UCSB-B22-M3 blade center
3. Cisco UCS Manager version 2.2(1b) and UCSB-B200-M3
4. UCSC-C220-M4S server and CIMC firmware version 2.0(4c).36
5. UCS C240 M4S and CIMC firmware version 3.0(3a)
6. Cisco UCS Manager version 3.2(3g)

Another thing you could try is to download the MIB files from your UCS provider and use SNMP instead.

If you decide to try SNMP, please talk to your UCS provider first about how to set up and run the SNMP agent on your UCS machine.

Once the SNMP agent is running on your UCS machine, you could import those MIB files into Nagios XI and use the SNMP wizard to set up monitoring of your UCS machine based on the MIB files provided.

Regards,
Vinh

Post by **informatica** » Wed Mar 03, 2021 2:42 am

We are using version 4.0. Are you saying this plugin does not support monitoring physical drives?

OK, I can see the list of MIB files here. How can I know which MIB file is suitable for monitoring physical drives? Could you please help?

ftp://ftp.cisco.com/pub/mibs/supportlis ... tlist.html

Post by **vtrac** » Wed Mar 03, 2021 5:01 pm

Hi,
We do not have a UCS disk here for me to even test the script.

Please experiment with the script and see what output you get, or search the web for more info.

You could also try contacting the owner of "check_cisco_ucs" for more help, as we DO NOT support any scripts/modules from Nagios Exchange.

I have spent a lot of hours researching on the web to help out, but there is only so much I can do with no UCS disk to test.

As for the MIB, if you want to try SNMP, then please contact Cisco for a recommendation.

Regards,
Vinh

Post by **ssax** » Tue Mar 09, 2021 1:11 pm

Locking thread, ticket received, we will continue support through the ticket.

Nagios Support Forum
