Re: UCS disk failed alert for our infra.

Posted: Mon Mar 01, 2021 1:14 am
by informatica
I think we are talking about Cisco UCS health, not a Windows/Linux device.
Can you please recheck my post?

Hi, thanks.

Now I am able to get the help.
We want to implement this in production.
We are building the plugin with go build check_cisco_ucs.go.
If something goes wrong in production because of this go build check_cisco_ucs.go, how do we roll back?

However, I can't see any parameter in this script for monitoring physical drives on a UCS device. Could you please help us with this?

I also don't see any checks for CPU, memory, or hardware health. Could you please help with this too?

Is there any other plugin that reports only physical drive, CPU, memory, and hardware health details?

Re: UCS disk failed alert for our infra.

Posted: Mon Mar 01, 2021 5:18 pm
by vtrac
Hi,
Since this script (module) came from the Nagios Exchange, we do not support it.
I have tried very hard to help you out, but we don't have a UCS disk here for me to even test the script.

As to rolling back, since you used "yum" to install "golang-bin", just use "yum" to remove it.

To delete:

yum erase golang-bin
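
Note that "yum erase golang-bin" removes only the Go toolchain. The check_cisco_ucs executable produced by "go build" is an ordinary file and stays wherever it was written. Below is a minimal sketch of one way to make the plugin binary itself reversible: keep a copy of the previous file when installing a new build, and restore it to roll back. The function names and the ".bak" suffix are hypothetical and are not part of the plugin or of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the ".bak" suffix are illustrative only.

# install_with_backup NEW_BINARY TARGET
# Copies NEW_BINARY over TARGET, first saving any existing TARGET as TARGET.bak.
install_with_backup() {
    new_binary="$1"
    target="$2"
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$new_binary" "$target" && chmod 755 "$target"
}

# rollback_plugin TARGET
# Restores TARGET.bak if it exists; otherwise removes TARGET,
# because there was no earlier version to go back to.
rollback_plugin() {
    target="$1"
    if [ -f "$target.bak" ]; then
        mv -f "$target.bak" "$target"
    else
        rm -f "$target"
    fi
}
```

For example, after building in a separate directory, "install_with_backup ./check_cisco_ucs /usr/local/nagios/libexec/check_cisco_ucs" would deploy the new binary, and "rollback_plugin" with the same target would undo it. The target path here is an assumption; use the plugin directory of your own installation.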

Here are usage examples I found on the web:

Cisco UCS rack server via CIMC:

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageVirtualDrive -a "raidLevel vdStatus health" -e Optimal -u admin -p pls_change
OK - Cisco UCS storageVirtualDrive (raidLevel,vdStatus,health) RAID 10,Optimal,Good (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u admin -p pls_change
OK - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber) 1,Online,6XP4QRVQ 2,Online,6XP4QS1G 3,Online,6XP4RT6A 4,Online,6XP4RT8V (4 of 4 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q equipmentPsu -a "id model operState serial" -e operable -u admin -p pls_change
CRIT - Cisco UCS equipmentPsu (id,model,operState,serial) 1,UCS-PSU-6248UP-AC,operable,POG164371G8 2,UCS-PSU-6248UP-AC,operable,POG1643721D 1,UCS-PSU-6248UP-AC,operable,POG164371C5 2,UCS-PSU-6248UP-AC,operable,POG1643721S 1,UCSB-PSU-2500ACPL,operable,AZS16210FFA 2,UCSB-PSU-2500ACPL,operable,AZS16210FH3 3,UCSB-PSU-2500ACPL,operable,AZS16210FH2 4,,removed (7 of 8 ok)

$ ./check_cisco_ucs -H 10.18.4.7 -t dn -q sys/rack-unit-1/indicator-led-4 -o equipmentIndicatorLed -a "id color name" -e green -u admin -p pls_change
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name) 4,green,LED_FAN_STATUS (1 of 1 ok)

$ ./check_cisco_ucs -H 10.1.1.235 -t dn -q sys/rack-unit-1/indicator-led-4 -a "id color name" -e "green" -u admin -p pls_change -o equipmentIndicatorLed -M 1.2
OK - Cisco UCS sys/rack-unit-1/indicator-led-4 (id,color,name)
4,green,LED_HLTH_STATUS (1 of 1 ok)

Cisco UCS Manager:

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q equipmentPsu -a "id model operState serial" -e operable -u admin -p pls_change
CRIT - Cisco UCS equipmentPsu (id,model,operState,serial) 1,UCS-PSU-6248UP-AC,operable,POG164371G8 2,UCS-PSU-6248UP-AC,operable,POG1643721D 1,UCS-PSU-6248UP-AC,operable,POG164371C5 2,UCS-PSU-6248UP-AC,operable,POG1643721S 1,UCSB-PSU-2500ACPL,operable,AZS16210FFA 2,UCSB-PSU-2500ACPL,operable,AZS16210FH3 3,UCSB-PSU-2500ACPL,operable,AZS16210FH2 4,,removed (7 of 8 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t dn -q sys/switch-B/slot-1/switch-ether/port-1 -o etherPIo -a operState -e up -u admin -p pls_change
OK - Cisco UCS sys/switch-B/slot-1/switch-ether/port-1 (operState) up (1 of 1 ok)

$ ./check_cisco_ucs -H 10.18.64.10 -t class -q faultInst -a "code severity ack" -e "cleared,no|cleared,yes|info,no|info,yes|warning,no|warning,yes|yes|^$" -z true -u admin -p pls_change
OK - Cisco UCS faultInst (code,severity,ack) (0 of 0 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q faultInst -a "code rn descr" -z -F -u admin -p pls_change -s true -f "wcard:descr:^Log capacity.*"
OK - Cisco UCS faultInst (code,rn,descr)
F0461,,Log capacity on Management Controller on server 1/4 is very-low
F0461,,Log capacity on Management Controller on server 1/1 is very-low (0 of 2 ok)

$ ./check_cisco_ucs -H 172.18.37.164 -t class -q equipmentPsuStats -a "dn outputPower ambientTempAvg timeCollected" -z -F -u admin -p pls_change -s true -f gt:ambientTempAvg:24
OK - Cisco UCS equipmentPsuStats (dn,outputPower,ambientTempAvg,timeCollected)
sys/chassis-3/psu-3/stats,374.696991,24.307692,2018-11-20T07:57:19.396
sys/chassis-2/psu-4/stats,300.200012,25.666668,2018-11-20T07:57:42.627 (0 of 2 ok)
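
To help read the "(N of M ok)" summary in the outputs above, here is a rough sketch of the counting they suggest. It assumes the plugin joins the requested attribute values of each object with commas and tests each resulting row against the -e expression as a regular expression. That is inferred from the examples, not taken from the plugin source, and count_ok is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count how many rows match the expected expression.
# Assumption: each row is the comma-joined attribute values of one object,
# and the expression is an unanchored extended regular expression.

# count_ok EXPECT ROW...
count_ok() {
    expect="$1"
    shift
    ok=0
    total=0
    for row in "$@"; do
        total=$((total + 1))
        # grep -q only sets the exit status; -E enables alternation with "|".
        if printf '%s\n' "$row" | grep -Eq -- "$expect"; then
            ok=$((ok + 1))
        fi
    done
    printf '%s of %s ok\n' "$ok" "$total"
}
```

Calling count_ok operable "1,UCS-PSU-6248UP-AC,operable,POG164371G8" "4,,removed" prints "1 of 2 ok", which mirrors how the removed PSU turns the equipmentPsu example into CRIT. Under the same assumption, a row that contains only an id, because the other requested attributes came back empty, can never match an expression such as Online, so every object would be counted as not ok.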


Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Tue Mar 02, 2021 6:02 am
by informatica
Hi team,

We tried to run the check for the physical drives managed by UCS Manager. We got the output below and do not understand what it is telling us. Could you please help us monitor only the physical drives?

[root@ittestnagiosxi toolsadmin]# ./check_cisco_ucs -H XXXX -t class -q storageLocalDisk -a "id pdStatus driveSerialNumber" -e Online -u nagiosadmin -p 'XXXX'
CRIT - Cisco UCS storageLocalDisk (id,pdStatus,driveSerialNumber)
1
2
1
2
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
1
2
1
2
1
2
1
2
1
2
2
1
1
2
2
1
1
2
2
1
2
1
2
1
1
2
2
1
2
1
2
1
2
1
2
1
2
1
1
2
1
2
2
1
1
2
2
1
2
1
2
1
1
2
1
2
1
2
2
1
1
2
2
1
1
2
1
2
2
1
2
1
1
2
1
2
1
2
2
1
1
2
1
2
1
2
1
2
2
1
1
2
1
2
1
2
2
1
2
1
1
2
1
2
2
1
2
1
2
1
2
1
2
1
1
2
1
2
1
2
2
1
2
1
1
2
2
1
2
1
1
2
2
1
1
2
1
2
2
1
2
1
2
1
1
2
1
2
3
4
5
6
7
8
2
1
1
2
1
2
3
2
1
2
1
2
1
2
1
2
1
1
2
3
4
5
6
7
8 (0 of 203 ok)

Re: UCS disk failed alert for our infra.

Posted: Tue Mar 02, 2021 3:51 pm
by vtrac
Hi,
I looked at the "check_cisco_ucs" web page. Is your UCS system in the list below, which shows what the script was tested with?
1. UCSC-C240-M3S server and CIMC firmware version 1.5(1f).24
2. Cisco UCS Manager version 2.1(1e) and UCSB-B22-M3 blade center
3. Cisco UCS Manager version 2.2(1b) and UCSB-B200-M3
4. UCSC-C220-M4S server and CIMC firmware version 2.0(4c).36
5. UCS C240 M4S and CIMC firmware version 3.0(3a)
6. Cisco UCS Manager version 3.2(3g)



Another thing you could try is to download the MIB files from your UCS provider and use SNMP instead.

If you decide to try SNMP, please talk to your UCS provider first about how to install and set up an SNMP agent on your UCS machine.

Once the SNMP agent is running on your UCS machine, you can import those MIB files into Nagios XI and use the SNMP wizard to set up monitoring of your UCS machine based on the MIB files provided.


Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Wed Mar 03, 2021 2:42 am
by informatica
We are using version 4.0. Are you saying this plugin does not support monitoring physical drives?

OK, I can see the list of MIB files here. How can I tell which MIB file is suitable for monitoring physical drives? Could you please help?

ftp://ftp.cisco.com/pub/mibs/supportlis ... tlist.html

Re: UCS disk failed alert for our infra.

Posted: Wed Mar 03, 2021 5:01 pm
by vtrac
Hi,
We do not have a UCS disk here for me to even test the script.

Please experiment with the script and see what output you get, or search the web for more information.

You could also try contacting the owner of "check_cisco_ucs" for more help, as we DO NOT support any scripts/modules from Nagios Exchange.

I have spent a lot of hours researching on the web to help out, but there is only so much I can do with no UCS disk to test.

As to the MIB, if you want to try SNMP, then please contact Cisco for a recommendation.


Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Tue Mar 09, 2021 1:11 pm
by ssax
Locking thread. A ticket has been received, and we will continue support through the ticket.