
UCS disk failed alert for our infra.

Posted: Wed Feb 10, 2021 1:06 am
by informatica
Hi Team,

We need to monitor the UCS disk failed alert, which is happening frequently. Could you please help us monitor whether it is a physical drive or a logical drive? Please let me know if you need more details.

Re: UCS disk failed alert for our infra.

Posted: Wed Feb 10, 2021 4:59 pm
by vtrac
Hi informatica,
I found a couple of URLs related to UCS on Nagios Exchange that you can look at and maybe try out.

Cisco UCS Manager plugin:
https://exchange.nagios.org/directory/P ... in/details

Cisco UCS XML API:
https://exchange.nagios.org/directory/P ... PI/details

Here's the URL for Nagios Exchange, where you can search for more UCS plugins if the ones above are not quite what you need.
https://exchange.nagios.org/


Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Fri Feb 19, 2021 2:01 am
by informatica
Team,

We are already using the plugin below, but we are not able to monitor the PSUs, fans, CPU, memory, or physical drives. Could you please help with this?

Using the script below, it looks like we can monitor fans, PSUs, and the chassis, but we are getting the output below. We want full hardware monitoring for UCS.
https://exchange.nagios.org/directory/P ... em/details
[root@qy-nagios-a ISP_NEW_DMZ]# /usr/local/nagios/libexec/check_ucs_manager -H XXXX -C XXXX -N -T "Chassis 1"
Missing test type (-T)!

USAGE: -H <HOST_IP> -T <TYPE> -N <OBJECT_NAME> [-C <COMMUNITY>]
[-u <USERNAME>] [-a <MD5|SHA>] [-A <PASSPHRASE>]
[-x <AES|DES>] [-X <PASSPHRASE>]

Parameters:
-H <HOST_IP> Target hostname
-T <TEST> Selected test
-N <NAME> Monitoring object name
-C <COMMUNITY> SNMPv2 community string
-u <USERNAME> SNMPv3 SecurityName
-a <MD5|SHA> SNMPv3 authProtocol (Default: MD5)
-A <PASSPHRASE> SNMPv3 authPassword
-x <AES|DES> SNMPv3 privProtocol (Default: DES)
-X <PASSPHRASE> SNMPv3 privKey

Test types:
ct - Chassis Temperature
ci - Chassis IOCard Status
f - Fans Status
po - PSUs Operate Status
fs - Faults Summary (Dont need -N)

Fabric Interconnects support only these test: f, po
Object name examples: switch, switch-A, chassis-1, chassis-10
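The "Missing test type (-T)!" message is consistent with how most command-line option parsers treat an option that requires a value. In the command above, -N is given no object name, so the parser takes the next word, "-T", as the name, and "Chassis 1" is left over; no test type is ever set. The sketch below reproduces this with Go's standard flag package. It is an assumption that check_ucs_manager's own parser behaves the same way, and parseArgs is a hypothetical stand-in, not the plugin's code. Per the usage text, -T expects a test code (ct, ci, f, po, fs) and -N an object name such as chassis-1.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs is a hypothetical stand-in for a plugin that takes -H, -C, -N
// and -T as string options. It is NOT check_ucs_manager's real parser.
func parseArgs(args []string) (string, string, []string, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.String("H", "", "target hostname")
	fs.String("C", "", "SNMPv2 community string")
	name := fs.String("N", "", "monitoring object name")
	test := fs.String("T", "", "selected test")
	if err := fs.Parse(args); err != nil {
		return "", "", nil, err
	}
	// fs.Args() holds whatever was left after flag parsing stopped.
	return *name, *test, fs.Args(), nil
}

func main() {
	// The invocation from the post: -N has no value, so it swallows "-T".
	bad := []string{"-H", "XXXX", "-C", "XXXX", "-N", "-T", "Chassis 1"}
	n, tt, rest, _ := parseArgs(bad)
	fmt.Printf("N=%q T=%q leftover=%q\n", n, tt, rest)
	// prints: N="-T" T="" leftover=["Chassis 1"]

	// Following the usage text: -T takes a test code, -N an object name.
	good := []string{"-H", "XXXX", "-C", "XXXX", "-T", "f", "-N", "chassis-1"}
	n, tt, rest, _ = parseArgs(good)
	fmt.Printf("N=%q T=%q leftover=%q\n", n, tt, rest)
	// prints: N="chassis-1" T="f" leftover=[]
}
```

If this reading is right, a call along the lines of "-T f -N chassis-1" (fans on chassis 1) should get past the argument check. None of the listed test types covers disks, CPU, or memory, though.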


###################################################################

And we tried another plugin where we are getting below error.
https://exchange.nagios.org/directory/P ... PI/details
[root@qy-nagios-a check_cisco_ucs-master]# ./check_cisco_ucs.go --help
./check_cisco_ucs.go: line 1: syntax error near unexpected token `('
./check_cisco_ucs.go: line 1: `// check_cisco_ucs is a Nagios plugin made by Herwig Grimm (herwig.grimm at aon.at)'

Re: UCS disk failed alert for our infra.

Posted: Fri Feb 19, 2021 3:20 pm
by vtrac
Hi informatica,
Since these are Nagios Exchange plugins, we do not support them.
Also, we don't have any UCS disks here that I could use to test these plugins.

Yes, it looks like the "check_ucs_manager" test types (below) do not check for UCS disk failures.
Test types:

ct - Chassis Temperature
ci - Chassis IOCard Status
f - Fans Status
po - PSUs Operate Status
fs - Faults Summary (Dont need -N)

However, "check_cisco_ucs" seems to offer more options, but it is written in Go.

I found the article below, which shows how to install Go and compile the "check_cisco_ucs.go" plugin before you can use it.
It seems fairly simple.
https://topslakr.com/2017/09/setting-up ... scos-cimc/


Regards.
Vinh

Re: UCS disk failed alert for our infra.

Posted: Mon Feb 22, 2021 2:56 am
by informatica
Even after installing the Go packages, we are still getting the error below.


[root@ittestnagiosxi single_isp_monitoring]# yum install golang-bin
[root@ittestnagiosxi ucs]# cd ..
[root@ittestnagiosxi toolsadmin]# ./check_cisco_ucs.go
./check_cisco_ucs.go: line 1: //: Is a directory
./check_cisco_ucs.go: line 2: syntax error near unexpected token `('
./check_cisco_ucs.go: line 2: `// Version 0.6 (19.07.2017)'
[root@ittestnagiosxi toolsadmin]#

Re: UCS disk failed alert for our infra.

Posted: Mon Feb 22, 2021 4:42 pm
by vtrac
Hi informatica,
Based on the article below, you need to "compile" the "check_cisco_ucs.go" script.
https://topslakr.com/2017/09/setting-up ... scos-cimc/

Navigate to the folder containing the "check_cisco_ucs.go" script and run the following command to compile it:

go build check_cisco_ucs.go

Once compiled, you can test the resulting program, for example by checking the "help" option:

./check_cisco_ucs --help

Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Tue Feb 23, 2021 12:39 am
by informatica
Hi Team,

Please see the details below, and let us know if there is any other plugin to monitor hardware health on UCS devices.

[root@ittestnagiosxi toolsadmin]# go build check_cisco_ucs.go
[root@ittestnagiosxi toolsadmin]# ./check_cisco_ucs.go --help
./check_cisco_ucs.go: line 1: //: Is a directory
./check_cisco_ucs.go: line 2: syntax error near unexpected token `('
./check_cisco_ucs.go: line 2: `// Version 0.6 (19.07.2017)'
[root@ittestnagiosxi toolsadmin]#

Re: UCS disk failed alert for our infra.

Posted: Tue Feb 23, 2021 6:04 pm
by vtrac
Hi,

I tested this on my machine.
After compiling, it generated a file called "check_cisco_ucs" (NOTE: without the ".go" at the end).
You kept running the source file "check_cisco_ucs.go", which is WRONG: that file is the Go source code, not the compiled program.

Here is my example:

[root@vt-nagiosxi-61 GO]# ./check_cisco_ucs -V
check_cisco_ucs version: 0.6
[root@vt-nagiosxi-61 GO]# ./check_cisco_ucs --help
Usage of ./check_cisco_ucs:
  -E    print environment variables for debug purpose
  -F    display only faults in output
  -H string
        UCS Manager IP address or CIMC IP address
  -M string
        used TLS version, default: v1.1 (default "1.1")
  -P string
        proxy URL
  -V    print plugin version
  -a string
        space separated list of XML attributes for display in nagios output and match against *expect* string (default "id name")
  -d int
        print debug, level: 1 errors only, 2 warnings and 3 informational messages
  -e string
        expect string, ok if this is found, examples: 'Optimal' or 'Good' or 'Optimal|Good' (default "Optimal")
  -o string
        XML API object class name, examples: storageVirtualDrive or storageLocalDisk
  -p string
        XML API password
  -q string
        XML API object class name, examples: storageVirtualDrive or storageLocalDisk or storageControllerProps
        or Distinguished Name (DN) name, examples: "sys/rack-unit-1" (default "storageLocalDisk")
  -s string
        true or false. If true, the inHierarchical argument returns all child objects (default "false")
  -t string
        query type 'class' or 'dn' (default "class")
  -u string
        XML API username
  -z    true or false. if set to true the check will return OK status if zero instances where found. Default is false.
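
The help text above is where the physical-versus-logical question from the start of this thread maps onto the plugin: the -q/-o examples name storageLocalDisk (physical disks, and the default query class) and storageVirtualDrive (virtual, i.e. logical, drives). The help also says the -a attributes of each instance are matched against the -e expect string (a pattern such as 'Optimal|Good'), and that -z turns "zero instances found" into OK. The Go sketch below is a hypothetical re-creation of that matching logic, only to show how the flags interact; it is not check_cisco_ucs's source. The exit codes follow the standard Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN), but the severity chosen for a mismatch and for zero instances without -z are assumptions, and the "status" attribute in the sample data is made up (real attribute names depend on the XML API object class).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Standard Nagios plugin exit codes.
const (
	stateOK       = 0
	stateCritical = 2
	stateUnknown  = 3
)

// evaluate is a hypothetical illustration of the -a / -e / -z idea, NOT the
// plugin's code. Each instance maps an XML attribute name to its value.
func evaluate(instances []map[string]string, attrs, expect string, zeroOK bool) (int, string) {
	re, err := regexp.Compile(expect)
	if err != nil {
		return stateUnknown, "invalid expect string: " + err.Error()
	}
	if len(instances) == 0 {
		if zeroOK {
			return stateOK, "no instances found"
		}
		return stateUnknown, "no instances found" // assumed behaviour without -z
	}
	state := stateOK
	var parts []string
	for _, inst := range instances {
		// Join the selected attribute values, as they would appear in the output.
		var vals []string
		for _, a := range strings.Fields(attrs) {
			vals = append(vals, inst[a])
		}
		line := strings.Join(vals, " ")
		if !re.MatchString(line) {
			state = stateCritical // assumed severity for a mismatch
		}
		parts = append(parts, line)
	}
	return state, strings.Join(parts, ", ")
}

func main() {
	// Hypothetical sample data for two disks; "status" is a made-up attribute.
	disks := []map[string]string{
		{"id": "1", "status": "Good"},
		{"id": "2", "status": "Failed"},
	}
	state, out := evaluate(disks, "id status", "Optimal|Good", false)
	fmt.Println(state, out) // prints: 2 1 Good, 2 Failed
}
```

In this sketch one failed disk is enough to turn the whole check CRITICAL while the output still lists every instance, which is the usual shape of a Nagios hardware check.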
Regards,
Vinh

Re: UCS disk failed alert for our infra.

Posted: Wed Feb 24, 2021 1:30 am
by informatica
Hi, thanks.

Now I am able to get the help output.
We want to implement this in production.
Since we are running "go build check_cisco_ucs.go", how do we roll back if something goes wrong?

However, I can't see any parameter in this script to monitor the physical drives on a UCS device. Could you please help us with this?

I also don't see any CPU, memory, or hardware health check. Could you please help with this too?

Is there any other plugin that reports only physical drive, CPU, memory, and hardware health details?

Re: UCS disk failed alert for our infra.

Posted: Wed Feb 24, 2021 5:30 pm
by vtrac
Hi,
Since you only want to check health, CPU, memory, and so on, let's try the NCPA wizard.

However, you will need to first download, install and configure the NCPA agent on your Linux machine.
Once done, you will have the "token" which will be needed to run the NCPA wizard.

Nagios XI GUI > Configure > Configuration Wizards > find "NCPA" wizard from the list.

You will find the NCPA agent download and installation instructions on that page.
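For context on what the token is for: NCPA exposes an HTTPS API on the machine where it is installed, and Nagios XI queries that API using the token. The sketch below only builds such a request URL. The default port 5693, the /api/<metric> path layout, the "token" query parameter, and the "cpu/percent" metric name are assumptions based on NCPA's usual defaults, so check them against your agent's configuration. Keep in mind that NCPA reports on the operating system it runs on (CPU, memory, disks as that OS sees them), not on UCS Manager hardware faults.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// ncpaURL builds an NCPA API request URL. Port 5693, the /api/ prefix and
// the "token" query parameter are assumed NCPA defaults; verify locally.
func ncpaURL(host, token, metric string) string {
	u := url.URL{
		Scheme:   "https",
		Host:     net.JoinHostPort(host, "5693"), // also handles IPv6 literals
		Path:     "/api/" + metric,
		RawQuery: url.Values{"token": {token}}.Encode(), // escapes the token safely
	}
	return u.String()
}

func main() {
	// Hypothetical host, token and metric path.
	fmt.Println(ncpaURL("192.0.2.10", "mytoken", "cpu/percent"))
	// prints: https://192.0.2.10:5693/api/cpu/percent?token=mytoken
}
```

Opening a URL like this in a browser (after accepting NCPA's self-signed certificate) is a quick way to confirm the token works before running the wizard.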

Regards,
Vinh