How do I monitor Cisco CPU utilisation ?

Post by **tgriep** » Tue Aug 11, 2015 2:36 pm

That folder holds the state information saved by the previous run of the check. The plugin compares it with the current run to calculate rates and performance data.
Some plugins can disable saving the state information, but I don't think check_nwc_health can.
Try running this to see whether there is a command-line option to disable the state history, and if there is, use it in your check commands.

Code: Select all

/usr/local/nagios/libexec/check_nwc_health --help

Post by **vijilants** » Wed Aug 12, 2015 8:01 am

tgriep wrote:That folder is used to keep the state information for the last time the check was run to the current time it is run for performance / data calculations.
Some plugins have the ability to disable saving of the state information but I don't think the check_nwc_health has that ability.
Try running this to see if there is a command line to disable the state history and use that when checking the commands.
Code: Select all
/usr/local/nagios/libexec/check_nwc_health --help

This is what I get:

Code: Select all

[nagios@nms1 mibs]$ /usr/local/nagios/libexec/check_nwc_health --help
check_nwc_health $Revision: 4.2 $ [http://labs.consol.de/nagios/check_nwc_health]

This monitoring plugin is free software, and comes with ABSOLUTELY NO WARRANTY.
It may be used, redistributed and/or modified under the terms of the GNU
General Public Licence (see http://www.fsf.org/licensing/licenses/gpl.txt).

This plugin checks various parameters of network components 

Usage: check_nwc_health [ -v|--verbose ] [ -t <timeout> ] --mode <what-to-do> --hostname <network-component> --community <snmp-community>  ...]
 -?, --usage
   Print usage information
 -h, --help
   Print detailed help screen
 -V, --version
   Print version information
 -t, --timeout=INTEGER
   Seconds before plugin times out (default: 15)
 -v, --verbose
   Show details for command-line debugging (can repeat up to 3 times)
 --hostname
   Hostname or IP-address of the switch or router
 --port
   The SNMP port to use (default: 161)
 --domain
   The transport domain to use (default: udp/ipv4, other possible values: udp6, udp/ipv6, tcp, tcp4, tcp/ipv4, tcp6, tcp/ipv6)
 --protocol
   The SNMP protocol to use (default: 2c, other possibilities: 1,3)
 --community
   SNMP community of the server (SNMP v1/2 only)
 --username
   The securityName for the USM security model (SNMPv3 only)
 --authpassword
   The authentication password for SNMPv3
 --authprotocol
   The authentication protocol for SNMPv3 (md5|sha)
 --privpassword
   The password for authPriv security level
 --privprotocol
   The private protocol for SNMPv3 (des|aes|aes128|3des|3desde)
 --contextengineid
   The context engine id for SNMPv3 (10 to 64 hex characters)
 --contextname
   The context name for SNMPv3 (empty represents the "default" context)
 --community2
   SNMP community which can be used to switch the context during runtime
 --snmpwalk
   A file with the output of a snmpwalk (used for simulation)
   Use it instead of --hostname
 --servertype
     The type of the network device: cisco (default). Use it if auto-detection
     is not possible
 --oids
   A list of oids which are downloaded and written to a cache file.
   Use it together with --mode oidcache
 --offline
   The maximum number of seconds since the last update of cache file before
   it is considered too old
 --mode
   A keyword which tells the plugin what to do
       hardware-health                  (Check the status of environmental equipment (fans, temperatures, power))
       cpu-load                         (Check the CPU load of the device)
       memory-usage                     (Check the memory usage of the device)
       interface-usage                  (Check the utilization of interfaces)
       interface-errors                 (Check the error-rate of interfaces (without discards))
       interface-discards               (Check the discard-rate of interfaces)
       interface-status                 (Check the status of interfaces (oper/admin))
       interface-nat-count-sessions     (Count the number of nat sessions)
       interface-nat-rejects            (Count the number of nat sessions rejected due to lack of resources)
       list-interfaces                  (Show the interfaces of the device and update the name cache)
       list-interfaces-detail           (Show the interfaces of the device and some details)
       interface-availability           (Show the availability (oper != up) of interfaces)
       link-aggregation-availability    (Check the percentage of up interfaces in a link aggregation)
       list-routes                      (Check the percentage of up interfaces in a link aggregation)
       route-exists                     (Check if a route exists. (--name is the dest, --name2 check also the next hop))
       count-routes                     (Count the routes. (--name is the dest, --name2 is the hop))
       vpn-status                       (Check the status of vpns (up/down))
       create-shinken-service           (Create a Shinken service definition)
       hsrp-state                       (Check the state in a HSRP group)
       hsrp-failover                    (Check if a HSRP group's nodes have changed their roles)
       list-hsrp-groups                 (Show the HSRP groups configured on this device)
       bgp-peer-status                  (Check status of BGP peers)
       count-bgp-peers                  (Count the number of BGP peers)
       watch-bgp-peers                  (Watch BGP peers appear and disappear)
       list-bgp-peers                   (Show BGP peers known to this device)
       count-bgp-prefixes               (Count the number of BGP prefixes (for specific peer with --name))
       ospf-neighbor-status             (Check status of OSPF neighbors)
       list-ospf-neighbors              (Show OSPF neighbors)
       ha-role                          (Check the role in a ha group)
       svn-status                       (Check the status of the svn subsystem)
       mngmt-status                     (Check the status of the management subsystem)
       fw-policy                        (Check the installed firewall policy)
       fw-connections                   (Check the number of firewall policy connections)
       session-usage                    (Check the session limits of a load balancer)
       security-status                  (Check if there are security-relevant incidents)
       pool-completeness                (Check the members of a load balancer pool)
       pool-connections                 (Check the number of connections of a load balancer pool)
       pool-complections                (Check the members and connections of a load balancer pool)
       list-pools                       (List load balancer pools)
       check-licenses                   (Check the installed licences/keys)
       count-users                      (Count the (connected) users/sessions)
       check-config                     (Check the status of configs (cisco, unsaved config changes))
       check-connections                (Check the quality of connections)
       count-connections                (Check the number of connections (-client, -server is possible))
       watch-fexes                      (Check if FEXes appear and disappear (use --lookup))
       accesspoint-status               (Check the status of access points)
       count-accesspoints               (Check if the number of access points is within a certain range)
       watch-accesspoints               (Check if access points appear and disappear (use --lookup))
       list-accesspoints                (List access points managed by this device)
       phone-cm-status                  (Check if the callmanager is up)
       phone-status                     (Check the number of registered/unregistered/rejected phones)
       list-smart-home-devices          (List Fritz!DECT 200 plugs managed by this device)
       smart-home-device-status         (Check if a Fritz!DECT 200 plug is on)
       smart-home-device-energy         (Show the current power consumption of a Fritz!DECT 200 plug)
       smart-home-device-consumption    (Show the cumulated power consumption of a Fritz!DECT 200 plug)
       uptime                           (Check the uptime of the device)
       walk                             (Show snmpwalk command with the oids necessary for a simulation)
       supportedmibs                    (Shows the names of the mibs which this devices has implemented (only lausser may run this command))


 --regexp
   Parameter name/name2/name3 will be interpreted as (perl) regular expression
 --warning
   The warning threshold
 --critical
   The critical threshold
 --warningx
   The extended warning thresholds
   e.g. --warningx db_msdb_free_pct=6: to override the threshold for a
   specific item 
 --criticalx
   The extended critical thresholds
 --units
   One of %, B, KB, MB, GB, Bit, KBi, MBi, GBi. (used for e.g. mode interface-usage)
 --name
   The name of an interface (ifDescr) or pool or ...
 --name2
   The secondary name of a component
 --name3
   The tertiary name of a component
 --blacklist
   Blacklist some (missing/failed) components
 --mitigation
   The parameter allows you to change a critical error to a warning.
 --lookback
   The amount of time you want to look back when calculating average rates.
   Use it for mode interface-errors or interface-usage. Without --lookback
   the time between two runs of check_nwc_health is the base for calculations.
   If you want your checkresult to be based for example on the past hour,
   use --lookback 3600. 
 --environment
   Add a variable to the plugin's environment
 --negate
   Emulate the negate plugin. --negate warning=critical --negate unknown=critical
 --morphmessage
   Modify the final output message
 --morphperfdata
   The parameter allows you to change performance data labels.
   It's a perl regexp and a substitution.
   Example: --morphperfdata '(.*)ISATAP(.*)'='$1patasi$2'
 --selectedperfdata
   The parameter allows you to limit the list of performance data. It's a perl regexp.
   Only matching perfdata show up in the output
 --report
   Can be used to shorten the output
 --multiline
   Multiline output
 --with-mymodules-dyn-dir
   Add-on modules for the my-modes will be searched in this directory
 --statefilesdir
   An alternate directory where the plugin can save files
 --isvalidtime
   Signals the plugin to return OK if now is not a valid check time
 --alias
   The alias name of a 64bit-interface (ifAlias)
 --ifspeedin
   Override the ifspeed oid of an interface (only inbound)
 --ifspeedout
   Override the ifspeed oid of an interface (only outbound)
 --ifspeed
   Override the ifspeed oid of an interface
 --role
   The role of this device in a hsrp group (active/standby/listen)
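None of the options listed above appears to disable state files, but `--statefilesdir` does let you point them at another directory. A minimal sketch of a CPU check that uses it follows. It only uses options shown in the help output. The host address, community, thresholds, and state directory are placeholder assumptions, not values from this thread.

```shell
#!/bin/sh
# Hypothetical values: replace them with your own device and paths.
PLUGIN=/usr/local/nagios/libexec/check_nwc_health
HOST=192.0.2.10                          # assumed device address (documentation range)
COMMUNITY=public                         # assumed SNMP community
STATE_DIR=/tmp/check_nwc_health_state    # assumed directory writable by the nagios user

# Build the command line from options that appear in the --help output.
# $1 is the warning threshold, $2 the critical threshold (illustrative values).
build_cmd() {
    printf '%s --mode cpu-load --hostname %s --community %s --warning %s --critical %s --statefilesdir %s' \
        "$PLUGIN" "$HOST" "$COMMUNITY" "$1" "$2" "$STATE_DIR"
}

CMD=$(build_cmd 80 90)
echo "$CMD"

# Run the check only when the plugin is actually installed on this machine.
if [ -x "$PLUGIN" ]; then
    mkdir -p "$STATE_DIR"
    $CMD
fi
```

Printing the command first makes it easy to paste into a Nagios command definition once it works from the shell.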

Post by **vijilants** » Wed Aug 12, 2015 8:17 am

vijilants wrote:
snapon_admin wrote:This is ours:
Code: Select all
$USER1$/check_nwc_health --t 60 --hostname $HOSTADDRESS$ --community $ARG1$ --mode hardware-health
The only threshold I can think of would be for temperature, and I'm not even sure you can specify that. By default the alert threshold looks like it is set to 60, which is a good threshold if that's Celsius.
OK, so without any threshold info, if a fan or PSU failed, how would this be shown in Nagios? Would it bring up some sort of red alarm indication?

The --t 60 is the timeout, not a temperature. I tested it by turning it down to 1, and the request timed out.
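On how a failure would show up: Nagios derives the service state from the plugin's exit code, following the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below only illustrates that mapping. Whether check_nwc_health reports a particular failed fan or PSU as WARNING or CRITICAL is not confirmed in this thread, and `state_name` is a hypothetical helper.

```shell
#!/bin/sh
# Map a plugin exit code to the service state Nagios displays.
# state_name is a hypothetical helper written for this illustration.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of range" ;;   # outside the documented 0-3 range
    esac
}

# Simulated exit codes stand in for real plugin runs.
for code in 0 1 2 3; do
    printf '%s -> %s\n' "$code" "$(state_name "$code")"
done
```

To see the code of a real run, execute the check from the shell and then `echo $?`.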

Post by **snapon_admin** » Wed Aug 12, 2015 9:40 am

Yeah, that's not why I thought the alert was 60 though. We got a warning alarm on temp once and it was like 68 or something and it cleared after a bit when it went back below 60. I could be wrong on that threshold though, that was like a year ago that this happened.

Post by **jdalrymple** » Wed Aug 12, 2015 11:43 am

Since it's not our plugin, I can't tell you where that threshold is coming from. Mine sets the warning threshold at 65 even when I explicitly set a different one, and it doesn't set a critical threshold at all:

Code: Select all

[jrdalrymple@localhost libexec]$ ./check_nwc_health --mode hardware-health --hostname <switch1> --community public --warning=15 --critical=20
OK - environmental hardware working fine | 'temp_1005'=43;65;;;

I can't promise that the threshold indicated is even honored - again not our software.

Reading through their bugs and such indicates to me that those thresholds should work, but perhaps it's only for specific hardware. That plugin monitors a zillion different network devices.
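The performance data after the pipe follows the standard Nagios plugin layout `label=value;warn;crit;min;max`, so `'temp_1005'=43;65;;;` reads as a value of 43 with a warning threshold of 65 and no critical threshold. A small sketch that splits the string copied from the output above, to make that easier to see:

```shell
#!/bin/sh
# Perfdata string copied from the plugin output quoted in this post.
PERF="'temp_1005'=43;65;;;"

label=${PERF%%=*}     # everything before the first '='
label=${label#\'}     # strip the leading single quote
label=${label%\'}     # strip the trailing single quote
fields=${PERF#*=}     # everything after the first '=': 43;65;;;

# Split on ';' into value, warn, crit, min, max. Empty fields stay empty.
IFS=';' read -r value warn crit min max <<EOF
$fields
EOF

echo "$label: value=$value warn=${warn:-unset} crit=${crit:-unset}"
```

An empty `crit` field matches what is described above: the plugin reported a warning threshold but no critical one for this sensor.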

Post by **vijilants** » Tue Sep 15, 2015 9:59 am

Thank you all for your help....I think that this has been a very useful thread and hopefully it will help others...

Now I need to find a plugin for Cisco Memory utilisation !!!

Post by **tmcdonald** » Tue Sep 15, 2015 10:37 am

vijilants wrote:Now I need to find a plugin for Cisco Memory utilisation !!!

Would you mind opening a new topic for this for the sake of organization? I would like to close this one since the issue has been resolved and the topic has gotten quite long.

Post by **vijilants** » Tue Sep 15, 2015 10:51 am

Certainly....and thanks again !

Nagios Support Forum
