Nagios Support Forum

Posted: **Fri Dec 16, 2011 10:05 am**

Hello,

I'm a Nagios newbie who's been tasked with developing a Nagios plugin to handle getting SNMP status from my company's "device". I'm also an SNMP newbie so that adds to the challenge.

I've installed Nagios and configured basic server and simple service checks (including using the SNMP module to get simple status from a test Windows box).

I think I understand enough about Nagios and setting up an SNMP check to do simple SNMP checks on specific OIDs in the MIB. What I'm having trouble wrapping my head around is querying/handling the myriad OIDs in the full MIB of our device. Our MIB has lots of tables which will have varying numbers of records depending on how the device is configured. It's not clear to me how I set up checks in the Nagios config that cover *everything* and how I handle an SNMP query of a table that could contain any number of rows. On top of that, how is this information going to be presented in the web interface?

I guess one basic question is - do I set up one service check for my device and then have the plugin that gets called ask for all the OIDs and process them, or do I set up tons of different service checks that each ask for one part of the MIB (by passing different params to a common plugin of course)?

I know there are lots of examples out there that I need to check and can use as a starting point but I'm just having trouble with the big picture for how this kind of thing is approached.

Thanks for any help or advice you can give,

Tom

Posted: **Sun Dec 18, 2011 4:58 pm**

The problem with SNMP is that the sanest way of handling this sort of thing really depends on how the vendor has designed their MIBs... I normally query each OID that is (a) static and (b) top level, and that preferably contains all relevant fields below it (or has an easy way of identifying the non-relevant ones).

If it makes logical sense to make it a single check then take that approach (i.e. all the information retrieved is related); if not, then split it up.

Posted: **Sun Dec 18, 2011 5:13 pm**

JSMurphy nails it here -- you're going to need a strong understanding of the MIB and how it, and all its structure, works. Some MIBs are pretty good about allowing one to "drill into" them; others are pathological and require the use of a bulkwalk -- or worse, a straight walk -- to find where to look for things.
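To make the "walk a table with an unknown number of rows" idea concrete, here is a minimal sketch of grouping walk output into rows. It assumes the plugin has already collected lines shaped like `OID value` (roughly what `snmpwalk -Oqn` prints); the enterprise OID `.1.3.6.1.4.1.99999...`, the column numbers, and the `group_rows` helper are all made up for illustration and are not taken from any real MIB.

```python
from collections import defaultdict

# Hypothetical table entry OID; a real plugin would use the vendor's own.
ENTRY_OID = ".1.3.6.1.4.1.99999.1.2.1"


def group_rows(lines, entry_oid=ENTRY_OID):
    """Group 'OID value' lines into {row_index: {column_number: value}}."""
    rows = defaultdict(dict)
    prefix = entry_oid + "."
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip blank lines and anything outside this table
        oid, _, value = line.partition(" ")
        # What follows the entry OID is "<column>.<row index>".
        column, _, index = oid[len(prefix):].partition(".")
        if not index:
            continue  # not a table cell
        rows[index][int(column)] = value.strip().strip('"')
    return dict(rows)


# Made-up walk output: column 2 = disk name, column 3 = percent used.
sample = [
    ".1.3.6.1.4.1.99999.1.2.1.2.1 disk1",
    ".1.3.6.1.4.1.99999.1.2.1.2.2 disk2",
    ".1.3.6.1.4.1.99999.1.2.1.3.1 50",
    ".1.3.6.1.4.1.99999.1.2.1.3.2 95",
]
rows = group_rows(sample)
```

Because the rows are keyed by whatever index the device returns, the same code handles two disks or twenty without any change to the Nagios configuration.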

Without knowing the MIB you're faced with, and what, precisely, you're looking for, we're kind of helpless here.

Posted: **Mon Dec 19, 2011 2:57 pm**

Thanks for the responses,

Fair enough (It depends...) and I guess it's good to know that it *does* depend and that one doesn't always do it one way or another.

The MIB for my device is split at the top level between Hardware and Services. The "device" is actually more of a cluster of machines and switches, so under Hardware there is an entry for the Master node, the Worker nodes, a network switch, and several more things. Under each of these hardware components there is a table that has fields that relate to the device (disks, cpus, power supplies, batteries, etc.). Many of these fields have sub-entries (such as for multiple disks).

The thing I'm not understanding from a Nagios perspective is, if I create a check that looks at say, the Master Node, my plugin would do an SNMP query for the record from the Master Node OID (which is a table). I would get back (in one way or another) a bunch of information about that Master node. I will then have a combination of statuses from the Master node. If the power supply is good, the battery is bad, disk 1 has plenty of room but disk 2 is over 80%, etc., etc. - how is this kind of information displayed on the Nagios web console?

My simple checks up to now have each looked at something that is either good or bad, and I thought the plugin could only return an overall state of the check. How can I return this complex, hierarchical status for a single check and display it logically in Nagios?

It *seems* like I couldn't and therefore would have to have specific checks for *each* item so that I could display the results for each separately. *That* seems onerous of course, so that's the basis for my questions.

I hope that makes sense.

Thanks again,

Tom

Posted: **Mon Dec 19, 2011 5:18 pm**

OK, so here is what I would do: in the script I would have an option of "hardware" or "service" with sub-options for each (cpu, disk, etc).

The command would look something like ./check_thing -H $HOSTNAME$ -o hardware -s disk_usage -w 80 -c 90
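As a rough illustration of how a script could accept that command line, here is a minimal Python sketch using `argparse`. Only the short flags (`-H`, `-o`, `-s`, `-w`, `-c`) come from the command above; the long option names, the `section`/`sub` attribute names, and the `parse_args` helper are assumptions added for readability.

```python
import argparse


def parse_args(argv=None):
    """Parse the check_thing command line (short flags as in the post)."""
    parser = argparse.ArgumentParser(prog="check_thing")
    parser.add_argument("-H", "--host", required=True,
                        help="host name or address to query")
    parser.add_argument("-o", "--object", dest="section", required=True,
                        choices=["hardware", "service"],
                        help="top-level branch of the MIB to check")
    parser.add_argument("-s", "--sub", dest="sub", default=None,
                        help="sub-option such as disk_usage or cpu")
    parser.add_argument("-w", "--warning", type=float, default=None,
                        help="warning threshold")
    parser.add_argument("-c", "--critical", type=float, default=None,
                        help="critical threshold")
    return parser.parse_args(argv)


args = parse_args(["-H", "10.0.0.1", "-o", "hardware",
                   "-s", "disk_usage", "-w", "80", "-c", "90"])
```

One thing to watch: on a bad command line `argparse` exits with status 2, which Nagios would read as CRITICAL, so a real plugin may want to catch that case and exit with 3 (UNKNOWN) instead.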

The return data would look something along the lines of "CRITICAL: disk1 - 50%, disk2 - 95%, disk3 - 10%". Alternatively, if the state is critical, display ONLY the ones above threshold; both forms of display are acceptable. You could also of course go further and add another sub-option to allow monitoring individual disks, but that's really up to your needs. You could also add performance data if you did individuals.
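A minimal sketch of how the plugin might turn per-disk readings into that single status line, assuming the readings are already in a dict of name to percent used. The `evaluate` function and its "strictly above the threshold" comparison are illustrative choices; the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the `label=value;warn;crit;min;max` performance-data layout after the `|` are the standard Nagios plugin conventions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING",
               CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(usage, warn, crit):
    """Return (exit_code, output_line) for a {disk_name: percent} dict."""
    if not usage:
        return UNKNOWN, "UNKNOWN: no disk rows returned"
    worst = OK
    for pct in usage.values():
        if pct > crit:
            worst = CRITICAL
        elif pct > warn and worst < CRITICAL:
            worst = WARNING
    # Human-readable part: every disk, in the order it was returned.
    text = ", ".join(f"{name} - {pct:g}%" for name, pct in usage.items())
    # Performance data: label=value[UOM];warn;crit;min;max
    perf = " ".join(f"{name}={pct:g}%;{warn:g};{crit:g};0;100"
                    for name, pct in usage.items())
    return worst, f"{STATE_NAMES[worst]}: {text} | {perf}"


code, line = evaluate({"disk1": 50, "disk2": 95, "disk3": 10}, 80, 90)
```

In the real script this would end with `print(line)` followed by `sys.exit(code)`; Nagios takes the service state from the exit code and shows the printed line in the web interface.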

Taking this approach gives you logical separation of the components and more flexibility for fine tuning your monitoring and will likely result in less work later.

Posted: **Mon Dec 19, 2011 6:11 pm**

Right. So, for each combination of command line switches for the script, I need a corresponding "check" in Nagios that makes that call, correct?

So I need a check (not really sure what you call it) called "hardware_check_disks" (or whatever) that calls the script with the hardware and disk parameters. And I need another check called "hardware_check_battery" that calls the script with hardware and battery params, and so on.

So I need a check configuration in Nagios for each parameter combination. If I want to get more detailed, I have to configure additional checks that use the appropriate switches.

Actually this makes more sense to me now. I was wondering if I could just have one (or a few) high-level check(s) and make a call to ./check_thing and pass it only -o hardware (no sub params) and have it return info on everything. I realize now that that doesn't really make much sense, since I can't really pass meaningful thresholds when there are so many different "things" in the hardware tree.

This is helpful. Thanks for taking the time to respond,

Tom

Posted: **Tue Dec 20, 2011 12:22 am**

Well you create a command for calling the script (check_thing -H $HOSTNAME$ -o $ARG1$ $ARG2$) and then when you define your services you just pass the arguments as you normally would (something like check_thing!hardware!-sCPU).
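For illustration, the command and one service built on it might look roughly like this in the Nagios object configuration. The `command_line` and `check_command` values follow the post above; the `$USER1$` plugin path, the `generic-service` template, the host name `mydevice`, and the service descriptions are assumptions that would need to match your own setup.

```
define command {
    command_name    check_thing
    command_line    $USER1$/check_thing -H $HOSTNAME$ -o $ARG1$ $ARG2$
}

define service {
    use                     generic-service
    host_name               mydevice
    service_description     Hardware CPU
    check_command           check_thing!hardware!-sCPU
}

define service {
    use                     generic-service
    host_name               mydevice
    service_description     Hardware Disk Usage
    check_command           check_thing!hardware!-s disk_usage -w 80 -c 90
}
```

So there is one command definition and one service definition per thing you want to see as a separate line in the web interface.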

You're welcome, glad we could help

Posted: **Tue Dec 20, 2011 4:37 pm**

An alternative to doing an initial "omnibus" check that takes a command-line argument to define the actual check might be to write separate ones first, prove them out individually, and then bundle them into one large script. This'll keep your development space clean from other non-related code. It's the "start small and build on that" approach.

As an aside, once you're ready to deploy these into production systems, you might consider using numeric OIDs internally to the plugin; this will eliminate the overhead of compiling the MIB every time you call the script. However, whilst you are developing the thing, I definitely recommend using symbolic OIDs; it'll just make it easier.
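One way to keep the readability of symbolic names while sending only numeric OIDs is a small lookup table inside the plugin. This is just a sketch: the names, the `.1.3.6.1.4.1.99999` numbers, and the `oid_for` helper are invented placeholders, not entries from any real MIB.

```python
# Hypothetical symbolic-name -> numeric-OID table kept inside the plugin,
# so no MIB files need to be loaded at run time.
OIDS = {
    "masterDiskName": ".1.3.6.1.4.1.99999.1.2.1.2",
    "masterDiskUsedPercent": ".1.3.6.1.4.1.99999.1.2.1.3",
}


def oid_for(name, index=None):
    """Return the numeric OID for a symbolic name, optionally for one row."""
    base = OIDS[name]
    return f"{base}.{index}" if index is not None else base


disk2_used = oid_for("masterDiskUsedPercent", 2)
```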

Developing Custom SNMP Plugin questions
