Coding a Check Plugin: How to get status of service of host?

Posted: Thu May 31, 2018 12:53 pm
by MrWoodward
Recently, I was looking into ways of rolling up our Host and Service notifications into a single check. More precisely, I'd like the service checks not to alert on their own; instead, the host check would look at the status of its services.

Searching the Nagios Plugin repository, I found a Ruby script on GitHub called `check_check`.

After reading through the code base, I was horrified to see that it works by reading in the entire `status.dat` file (all 11MB) and parsing it for the status of the Host and its respective Services. We have 1000+ hosts, so having check_check run on each host, reading and parsing an 11MB file and looping over it every time, would be incredibly slow and CPU intensive.
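For reference, the approach described above amounts to roughly the following. This is a minimal Python sketch, not check_check's actual code: the function name `parse_status_blocks` is made up, and it assumes `status.dat` consists of `servicestatus { ... }` blocks with one `key=value` pair per line. It shows why the cost grows with the size of the whole file rather than with the number of services on one host.

```python
from typing import Dict, Iterable, List


def parse_status_blocks(lines: Iterable[str], wanted_host: str) -> List[Dict[str, str]]:
    """Collect the servicestatus blocks that belong to one host.

    Every line of the input is visited once, even though only a few
    blocks are kept, which is the expensive part on a large file.
    """
    results: List[Dict[str, str]] = []
    block = None  # dict while inside a servicestatus block, else None
    for raw in lines:
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}":
            if block is not None and block.get("host_name") == wanted_host:
                results.append(block)
            block = None
        elif block is not None and "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            block[key] = value
    return results
```

With 1000+ hosts each running this against the same file, the full scan is repeated for every check.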

So my question: Is there a more direct way to query the status of a Host and its respective Services?

Re: Coding a Check Plugin: How to get status of service of h

Posted: Thu May 31, 2018 1:36 pm
by scottwilkerson
You would want to pass the on-demand macros into your checks:
https://assets.nagios.com/downloads/nag ... acros.html

Or, if you have it enabled in the config, you can also use Macros as Environment Variables, which is covered in the same doc.
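As a rough illustration of the environment-variable route, here is a minimal Python sketch. It assumes `enable_environment_macros=1` is set in nagios.cfg and that macros are exported with a `NAGIOS_` prefix (for example `NAGIOS_SERVICESTATEID`); the helper names `read_macro` and `describe_current_service` are made up for this example.

```python
import os
from typing import Mapping, Optional

PREFIX = "NAGIOS_"  # assumed prefix for exported macros


def read_macro(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of a macro such as HOSTNAME, or None if it is not exported."""
    env = os.environ if env is None else env
    return env.get(PREFIX + name)


def describe_current_service(env: Optional[Mapping[str, str]] = None) -> str:
    """Summarize the service this plugin was invoked for, using exported macros."""
    state = read_macro("SERVICESTATEID", env)
    if state is None:
        return "environment macros do not appear to be enabled"
    host = read_macro("HOSTNAME", env) or "(unknown host)"
    service = read_macro("SERVICEDESC", env) or "(unknown service)"
    return f"{host}/{service} is in state {state}"
```

Passing a plain dict as `env` makes the helpers easy to try outside of Nagios.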

Re: Coding a Check Plugin: How to get status of service of h

Posted: Thu May 31, 2018 2:32 pm
by MrWoodward
Oh yes, I definitely plan on passing in on-demand macros.

What I would like to do is allow the Services to run their checks, per normal, but directly access the results of each service check in a quick and efficient manner outside of each check itself.

I see the results are stored in "status.dat". But is there a programmatic way to directly access these results (rather than looping over line-by-line in a file)?

Like, can I send a query to the named pipe (FIFO) `/usr/local/nagios/var/rw/nagios.cmd` and would it spit back the last result for said service or host?

Are these results stored in a database somewhere?

Or is "status.dat" the only location where results of checks are stored?

EDIT: Alternatively, do you know of a way to easily roll-up all Critical/Error states for a Host and/or its respective Services such that only 1 alert/notification goes out? We're exploring the use of OpsGenie and PagerDuty, but it looks like it'll be a few months before we adopt either.

That Ruby "check_check" script is a good idea and seems to do what we'd want, but it also seems very resource intensive and could easily bog down our server.

Re: Coding a Check Plugin: How to get status of service of h

Posted: Thu May 31, 2018 4:07 pm
by scottwilkerson
Please look at the "On-Demand Macros" section of the document I linked... it gives examples of getting them for different hosts/services.

such as:

Code:

$HOSTMACRONAME:host_name$
$SERVICEMACRONAME:host_name:service_description$
Replace HOSTMACRONAME and SERVICEMACRONAME with the name of one of the standard host or service macros found here.

Note that the macro name is separated from the host or service identifier by a colon (:). For on-demand service macros, the service identifier consists of both a host name and a service description - these are separated by a colon (:) as well.

Tip: On-demand service macros can contain an empty host name field. In this case the name of the host associated with the service will automatically be used.

Examples of on-demand host and service macros follow:

Code:

$HOSTDOWNTIME:myhost$                        <--- On-demand host macro
$SERVICESTATEID:novellserver:DS Database$    <--- On-demand service macro
$SERVICESTATEID::CPU Load$                   <--- On-demand service macro with blank host name field
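
To tie this back to the roll-up question, here is a minimal Python sketch of a plugin that receives service state IDs as arguments, filled in by on-demand macros in the command definition, and reports the worst one. Everything named here is an assumption for illustration: the function names, the "worst state" ordering (CRITICAL > WARNING > UNKNOWN > OK), and a hypothetical command line such as `check_rollup.py "$SERVICESTATEID::CPU Load$" "$SERVICESTATEID:novellserver:DS Database$"`. Whether the blank-host form resolves inside a host check command is worth verifying on your version.

```python
from typing import List, Tuple

# Standard plugin return codes / service state IDs.
LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
# Ranking used to pick the worst state; this ordering is a design choice.
SEVERITY = {0: 0, 3: 1, 1: 2, 2: 3}


def rollup(state_ids: List[str]) -> Tuple[int, str]:
    """Reduce a list of state IDs (as strings) to one exit code and message."""
    states = []
    for raw in state_ids:
        try:
            value = int(raw)
        except ValueError:
            value = 3  # an unexpanded or empty macro is treated as UNKNOWN
        states.append(value if value in LABELS else 3)
    if not states:
        return 3, "UNKNOWN - no service states were passed in"
    worst = max(states, key=lambda s: SEVERITY[s])
    not_ok = sum(1 for s in states if s != 0)
    return worst, f"{LABELS[worst]} - {not_ok} of {len(states)} services not OK"


def main(argv: List[str]) -> int:
    """Print the one-line plugin output and return the exit code."""
    code, message = rollup(argv)
    print(message)
    return code

# As a real plugin, the script would end with: sys.exit(main(sys.argv[1:]))
```

Because Nagios expands the macros before the plugin starts, nothing here reads `status.dat`; the cost per check is just parsing a handful of arguments.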