Recently, I was looking into ways of rolling up our Host and Service notifications into a single check. More precisely, the service checks would not alert on their own; instead, the host check would look at the status of its services.
Searching the Nagios Plugin repository, I found a Ruby script on GitHub called `check_check`.
After reading through the code base, I was horrified to see that it works by reading in the entire `status.dat` file (all 11MB) and parsing it for the status of the Host and its respective Services. Since we have 1000+ hosts, having check_check run on each host, reading and parsing an 11MB file and looping over it each time, would be incredibly slow and CPU intensive.
So my question: Is there a more direct way to query the status of a Host and its respective Services?
Coding a Check Plugin: How to get status of service of host?
-
MrWoodward
- Posts: 66
- Joined: Fri Jan 06, 2017 1:58 pm
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Coding a Check Plugin: How to get status of service of h
You would want to pass the on-demand macros into your checks:
https://assets.nagios.com/downloads/nag ... acros.html
Or if you have it enabled in the config you can also use Macros as Environment Variables from the same doc.
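As a rough illustration of the environment-variable route: when `enable_environment_macros=1` is set in `nagios.cfg`, Nagios Core exports standard macros to the plugin's environment with a `NAGIOS_` prefix. The sketch below only reads a few of them; the helper name `read_nagios_macros` is made up for this example, and which macros are actually populated depends on the context the check runs in. As far as I know, on-demand macros are not exported this way, so those would still need to be passed as arguments.

```python
import os


def read_nagios_macros(names, environ=os.environ):
    """Return {macro_name: value or None} for the NAGIOS_-prefixed variables.

    `read_nagios_macros` is a hypothetical helper, not part of Nagios.
    """
    return {name: environ.get(f"NAGIOS_{name}") for name in names}


# Standard macro names; they are only set if environment macros are enabled.
macros = read_nagios_macros(
    ["HOSTNAME", "HOSTSTATEID", "SERVICEDESC", "SERVICESTATEID"]
)
for name, value in macros.items():
    print(f"{name}: {value if value is not None else '(not exported)'}")
```

Run outside of Nagios, every entry prints as "(not exported)", which is a quick way to confirm whether the setting is active for a given check.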
-
MrWoodward
- Posts: 66
- Joined: Fri Jan 06, 2017 1:58 pm
Re: Coding a Check Plugin: How to get status of service of h
Oh yes, I definitely plan on passing in on-demand macros.
What I would like to do is allow the Services to run their checks, per normal, but directly access the results of each service check in a quick and efficient manner outside of each check itself.
I see the results are stored in "status.dat". But is there a programmatic way to directly access these results (rather than looping over line-by-line in a file)?
Like, can I send a query to the named pipe (FIFO) `/usr/local/nagios/var/rw/nagios.cmd` and would it spit back the last result for said service or host?
Are these results stored in a database somewhere?
Or is "status.dat" the only location where results of checks are stored?
EDIT: Alternatively, do you know of a way to easily roll-up all Critical/Error states for a Host and/or its respective Services such that only 1 alert/notification goes out? We're exploring the use of OpsGenie and PagerDuty, but it looks like it'll be a few months before we adopt either.
That Ruby "check_check" script is a good idea and seems to do what we'd want, but it also seems very resource intensive and could easily bog down our server.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Coding a Check Plugin: How to get status of service of h
Please look at the "On-Demand Macros" section of the document I linked... it gives examples of getting them for different hosts/services.
such as:
Replace HOSTMACRONAME and SERVICEMACRONAME with the name of one of the standard host or service macros found here.
Code:
$HOSTMACRONAME:host_name$
$SERVICEMACRONAME:host_name:service_description$
Note that the macro name is separated from the host or service identifier by a colon (:). For on-demand service macros, the service identifier consists of both a host name and a service description - these are separated by a colon (:) as well.
Tip: On-demand service macros can contain an empty host name field. In this case the name of the host associated with the service will automatically be used.
Examples of on-demand host and service macros follow:
Code:
$HOSTDOWNTIME:myhost$ <--- On-demand host macro
$SERVICESTATEID:novellserver:DS Database$ <--- On-demand service macro
$SERVICESTATEID::CPU Load$ <--- On-demand service macro with blank host name field
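To tie this back to the roll-up question, here is a minimal sketch of a plugin that takes service state IDs (as expanded from `$SERVICESTATEID:...$` on-demand macros) as arguments and reports the worst one, so `status.dat` never has to be read. Everything about it is an assumption for illustration: the script name `check_rollup.py`, the `label=state_id` argument format, and the choice to rank CRITICAL above UNKNOWN. A command line might look something like `check_rollup.py "CPU Load=$SERVICESTATEID::CPU Load$" "DS Database=$SERVICESTATEID:novellserver:DS Database$"`.

```python
import sys

# Standard Nagios state IDs, which double as plugin exit codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
# Ranking used to pick the "worst" state. Putting CRITICAL above UNKNOWN
# is a choice made for this sketch, not something Nagios dictates.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def rollup(args):
    """Take 'label=state_id' strings and return (exit_code, message)."""
    states = {}
    for arg in args:
        label, sep, raw = arg.rpartition("=")
        if not sep or not raw.isdigit() or int(raw) not in STATE_NAMES:
            # A malformed argument or unexpanded macro counts as UNKNOWN.
            states[label or arg] = 3
        else:
            states[label] = int(raw)
    if not states:
        return 3, "UNKNOWN - no service states were passed in"
    worst = max(states.values(), key=SEVERITY.get)
    bad = [f"{name} is {STATE_NAMES[s]}" for name, s in states.items() if s != 0]
    detail = "; ".join(bad) if bad else f"all {len(states)} services OK"
    return worst, f"{STATE_NAMES[worst]} - {detail}"


def main(argv):
    """Print the one-line plugin output and return the exit code."""
    code, message = rollup(argv)
    print(message)
    return code

# As an actual plugin, the script would end with:
#     sys.exit(main(sys.argv[1:]))
```

Because the macros are expanded by Nagios before the command runs, the plugin itself does no file or pipe I/O; the cost per host is just one short process launch.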