Dynamic Service Monitoring - Info Request

Post by **mrochelle** » Wed Oct 09, 2013 1:46 pm

We have a table with hundreds of entries describing the files we deliver to our customers (e.g. file name, time the file is due, minimum and maximum allowable file size). Our current process checks each file at its applicable time and issues an alert if the file doesn’t exist or falls outside the minimum or maximum allowable file size. We would like to integrate this process into Nagios. We would like a separate alert for each file incident_date_time, but do not want to have to create a separate Nagios service for each of the hundreds of files.
Is there a way to dynamically create a service to do this? I'm already aware of one major concern, which would be the restart of Nagios with each config modification if this were a possibility.
Any thoughts or comments are appreciated.

Post by **BanditBBS** » Wed Oct 09, 2013 1:52 pm

How often does the list of files get updated? You could write a script that could go through the list and write the hundreds of service definitions as static configuration files and then issue an "Apply Configuration" when completed. You could manually run this script anytime the list is updated or have it run daily by cron, or whatever.
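A rough sketch of what such a generator script could look like. The input format (`name,due_time,min_bytes,max_bytes`), the host name, the `generic-service` template, and the `check_delivered_file` command are all assumptions for illustration; none of them come from the original poster's setup.

```bash
#!/bin/bash
# Sketch: emit one Nagios service definition per row of a file list.
# Assumed (hypothetical) input format: name,due_time,min_bytes,max_bytes
# "generic-service" and "check_delivered_file" are placeholder names.

generate_services() {
    local list="$1" host="$2"
    local name due min max
    # The "|| [[ -n ... ]]" part also handles a last line without a newline.
    while IFS=, read -r name due min max || [[ -n "$name" ]]; do
        # Skip blank lines and comment lines.
        [[ -z "$name" || "$name" == \#* ]] && continue
        cat <<EOF
define service {
    use                  generic-service
    host_name            $host
    service_description  File $name
    check_command        check_delivered_file!$name!$min!$max
}

EOF
    done < "$list"
}
```

Something like `generate_services files.csv filehost > delivered_files.cfg` would write the definitions to a config file. Nagios still has to reload its configuration before it picks them up, which is the restart concern raised in the first post.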

Post by **mrochelle** » Wed Oct 09, 2013 2:05 pm

The list gets updated every 10 to 15 mins at a minimum.

Post by **tmcdonald** » Wed Oct 09, 2013 2:09 pm

This won't create a separate alert for each file, but might come close to what you want:

Create a single check against your host system. The check should give Nagios a 0/OK if no files are old/small/big, and give 2/Critical if there is even 1 file that needs attention. In your script, in addition to exiting with that 0 or 2, you can echo out additional information which will be shown in XI. Here's an example plugin I wrote that accomplishes this:

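```bash
#!/bin/bash

# Text before the pipe is the status message; text after it is performance data.
echo "UNKNOWN - Have you checked me? | Dead=0 Alive=0"

# Exit code 3 tells Nagios the state is UNKNOWN.
exit 3
```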

I call it check_schrodinger since it always exits with 3/UNKNOWN. The echoed message appears in XI when I click the service.

Everything before the pipe is echoed to the screen. It would be trivial to make your script output an informative list of what files need to be addressed.
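As a minimal sketch of that idea, the function below reads a table, checks each file, and reports every offender in a single result. The table format (`path,min_bytes,max_bytes`, one file per line) is an assumption made for this example, and the due-time logic is left out to keep it short.

```bash
#!/bin/bash
# Sketch of an aggregated file check.
# Assumed (hypothetical) table format: path,min_bytes,max_bytes
# Sizes are expected to be plain decimal integers.

check_files() {
    local table="$1"
    local path min max size
    local problems=() total=0

    while IFS=, read -r path min max || [[ -n "$path" ]]; do
        # Skip blank lines and comment lines.
        [[ -z "$path" || "$path" == \#* ]] && continue
        total=$((total + 1))
        if [[ ! -f "$path" ]]; then
            problems+=("$path (missing)")
            continue
        fi
        # Arithmetic expansion strips any padding that wc adds.
        size=$(( $(wc -c < "$path") ))
        if (( size < min )); then
            problems+=("$path (${size}B < min ${min}B)")
        elif (( size > max )); then
            problems+=("$path (${size}B > max ${max}B)")
        fi
    done < "$table"

    if (( ${#problems[@]} == 0 )); then
        echo "OK - all $total files present and within size limits | problems=0 total=$total"
        return 0
    fi

    # The first line is the summary; the following lines list each offending file.
    echo "CRITICAL - ${#problems[@]} of $total files need attention | problems=${#problems[@]} total=$total"
    printf '%s\n' "${problems[@]}"
    return 2
}
```

Saved as a plugin, the script would end with `check_files "$1"; exit $?` so that Nagios sees the 0 or 2 exit code.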

The only challenge would be if the table is hosted on a remote server, in which case you might need NRPE or something.

Post by **BanditBBS** » Wed Oct 09, 2013 2:10 pm

That'd be real ugly having to restart that often and is definitely not something I'd do.

I'm not sure how I'd handle it then. I'd perhaps write a script to read the list and send an alert if anything is wrong. Then, maybe have the service reset to OK on its own after 1 minute so next time the check ran if there was another error, it would error out again and alert. You could also make the check return a list of all the files that had issues in the one alert. I can't think of any way to handle it like you originally described with a separate alert for each.
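One possible way to do the reset, offered only as a sketch: submit a passive OK result through the Nagios external command file using `PROCESS_SERVICE_CHECK_RESULT`. This assumes external commands are enabled and the service accepts passive checks. The function name, host, and service description are placeholders, and the command file path varies by install.

```bash
#!/bin/bash
# Sketch: write a passive check result to the Nagios external command file.
# submit_result is a hypothetical helper; the command file path is passed in
# because its location depends on the installation.

submit_result() {
    local cmd_file="$1" host="$2" service="$3" code="$4" output="$5"
    # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$host" "$service" "$code" "$output" >> "$cmd_file"
}
```

A wrapper could call `submit_result /path/to/nagios.cmd filehost "Delivered files" 0 "OK - reset"` a minute after an alert, so the next failing check triggers a fresh notification.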

EDIT: tmcdonald beat me to it, I basically say what he said. Mine just wasn't written as nice, LOL

Post by **tmcdonald** » Wed Oct 09, 2013 2:13 pm

BanditBBS wrote:EDIT: tmcdonald beat me to it, I basically say what he said. Mine just wasn't written as nice, LOL

I'm a dangerous man, Bandit.

Post by **mrochelle** » Wed Oct 09, 2013 2:36 pm

Thanks for the info. I believe I'm leaning toward a single script check and passing the results back to Nagios for appropriate alerts. You can lock this down.

Nagios Support Forum
