We have a table with hundreds of entries describing files that we deliver to our customers (e.g. file name, time the file is due, minimum and maximum allowable file size). Our current process checks each file at its due time and issues an alert if the file doesn’t exist or falls outside the minimum or maximum allowable file size. We would like to integrate this process into Nagios. We would like a separate alert for each file incident, but do not want to create a separate Nagios service for each of the hundreds of files.
Is there a way to dynamically create a service to do this? I'm already aware of one major concern, which would be the restart of Nagios with each config modification if this were a possibility.
Any thoughts or comments are appreciated.
Dynamic Service Monitoring - Info Request
Re: Dynamic Service Monitoring - Info Request
How often does the list of files get updated? You could write a script that could go through the list and write the hundreds of service definitions as static configuration files and then issue an "Apply Configuration" when completed. You could manually run this script anytime the list is updated or have it run daily by cron, or whatever.
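A minimal sketch of that generator idea, assuming the table can be exported as a pipe-delimited text file with one `file_name|due_time|min_bytes|max_bytes` row per file. The export format, the `check_file_delivery` command name, and the use of a `generic-service` template are all assumptions for illustration, not something from the original setup.

```bash
#!/bin/bash
# Sketch only: print one Nagios service definition per row of a file list.
# Assumed (hypothetical) input format, one file per line:
#   file_name|due_time|min_bytes|max_bytes

generate_services() {
    local table="$1" host="$2"
    local name due min max
    while IFS='|' read -r name due min max; do
        # Skip blank lines and lines starting with '#'.
        case "$name" in ''|'#'*) continue ;; esac
        # check_file_delivery is a hypothetical command that would have to be
        # defined separately; generic-service is assumed to exist as a template.
        cat <<EOF
define service {
    use                     generic-service
    host_name               ${host}
    service_description     File delivery: ${name}
    check_command           check_file_delivery!${name}!${due}!${min}!${max}
}

EOF
    done < "$table"
}
```

Something like `generate_services files.txt fileserver > file_deliveries.cfg` would write the definitions to a config file, after which the configuration still has to be applied (and Nagios restarted) for the changes to take effect.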
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: Dynamic Service Monitoring - Info Request
The list gets updated every 10 to 15 mins at a minimum.
Re: Dynamic Service Monitoring - Info Request
This won't create a separate alert for each file, but might come close to what you want:
Create a single check against your host system. The check should give Nagios a 0/OK if no files are old/small/big, and a 2/CRITICAL if even one file needs attention. In your script, in addition to exiting with that 0 or 2, you can echo out additional information, which will be shown in XI. Here's an example plugin I wrote that shows the output format:
Code:
#!/bin/bash
echo "UNKNOWN - Have you checked me?| Dead=0,Alive=0"
exit 3
I call it check_schrodinger since it always exits with 3/UNKNOWN. The echoed message appears in XI when I click the service.
Everything before the pipe is echoed to the screen; everything after it is treated as performance data. It would be trivial to make your script output an informative list of which files need to be addressed.
The only challenge would be if the table is hosted on a remote server, in which case you might need NRPE or something.
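As a rough sketch of what such a single check could look like, the function below walks a table of files and reports every problem in one status line. It assumes a hypothetical pipe-delimited export with one `/path/to/file|min_bytes|max_bytes` row per file, and it only covers missing, too-small, and too-large files; a due-time column would need its own comparison.

```bash
#!/bin/bash
# Sketch only: one Nagios check covering every file in a delivery table.
# Assumed (hypothetical) table format, one file per line:
#   /path/to/file|min_bytes|max_bytes

check_files() {
    local table="$1"
    local path min max size
    local problems="" total=0 bad=0

    if [ ! -r "$table" ]; then
        echo "UNKNOWN - cannot read table ${table}"
        return 3
    fi

    while IFS='|' read -r path min max; do
        # Skip blank lines and lines starting with '#'.
        case "$path" in ''|'#'*) continue ;; esac
        total=$((total + 1))
        if [ ! -f "$path" ]; then
            problems="${problems}${path} missing; "
            bad=$((bad + 1))
            continue
        fi
        # Arithmetic expansion strips any padding that wc adds on some systems.
        size=$(($(wc -c < "$path")))
        if [ "$size" -lt "$min" ]; then
            problems="${problems}${path} too small (${size} < ${min} bytes); "
            bad=$((bad + 1))
        elif [ "$size" -gt "$max" ]; then
            problems="${problems}${path} too large (${size} > ${max} bytes); "
            bad=$((bad + 1))
        fi
    done < "$table"

    # Text before the pipe is the status message; text after it is perfdata.
    if [ "$bad" -gt 0 ]; then
        echo "CRITICAL - ${bad} of ${total} files need attention: ${problems}| bad=${bad} total=${total}"
        return 2
    fi
    echo "OK - all ${total} files present and within size limits | bad=0 total=${total}"
    return 0
}

# As a plugin, the script would end with something like:
#   check_files /path/to/table; exit $?
```

Because the function returns the Nagios status code instead of exiting, the same logic can be reused from a wrapper script, for example one invoked through NRPE when the table lives on a remote server.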
Former Nagios employee
Re: Dynamic Service Monitoring - Info Request
That'd be real ugly having to restart that often and is definitely not something I'd do.
I'm not sure how I'd handle it then. I'd perhaps write a script to read the list and send an alert if anything is wrong. Then, maybe have the service reset to OK on its own after 1 minute so next time the check ran if there was another error, it would error out again and alert. You could also make the check return a list of all the files that had issues in the one alert. I can't think of any way to handle it like you originally described with a separate alert for each.
EDIT: tmcdonald beat me to it, I basically say what he said. Mine just wasn't written as nice, LOL
Re: Dynamic Service Monitoring - Info Request
BanditBBS wrote: EDIT: tmcdonald beat me to it, I basically say what he said. Mine just wasn't written as nice, LOL
I'm a dangerous man, Bandit.
Re: Dynamic Service Monitoring - Info Request
Thanks for the info. I believe I'm leaning toward a single script check and passing the results back to Nagios for appropriate alerts. You can lock this down.