Hello Nagios Support,
I'm running into some problems scaling our Config Management tool to configure systems in Nagios. At a high level, I am wondering if there is a better way to accomplish what we are trying to do.
We have a config management system (Puppet) managing configs on approximately 50 servers. The config management manifest will run individual API commands against the Nagios server to see if each and every expected Host and Service is managed. To do this, it makes a GET object/host and GET object/service for expected Hosts and Services - and looks specifically for the recordcount return to be 1. If not 1, then assume Host/Service is not monitored, and issue a POST config/host and POST config/service where needed. It will only perform one apply-config at the end of the manifest run, and only if any changes were made to Nagios.
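The check-then-create flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual manifest: the `get` and `post` callables are hypothetical stand-ins for whatever HTTP client the manifest uses, the endpoint paths and the apply-config path are assumptions to verify against the API reference on your XI server, and `recordcount` is assumed to sit at the top level of the reply.

```python
from typing import Callable, Iterable, List

# Hypothetical transport signatures (not part of any Nagios library):
#   get(path, params)  -> parsed JSON reply as a dict
#   post(path, data)   -> anything (result is ignored here)
Get = Callable[[str, dict], dict]
Post = Callable[[str, dict], object]


def is_monitored(get: Get, host_name: str) -> bool:
    """Return True when the object query reports exactly one matching host."""
    # Path is an assumption; the post refers to it as "GET object/host".
    reply = get("objects/host", {"host_name": host_name})
    # The API may return the count as a string ("0"/"1"), so normalise it.
    return int(reply.get("recordcount", 0)) == 1


def reconcile_hosts(get: Get, post: Post, expected: Iterable[dict]) -> List[str]:
    """Create each expected host that is not monitored, then apply config once."""
    created = []
    for host in expected:
        if not is_monitored(get, host["host_name"]):
            post("config/host", host)
            created.append(host["host_name"])
    # Only one apply-config, and only if something actually changed.
    if created:
        post("system/applyconfig", {})  # assumed path for Apply Config
    return created
```

Note that `is_monitored` cannot tell "host does not exist" apart from "Nagios is mid-restart", which is the gap the rest of this thread is about.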
Problem we are encountering: When running an Apply Config, the recordcount will return 0 until the configs are in place and appropriate Nagios services are running. If one of our other systems happens to perform a config-management run and check in during this time, and run its GET object/host, GET object/service commands -- it too will get a "recordcount": "0", so it will run the API commands to configure the "missing" monitoring and perform another Apply Config. If another system checks in during that time... you can guess what happens. It creates a race condition or storm of Apply Configs which cause a lot of problems for anyone trying to use the UI to manage systems - until things settle out. We are hitting this race condition or storm of Apply-Configs trying to manage approximately 60 Hosts and 700 Services.
Is there a better way to accomplish what we are trying to do?
I understand this may be outside the scope of traditional Nagios product support. Any hints at what others might do, or any workarounds in general would be greatly appreciated.
Thanks,
-marc
Help with API integration with Config Management tool.
Re: Help with API integration with Config Management tool.
After talking with the developer, you should first take a look and see if is_currently_running is set to 1 from this output:
http://YOURXISERVER/nagiosxi/help/api-system-reference.php#system-status
That's what development uses internally to signify that ndo is finished being built from the restart/apply.
Try that and see if that alleviates your issue.
Let us know the results.
Thank you!
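One way to act on this suggestion is to poll the system status until is_currently_running reads 1 before trusting any recordcount. The sketch below assumes a hypothetical `fetch_status` callable that returns the parsed system-status reply as a dict; the timeout and interval values are arbitrary placeholders, and the field is compared as a string because the API may return it as either "1" or 1.

```python
import time
from typing import Callable


def nagios_is_running(status: dict) -> bool:
    """True when the status reply reports is_currently_running as 1."""
    return str(status.get("is_currently_running", "0")) == "1"


def wait_until_running(
    fetch_status: Callable[[], dict],   # hypothetical: returns system-status JSON
    timeout: float = 300.0,             # placeholder value, tune for your site
    interval: float = 5.0,              # placeholder value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until Nagios reports it is running; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if nagios_is_running(fetch_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A config-management run that gets `False` back would do better to skip its Nagios changes for this run than to assume the hosts are missing.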
Re: Help with API integration with Config Management tool.
Thank you! That is exactly the path I was starting to investigate. I am trying to weigh the pros/cons of checking is_currently_running once at the start of the config-management run, or for each and every API call that's made to ensure xyz is monitored.
Re: Help with API integration with Config Management tool.
I would do it for each API call because you cannot determine when someone will run an apply config unless you're doing the API stuff off-hours.
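A per-call check along those lines might look like the sketch below. It assumes the same hypothetical `get(path, params)` helper returning parsed JSON, and the paths are assumptions to check against your server's API reference. The key design choice is a three-way result, so that "Nagios is restarting" is never mistaken for "host is missing".

```python
from typing import Callable, Optional

# Hypothetical transport: get(path, params) -> parsed JSON reply as a dict
Get = Callable[[str, dict], dict]


def host_state(get: Get, host_name: str) -> Optional[bool]:
    """Return True if monitored, False if missing, None if it cannot be known now."""
    status = get("system/status", {})
    if str(status.get("is_currently_running", "0")) != "1":
        # An apply/restart is in progress, so a recordcount of 0 means nothing.
        return None
    reply = get("objects/host", {"host_name": host_name})
    return int(reply.get("recordcount", 0)) == 1
```

Only a result of `False` should trigger a POST. This narrows the window considerably but does not close it entirely, since an Apply Config could still start between the status check and the object query.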