Help with API integration with Config Management tool.
Posted: Fri Jun 05, 2020 8:48 am
Hello Nagios Support,
I'm running into some problems scaling our Config Management tool to configure systems in Nagios. At a high level, I am wondering if there is a better way to accomplish what we are trying to do.
We have a config management system (Puppet) managing configs on approximately 50 servers. The config management manifest runs individual API commands against the Nagios server to check that each expected Host and Service is managed. To do this, it makes a GET object/host and a GET object/service for each expected Host and Service, and checks that the returned recordcount is 1. If it is not 1, the manifest assumes the Host/Service is not monitored and issues a POST config/host or POST config/service where needed. It performs only one apply-config, at the end of the manifest run, and only if changes were made to Nagios.
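To make the flow concrete, here is a rough sketch of the per-run logic described above. It is an illustration only, not our actual manifest: the function name `reconcile` and the three callables passed into it are hypothetical placeholders for the GET, POST, and apply-config API calls, so nothing here depends on exact endpoint paths or parameters.

```python
from typing import Callable, Iterable


def reconcile(
    expected: Iterable[str],
    get_recordcount: Callable[[str], object],
    post_config: Callable[[str], None],
    apply_config: Callable[[], None],
) -> int:
    """Check each expected object, create the missing ones, apply once.

    All three callables are hypothetical stand-ins for the real API calls:
      get_recordcount(name) -> the "recordcount" value from the GET
      post_config(name)     -> the POST that creates the object
      apply_config()        -> the single apply-config request
    Returns the number of objects that were created.
    """
    changed = 0
    for name in expected:
        # recordcount may come back as a string such as "0", so normalise it.
        if int(get_recordcount(name)) != 1:
            post_config(name)
            changed += 1
    # Only one apply-config per run, and only if something changed.
    if changed:
        apply_config()
    return changed
```

With `expected` holding the Hosts (or Services) a node should have, a run that finds everything present makes no POSTs and never calls `apply_config`.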
The problem we are encountering: while an Apply Config is running, recordcount returns 0 until the configs are in place and the appropriate Nagios services are running again. If one of our other systems happens to perform a config-management run during this window and issues its GET object/host and GET object/service commands, it too gets "recordcount": "0", so it runs the API commands to configure the "missing" monitoring and performs another Apply Config. If another system checks in during that time... you can guess what happens. The result is a race condition, or storm of Apply Configs, which causes a lot of problems for anyone trying to use the UI to manage systems until things settle out. We are hitting this while trying to manage approximately 60 Hosts and 700 Services.
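A toy simulation of the sequence may help show what we are seeing. This is not the real server: `FakeNagios` and `agent_run` are made-up names, and the only behaviour modelled is the one described above, namely that recordcount reads as 0 for everything while an apply is in progress.

```python
class FakeNagios:
    """Toy stand-in for the server; models only the behaviour described above."""

    def __init__(self, configured):
        self.configured = set(configured)
        self.applying = False
        self.apply_count = 0

    def recordcount(self, host):
        # While an apply is in progress, every lookup reports 0.
        if self.applying:
            return "0"
        return "1" if host in self.configured else "0"

    def post_host(self, host):
        self.configured.add(host)

    def start_apply(self):
        self.applying = True
        self.apply_count += 1

    def finish_apply(self):
        self.applying = False


def agent_run(server, host):
    """One node's run: check its host, create it and apply if it looks missing."""
    if int(server.recordcount(host)) != 1:
        server.post_host(host)
        server.start_apply()
        return True
    return False


server = FakeNagios(configured={"web02"})
first = agent_run(server, "web01")   # genuinely missing, so this apply is expected
second = agent_run(server, "web02")  # already configured, but checked mid-apply
server.finish_apply()
third = agent_run(server, "web02")   # same host once the apply has finished
print(first, second, third, server.apply_count)  # True True False 2
```

The second run is the unwanted one: web02 was already configured, but because it checked in during the apply it triggered another apply of its own.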
Is there a better way to accomplish what we are trying to do?
I understand this may be outside the scope of traditional Nagios product support. Any hints at what others might do, or any workarounds in general would be greatly appreciated.
Thanks,
-marc