Nagios Support Forum

Posted: **Fri Jun 05, 2020 8:48 am**

Hello Nagios Support,

I'm running into some problems scaling our Config Management tool to configure systems in Nagios. At a high level, I am wondering if there is a better way to accomplish what we are trying to do.
We have a config management system (Puppet) managing configs on approximately 50 servers. On each run, the manifest issues individual API calls against the Nagios server to check that every expected Host and Service is monitored. To do this, it makes a GET object/host and a GET object/service for each expected Host and Service, and looks specifically for a recordcount of 1 in the response. If recordcount is not 1, it assumes the Host/Service is not monitored and issues a POST config/host or POST config/service as needed. It performs a single apply-config at the end of the manifest run, and only if it made changes to Nagios.
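For readers following along, the check-then-create flow described above can be sketched roughly as follows. This is only an illustration, not the actual manifest: `api_get` and `api_post` are hypothetical callables standing in for whatever HTTP client is used, the `object/host` and `config/host` paths are copied from the post, and the `system/applyconfig` path and the `host_name` query parameter are assumptions that should be checked against the API reference.

```python
from typing import Callable, Dict, Mapping

# Hypothetical transport signatures (not part of Nagios XI):
#   api_get(path, params) -> decoded JSON reply as a dict
#   api_post(path, data)  -> decoded JSON reply as a dict
ApiGet = Callable[[str, Mapping[str, str]], dict]
ApiPost = Callable[[str, Mapping[str, str]], dict]


def is_monitored(api_get: ApiGet, path: str, params: Mapping[str, str]) -> bool:
    """Treat an object as monitored only when exactly one record comes back."""
    reply = api_get(path, params)
    # The post shows recordcount as a string ("0"), so normalise to int.
    return int(reply.get("recordcount", 0)) == 1


def ensure_host(api_get: ApiGet, api_post: ApiPost,
                host_name: str, host_config: Mapping[str, str]) -> bool:
    """Create the host if it is missing. Return True when a change was made."""
    if is_monitored(api_get, "object/host", {"host_name": host_name}):
        return False
    api_post("config/host", host_config)
    return True


def run_manifest(api_get: ApiGet, api_post: ApiPost,
                 hosts: Dict[str, Mapping[str, str]]) -> bool:
    """Check every expected host, then apply config once if anything changed."""
    changed = False
    for name, config in hosts.items():
        changed = ensure_host(api_get, api_post, name, config) or changed
    if changed:
        # Assumed endpoint name for the single apply-config at the end.
        api_post("system/applyconfig", {})
    return changed
```

Services would follow the same pattern with the service endpoints. Note that nothing in this flow distinguishes "not monitored" from "Nagios is not ready to answer yet".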

Problem we are encountering: while an Apply Config is running, recordcount returns 0 until the configs are in place and the appropriate Nagios services are running again. If one of our other systems happens to perform a config-management run during this window and runs its GET object/host and GET object/service commands, it too gets "recordcount": "0", so it runs the API commands to configure the "missing" monitoring and performs another Apply Config. If another system checks in during that time... you can guess what happens. The result is a race condition, a storm of Apply Configs that causes a lot of problems for anyone trying to use the UI to manage systems until things settle out. We are hitting this while trying to manage approximately 60 Hosts and 700 Services.

Is there a better way to accomplish what we are trying to do?

I understand this may be outside the scope of traditional Nagios product support. Any hints at what others might do, or any workarounds in general would be greatly appreciated.

Thanks,
-marc

Posted: **Fri Jun 05, 2020 3:21 pm**

After talking with the developer: you should first check whether is_currently_running is set to 1 in the system status output, which is documented here:

http://YOURXISERVER/nagiosxi/help/api-system-reference.php#system-status

That's what development uses internally to signal that NDO has finished rebuilding after the restart/apply.
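One way to act on this is to poll the status until the flag reads 1 before trusting any recordcount. The sketch below is an assumption-laden illustration: `get_status` is a hypothetical callable that returns the decoded system status reply as a dict, and the flag is assumed to sit at the top level of that reply as either a string or a number.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], dict],
                       timeout: float = 300.0,
                       interval: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until is_currently_running is 1. Return False if timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        # Accept "1" or 1; a missing flag is treated as not running.
        if str(status.get("is_currently_running", "0")) == "1":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The timeout and interval defaults are arbitrary placeholders. A caller would skip its existence checks (and any POSTs) when this returns False, rather than assuming objects are missing.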

Try that and see if that alleviates your issue.

Let us know the results.

Thank you!

Posted: **Mon Jun 08, 2020 8:43 am**

Thank you! That is exactly the path I was starting to investigate. I am trying to weigh the pros/cons of checking is_currently_running once at the start of the config-management run versus before each and every API call that's made to ensure xyz is monitored.

Posted: **Mon Jun 08, 2020 4:12 pm**

I would do it for each API call because you cannot determine when someone will run an apply config unless you're doing the API stuff off-hours.
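A per-call check like this could be done by wrapping the API helper so the flag is read immediately before every request. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical callable returning the decoded system status reply, `CoreNotReady` and `guarded` are illustrative names, and the wrapped `call` is whatever function performs the real GET or POST.

```python
import functools
from typing import Any, Callable


class CoreNotReady(RuntimeError):
    """Raised when is_currently_running is not 1 at call time."""


def guarded(call: Callable[..., Any],
            get_status: Callable[[], dict]) -> Callable[..., Any]:
    """Wrap an API helper so each invocation first checks the status flag."""
    @functools.wraps(call)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        status = get_status()
        if str(status.get("is_currently_running", "0")) != "1":
            # Refuse to query: a recordcount of 0 right now would be misleading.
            raise CoreNotReady("Nagios is restarting or applying config")
        return call(*args, **kwargs)
    return wrapper
```

The caller can catch `CoreNotReady` and end the run without making changes, leaving the next scheduled run to retry. There is still a small window between the status check and the request itself, so this narrows the race rather than removing it.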

Help with API integration with Config Management tool.
