Why must we restart Nagios to apply configurations?

mp4783 · Post by **mp4783** » Tue Feb 03, 2015 9:03 am

I've spent some time trying to find an answer to this, but I couldn't.

Why must the Nagios monitoring engine be restarted every time there is a configuration change?

Are there plans to eliminate this requirement in the future?

Does anyone else worry about the "non-monitored" period of time that the Nagios monitoring engine is restarting, particularly if you are doing a lot of configuration changes?

Thank you.

tmcdonald · Post by **tmcdonald** » Tue Feb 03, 2015 9:42 am

I'm not a developer, but I believe it is easier to tell Nagios that there are changes with a restart than it would be to have Nagios monitor for those changes constantly. If it picked up changes as soon as they were made, what happens if you make a mistake or there is an invalid config? Or you are in the middle of editing a file, but you have to take a call so you save it mid-way through?

mp4783 · Post by **mp4783** » Tue Feb 03, 2015 10:12 am

Your concerns are valid, but the "pre-flight" check that Nagios performs prior to restarting itself could still be run before telling Nagios to re-read its configuration.
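To make the idea concrete, here is a minimal sketch of "validate first, then signal a reload". It is not how Nagios or Nagios XI implements anything; `validate_then_reload` is a hypothetical helper, and both commands are passed in by the caller so that nothing about the local install is assumed.

```python
import subprocess
import sys


def validate_then_reload(validate_cmd, reload_cmd):
    """Run the validation command; run the reload command only if it passes.

    Returns True if the reload command ran and exited with status 0,
    False if validation failed or the reload itself failed.
    """
    check = subprocess.run(validate_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Validation failed: report the output and leave the daemon untouched.
        sys.stderr.write(check.stdout + check.stderr)
        return False
    reload_result = subprocess.run(reload_cmd)
    return reload_result.returncode == 0
```

On a source install, the validation command might be something like `["/usr/local/nagios/bin/nagios", "-v", "/usr/local/nagios/etc/nagios.cfg"]` (the `-v` flag runs the configuration check; the paths are an assumption about the install layout), and the reload command would be whatever your init system provides for Nagios.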

I have to believe there's a solid reason for this requirement; I just can't figure out what it is. I know software like HP OpenView does not have such a requirement. Moreover, I don't know what happens to service checks, SNMP traps, etc. when Nagios is reconfiguring itself. Are they lost? Do they queue?

My concerns are based, admittedly, on some ignorance of the inner workings of the Nagios application itself. They are also driven by the fear that, once we have tens of thousands of service checks configured, this restart could take quite some time.

Perhaps the reason is that Nagios core was originally developed without a backend configuration database (i.e. it relied solely upon configuration files that could be easily understood and maintained), so this restart (equivalent to a "kill -HUP" perhaps) was required for the daemon to see the new configuration. With Nagios XI, which is what we're using, there is a backend database that holds configuration information.

scottwilkerson · Post by **scottwilkerson** » Tue Feb 03, 2015 11:09 am

In Nagios Core 4+, restarting takes only a second, and the state is retained in retention.dat.

mp4783 wrote:Perhaps the reason is that Nagios core was originally developed without a backend configuration database (i.e. it relied solely upon configuration files that could be easily understood and maintained), so this restart (equivalent to a "kill -HUP" perhaps) was required for the daemon to see the new configuration. With Nagios XI, which is what we're using, there is a backend database that holds configuration information.

Actually, Nagios XI writes out the configuration files (from the DB), and Nagios Core in XI uses them the same way as if Core were standalone.

mp4783 · Post by **mp4783** » Tue Feb 03, 2015 4:56 pm

Scott,

Firstly, thank you for the prompt answer.

Are you saying that any passive alarm, SNMP trap, etc. (i.e. not an active check) will still be received and evaluated?

Why is there such a delay in the Nagios XI GUI interface when you apply changes? It's very tedious.

These questions mainly concern making changes via external configuration file import or via the command pipeline, but my "experience" with the reconfiguration (apply changes) process is primarily through the GUI. Hence, these questions.

abrist · Post by **abrist** » Tue Feb 03, 2015 5:29 pm

mp4783 wrote: Are you saying that any passive alarm, SNMP trap, etc. (i.e. not an active check) will still be received and evaluated?

Yes, the snmptt daemon should still spool the trap.

mp4783 wrote:Why is there such a delay in the Nagios XI GUI interface when you apply changes? It's very tedious.

What version are you on? 2012 was slow to apply, but 2014 should be very quick.

mp4783 · Post by **mp4783** » Sat Feb 14, 2015 11:50 am

We're on Nagios XI 2014R1.5. However, we have applied an external "restart" utility for any Nagios reconfiguration restarts. This external system may in fact be the cause of the delays in the Nagios XI GUI.

While I realize it is difficult to answer, what should my expectations be for completion of the reconfiguration after you click "Apply Configuration" in the Nagios XI GUI? Assume the system is lightly loaded. On average, it's taking 20 to 45 seconds before it indicates that the configuration was applied correctly. During that time, the /usr/local/nagios/nagiosxi/scripts/reconfigure_nagios.lock file is present.
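One way to put a number on this is to time how long the lock file stays on disk. The following is only a rough sketch: `wait_for_lock_release` is a hypothetical helper, the polling interval and timeout are arbitrary defaults, and it measures only how long the file exists, which may not match the time the monitoring engine is actually down.

```python
import os
import time


def wait_for_lock_release(lock_path, poll_interval=0.5, timeout=300.0):
    """Poll until lock_path no longer exists; return the seconds waited.

    Raises TimeoutError if the file is still present after `timeout` seconds.
    """
    start = time.monotonic()
    while os.path.exists(lock_path):
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"{lock_path} still present after {timeout} s")
        time.sleep(poll_interval)
    return time.monotonic() - start
```

Calling it with the lock file path mentioned above, right after clicking "Apply Configuration", would return roughly how long the lock was held. If the file has already gone by the time of the call, it returns almost immediately.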

mp4783 · Post by **mp4783** » Sat Feb 14, 2015 12:18 pm

Actually, I still don't have my question answered.

Why, on a technical level, does the Nagios monitoring engine need to restart itself each time it gets a new configuration item? Why can't it read it dynamically?

I can think of a few reasons why, but I'd like to know what in its architecture requires this.

scottwilkerson · Post by **scottwilkerson** » Sun Feb 15, 2015 1:42 pm

mp4783 wrote:I can think of a few reasons why, but I'd like to know what in its architecture requires this.

It is really more about what ISN'T in its architecture. There is no mechanism in Nagios Core to dynamically add hosts/services, or any configurations. Additionally, this would lead to multiple sources of truth: what is the running configuration vs. what is the configuration in memory.

At present, the objects in the .cfg files are only read when Nagios starts; adding additional items to the config doesn't change anything, because the config will not be re-read until the process starts again.
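As a toy illustration of that read-once behaviour (this is not Nagios code; `ToyDaemon` and its one-host-per-line file format are invented for the example), the in-memory object list below only changes when the file is explicitly re-read:

```python
from pathlib import Path


class ToyDaemon:
    """Toy model: definitions are parsed at start and on explicit reload only."""

    def __init__(self, cfg_path):
        self.cfg_path = Path(cfg_path)
        self.hosts = []
        self.reload()

    def reload(self):
        # Re-read the file and replace the in-memory list wholesale.
        lines = self.cfg_path.read_text().splitlines()
        self.hosts = [
            ln.strip()
            for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")
        ]
```

Editing the file after the object is constructed leaves `hosts` unchanged until `reload()` is called, which is the same gap between the files on disk and the objects in memory described above.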

mp4783 · Post by **mp4783** » Mon Feb 23, 2015 8:55 am

What I have discovered is that the time needed to reconfigure Nagios (i.e. the time the monitoring daemon(s) is offline) varies with the number of changes being made. While this should be self-evident, any large-scale change will cause active monitoring from that Nagios XI server to "go dark" for an extended period of time. A recent test which imported 1000 new service checks for a single host took the monitor offline for 2 minutes and 45 seconds.

Previous posts as to the underlying architectural reasons for this behavior make sense within the context of trying to support the legacy functionality of Nagios core while adding the enterprise features of Nagios XI. I suppose a good compromise has been struck on that account. This does not mitigate the fact that your core monitoring engine must be restarted for even the most trivial changes.

I suppose if I wanted to spend a couple of days really digging into the "why" of this issue, I might come up with a better answer, or at least satisfy myself that this is necessary even though it's not the most desirable behavior. However, I'll just accept that this is the way things are and move on.

Nagios Support Forum
