I've spent some time trying to find an answer to this, but I couldn't.
Why must the Nagios monitoring engine be restarted every time there is a configuration change?
Are there plans to eliminate this requirement in the future?
Does anyone else worry about the "non-monitored" period while the Nagios monitoring engine is restarting, particularly if you are making a lot of configuration changes?
Thank you.
Why must we restart Nagios to apply configurations?
Re: Why must we restart Nagios to apply configurations?
I'm not a developer, but I believe it is easier to tell Nagios that there are changes with a restart than it would be to have Nagios monitor for those changes constantly. If it picked up changes as soon as they were made, what happens if you make a mistake or there is an invalid config? Or you are in the middle of editing a file, but you have to take a call so you save it mid-way through?
Former Nagios employee
Re: Why must we restart Nagios to apply configurations?
Your concerns are valid, but the "pre-flight" check that Nagios performs before restarting itself could still be run before telling Nagios to re-read its configuration.
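For illustration, here is a minimal Python sketch of the "validate first, then reload" sequence described above. It assumes a source install under `/usr/local/nagios`, that `nagios -v <main config>` exits non-zero when the pre-flight check finds errors, and that the service is managed by systemd (`systemctl reload nagios`). All three are assumptions to adjust for your system. The command runner is injectable so the logic can be exercised without a live Nagios.

```python
import subprocess
from typing import Callable, Sequence

# Assumptions: source-install paths and a systemd-managed service.
NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
NAGIOS_CFG = "/usr/local/nagios/etc/nagios.cfg"
VERIFY_CMD = [NAGIOS_BIN, "-v", NAGIOS_CFG]     # pre-flight config check
RELOAD_CMD = ["systemctl", "reload", "nagios"]  # ask the daemon to re-read config

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def apply_config(runner: Runner = run,
                 verify_cmd: Sequence[str] = VERIFY_CMD,
                 reload_cmd: Sequence[str] = RELOAD_CMD) -> bool:
    """Reload only if the pre-flight check passes.

    Returns True only when both the check and the reload succeed. A failed
    check means the reload is never attempted, so the running daemon keeps
    its current, known-good configuration.
    """
    if runner(verify_cmd) != 0:
        return False
    return runner(reload_cmd) == 0
```

A half-saved or invalid file then fails the check and never reaches the running daemon.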
I have to believe there's a solid reason for this requirement, I just can't figure out what it is. I know software like HP OpenView does not have such a requirement. Moreover, I don't know what happens to service checks, SNMP traps, etc. when Nagios is reconfiguring itself. Are they lost? Do they queue?
My concerns are based, admittedly, on some ignorance of the inner workings of the Nagios application itself. They are also driven by the fear that, once we have tens of thousands of service checks configured, this restart could take quite some time.
Perhaps the reason is that Nagios Core was originally developed without a backend configuration database (i.e., it relied solely on configuration files that could be easily understood and maintained), so this restart (perhaps equivalent to a "kill -HUP") was required for the daemon to see the new configuration. With Nagios XI, which is what we're using, there is a backend database that holds configuration information.
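The "kill -HUP" convention is a general Unix pattern: a daemon keeps working with the configuration it loaded and re-reads the file only when it receives SIGHUP. Below is a generic Python sketch of that pattern (POSIX only). It is not Nagios Core's actual code; the `ConfigHolder` class and its simple key=value file format are hypothetical.

```python
import os
import signal
import tempfile


class ConfigHolder:
    """Keeps an in-memory copy of a config file; re-reads it only on SIGHUP."""

    def __init__(self, path: str):
        self.path = path
        self.config = self._load()
        # `kill -HUP <pid>` now triggers a re-read instead of the default (terminate).
        signal.signal(signal.SIGHUP, self._on_hup)

    def _load(self) -> dict:
        # Hypothetical key=value format; real Nagios object files are richer.
        cfg = {}
        with open(self.path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    cfg[key.strip()] = value.strip()
        return cfg

    def _on_hup(self, signum, frame):
        self.config = self._load()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.cfg")
    with open(path, "w") as f:
        f.write("check_interval=5\n")
    holder = ConfigHolder(path)
    with open(path, "w") as f:
        f.write("check_interval=10\n")
    print(holder.config)               # still the copy loaded at startup
    os.kill(os.getpid(), signal.SIGHUP)
    print(holder.config)               # re-read after the signal
```

Editing the file alone changes nothing; only the signal makes the process look at the disk again.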
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Why must we restart Nagios to apply configurations?
In Nagios Core 4+, restarting takes only a second, and the state is retained in retention.dat.
mp4783 wrote: Perhaps the reason is that Nagios core was originally developed without a backend configuration database (i.e. it relied solely upon configuration files that could be easily understood and maintained), so this restart (equivalent to a "kill -HUP" perhaps) was required for the daemon to see the new configuration. With Nagios XI, which is what we're using, there is a backend database that holds configuration information.
Actually, Nagios XI writes out the configuration files (from the DB), and Nagios Core in XI uses them the same way as if Core were standalone.
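The general idea behind keeping state across a restart is that the state is saved to disk and read back when the process starts. A minimal Python sketch of that save/restore step, assuming a hypothetical JSON state file; Nagios's real retention.dat uses its own text format, and `save_state`/`load_state` are illustrative names only.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state atomically: dump to a temp file, then rename it over the old one.

    An interrupted write therefore never leaves `path` half-written.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path: str) -> dict:
    """Read saved state at startup; start empty if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The process would call `save_state` before exiting and `load_state` on startup, so a host that was CRITICAL before the restart is still CRITICAL after it.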
Re: Why must we restart Nagios to apply configurations?
Scott,
Firstly, thank you for the prompt answer.
Are you saying that any passive alarm, SNMP trap, etc. (i.e. not an active check) will still be received and evaluated?
Why is there such a delay in the Nagios XI GUI interface when you apply changes? It's very tedious.
These questions mainly concern making changes via external configuration file import or via the command pipeline, but my "experience" with the reconfiguration ("Apply Changes") process is primarily through the GUI. Hence these questions.
Re: Why must we restart Nagios to apply configurations?
mp4783 wrote: Are you saying that any passive alarm, SNMP trap, etc. (i.e. not an active check) will still be received and evaluated?
Yes, the snmptt daemon should still spool the trap.
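On the earlier "Are they lost? Do they queue?" question: spooling means each incoming event is written to disk first and handed on later, so events that arrive while the consumer (here, Nagios mid-restart) is unavailable simply wait. A generic Python sketch of a spool directory, assuming hypothetical `spool`/`drain` helpers and an `.evt` file naming scheme; it illustrates the idea and is not snmptt's implementation.

```python
import os
import tempfile
import time
import uuid
from typing import Callable


def spool(spool_dir: str, payload: str) -> str:
    """Producer: persist one event as its own file before anything else happens."""
    fd, tmp = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(payload)
    # Timestamped name keeps a rough arrival order; the rename publishes it atomically.
    final = os.path.join(spool_dir, f"{time.time_ns():020d}-{uuid.uuid4().hex}.evt")
    os.replace(tmp, final)
    return final


def drain(spool_dir: str, deliver: Callable[[str], bool]) -> int:
    """Consumer: hand spooled events to `deliver`; keep any that were not accepted."""
    delivered = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".evt"):
            continue  # skip files still being written
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            payload = f.read()
        if not deliver(payload):
            break  # target unavailable (e.g. restarting): leave the rest for next time
        os.remove(path)
        delivered += 1
    return delivered
```

If `deliver` reports failure during a restart, the files stay on disk and the next `drain` call picks them up.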
mp4783 wrote: Why is there such a delay in the Nagios XI GUI interface when you apply changes? It's very tedious.
What version are you on? 2012 was slow to apply, but 2014 should be very quick.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Why must we restart Nagios to apply configurations?
We're on Nagios XI 2014R1.5. However, we have applied an external "restart" utility for any Nagios reconfiguration restarts. This external system may in fact be the cause of the delays in the Nagios XI GUI.
While I realize it is difficult to answer, what should my expectations be for completion of the reconfiguration after you click "Apply Configuration" in the Nagios XI GUI? Assume the system is lightly loaded. On average, it's taking 20 to 45 seconds before it indicates that the configuration was applied correctly. During that time, the /usr/local/nagios/nagiosxi/scripts/reconfigure_nagios.lock file is present.
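To put firmer numbers on the 20 to 45 seconds, one option is to time how long the lock file mentioned above stays present. A minimal Python sketch, assuming (as observed in the post) that the lock file exists for the whole apply; `wait_for_unlock` is a hypothetical helper. Start it right after clicking "Apply Configuration", since it returns immediately if the lock has not appeared yet.

```python
import os
import time
from typing import Optional

# Path as reported in the post above.
LOCK_FILE = "/usr/local/nagios/nagiosxi/scripts/reconfigure_nagios.lock"


def wait_for_unlock(lock_path: str = LOCK_FILE,
                    timeout: float = 300.0,
                    poll: float = 0.5) -> Optional[float]:
    """Return seconds waited until lock_path disappeared, or None on timeout.

    Returns almost immediately if the lock file does not exist when called.
    """
    start = time.monotonic()
    while os.path.exists(lock_path):
        if time.monotonic() - start > timeout:
            return None
        time.sleep(poll)
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = wait_for_unlock()
    if elapsed is None:
        print("still locked after timeout")
    else:
        print(f"apply finished after ~{elapsed:.1f}s")
```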
Re: Why must we restart Nagios to apply configurations?
Actually, I still don't have my question answered.
Why, on a technical level, does the Nagios monitoring engine need to restart itself each time it gets a new configuration item? Why can't it read it dynamically?
I can think of a few reasons why, but I'd like to know what in its architecture requires this.
scottwilkerson
Re: Why must we restart Nagios to apply configurations?
mp4783 wrote: I can think of a few reasons why, but I'd like to know what in its architecture requires this.
It is really more about what ISN'T in its architecture. There is no mechanism in Nagios Core to dynamically add hosts, services, or any other configuration. Additionally, this would lead to multiple sources of truth: the configuration on disk vs. the running configuration in memory.
At present, the objects in the .cfg files are read only when Nagios starts. Adding items to the config changes nothing, because the config is not re-read until the process starts again.
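The "multiple sources of truth" point can be made concrete: what the running process read at startup and what is on disk now can silently diverge. A minimal Python sketch that fingerprints the `.cfg` files at load time and later reports whether the files on disk have changed since, i.e. whether there are edits a restart would pick up. The `LoadedConfig` class, the SHA-256 fingerprint, and the directory layout are assumptions for illustration.

```python
import hashlib
import os
import tempfile
from typing import List


def cfg_files(root: str) -> List[str]:
    """All *.cfg files under root (hypothetical layout)."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        found.extend(os.path.join(dirpath, n) for n in names if n.endswith(".cfg"))
    return found


def fingerprint(paths: List[str]) -> str:
    """SHA-256 over each file's path and contents, in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode() + b"\0")
        with open(path, "rb") as f:
            h.update(f.read() + b"\0")
    return h.hexdigest()


class LoadedConfig:
    """Remembers what was on disk when the (simulated) process started."""

    def __init__(self, root: str):
        self.root = root
        self.loaded = fingerprint(cfg_files(root))  # read once, at startup

    def pending_changes(self) -> bool:
        """True if the files on disk no longer match what was loaded."""
        return fingerprint(cfg_files(self.root)) != self.loaded


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "hosts.cfg"), "w") as f:
        f.write("define host { host_name web01 }\n")
    running = LoadedConfig(root)
    print(running.pending_changes())  # False: disk matches what was loaded
    with open(os.path.join(root, "services.cfg"), "w") as f:
        f.write("define service { host_name web01 }\n")
    print(running.pending_changes())  # True: only a restart would apply this
```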
Re: Why must we restart Nagios to apply configurations?
What I have discovered is that the reconfiguration time for Nagios (i.e. the time the monitoring daemon(s) are offline) varies with the number of changes being made. While this should be self-evident, it means any large-scale change will cause active monitoring from that Nagios XI server to "go dark" for an extended period. A recent test that imported 1000 new service checks for a single host took the monitor offline for 2 minutes and 45 seconds.
Previous posts about the underlying architectural reasons for this behavior make sense in the context of supporting the legacy functionality of Nagios Core while adding the enterprise features of Nagios XI. I suppose a good compromise has been struck on that account. It does not change the fact that the core monitoring engine must be restarted for even the most trivial changes.
I suppose if I wanted to spend a couple of days really digging into the "why" of this issue, I might come up with a better answer, or at least satisfy myself that this is necessary even though it's not the most desirable behavior. However, I'll just accept that this is the way things are and move on.