Long apply configurations
Hello,
Our apply configurations are taking longer and longer. It can take up to 40 seconds before Nagios XI is back online. Looking around the web, I see competing Nagios clones that have implemented a system where a parent process is spawned which takes over the monitoring etc. The new configuration is loaded in a duplicate child process. When the new configuration is loaded completely, the parent process with the old configuration is killed and the new process takes over, resulting in a supposed 'downtime' of only 3-5 seconds.
Is this a feature which can be implemented in Nagios XI? Honestly, the long apply configurations are one of the most annoying aspects of Nagios XI. During the apply configuration process, there is a window of 15-20 seconds where the Nagios hosts and services are no longer visible. Then there is a window of 10 seconds where hosts and services which were in downtime or acknowledged are visible in the open service problems views. This results in very confusing situations with duplicate calls and frustrated colleagues.
I understand my Nagios XI instance is bigger than average, but we really need a better and more consistent solution for the apply configuration process. Please realize that about 10-20 applies are done each day, resulting in 10-20 windows of 40 seconds where our views and dashboards are flashing, not showing anything at all, or showing problems that have already been acknowledged.
Thanks for looking into this.
Willem
Nagios XI 5.8.1
https://outsideit.net
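The load-then-swap idea described in this post can be illustrated with a small sketch. This is only an in-process analogue using threads, not how Nagios Core or any of the clones mentioned actually implement it; the `Monitor` class, `apply` method, and `demo` function are hypothetical names made up for the illustration.

```python
import threading


class Monitor:
    """Keeps serving the active configuration while a replacement loads."""

    def __init__(self, config):
        self._active = config
        self._lock = threading.Lock()

    def active_config(self):
        # Readers only ever see a fully loaded configuration.
        with self._lock:
            return self._active

    def apply(self, load_config):
        # Slow part: runs while the old configuration is still in use.
        new_config = load_config()
        # Fast part: the only moment readers are blocked.
        with self._lock:
            self._active = new_config


def demo():
    monitor = Monitor({"version": 1})
    loading = threading.Event()
    release = threading.Event()

    def slow_loader():
        loading.set()    # signal that loading has begun
        release.wait()   # stand-in for a long configuration parse
        return {"version": 2}

    worker = threading.Thread(target=monitor.apply, args=(slow_loader,))
    worker.start()
    loading.wait()
    during = monitor.active_config()["version"]  # old config still served
    release.set()
    worker.join()
    after = monitor.active_config()["version"]
    return during, after
```

In this sketch the views would keep reading the old configuration for the whole load, and only the final assignment is exclusive. A real monitoring daemon would also have to carry over state such as downtimes and acknowledgements, which the sketch does not attempt.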
Re: Long apply configurations
This would need to be more of a Core change than XI, but I think XI would need to be involved at some point as well, just not to the same degree.
I pinged our Core dev on this for his thoughts, will update the thread when I know more. That being said, I think this sort of functionality would be a great idea.
Former Nagios employee
Re: Long apply configurations
From our dev:
"That might work. It would need some careful coding, but that might be the easiest way to do it."
A GitHub issue was suggested, and I can file that or you can, doesn't matter to me.
Bear in mind this would take a lot of re-architecting and testing, so it likely would not be done very soon.
Former Nagios employee
Re: Long apply configurations
Trevor,
I understand this would take time to implement. I'll make the GitHub issue.
https://github.com/NagiosEnterprises/na ... issues/176
Thanks
Willem
Re: Long apply configurations
Thanks Willem! I'll leave this thread open should further discussion happen in the future, or if you have anything to add.
Former Nagios Employee
Re: Long apply configurations
As requested by avandemore https://support.nagios.com/forum/viewto ... 20#p204520
"During these restarts, does the information show up in Core?"
Yes, the issue is also in Core. Just tested it.
Grtz
Willem
avandemore
Re: Long apply configurations
This is different than the referenced thread if Core is exhibiting this behavior. Please post or PM your nagios.cfg.
Previous Nagios employee
avandemore
Re: Long apply configurations
Your configuration looks correct for Core to preserve state across a reboot. During an Apply Config, what is the output from:
Code:
# tail -F /usr/local/nagios/var/retention.dat
You can also PM this if necessary.
Previous Nagios employee
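If only the timing of the writes matters rather than their content, the lines from the `tail -F` command could be timestamped as they arrive. The following is a minimal sketch under that assumption; `stamp_lines` and `longest_gap` are hypothetical helper names, not part of Nagios.

```python
import time


def stamp_lines(lines, clock=time.monotonic):
    """Yield (seconds_since_start, line) for each line as it arrives."""
    start = clock()
    for line in lines:
        yield clock() - start, line


def longest_gap(stamps):
    """Return the longest pause, in seconds, between consecutive stamps."""
    pauses = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return max(pauses, default=0.0)
```

Passing `sys.stdin` as `lines` while piping the `tail -F` output into the script would give one stamp per line. The first stamp shows how long the file stayed silent after the script started, and `longest_gap` reports the largest pause after that.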
Re: Long apply configurations
Avandemore,
Well... I did as you asked. The output from the tail is huge. It will be hard to even PM you this. Basically, from the moment I apply, it doesn't output anything for 11 seconds and then starts outputting like crazy for roughly 20 more seconds.
Example output:
Code:
hostdowntime {
host_name=servername
comment_id=2404807
downtime_id=374057
entry_time=1480831262
start_time=1481349600
flex_downtime_start=0
end_time=1481367600
triggered_by=0
fixed=1
duration=18000
is_in_effect=0
start_notification_sent=0
author=user
comment=AUTO: alfresco rebuild index
}
It's just too much content and full of sensitive information. If you absolutely want to see this data, I suggest we do a remote support session or so.
Grtz
Willem
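Since the retention data is too large and too sensitive to post in full, one option would be to parse the `hostdowntime` blocks and share only non-identifying fields. Below is a minimal sketch based purely on the block layout shown in the example output; the function names and the choice of which keys count as sensitive are assumptions.

```python
# Assumption: these are the fields that should not be shared.
SENSITIVE_KEYS = {"host_name", "author", "comment"}


def parse_blocks(text, block_type="hostdowntime"):
    """Return one dict per `block_type { ... }` section in the text."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == block_type + " {":
            current = {}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first "=" only, so values may contain "=".
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def redact(block):
    """Drop the fields assumed to be sensitive, keep the rest."""
    return {k: v for k, v in block.items() if k not in SENSITIVE_KEYS}


SAMPLE = """hostdowntime {
host_name=servername
downtime_id=374057
start_time=1481349600
end_time=1481367600
duration=18000
is_in_effect=0
author=user
comment=AUTO: alfresco rebuild index
}"""

for block in parse_blocks(SAMPLE):
    print(redact(block))
```

The redacted output would still show downtime IDs, timestamps, and the `is_in_effect` flag, which may be enough to see what state Core wrote during the apply.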