Long apply configurations

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Long apply configurations

Post by WillemDH »

Hello,

Our apply configurations are taking longer and longer. It can take up to 40 seconds before Nagios XI is back online. Looking around the web, I see competing Nagios clones that have implemented a system in which a parent process is spawned that takes over the monitoring, etc. The new configuration is loaded in a duplicate child process. When the new configuration is completely loaded, the parent process with the old configuration is killed and the new process takes over, resulting in a supposed 'downtime' of only 3-5 seconds.
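To make the idea concrete, here is a minimal sketch of the "load the new configuration in the background, then swap" pattern. It is only an illustration: it uses a thread inside one Python process rather than separate parent and child processes, and the names `Monitor`, `apply`, and `slow_loader` are hypothetical, not anything from Nagios Core or XI.

```python
import threading
import time


class Monitor:
    """Toy monitor that keeps serving the old config while a new one loads."""

    def __init__(self, config):
        self._config = config
        self._lock = threading.Lock()

    def current_config(self):
        with self._lock:
            return self._config

    def apply(self, load_config):
        # load_config is a hypothetical callable that parses and verifies
        # the new configuration; it may be slow.
        new_config = load_config()  # the old config stays active meanwhile
        with self._lock:            # the swap itself is a brief reference change
            self._config = new_config


def slow_loader():
    time.sleep(0.2)  # stands in for a long parse/verify step
    return {"version": 2, "hosts": ["a", "b", "c"]}


monitor = Monitor({"version": 1, "hosts": ["a", "b"]})
worker = threading.Thread(target=monitor.apply, args=(slow_loader,))
worker.start()
seen_during_load = monitor.current_config()["version"]  # read while loading
worker.join()
seen_after_load = monitor.current_config()["version"]   # read after the swap
```

In this toy version the views would keep reading the old configuration for the whole load, and the visible gap shrinks to the swap itself. A real implementation would also have to carry over state such as downtimes and acknowledgements.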

Is this a feature which can be implemented in Nagios XI? Honestly, the long apply configurations are one of the most annoying aspects of Nagios XI. During the apply configuration process, there is a window of 15-20 seconds where the Nagios hosts and services are no longer visible. Then there is a window of 10 seconds where hosts and services which were in downtime or acknowledged are visible in the open service problems views. This results in very confusing situations with duplicate calls and frustrated colleagues.

I understand my Nagios XI instance is bigger than average, but we really need a better and more consistent solution for the apply configuration process. Please realize that about 10-20 applies are done each day, resulting in 10-20 timeframes of 40 seconds where our views and dashboards are flashing, not showing anything at all, or showing problems that have already been acknowledged.
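As a rough back-of-the-envelope check of those numbers, the short calculation below multiplies the applies per day by the 40-second window. The constant and function names are only for illustration.

```python
APPLIES_PER_DAY = (10, 20)   # low and high estimate from the post
SECONDS_PER_APPLY = 40       # worst-case window per apply


def daily_disruption_seconds(applies, seconds_per_apply=SECONDS_PER_APPLY):
    """Total seconds per day during which views are unreliable."""
    return applies * seconds_per_apply


low, high = (daily_disruption_seconds(n) for n in APPLIES_PER_DAY)
print(f"{low}-{high} s/day ({low / 60:.1f}-{high / 60:.1f} min)")
# prints "400-800 s/day (6.7-13.3 min)"
```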

Thanks for looking into this.

Willem
Nagios XI 5.8.1
https://outsideit.net
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Long apply configurations

Post by tmcdonald »

This would need to be more of a Core change than XI, but I think XI would need to be involved at some point as well, just not to the same degree.

I pinged our Core dev on this for his thoughts, will update the thread when I know more. That being said, I think this sort of functionality would be a great idea.
Former Nagios employee
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Long apply configurations

Post by tmcdonald »

From our dev:
"That might work. It would need some careful coding, but that might be the easiest way to do it."
A GitHub issue was suggested, and I can file that or you can, doesn't matter to me.

Bear in mind this would take a lot of re-architecting and testing, so it likely would not be done very soon.
Former Nagios employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Long apply configurations

Post by WillemDH »

Trevor,

I understand this would take time to implement. I'll make the GitHub issue.

https://github.com/NagiosEnterprises/na ... issues/176

Thanks

Willem
Nagios XI 5.8.1
https://outsideit.net
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Long apply configurations

Post by rkennedy »

Thanks Willem! I'll leave this thread open should further discussion happen in the future, or if you have anything to add.
Former Nagios Employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Long apply configurations

Post by WillemDH »

As requested by avandemore (https://support.nagios.com/forum/viewto ... 20#p204520):
"During these restarts, does the information show up in Core?"
Yes, the issue is also in Core. Just tested it.

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Long apply configurations

Post by avandemore »

This is different than the referenced thread if Core is exhibiting this behavior. Please post or PM your nagios.cfg.
Previous Nagios employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Long apply configurations

Post by WillemDH »

PM'd you my config.
Nagios XI 5.8.1
https://outsideit.net
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Long apply configurations

Post by avandemore »

Your configuration looks correct for Core to preserve state across a reboot. During an Apply Config, what is the output from:

Code:

# tail -F /usr/local/nagios/var/retention.dat
You can also PM this if necessary.
Previous Nagios employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Long apply configurations

Post by WillemDH »

Avandemore,

Well... I did as you asked. The output from the tail is super huge. It will be hard to even PM you this. Basically, from the moment I apply, it doesn't output anything for 11 seconds and then starts outputting like crazy for about 20 more seconds.

Example output:

Code:

hostdowntime {
host_name=servername
comment_id=2404807
downtime_id=374057
entry_time=1480831262
start_time=1481349600
flex_downtime_start=0
end_time=1481367600
triggered_by=0
fixed=1
duration=18000
is_in_effect=0
start_notification_sent=0
author=user
comment=AUTO: alfresco rebuild index
}
It's just too much content and full of sensitive information. If you absolutely want to see this data, I suggest we do a remote support session or so.
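One possible way to share something useful without the raw data would be to summarize the output by block type and redact the identifying fields first. The sketch below assumes the `type {` / `key=value` / `}` layout shown in the example above. The choice of which keys count as sensitive is an assumption, and the helper names are hypothetical.

```python
from collections import Counter

# Assumed set of fields to hide; adjust to local policy.
SENSITIVE_KEYS = {"host_name", "author", "comment"}


def parse_blocks(text):
    """Yield (block_type, fields) for each 'type { key=value ... }' block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            if line.endswith("{"):
                block_type, fields = line[:-1].strip(), {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line.partition("=")
            fields[key] = value


def redact(fields, sensitive=SENSITIVE_KEYS):
    """Return a copy of fields with sensitive values masked."""
    return {k: ("<redacted>" if k in sensitive else v) for k, v in fields.items()}


def summarize(text):
    """Count how many blocks of each type appear in the text."""
    return Counter(block_type for block_type, _ in parse_blocks(text))


sample = """hostdowntime {
host_name=servername
comment_id=2404807
downtime_id=374057
fixed=1
duration=18000
author=user
comment=AUTO: alfresco rebuild index
}
"""
blocks = [(t, redact(f)) for t, f in parse_blocks(sample)]
counts = summarize(sample)
```

The counts per block type alone may already show whether the burst after the 11-second pause is mostly downtime and comment entries.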

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
Locked