22k hosts, 134k services - 45 mins to apply config?
-
robinjporter
- Posts: 12
- Joined: Tue Jan 16, 2018 12:11 pm
22k hosts, 134k services - 45 mins to apply config?
Hi there
We have a client running Nagios XI who have a large install - 22,000 hosts and 134,000 service definitions. When they make a change and apply config, the GUI becomes unresponsive for around 45 minutes. We have made a large number of performance tweaks including offloading the database to a separate server, and implementing a RAMdisk.
Is this kind of delay normal at these numbers, and if not (it does seem excessive), where do we begin troubleshooting it? I will be able to get hands on with the server tomorrow so any pointers would be appreciated.
Many thanks
Robin
Re: 22k hosts, 134k services - 45 mins to apply config?
How many CPU cores and how much RAM does the host have?
I'm at 1700 hosts and 40,000 services and I thought I was large! It takes maybe a minute for the GUI to respond and about 5 minutes for the GUI to catch up to present time when I do an apply.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
-
robinjporter
- Posts: 12
- Joined: Tue Jan 16, 2018 12:11 pm
Re: 22k hosts, 134k services - 45 mins to apply config?
8 CPU cores and 16GB RAM at this time - though it's a VM so we can easily flex this if required.
What is odd is that we're seeing no real swamping of CPU, RAM or I/O - on either the XI or the database server. It just seems like a PHP thread is locked up for about 45 minutes and the GUI is unresponsive during this time - however the Nagios engine continues in the background (save the restart naturally) and checks continue as normal.
We tried upgrading to 5.8.0 as it has been released, but no change in behaviour has been seen.
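Since the symptom described here is a PHP thread that appears stuck rather than CPU, RAM or I/O exhaustion, one place to start is listing web-server or PHP processes that have been running unusually long during an apply. The following is only a minimal sketch: it assumes a Linux `ps` that supports the `etimes` (elapsed seconds) column and that the web processes are named `httpd`, `apache2` or `php*`. The helper name `long_running` and the 300-second threshold are illustrative, not part of Nagios XI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "PID ELAPSED_SECONDS COMMAND" lines on stdin and
# print those for web/PHP processes older than a threshold (default 300 s).
# The process-name pattern is an assumption; adjust it for your distribution.
long_running() {
  local min="${1:-300}"
  awk -v min="$min" '$2 + 0 > min && $3 ~ /^(httpd|apache2|php)/ { print }'
}

# Usage while an apply is in progress (procps "etimes" column assumed):
#   ps -e -o pid= -o etimes= -o comm= | long_running 300
```

Any process this reports can then be inspected further (for example with `strace -p PID`) to see what it is waiting on.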
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi Robin,
Typically, our recommendation is to add another XI instance at around 20,000 hosts and services, especially if you are still planning to add new devices to this server. It looks like the system does not have enough resources to handle a check load this large (total # of hosts and services).
All the objects in the nagiosql database are written out as configuration files that need to be verified, and this is likely the reason why it's taking a long time. This is an I/O intensive operation. However, for now, I would add more CPU and RAM. Once you have added more RAM, try increasing the resources available to PHP (beyond what may have been done so far).
Nagios XI - Optimizing The PHP Settings File
Also, do you know which databases have been offloaded to the remote server? If the nagiosql has been offloaded to the remote server, you may want to move that back locally.
Please PM the system profile, I can review the logs for any errors that could be impacting the server.
To send us your system profile:
- Log in to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" menu.
- Click the "Download Profile" button.
Best Regards,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
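To check how much of the delay comes from the verification step described in the reply above, the verification can be timed by hand outside the GUI. This is a sketch under stated assumptions: the default paths `/usr/local/nagios/bin/nagios` and `/usr/local/nagios/etc/nagios.cfg` are assumed from a standard install and may differ on your system, and `time_verify` is a hypothetical helper name. It relies only on the Core `-v` (verify configuration) option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time one Nagios Core config verification run (nagios -v).
# Both default paths are assumptions; pass your own as arguments if they differ.
time_verify() {
  local bin="${1:-/usr/local/nagios/bin/nagios}"
  local cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
  local start end rc
  start=$(date +%s)
  "$bin" -v "$cfg" >/dev/null 2>&1
  rc=$?
  end=$(date +%s)
  # Report the verifier's exit status and the wall-clock seconds it took.
  echo "verify exit=${rc} seconds=$((end - start))"
  return "$rc"
}

# Usage (run on the XI server):
#   time_verify
```

If verification finishes in a minute or two while the apply still takes 45 minutes, the time is more likely being spent in the export from the database or in the Core restart than in verification itself.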
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi,
It now takes about 35 minutes instead of 45 before we start seeing host and service check results in the GUI.
We have changed the following default settings in php.ini
Code:
[root@myhost ~]# cat /etc/php.ini | grep max_input_vars
#max_input_vars = 5000
max_input_vars = 50000
[root@myhost ~]# cat /etc/php.ini | grep memory_limit
#memory_limit = 256M
memory_limit = 4096M
[root@myhost ~]# cat /etc/php.ini | grep max_execution_time
#max_execution_time = 60
max_execution_time = 120
[root@myhost ~]# cat /etc/php.ini | grep max_input_time
; max_input_time
#max_input_time = 120
nagios, nagiosql & nagiosxi databases were offloaded to a separate DB server.
We will try to move the nagiosql db back to the nagiosxi server.
I will send you the profile in a pm
Regards
Dennis
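The `grep` output above mixes commented-out and active lines, which makes it easy to misread which value is in effect (for `max_input_time`, no active line is shown at all). A small sketch that prints only the last active assignment of a directive is given below. `ini_active` is a hypothetical helper name, and `/etc/php.ini` is taken from the post. Note that the web server's PHP may load additional ini files, so this only reflects the one file it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last active (uncommented) assignment of a
# php.ini directive. Lines beginning with ';' or '#' never match the pattern.
# Returns non-zero when the file has no active line for the directive.
ini_active() {
  local key="$1" file="${2:-/etc/php.ini}" line
  line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1) || true
  [ -n "$line" ] && echo "$line"
}

# Usage:
#   for k in max_input_vars memory_limit max_execution_time max_input_time; do
#     ini_active "$k" || echo "$k: no active line in /etc/php.ini"
#   done
```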
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi Dennis,
Sounds good. Please update the thread once you PM the profile to bring it up in the support queue.
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi,
I sent you the profile yesterday.
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi Dennis,
Unfortunately, I am not seeing a profile in my inbox from your account. Can you re-send it?
Thanks,
Benjamin
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi Benjamin,
I've tried to send it again.
Please confirm whether you have received it.
Thanks,
Dennis
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: 22k hosts, 134k services - 45 mins to apply config?
Hi Dennis,
Got the profile, thanks for re-sending this. The check latencies are high but you have the check interval set high which is very good (any increase you can make here will really help). Overall, the CPU load isn't that high considering the size of this system.
We did make a few critical performance improvements to ndo3, and I would recommend bringing this system up to 5.8.1, since it's much better than 5.7.5. This will help reduce the start and stop times when Core restarts during Apply Configuration. From the changelog:
NDO - 3.0.5
- Drastically reduced startup time for some systems
- Fixed occasional long shutdown times in Nagios Core
- Fixed segmentation faults related to severed MySQL connections
- Fixed issue with service display_name being set to the service description
Be sure to take a backup or VM snapshot before running the upgrade.
Not sure if you have done this yet, but increasing the dashlet refresh rate and disabling subsystem logging will help as well. This is covered on pages 5-7 of the following guide.
Maximizing Performance In Nagios XI
Let me know if the times go down after upgrading and adjusting the performance settings in the Admin area.
Regards
Benjamin