22k hosts, 134k services - 45 mins to apply config?

robinjporter
Posts: 12
Joined: Tue Jan 16, 2018 12:11 pm

22k hosts, 134k services - 45 mins to apply config?

Post by robinjporter »

Hi there

We have a client running Nagios XI who have a large install - 22,000 hosts and 134,000 service definitions. When they make a change and apply config, the GUI becomes unresponsive for around 45 minutes. We have made a large number of performance tweaks including offloading the database to a separate server, and implementing a RAMdisk.

Is this kind of delay normal at these numbers, and if not (it does seem excessive), where do we begin troubleshooting it? I will be able to get hands-on with the server tomorrow, so any pointers would be appreciated.

Many thanks

Robin
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by BanditBBS »

How many CPU cores and how much RAM does the host have?

I'm at 1700 hosts and 40,000 services and I thought I was large! It takes maybe a minute for the GUI to respond and about 5 minutes for the GUI to catch up to the present time when I do an apply.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
robinjporter
Posts: 12
Joined: Tue Jan 16, 2018 12:11 pm

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by robinjporter »

8 CPU cores and 16GB RAM at this time, though it's a VM so we can easily flex this if required.

What is odd is that we're seeing no real swamping of CPU, RAM or I/O - on either the XI or the database server. It just seems like a PHP thread is locked up for about 45 minutes and the GUI is unresponsive during this time - however the Nagios engine continues in the background (save the restart naturally) and checks continue as normal.

We tried upgrading to 5.8.0 as it has been released, but no change in behaviour has been seen.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by benjaminsmith »

Hi Robin,

Typically, our recommendation is to add another XI instance at around 20,000 hosts and services, especially if you are still planning to add new devices to this server. It looks like the system does not have enough resources to handle a check load this large (total number of hosts and services).

All the objects in the nagiosql database are written out as configuration files that need to be verified, and this is likely the reason why it's taking a long time. This is an I/O-intensive operation. However, for now, I would add more CPU and RAM. Once you have added more RAM, try increasing the resources available to PHP (beyond what may have been done so far).

Nagios XI - Optimizing The PHP Settings File
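
To see how much of the apply time the configuration verification accounts for, the check can be timed on its own. This is only a sketch: `elapsed_seconds` is a hypothetical helper, and the binary and config paths are assumed defaults that may differ on a given install. `nagios -v` is the standard Nagios Core pre-flight check.

```shell
# Hypothetical helper: run a command, discard its output,
# and print the elapsed wall-clock time in whole seconds.
elapsed_seconds() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# Assumed default locations; override via the environment if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Only attempt the timing when the binary is actually present.
if [ -x "$NAGIOS_BIN" ]; then
  echo "config verification took $(elapsed_seconds "$NAGIOS_BIN" -v "$NAGIOS_CFG")s"
fi
```

If verification alone finishes quickly, that would suggest the remaining delay sits elsewhere, for example in writing out the configuration files or in the Core restart.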

Also, do you know which databases have been offloaded to the remote server? If the nagiosql database has been offloaded to the remote server, you may want to move it back to the local server.

Please PM the system profile so I can review the logs for any errors that could be impacting the server.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.

Best Regards,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
DennisPR
Posts: 149
Joined: Mon May 07, 2012 10:34 am

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by DennisPR »

Hi,

We have changed the following default settings in php.ini


[root@myhost ~]# cat /etc/php.ini | grep max_input_vars
#max_input_vars = 5000
max_input_vars = 50000
[root@myhost  ~]# cat /etc/php.ini | grep memory_limit
#memory_limit = 256M
memory_limit = 4096M
[root@myhost  ~]# cat /etc/php.ini | grep max_execution_time
#max_execution_time = 60
max_execution_time = 120
[root@myhost ~]# cat /etc/php.ini | grep max_input_time
; max_input_time
#max_input_time = 120
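
Because a plain grep also matches commented-out lines, it can help to print only the active value of each directive. A minimal sketch, assuming the same `/etc/php.ini` path as above; `php_ini_value` is a hypothetical helper, and it treats both `;` and `#` lines as comments to match the convention used in this file.

```shell
# Hypothetical helper: print the active (uncommented) value of a php.ini directive.
# $1 = directive name, $2 = path to php.ini
php_ini_value() {
  awk -v key="$1" '
    /^[ \t]*[;#]/ { next }                 # skip commented-out lines
    {
      pos = index($0, "=")                 # split on the first "=" only
      if (pos == 0) next
      name  = substr($0, 1, pos - 1)
      value = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) found = value       # the last occurrence wins
    }
    END { if (found != "") print found }
  ' "$2"
}

# Assumed path; override via the environment if yours differs.
PHP_INI=${PHP_INI:-/etc/php.ini}
if [ -r "$PHP_INI" ]; then
  for d in max_input_vars memory_limit max_execution_time max_input_time; do
    printf '%s = %s\n' "$d" "$(php_ini_value "$d" "$PHP_INI")"
  done
fi
```

With the lines shown above, `max_input_time` comes back empty, since both matching lines are commented out and PHP would fall back to its built-in default for that directive.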
It now takes roughly 35 minutes instead of 45 before we start seeing host and service check results in the GUI.
The nagios, nagiosql and nagiosxi databases were offloaded to a separate DB server.
We will try to move the nagiosql database back to the Nagios XI server.
I will send you the profile in a PM.

Regards

Dennis
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by benjaminsmith »

Hi Dennis,
DennisPR wrote: "I will send you the profile in a pm"
Sounds good. Please update the thread once you PM the profile to bring it up in the support queue.
DennisPR
Posts: 149
Joined: Mon May 07, 2012 10:34 am

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by DennisPR »

Hi,

I sent you the profile yesterday.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by benjaminsmith »

Hi Dennis,

Unfortunately, I am not seeing a profile in my inbox from your account. Can you send it once more?

Thanks,
Benjamin
DennisPR
Posts: 149
Joined: Mon May 07, 2012 10:34 am

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by DennisPR »

Hi Benjamin,

I've tried to send it again.
Please confirm whether you have received it.

Thanks,

Dennis
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: 22k hosts, 134k services - 45 mins to apply config?

Post by benjaminsmith »

Hi Dennis,

Got the profile, thanks for re-sending this. The check latencies are high but you have the check interval set high which is very good (any increase you can make here will really help). Overall, the CPU load isn't that high considering the size of this system.

We did make a few critical performance improvements to ndo3, and I would recommend bringing this system up to 5.8.1 since it's much better than 5.7.5. This will help reduce the start and stop times when Core is restarted during apply configuration. From the changelog:
NDO - 3.0.5
Drastically reduced startup time for some systems
Fixed occasional long shutdown times in Nagios Core
Fixed segmentation faults related to severed MySQL connections
Fixed issue with service display_name being set to the service description
Be sure to take a backup or VM snapshot before running the upgrade.

Not sure if you have done this yet, but increasing the dashlet refresh rate and disabling the subsystem logging will help as well. Both are covered on pages 5-7 of the following guide.

Maximizing Performance In Nagios XI

Let me know if the times go down after upgrading and adjusting the performance settings in the Admin area.

Regards
Benjamin