22k hosts, 134k services - 45 mins to apply config?

Posted: Wed Jan 13, 2021 10:31 am
by robinjporter
Hi there

We have a client running Nagios XI who have a large install - 22,000 hosts and 134,000 service definitions. When they make a change and apply config, the GUI becomes unresponsive for around 45 minutes. We have made a large number of performance tweaks including offloading the database to a separate server, and implementing a RAMdisk.

Is this kind of delay normal at these numbers, and if not (it does seem excessive), where do we begin troubleshooting it? I will be able to get hands on with the server tomorrow so any pointers would be appreciated.

Many thanks

Robin

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Wed Jan 13, 2021 4:19 pm
by BanditBBS
How many CPU cores and how much RAM does the host have?


I'm at 1,700 hosts and 40,000 services and I thought I was large! It takes maybe a minute for the GUI to respond and about 5 minutes for the GUI to catch up to present time when I do an apply.

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Thu Jan 14, 2021 8:01 am
by robinjporter
8 CPU cores and 16GB RAM at this time - though it's a VM so we can easily flex this if required.

What is odd is that we're seeing no real swamping of CPU, RAM or I/O - on either the XI or the database server. It just seems like a PHP thread is locked up for about 45 minutes and the GUI is unresponsive during this time - however the Nagios engine continues in the background (save the restart naturally) and checks continue as normal.

We tried upgrading to 5.8.0 as it has been released, but no change in behaviour has been seen.

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Thu Jan 14, 2021 3:25 pm
by benjaminsmith
Hi Robin,

Typically, our recommendation is to add another XI instance at around 20,000 hosts and services, especially if you are still planning to add new devices to this server. It looks like the system does not have enough resources to handle a check load this large (total number of hosts and services).

All the objects in the nagiosql database are written out as configuration files that need to be verified, and this is likely the reason why it's taking a long time. This is an I/O intensive operation. However, for now, I would add more CPU and RAM. Once you have added more RAM, try increasing the resources available to PHP (beyond what may have been done so far).

Nagios XI - Optimizing The PHP Settings File
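
To see how much of the delay comes from verifying the written-out configuration, as opposed to the GUI itself, the verification can be timed from a shell. This is only a sketch: `elapsed_seconds` is a hypothetical helper, not something shipped with Nagios XI, and it reports whole seconds only.

```shell
#!/bin/sh
# elapsed_seconds (hypothetical helper): run a command, discard its output,
# and print how many whole seconds it took to finish.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" >/dev/null 2>&1
    es_end=$(date +%s)
    echo $((es_end - es_start))
}
```

For example, `elapsed_seconds /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg` would time a standalone verification, assuming the default install paths (adjust them if your install differs). If that finishes in a few minutes, the rest of the 45 minutes is probably being spent elsewhere in the apply process.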

Also, do you know which databases have been offloaded to the remote server? If the nagiosql database has been offloaded to the remote server, you may want to move it back locally.

Please PM the system profile so I can review the logs for any errors that could be impacting the server.

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.

Best Regards,
Benjamin

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Thu Jan 21, 2021 9:34 am
by DennisPR
Hi,

We have changed the following default settings in php.ini

[root@myhost ~]# cat /etc/php.ini | grep max_input_vars
#max_input_vars = 5000
max_input_vars = 50000
[root@myhost  ~]# cat /etc/php.ini | grep memory_limit
#memory_limit = 256M
memory_limit = 4096M
[root@myhost  ~]# cat /etc/php.ini | grep max_execution_time
#max_execution_time = 60
max_execution_time = 120
[root@myhost ~]# cat /etc/php.ini | grep max_input_time
; max_input_time
#max_input_time = 120

It now takes roughly 35 minutes instead of 45 before we start seeing host and service check results in the GUI.
The nagios, nagiosql and nagiosxi databases were offloaded to a separate DB server.
We will try moving the nagiosql database back to the Nagios XI server.
I will send you the profile in a pm
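
Since `grep` also matches the commented-out lines, a small filter can make it easier to confirm which value is actually active. A minimal sketch, assuming a hypothetical helper named `ini_value`; it treats lines starting with `;` or `#` as commented out, as in the output above, and does not handle inline comments after a value.

```shell
#!/bin/sh
# ini_value (hypothetical helper, not part of PHP or Nagios XI):
# print the last uncommented value of a directive in an INI-style file.
#   $1 = directive name, $2 = path to the ini file
ini_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[;#]/ { next }              # skip commented-out lines
        {
            name = $1
            gsub(/[ \t]/, "", name)         # strip whitespace around the name
            if (name == key) {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val                 # a later line overrides an earlier one
            }
        }
        END { if (found != "") print found }
    ' "$2"
}
```

`ini_value memory_limit /etc/php.ini` would print `4096M` for the file shown above, and nothing for `max_input_time`, since only commented-out lines exist for it. Keep in mind that the web server (or PHP-FPM, if used) has to be restarted before changes to php.ini take effect.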

Regards

Dennis

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Thu Jan 21, 2021 5:34 pm
by benjaminsmith
Hi Dennis,
"I will send you the profile in a pm"
Sounds good. Please update the thread once you PM the profile to bring it up in the support queue.

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Fri Jan 22, 2021 2:52 am
by DennisPR
Hi,

I sent you the profile yesterday.

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Fri Jan 22, 2021 2:00 pm
by benjaminsmith
Hi Dennis,

Unfortunately, I am not seeing a profile in my inbox from your account. Can you send it once more?

Thanks,
Benjamin

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Mon Jan 25, 2021 4:05 am
by DennisPR
Hi Benjamin,

I've tried to send it again.
Please confirm whether you have received it.

Thanks,

Dennis

Re: 22k hosts, 134k services - 45 mins to apply config?

Posted: Tue Jan 26, 2021 9:44 am
by benjaminsmith
Hi Dennis,

Got the profile, thanks for re-sending this. The check latencies are high but you have the check interval set high which is very good (any increase you can make here will really help). Overall, the CPU load isn't that high considering the size of this system.
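
To put the effect of the check interval in numbers: the figures from the first post add up to 156,000 objects. A rough sketch of the average check rate, assuming every host and service is actively checked once per interval (the actual interval on this system is not stated in the thread, so the 5 and 10 minutes below are only examples):

```shell
#!/bin/sh
# checks_per_second is a hypothetical helper for back-of-the-envelope sizing.
checks_per_second() {
    # $1 = number of actively checked objects, $2 = check interval in seconds
    echo $(( $1 / $2 ))
}

total=$((22000 + 134000))        # hosts + services from the first post
checks_per_second "$total" 300   # 5-minute interval  -> 520
checks_per_second "$total" 600   # 10-minute interval -> 260
```

Doubling the interval halves the average number of checks the engine has to schedule and process each second.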

We did make a few critical performance improvements to ndo3, and I would recommend bringing this system up to 5.8.1 since it's much better than 5.7.5. This will help reduce the startup and shutdown times when Core is restarted during Apply Configuration. From the changelog:
NDO - 3.0.5
Drastically reduced startup time for some systems
Fixed occasional long shutdown times in Nagios Core
Fixed segmentation faults related to severed MySQL connections
Fixed issue with service display_name being set to the service description
Be sure to take a backup or VM snapshot before running the upgrade.

Not sure if you have done this yet, but increasing the dashlet refresh rate and disabling the subsystem logging will help as well. This is mentioned on pages 5-7 of the following guide.

Maximizing Performance In Nagios XI

Let me know if the times go down after upgrading and adjusting the performance settings in the Admin area.

Regards
Benjamin