Fusion performance

This support forum board is for questions relating to Nagios Fusion.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Fusion performance

Post by crrussell3 »

We are working on a huge deployment of Nagios and have started rolling things into Nagios Fusion to provide a single pane of glass. In total, we should have the following statistics:

Nagios Core hosts: ~300
- # of checks: >30,000
Nagios XI hosts: 3-4
- # of checks: >10,000
Nagios Fusion hosts: 1

Right now we only have ~10,000 checks in Nagios Fusion with around 200 Nagios Core servers checking in. We are seeing performance issues with navigating/refreshing/reloading the pages.

Nagios Fusion Settings
Polling Subsystem Memory Limit: -1
Simultaneous Pollers: 250
Live Data Timeout: 45 seconds
Polling Lock Max Age: 1200 seconds
Poll Record Count: 1000 Records

Global Authentication Interval: 300 seconds
Global Polling Interval: 300 seconds
All servers are using the default 300 seconds

The VM has 8 vCPUs and 36 GB of memory. Average CPU utilization is around 50%, and more than 6 GB of memory is still available.

Is there anything we can do to improve the page response for the Fusion server?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Fusion performance

Post by tgriep »

There are some settings you can change in the /etc/php.ini file to help improve the response of the web interface.

Edit the /etc/php.ini file and change the following from

max_execution_time = 30
max_input_time = 60
memory_limit = 128M
to

max_execution_time = 120
max_input_time = 240
memory_limit = 1024M
Add this to the bottom of that file, or if it already exists, change it to 5000

max_input_vars=5000
Save the file and restart apache by running

service httpd restart

Then I would increase the following setting in the Fusion GUI. A longer interval gives Fusion more time to retrieve the data from all of the servers so it can display all of the checks.
Global Polling Interval: 300 seconds

Try adding 30 seconds at a time to see if it helps out.
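
To confirm that the php.ini edits above actually took effect, a small script can compare the file against the suggested values. This is only a sketch: the function names are made up for illustration, it assumes `memory_limit` is written with a K, M, or G suffix, and it reads plain `key = value` lines rather than using PHP itself.

```python
import re

# Values suggested in the reply above; memory sizes are in megabytes.
RECOMMENDED = {
    "max_execution_time": 120,
    "max_input_time": 240,
    "memory_limit": 1024,
    "max_input_vars": 5000,
}

def parse_php_ini(text):
    """Return {directive: raw value} for uncommented 'key = value' lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith((";", "#", "[")):
            continue
        match = re.match(r"([A-Za-z0-9_.]+)\s*=\s*(.*)", line)
        if match:
            # Drop any trailing ';' comment on the same line.
            settings[match.group(1)] = match.group(2).split(";", 1)[0].strip()
    return settings

def to_number(raw):
    """Convert '128M' to 128, '1G' to 1024, and '30' to 30."""
    value = raw.strip().strip('"').upper()
    if value.endswith("G"):
        return int(value[:-1]) * 1024
    if value.endswith("M"):
        return int(value[:-1])
    if value.endswith("K"):
        return int(value[:-1]) // 1024
    return int(value)

def below_recommended(text, recommended=RECOMMENDED):
    """List (name, current raw value, suggested value) for low or missing directives."""
    current = parse_php_ini(text)
    findings = []
    for name, wanted in recommended.items():
        raw = current.get(name)
        if raw is None:
            findings.append((name, None, wanted))
            continue
        have = to_number(raw)
        # -1 means "unlimited" for these directives, so it is never too low.
        if have != -1 and have < wanted:
            findings.append((name, raw, wanted))
    return findings
```

Calling `below_recommended(open("/etc/php.ini").read())` should return an empty list once all four values are in place. Apache still has to be restarted for PHP to pick up the changes.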
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

Thanks. I will get these implemented tomorrow and see what kind of performance increase this gives us.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Fusion performance

Post by dwhitfield »

In addition to the server improvements mentioned above, on the desktop side, if you've got the available RAM, you could create a tmpfs (Windows has it too, but not sure what they call it) and put your browser cache in that.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

Changed the php.ini settings as listed last night. No performance change.

I am starting to change the polling interval, letting each change marinate for 5-10 minutes before changing it again. The problem is that they would rather have the interval go lower instead of higher, as we want to treat Nagios Fusion as a NOC display board, so to speak.

Our Systems Architect did go into the Fusion box and increase the MySQL threads/connections up to 1,000. He can see the mysql process getting hammered whenever the system is slow: its CPU goes from 2% to 200%, even though we still have 70% system idle CPU.
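
The 200% and 70% idle figures are not necessarily in conflict. Assuming the per-process number comes from a top-style display where 100% equals one full core (the post does not say which tool was used), it can be converted to a share of the whole 8 vCPU machine. The function name here is just for illustration.

```python
def share_of_host_cpu(process_cpu_percent, vcpus):
    """Convert a per-core CPU percentage into a percentage of the whole host."""
    return process_cpu_percent / vcpus

# 200% on an 8 vCPU VM is two cores fully busy, i.e. a quarter of the host.
print(share_of_host_cpu(200, 8))  # 25.0
```

Under that assumption, mysql at 200% uses about 25% of the VM, which is consistent with roughly 70% of the CPU still being idle.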

Any other thoughts?
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

FYI - We just lost all of our Fused servers to an issue we had before where the password isn't getting decrypted properly. I am going to save the server vmdx files and restore from last night and hopefully we don't have a repeat of the issue.
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

We have been discussing this internally this morning and want to know whether we are pushing a single instance of Fusion too far with what we are throwing at it.

As I mentioned before:
- ~300x core servers
- Each core having 60-70 checks to start. This can possibly go 3x that amount once finished.
- Fusion polls every 5 minutes (prefer less)
- 2-3 XI servers, 10k+ checks once finished.

Is there anything more we can do to improve MySQL performance? Can we move it to its own server and still be supported?
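
For a rough sense of scale, the figures listed in this post can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes every check is one record that Fusion has to pull once per polling interval, which may not match how Fusion actually batches its polling, and the function names are made up for illustration.

```python
def total_checks(core_servers, checks_per_core, xi_checks):
    """Total checks across all Core servers plus the XI servers."""
    return core_servers * checks_per_core + xi_checks

def records_per_second(total, polling_interval_seconds):
    """Average records Fusion must ingest per second to keep up."""
    return total / polling_interval_seconds

start_low = total_checks(300, 60, 10_000)        # 28,000 checks
start_high = total_checks(300, 70, 10_000)       # 31,000 checks
grown_high = total_checks(300, 3 * 70, 10_000)   # 73,000 checks

# At the default 300 second polling interval:
print(round(records_per_second(start_low, 300), 1))   # 93.3
print(round(records_per_second(grown_high, 300), 1))  # 243.3
```

Halving the polling interval doubles the required rate, so a shorter interval for a NOC display pulls in the opposite direction from the growth in checks.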
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

We have successfully restored the server to last night and everything seems good.

We have also dug around and changed the following:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# SPEED ROUND - WMCLEAN
max_connections = 2000
skip-name-resolve
slow-query-log = 1
slow-query-log-file = /var/lib/mysql/mysql-slow.log
long_query_time = 1
wait_timeout=60

# Added by Nagios 2017-10-05 18:10:50
max_allowed_packet = 32M

# Added by Nagios 2017-10-05 18:10:50 was 16M
query_cache_size = 128M

# Added by Nagios 2017-10-05 18:10:50 was 8M
query_cache_limit = 64M

# Added by Nagios 2017-10-05 18:10:50 was 64M
tmp_table_size = 1024M

# Added by Nagios 2017-10-05 18:10:50 was 64M
max_heap_table_size = 1024M

# Added by Nagios 2017-10-05 18:10:50 was 32M
key_buffer_size = 512M

# Added by Nagios 2017-10-05 18:10:50 was 32
table_open_cache = 64

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

After a restart of the box or of MySQL, the pages are slow to load at first, but are then cached and load in under 3 seconds.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Fusion performance

Post by dwhitfield »

I spoke to the head Fusion dev and he thinks with those numbers you probably need a polling interval of 20 minutes. I suppose you can extrapolate from there how many fusion servers you might need.
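
One way to do that extrapolation, as a sketch only: assume the load splits evenly across Fusion servers and that the workable polling interval shrinks in proportion to the load each server carries. Neither assumption is confirmed by the dev, and the function name is made up for illustration.

```python
import math

def fusion_servers_needed(workable_interval_minutes, desired_interval_minutes):
    """Servers needed if each one takes an equal share of the polled servers."""
    return math.ceil(workable_interval_minutes / desired_interval_minutes)

# One server needs about 20 minutes; the current setting is 5 minutes.
print(fusion_servers_needed(20, 5))  # 4
# A tighter 2 minute interval for a NOC display:
print(fusion_servers_needed(20, 2))  # 10
```

Real scaling is unlikely to be perfectly linear, so this gives a rough floor on the server count.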

Also, make sure to update the default polled data retention in constants.inc.php to 0.1 (this will keep your disk from filling up).

The above said, what version of MySQL are you running and on what OS?

If you are on CentOS/RHEL 6, I would suggest moving to 7. This is a big move, but MariaDB is known to have better performance than the community edition of MySQL. I haven't looked at benchmarks comparing MariaDB with Oracle's proprietary MySQL enhancements.

If you are on MySQL 5.1, moving to a later version of MySQL should get you a performance boost. We don't have official, supported instructions for the move. I have recently tested moving from MySQL 5.1 to 5.5 on XI. I can recreate this test with Fusion if this is something that would be of interest to you.


Did you look into the browser cache suggestion?
crrussell3
Posts: 31
Joined: Tue Oct 10, 2017 9:09 am

Re: Fusion performance

Post by crrussell3 »

With the changes made to the MySQL conf file that I posted earlier, we are running faster but still see slowness. Typically, once you are at the Dashboard, you are fine.

We will have to look into building a fresh Fusion box, as we deployed via the OVA. So whatever the OVA has configured is what we are running.

As for putting the browser cache on a Windows RAM drive, we haven't looked into it. The problem is that it would need to be configured on ~40-50 profiles.