Fusion performance
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Fusion performance
We are working on a huge deployment of Nagios and have started rolling things into Nagios Fusion to provide a single pane of glass. In total, we should have the following statistics:
Nagios Core Hosts: ~300
- # of checks: >30,000
Nagios XI Hosts: 3-4
- # of checks: >10,000
Nagios Fusion Hosts: 1
Right now we only have ~10,000 checks in Nagios Fusion with around 200 Nagios Core servers checking in. We are seeing performance issues with navigating/refreshing/reloading the pages.
Nagios Fusion Settings
Polling Subsystem Memory Limit: -1
Simultaneous Pollers: 250
Live Data Timeout: 45 seconds
Polling Lock Max Age: 1200 seconds
Poll Record Count: 1000 Records
Global Authentication Interval: 300 seconds
Global Polling Interval: 300 seconds
All servers are using the default 300 seconds
The VM has 8 vCPUs and 36 GB of memory. Average CPU utilization is around 50%, and more than 6 GB of memory is still available.
Is there anything we can do to improve the page response for the Fusion server?
Re: Fusion performance
There are some settings you can change in the /etc/php.ini file to help improve the response of the web interface.
Edit the /etc/php.ini file and change the following from
Code: Select all
max_execution_time = 30
max_input_time = 60
memory_limit = 128M
to
Code: Select all
max_execution_time = 120
max_input_time = 240
memory_limit = 1024M
Add this to the bottom of that file, or if it already exists, change it to 5000:
Code: Select all
max_input_vars=5000
Save the file and restart Apache by running
Code: Select all
service httpd restart
Then, I would increase this setting in the Fusion GUI to see whether a longer interval gives Fusion more time to retrieve the data from all of the servers so that it can display all of the checks.
Global Polling Interval: 300 seconds
Try adding 30 seconds at a time to see if it helps out.
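To confirm the php.ini edits took effect, the values can be read back from the file. This is a minimal sketch in Python, assuming the file lives at /etc/php.ini as described above. The `RECOMMENDED` table and the helpers `to_number`, `parse_ini_values`, and `check` are made up for this example, and the `sample` text stands in for the real file.

```python
import re

# Targets from the post above; sizes use php.ini shorthand (M = megabytes).
RECOMMENDED = {
    "max_execution_time": "120",
    "max_input_time": "240",
    "memory_limit": "1024M",
    "max_input_vars": "5000",
}

def to_number(value):
    """Convert php.ini shorthand such as 128M or 1G into a plain integer.

    Special values such as -1 or 0 (no limit) are not handled here.
    """
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    value = value.strip()
    if value and value[-1].upper() in units:
        return int(value[:-1]) * units[value[-1].upper()]
    return int(value)

def parse_ini_values(text):
    """Return {name: value} for uncommented 'name = value' lines."""
    found = {}
    for line in text.splitlines():
        # Lines starting with ';' are comments and do not match.
        match = re.match(r"^\s*([A-Za-z_.]+)\s*=\s*([^;\s]+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return found

def check(text):
    """List settings that are missing or below the recommended value."""
    current = parse_ini_values(text)
    problems = []
    for name, wanted in RECOMMENDED.items():
        have = current.get(name)
        if have is None or to_number(have) < to_number(wanted):
            problems.append((name, have, wanted))
    return problems

# Hypothetical file contents, used here instead of reading /etc/php.ini.
sample = """
max_execution_time = 30
max_input_time = 240
memory_limit = 1024M
; max_input_vars = 1000
"""

for name, have, wanted in check(sample):
    print(f"{name}: have {have}, want at least {wanted}")
```

On the server, reading the real file with `open("/etc/php.ini").read()` would replace `sample`. The web server may load additional .ini files, so the output of `phpinfo()` is still the more reliable view of what Apache is actually using.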
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
Thanks. I will get these implemented tomorrow and see what kind of performance increase this gives us.
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Fusion performance
In addition to the server improvements mentioned above, on the desktop side, if you've got the available RAM, you could create a tmpfs (Windows has it too, but not sure what they call it) and put your browser cache in that.
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
Changed the php.ini settings as listed last night. No performance change.
I am starting to change the polling interval and letting that marinate for 5-10 minutes before changing it again. The problem with this is that they would rather have it go lower instead of higher, as we want to treat Nagios Fusion as a NOC display board, so to speak.
Our Systems Architect did go into the Fusion box and increase mysql threads/connections up to 1,000. He can see that particular process getting hammered whenever the system is slow. CPU goes from 2% to 200%, even though we still have 70% system idle CPU.
Any other thoughts?
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
FYI - We just lost all of our Fused servers to an issue we had before where the password isn't getting decrypted properly. I am going to save the server vmdx files and restore from last night and hopefully we don't have a repeat of the issue.
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
We have been discussing this internally this morning, and want to know whether we are pushing a single instance of Fusion too far with what we are throwing at it.
As I mentioned before:
- ~300x core servers
- Each core having 60-70 checks to start. This can possibly go 3x that amount once finished.
- Fusion polls every 5 minutes (prefer less)
- 2-3 XI servers, 10k+ checks once finished.
Is there anything more we can do to improve mysql performance? Can we separate it out to its own server and be supported?
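For a rough sense of scale, the figures above can be turned into a records-per-second estimate. This is a back-of-the-envelope sketch in Python that uses only the numbers from this post. The assumption that Fusion handles one record per check per polling cycle is a guess and is not confirmed anywhere in this thread; `total_checks` is a helper made up for the example.

```python
# Figures quoted in the post; everything else is derived arithmetic.
core_servers = 300
checks_per_core = (60, 70)   # low and high estimate per Core server
growth_factor = 3            # "can possibly go 3x that amount"
xi_checks = 10_000           # "10k+" across the XI servers
poll_interval_s = 300        # Fusion polls every 5 minutes

def total_checks(per_core, factor=1):
    """Core checks (optionally scaled by the growth factor) plus the XI checks."""
    return core_servers * per_core * factor + xi_checks

for label, factor in (("initial per-Core count", 1), ("3x per-Core count", growth_factor)):
    low = total_checks(checks_per_core[0], factor)
    high = total_checks(checks_per_core[1], factor)
    # Assumption: one record per check per polling cycle.
    rate_low = low / poll_interval_s
    rate_high = high / poll_interval_s
    print(f"{label}: {low:,}-{high:,} checks, "
          f"about {rate_low:.0f}-{rate_high:.0f} records/s at a 5 minute poll")
```

Under these assumptions the totals come to 28,000-31,000 checks at the initial per-Core count and 64,000-73,000 at three times that, so shortening the polling interval raises the sustained database load proportionally.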
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
We have successfully restored the server to last night and everything seems good.
We have also dug around and changed the following:
Code: Select all
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# SPEED ROUND - WMCLEAN
max_connections = 2000
skip-name-resolve
slow-query-log = 1
slow-query-log-file = /var/lib/mysql/mysql-slow.log
long_query_time = 1
wait_timeout=60
# Added by Nagios 2017-10-05 18:10:50
max_allowed_packet = 32M
# Added by Nagios 2017-10-05 18:10:50 was 16M
query_cache_size = 128M
# Added by Nagios 2017-10-05 18:10:50 was 8M
query_cache_limit = 64M
# Added by Nagios 2017-10-05 18:10:50 was 64M
tmp_table_size = 1024M
# Added by Nagios 2017-10-05 18:10:50 was 64M
max_heap_table_size = 1024M
# Added by Nagios 2017-10-05 18:10:50 was 32M
key_buffer_size = 512M
# Added by Nagios 2017-10-05 18:10:50 was 32
table_open_cache = 64
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
After a restart of the box or mysql, the pages are slow to load at first, but once cached they load in under 3 seconds.
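One thing worth sanity-checking in a config like this is how the per-connection limits multiply out. The Python sketch below gives only a theoretical upper bound: it assumes every allowed connection builds one in-memory temporary table at the maximum size at the same moment, which is a worst case and not expected behavior. The helper `to_mib` and the `settings` dictionary are made up for this example, with values copied from the config above.

```python
def to_mib(value):
    """Convert a my.cnf size such as 512M or 1G to MiB (plain numbers are bytes)."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value) / (1024 * 1024)

# Values copied from the [mysqld] section above.
settings = {
    "max_connections": 2000,        # connection limit from the config
    "tmp_table_size": "1024M",
    "max_heap_table_size": "1024M",
    "key_buffer_size": "512M",
    "query_cache_size": "128M",
}

# In-memory temporary tables are capped by the smaller of these two limits.
per_table_mib = min(to_mib(settings["tmp_table_size"]),
                    to_mib(settings["max_heap_table_size"]))

# Buffers shared by the whole server.
global_mib = to_mib(settings["key_buffer_size"]) + to_mib(settings["query_cache_size"])

# Worst case, assuming one full-size temporary table per connection.
ceiling_mib = global_mib + settings["max_connections"] * per_table_mib

print(f"shared buffers: {global_mib:.0f} MiB")
print(f"theoretical ceiling: {ceiling_mib / 1024:.0f} GiB")
```

The ceiling is far above the 36 GB in the VM, so what matters in practice is how many connections run temp-table-heavy queries at the same time. The slow query log enabled in the config is one place to look for those.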
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Fusion performance
I spoke to the head Fusion dev and he thinks with those numbers you probably need a polling interval of 20 minutes. I suppose you can extrapolate from there how many fusion servers you might need.
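As a rough illustration of that extrapolation, here is a short Python sketch. It assumes polling time scales linearly with the number of fused servers and that the servers can be split evenly across Fusion instances; both are assumptions made for the example, not something the dev stated. The function name `fusion_instances_needed` is hypothetical.

```python
import math

def fusion_instances_needed(single_instance_interval_min, target_interval_min):
    """Instances needed if load divides evenly and scales linearly (assumed)."""
    return math.ceil(single_instance_interval_min / target_interval_min)

# 20 minutes is the dev's estimate for one instance; 5 minutes is the current setting.
print(fusion_instances_needed(20, 5))
```

With these inputs the result is 4, and a target below 5 minutes pushes the count higher.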
Also, make sure to update the default polled data retention in constants.inc.php to 0.1 (this will keep your disk from filling up).
The above said, what version of MySQL are you running and on what OS?
If you are on CentOS/RHEL 6 I would suggest moving to 7. This is a big move, but MariaDB is known to have better performance than the community edition of MySQL. I haven't looked at benchmarking MariaDB against Oracle's proprietary MySQL enhancements.
If you are on MySQL 5.1, moving to a later version of MySQL should get you a performance boost. We don't have official, supported instructions for the move. I have recently tested moving from MySQL 5.1 to 5.5 on XI. I can recreate this test with Fusion if this is something that would be of interest to you.
Did you look into the browser cache suggestion?
-
crrussell3
- Posts: 31
- Joined: Tue Oct 10, 2017 9:09 am
Re: Fusion performance
With the changes made to the mysql conf file that I posted before, we are running faster but still see slowness. Typically, once you are at the Dashboard, you are fine.
We will have to look into building a fresh Fusion box, as we deployed via the OVA, so whatever the OVA has configured is what we are running.
As for putting the browser cache on a Windows RAM drive, we haven't looked into it. The problem is that it would need to be configured on ~40-50 profiles.