Dears,
At present we are running NagiosXI v5.6.7 on a virtual CentOS 6.10 system. We have 15123 services and 530 hosts, backed by a MySQL database that is not offloaded (i.e. it runs on the same server).
All the services depend on ping/nrpe/ssh commands.
When a server becomes unreachable or, for example, NRPE is stopped for some reason, NagiosXI will of course show many alarms.
When this happens, mysqld CPU usage shoots up and NagiosXI becomes very slow, and at times even unresponsive.
We would like to understand why this happens. Is it that having too many alarms makes the database work much harder to insert the events, or is it something else?
I would also like to take the opportunity to ask for recommendations on what we can do in such situations:
Service dependency?
MySQL offload?
Other things which we might set to help minimize such a situation?
The reason is that in such a case, especially during a serious outage that raises a large number of alarms, the monitoring system becomes practically useless: it only behaves well when we don't really need it.
Rgds,
Matthew
NagiosXI slow/unresponsive when having over 100 alarms
Re: NagiosXI slow/unresponsive when having over 100 alarms
There are a lot of things that could cause the issue you are having, and if we knew which application or daemon gets loaded, it would help narrow down what to do.
Most of the time when this happens, though, it is because the PHP limits are too small and need to be increased, or because the maximum number of connections to the MySQL database is being exceeded and needs to be increased as well.
Follow the instructions in these articles to increase the settings on the server.
https://support.nagios.com/kb/article/n ... e-611.html
https://support.nagios.com/kb/article/n ... s-513.html
Also, to increase the performance of the server, add a RAM Disk to the server if one has not been added already. See this link for instructions.
https://assets.nagios.com/downloads/nag ... giosXI.pdf
Let us know if this helps the server perform better.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: NagiosXI slow/unresponsive when having over 100 alarms
Hi,
We have replicated the issue again by raising over 1k alarms, and we noticed something interesting:
While logged in with the default administrator account (nagiosadmin), Nagios still behaved fairly normally (perhaps a delay of a few seconds when opening pages).
With a non-admin user, however, it was impossible to browse Nagios. For example, it took over 2 minutes just to load the Operations Centre.
To confirm that this behavior is specific to non-admin users, we changed this user to "admin", and the problem no longer occurred.
After switching it back to "user", the problem returned.
Since running the simulation we have tried a number of things, none of which worked.
We changed the PHP settings as per the KB article:
https://support.nagios.com/kb/article/n ... e-611.html
max_execution_time = 100
max_input_time = 100
memory_limit = 1024M
max_input_vars = 50000
The RAM disk and the increased MySQL max connections were already in place well before this.
Note that user creation does not use LDAP/AD at all. When we create a user, we set it as Local (default) in the authorization settings.
The Nagios version was 5.6.7. We see the same behavior on the latest release, 5.6.14.
Please find the profile attached; it may help in troubleshooting this strange scenario. This is a serious situation for us because our non-admin users include our NOC team, who follow up on service impacts 24/7, and this makes their work impossible when such situations occur.
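For anyone repeating this test, it can help to confirm that the values quoted above are the ones actually present in php.ini. The following is only a minimal sketch in Python: the helper names (`read_php_ini_settings`, `mismatches`) and the sample text are made up for illustration, and the parser is deliberately simplified (it treats everything after a `;` as a comment, which would be wrong for a quoted value containing a semicolon).

```python
# Values reported in the post above. Helper names and sample text are
# hypothetical; the parser is a simplification of the php.ini syntax.
EXPECTED = {
    "max_execution_time": "100",
    "max_input_time": "100",
    "memory_limit": "1024M",
    "max_input_vars": "50000",
}


def read_php_ini_settings(text):
    """Return {name: value} for every uncommented 'name = value' line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith("[") or "=" not in line:
            continue  # skip blanks, section headers, and non-assignments
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def mismatches(text, expected=EXPECTED):
    """Return {name: actual_value} for settings that differ from expected."""
    actual = read_php_ini_settings(text)
    return {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}


sample = """[PHP]
; a full-line comment
max_execution_time = 100
max_input_time = 100
memory_limit = 128M ; inline comment
max_input_vars = 50000
"""

print(mismatches(sample))
```

To use it on a real server, read the contents of the loaded php.ini into a string and pass it to `mismatches`; running `php --ini` should show which file is loaded. Keep in mind that the web server usually has to be restarted before changed values take effect.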
Re: NagiosXI slow/unresponsive when having over 100 alarms
On larger systems like yours, it does take longer to display the data for user accounts, because the system has to gather the information from the user account, compare it to the objects the user has rights to, and build that into a table that is used to display the data in the GUI.
Admin accounts do not have to do that: they see all of the objects, so no checking is needed.
I would increase the memory_limit in php.ini to 2048M.
While you are editing that file, look for this option:
error_reporting
Add these to the end to decrease the logging of the NOC screen; that may help speed things up a bit.
& ~E_NOTICE & ~E_WARNING
Keep an eye on the I/O wait. In the profile it is at 2.2%, but if it stays higher for long periods of time, see if you can move the server to a faster disk subsystem.
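To picture the difference between admin and non-admin accounts described in this reply, here is a toy model in Python. It is purely illustrative and is not how Nagios XI is implemented: the function name, the data layout, and the host names are all assumptions. It only shows the general idea that an admin view can skip the per-object permission check, while a regular user's view has to test every object.

```python
def visible_services(user, services, permissions):
    """Return the services a user may see (illustrative model only).

    `permissions` maps a username to the set of host names that the
    user is allowed to view. This layout is an assumption.
    """
    if user["is_admin"]:
        return list(services)  # admins see everything, no per-object check
    allowed = permissions.get(user["name"], set())
    # Non-admins: every service is compared against the user's rights.
    return [s for s in services if s["host"] in allowed]


services = [
    {"host": "web01", "name": "PING"},
    {"host": "web01", "name": "NRPE load"},
    {"host": "db01", "name": "PING"},
]
permissions = {"noc_user": {"web01"}}

admin = {"name": "nagiosadmin", "is_admin": True}
noc = {"name": "noc_user", "is_admin": False}

print(len(visible_services(admin, services, permissions)))
print(len(visible_services(noc, services, permissions)))
```

With roughly 15,000 services, the extra per-object work for non-admin accounts is repeated on every page load, which would be consistent with the slowdown only showing up for those users.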
Re: NagiosXI slow/unresponsive when having over 100 alarms
Hi,
Adding
& ~E_NOTICE & ~E_WARNING
caused the following PHP error about the timezone:
PHP Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected....
Is there a step to solve this? (I have reverted this setting in the meantime.)
Re: NagiosXI slow/unresponsive when having over 100 alarms
Adding those options should not have affected the time zone settings in the php.ini file unless there was a typo.
Search the php.ini file for this option:
date.timezone
It should show the timezone the server is in; if it does not, set it.
You can do this through the XI interface, and this article shows you how to do that.
https://assets.nagios.com/downloads/nag ... m_Time.pdf
The following example shows how to set the error_reporting option in the php.ini file.
error_reporting = E_ALL & ~E_DEPRECATED & ~E_NOTICE & ~E_WARNING
Try setting it to the above and see if it works without creating the Warning for date().
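As a side note on what the error_reporting expression means: it is a bitmask, where `& ~X` clears the bit for level X. The sketch below reproduces the arithmetic in Python using the numeric values of the PHP constants. The value used for `E_ALL` (32767) is the PHP 5.4+ value and is an assumption here; older PHP versions define `E_ALL` differently, so the final number may not match your server. The helper name `is_reported` is made up for illustration.

```python
# Numeric values of the PHP error-level constants used in the post.
E_WARNING = 2
E_NOTICE = 8
E_DEPRECATED = 8192
E_ALL = 32767  # assumption: PHP 5.4+ value; older versions differ


def is_reported(level, mask):
    """True if errors of `level` are included in the reporting mask."""
    return bool(level & mask)


# Same expression as the php.ini line above: start from everything,
# then clear the bits for deprecations, notices, and warnings.
mask = E_ALL & ~E_DEPRECATED & ~E_NOTICE & ~E_WARNING

print(mask)
print(is_reported(E_WARNING, mask))
print(is_reported(E_NOTICE, mask))
```

Both checks come out false with this mask, meaning warnings and notices are no longer reported. This only reduces what gets reported; it does not set date.timezone, which still needs a valid value.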