NagiosXI slow/unresponsive when having over 100 alarms
Posted: Wed May 20, 2020 8:13 am
Dear all,
We are currently running NagiosXI v5.6.7 on a virtual CentOS 6.10 system. We have 15123 services and 530 hosts, backed by a MySQL database that is not offloaded (i.e. it runs on the same server).
All the services depend on ping/nrpe/ssh commands.
Now, when a server is unreachable or, for example, NRPE is stopped for some reason, NagiosXI will of course show many alarms.
When this happens, mysqld CPU usage shoots up and NagiosXI becomes very slow, even unresponsive at times.
We would like to understand why this happens. Is it that having too many alarms makes the database work much harder to insert the events, or is it something else?
Also, I would like to take the opportunity to ask for recommendations on what we can do in such situations:
Service dependency?
MySQL offload?
Other things which we might set to help minimize such a situation?
The reason we ask is that in such a case, especially during a serious outage that raises a large number of alarms, the monitoring system becomes close to useless. It only behaves well when we don't really need it.
Rgds,
Matthew