Nagios XI VM locks up
Posted: Fri Mar 17, 2017 11:11 am
When I change too much on the Nagios XI VM, it seems to lock up (i.e. the processor is maxed out and the VM becomes unresponsive). Does anyone know how to resolve this?
I have taken a fresh version of the Nagios XI VM, installed a few extra dependencies for extra processes that we need to run (e.g. log collecting), and then done the following:
- Added some new configurations to the static config directory (about 15 hosts and 1000+ services)
- Changed the IP addresses of the network interfaces
Almost immediately after restarting the Nagios XI services (entering the command "service nagios restart" on the command line), the system went to 100% processor usage (the idle time is at 0%). Running "top" on the command line, I've observed the following:
- Lots of "php" processes running;
- A few times there have been 3-4 "nagios" processes taking 15-30% (essentially taking the majority or all of the processor between them);
- A few times the whole screen was filled with processes for check commands executing (FYI, most of our check commands require network access, and the VM is not on a network where it has access to other hosts).
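For anyone who wants to compare numbers rather than screenfuls of "top", the observations above can be tallied per command name. This is only a minimal sketch: it assumes input shaped like the output of `ps -eo comm,pcpu` (command name first, %CPU last), and the sample lines below are made up for illustration, not captured from my VM. The function name `tally_by_command` is my own.

```python
from collections import defaultdict


def tally_by_command(ps_output):
    """Count processes and sum %CPU per command name.

    Expects one process per line, command name first and %CPU last,
    as produced by something like `ps -eo comm,pcpu`.
    Returns (name, process_count, total_cpu) tuples, busiest first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # name -> [count, summed %CPU]
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        try:
            cpu = float(parts[-1])
        except ValueError:
            continue  # header line such as "COMMAND %CPU"
        name = " ".join(parts[:-1])
        totals[name][0] += 1
        totals[name][1] += cpu
    rows = [(name, count, round(cpu, 1)) for name, (count, cpu) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical sample, NOT real output from the affected machine.
SAMPLE = """\
COMMAND %CPU
nagios 22.5
nagios 18.0
php 3.1
php 2.9
php 3.0
mysqld 12.4
check_ping 0.3
"""

for name, count, cpu in tally_by_command(SAMPLE):
    print(f"{name:<12} processes={count:<4} cpu={cpu}")
```

Running it a few times a minute apart should show whether the "php", "nagios", or check-command processes are the ones piling up.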
Restarting the machine doesn't fix it. In case it provides any extra information: "mysqld" takes a long time to shut down, and the shutdown often seems to fail.
If I had to guess, I'd say Nagios is seeing a new config, scheduling all the checks at once (though it shouldn't, as none of the hosts are present), and then getting stuck trying to process everything together (possibly with access to the database being a particular bottleneck). But this is only a guess based on what I've observed.
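To put rough numbers on that guess, here is a back-of-the-envelope estimate of how many checks would be in flight on average, using Little's law (average in flight = arrival rate x time each check runs). The 5-minute check interval and the 60-second timeout are assumptions for illustration, not values read from my config, and the function name is my own.

```python
def average_concurrent_checks(num_services, check_interval_s, seconds_per_check):
    """Little's law estimate of checks running at the same time.

    num_services / check_interval_s is the rate at which checks start;
    multiplying by how long each one runs gives the average number in flight.
    """
    return num_services * seconds_per_check / check_interval_s


# Assumed values, for illustration only.
SERVICES = 1000          # "1000+ services" from the new static config
CHECK_INTERVAL_S = 300   # assumption: each service is checked every 5 minutes
FAST_CHECK_S = 1         # assumption: a check that reaches its host quickly
TIMEOUT_S = 60           # assumption: an unreachable host runs to a 60 s timeout

reachable = average_concurrent_checks(SERVICES, CHECK_INTERVAL_S, FAST_CHECK_S)
unreachable = average_concurrent_checks(SERVICES, CHECK_INTERVAL_S, TIMEOUT_S)

print(f"hosts reachable:   about {reachable:.1f} checks in flight")
print(f"hosts unreachable: about {unreachable:.1f} checks in flight")
```

If those assumptions are anywhere near right, the checks timing out because the VM can't reach any hosts would multiply the number of simultaneous check processes many times over, which would fit the screen full of check commands I saw in "top".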
I haven't yet tweaked any of the settings affecting when Nagios schedules work; from my understanding, I shouldn't be hitting the system hard enough to need that yet...
Thanks,
I have seen this in the past, and sometimes waiting allows it to settle, but not always. At the moment I've been waiting nearly an hour and it still hasn't settled.
Has anyone else seen this? Do you know what I can do to stop it?