Re: High CPU Load after upgrading from 4.3.4 to 4.4.5

Posted: Sun Dec 22, 2019 10:42 am
by aanderson
fleish wrote: FWIW, I experienced similar behavior when upgrading from 4.4.3 to 4.4.5. Downgrading back to 4.4.3 fixed it, before I found this thread: https://i.imgur.com/SOUtJmX.jpg
The graph looks very similar to mine, right down to the peaks and troughs. As long as you apply the fix of setting max_concurrent_checks to 15 (or whatever value lets your checks spread out evenly), you should be fine on 4.4.5. I've had no problems with spikes since doing that. The problem should only come back if you stop Nagios for more than 5 minutes, which causes the checks to bunch up again.
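For reference, that fix is a single line in nagios.cfg. This is a minimal sketch assuming the value of 15 mentioned above; the right number depends on how many checks you run and how long each one takes, so treat it as a starting point rather than a recommendation:

Code:

# Cap how many service checks Nagios runs in parallel (0, the default, means no limit)
max_concurrent_checks=15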

regards,
Aidan

Posted: Mon Dec 23, 2019 8:03 am
by scottwilkerson
I see you have livestatus enabled. I'm not sure whether it could be causing the issue, but could you disable the livestatus module in nagios.cfg to see whether the problem persists?
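Livestatus is normally loaded into Nagios through a broker_module line in nagios.cfg, so disabling it for a test generally means commenting that line out and restarting Nagios. A sketch, where both paths are placeholders and not taken from this thread (keep whatever your existing line contains):

Code:

# Livestatus disabled for testing; remove the leading # to re-enable
#broker_module=/path/to/livestatus.o /path/to/live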

Posted: Tue Dec 24, 2019 10:26 am
by aanderson
I disabled 'livestatus' and tested again. Same issue: 80% of checks were rescheduled to run at the same time, with the remainder spaced over the following 8 seconds. Very high load was recorded, as usual.

regards,
Aidan

Posted: Thu Dec 26, 2019 7:26 am
by scottwilkerson
Could you try setting the following in nagios.cfg?

Code:

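# Let Nagios automatically reschedule active checks to smooth them out over time
auto_reschedule_checks=1
# How often (in seconds) Nagios attempts to reschedule checks
auto_rescheduling_interval=30
# Only checks due to run within the next 45 seconds are considered for rescheduling
auto_rescheduling_window=45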

Posted: Sun Dec 29, 2019 8:58 am
by aanderson
I'm travelling over the holidays at the moment, but I'll test this tomorrow evening and let you know.

regards,
Aidan

Posted: Sun Dec 29, 2019 11:49 am
by scottwilkerson
Sounds good.

Posted: Fri Jan 03, 2020 12:04 pm
by aanderson
I've finally got round to testing the auto-rescheduling options. I set them as above, and this has resolved the issue. I tested as before, stopping Nagios for over 5 minutes to let all the checks bunch up. After starting Nagios, it showed the usual 80% of checks scheduled to run in the same second. However, about 30-40 seconds before they were due to run, auto-rescheduling kicked in and spread them out evenly over the next 5 minutes, avoiding the huge CPU spike.

I have left Nagios running with the auto-rescheduling options in place and will let you know if I notice any performance hit. Host and service check latency is low, so it looks like it is working fine.

regards,
Aidan

Posted: Fri Jan 03, 2020 12:43 pm
by scottwilkerson
Awesome! Glad to help.