Re: Problems after upgrade to 5.5

Posted: Mon Jul 09, 2018 8:20 am
by nik.vu
I think I found the problem: when a host goes down in 5.5, Nagios keeps rechecking it every minute, which is totally different from 5.4.x. As you can see from the attachment, this host has been down for more than 4 hours and Nagios is still checking it at a one-minute interval. That is the opposite of the config, where check_interval is set to 15 minutes. It looks as if 5.5 keeps applying retry_interval to down hosts and services indefinitely, and this is a real problem for setups that have more than 10,000 checks.

When I restore the previous version, everything is OK with load and checks, and XI works fine.

Code:

define host {
    host_name                   KI_W009-KikindaGraditelj
    use                         xiwizard_switch_host
    address                     172.21.64.52
    parents                     Kikinda WIFI linkovi1
    hostgroups                  Kikinda
    max_check_attempts          5
    check_interval              15
    retry_interval              1
    check_period                xi_timeperiod_24x7
    contacts                    l1.provera,nagiosadmin
    notification_interval       0
    notification_period         xi_timeperiod_24x7
    first_notification_delay    10
    icon_image                  switch.png
    statusmap_image             switch.png
    _xiwizard                   switch
    register                    1
}
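
For comparison, the behaviour the Nagios Core documentation describes for these three directives can be sketched in a few lines of Python. This is only a simplified model, not Nagios's actual scheduler: the function name `expected_check_times` is made up for illustration, time units are assumed to be minutes (the default `interval_length` of 60 seconds), and the host is assumed to stay down the whole time.

```python
from typing import List


def expected_check_times(check_interval: int, retry_interval: int,
                         max_check_attempts: int, num_checks: int) -> List[int]:
    """Return the minutes, counted from the first failed check, at which the
    next `num_checks` checks are expected while the host stays down.

    Hypothetical helper for illustration only; it ignores latency,
    reachability logic, flapping and everything else the real scheduler does.
    """
    times = []
    now = 0
    attempt = 1  # the first failed check counts as attempt 1
    for _ in range(num_checks):
        if attempt < max_check_attempts:
            # Soft state: retries remain, so wait retry_interval.
            now += retry_interval
            attempt += 1
        else:
            # Hard state: retries are exhausted, so fall back to check_interval.
            now += check_interval
        times.append(now)
    return times


# Values taken from the host definition above.
print(expected_check_times(check_interval=15, retry_interval=1,
                           max_check_attempts=5, num_checks=7))
# [1, 2, 3, 4, 19, 34, 49]
```

With these values the host should be rechecked at one-minute intervals only until the fifth attempt, and after that it should drop back to one check every 15 minutes. Checks that are still one minute apart after four hours do not fit that pattern.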

Re: Problems after upgrade to 5.5

Posted: Mon Jul 09, 2018 10:30 am
by scottwilkerson
Thanks for putting this together. We have confirmed this is a bug.

Re: Problems after upgrade to 5.5

Posted: Mon Jul 09, 2018 11:42 am
by Envera IT
I can confirm we're also seeing a massive increase in the CPU load after the 5.5 upgrade. ~13k checks here with around 500 being down or unreachable at any given time. CPU load has quadrupled.

Re: Problems after upgrade to 5.5

Posted: Mon Jul 09, 2018 4:29 pm
by tgriep
Can you run the following commands as root and post the output to the forum so we can see what is taking the most processing at this time?

Code:

top -n 1
ps -ef --cols 300
ipcs -q

Thanks

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 8:39 am
by Envera IT
tgriep wrote:Can you run the following commands as root and post the output to the forum so we can see what is taking the most processing at this time?

Code:

top -n 1
ps -ef --cols 300
ipcs -q

Thanks

So I looked through some other threads and ran:

Code:

service nagios stop
killall -9 nagios
service nagios start

My issues were resolved soon after, and load dropped back down to normal levels.

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 9:59 am
by tmcdonald
Good to hear. Did you have further (related) questions or are we good to lock this up?

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 10:00 am
by Envera IT
tmcdonald wrote:Good to hear. Did you have further (related) questions or are we good to lock this up?
I'm not OP, so I will defer to him/her.

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 10:05 am
by tmcdonald
Ehamby wrote:
tmcdonald wrote:Good to hear. Did you have further (related) questions or are we good to lock this up?
I'm not OP, so I will defer to him/her.
Ahh, thanks. Didn't scroll up far enough :)

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 1:19 pm
by nik.vu
This isn't resolved yet; the problem is still the same. Even with OK load, XI is unusable. As I explained in the first post, after a config is applied XI needs too much time to get back to a normal state, and XI isn't running checks on time. Checks are about 30 minutes late.

Also, I have to kill and restart the nagios process via the CLI too many times. When I restart the process it's a little better, but that's not a proper solution.

On 5.4.3, with all these checks, everything is fine.

Re: Problems after upgrade to 5.5

Posted: Tue Jul 10, 2018 4:05 pm
by scottwilkerson
nik.vu wrote: I think I found the problem: when a host goes down in 5.5, Nagios keeps rechecking it every minute, which is totally different from 5.4.x. As you can see from the attachment, this host has been down for more than 4 hours and Nagios is still checking it at a one-minute interval. That is the opposite of the config, where check_interval is set to 15 minutes. It looks as if 5.5 keeps applying retry_interval to down hosts and services indefinitely, and this is a real problem for setups that have more than 10,000 checks.
nik.vu,

This is a known issue. We are working to get a fix out soon in a 5.5.1 release, I would suspect late this week or early next.