I think I found the problem: when a host goes down in 5.5, Nagios constantly rechecks it every minute, which is totally different from 5.4.x. As you can see from the attachment, this host has been down for more than 4 hours and Nagios is still checking it at a one-minute interval. That's the total opposite of the config, where the check interval is set to 15 minutes. It looks like 5.5 keeps using the retry interval for down hosts and services, and this is a real problem for setups that have more than 10,000 checks.
When I restore the previous version, everything is OK with load and checks, and XI works fine.
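To put a rough number on why this hurts large setups, here is a back-of-the-envelope sketch in Python. It is not taken from Nagios itself; the function name and the figure of 500 down checks are assumptions for illustration, and it simply compares the check rate when every check runs at the 15-minute check interval against the rate when down checks keep running at the 1-minute retry interval.

```python
def checks_per_minute(total_checks, down_checks, check_interval_min, retry_interval_min):
    """Estimate scheduled checks per minute (hypothetical helper, not a Nagios API).

    expected: every check runs at the normal check interval.
    observed: checks in a down state keep running at the retry interval,
              which is the behaviour described in this post.
    """
    up_checks = total_checks - down_checks
    expected = total_checks / check_interval_min
    observed = up_checks / check_interval_min + down_checks / retry_interval_min
    return expected, observed


# Assumed figures: 10,000 checks, 500 of them down,
# check interval 15 minutes, retry interval 1 minute.
expected, observed = checks_per_minute(10_000, 500, 15, 1)
print(f"expected: {expected:.0f} checks/min")
print(f"observed: {observed:.0f} checks/min")
print(f"ratio:    {observed / expected:.2f}x")
```

Under these assumed numbers, a small share of down checks adds a large share of the total check rate, which would be consistent with the load increase seen after the upgrade.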
I can confirm we're also seeing a massive increase in the CPU load after the 5.5 upgrade. ~13k checks here with around 500 being down or unreachable at any given time. CPU load has quadrupled.
tgriep wrote: Can you run the following commands as root and post the output to the forum so we can see what is taking the most processing at this time?
This isn't resolved yet; the problem is still the same. Even with OK load, XI is unusable. As I explained in the first post, when a config is applied XI needs too much time to get back to a normal state, and XI isn't running checks on time. Checks are about 30 minutes late.
Also, it takes me too long to kill and restart the nagios process via the CLI. When I restart the process it's a little better, but that's not a proper solution.
nik.vu wrote: I think I found the problem: when a host goes down in 5.5, Nagios constantly rechecks it every minute, which is totally different from 5.4.x. As you can see from the attachment, this host has been down for more than 4 hours and Nagios is still checking it at a one-minute interval. That's the total opposite of the config, where the check interval is set to 15 minutes. It looks like 5.5 keeps using the retry interval for down hosts and services, and this is a real problem for setups that have more than 10,000 checks.
nik.vu,
This is a known issue, and we are working to get a fix out soon in a 5.5.1 release. I would expect it late this week or early next.