Strange regular outages
Posted: Sat Oct 04, 2014 9:22 am
Hello,
We're running Nagios 4.0.7 with ~4,000 active checks and ~10,000 passive checks on a single machine (we plan to split them across two instances, but that is another story). We observe regular periods, as if scheduled, when Nagios stops processing anything for about 9-10 minutes. Both active and passive checks drop to 0%, and no notifications are sent. After 9-10 minutes it starts working again. This happens four times a day at exactly the same times, e.g. 01:27, 03:27, 13:27, 22:27, and then again the next day. From time to time the exact times shift, but the intervals between outages stay at 2-10-9-3 hours. We could not see anything suspicious on the machine (such as high CPU usage), and there is no clue in the Nagios logs either. We saw the same behavior with Nagios 3.3.
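To make the spacing concrete, here is a small Python sketch that computes the gaps between the example outage times listed above, wrapping past midnight. The function name `gaps_in_hours` is only illustrative, and the times are the examples from this post, not data pulled from Nagios itself.

```python
from datetime import datetime, timedelta

# Example outage start times from this post (24-hour clock).
OUTAGE_TIMES = ["01:27", "03:27", "13:27", "22:27"]


def gaps_in_hours(times):
    """Return the gap in hours from each time to the next, wrapping past midnight."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    gaps = []
    for i, start in enumerate(parsed):
        nxt = parsed[(i + 1) % len(parsed)]
        # The last entry wraps around to the first outage of the next day.
        if nxt <= start:
            nxt += timedelta(days=1)
        gaps.append((nxt - start).total_seconds() / 3600)
    return gaps


print(gaps_in_hours(OUTAGE_TIMES))
```

The gaps add up to 24 hours, so the same four outages repeat on a daily cycle.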
Is this behavior known to anybody? Could you please advise what to check and how to continue our investigation?
Thanks!


