Scheduling queue freezes

jfrickson

Re: Scheduling queue freezes

Post by jfrickson »

When it freezes, is the CPU usage high?
aisebouma wrote: About once a month the scheduling queue freezes, and even restarting Nagios does not resolve it; I also have to delete retention.dat.
Send us a copy of the retention.dat. If deleting it resolves it, there might be something in there to point me to where the problem lies. A copy of a normally running retention.dat would also help.

Also, does anything special or unusual happen about once a month? [EDIT] Trevor beat me to this question!
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

tmcdonald wrote:
aisebouma wrote: First of all, a 1500-second time deviation is not normal for NTP.
We agree, @jolson mentioned this above.
aisebouma wrote: Second, why would Nagios stop processing the scheduled queue after a time change?
Nagios uses in-memory timestamps to determine when the next checks should run, when notifications should be sent, and so on. When these skew from the system clock, you can see the "Compensating" message above. All these timestamps are periodically written to disk in retention.dat, which is used to store state between reboots. This is why deleting that file causes the queue to refresh.
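To illustrate what "skewing from the system clock" means, here is a minimal Python sketch of how a scheduler could notice a sudden wall-clock jump. It is not Nagios's actual code (the Nagios core is written in C and its compensation logic differs in detail); it assumes a scheduler that samples both `time.time()`, which follows the system clock including NTP steps, and `time.monotonic()`, which does not. The 60-second threshold is a hypothetical value.

```python
import time

# Hypothetical threshold: how far (in seconds) wall-clock time may drift
# from the monotonic clock between two samples before we call it a jump.
JUMP_THRESHOLD = 60.0


class ClockJumpDetector:
    """Flag sudden system-clock changes by comparing wall time with a monotonic clock.

    time.monotonic() is unaffected by NTP steps or manual clock changes,
    whereas time.time() follows the system clock.
    """

    def __init__(self, wall=time.time, mono=time.monotonic):
        # Clock sources are injectable so the logic can be tested with fakes.
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self) -> float:
        """Return the detected jump in seconds (0.0 if within the threshold)."""
        wall, mono = self._wall(), self._mono()
        # Elapsed time according to each clock since the previous sample;
        # any difference is a step in the system clock.
        skew = (wall - self._last_wall) - (mono - self._last_mono)
        self._last_wall, self._last_mono = wall, mono
        return skew if abs(skew) > JUMP_THRESHOLD else 0.0
```

A scheduler built this way would shift its queued next-check times by the returned amount whenever `check()` reports a nonzero jump; if the jump is instead persisted to disk uncorrected, the stale times come back on the next start.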
aisebouma wrote: Third, why is it not robust enough to resume processing the queue when Nagios is restarted?
As mentioned above, the retention.dat file is what causes state to be restored on a reboot, and if the times saved in that file are off, then the times will be off when Nagios restarts. This is a drawback, to be sure, but the alternative is that nothing gets saved on a restart and everything essentially gets re-checked from square one. Retention can be turned on or off, but the benefits far outweigh the drawbacks. At any rate, a little skew over time is usually dealt with in stride, but a sudden 6-hour jump is harder to deal with consistently.
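One way to check whether the times saved in retention.dat are "off" is to scan the file for `next_check` values far from the current time. The Python sketch below assumes the usual retention.dat layout (brace-delimited `host {` / `service {` blocks with one `key=value` per line, and `next_check` stored as a Unix timestamp). The 24-hour cutoff and treating `next_check=0` as "not scheduled" are assumptions, not documented Nagios behavior.

```python
import sys
import time
from typing import Iterator

# Hypothetical cutoff: flag any next_check more than 24h away from "now".
MAX_OFFSET = 24 * 3600


def parse_blocks(lines) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (block_type, fields) for each 'name {' ... '}' block.

    Assumes one 'key=value' per line inside brace-delimited blocks such as
    'info {', 'host {' and 'service {'.
    """
    block_type, fields = None, {}
    for raw in lines:
        line = raw.strip()
        if line.endswith("{"):
            block_type, fields = line[:-1].strip(), {}
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def suspicious_checks(lines, now: float, max_offset: float = MAX_OFFSET):
    """Return (object, next_check, offset) for entries scheduled far from now."""
    hits = []
    for kind, fields in parse_blocks(lines):
        if kind not in ("host", "service") or "next_check" not in fields:
            continue
        next_check = int(fields["next_check"])
        if next_check == 0:  # assumed to mean "no check scheduled"
            continue
        offset = next_check - now
        if abs(offset) > max_offset:
            name = fields.get("host_name", "?")
            if kind == "service":
                name += "/" + fields.get("service_description", "?")
            hits.append((name, next_check, offset))
    return hits


if __name__ == "__main__":
    # Path is an assumption; check state_retention_file in nagios.cfg.
    path = sys.argv[1] if len(sys.argv) > 1 else "retention.dat"
    with open(path) as f:
        for name, ts, off in suspicious_checks(f, time.time()):
            print(f"{name}: next_check={ts} ({off / 3600:+.1f} h from now)")
```

Running it against both a "frozen" and a normally running retention.dat makes the two easy to compare. On the configuration side, the relevant nagios.cfg directives include `retain_state_information` and `use_retained_scheduling_info`.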

We did see another user recently have a similar issue, but his happened every weekend and the clock skew was not nearly as consistent (yours seems to be at least in the 6h range). We still haven't found what caused it on his system, and they did a lot of work up front on their end to rule things out. A couple of things I would like to ask of you:

Are you running mod_gearman?
How precise is the "once a month" estimate? Is it on a particular calendar day? Week day?
Does anything in your environment/network occur around the time that this happens?
Those are the technical answers, but I do not think it is the right functional solution (at least in my case). I do not run mod_gearman. The "once a month" estimate is not very precise; I will check whether I can find a pattern.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Scheduling queue freezes

Post by rkennedy »

Let us know if you need any further assistance from our team.
Former Nagios Employee
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

jfrickson wrote: When it freezes, is the CPU usage high?
aisebouma wrote: About once a month the scheduling queue freezes, and even restarting Nagios does not resolve it; I also have to delete retention.dat.
Send us a copy of the retention.dat. If deleting it resolves it, there might be something in there to point me to where the problem lies. A copy of a normally running retention.dat would also help.

Also, does anything special or unusual happen about once a month? [EDIT] Trevor beat me to this question!
Here are the dates it happened:
01-13-2015
01-24-2015
03-05-2015
03-09-2015
03-11-2015
03-17-2015
05-13-2015
08-06-2015
11-19-2015
11-25-2015
12-08-2015
It seems pretty random to me; it happens even on Sundays, when our business is closed.
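To check the list against the earlier question about a particular calendar day or weekday, each date's weekday, day of month, and gap since the previous incident can be printed. A minimal Python sketch, assuming the dates above are in MM-DD-YYYY order:

```python
from datetime import datetime

# Freeze dates as reported in the thread (MM-DD-YYYY).
REPORTED = [
    "01-13-2015", "01-24-2015", "03-05-2015", "03-09-2015",
    "03-11-2015", "03-17-2015", "05-13-2015", "08-06-2015",
    "11-19-2015", "11-25-2015", "12-08-2015",
]


def summarize(stamps):
    """Return (date, weekday name, days since previous incident) per entry."""
    days = [datetime.strptime(s, "%m-%d-%Y").date() for s in stamps]
    rows = []
    prev = None
    for d in days:
        gap = (d - prev).days if prev is not None else None
        rows.append((d, d.strftime("%A"), gap))
        prev = d
    return rows


if __name__ == "__main__":
    for d, weekday, gap in summarize(REPORTED):
        print(f"{d.isoformat()}  {weekday:<9}  day {d.day:>2}  gap={gap}")
```

The same rows can then be lined up against logs of backups, scans, or NTP adjustments on those dates.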
I gave up on Nagios in May but started again with a new version in August; unfortunately, the problem was not solved.

I will try to send you a copy of retention.dat via PM.
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Scheduling queue freezes

Post by hsmith »

Let us know when you have sent the file. Thanks.
Former Nagios Employee.
jfrickson

Re: Scheduling queue freezes

Post by jfrickson »

@hsmith I got the files by email. Checking them out.
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

Did the retention.dat files uncover a possible cause of the problem?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Scheduling queue freezes

Post by tmcdonald »

Haven't gotten an answer from our Dev yet, but with the holidays we haven't been in the office much until this week.

In the meantime though, did you ever find out if anything was happening on the network when this occurs? Backups, scans, updates, reboots, anything that might cause some interruption or slowness?
Former Nagios employee
aisebouma
Posts: 14
Joined: Mon Dec 07, 2015 9:05 am

Re: Scheduling queue freezes

Post by aisebouma »

tmcdonald wrote: Haven't gotten an answer from our Dev yet, but with the holidays we haven't been in the office much until this week.

In the meantime though, did you ever find out if anything was happening on the network when this occurs? Backups, scans, updates, reboots, anything that might cause some interruption or slowness?
No, it seems to occur completely at random.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Scheduling queue freezes

Post by rkennedy »

I found a post here that may relate to the issue you are experiencing - https://support.nagios.com/forum/viewto ... 10#p112580

Can you post the contents of the file below for us to review?

/etc/sysctl.conf 
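If the file is long, only its active (non-comment) settings need to be posted. A minimal Python sketch, assuming the standard sysctl.conf(5) format, in which blank lines and lines beginning with `#` or `;` are ignored:

```python
import sys


def active_settings(lines):
    """Return {key: value} for the non-comment 'key = value' lines of a sysctl.conf-style file."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/sysctl.conf"
    with open(path) as f:
        for key, value in active_settings(f).items():
            print(f"{key} = {value}")
```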
Former Nagios Employee