Re: [Nagios-devel] Bug report: downtimes beyond 2038 cause event

On 4 Apr 2013, at 22:55, Andreas Ericsson wrote:
>> This fails on CentOS 5 64bit, though appears to work on Debian Squeeze 32bit, so it may be a 64-bit-only issue.
>>
>> We think this is an issue when the event is scheduled via squeue_add(). We've managed to get the test-squeue to fail by changing the time value to be greater than 2038 with the following:
>>
>> Index: test-squeue.c
>> ===================================================================
>> --- test-squeue.c (revision 2716)
>> +++ test-squeue.c (working copy)
>> @@ -116,7 +116,7 @@
>> sq_test_random(sq);
>> t(squeue_size(sq) == 0, "Size should be 0 after first sq_test_random");
>>
>> - t((a.evt = squeue_add(sq, time(NULL) + 9, &a)) != NULL);
>> + t((a.evt = squeue_add(sq, time(NULL)*2, &a)) != NULL);
>> t(squeue_size(sq) == 1);
>> t((b.evt = squeue_add(sq, time(NULL) + 3, &b)) != NULL);
>> t(squeue_size(sq) == 2);
>>
>> This gives the test result of:
>>
>> ### squeue tests
>> FAIL max > FAIL x == &b @test-squeue.c:133
>> FAIL x->id == b.id @test-squeue.c:134
>> FAIL x == &c @test-squeue.c:141
>> about to fail pretty fucking hard...
>> ea: 0xbfe065e0; &b: 0xbfe065d8; &c: 0xbfe065d0; ed: 0xbfe065c8; x: 0xbfde9b80
>> FAIL x == &b @test-squeue.c:152
>> FAIL x->id == b.id @test-squeue.c:153
>> FAIL x == &b @test-squeue.c:160
>> FAIL x->id == b.id @test-squeue.c:161
>> FAIL x == &c @test-squeue.c:166
>> FAIL x->id == c.id @test-squeue.c:167
>> Test results: 390637 passed, 10 failed
>>
>> Changing to a factor of 1.1 instead of 2 passes:
>>
>
> I'm not surprised. 1.1 would mean it's still within the unix timeframe.
>
> What's the size of time_t, long and struct timeval on systems where it fails?
> What's the sizes on systems where it succeeds?

With the recreation steps, Nagios 4 works fine on RHEL5 32-bit, but fails on RHEL5 64-bit.

sizes.c:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include "pqueue.h"

int main(int argc, char **argv)
{
	struct timeval tv;

	/* sizeof yields a size_t, so print it with %zu rather than %d */
	printf("long = %zu\n", sizeof(long));
	printf("time_t = %zu\n", sizeof(time_t));
	printf("tv = %zu\n", sizeof(tv));
	printf("pqueue_pri_t = %zu\n", sizeof(pqueue_pri_t));
	return 0;
}

RHEL5 32 bit:
long = 4
time_t = 4
tv = 8
pqueue_pri_t = 8


RHEL5 64 bit:
long = 8
time_t = 8
tv = 16
pqueue_pri_t = 8

> Does time_t differ in signedness on them?

Not sure how to check this.

> I think a runtime check based on those sizes should work just fine, and
> also be optimized away so we don't actually have to pay for it, but I'm
> curious to see where it actually goes wrong. If it's before we get to
> see the number in squeue.c we're pretty much fscked, as the only option
> then is a macro which does voodoo-casting so the squeue api sees the
> right number.
>
>> ### squeue tests
>> Test results: 390647 passed, 0 failed
>>
>> This worked in Nagios 3, so we're guessing that the change to use the squeue library for events is probably where this limitation has come in.
>>
>> Any thoughts?
>>
>
> Well, modifying the evt_compute_pri() algorithm to discard
> everything but the 21 least significant bits of the tv->tv_usec
> would allow us to use 43 bits for the seconds value. That would
> land us somewhere in the year 141234 before we run out of seconds.
> It's not a real fix though, since we could live with discarding
> events that are patently absurd, but blocking the entire scheduler
> because we get a bogus date is just plain wrong.

I've changed the code so it now looks like this:

static pqueue_pri_t evt_compute_pri(struct timeval *tv)
{
	pqueue_pri_t ret;

	/* keep weird compilers on 32-bit systems from doing wrong */
	if(sizeof(pqueue_pri_t) < 8) {
		ret = tv->tv_sec;
		ret += !!tv->tv_

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: ton.voon@opsview.com