[Nagios-devel] Bug report: downtimes beyond 2038 cause event queue

Posted: Thu Apr 04, 2013 3:52 pm
by Guest
Hi!

We've come across a problem when upgrading from Nagios 3 to Nagios 4, and we can't work out where the fix should go. It occurs when an event is scheduled in the future beyond 2038.

Recreation steps:
* Set a downtime on a service to end the next day
* Stop Nagios
* Edit retention.dat so that end_date=4514791088 (some other values seem to work)
* Start Nagios

After Nagios starts, it does not run any of the scheduled events in the event queue.

This fails on CentOS 5 64-bit but appears to work on Debian Squeeze 32-bit, so it may be a 64-bit-only issue.

We think the problem occurs when the event is scheduled via squeue_add(). We've managed to get test-squeue to fail by changing the time value to one beyond 2038, with the following patch:

Index: test-squeue.c
===================================================================
--- test-squeue.c (revision 2716)
+++ test-squeue.c (working copy)
@@ -116,7 +116,7 @@
  sq_test_random(sq);
  t(squeue_size(sq) == 0, "Size should be 0 after first sq_test_random");
 
- t((a.evt = squeue_add(sq, time(NULL) + 9, &a)) != NULL);
+ t((a.evt = squeue_add(sq, time(NULL)*2, &a)) != NULL);
  t(squeue_size(sq) == 1);
  t((b.evt = squeue_add(sq, time(NULL) + 3, &b)) != NULL);
  t(squeue_size(sq) == 2);

This gives the test result of:

### squeue tests
FAIL max id == b.id @test-squeue.c:134
FAIL x == &c @test-squeue.c:141
about to fail pretty fucking hard...
ea: 0xbfe065e0; &b: 0xbfe065d8; &c: 0xbfe065d0; ed: 0xbfe065c8; x: 0xbfde9b80
FAIL x == &b @test-squeue.c:152
FAIL x->id == b.id @test-squeue.c:153
FAIL x == &b @test-squeue.c:160
FAIL x->id == b.id @test-squeue.c:161
FAIL x == &c @test-squeue.c:166
FAIL x->id == c.id @test-squeue.c:167
Test results: 390637 passed, 10 failed

Changing the factor from 2 to 1.1 makes the test pass:

### squeue tests
Test results: 390647 passed, 0 failed

This worked in Nagios 3, so we're guessing that the limitation was introduced by the switch to the squeue library for events.

Any thoughts?

Ton






This post was automatically imported from historical nagios-devel mailing list archives
Original poster: ton.voon@opsview.com