Re: [Nagios-devel] ndo2db problems on solaris 10 (ndoutils 1.4b7)
Posted: Wed Feb 27, 2008 7:14 am
hi!
> Funny you should mention this as we just found a fix for Solaris for
> ndoutils 1.4b3. Note that in the accept call 11 lines up from the
> bottom there is an EINTR error from accept. We've patched the call
> around the accept so that an EINTR causes a retry and this appears to
> work around the problem. See the patch attached. My guess is that this
> occurs because the signal is received at the same time that the parent
> gets a result on accept, so accept returns with this error rather than
> handling the child signal first.
>
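For readers without the attachment, the retry described in the quoted text can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch: `accept_retry` is a hypothetical helper name, and the actual ndo2db source is structured differently.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from ndoutils: call accept() again
 * whenever it was interrupted by a signal (for example SIGCHLD from an
 * exiting child) instead of treating EINTR as a fatal error. */
int accept_retry(int listen_fd, struct sockaddr *addr, socklen_t *addrlen)
{
    /* addrlen is a value-result argument, so remember the caller's
     * buffer size and restore it before every attempt. */
    socklen_t len_in = (addrlen != NULL) ? *addrlen : 0;
    int fd;

    do {
        if (addrlen != NULL)
            *addrlen = len_in;
        fd = accept(listen_fd, addr, addrlen);
    } while (fd < 0 && errno == EINTR);

    /* >= 0 on success; -1 with errno set for any error other than EINTR */
    return fd;
}
```

Any error other than EINTR is still returned to the caller, so the daemon's existing error handling stays in place.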
Thanks very much for the patch. It works partially: the process no longer
dies, but I still have problems writing to the database.
nagios.log:
[1204124795] Nagios 3.0rc2 starting... (PID=10288)
[1204124795] Local time is Wed Feb 27 16:06:35 CET 2008
[1204124795] LOG VERSION: 2.0
[1204124795] ndomod: NDOMOD 1.4b7 (10-31-2007) Copyright (c) 2005-2007 Ethan Galstad ([email protected])
[1204124795] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1204124795] Event broker module '/usr/local/nagios/ndo/ndomod.o' initialized successfully.
[1204124795] ndomod: Error writing to data sink! Some output may get lost...
[1204124795] Finished daemonizing... (New PID=10291)
[1204124811] ndomod: Successfully reconnected to data sink! 0 items lost, 253 queued items to flush.
[1204124811] ndomod: Error writing to data sink! Some output may get lost. 236 queued items to flush.
[1204124827] ndomod: Successfully reconnected to data sink! 0 items lost, 316 queued items to flush.
[1204124827] ndomod: Error writing to data sink! Some output may get lost. 299 queued items to flush.
[1204124836] Caught SIGTERM, shutting down...
[1204124836] Successfully shutdown... (PID=10291)
[1204124836] ndomod: Shutdown complete.
[1204124836] Event broker module '/usr/local/nagios/ndo/ndomod.o' deinitialized successfully.
truss:
root@nagios_1 # truss -f -p 10003
10003: accept(5, 0xFFBFF554, 0xFFBFF564, SOV_DEFAULT) (sleeping...)
10003: accept(5, 0xFFBFF554, 0xFFBFF564, SOV_DEFAULT) = 6
10003: fork1() = 10289
10289: fork1() (returning as child ...) = 10003
10289: getpid() = 10289 [10003]
10003: lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
10289: lwp_self() = 1
10003: close(6) = 0
10289: lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
10289: llseek(3, 0, SEEK_CUR) = 0
10289: close(3) = 0
10289: open("/usr/local/nagios/var/ndo2db.debug", O_RDWR|O_APPEND|O_CREAT, 0666) = 3
10289: sigaction(SIGQUIT, 0xFFBFED80, 0xFFBFEE20) = 0
10289: sigaction(SIGTERM, 0xFFBFED80, 0xFFBFEE20) = 0
10289: sigaction(SIGINT, 0xFFBFED80, 0xFFBFEE20) = 0
10289: sigaction(SIGSEGV, 0xFFBFED80, 0xFFBFEE20) = 0
10289: sigaction(SIGFPE, 0xFFBFED80, 0xFFBFEE20) = 0
10289: open("/etc/netconfig", O_RDONLY|O_LARGEFILE) = 7
10289: fcntl(7, F_DUPFD, 0x00000100) Err#22 EINVAL
10289: read(7, " # p r a g m a i d e n".., 1024) = 1024
10289: read(7, " t s t p i _ c".., 1024) = 215
10289: read(7, 0x000400F8, 1024) = 0
10289: lseek(7, 0, SEEK_SET) = 0
10289: read(7, " # p r a g m a i d e n".., 1024) = 1024
10289: read(7, " t s t p i _ c".., 1024) = 215
10289: read(7, 0x000400F8, 1024) = 0
10289: close(7) = 0
10289: open("/dev/udp", O_RDONLY) = 7
10289: ioctl(7, SIOCGLIFNUM, 0xFFBFEBD4) = 0
10289: close(7) = 0
10289: getuid() = 100 [100]
10289: getuid() = 100 [100]
10289: door_info(4, 0xFFBFE8E0) = 0
10289: door_call(4, 0xFFBFE988)
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]