Here is an strace from the same box, taken just a few minutes ago.
As you can see, Nagios does not appear to be catching the EROFS error
it gets when it tries to open its log on a read-only file system:
nanosleep({1, 0},{1, 0}) = 0
kill(-7799, SIGKILL) = -1 ESRCH (No such process)
gettimeofday({1227029613, 509530}, NULL) = 0
close(10) = 0
open("/usr/local/nagios/var/nagios.log", O_RDWR|O_APPEND|O_CREAT, 0666) = -1 EROFS (Read-only file system)
gettimeofday({1227029613, 511146}, NULL) = 0
gettimeofday({1227029613, 511316}, NULL) = 0
time([1227029613]) = 1227029613
time([1227029613]) = 1227029613
gettimeofday({1227029613, 511843}, NULL) = 0
time([1227029613]) = 1227029613
gettimeofday({1227029613, 512141}, NULL) = 0
gettimeofday({1227029613, 512287}, NULL) = 0
gettimeofday({1227029613, 511659}, NULL) = 0
time([1227029613]) = 1227029613
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=877, ...}) = 0
gettimeofday({1227029613, 512585}, NULL) = 0
gettimeofday({1227029613, 512746}, NULL) = 0
gettimeofday({1227029613, 513905}, NULL) = 0
gettimeofday({1227029613, 514058}, NULL) = 0
time([1227029613]) = 1227029613
time([1227029613]) = 1227029613
time([1227029613]) = 1227029613
gettimeofday({1227029613, 513917}, NULL) = 0
gettimeofday({1227029613, 514869}, NULL) = 0
time([1227029613]) = 1227029613
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=877, ...}) = 0
time([1227029613]) = 1227029613
time([1227029613]) = 1227029613
gettimeofday({1227029613, 522324}, NULL) = 0
time([1227029613]) = 1227029613
gettimeofday({1227029613, 521730}, NULL) = 0
pipe([10, 11]) = 0
fcntl64(10, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
fcntl64(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
gettimeofday({1227029613, 522435}, NULL) = 0
gettimeofday({1227029613, 522584}, NULL) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40176708) = 7802
close(11) = 0
waitpid(7802, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7802
--- SIGCHLD (Child exited) @ 0 (0) ---
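To make the point about the unchecked open() concrete, here is a rough sketch of the kind of check I would expect around that call. It is Python rather than Nagios's C, and it is not taken from the Nagios source; the function name `open_log_for_append` and the `opener` parameter are made up for illustration. The flags and mode are the ones shown in the trace.

```python
import errno
import os


def open_log_for_append(path, opener=os.open):
    """Open a log file with the flags seen in the strace.

    Returns (fd, None) on success or (None, reason) on failure,
    so the caller cannot silently ignore a failed open.
    `opener` is a hypothetical hook so the check can be exercised
    without an actual read-only file system.
    """
    flags = os.O_RDWR | os.O_APPEND | os.O_CREAT
    try:
        fd = opener(path, flags, 0o666)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # The case from the trace: the file system was remounted read-only.
            return None, "read-only file system"
        # Any other failure: report the system's own description.
        return None, os.strerror(exc.errno)
    return fd, None
```

A caller would test the second element of the result and raise an alert, or exit, instead of carrying on with its checks as the trace shows.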
As always, thanks for looking into this with me.
Sincerely,
Steven D. Morrey
On Tue, 2008-11-18 at 10:28 -0700, Steven D. Morrey wrote:
> Hello Everyone,
> Over the weekend my test implementation of Nagios stopped recording
> results.
> I checked with ps and it is still running along just fine but it appears
> to have lost the ability to write out results.
> After doing some checking I noticed that it stopped writing at 5pm last
> Saturday.
> On a hunch I checked my /var/messages and found this little beauty of an
> error.
>
> Nov 15 05:01:43 test-system kernel: SCSI error : return code = 0x20008
> Nov 15 05:01:43 test-system kernel: end_request: I/O error, dev sda, sector 27780153
> Nov 15 05:01:43 test-system kernel: buffer layer error at fs/buffer.c:2996
> Nov 15 05:01:43 test-system kernel: Call Trace:
> Nov 15 05:01:43 test-system kernel: [] drop_buffers+0x149/0x1c0
> Nov 15 05:01:43 test-system kernel: [] try_to_free_buffers+0x24/0x70
> Nov 15 05:01:43 test-system kernel: [] reiserfs_releasepage+0x5c/0xa0 [reiserfs]
> Nov 15 05:01:43 test-system kernel: [] reiserfs_releasepage+0x0/0xa0 [reiserfs]
> Nov 15 05:01:43 test-system kernel: [] try_to_release_page+0x35/0x50
> Nov 15 05:01:43 test-system kernel: [] reiserfs_invalidatepage+0x15c/0x1b0 [reiserfs]
> Nov 15 05:01:43 test-system kernel: [] do_invalidatepage+0x14/0x30
> Nov 15 05:01:43 test-system kernel: [] truncate_complete_page+0x9e/0xc0
> Nov 15 05:01:43 test-system kernel: [] truncate_inode_pages+0xa3/0x300
> Nov 15 05:01:43 test-system kernel: [] reiserfs_delete_inode+0x0/0xdc [reiserfs]
> Nov 15 05:01:43 t
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]