
Think I broke Nagios already

Posted: Wed Jul 25, 2012 2:52 pm
by brien.crean
Hi all,

I had Nagios working fine on Voyage (Debian) Linux. I then tried to follow this guide to setting up SMS notifications:

http://www.jonathangazeley.com/2009/08/ ... th-nagios/

But I think I broke Nagios while trying to get that running. I had to update CPAN and install gcc, make and build-essential to get the Perl module to install.

I think Apache's configuration was altered while I was installing the various packages, and I thought I might have a permissions issue, but I can't even change the permissions of the files mentioned in the below log file. I am getting Input/Output errors. I created test files in those directories to see if it was disk corruption.

Here is my /var/log/nagios3/nagios.log file:

[1343244722] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244722] Error: Unable to rename file '/var/cache/nagios3/nagios.tmp4H62m8' to '/var/cache/nagios3/status.dat': Input/output error
[1343244722] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244726] Caught SIGTERM, shutting down...
[1343244726] Successfully shutdown... (PID=6452)
[1343244726] Nagios 3.2.1 starting... (PID=6572)
[1343244726] Local time is Wed Jul 25 20:32:06 IST 2012
[1343244726] LOG VERSION: 2.0
[1343244726] Finished daemonizing... (New PID=6573)
[1343244726] Error: Unable to rename file '/var/cache/nagios3/nagios.tmprlT48L' to '/var/cache/nagios3/status.dat': Input/output error
[1343244726] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244736] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244736] Error: Unable to rename file '/var/cache/nagios3/nagios.tmp9fVLF3' to '/var/cache/nagios3/status.dat': Input/output error
[1343244736] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244746] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244746] Error: Unable to rename file '/var/cache/nagios3/nagios.tmpMKIzO9' to '/var/cache/nagios3/status.dat': Input/output error
[1343244746] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
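
For anyone reading a log like this, the bracketed numbers are Unix epoch seconds. Here is a minimal Python sketch, assuming only the `[timestamp] message` layout seen above (the names `parse_line` and `count_io_errors` are made up for illustration), that converts them to UTC and counts the Input/output errors:

```python
import re
from datetime import datetime, timezone

# Assumed layout, taken from the log above: "[<epoch seconds>] <message>"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def parse_line(line):
    """Return (UTC datetime, message) for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return when, match.group(2)


def count_io_errors(lines):
    """Count the lines whose message ends with 'Input/output error'."""
    total = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1].endswith("Input/output error"):
            total += 1
    return total


# Possible usage on the real file (path as given in the post):
#     with open("/var/log/nagios3/nagios.log") as log:
#         print(count_io_errors(log))
```

Given the line `[1343244726] Nagios 3.2.1 starting... (PID=6572)`, `parse_line` returns 19:32:06 UTC on 25 Jul 2012, which agrees with the 20:32:06 IST local time that the log itself reports.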

Any help would be much appreciated!

Thanks

Re: Think I broke Nagios already

Posted: Thu Jul 26, 2012 12:52 am
by gdeber
Hi!
brien.crean wrote:but I can't even change the permissions of the files mentioned in the below log file. I am getting Input/Output errors.
Also take a look at the dmesg output after the Input/Output error.
Maybe you can try fsck.<yourFStype> to see if there's some fs error, and badblocks to test disk for bad sectors.

Have a nice day
Debe

Re: Think I broke Nagios already

Posted: Thu Jul 26, 2012 8:30 am
by brien.crean
gdeber wrote: Maybe you can try fsck.<yourFStype> to see if there's some fs error, and badblocks to test disk for bad sectors.
Thanks Debe. I am running Linux on an Alix with a CF card. I ran fsck.ext2 on the filesystem and I got:

root@voyage:~# fsck.ext2 -n /dev/hda1
e2fsck 1.41.12 (17-May-2010)
Warning! /dev/hda1 is mounted.
ROOT_FS contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'ch5zlMa.ok' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82243. Clear? no

Entry 'ch5zlMa' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82170. Clear? no

Entry 'cB9mGcg.ok' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82262. Clear? no

Entry 'cB9mGcg' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82261. Clear? no

Entry 'status.dat' in /var/cache/nagios3 (82097) has deleted/unused inode 82181. Clear? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -338869 -(341382--341385) -(341998--341999) -(343125--343127)
Fix? no

Free blocks count wrong (352235, counted=352242).
Fix? no

Inode bitmap differences: -82170 -82181 -(82243--82244) -(82247--82249)
Fix? no

Free inodes count wrong (100603, counted=100604).
Fix? no


ROOT_FS: ********** WARNING: Filesystem still has errors **********

ROOT_FS: 22277/122880 files (0.5% non-contiguous), 138835/491070 blocks
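
The "deleted/unused inode" entries in this output name the same files that nagios.log complains about (`ch5zlMa` and `status.dat`). A rough Python sketch, assuming the exact e2fsck wording shown above (the name `damaged_entries` is hypothetical), pulls those paths out of saved fsck output so they can be compared against the log:

```python
import re

# Pattern follows the e2fsck 1.41.12 wording in the output above;
# other versions may phrase this message differently.
ENTRY_RE = re.compile(
    r"^Entry '(?P<name>[^']+)' in (?P<dir>\S+) \((?P<dir_inode>\d+)\) "
    r"has deleted/unused inode (?P<inode>\d+)\."
)


def damaged_entries(fsck_output):
    """Return (full path, inode number) for each deleted/unused inode entry."""
    found = []
    for line in fsck_output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            path = match.group("dir") + "/" + match.group("name")
            found.append((path, int(match.group("inode"))))
    return found
```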


Looks like a filesystem/bad block issue

badblocks didn't return any errors.

Can I run fsck.ext2 while the file system is mounted or would I be better removing the CF card and connecting it via USB to another Linux server and then run fsck.ext2?

Thanks
Brien

Re: Think I broke Nagios already

Posted: Thu Jul 26, 2012 9:15 am
by nscott
Do not run fsck on a mounted volume. Definitely unmount it and run fsck on it elsewhere, unmounted. Only tears and sadness will come from running fsck on a mounted volume.
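
One way to double-check before a repair run is to look the device up in `/proc/mounts`. This is a minimal Python sketch, not a complete safety check: the function names are hypothetical, and it only compares the device name as written, so a root filesystem that the kernel lists under another name (for example `/dev/root`) would be missed.

```python
def mounted_at(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def looks_unmounted(device, mounts_path="/proc/mounts"):
    """True if `device` has no entry in the mounts file (Linux only)."""
    with open(mounts_path) as mounts:
        return not mounted_at(device, mounts.read())


# Possible usage before running fsck on the CF card from another machine
# (the device name here is only an example):
#     if looks_unmounted("/dev/sdb1"):
#         print("no mount entry found for /dev/sdb1")
```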

Re: Think I broke Nagios already

Posted: Thu Jul 26, 2012 9:35 am
by brien.crean
nscott wrote:Do not run fsck on mounted volume, definitely unmount it and run fsck on it elsewhere, unmounted. Only tears and sadness will come from running fsck on a mounted volume.
Thanks nscott! I did unmount it and then I ran fsck on the filesystem. All is working fine now, until the next time! Thanks, everyone, for your help.

Re: Think I broke Nagios already

Posted: Fri Jul 27, 2012 12:53 am
by gdeber
nscott wrote:Only tears and sadness will come from running fsck on a mounted volume.
:lol: