Think I broke Nagios already

brien.crean
Posts: 6
Joined: Mon Jul 16, 2012 11:30 am


Post by brien.crean

Hi all,

I had Nagios working fine on Voyage Linux (Debian-based). I then tried to follow this guide to set up SMS notifications:

http://www.jonathangazeley.com/2009/08/ ... th-nagios/

But I think I broke Nagios while trying to get that running. I had to update CPAN and install gcc, make and build-essential to get the Perl module to install.

I think Apache's configuration was altered while I was installing the various packages. I thought I might have a permissions issue, but I can't even change the permissions of the files mentioned in the log below: I get Input/output errors. I created test files in those directories to see whether it was disk corruption.

Here is my /var/log/nagios3/nagios.log file:

[1343244722] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244722] Error: Unable to rename file '/var/cache/nagios3/nagios.tmp4H62m8' to '/var/cache/nagios3/status.dat': Input/output error
[1343244722] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244726] Caught SIGTERM, shutting down...
[1343244726] Successfully shutdown... (PID=6452)
[1343244726] Nagios 3.2.1 starting... (PID=6572)
[1343244726] Local time is Wed Jul 25 20:32:06 IST 2012
[1343244726] LOG VERSION: 2.0
[1343244726] Finished daemonizing... (New PID=6573)
[1343244726] Error: Unable to rename file '/var/cache/nagios3/nagios.tmprlT48L' to '/var/cache/nagios3/status.dat': Input/output error
[1343244726] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244736] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244736] Error: Unable to rename file '/var/cache/nagios3/nagios.tmp9fVLF3' to '/var/cache/nagios3/status.dat': Input/output error
[1343244736] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
[1343244746] Warning: Could not stat() check result file '/var/lib/nagios3/spool/checkresults/ch5zlMa'.
[1343244746] Error: Unable to rename file '/var/cache/nagios3/nagios.tmpMKIzO9' to '/var/cache/nagios3/status.dat': Input/output error
[1343244746] Error: Unable to update status data file '/var/cache/nagios3/status.dat': Input/output error
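The bracketed numbers at the start of each log line are Unix epoch timestamps (1343244726 is the restart that also logs "Local time is Wed Jul 25 20:32:06 IST 2012"). A minimal sketch for making such a log easier to read, assuming GNU `date` (which accepts `-d @SECONDS`); `nagios_log_readable` is a hypothetical helper name, not something shipped with Nagios:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): rewrite the leading
# "[epoch]" stamp of each Nagios log line as a UTC date/time.
# Assumes GNU date, which accepts "-d @SECONDS".
nagios_log_readable() {
    # "|| [ -n ... ]" keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}     # drop the leading "["
                ts=${ts%%\]*}     # keep everything before the first "]"
                rest=${line#*\]}  # text after the first "]"
                printf '[%s]%s\n' \
                    "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"  # no stamp: pass through unchanged
                ;;
        esac
    done
}
```

For example, `grep 'Input/output error' /var/log/nagios3/nagios.log | nagios_log_readable` shows only the failing writes. Times come out in UTC, one hour behind the IST local time Nagios printed at startup (1343244726 becomes 2012-07-25 19:32:06).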

Any help would be much appreciated!

Thanks
Last edited by brien.crean on Thu Jul 26, 2012 9:39 am, edited 1 time in total.
gdeber
Posts: 4
Joined: Wed Jul 25, 2012 2:38 am

Re: Think I broke Nagios already

Post by gdeber

Hi!
brien.crean wrote: but I can't even change the permissions of the files mentioned in the below log file. I am getting Input/Output errors.
Also take a look at the dmesg output right after the I/O error.
Maybe you can try fsck.<yourFStype> to see if there's some fs error, and badblocks to test disk for bad sectors.
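A minimal sketch of these read-only checks, assuming the card's root partition is `/dev/hda1` as in the fsck output later in this thread (override `DEV` otherwise). The `run`/`DRY_RUN` wrapper and the `check_disk` name are hypothetical conveniences for previewing the commands; the real checks need root:

```shell
#!/bin/sh
# Read-only disk checks for the CF card. DEV is an assumption:
# /dev/hda1 is the root partition in the fsck output in this thread.
DEV=${DEV:-/dev/hda1}

# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

check_disk() {
    # Recent kernel messages typically name the device involved in an I/O error.
    run dmesg | tail -n 20
    # -n: open read-only and answer "no" to every question. On a mounted
    # filesystem the report can still be inaccurate, since it keeps changing.
    run fsck.ext2 -n "$DEV"
    # badblocks' default mode is a non-destructive read-only scan;
    # -s shows progress, -v reports verbosely.
    run badblocks -sv "$DEV"
}
```

Preview with `DRY_RUN=1 check_disk`, then run `check_disk` as root.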

Have a nice day
Debe
brien.crean
Posts: 6
Joined: Mon Jul 16, 2012 11:30 am

Re: Think I broke Nagios already

Post by brien.crean

gdeber wrote: Maybe you can try fsck.<yourFStype> to see if there's some fs error, and badblocks to test disk for bad sectors.
Thanks, Debe. I am running Linux on an ALIX board with a CF card. I ran fsck.ext2 on the filesystem and got:

root@voyage:~# fsck.ext2 -n /dev/hda1
e2fsck 1.41.12 (17-May-2010)
Warning! /dev/hda1 is mounted.
ROOT_FS contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'ch5zlMa.ok' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82243. Clear? no

Entry 'ch5zlMa' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82170. Clear? no

Entry 'cB9mGcg.ok' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82262. Clear? no

Entry 'cB9mGcg' in /var/lib/nagios3/spool/checkresults (82096) has deleted/unused inode 82261. Clear? no

Entry 'status.dat' in /var/cache/nagios3 (82097) has deleted/unused inode 82181. Clear? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -338869 -(341382--341385) -(341998--341999) -(343125--343127)
Fix? no

Free blocks count wrong (352235, counted=352242).
Fix? no

Inode bitmap differences: -82170 -82181 -(82243--82244) -(82247--82249)
Fix? no

Free inodes count wrong (100603, counted=100604).
Fix? no


ROOT_FS: ********** WARNING: Filesystem still has errors **********

ROOT_FS: 22277/122880 files (0.5% non-contiguous), 138835/491070 blocks


Looks like a filesystem/bad-block issue.

badblocks didn't return any errors.

Can I run fsck.ext2 while the filesystem is mounted, or would I be better off removing the CF card, connecting it via USB to another Linux server and running fsck.ext2 there?

Thanks
Brien
nscott
Posts: 1040
Joined: Wed May 11, 2011 8:54 am

Re: Think I broke Nagios already

Post by nscott

Do not run fsck on a mounted volume. Definitely unmount it and run fsck on it elsewhere, unmounted. Only tears and sadness will come from running fsck on a mounted volume.
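A sketch of that offline repair, for a second Linux machine with the CF card in a USB reader. `/dev/sdb1` is a hypothetical device name (check `dmesg` after plugging the reader in), and `repair_offline` plus the `DRY_RUN` wrapper are illustrative names, not standard tools:

```shell
#!/bin/sh
# Offline repair, run on a second Linux machine with the CF card in a
# USB reader. /dev/sdb1 is a hypothetical device name: check dmesg
# after plugging the reader in and override DEV accordingly.
DEV=${DEV:-/dev/sdb1}

# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

repair_offline() {
    # Desktops may auto-mount the card; e2fsck must see it unmounted.
    if grep -qs "^$DEV " /proc/mounts; then
        run umount "$DEV" || return 1
    fi
    # -f forces a full check even if the filesystem is marked clean.
    # Without -n, -p or -y, e2fsck asks before applying each fix.
    run e2fsck -f "$DEV"
}
```

Preview with `DRY_RUN=1 repair_offline`, then run `repair_offline` as root and answer the prompts.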
Nicholas Scott
Former Nagios employee
brien.crean
Posts: 6
Joined: Mon Jul 16, 2012 11:30 am

Re: Think I broke Nagios already

Post by brien.crean

nscott wrote: Do not run fsck on a mounted volume. Definitely unmount it and run fsck on it elsewhere, unmounted. Only tears and sadness will come from running fsck on a mounted volume.
Thanks, nscott! I unmounted it and then ran fsck on the filesystem. Everything is working fine now, until the next time! Thanks, everyone, for your help.
gdeber
Posts: 4
Joined: Wed Jul 25, 2012 2:38 am

Re: Think I broke Nagios already

Post by gdeber

nscott wrote: Only tears and sadness will come from running fsck on a mounted volume.
:lol: