Nagios won't stay up
Posted: Fri Jul 26, 2013 11:49 am
by kinnema
Hi,
Last night our Nagios process stopped, and since then I've been unable to keep it up. I can start it, but it stops almost immediately. There isn't much in the logs (unless there's a better log to look at than /var/log/messages or /usr/local/nagios/var/nagios.log). I did find a message from last night in /var/log/messages:
Jul 26 05:39:52 uitlnagp01 ndo2db: Error: mysql_query() failed for 'UPDATE nagios_conninfo SET disconnect_time=NOW(), last_checkin_time=NOW(), data_end_time=FROM_UNIXTIME(0), bytes_processed='0', lines_processed='0', entries_processed='0' WHERE conninfo_id='0''
Jul 26 05:39:52 uitlnagp01 ndo2db: mysql_error: 'MySQL server has gone away'
Jul 26 05:39:52 uitlnagp01 ndo2db: Error: Connection to MySQL database has been lost!
This is about the time that the messages log stops showing entries for any NRPE processes and check results. When I began looking into the issue, mysqld was still running; I restarted it and ran the repair script just in case, but I still can't get Nagios to stay up. Any ideas?
Thanks!
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 1:00 pm
by lmiltchev
Run the following command and show the output (in code wraps):
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 1:20 pm
by kinnema
Here's the output
Code: Select all
[root@uitlnagp01 ~]$ tail -n 50 /var/log/mysqld.log
130726 12:14:28 mysqld ended
130726 12:18:15 mysqld started
130726 12:18:15 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:18:15 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:18:16 InnoDB: Started; log sequence number 0 43695
130726 12:18:16 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
130726 12:30:00 [Note] /usr/libexec/mysqld: Normal shutdown
130726 12:30:02 InnoDB: Starting shutdown...
130726 12:30:03 InnoDB: Shutdown completed; log sequence number 0 43695
130726 12:30:03 [Note] /usr/libexec/mysqld: Shutdown complete
130726 12:30:03 mysqld ended
130726 12:33:56 mysqld started
130726 12:33:56 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:33:56 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:33:56 InnoDB: Started; log sequence number 0 43695
130726 12:33:57 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
130726 12:35:13 [Note] /usr/libexec/mysqld: Normal shutdown
130726 12:35:15 InnoDB: Starting shutdown...
130726 12:35:16 InnoDB: Shutdown completed; log sequence number 0 43695
130726 12:35:16 [Note] /usr/libexec/mysqld: Shutdown complete
130726 12:35:16 mysqld ended
130726 12:35:17 mysqld started
130726 12:35:17 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:35:17 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 12:35:17 InnoDB: Started; log sequence number 0 43695
130726 12:35:17 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
130726 14:17:10 [Note] /usr/libexec/mysqld: Normal shutdown
130726 14:17:12 InnoDB: Starting shutdown...
130726 14:17:13 InnoDB: Shutdown completed; log sequence number 0 43705
130726 14:17:13 [Note] /usr/libexec/mysqld: Shutdown complete
130726 14:17:13 mysqld ended
130726 14:17:13 mysqld started
130726 14:17:13 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 14:17:13 [Warning] option 'max_join_size': unsigned value 18446744073709551615 adjusted to 4294967295
130726 14:17:14 InnoDB: Started; log sequence number 0 43705
130726 14:17:14 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
I just restored all of Nagios XI from backup and the issue still exists.
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 1:28 pm
by lmiltchev
Do you have any config errors?
Code: Select all
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Any new errors in the system or nagios logs?
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 1:45 pm
by kinnema
No config errors (plenty of warnings, though).
I think we may have found the issue... fingers crossed, anyway. I found that the mysqld log file and a dnx debug log file were over 2GB, so I removed those, and now it seems to be staying up. At least for the moment.
I'll post back in a bit and let you know either way. Thanks for the help!
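For anyone hitting the same symptom, a quick way to look for oversized log files is a size-filtered `find`. The sketch below is only an illustration: the function name `find_big_logs` and the directories in the example call are assumptions, not something taken from this thread beyond the two log paths already mentioned, and the thread does not establish why the large files stopped Nagios.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files of at least <min_size_bytes>
# under one or more directories.
# Usage: find_big_logs <min_size_bytes> <dir> [<dir> ...]
# The threshold is expected to be 1 or greater.
find_big_logs() {
    local min_bytes="$1"
    shift
    # "-size +Nc" matches files strictly larger than N bytes,
    # so subtract one to make the threshold inclusive.
    find "$@" -type f -size "+$((min_bytes - 1))c" 2>/dev/null
}

# Example call (directories are assumptions; adjust to your install):
# find_big_logs $((1024 * 1024 * 1024)) /var/log /usr/local/nagios/var
```

The example call would list anything of 1 GiB or more, which leaves some margin before a file reaches the 2GB mark described above.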
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 1:51 pm
by lmiltchev
H-m-m. I'm glad it's working now but I'm just wondering if you are running out of disk space... What's the output of the following commands?
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 2:24 pm
by kinnema
Disk space isn't an issue:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      3.9G  2.7G  1.1G  72% /
/dev/mapper/VolGroup00-LogVol02
                      3.9G  2.7G  1.1G  72% /home
/dev/mapper/VolGroup00-LogVol06
                      632G   78G  522G  13% /opt
/dev/mapper/VolGroup00-LogVol05
                      9.7G  5.0G  4.3G  54% /var
/dev/mapper/VolGroup00-LogVol01
                      3.9G  829M  2.9G  23% /usr/local
/dev/sda1             251M   32M  207M  14% /boot
tmpfs                  24G     0   24G   0% /dev/shm
tmpfs                 100M   39M   62M  39% /var/nagiosramdisk
$ df -i
Filesystem              Inodes   IUsed     IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       1048576  130285    918291   13% /
/dev/mapper/VolGroup00-LogVol02
                       1048576   18497   1030079    2% /home
/dev/mapper/VolGroup00-LogVol06
                     171016192   50956 170965236    1% /opt
/dev/mapper/VolGroup00-LogVol05
                       2621440   17408   2604032    1% /var
/dev/mapper/VolGroup00-LogVol01
                       1048576   40368   1008208    4% /usr/local
/dev/sda1                66264      42     66222    1% /boot
tmpfs                    94734       1     94733    1% /dev/shm
tmpfs                    94734     535     94199    1% /var/nagiosramdisk
Things seem to be stable now.
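Reading `df` output by eye works for a one-off check like the one above. As a rough sketch, the same check can be scripted by filtering `df -P` output (one line per filesystem, so long device names do not wrap) against a percentage threshold. The function name `usage_over` and the threshold of 80 are assumptions for illustration, and the filter assumes mount points without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "df -P" style output on stdin and print
# "<mount point> <use%>" for every filesystem at or above <percent>.
# Usage: df -P | usage_over <percent>
usage_over() {
    awk -v limit="$1" 'NR > 1 {          # skip the header line
        pct = $5                         # fifth column: Capacity / IUse%
        sub(/%/, "", pct)                # strip the percent sign
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Example calls (block usage, then inode usage):
# df -P | usage_over 80
# df -P -i | usage_over 80
```

Run against the figures posted above, neither call would print anything, since the fullest filesystem is at 72%.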
Re: Nagios won't stay up
Posted: Fri Jul 26, 2013 2:36 pm
by lmiltchev
OK, let us know if you have any more issues. Thanks!