Nagios database seems hosed
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Nagios database seems hosed
Well that seems to have done it! Looks like everything's chugging along nicely again. No idea what would have caused this to happen, especially over the weekend when no changes were being made, but meh. Thanks for all the help with this!
Re: Nagios database seems hosed
snapon_admin wrote:Well that seems to have done it! Looks like everything's chugging along nicely again. No idea what would have caused this to happen, especially over the weekend when no changes were being made, but meh. Thanks for all the help with this!
I'll leave this one open, just in case... let's hope we don't have to worry about it.
Glad it's working though!
Former Nagios Employee.
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Nagios database seems hosed
A few days in and everything seems ok. I had a thought though, any idea why this would have happened? It happened during the weekend when no changes were being made and it happened on our test server so it was important to get working but not "oh god the CIO is breathing down my neck about this" important. I'd like to have an idea of what caused the issue so that I never have to run into scenario 2.
Re: Nagios database seems hosed
It is hard to say what caused it without going through all of the log files, but your system looks really tight on free memory.
If the system gets low on free memory, the Linux kernel could shut down some processes, and that could have caused it.
Try adding some memory to the system.
Be sure to check out our Knowledgebase for helpful articles and solutions!
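If you want to confirm or rule out the low-memory theory, the kernel logs a message whenever its OOM killer terminates a process. Here is a minimal sketch for searching a syslog-style file for those messages. The function name `oom_events` is made up for illustration, and the default path `/var/log/messages` is an assumption based on a typical RHEL/CentOS install; the exact wording of the kernel messages varies between kernel versions, so the pattern is deliberately loose.

```shell
# Hypothetical helper: print lines that look like OOM-killer activity.
# Argument 1 (optional): log file to search; defaults to /var/log/messages,
# which is an assumption about where kernel messages land on this system.
oom_events() {
    log_file="${1:-/var/log/messages}"
    # Case-insensitive match on the phrases the kernel typically logs.
    grep -Ei 'out of memory|oom-killer|killed process' "$log_file"
}
```

Run it as `oom_events` (or `oom_events /var/log/messages-YYYYMMDD` for a rotated file, if your logs rotate that way). No output for the weekend in question would suggest the OOM killer was not involved.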
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Nagios database seems hosed
That is interesting. This server has 218 hosts and 3819 services and is using more memory than our prod server that has 1102 hosts and 12777 services. Why would that be?
Re: Nagios database seems hosed
Can you run the following and post back the output?
Code: Select all
free
ps aux
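Since `ps aux` prints one line per process, it can be easier to compare two servers by totalling the RSS column per user. The following is only a sketch: the function name `rss_by_user` is made up, and it assumes the standard `ps aux` column order (USER first, RSS sixth, in KiB). RSS counts shared pages once per process, so the totals overstate real usage for things like Apache workers.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print
# "user total_rss_kib" pairs, largest first.
rss_by_user() {
    awk 'NR > 1 { rss[$1] += $6 }                      # skip the header row
         END { for (u in rss) printf "%s %d\n", u, rss[u] }' |
        sort -k2,2nr
}
```

Usage would be `ps aux | rss_by_user` on both the test and prod servers, then compare the top few lines.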
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Nagios database seems hosed
Code: Select all
[root@keno-ngos-01-pv ~]# free
             total       used       free     shared    buffers     cached
Mem:       8054044    7357472     696572          0     172880    1477004
-/+ buffers/cache:    5707588    2346456
Swap:      2064376      11348    2053028
[root@keno-ngos-01-pv ~]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 19232 1112 ? Ss Aug31 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S Aug31 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Aug31 1:45 [migration/0]
root 4 0.0 0.0 0 0 ? S Aug31 1:36 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Aug31 0:00 [migration/0]
root 6 0.0 0.0 0 0 ? S Aug31 1:43 [watchdog/0]
root 7 0.0 0.0 0 0 ? S Aug31 1:44 [migration/1]
root 8 0.0 0.0 0 0 ? S Aug31 0:00 [migration/1]
root 9 0.0 0.0 0 0 ? S Aug31 2:23 [ksoftirqd/1]
root 10 0.0 0.0 0 0 ? S Aug31 1:18 [watchdog/1]
root 11 0.0 0.0 0 0 ? S Aug31 1:51 [migration/2]
root 12 0.0 0.0 0 0 ? S Aug31 0:00 [migration/2]
root 13 0.0 0.0 0 0 ? S Aug31 2:09 [ksoftirqd/2]
root 14 0.0 0.0 0 0 ? S Aug31 1:05 [watchdog/2]
root 15 0.0 0.0 0 0 ? S Aug31 1:47 [migration/3]
root 16 0.0 0.0 0 0 ? S Aug31 0:00 [migration/3]
root 17 0.0 0.0 0 0 ? S Aug31 2:02 [ksoftirqd/3]
root 18 0.0 0.0 0 0 ? S Aug31 1:03 [watchdog/3]
root 19 0.1 0.0 0 0 ? S Aug31 8:55 [events/0]
root 20 0.1 0.0 0 0 ? S Aug31 8:21 [events/1]
root 21 0.1 0.0 0 0 ? S Aug31 7:06 [events/2]
root 22 0.1 0.0 0 0 ? S Aug31 6:48 [events/3]
root 23 0.0 0.0 0 0 ? S Aug31 0:00 [cgroup]
root 24 0.0 0.0 0 0 ? S Aug31 0:00 [khelper]
root 25 0.0 0.0 0 0 ? S Aug31 0:00 [netns]
root 26 0.0 0.0 0 0 ? S Aug31 0:00 [async/mgr]
root 27 0.0 0.0 0 0 ? S Aug31 0:00 [pm]
root 28 0.0 0.0 0 0 ? S Aug31 0:34 [sync_supers]
root 29 0.0 0.0 0 0 ? S Aug31 0:22 [bdi-default]
root 30 0.0 0.0 0 0 ? S Aug31 0:00 [kintegrityd/0]
root 31 0.0 0.0 0 0 ? S Aug31 0:00 [kintegrityd/1]
root 32 0.0 0.0 0 0 ? S Aug31 0:00 [kintegrityd/2]
root 33 0.0 0.0 0 0 ? S Aug31 0:00 [kintegrityd/3]
root 34 0.1 0.0 0 0 ? S Aug31 5:10 [kblockd/0]
root 35 0.1 0.0 0 0 ? S Aug31 7:25 [kblockd/1]
root 36 0.1 0.0 0 0 ? S Aug31 7:26 [kblockd/2]
root 37 0.1 0.0 0 0 ? S Aug31 7:02 [kblockd/3]
root 38 0.0 0.0 0 0 ? S Aug31 0:00 [kacpid]
root 39 0.0 0.0 0 0 ? S Aug31 0:00 [kacpi_notify]
root 40 0.0 0.0 0 0 ? S Aug31 0:00 [kacpi_hotplug]
root 41 0.0 0.0 0 0 ? S Aug31 0:00 [ata/0]
root 42 0.0 0.0 0 0 ? S Aug31 0:00 [ata/1]
root 43 0.0 0.0 0 0 ? S Aug31 0:00 [ata/2]
root 44 0.0 0.0 0 0 ? S Aug31 0:00 [ata/3]
root 45 0.0 0.0 0 0 ? S Aug31 0:00 [ata_aux]
root 46 0.0 0.0 0 0 ? S Aug31 0:00 [ksuspend_usbd]
root 47 0.0 0.0 0 0 ? S Aug31 0:00 [khubd]
root 48 0.0 0.0 0 0 ? S Aug31 0:00 [kseriod]
root 49 0.0 0.0 0 0 ? S Aug31 0:00 [md/0]
root 50 0.0 0.0 0 0 ? S Aug31 0:00 [md/1]
root 51 0.0 0.0 0 0 ? S Aug31 0:00 [md/2]
root 52 0.0 0.0 0 0 ? S Aug31 0:00 [md/3]
root 53 0.0 0.0 0 0 ? S Aug31 0:00 [md_misc/0]
root 54 0.0 0.0 0 0 ? S Aug31 0:00 [md_misc/1]
root 55 0.0 0.0 0 0 ? S Aug31 0:00 [md_misc/2]
root 56 0.0 0.0 0 0 ? S Aug31 0:00 [md_misc/3]
root 57 0.0 0.0 0 0 ? S Aug31 0:00 [khungtaskd]
root 58 0.0 0.0 0 0 ? S Aug31 0:38 [kswapd0]
root 59 0.0 0.0 0 0 ? SN Aug31 0:00 [ksmd]
root 60 0.0 0.0 0 0 ? SN Aug31 0:45 [khugepaged]
root 61 0.0 0.0 0 0 ? S Aug31 0:00 [aio/0]
root 62 0.0 0.0 0 0 ? S Aug31 0:00 [aio/1]
root 63 0.0 0.0 0 0 ? S Aug31 0:00 [aio/2]
root 64 0.0 0.0 0 0 ? S Aug31 0:00 [aio/3]
root 65 0.0 0.0 0 0 ? S Aug31 0:00 [crypto/0]
root 66 0.0 0.0 0 0 ? S Aug31 0:00 [crypto/1]
root 67 0.0 0.0 0 0 ? S Aug31 0:00 [crypto/2]
root 68 0.0 0.0 0 0 ? S Aug31 0:00 [crypto/3]
root 73 0.0 0.0 0 0 ? S Aug31 0:00 [kthrotld/0]
root 74 0.0 0.0 0 0 ? S Aug31 0:00 [kthrotld/1]
root 75 0.0 0.0 0 0 ? S Aug31 0:00 [kthrotld/2]
root 76 0.0 0.0 0 0 ? S Aug31 0:00 [kthrotld/3]
root 77 0.0 0.0 0 0 ? S Aug31 0:00 [pciehpd]
root 79 0.0 0.0 0 0 ? S Aug31 0:00 [kpsmoused]
root 80 0.0 0.0 0 0 ? S Aug31 0:00 [usbhid_resumer]
root 110 0.0 0.0 0 0 ? S Aug31 0:00 [kstriped]
root 292 0.0 0.0 0 0 ? S Aug31 0:00 [scsi_eh_0]
root 293 0.0 0.0 0 0 ? S Aug31 0:00 [scsi_eh_1]
root 305 0.1 0.0 0 0 ? S Aug31 6:37 [mpt_poll_0]
root 306 0.0 0.0 0 0 ? S Aug31 0:00 [mpt/0]
root 307 0.0 0.0 0 0 ? S Aug31 0:00 [scsi_eh_2]
root 374 0.0 0.0 0 0 ? S Aug31 0:00 [kdmflush]
root 376 0.0 0.0 0 0 ? S Aug31 0:00 [kdmflush]
root 393 0.1 0.0 0 0 ? S Aug31 8:25 [jbd2/dm-0-8]
root 394 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 395 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 396 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 397 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
apache 466 1.5 0.4 463884 38228 ? S Sep02 16:12 /usr/sbin/httpd
root 470 0.0 0.0 10908 312 ? S<s Aug31 0:00 /sbin/udevd -d
postgres 537 0.0 0.0 215460 7044 ? Ss Sep02 0:25 postgres: nagiosxi nagiosxi ::1(34468) idle
root 670 0.0 0.0 0 0 ? S Aug31 0:24 [vmmemctl]
root 705 0.0 0.0 0 0 ? S Aug31 0:00 [hd-audio0]
root 793 0.0 0.0 10904 256 ? S< Aug31 0:00 /sbin/udevd -d
root 821 0.0 0.0 0 0 ? S Aug31 0:00 [jbd2/sda1-8]
root 822 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 823 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 824 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 825 0.0 0.0 0 0 ? S Aug31 0:00 [ext4-dio-unwrit]
root 857 0.0 0.0 0 0 ? S Aug31 0:11 [kauditd]
root 1094 0.0 0.0 93200 748 ? S<sl Aug31 0:45 auditd
root 1110 0.0 0.0 249472 5008 ? Sl Aug31 0:31 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
dbus 1122 0.0 0.0 21404 368 ? Ss Aug31 0:00 dbus-daemon --system
root 1159 0.0 0.0 64400 624 ? Ss Aug31 0:00 /usr/sbin/sshd
root 1167 0.0 0.0 22096 640 ? Ss Aug31 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp 1175 0.0 0.0 30720 1548 ? Ss Aug31 0:01 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
postgres 1350 0.0 0.0 214048 5584 ? S Aug31 0:45 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
root 1353 1.5 0.0 0 0 ? S Aug31 70:59 [flush-253:0]
postgres 1354 0.0 0.0 177060 920 ? Ss Aug31 1:40 postgres: logger process
postgres 1356 0.1 0.0 214168 6708 ? Ss Aug31 6:05 postgres: writer process
postgres 1357 0.1 0.0 214048 1288 ? Ss Aug31 5:28 postgres: wal writer process
postgres 1358 0.0 0.0 214344 1464 ? Ss Aug31 2:28 postgres: autovacuum launcher process
postgres 1359 0.0 0.0 177328 1156 ? Ss Aug31 1:48 postgres: stats collector process
root 1437 0.0 0.0 79072 3320 ? Ss Aug31 0:07 /usr/libexec/postfix/master
postfix 1446 0.0 0.0 79324 3452 ? S Aug31 0:00 qmgr -l -t fifo -u
root 1455 0.0 0.0 117216 1244 ? Ss Aug31 0:28 crond
ajaxterm 1472 0.1 0.0 168208 7328 ? Sl Aug31 5:59 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
root 1562 0.0 0.0 4064 564 tty1 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty1
root 1564 0.0 0.0 4064 564 tty2 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty2
root 1566 0.0 0.0 10904 252 ? S< Aug31 0:00 /sbin/udevd -d
root 1567 0.0 0.0 4064 568 tty3 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty3
root 1569 0.0 0.0 4064 564 tty4 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty4
root 1571 0.0 0.0 4064 564 tty5 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty5
root 1573 0.0 0.0 4064 564 tty6 Ss+ Aug31 0:00 /sbin/mingetty /dev/tty6
apache 12068 1.4 0.4 463884 38604 ? S Sep02 14:25 /usr/sbin/httpd
postgres 12360 0.0 0.0 215460 7092 ? Ss Sep02 0:22 postgres: nagiosxi nagiosxi ::1(58975) idle
apache 17223 1.3 0.4 463900 38412 ? S 11:57 2:53 /usr/sbin/httpd
postgres 17351 0.0 0.0 215460 6832 ? Ss 11:57 0:03 postgres: nagiosxi nagiosxi ::1(44834) idle
root 18082 0.0 0.1 337720 13924 ? Ss Sep01 1:40 /usr/sbin/httpd
root 18169 0.0 0.0 108168 1552 ? S Sep01 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld
mysql 18271 5.1 0.7 2239728 61556 ? Sl Sep01 168:41 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/r
nagios 18312 0.0 0.0 50192 856 ? Ss Sep01 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 18323 0.0 0.0 368888 972 ? S Sep01 1:05 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
apache 25971 1.1 0.4 464692 39056 ? S 12:05 2:20 /usr/sbin/httpd
postgres 26060 0.0 0.0 215504 6852 ? Ss 12:05 0:03 postgres: nagiosxi nagiosxi ::1(50735) idle
apache 26243 1.5 0.4 464732 39412 ? S Sep02 16:56 /usr/sbin/httpd
apache 26250 1.2 0.4 464672 39144 ? S 12:05 2:26 /usr/sbin/httpd
postgres 26341 0.0 0.0 215504 7068 ? Ss Sep02 0:28 postgres: nagiosxi nagiosxi ::1(35631) idle
postgres 26362 0.0 0.0 215460 6836 ? Ss 12:05 0:03 postgres: nagiosxi nagiosxi ::1(50948) idle
apache 27264 0.3 0.3 448636 26072 ? S 15:06 0:04 /usr/sbin/httpd
postgres 28153 0.0 0.0 215420 5624 ? Ss 15:07 0:00 postgres: nagiosxi nagiosxi ::1(45368) idle
apache 31807 1.1 0.4 464616 39108 ? S 12:10 2:11 /usr/sbin/httpd
postgres 31856 0.0 0.0 215504 6856 ? Ss 12:10 0:02 postgres: nagiosxi nagiosxi ::1(54665) idle
postfix 43794 0.0 0.0 79152 3336 ? S 14:21 0:00 pickup -l -t fifo -u
root 48874 0.0 0.0 140100 1728 ? S 15:27 0:00 CROND
root 48875 0.0 0.0 140100 1728 ? S 15:27 0:00 CROND
root 48876 0.0 0.0 140100 1728 ? S 15:27 0:00 CROND
root 48877 0.0 0.0 140100 1728 ? S 15:27 0:00 CROND
root 48878 0.0 0.0 140100 1728 ? S 15:27 0:00 CROND
nagios 48879 0.0 0.0 9196 1184 ? Ss 15:27 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios 48880 0.0 0.0 9196 1184 ? Ss 15:27 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios 48883 0.0 0.0 9196 1188 ? Ss 15:27 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios 48885 1.8 0.2 224688 22520 ? S 15:27 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 48886 0.0 0.0 9196 1188 ? Ss 15:27 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios 48888 1.7 0.2 224556 22292 ? S 15:27 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 48889 0.0 0.0 9196 1184 ? Ss 15:27 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 48890 2.8 0.3 231976 25300 ? S 15:27 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios 48893 1.7 0.2 224984 22800 ? S 15:27 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 48894 2.4 0.3 232500 30272 ? S 15:27 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 48902 0.1 0.0 215456 5636 ? Ss 15:27 0:00 postgres: nagiosxi nagiosxi ::1(59658) idle
postgres 48904 0.0 0.0 215420 5308 ? Ss 15:27 0:00 postgres: nagiosxi nagiosxi ::1(59659) idle
postgres 48911 0.0 0.0 215420 5348 ? Ss 15:27 0:00 postgres: nagiosxi nagiosxi ::1(59661) idle
postgres 48913 0.0 0.0 215456 5876 ? Ss 15:27 0:00 postgres: nagiosxi nagiosxi ::1(59662) idle
postgres 48916 0.0 0.0 215420 5332 ? Ss 15:27 0:00 postgres: nagiosxi nagiosxi ::1(59663) idle
root 49005 0.6 0.0 98160 4088 ? Ss 15:27 0:00 sshd: root@pts/0
root 49037 0.1 0.0 108300 1956 pts/0 Ss 15:27 0:00 -bash
nagios 49126 0.0 0.0 41420 2884 ? S 15:27 0:00 /usr/local/nagios/libexec/check_nrpe -H kendbms05d.snapon.com -u -p 5668 -t 70 -c check_init_service -a svc:/system/filesy
root 49127 0.0 0.0 110232 1176 pts/0 R+ 15:27 0:00 ps aux
nagios 56561 2.2 0.1 34860 14172 ? Ss Sep01 70:30 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 56563 0.2 0.0 10016 992 ? S Sep01 6:11 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56564 0.2 0.0 10016 988 ? S Sep01 6:24 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56565 0.2 0.0 10016 1044 ? S Sep01 6:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56566 0.1 0.0 10016 984 ? S Sep01 6:03 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56567 0.2 0.0 10016 992 ? S Sep01 6:25 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56568 0.2 0.0 10016 984 ? S Sep01 6:17 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 56573 0.1 0.0 50192 1592 ? S Sep01 4:59 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 56574 1.0 0.1 61240 12756 ? S Sep01 31:19 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 56708 0.0 0.1 34344 8796 ? S Sep01 2:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
apache 58498 1.5 0.4 464692 39416 ? S Sep02 16:29 /usr/sbin/httpd
postgres 58628 0.0 0.0 215460 7068 ? Ss Sep02 0:24 postgres: nagiosxi nagiosxi ::1(57778) idle
Re: Nagios database seems hosed
Nothing looks bad in that output, except that the "used" figure on the -/+ buffers/cache line is a little high and the system has swapped at some point.
Run this.
Code: Select all
ipcs
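For reference, the "buffers/cache used" number comes from the `-/+ buffers/cache` row of the `free` output posted earlier: it is memory used by applications once buffers and page cache are excluded. A rough sketch of turning that into a percentage follows. The function name `app_mem_pct` is made up, and the parsing assumes the older procps layout shown in this thread; newer versions of `free` drop the `-/+` row in favour of an `available` column, so this would print nothing there.

```shell
# Hypothetical helper: read `free` output (older procps layout) on stdin
# and print the percentage of RAM used by applications, excluding
# buffers and page cache.
app_mem_pct() {
    awk '
        $1 == "Mem:" { total = $2 }   # total RAM in KiB
        $1 == "-/+"  { used = $3 }    # used minus buffers/cache, in KiB
        END { if (total > 0 && used > 0) printf "%.1f\n", used * 100 / total }
    '
}
```

Usage: `free | app_mem_pct`. With the numbers posted in this thread (5707588 used of 8054044 total) it prints 70.9.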
- snapon_admin
- Posts: 952
- Joined: Mon Jun 10, 2013 10:39 am
- Location: Kenosha, WI
- Contact:
Re: Nagios database seems hosed
Code: Select all
[root@keno-ngos-01-pv ~]# ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 0 postgres 600 37879808 18
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 0 root 600 1
0x00000000 65537 root 600 1
0x0052e2c1 98306 postgres 600 17
0x0052e2c2 131075 postgres 600 17
0x0052e2c3 163844 postgres 600 17
0x0052e2c4 196613 postgres 600 17
0x0052e2c5 229382 postgres 600 17
0x0052e2c6 262151 postgres 600 17
0x0052e2c7 294920 postgres 600 17
0x00000000 425993 apache 600 1
0x00000000 458762 apache 600 1
0x00000000 491531 apache 600 1
------ Message Queues --------
key msqid owner perms used-bytes messages
0x1d000002 2195456 nagios 600 0 0
Re: Nagios database seems hosed
That looks good. Can't say what it is......