Re: SNMP traps being received but not updating in Nagios
Posted: Thu Sep 07, 2017 4:43 pm
by scottwilkerson
The duration only resets when the state changes, so you can receive 1000 OK traps and the duration will still say 2 hours. It measures how long the service has been in its current state, not the time since the last trap was received.
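As a rough illustration of that point, the sketch below computes a duration from the time of the last state change rather than from the time of the last check result. The helper name format_duration and the timestamps are made up for the example, and the "0d 2h 0m 0s" layout is only an approximation of what the web interface shows.

```shell
#!/bin/sh
# Hypothetical helper: format_duration LAST_STATE_CHANGE NOW
# Both arguments are Unix epoch seconds. Prints the elapsed time as "Dd Hh Mm Ss".
format_duration() {
    elapsed=$(( $2 - $1 ))
    days=$(( elapsed / 86400 ))
    hours=$(( (elapsed % 86400) / 3600 ))
    mins=$(( (elapsed % 3600) / 60 ))
    secs=$(( elapsed % 60 ))
    printf '%dd %dh %dm %ds\n' "$days" "$hours" "$mins" "$secs"
}

# Made-up timeline: the service entered OK at t=1000. OK traps keep arriving,
# but none of them changes the state, so last_state_change stays at 1000.
last_state_change=1000
format_duration "$last_state_change" 8200   # checked two hours later
```

However many OK results arrive in between, only a change of state moves last_state_change, so the printed duration keeps growing.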
Re: SNMP traps being received but not updating in Nagios
Posted: Mon Sep 11, 2017 11:53 am
by snapon_admin
Yeah I know that but I was watching the screen and the status output wasn't updating either. I also can't seem to submit passive check results on any of these hosts for some reason. Just to test state changes I was going to submit a passive check with an unknown status and then have the device send an OK trap just to see if it would change but I can't even manually submit a passive result to change it to unknown.
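For reference, a manual passive result of the kind described here is normally submitted with the PROCESS_SERVICE_CHECK_RESULT external command. The sketch below only builds the command line; the host name, service description, and command file path are assumptions and need to match the local configuration.

```shell
#!/bin/sh
# Hypothetical helper: passive_result_line TIMESTAMP HOST SERVICE RETURN_CODE OUTPUT
# Builds one external command line in the format
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output
# Return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
passive_result_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# "myhost" and "SNMP Traps" are placeholder names, not taken from this thread.
line=$(passive_result_line "$(date +%s)" "myhost" "SNMP Traps" 3 "Manual test: forcing UNKNOWN")
echo "$line"

# To actually submit it, write the line to the Nagios command file.
# The path below is the usual default for source installs and is an assumption here.
# CMD_FILE=/usr/local/nagios/var/rw/nagios.cmd
# [ -p "$CMD_FILE" ] && printf '%s\n' "$line" > "$CMD_FILE"
```

If the line is written to the command file and the status still does not change, the problem is on the Nagios side rather than with the sending device.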
Re: SNMP traps being received but not updating in Nagios
Posted: Mon Sep 11, 2017 4:17 pm
by scottwilkerson
snapon_admin wrote:Yeah I know that but I was watching the screen and the status output wasn't updating either. I also can't seem to submit passive check results on any of these hosts for some reason. Just to test state changes I was going to submit a passive check with an unknown status and then have the device send an OK trap just to see if it would change but I can't even manually submit a passive result to change it to unknown.
I'm not sure that this is your issue, but we have had many clients who didn't realize their servers send the same "OK" trap every other second or so, which overwrites the result before they can see it.
If that's not the case, and they really aren't updating, then I would verify that you do not have multiple parent nagios processes, and also that there isn't some DB corruption.
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 10:17 am
by snapon_admin
Code: Select all
[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios 9030 1 3 09:53 ? 00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9032 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9033 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9034 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9035 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9036 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9037 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9038 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9039 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9040 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9041 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9042 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 9044 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
root 12200 4702 0 10:16 pts/0 00:00:00 grep bin/nagios
nagios 21975 1 0 Aug31 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Code: Select all
[root@lisl-ngos-01-pv var]# tail -100 /var/log/mysqld.log
170423 19:26:03 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170423 19:26:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170423 19:26:46 InnoDB: Initializing buffer pool, size = 8.0M
170423 19:26:46 InnoDB: Completed initialization of buffer pool
170423 19:26:46 InnoDB: Started; log sequence number 0 44253
170423 19:26:46 [Note] Event Scheduler: Loaded 0 events
170423 19:26:46 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
170423 20:34:33 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:33 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:40 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:40 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:47 [ERROR] /usr/libexec170423 20:43:22 [Note] /usr/libexec/mysqld: Normal shutdown
170423 20:43:22 [Note] Event Scheduler: Purging the queue. 0 events
170423 20:43:24 [Warning] /usr/libexec/mysqld: Forcing close of thread 1205 user: 'ndoutils'
170423 20:44:00 InnoDB: Starting shutdown...
170423 20:44:01 InnoDB: Shutdown completed; log sequence number 0 44253
170423 20:44:01 [Note] /usr/libexec/mysqld: Shutdown complete
170423 20:44:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170423 20:44:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170423 20:44:46 InnoDB: Initializing buffer pool, size = 8.0M
170423 20:44:46 InnoDB: Completed initialization of buffer pool
170423 20:44:46 InnoDB: Started; log sequence number 0 44253
170423 20:44:46 [Note] Event Scheduler: Loaded 0 events
170423 20:44:46 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
170424 12:46:57 [Note] /usr/libexec/mysqld: Normal shutdown
170424 12:46:57 [Note] Event Scheduler: Purging the queue. 0 events
170424 12:46:57 InnoDB: Starting shutdown...
170424 12:47:03 InnoDB: Shutdown completed; log sequence number 0 44253
170424 12:47:03 [Note] /usr/libexec/mysqld: Shutdown complete
170424 12:47:03 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170424 12:47:05 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170424 12:47:05 InnoDB: Initializing buffer pool, size = 8.0M
170424 12:47:05 InnoDB: Completed initialization of buffer pool
170424 12:47:05 InnoDB: Started; log sequence number 0 44253
170424 12:47:06 [Note] Event Scheduler: Loaded 0 events
170424 12:47:06 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:01 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:19 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:19 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:22 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:22 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:36 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:36 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:41 [Warning] Disk is full writing '/var/lib/mysql/nagios/nagios_logentries.MYI' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
170424 15:33:41 [Warning] Retry in 60 secs. Message reprinted in 600 secs
170424 15:33:43 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:43 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:50 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:50 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:57 [ERROR] /usr/li170425 11:48:06 [Note] /usr/libexec/mysqld: Normal shutdown
170425 11:48:06 [Note] Event Scheduler: Purging the queue. 0 events
170425 11:48:06 [ERROR] /usr/libexec/mysqld: Sort aborted
170425 11:48:06 [ERROR] /usr/libexec/mysqld: Sort aborted
170425 11:48:08 InnoDB: Starting shutdown...
170425 11:48:10 InnoDB: Shutdown completed; log sequence number 0 44253
170425 11:48:10 [Note] /usr/libexec/mysqld: Shutdown complete
170425 11:48:10 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170425 11:48:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170425 11:48:11 InnoDB: Initializing buffer pool, size = 8.0M
170425 11:48:11 InnoDB: Completed initialization of buffer pool
170425 11:48:11 InnoDB: Started; log sequence number 0 44253
170425 11:48:11 [Note] Event Scheduler: Loaded 0 events
170425 11:48:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 10:29 am
by scottwilkerson
You have 2 parent nagios processes, which will cause problems:
Code: Select all
nagios 9030 1 3 09:53 ? 00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 21975 1 0 Aug31 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Let's run the following:
Code: Select all
service nagios stop
killall -9 nagios
service nagios start
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 2:15 pm
by snapon_admin
OK, that's what I thought that meant. Is there any particular way to prevent this? It seems to happen quite frequently here. As a matter of fact, I literally just did what you posted and it spawned 2 again.
Code: Select all
[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios 10853 1 3 12:09 ? 00:04:50 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 10855 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10856 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10857 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10858 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10859 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10860 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10861 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10862 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10863 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10864 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10865 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10866 10853 0 12:09 ? 00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 10949 10853 0 12:09 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 19717 4702 0 14:14 pts/0 00:00:00 grep bin/nagios
nagios 21975 1 0 Aug31 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
You have new mail in /var/spool/mail/root
[root@lisl-ngos-01-pv var]# service nagios stop
Stopping nagios:. done.
[root@lisl-ngos-01-pv var]# killall -9 nagios
[root@lisl-ngos-01-pv var]# service nagios start
Starting nagios: done.
[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios 20783 1 7 14:14 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20785 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20786 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20787 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20788 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20790 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20791 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20792 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20793 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20794 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20795 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20796 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20797 20783 0 14:14 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 20867 20783 0 14:15 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 25461 4702 0 14:15 pts/0 00:00:00 grep bin/nagios
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 2:33 pm
by dwhitfield
snapon_admin wrote: I literally just did what you posted and it spawned 2 again.
Is that what you are trying to show in the output? Your output shows something different: a parent and a child, not two parents.
Code: Select all
nagios 20783 1 7 14:14 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20867 20783 0 14:15 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
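Notice that 20783 is the PID of the first process and the parent PID (third column) of the second, so the second is a child of the first. In your earlier output, by contrast, both 9030 and 21975 had a parent PID of 1, which means they were two independent parents.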
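The parent/child distinction can be checked mechanically by looking only at the PPID column. The sketch below is one way to do it; the function name top_level_nagios is made up, and it assumes the standard ps -ef column order (UID, PID, PPID, ...). The sample lines are taken from the output posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID of every
# nagios daemon ("bin/nagios -d ...") whose parent PID (column 3) is 1.
# More than one PID in the result means more than one independent parent process.
top_level_nagios() {
    awk '$3 == 1 && /bin\/nagios -d/ { print $2 }'
}

# Sample lines from the first ps output in this thread (workers trimmed to one).
sample='nagios 9030 1 3 09:53 ? 00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 9032 9030 0 09:53 ? 00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 21975 1 0 Aug31 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'

printf '%s\n' "$sample" | top_level_nagios

# On a live system:
# ps -ef | top_level_nagios
```

For the sample above this prints 9030 and 21975, the two parents. Run against the output after the restart, it would print only 20783, because 20867 has 20783 as its parent.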
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 3:15 pm
by snapon_admin
Ah, you're right, my fault. I read it wrong; it looked like a different number to me for some reason. Either way, this seems to happen a lot to us. Any idea what causes it or how to avoid it?
In regards to the trap issue, I think I might have 2 separate issues going on there.
The first is that traps are received and processed but Nagios doesn't always update. Sometimes it does right away, sometimes a few seconds or minutes late, and sometimes not at all. This happens both when submitting passive checks manually to change the status and when receiving a trap.
The other issue might not be Nagios' fault: we have a server that has been sending traps to Nagios for well over a year without issue, and suddenly it doesn't send them. We get an error when trying (on the server, not in Nagios). For this second issue I'm leaning toward the problem being on that server and not with Nagios, but I'm wondering if you had any thoughts on that.
Re: SNMP traps being received but not updating in Nagios
Posted: Tue Sep 12, 2017 4:10 pm
by dwhitfield
The next time this happens, run ipcs -q and see if you have two message queues.
The output of grep queue /var/log/messages might be a clue too. If you have two queues in ipcs, or anything NDO-related in that grep, that's a sign you should take a look at
https://support.nagios.com/kb/article.php?id=139
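The message queue check can be scripted along these lines. This is a sketch that assumes the usual Linux ipcs -q layout, where each queue row starts with a hexadecimal key; the function name count_message_queues is made up, and the sample rows are illustrative values, not output from the system in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `ipcs -q` output on stdin and prints the number of
# message queues, counting rows whose first column is a hex key such as 0x4e010002.
count_message_queues() {
    awk '$1 ~ /^0x/ { n++ } END { print n + 0 }'
}

# Illustrative sample only (made-up keys, ids, and sizes).
sample='
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x4e010002 32768      nagios     600        0            0
0x4e010003 65537      nagios     600        131072       128
'

printf '%s\n' "$sample" | count_message_queues

# On a live system:
# ipcs -q | count_message_queues
```

A count of 2 on a system that should have a single queue is the situation described above and is worth following up with the linked article.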