Re: SNMP traps being received but not updating in Nagios

Posted: Thu Sep 07, 2017 4:43 pm
by scottwilkerson
The duration only resets when the state changes to something other than OK, so you can receive 1000 traps and the duration will still say 2 hours, because it measures the time spent in the current state, not the time since the last trap was received.
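
To illustrate the point, here is a minimal sketch of how the displayed duration relates to the stored timestamps. It assumes the `last_state_change` and `last_check` fields that Nagios Core writes to status.dat; the helper name `state_duration` and the sample timestamps are made up for illustration.

```shell
# Sketch: "duration" is measured from the last state change,
# not from the last check result (trap) that was received.
# state_duration is a hypothetical helper, not part of Nagios.
state_duration() {
    last_state_change=$1   # epoch seconds when the state last changed
    now=$2                 # current epoch seconds
    elapsed=$(( now - last_state_change ))
    printf '%dh %dm %ds\n' \
        $(( elapsed / 3600 )) $(( elapsed % 3600 / 60 )) $(( elapsed % 60 ))
}

# Example: the service went OK at 1504800000. A trap arriving two hours
# later refreshes last_check but leaves last_state_change untouched, so:
# state_duration 1504800000 1504807200
```

A trap that reports the same state only moves `last_check` forward, which is why the duration keeps counting from the original state change.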

Re: SNMP traps being received but not updating in Nagios

Posted: Mon Sep 11, 2017 11:53 am
by snapon_admin
Yeah I know that but I was watching the screen and the status output wasn't updating either. I also can't seem to submit passive check results on any of these hosts for some reason. Just to test state changes I was going to submit a passive check with an unknown status and then have the device send an OK trap just to see if it would change but I can't even manually submit a passive result to change it to unknown.
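
For anyone following along, a manual passive result can also be submitted outside the web UI by writing a `PROCESS_SERVICE_CHECK_RESULT` line to the Nagios external command file. The sketch below only builds that line. The host name, service name, and command file path are assumptions (check `command_file` in nagios.cfg), and `passive_result_line` is a hypothetical helper.

```shell
# Sketch: build a passive service check result in the Nagios
# external-command format. Host and service names are placeholders.
passive_result_line() {
    # $1 = epoch time, $2 = host, $3 = service,
    # $4 = return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), $5 = output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# To submit an UNKNOWN result, append the line to the command file
# (path assumed from a default source install):
# passive_result_line "$(date +%s)" myhost "SNMP Traps" 3 "manual test" \
#     >> /usr/local/nagios/var/rw/nagios.cmd
```

If a line written this way does not change the service either, the problem is probably on the Nagios side rather than in the trap handling.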

Re: SNMP traps being received but not updating in Nagios

Posted: Mon Sep 11, 2017 4:17 pm
by scottwilkerson
snapon_admin wrote:Yeah I know that but I was watching the screen and the status output wasn't updating either. I also can't seem to submit passive check results on any of these hosts for some reason. Just to test state changes I was going to submit a passive check with an unknown status and then have the device send an OK trap just to see if it would change but I can't even manually submit a passive result to change it to unknown.
I'm not sure that this is your issue, but we have had many clients that didn't realize their servers send the same "OK" trap like every other second and it would overwrite the result before they could see it.

If that's not the case, and they really aren't updating, then I would verify that you do not have multiple parent Nagios processes:

Code: Select all

ps -ef|grep bin/nagios
And also check that there isn't some DB corruption:

Code: Select all

tail -100 /var/log/mysqld.log
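
To check the first possibility above (a device re-sending the same "OK" trap every few seconds), one option is to count passive results per service in the Nagios log. This is a rough sketch: it assumes `log_passive_checks=1` in nagios.cfg so that `PASSIVE SERVICE CHECK` lines are written, the log path in the usage comment is a guess for a source install, and `count_passive_results` is a hypothetical helper.

```shell
# Sketch: count passive service check results per host;service pair.
# Reads nagios.log lines on stdin, prints "<count> <host>;<service>",
# highest count first.
count_passive_results() {
    awk -F': ' '/PASSIVE SERVICE CHECK:/ {
        split($2, f, ";")          # f[1] = host, f[2] = service
        n[f[1] ";" f[2]]++
    }
    END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (log path assumed):
# tail -1000 /usr/local/nagios/var/nagios.log | count_passive_results
```

A service that shows hundreds of results in a short window is likely being overwritten before a manual result can be seen.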

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 10:17 am
by snapon_admin

Code: Select all

[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios    9030     1  3 09:53 ?        00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    9032  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9033  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9034  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9035  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9036  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9037  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9038  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9039  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9040  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9041  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9042  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    9044  9030  0 09:53 ?        00:00:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
root     12200  4702  0 10:16 pts/0    00:00:00 grep bin/nagios
nagios   21975     1  0 Aug31 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Code: Select all

[root@lisl-ngos-01-pv var]# tail -100 /var/log/mysqld.log
170423 19:26:03 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170423 19:26:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170423 19:26:46  InnoDB: Initializing buffer pool, size = 8.0M
170423 19:26:46  InnoDB: Completed initialization of buffer pool
170423 19:26:46  InnoDB: Started; log sequence number 0 44253
170423 19:26:46 [Note] Event Scheduler: Loaded 0 events
170423 19:26:46 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
170423 20:34:33 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:33 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:37 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:40 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:40 [ERROR] /usr/libexec/mysqld: Sort aborted
170423 20:34:47 [ERROR] /usr/libexec170423 20:43:22 [Note] /usr/libexec/mysqld: Normal shutdown

170423 20:43:22 [Note] Event Scheduler: Purging the queue. 0 events
170423 20:43:24 [Warning] /usr/libexec/mysqld: Forcing close of thread 1205  user: 'ndoutils'

170423 20:44:00  InnoDB: Starting shutdown...
170423 20:44:01  InnoDB: Shutdown completed; log sequence number 0 44253
170423 20:44:01 [Note] /usr/libexec/mysqld: Shutdown complete

170423 20:44:01 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170423 20:44:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170423 20:44:46  InnoDB: Initializing buffer pool, size = 8.0M
170423 20:44:46  InnoDB: Completed initialization of buffer pool
170423 20:44:46  InnoDB: Started; log sequence number 0 44253
170423 20:44:46 [Note] Event Scheduler: Loaded 0 events
170423 20:44:46 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
170424 12:46:57 [Note] /usr/libexec/mysqld: Normal shutdown

170424 12:46:57 [Note] Event Scheduler: Purging the queue. 0 events
170424 12:46:57  InnoDB: Starting shutdown...
170424 12:47:03  InnoDB: Shutdown completed; log sequence number 0 44253
170424 12:47:03 [Note] /usr/libexec/mysqld: Shutdown complete

170424 12:47:03 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170424 12:47:05 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170424 12:47:05  InnoDB: Initializing buffer pool, size = 8.0M
170424 12:47:05  InnoDB: Completed initialization of buffer pool
170424 12:47:05  InnoDB: Started; log sequence number 0 44253
170424 12:47:06 [Note] Event Scheduler: Loaded 0 events
170424 12:47:06 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:32:59 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:01 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:09 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:19 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:19 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:22 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:22 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:29 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:36 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:36 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:39 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:41 [Warning] Disk is full writing '/var/lib/mysql/nagios/nagios_logentries.MYI' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
170424 15:33:41 [Warning] Retry in 60 secs. Message reprinted in 600 secs
170424 15:33:43 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:43 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:49 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:50 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:50 [ERROR] /usr/libexec/mysqld: Sort aborted
170424 15:33:57 [ERROR] /usr/li170425 11:48:06 [Note] /usr/libexec/mysqld: Normal shutdown

170425 11:48:06 [Note] Event Scheduler: Purging the queue. 0 events
170425 11:48:06 [ERROR] /usr/libexec/mysqld: Sort aborted
170425 11:48:06 [ERROR] /usr/libexec/mysqld: Sort aborted
170425 11:48:08  InnoDB: Starting shutdown...
170425 11:48:10  InnoDB: Shutdown completed; log sequence number 0 44253
170425 11:48:10 [Note] /usr/libexec/mysqld: Shutdown complete

170425 11:48:10 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
170425 11:48:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170425 11:48:11  InnoDB: Initializing buffer pool, size = 8.0M
170425 11:48:11  InnoDB: Completed initialization of buffer pool
170425 11:48:11  InnoDB: Started; log sequence number 0 44253
170425 11:48:11 [Note] Event Scheduler: Loaded 0 events
170425 11:48:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
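
Note that the last entries in this excerpt are dated 170425 (April 2017) and that it includes a "Disk is full" warning for nagios_logentries.MYI. A quick way to summarize a log like this is sketched below. The "marked as crashed" pattern is the usual MyISAM wording for a damaged table; `summarize_mysqld_log` is a hypothetical helper, not an official tool.

```shell
# Sketch: count the log messages most relevant to table damage.
# Reads mysqld log lines on stdin and prints a one-line summary.
summarize_mysqld_log() {
    awk '
        /Sort aborted/      { sort_aborted++ }
        /Disk is full/      { disk_full++ }
        /marked as crashed/ { crashed++ }
        END {
            printf "sort_aborted=%d disk_full=%d crashed=%d\n",
                   sort_aborted, disk_full, crashed
        }'
}

# Usage:
# summarize_mysqld_log < /var/log/mysqld.log
# df -h /var/lib/mysql    # confirm the partition has free space now
```

A non-zero `disk_full` or `crashed` count would be a reason to look at the tables more closely.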

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 10:29 am
by scottwilkerson
You have 2 parent nagios processes (both have a parent PID of 1), and that will cause problems:

Code: Select all

nagios    9030     1  3 09:53 ?        00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   21975     1  0 Aug31 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Let's run the following:

Code: Select all

service nagios stop
killall -9 nagios
service nagios start

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 2:15 pm
by snapon_admin
OK, that's what I thought that meant. Is there any particular way to prevent this? It seems to happen quite frequently here. As a matter of fact, I literally just did what you posted and it spawned 2 again.

Code: Select all

[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios   10853     1  3 12:09 ?        00:04:50 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   10855 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10856 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10857 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10858 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10859 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10860 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10861 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10862 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10863 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10864 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10865 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10866 10853  0 12:09 ?        00:00:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   10949 10853  0 12:09 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     19717  4702  0 14:14 pts/0    00:00:00 grep bin/nagios
nagios   21975     1  0 Aug31 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
You have new mail in /var/spool/mail/root
[root@lisl-ngos-01-pv var]# service nagios stop
Stopping nagios:. done.
[root@lisl-ngos-01-pv var]# killall -9 nagios
[root@lisl-ngos-01-pv var]# service nagios start
Starting nagios: done.
[root@lisl-ngos-01-pv var]# ps -ef|grep bin/nagios
nagios   20783     1  7 14:14 ?        00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20785 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20786 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20787 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20788 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20790 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20791 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20792 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20793 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20794 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20795 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20796 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20797 20783  0 14:14 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   20867 20783  0 14:15 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     25461  4702  0 14:15 pts/0    00:00:00 grep bin/nagios

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 2:33 pm
by dwhitfield
snapon_admin wrote: I literally just did what you posted and it spawned 2 again.
Is that what you are trying to show in the output? Your output shows something different: a parent and a child, not two parents.

Code: Select all

nagios   20783     1  7 14:14 ?        00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   20867 20783  0 14:15 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
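Notice how 20783 is both the PID of the first process and the parent PID (third column) of the second, so the second is a child of the first. In your first output, however, both processes had a parent PID of 1, which means they were two independent daemons.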
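
The parent/child distinction can be checked mechanically by looking only at `nagios -d` processes whose parent PID is 1. This is a minimal sketch that assumes the `ps -ef` column layout shown in this thread (UID, PID, PPID, C, STIME, TTY, TIME, CMD); `top_level_nagios_daemons` is a hypothetical helper name.

```shell
# Sketch: list the PIDs of "nagios -d" processes started directly
# under init (PPID 1). Reads `ps -ef` output on stdin.
top_level_nagios_daemons() {
    # $2 = PID, $3 = PPID, $8 = command, $9 = first argument
    awk '$3 == 1 && $8 ~ /bin\/nagios$/ && $9 == "-d" { print $2 }'
}

# Usage: a healthy system should print exactly one PID.
# ps -ef | top_level_nagios_daemons
```

Workers (`--worker`) and short-lived forked children of the main daemon are ignored because their PPID is the daemon's PID rather than 1.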

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 3:15 pm
by snapon_admin
Ah, you're right, my fault. I read it wrong; it looked like a different number to me for some reason. Either way, this seems to happen a lot to us. Any idea what causes it or how to avoid it?

Regarding the trap issue, I think I might have 2 separate issues going on. The first is that traps are received and processed but Nagios doesn't always update: sometimes it does right away, sometimes a few seconds or minutes late, and sometimes not at all. This happens both when submitting passive checks manually to change the status and when receiving a trap.

The other issue might not be Nagios' fault. We have a server that has been sending traps to Nagios for well over a year without issue, and suddenly it doesn't send them; we get an error when trying (on the server, not in Nagios). For this second issue I'm leaning toward the problem being on that server and not with Nagios, but I'm wondering if you had any thoughts on that.

Re: SNMP traps being received but not updating in Nagios

Posted: Tue Sep 12, 2017 4:10 pm
by dwhitfield
The next time this happens, run ipcs -q and see if you have two message queues. grep queue /var/log/messages might give a clue too. If you have two queues in ipcs or anything NDO-related in that grep output, that's a sign you should take a look at https://support.nagios.com/kb/article.php?id=139
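
The queue check can be scripted roughly as follows. This sketch assumes the Linux `ipcs -q` table layout (key, msqid, owner, perms, used-bytes, messages) and that the queues of interest are owned by the nagios user; `count_message_queues` is a hypothetical helper.

```shell
# Sketch: count message queues listed by `ipcs -q`.
# Reads `ipcs -q` output on stdin. An optional first argument
# restricts the count to queues owned by that user.
count_message_queues() {
    awk -v owner="$1" '
        $1 ~ /^0x/ && (owner == "" || $3 == owner) { n++ }
        END { print n + 0 }'
}

# Usage:
# ipcs -q | count_message_queues nagios
# grep queue /var/log/messages | tail -20
```

A count of two or more for the nagios user is the situation described above.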