Nagios XI Caching Alert Data

Posted: Fri Feb 27, 2015 5:27 am
by jstoddart
We seem to be having issues with Nagios XI appearing to cache data about service and host checks.

Regardless of whether we configure the check directly or via CCM, we still get invalid alerts. We have also tried deactivating some hosts and services via CCM, but they still show as critical. When we drill down on a critical alert, it shows as pending because there is no actual service to check. (See attachments.)

Is there a way to reset the server? I have already tried restarting the engine from the Admin screens.

Re: Nagios XI Caching Alert Data

Posted: Fri Feb 27, 2015 10:58 am
by abrist
This sounds like an issue with ndo2db. Let's check some logs for errors or hints:

Code:

tail -25 /var/log/mysqld.log
tail -25 /var/log/messages

Have you offloaded your database or implemented a ramdisk?
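
If the tail output is too noisy to read, the ndo2db lines can be pulled out and tallied on their own. This is only a rough sketch, assuming ndo2db writes to syslog with the tag "ndo2db: " and that syslog lives at /var/log/messages; the function name ndo2db_summary is made up for illustration.

```shell
#!/bin/sh
# Sketch only: tally the distinct ndo2db messages found in a syslog-style file.
# The first argument is the file to read (default: /var/log/messages).
ndo2db_summary() {
    logfile="${1:-/var/log/messages}"
    # Keep only ndo2db lines, strip the timestamp/host prefix,
    # then count identical messages, most frequent first.
    grep ' ndo2db: ' "$logfile" \
        | sed 's/^.* ndo2db: //' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Running ndo2db_summary with no argument reads /var/log/messages; a message that repeats many times is usually the one worth chasing first.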

Re: Nagios XI Caching Alert Data

Posted: Sat Feb 28, 2015 11:37 am
by jstoddart
Apologies for the delay in replying; I forgot to enable the notification option and have been a bit busy. Here is the info you requested.

Regards

Jamie

tail -25 /var/log/mysqld.log
141111 1:41:16 [Note] /usr/libexec/mysqld: Shutdown complete

141111 01:41:16 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 01:46:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 1:46:11 InnoDB: Initializing buffer pool, size = 8.0M
141111 1:46:11 InnoDB: Completed initialization of buffer pool
141111 1:46:11 InnoDB: Started; log sequence number 0 44233
141111 1:46:11 [Note] Event Scheduler: Loaded 0 events
141111 1:46:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
141111 2:27:03 [Note] /usr/libexec/mysqld: Normal shutdown

141111 2:27:03 [Note] Event Scheduler: Purging the queue. 0 events
141111 2:27:05 InnoDB: Starting shutdown...
141111 2:27:09 InnoDB: Shutdown completed; log sequence number 0 44233
141111 2:27:09 [Note] /usr/libexec/mysqld: Shutdown complete

141111 02:27:09 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 02:29:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 2:29:20 InnoDB: Initializing buffer pool, size = 8.0M
141111 2:29:20 InnoDB: Completed initialization of buffer pool
141111 2:29:21 InnoDB: Started; log sequence number 0 44233
141111 2:29:21 [Note] Event Scheduler: Loaded 0 events
141111 2:29:21 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution

tail -25 /var/log/messages
Feb 28 16:31:55 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=29286 duration=0(sec)
Feb 28 16:32:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:06 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:32:26 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:26 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:32:46 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:46 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:05 linux64 nagios: wproc: Core Worker 29387: job 51453 (pid=29516) timed out. Killing it
Feb 28 16:33:05 linux64 nagios: wproc: CHECK job 51453 from worker Core Worker 29387 timed out after 60.01s
Feb 28 16:33:05 linux64 nagios: wproc: host=SCP_PROD_POP; service=CPU Stats;
Feb 28 16:33:05 linux64 nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Feb 28 16:33:05 linux64 nagios: wproc: stdout line 01: CHECK_NRPE: Socket timeout after 60 seconds.
Feb 28 16:33:05 linux64 nagios: Warning: Check of service 'CPU Stats' on host 'SCP_PROD_POP' timed out after 60.006s!
Feb 28 16:33:05 linux64 nagios: wproc: Core Worker 29387: job 51453 (pid=29516): Dormant child reaped
Feb 28 16:33:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:06 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:16 linux64 xinetd[1420]: START: nrpe pid=30667 from=::ffff:172.18.1.164
Feb 28 16:33:16 linux64 xinetd[30667]: FAIL: nrpe address from=::ffff:172.18.1.164
Feb 28 16:33:16 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=30667 duration=0(sec)
Feb 28 16:33:26 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:26 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:46 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:46 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:34:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:34:06 linux64 ndo2db: Warning: queue send error, retrying...
[NagiosX1:main.linux64 ~]#

Re: Nagios XI Caching Alert Data

Posted: Mon Mar 02, 2015 12:28 pm
by abrist
It looks like you are hitting the Linux kernel's message queue limit. Please increase it as described here:
http://support.nagios.com/wiki/index.ph ... 3.x_Issues
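
As a rough illustration of what "increasing the limit" involves, the sketch below shows how the current System V message queue limits can be read and how the matching sysctl.conf lines could be generated. The helper names (show_msg_limits, msg_queue_sysctl_lines) and the NEW_BYTES placeholder are made up for this example, and no particular value is recommended here; take the numbers from the ndo2db README or the page linked above.

```shell
#!/bin/sh
# Sketch only: inspect the System V message queue limits that ndo2db depends on.

# Print the current limits. The optional first argument overrides the directory
# to read from (default: /proc/sys/kernel); it exists only to make this testable.
show_msg_limits() {
    procdir="${1:-/proc/sys/kernel}"
    for key in msgmnb msgmax msgmni; do
        printf 'kernel.%s = %s\n' "$key" "$(cat "$procdir/$key")"
    done
}

# Print sysctl.conf lines that set the per-queue byte limit (msgmnb) and the
# maximum message size (msgmax) to the same value, given as the first argument.
msg_queue_sysctl_lines() {
    printf 'kernel.msgmnb = %s\nkernel.msgmax = %s\n' "$1" "$1"
}

# Typical use, as root (NEW_BYTES is a placeholder, not a recommendation):
#   show_msg_limits                           # limits currently in effect
#   ipcs -q                                   # message queues currently in use
#   msg_queue_sysctl_lines "$NEW_BYTES" >> /etc/sysctl.conf
#   sysctl -p                                 # load the new values
#   service ndo2db restart && service nagios restart
```

The restart commands assume the init scripts that ship with a stock Nagios XI install; adjust them if your services are managed differently.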

Re: Nagios XI Caching Alert Data

Posted: Mon Mar 02, 2015 5:36 pm
by jstoddart
Thanks, will have a look at that.

Re: Nagios XI Caching Alert Data

Posted: Mon Mar 02, 2015 5:43 pm
by lmiltchev
Let us know if this fixes your issue.

Re: Nagios XI Caching Alert Data

Posted: Tue Mar 03, 2015 2:22 am
by jstoddart
Still same issue :-(

Re: Nagios XI Caching Alert Data

Posted: Tue Mar 03, 2015 2:25 am
by jstoddart
It appeared to clear for a few minutes, then the same problem returned.

[NagiosX1:main.linux64 ~]# tail -25 /var/log/mysqld.log
141111 1:41:16 [Note] /usr/libexec/mysqld: Shutdown complete

141111 01:41:16 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 01:46:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 1:46:11 InnoDB: Initializing buffer pool, size = 8.0M
141111 1:46:11 InnoDB: Completed initialization of buffer pool
141111 1:46:11 InnoDB: Started; log sequence number 0 44233
141111 1:46:11 [Note] Event Scheduler: Loaded 0 events
141111 1:46:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
141111 2:27:03 [Note] /usr/libexec/mysqld: Normal shutdown

141111 2:27:03 [Note] Event Scheduler: Purging the queue. 0 events
141111 2:27:05 InnoDB: Starting shutdown...
141111 2:27:09 InnoDB: Shutdown completed; log sequence number 0 44233
141111 2:27:09 [Note] /usr/libexec/mysqld: Shutdown complete

141111 02:27:09 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 02:29:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 2:29:20 InnoDB: Initializing buffer pool, size = 8.0M
141111 2:29:20 InnoDB: Completed initialization of buffer pool
141111 2:29:21 InnoDB: Started; log sequence number 0 44233
141111 2:29:21 [Note] Event Scheduler: Loaded 0 events
141111 2:29:21 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
[NagiosX1:main.linux64 ~]# tail -25 /var/log/messages
Mar 3 07:20:17 linux64 xinetd[1420]: START: nrpe pid=4673 from=::ffff:172.18.1.164
Mar 3 07:20:17 linux64 xinetd[4673]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:20:17 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=4673 duration=0(sec)
Mar 3 07:21:34 linux64 nagios: wproc: Core Worker 22792: job 533374 (pid=4914) timed out. Killing it
Mar 3 07:21:34 linux64 nagios: wproc: CHECK job 533374 from worker Core Worker 22792 timed out after 60.01s
Mar 3 07:21:34 linux64 nagios: wproc: host=SCP_PROD_POP; service=CPU Stats;
Mar 3 07:21:34 linux64 nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Mar 3 07:21:34 linux64 nagios: Warning: Check of service 'CPU Stats' on host 'SCP_PROD_POP' timed out after 60.006s!
Mar 3 07:21:34 linux64 nagios: wproc: Core Worker 22792: job 533374 (pid=4914): Dormant child reaped
Mar 3 07:21:38 linux64 xinetd[1420]: START: nrpe pid=6060 from=::ffff:172.18.1.164
Mar 3 07:21:38 linux64 xinetd[6060]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:21:38 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=6060 duration=0(sec)
Mar 3 07:22:50 linux64 xinetd[1420]: START: nrpe pid=7333 from=::ffff:172.18.1.164
Mar 3 07:22:50 linux64 xinetd[7333]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:22:50 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=7333 duration=0(sec)
Mar 3 07:22:50 linux64 nagios: HOST ALERT: EJ_Monitoring_Old;UP;HARD;1;OK - 172.18.1.200: rta 2.222ms, lost 0%
Mar 3 07:22:50 linux64 nagios: HOST ALERT: Barnsley_QA_DB;UP;HARD;1;OK - 172.18.1.226: rta 1.343ms, lost 0%
Mar 3 07:22:54 linux64 nagios: HOST ALERT: Barnsley_QA_App01;UP;HARD;1;OK - 172.18.1.227: rta 0.619ms, lost 0%
Mar 3 07:22:55 linux64 xinetd[1420]: START: nrpe pid=7641 from=::ffff:172.18.1.164
Mar 3 07:22:55 linux64 xinetd[7641]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:22:55 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=7641 duration=0(sec)
Mar 3 07:22:56 linux64 nagios: HOST ALERT: Selenium;UP;HARD;1;OK - 80.69.143.227: rta 0.438ms, lost 0%
Mar 3 07:23:03 linux64 nagios: SERVICE ALERT: Essex BI Prod;Logon Errors;OK;HARD;5;Login Errors since last reboot is 0
Mar 3 07:23:21 linux64 nagios: SERVICE ALERT: EJ_Monitoring_Old;Ping;OK;HARD;5;OK - 172.18.1.200: rta 0.230ms, lost 0%
Mar 3 07:23:21 linux64 nagios: SERVICE ALERT: Barnsley_QA_DB;Ping;OK;HARD;5;OK - 172.18.1.226: rta 0.244ms, lost 0%

Re: Nagios XI Caching Alert Data

Posted: Tue Mar 03, 2015 1:45 pm
by tgriep
Could you post one of the configurations that you are having issues with?

Could you provide a screen capture of the following screen?

Click on "Admin" > "Monitoring Engine Status"

Re: Nagios XI Caching Alert Data

Posted: Wed Mar 04, 2015 3:16 am
by jstoddart
I'm not sure how you want the config posted, but the host Barnsley_QA_App02, which keeps alerting even though it is disabled, does not exist in /usr/local/nagios/etc/hosts on the server. Likewise, there are no services defined for it in /usr/local/nagios/etc/services, but as you can see from my original post it keeps showing up.
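
For anyone wanting to repeat that check, here is a minimal sketch of searching for leftover references to a host. The function name find_host_refs is made up for this example, and the objects.cache path in the comment assumes the default Nagios XI layout; confirm it against object_cache_file in nagios.cfg.

```shell
#!/bin/sh
# Sketch only: list every file under a directory that still mentions a host.
# $1 = host name, $2 = directory to search (default: /usr/local/nagios/etc).
find_host_refs() {
    host="$1"
    dir="${2:-/usr/local/nagios/etc}"
    # -r recurse, -l print file names only, -F fixed string, -w whole word
    grep -rlFw -- "$host" "$dir"
}

# Example, as run on the Nagios server:
#   find_host_refs Barnsley_QA_App02
#   grep -cFw -- Barnsley_QA_App02 /usr/local/nagios/var/objects.cache
```

If both commands come back empty, the running core has no definition for the host, so whatever the interface still shows is presumably stale data held elsewhere (for example in the database that ndo2db writes to) rather than a live check.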