We seem to be having issues with Nagios XI appearing to cache data about service and host checks.
Regardless of whether we configure the check directly or via CCM, we still get invalid alerts. We have also tried deactivating some hosts and services via CCM, but they still show as critical. When we drill down on a critical alert, it shows as pending because there is no actual service to check. (See attachments.)
Is there a way to reset the server? I have already tried restarting the engine from the Admin screens.
Nagios XI Caching Alert Data
Re: Nagios XI Caching Alert Data
This sounds like an issue with ndo2db. Let's check some logs for errors or hints:
tail -25 /var/log/mysqld.log
tail -25 /var/log/messages
Have you offloaded your database or implemented a ramdisk?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Nagios XI Caching Alert Data
Apologies for the delay in replying; I forgot to enable the notification option and have been a bit busy. Here is the info you requested.
Regards
Jamie
tail -25 /var/log/mysqld.log
141111 1:41:16 [Note] /usr/libexec/mysqld: Shutdown complete
141111 01:41:16 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 01:46:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 1:46:11 InnoDB: Initializing buffer pool, size = 8.0M
141111 1:46:11 InnoDB: Completed initialization of buffer pool
141111 1:46:11 InnoDB: Started; log sequence number 0 44233
141111 1:46:11 [Note] Event Scheduler: Loaded 0 events
141111 1:46:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
141111 2:27:03 [Note] /usr/libexec/mysqld: Normal shutdown
141111 2:27:03 [Note] Event Scheduler: Purging the queue. 0 events
141111 2:27:05 InnoDB: Starting shutdown...
141111 2:27:09 InnoDB: Shutdown completed; log sequence number 0 44233
141111 2:27:09 [Note] /usr/libexec/mysqld: Shutdown complete
141111 02:27:09 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 02:29:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 2:29:20 InnoDB: Initializing buffer pool, size = 8.0M
141111 2:29:20 InnoDB: Completed initialization of buffer pool
141111 2:29:21 InnoDB: Started; log sequence number 0 44233
141111 2:29:21 [Note] Event Scheduler: Loaded 0 events
141111 2:29:21 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
tail -25 /var/log/messages
Feb 28 16:31:55 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=29286 duration=0(sec)
Feb 28 16:32:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:06 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:32:26 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:26 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:32:46 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:32:46 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:05 linux64 nagios: wproc: Core Worker 29387: job 51453 (pid=29516) timed out. Killing it
Feb 28 16:33:05 linux64 nagios: wproc: CHECK job 51453 from worker Core Worker 29387 timed out after 60.01s
Feb 28 16:33:05 linux64 nagios: wproc: host=SCP_PROD_POP; service=CPU Stats;
Feb 28 16:33:05 linux64 nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Feb 28 16:33:05 linux64 nagios: wproc: stdout line 01: CHECK_NRPE: Socket timeout after 60 seconds.
Feb 28 16:33:05 linux64 nagios: Warning: Check of service 'CPU Stats' on host 'SCP_PROD_POP' timed out after 60.006s!
Feb 28 16:33:05 linux64 nagios: wproc: Core Worker 29387: job 51453 (pid=29516): Dormant child reaped
Feb 28 16:33:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:06 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:16 linux64 xinetd[1420]: START: nrpe pid=30667 from=::ffff:172.18.1.164
Feb 28 16:33:16 linux64 xinetd[30667]: FAIL: nrpe address from=::ffff:172.18.1.164
Feb 28 16:33:16 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=30667 duration=0(sec)
Feb 28 16:33:26 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:26 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:33:46 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:33:46 linux64 ndo2db: Warning: queue send error, retrying...
Feb 28 16:34:06 linux64 ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may neeed to be tuned. See README.
Feb 28 16:34:06 linux64 ndo2db: Warning: queue send error, retrying...
[NagiosX1:main.linux64 ~]#
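A 25-line tail only shows a small window, so it may help to count how often the ndo2db queue error occurs across the whole file. The following is a minimal sketch, not part of the original post; the function names count_ndo2db_errors and tally_by_program are made up for illustration, and it assumes the standard syslog layout seen above (program name in the fifth field).

```shell
#!/bin/sh
# Count ndo2db "max retries exceeded" errors in a syslog-style file.
# Usage: count_ndo2db_errors [logfile]   (defaults to /var/log/messages)
count_ndo2db_errors() {
    logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'ndo2db: Error: max retries exceeded' "$logfile"
}

# Tally log lines per program (ndo2db, nagios, xinetd, ...), busiest first.
# Assumes the program name is the fifth whitespace-separated field.
tally_by_program() {
    logfile="${1:-/var/log/messages}"
    awk '{ prog = $5; sub(/\[.*/, "", prog); sub(/:$/, "", prog); count[prog]++ }
         END { for (p in count) print count[p], p }' "$logfile" | sort -rn
}
```

Running count_ndo2db_errors /var/log/messages before and after a configuration change gives a rough indication of whether the queue errors have actually stopped.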
Re: Nagios XI Caching Alert Data
It looks like you are hitting the Linux kernel message queue limit. Please increase this limit:
http://support.nagios.com/wiki/index.ph ... 3.x_Issues
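As a rough sketch of what checking and raising that limit can look like (the linked page and the ndoutils README remain the authoritative source): the kernel exposes the System V message queue limits as kernel.msgmnb (maximum bytes per queue), kernel.msgmax (maximum size of one message) and kernel.msgmni (maximum number of queues). The helper names show_limit and queue_settings below are hypothetical, and any byte value passed to queue_settings is a placeholder rather than a recommended setting.

```shell
#!/bin/sh
# Print the current value of a kernel message-queue limit,
# or "unknown" if /proc does not expose it on this system.
show_limit() {
    f="/proc/sys/kernel/$1"
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

# Emit sysctl.conf lines for a target size in bytes.
# The caller chooses the value; nothing here is a recommended setting.
queue_settings() {
    bytes="$1"
    printf 'kernel.msgmnb = %s\n' "$bytes"
    printf 'kernel.msgmax = %s\n' "$bytes"
}

# Read-only report of the current limits.
for name in msgmnb msgmax msgmni; do
    printf '%s = %s\n' "$name" "$(show_limit "$name")"
done
```

To apply new values, the lines printed by queue_settings would typically be appended to /etc/sysctl.conf as root and loaded with sysctl -p, followed by a restart of ndo2db and nagios so that the message queue is recreated.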
Re: Nagios XI Caching Alert Data
Thanks, will have a look at that.
Re: Nagios XI Caching Alert Data
Let us know if this fixes your issue.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Nagios XI Caching Alert Data
Still the same issue.
Re: Nagios XI Caching Alert Data
It appeared to clear for a few minutes, then the same problem returned.
[NagiosX1:main.linux64 ~]# tail -25 /var/log/mysqld.log
141111 1:41:16 [Note] /usr/libexec/mysqld: Shutdown complete
141111 01:41:16 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 01:46:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 1:46:11 InnoDB: Initializing buffer pool, size = 8.0M
141111 1:46:11 InnoDB: Completed initialization of buffer pool
141111 1:46:11 InnoDB: Started; log sequence number 0 44233
141111 1:46:11 [Note] Event Scheduler: Loaded 0 events
141111 1:46:11 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
141111 2:27:03 [Note] /usr/libexec/mysqld: Normal shutdown
141111 2:27:03 [Note] Event Scheduler: Purging the queue. 0 events
141111 2:27:05 InnoDB: Starting shutdown...
141111 2:27:09 InnoDB: Shutdown completed; log sequence number 0 44233
141111 2:27:09 [Note] /usr/libexec/mysqld: Shutdown complete
141111 02:27:09 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
141111 02:29:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
141111 2:29:20 InnoDB: Initializing buffer pool, size = 8.0M
141111 2:29:20 InnoDB: Completed initialization of buffer pool
141111 2:29:21 InnoDB: Started; log sequence number 0 44233
141111 2:29:21 [Note] Event Scheduler: Loaded 0 events
141111 2:29:21 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
[NagiosX1:main.linux64 ~]# tail -25 /var/log/messages
Mar 3 07:20:17 linux64 xinetd[1420]: START: nrpe pid=4673 from=::ffff:172.18.1.164
Mar 3 07:20:17 linux64 xinetd[4673]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:20:17 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=4673 duration=0(sec)
Mar 3 07:21:34 linux64 nagios: wproc: Core Worker 22792: job 533374 (pid=4914) timed out. Killing it
Mar 3 07:21:34 linux64 nagios: wproc: CHECK job 533374 from worker Core Worker 22792 timed out after 60.01s
Mar 3 07:21:34 linux64 nagios: wproc: host=SCP_PROD_POP; service=CPU Stats;
Mar 3 07:21:34 linux64 nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Mar 3 07:21:34 linux64 nagios: Warning: Check of service 'CPU Stats' on host 'SCP_PROD_POP' timed out after 60.006s!
Mar 3 07:21:34 linux64 nagios: wproc: Core Worker 22792: job 533374 (pid=4914): Dormant child reaped
Mar 3 07:21:38 linux64 xinetd[1420]: START: nrpe pid=6060 from=::ffff:172.18.1.164
Mar 3 07:21:38 linux64 xinetd[6060]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:21:38 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=6060 duration=0(sec)
Mar 3 07:22:50 linux64 xinetd[1420]: START: nrpe pid=7333 from=::ffff:172.18.1.164
Mar 3 07:22:50 linux64 xinetd[7333]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:22:50 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=7333 duration=0(sec)
Mar 3 07:22:50 linux64 nagios: HOST ALERT: EJ_Monitoring_Old;UP;HARD;1;OK - 172.18.1.200: rta 2.222ms, lost 0%
Mar 3 07:22:50 linux64 nagios: HOST ALERT: Barnsley_QA_DB;UP;HARD;1;OK - 172.18.1.226: rta 1.343ms, lost 0%
Mar 3 07:22:54 linux64 nagios: HOST ALERT: Barnsley_QA_App01;UP;HARD;1;OK - 172.18.1.227: rta 0.619ms, lost 0%
Mar 3 07:22:55 linux64 xinetd[1420]: START: nrpe pid=7641 from=::ffff:172.18.1.164
Mar 3 07:22:55 linux64 xinetd[7641]: FAIL: nrpe address from=::ffff:172.18.1.164
Mar 3 07:22:55 linux64 xinetd[1420]: EXIT: nrpe status=0 pid=7641 duration=0(sec)
Mar 3 07:22:56 linux64 nagios: HOST ALERT: Selenium;UP;HARD;1;OK - 80.69.143.227: rta 0.438ms, lost 0%
Mar 3 07:23:03 linux64 nagios: SERVICE ALERT: Essex BI Prod;Logon Errors;OK;HARD;5;Login Errors since last reboot is 0
Mar 3 07:23:21 linux64 nagios: SERVICE ALERT: EJ_Monitoring_Old;Ping;OK;HARD;5;OK - 172.18.1.200: rta 0.230ms, lost 0%
Mar 3 07:23:21 linux64 nagios: SERVICE ALERT: Barnsley_QA_DB;Ping;OK;HARD;5;OK - 172.18.1.226: rta 0.244ms, lost 0%
Re: Nagios XI Caching Alert Data
Could you post one of the configurations that you are having issues with?
Could you provide a screen capture of the following screen?
Click on "Admin" > "Monitoring Engine Status"
Re: Nagios XI Caching Alert Data
I'm not sure how you want the config posted. The host Barnsley_QA_App02, which keeps alerting even though it is disabled, does not exist in /usr/local/nagios/etc/hosts on the server, and likewise there are no services defined for it in /usr/local/nagios/etc/services. As you can see from my original post, though, it keeps showing up.
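One way to confirm where the stale host is still referenced is to search the whole Nagios tree rather than only the hosts and services directories. This is a minimal sketch; find_host_refs is a made-up helper name, and the objects.cache and status.dat paths in the usage note are the usual Nagios Core defaults, assumed rather than confirmed for this server.

```shell
#!/bin/sh
# List every file under the given paths that mentions a host name.
# Usage: find_host_refs HOSTNAME PATH [PATH...]
find_host_refs() {
    host="$1"; shift
    # -r recurse, -l print file names only, -F treat the name as a fixed string,
    # -w match whole words, so Barnsley_QA_App02 does not also match a longer
    # name such as Barnsley_QA_App021 (a hypothetical example).
    grep -rlFw -- "$host" "$@" 2>/dev/null
}
```

For example: find_host_refs Barnsley_QA_App02 /usr/local/nagios/etc /usr/local/nagios/var/objects.cache /usr/local/nagios/var/status.dat. A hit in objects.cache or status.dat but not under etc would suggest the running engine has not picked up the new configuration; no hits at all would suggest the stale entry lives only in the database that the XI interface reads from.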