I am seeing a lot of errors in /var/log/messages:
Jun 4 22:34:52 lonagiosxi ndo2db: Message sent to queue
Jun 4 22:35:00 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 4 22:35:01 lonagiosxi nagios: ndomod: Error writing to data sink! Some output may get lost...
Jun 4 22:35:01 lonagiosxi nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
Jun 4 22:35:05 lonagiosxi nagios: Caught SIGTERM, shutting down...
Jun 4 22:35:05 lonagiosxi nagios: Successfully shutdown... (PID=7097)
Jun 4 22:35:05 lonagiosxi nagios: ndomod: Shutdown complete.
Jun 4 22:35:05 lonagiosxi nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jun 4 22:35:06 lonagiosxi nagios: Nagios 3.4.1 starting... (PID=10664)
Jun 4 22:35:06 lonagiosxi nagios: Local time is Mon Jun 04 22:35:06 PDT 2012
Jun 4 22:35:06 lonagiosxi nagios: LOG VERSION: 2.0
Jun 4 22:35:06 lonagiosxi nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jun 4 22:35:06 lonagiosxi nagios: ndomod: Successfully connected to data sink. 69 queued items to flush.
Jun 4 22:35:06 lonagiosxi nagios: ndomod: Successfully flushed 69 queued items to data sink.
Jun 4 22:35:07 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 4 22:35:07 lonagiosxi nagios: Finished daemonizing... (New PID=10669)
Jun 4 22:35:08 lonagiosxi ndo2db: Message sent to queue
Jun 4 22:35:08 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 4 22:35:09 lonagiosxi ndo2db: Message sent to queue
Jun 4 22:35:28 lonagiosxi ndo2db: Message sent to queue
Jun 4 22:35:28 lonagiosxi nagios: Caught SIGTERM, shutting down...
Jun 4 22:35:28 lonagiosxi nagios: Successfully shutdown... (PID=10669)
Jun 4 22:35:28 lonagiosxi nagios: ndomod: Shutdown complete.
Jun 4 22:35:29 lonagiosxi nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jun 4 22:35:29 lonagiosxi nagios: Nagios 3.4.1 starting... (PID=10986)
Jun 4 22:35:29 lonagiosxi nagios: Local time is Mon Jun 04 22:35:29 PDT 2012
Jun 4 22:35:29 lonagiosxi nagios: LOG VERSION: 2.0
Jun 4 22:35:29 lonagiosxi nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jun 4 22:35:29 lonagiosxi nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jun 4 22:35:29 lonagiosxi nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Edit: Added more from the log file:
Jun 6 13:51:02 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 6 13:51:03 lonagiosxi ndo2db: Message sent to queue
Jun 6 13:51:03 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 6 13:51:04 lonagiosxi ndo2db: Message sent to queue
Jun 6 13:51:05 lonagiosxi xinetd[29328]: FAIL: nrpe address from=10.2.1.116
Jun 6 13:51:05 lonagiosxi xinetd[2525]: START: nrpe pid=29328 from=10.2.1.116
Jun 6 13:51:05 lonagiosxi xinetd[2525]: EXIT: nrpe status=0 pid=29328 duration=0(sec)
Jun 6 13:51:21 lonagiosxi ndo2db: Error: queue send error, retrying...
I'm seeing a lot of queue send errors, apparently from ndo2db.
I have searched the forums and see some mentions of this problem, but I'm not familiar enough
with Nagios yet to know what to look for. I have repaired the nagios database in MySQL, shut down
and restarted Nagios and ndo2db, but the errors still appear.
Any clue here?
Thanks.
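To put a number on "a lot", a short script can tally the ndo2db queue messages in a syslog file. This is a minimal sketch that assumes the syslog line layout shown above (`<Mon> <day> <HH:MM:SS> <host> ndo2db: <message>`); the function name `count_ndo2db_queue_events` is hypothetical and not part of Nagios.

```python
import re
import sys
from collections import Counter

# Matches syslog lines such as:
#   "Jun 6 13:51:02 lonagiosxi ndo2db: Error: queue send error, retrying..."
# Assumption: the standard "<Mon> <day> <HH:MM:SS> <host> <tag>: <msg>" layout.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+ndo2db(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

def count_ndo2db_queue_events(lines):
    """Count ndo2db queue send errors and successful sends (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an ndo2db line (nagios, xinetd, ...)
        msg = m.group("msg")
        if msg.startswith("Error: queue send error"):
            counts["send_error"] += 1
        elif msg.startswith("Message sent to queue"):
            counts["sent"] += 1
    return counts

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        c = count_ndo2db_queue_events(fh)
    print(f"queue send errors: {c['send_error']}, messages sent: {c['sent']}")
```

Reading /var/log/messages usually needs root, so run the script with sufficient privileges.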
ndo2db Errors.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: ndo2db Errors.
I believe this was one of the items that was fixed in the new 2011R3.0 release.
http://library.nagios.com/library/produ ... -nagios-xi
Re: ndo2db Errors.
We are on the latest:
Nagios XI 2011R3.0 Copyright © 2008-2012 Nagios Enterprises, LLC.
Clicking on Check for Updates:
Up To Date
Your installation of Nagios XI (2011R3.0) is up-to-date, so no upgrade is required. The latest version of Nagios XI is 2011R3.0, which was released on 2012-06-04.
Thanks,
Keith
scottwilkerson
Re: ndo2db Errors.
Actually, looking at the logs, you only seem to be getting the data sink errors ("Error writing to data sink!") right when Nagios is restarting. This is normal behavior; once it is up and running you get the following:
Jun 4 22:35:06 lonagiosxi nagios: ndomod: Successfully connected to data sink. 69 queued items to flush.
Jun 4 22:35:06 lonagiosxi nagios: ndomod: Successfully flushed 69 queued items to data sink.
This is all expected behavior.
If you were getting a lot of unable-to-connect-to-data-sink errors without the "Successfully connected" message, that would be something to worry about.
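The rule of thumb in this reply (data sink errors only matter when no "Successfully connected" message follows them) can be checked mechanically. A minimal sketch, assuming the ndomod message texts quoted in this thread; the helper name `unresolved_sink_errors` is hypothetical.

```python
import sys

# Message texts as they appear in the ndomod log lines quoted in this thread.
ERROR_MARK = "ndomod: Error writing to data sink"
SUCCESS_MARKS = (
    "ndomod: Successfully connected to data sink",
    "ndomod: Successfully reconnected to data sink",
)

def unresolved_sink_errors(lines):
    """Return data sink errors that no later successful (re)connect cleared.

    Hypothetical helper: errors followed by a successful connect (as during a
    Nagios restart) are treated as transient and dropped.
    """
    pending = []
    for line in lines:
        if ERROR_MARK in line:
            pending.append(line.rstrip("\n"))
        elif any(mark in line for mark in SUCCESS_MARKS):
            pending.clear()  # the sink came back, so earlier errors were transient
    return pending

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        leftover = unresolved_sink_errors(fh)
    print(f"{len(leftover)} data sink error(s) with no later successful connect")
```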
Re: ndo2db Errors.
tail -5 /var/log/messages:
Jun 6 14:49:28 lonagiosxi ndo2db: Message sent to queue
Jun 6 14:49:47 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 6 14:49:48 lonagiosxi ndo2db: Message sent to queue
Jun 6 14:50:17 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 6 14:50:18 lonagiosxi ndo2db: Message sent to queue
They just keep coming. I can tail -f /var/log/messages and watch these errors being appended.
Then, when I try to stop ndo2db, I sometimes see this error:
service ndo2db stop
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done
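The stop error means the init script could not read the lock file. A small check can show whether the lock file exists and whether the process it names is still alive. This is a sketch under an assumption: that the lock file holds the ndo2db PID on its first line, which is what the `head` call in the init script suggests. The helper names are hypothetical.

```python
import os
from pathlib import Path

# Path taken from the init-script error above. Assumption: the file holds the
# ndo2db PID on its first line (the init script reads it with `head`).
LOCK_FILE = Path("/usr/local/nagios/var/ndo2db.lock")

def pid_from_lock(lock_file=LOCK_FILE):
    """Return the PID recorded in the lock file, or None if missing/empty/unreadable."""
    try:
        first = lock_file.read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None
    return int(first) if first.isdigit() else None

def pid_is_running(pid):
    """Signal 0 checks that the process exists without actually signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

if __name__ == "__main__":
    pid = pid_from_lock()
    if pid is None:
        print(f"{LOCK_FILE} missing or unreadable; the init script cannot find ndo2db")
    else:
        print(f"ndo2db pid {pid} running: {pid_is_running(pid)}")
```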
When a stop does happen:
Jun 6 14:51:28 lonagiosxi nagios: ndomod: Error writing to data sink! Some output may get lost...
Jun 6 14:51:28 lonagiosxi nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
But then after a restart:
Jun 6 14:54:59 lonagiosxi nagios: ndomod: Successfully reconnected to data sink! 0 items lost, 323 queued items to flush.
Jun 6 14:54:59 lonagiosxi ndo2db: Error: queue send error, retrying...
Jun 6 14:55:00 lonagiosxi ndo2db: Message sent to queue
Jun 6 14:55:00 lonagiosxi nagios: ndomod: Successfully flushed 323 queued items to data sink.
But the only way things get flushed is after a stop and a start of ndo2db.
Otherwise I just see the queue send errors pop up.
Thanks.
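If the queue send errors mean that ndo2db's System V message queue is full (an assumption, not something confirmed in this thread), the queue's fill level can be compared with the kernel limit. On Linux, /proc/sysvipc/msg lists each queue with its current byte count (`cbytes`) and message count (`qnum`), roughly what `ipcs -q` prints. The sketch below assumes each queue's byte limit equals kernel.msgmnb, which is the default for newly created queues; the helper names and the 90% threshold are illustrative.

```python
from pathlib import Path

def parse_sysvipc_msg(text):
    """Parse /proc/sysvipc/msg into one dict per queue, keyed by the header names."""
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:] if row.strip()]

def nearly_full(queues, msgmnb, ratio=0.9):
    """Return (msqid, cbytes, qnum) for queues using at least `ratio` of msgmnb bytes.

    Assumption: the queue's byte limit is kernel.msgmnb (the default at creation).
    """
    hits = []
    for q in queues:
        cbytes = int(q["cbytes"])
        if cbytes >= ratio * msgmnb:
            hits.append((int(q["msqid"]), cbytes, int(q["qnum"])))
    return hits

if __name__ == "__main__":
    msgmnb = int(Path("/proc/sys/kernel/msgmnb").read_text().split()[0])
    queues = parse_sysvipc_msg(Path("/proc/sysvipc/msg").read_text())
    for msqid, cbytes, qnum in nearly_full(queues, msgmnb):
        print(f"queue {msqid}: {cbytes} bytes in {qnum} messages (msgmnb={msgmnb})")
```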
scottwilkerson
Re: ndo2db Errors.
You will need to increase the kernel message queue parameters for your system. I do not know the optimal values for your system, but in my case I increased them substantially (in /etc/sysctl.conf) to:
kernel.msgmax = 131072000
kernel.msgmnb = 131072000
After updating this in the conf file, run:
/sbin/sysctl -p
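A value written in /etc/sysctl.conf only takes effect once `sysctl -p` loads it (or at boot), so it can help to compare the file with what the running kernel reports under /proc/sys/kernel. A minimal sketch, assuming the target values from this reply (they are one admin's choice, not recommended defaults); the helper names are hypothetical.

```python
from pathlib import Path

# Values suggested in this reply; the optimal numbers depend on the system.
WANTED = {"kernel.msgmax": 131072000, "kernel.msgmnb": 131072000}

def parse_sysctl_conf(text):
    """Parse 'key = value' lines from sysctl.conf-style text, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def read_live_value(key, proc_root="/proc/sys"):
    """Read the running kernel value, e.g. kernel.msgmax -> /proc/sys/kernel/msgmax."""
    return int(Path(proc_root, *key.split(".")).read_text().split()[0])

def too_small(values, wanted=WANTED):
    """Return the keys that are missing or below the wanted minimum."""
    return [k for k, v in wanted.items() if int(values.get(k, 0)) < v]

if __name__ == "__main__":
    conf = parse_sysctl_conf(Path("/etc/sysctl.conf").read_text())
    live = {k: read_live_value(k) for k in WANTED}
    print("missing or below target in /etc/sysctl.conf:", too_small(conf))
    print("below target in the running kernel:", too_small(live))
```

If the file is fine but the running kernel is below target, the settings have not been loaded yet.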
Re: ndo2db Errors.
I checked this yesterday at about 3pm, and those parameters were already set as above.
I ran sysctl -p anyway, and after about 30 minutes of tailing the messages log I saw that those
errors had gone away.
Checking this morning, I don't see those log error messages any longer.
More RAM was added to the system the day before I noticed these log messages showing up.
Would that have anything to do with it? I don't see how it would, though.
I've been running BSD-type machines 99% of the time for the last ten-plus years, so I'm still 'learning' Linux and
the differences between the two.
Thanks.
Re: ndo2db Errors.
Go ahead and keep an eye on that log; if you see any of those errors reappear, we'll dive into this further.
Re: ndo2db Errors.
I was having the same issue and made the suggested changes. It took a restart of ndo2db for the messages to stop (I also restarted nagios, nagiosxi, npcd and httpd for good measure). Thanks!