rseiwert wrote: I'm wondering, if you set NDO2DEBUG directive and it does not daemonize, doesn't that stop it from using IPC and cause everything to run in one process?
jfrickson wrote: It does run in just one process, but it still uses IPC.
With one process what's it using inter-process communication for then?
NDO2DB Issue out of the blue
Re: NDO2DB Issue out of the blue
Grumpy Olde IT Guy
Re: NDO2DB Issue out of the blue
rseiwert wrote: With one process what's it using inter-process communication for then?
I do believe that's for receiving information from the nagios process, via ndomod (the NEB module that ships off data from nagios to ndo2db). I could be wrong, or we could be talking about different things. IPC being a broad term, are you referring to the kernel message queue?
Former Nagios employee
Re: NDO2DB Issue out of the blue
It is doing it so often now that I have resorted to putting in an event handler to restart ndo2db whenever it sees the queue numbers starting to climb.
This definitely isn't a good solution, as we're losing all those messages.
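For anyone wanting to try the same band-aid, here is a minimal sketch of what such an event handler could look like. It is not the handler used above. It assumes a service check that watches the kernel message queue, a command definition that passes `$SERVICESTATE$` and `$SERVICESTATETYPE$` as the two arguments, and `service ndo2db restart` as the restart command. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical event handler sketch for a check that watches the kernel
# message queue. Assumed arguments: $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$.

# Print "restart" only for a HARD CRITICAL state, otherwise print "none".
decide_action() {
    state="$1"
    state_type="$2"
    if [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]; then
        echo "restart"
    else
        echo "none"
    fi
}

# Assumed restart command; override RESTART_CMD to match your init system.
RESTART_CMD="${RESTART_CMD:-service ndo2db restart}"

# Run the restart command when decide_action says so.
handle_event() {
    if [ "$(decide_action "$1" "$2")" = "restart" ]; then
        $RESTART_CMD
    fi
}
```

A wrapper script would end with `handle_event "$1" "$2"`. As noted above, anything still sitting in the queue at restart time is lost, so this only limits the damage.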
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: NDO2DB Issue out of the blue
I really wish I had a better answer for you, and trust me when I say we're all stressing over this one.
We've not heard back about a permanent fix, and each new patch we get only works part of the time or makes things worse. All I can think to ask is what has changed in your system? We definitely believe it to be related to a certain check's output, and I had suspected WMI in the past, but disabling all WMI checks did not solve anything. If you can come up with a list of things that changed in the last 2 weeks, we can work off of that. I don't have much more unfortunately, and I hate having to give that answer.
Former Nagios employee
Re: NDO2DB Issue out of the blue
tmcdonald wrote: I really wish I had a better answer for you, and trust me when I say we're all stressing over this one.
We've not heard back about a permanent fix, and each new patch we get only works part of the time or makes things worse. All I can think to ask is what has changed in your system? We definitely believe it to be related to a certain check's output, and I had suspected WMI in the past but disabling all WMI checks did not solve anything. If you can come up with a list of things in the last 2 weeks we can work off of that. I don't have much more unfortunately, and I hate having to give that answer
Well...we make changes all over the place, but I guess I could sort the services by the ID column and see which ones may have been just added and go from there. Let me work on that and I'll update.
EDIT: I looked through all the services added during the month of August. They are all already monitored on other hosts and there is nothing special about any of them, nothing I can think of that could be throwing anything odd.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: NDO2DB Issue out of the blue
tmcdonald wrote: IPC being a broad term, are you referring to the kernel message queue?
When I said IPC I was referring to the System V InterProcess Communication System which is viewed with the ipcs command.
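As a rough sketch of how to keep an eye on that queue: the helper below totals the used bytes and message counts from `ipcs -q` output. It assumes the Linux (util-linux) column layout of key, msqid, owner, perms, used-bytes, messages. The function name `queue_totals` is made up for this example.

```shell
# Sum used bytes and message counts across all System V message queues.
# Reads `ipcs -q` style output on stdin and prints "<bytes> <messages>".
# Assumes data rows start with a hex key (0x...) and that used-bytes and
# messages are columns 5 and 6, as in util-linux ipcs.
queue_totals() {
    awk '$1 ~ /^0x/ { bytes += $5; msgs += $6 }
         END { printf "%d %d\n", bytes, msgs }'
}
```

Typical use would be `ipcs -q | queue_totals`, run every minute or so. A steadily climbing message count suggests ndo2db has stopped draining the queue.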
Grumpy Olde IT Guy
Re: NDO2DB Issue out of the blue
Looking through this thread, I don't see any ndo2db.debug log. Did you set the debug options shown below in /usr/local/nagios/etc/ndo2db.cfg?
If so, what is in the ndo2db.debug file? Have you tried sorting this file by line length? What is in the longest line? Having experienced this problem for several weeks, I would like to see this solved as well. In my case, I noticed another check's data inside the output of the check that was actually causing the problem. Have you noticed any checks that contain some other check's data?
These are the settings in /usr/local/nagios/etc/ndo2db.cfg:
# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file. OR values together to log multiple
# types of information.
# Values: -1 = Everything
#          0 = Nothing
#          1 = Process info
#          2 = SQL queries
debug_level=-1
# DEBUG VERBOSITY
# This option determines how verbose the debug log out will be.
# Values: 0 = Brief output
#         1 = More detailed
#         2 = Very detailed
debug_verbosity=2
# DEBUG FILE
# This option determines where the daemon should write debugging information.
debug_file=/usr/local/nagios/var/ndo2db.debug
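One possible way to do the "sort by line length" step, as a sketch only: the helper below prints the longest lines of a file with their lengths. The name `longest_lines` and the 200-character preview cut-off are arbitrary choices for this example.

```shell
# Print the N longest lines of a file as "<length><TAB><first 200 chars>".
# $1 = file to inspect, $2 = how many lines to show (default 5).
longest_lines() {
    file="$1"
    count="${2:-5}"
    awk '{ printf "%d\t%s\n", length($0), substr($0, 1, 200) }' "$file" |
        sort -rn | head -n "$count"
}
```

For example, `longest_lines /usr/local/nagios/var/ndo2db.debug 10` would list the ten longest entries, which is where an oversized check output is most likely to show up.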
Grumpy Olde IT Guy
Re: NDO2DB Issue out of the blue
I have it doing the debug log now so I can check for what you asked. I'll let you know.
Interesting note: this is the 3rd day in a row where it crashed multiple times in the morning, specifically at least once between 8:00 and 8:10am. The nice thing is that I have an event handler in place to restart it now and then send me an SMS so I know it got restarted. Good band-aid for now, but yeah, this definitely needs to be fixed; there's no telling how much information we're losing.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
Re: NDO2DB Issue out of the blue
One of the thoughts was that there is a certain combination of message length, newline placement, and presence of a delimiter (= or : if I recall correctly) that causes ndo to enter an infinite loop parsing one specific message, though we have not narrowed down exactly what that combination is. You say this happens between 8:00 and 8:10; is that consistent? There may be a backup, scan, or other scheduled event running on a remote machine that causes a check to return output that matches those criteria. Can you think of anything on your monitored machines that would do this?
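Since the exact combination is not known, the best that can be done is to shortlist candidates. The sketch below flags lines that are long, contain `=` or `:`, and contain an embedded newline. It assumes one message per input line with embedded newlines written as a literal `\n`, which may not match your log format. The 1024-character default and the name `flag_suspects` are arbitrary, and a match only means a message is worth a closer look.

```shell
# Print "<line number><TAB><length>" for lines on stdin that are at least
# $1 characters long (default 1024), contain '=' or ':', and contain a
# literal backslash-n sequence (assumed encoding of an embedded newline).
flag_suspects() {
    min_len="${1:-1024}"
    awk -v min="$min_len" '
        length($0) >= min && $0 ~ /[=:]/ && index($0, "\\n") > 0 {
            printf "%d\t%d\n", NR, length($0)
        }'
}
```

Something like `flag_suspects 2048 < /usr/local/nagios/var/ndo2db.debug` would then give line numbers to inspect by hand around the 8:00 to 8:10 window.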
Former Nagios employee
Re: NDO2DB Issue out of the blue
tmcdonald wrote: One of the thoughts was that there is a certain combination of message length, newline placement, and presence of a delimiter (= or : if I recall correctly) that causes ndo to enter an infinite loop parsing one specific message, though we have not narrowed down exactly what that combination is. You say that between 8:00 and 8:10 this happens, is that consistent? There may be a backup, scan, or other scheduled event running on a remote machine that causes a check to return output that matches those criteria. Can you think of anything on your monitored machines that would do this?
We have lots of checks that use those delimiters, so yeah, it could be, and we also have plenty of long output as well. I was thinking of maybe doing this:
echo "
alter table nagios_servicestatus modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hoststatus modify output varchar(65535) not null, modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_servicechecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
alter table nagios_hostchecks modify output varchar(65535) not null,modify long_output varchar(65535) not null,modify perfdata varchar(65535) not null;
" | mysql -pnagiosxi nagios
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github