Engine hosed after network disconnect
Re: Engine hosed after network disconnect
... Hey guys!
NDO is resilient enough to reconnect the data sink once the db is back. The issue here may be how long it was down. Check results will back up because they cannot be written to the historical db, and if the db is down long enough, the engine may face an uphill battle between new checks and cached old checks. It could take a considerable amount of time for everything to normalize if the outage was long (or if the number of checks per 5 minutes is very high).
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Engine hosed after network disconnect
Where might I find indicators of such a backlog? Any known error messages and from which log files?
Andrew J. - Do you even grok?
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Engine hosed after network disconnect
I would look in the following logs:
Code: Select all
/var/log/messages
Code: Select all
/var/log/mysqld.log
You may also find some useful information from what happened when the outage hit your network in:
Code: Select all
/usr/local/nagios/var/nagios.log
You can also find the nagios log archives in the above directory.
In addition, do you have logging set up on the remote mysql server? At the least a log showing when mysqld is initiated? Possibly the syslog?
I'd also check to see if you have an inordinate amount of temp check* files in /tmp/. Or in:
Code: Select all
/usr/local/nagios/var/spool/checkresults/
Re: Engine hosed after network disconnect
Well, we just found an issue yesterday where logging was not rotating. /var/log/messages is about 146 MB.
Andrew J. - Do you even grok?
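The backlog and log-size checks suggested in this thread can be gathered with two small helpers. This is only a minimal sketch: `count_files` and `size_mb` are hypothetical helper names, not part of Nagios, and the paths in the usage comments are the ones quoted in the thread, which may differ on your install.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nagios) for spotting a check result backlog.

# Count regular files directly inside DIR whose names match PATTERN (default: all).
# Prints 0 if DIR does not exist.
count_files() {
    dir=$1
    pattern=${2:-*}
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
}

# Print the size of FILE in whole megabytes (rounded down), or "missing".
size_mb() {
    [ -f "$1" ] || { echo "missing"; return 0; }
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1048576 ))
}

# Example usage (paths taken from the thread):
#   count_files /usr/local/nagios/var/spool/checkresults
#   count_files /tmp 'check*'
#   size_mb /var/log/messages
```

A spool count that keeps climbing between runs would suggest results are arriving faster than they are being processed; a single large number on its own says less than the trend.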
-
slansing
Re: Engine hosed after network disconnect
Dear Lord that is a lot. Can you run the following and share the output:
Code: Select all
cat /var/lib/logrotate.status | grep messages-
sreinhardt
- -fno-stack-protector
- Posts: 4366
- Joined: Mon Nov 19, 2012 12:10 pm
Re: Engine hosed after network disconnect
So is that 146 MB after rotation and within the last two days, or prior to rotation and over some extended period of time? As you said, that's a pretty big log! Hopefully clearing that up should let you see the NDO messages a bit more clearly.
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Engine hosed after network disconnect
The log still hadn't rotated this morning. Our Linux admin tells me he issued a rotate command, but it said rotation wasn't necessary, so he forced it.
The rotation schedule was set to weekly; we changed it to daily. We will know more tomorrow.
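When logrotate says a rotation is not needed, a dry run usually shows why. The sketch below is only an illustration: the function names are hypothetical, and /etc/logrotate.conf is assumed as the config path because the thread does not say which file was edited. The `-d` (debug, no changes made) and `-f` (force) options are standard logrotate flags.

```shell
#!/bin/sh
# Hypothetical wrappers around logrotate; the config path is an assumption.

# Print the first rotation-frequency directive found in a logrotate config file.
# Commented-out directives are ignored.
rotation_frequency() {
    awk '$1 ~ /^(hourly|daily|weekly|monthly|yearly)$/ { print $1; exit }' "$1"
}

# Dry run: logrotate explains each decision but changes nothing.
explain_rotation() {
    logrotate -d "${1:-/etc/logrotate.conf}"
}

# Force a rotation even if logrotate considers it unnecessary (needs root).
force_rotation() {
    logrotate -f "${1:-/etc/logrotate.conf}"
}

# Example usage:
#   rotation_frequency /etc/logrotate.conf
#   explain_rotation
#   force_rotation
```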
On another note, we had another stab at a big L3 network change last night and the engine did not fall over. The only difference seems to be that I rebooted the box yesterday, so it had very little buffer/cache in use. Usually cache takes up 19GB on a box with 24GB of physical memory.
Andrew J. - Do you even grok?
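To track how much memory sits in buffers and page cache between reboots, the figures can be read from /proc/meminfo. This is a minimal sketch with a hypothetical helper name, `cache_percent`. Keep in mind that Linux normally reclaims page cache on demand, so a high percentage is not by itself proof of a problem.

```shell
#!/bin/sh
# Hypothetical helper: share of physical memory held in buffers + page cache.

# Print (Buffers + Cached) as a whole percent of MemTotal.
# Reads a /proc/meminfo-style file; defaults to /proc/meminfo itself.
cache_percent() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "Buffers:"  { cache += $2 }
        $1 == "Cached:"   { cache += $2 }
        END { if (total > 0) printf "%d\n", cache * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Example usage:
#   cache_percent
```

Logging this value periodically (for example from cron) would show whether the engine trouble lines up with the cache filling up, which is the comparison raised above.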
Re: Engine hosed after network disconnect
I haven't noticed any big upstream kernel bugs with caching recently. I guess we will not have a good test until RAM fills up with disk cache again. Let us know what happens with the log rotation today.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Engine hosed after network disconnect
OK. Log rotation is good at 24hrs now.
Andrew J. - Do you even grok?