Engine hosed after network disconnect

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Engine hosed after network disconnect

Post by abrist »

... Hey guys!

NDO is resilient enough to reconnect the data sink once the db is back. The issue here may be how long it was down. Check results will back up because they cannot be written to the historical db, and if the db is down long enough, the engine may face an uphill battle between new checks and cached old checks. It could take a considerable amount of time for everything to normalize if the outage was of a decently long duration (or the number of checks per 5 minutes is very high).
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Engine hosed after network disconnect

Post by vAJ »

Where might I find indicators of such a backlog? Any known error messages and from which log files?
Andrew J. - Do you even grok?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Engine hosed after network disconnect

Post by slansing »

I would look in the following logs:

Code: Select all

/var/log/messages

Code: Select all

/var/log/mysqld.log
You may also find useful information about what happened when the outage hit your network in:

Code: Select all

/usr/local/nagios/var/nagios.log
You can also find the nagios log archives in the above directory.

In addition, do you have logging set up on the remote MySQL server? At the least, a log showing when mysqld was started? Possibly the syslog?

I'd also check whether you have an inordinate number of temp check* files in /tmp/, or in:

Code: Select all

/usr/local/nagios/var/spool/checkresults/
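
A rough way to put a number on that backlog is to count the files in both places. This is only a sketch: the function names are made up for illustration, the spool path is the one given above, and the `check*` pattern in /tmp is taken from the description above rather than from any documented naming scheme. It also assumes a `find` that supports `-maxdepth` (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nagios XI.

# Count regular files sitting in the check result spool.
# $1 is optional; the default is the spool path mentioned in this thread.
count_checkresults() {
    dir="${1:-/usr/local/nagios/var/spool/checkresults}"
    # Errors (e.g. missing directory) are discarded, which yields a count of 0.
    find "$dir" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Count leftover check* temp files directly inside a directory (default /tmp).
count_tmp_check_files() {
    dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -type f -name 'check*' 2>/dev/null | wc -l | tr -d ' '
}

echo "spool files:      $(count_checkresults)"
echo "tmp check* files: $(count_tmp_check_files)"
```

Running it a few times, a minute or so apart, shows whether the counts are draining or still growing. What counts as "inordinate" depends on how many checks the box runs, so compare against a reading taken when the system is healthy.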
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Engine hosed after network disconnect

Post by vAJ »

Well, we just found an issue yesterday where logs were not rotating. /var/log/messages is about 146 MB.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Engine hosed after network disconnect

Post by slansing »

Dear Lord that is a lot. Can you run the following and share the output:

Code: Select all

grep messages /var/lib/logrotate.status
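
If you want just the date for one specific log, something like the following should work. It is a sketch that assumes the state file uses lines of the form `"/var/log/messages" 2013-10-6` (quoted path, then the last rotation date); the function name `last_rotated` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the last rotation date logrotate recorded for a log.
# $1 = log file path, $2 = state file (defaults to the path used in this thread).
last_rotated() {
    path="$1"
    status="${2:-/var/lib/logrotate.status}"
    # The first field in the state file is the path wrapped in double quotes.
    awk -v want="\"$path\"" '$1 == want { print $2 }' "$status"
}

if [ -r /var/lib/logrotate.status ]; then
    echo "messages last rotated: $(last_rotated /var/log/messages)"
fi
```

Unlike a plain grep for "messages", this matches the exact path, so other logs with "messages" in their name will not show up.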
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Engine hosed after network disconnect

Post by vAJ »

2013-10-6
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Engine hosed after network disconnect

Post by sreinhardt »

So is that 146 MB after rotation and within the last two days, or prior to rotation and over some extended period of time? As you said, that's a pretty big log! Hopefully clearing that up should let you see NDO messages a bit more clearly. :D
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Engine hosed after network disconnect

Post by vAJ »

The log still hadn't rotated this morning. Our Linux admin tells me he issued a rotate command, but it said rotation wasn't necessary, so he forced it.

The rotation schedule was set to weekly; we changed it to daily. Will know more tomorrow.

On another note, we had another stab at a big L3 network change last night and the engine did not fall over. The only difference seems to be that I rebooted the box yesterday, so it had very little buffer/cache in use. Usually it's using 19GB for cache on a box with 24GB of physical memory.
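
For tracking how much of RAM is going to buffers and cache over time, a small sketch along these lines could be run before the next network change. It assumes a Linux `/proc/meminfo` with `MemTotal`, `Buffers` and `Cached` lines reported in kB; the function name `cache_percent` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print buffers + page cache as a whole percentage of RAM.
# $1 is optional and defaults to /proc/meminfo.
cache_percent() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/ { total = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }   # anchored, so SwapCached is not counted
        END { if (total > 0) printf "%d\n", (buffers + cached) * 100 / total }
    ' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    echo "buffers+cache: $(cache_percent)% of physical memory"
fi
```

With the numbers above (about 19GB of cache on 24GB), this would report roughly 79%. A high value on its own is normal for Linux, so it is only useful here as a data point to compare between the runs where the engine fell over and the ones where it did not.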
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Engine hosed after network disconnect

Post by abrist »

I haven't noticed any big upstream kernel bugs with caching recently. I guess we will not have a good test until ram fills up with disk cache again. Let us know what happened with the log rotation today.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: Engine hosed after network disconnect

Post by vAJ »

OK. Log rotation is good at 24hrs now.