NDO2DB Issue out of the blue

rseiwert · Post by **rseiwert** » Tue Sep 15, 2015 7:19 pm

Just noticed there's yet another patch. I will try that.

I came back to this thread because I hit this exact scenario naturally today. After having a SAN meltdown (again!) I thought I had everything back to normal. I did get an alert from Pnp RRD Health for my SharePoint Application_Log.rrd: "found extra data on update argument: 0:0:0:0". This was odd. When I went to look at the application log check, I saw that it had a disk check's results. Looking in the actual application log via Event Viewer, there were thousands of error messages, several a minute, about a database issue. Yet another green check on what should have been a critical result.

rseiwert · Post by **rseiwert** » Wed Sep 16, 2015 10:26 am

Forced the event log check to return too much data and immediately started getting results from other checks in the application log check. The debug.log is attached. The Hostname is SQL2k8, the Check Description is Application Log and the actual check is check_wmi_plus checkevents.

ndo2db.debug.gz

jfrickson · Post by **jfrickson** » Wed Sep 16, 2015 10:31 am

rseiwert wrote: Forced the event log check to return too much data and immediately started getting results from other checks in the application log check. The debug.log is attached. The Hostname is SQL2k8, the Check Description is Application Log and the actual check is check_wmi_plus checkevents.

Great, thanks! I'll take a look at it.

jfrickson · Post by **jfrickson** » Wed Sep 16, 2015 10:41 am

Edit: Oops, looked at the wrong thing. The messages I'm looking for are in there.

rseiwert wrote:The debug.log is attached.

Hmm, there's only INSERT statements in there. Did you update the config file?

jfrickson · Post by **jfrickson** » Wed Sep 16, 2015 2:27 pm

The good news is that the buffers and messages look like they're all being processed correctly. I looked through the log and everything looks like it worked ok.

The bad news is that after the HUGE messages, there were only 9 additional messages, none of which appeared to be a failure status.

rseiwert · Post by **rseiwert** » Wed Sep 16, 2015 4:37 pm

Yep, it appears the failure is upstream. This is so much better than the message queues crashing and Nagios not updating. Below is an application log check from a domain controller, cut from the ndo2db.debug, and it has the results from a ping check. The application log check should have been critical, not OK. I did update the ndo2db.conf as instructed to log everything verbosely.

```
213:
1=1202
2=0
3=0
4=1442416432.828432
53=DC1
114=Application Log
95=OK - 10.1.5.151: rta 0.449ms, lost 0%
125=
99=rta=0.449ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=0.800ms;;;;
```
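To make the mismatch in the record above easier to see, here is a minimal Python sketch that splits such a record into named fields. The key meanings in `ASSUMED_KEYS` (53 as host, 114 as service description, 95 as plugin output, 99 as performance data, 4 as timestamp) are assumptions inferred from the values in the post, and `parse_record` is a hypothetical helper, not part of ndo2db.

```python
# Key meanings are assumptions inferred from the values shown in the post.
ASSUMED_KEYS = {
    "4": "timestamp",
    "53": "host",
    "114": "service",
    "95": "output",
    "99": "perfdata",
}

SAMPLE = """213:
1=1202
2=0
3=0
4=1442416432.828432
53=DC1
114=Application Log
95=OK - 10.1.5.151: rta 0.449ms, lost 0%
125=
99=rta=0.449ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=0.800ms;;;; rtmin=0.250ms;;;;
"""


def parse_record(text):
    """Parse a 'TYPE:' header followed by 'key=value' lines into a dict."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = {"record_type": lines[0].rstrip(":")}
    for line in lines[1:]:
        # Split on the first '=' only; perfdata values contain more of them.
        key, _, value = line.partition("=")
        record[ASSUMED_KEYS.get(key, key)] = value
    return record


record = parse_record(SAMPLE)
print(record["host"], "|", record["service"], "|", record["output"])
```

Printed side by side, the service name is "Application Log" while the output text is ping-style round-trip data, which is the mismatch described in the post.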

jfrickson · Post by **jfrickson** » Thu Sep 17, 2015 11:29 am

rseiwert wrote: Below is an application log check from a domain controller, cut from the ndo2db.debug, and it has the results from a ping check.

Yup, it's definitely bad coming in off the IPC queue. So it's either being sent to the socket corrupted, or the function that reads from the socket and writes it to the queue is messing it up.

The attached patch adds logging of the data coming in over the socket and being written to the IPC queue. Please run this when you have a chance and send me the output.
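The idea of logging at both hops can be sketched roughly as follows: fingerprint each message as it is read from the socket and again as it is written to the queue, then find the first place the two logs disagree. This is only an illustration in Python, not the actual patch; `fingerprint`, `first_divergence`, and the sample messages are all made up for the example.

```python
import hashlib


def fingerprint(message: bytes) -> str:
    """Short, stable digest of a message, suitable for a log line."""
    return hashlib.sha256(message).hexdigest()[:12]


def first_divergence(socket_log, queue_log):
    """Return the index of the first message whose fingerprints differ
    between the two stages, or None if the compared entries all match."""
    for index, (at_socket, at_queue) in enumerate(zip(socket_log, queue_log)):
        if at_socket != at_queue:
            return index
    return None


# Hypothetical messages as read from the socket.
from_socket = [
    b"114=Application Log\n95=CRITICAL - database errors\n",
    b"114=PING\n95=OK - rta 0.449ms\n",
]
# Hypothetical messages as written to the queue; the first one has picked
# up the output of a different check.
to_queue = [
    b"114=Application Log\n95=OK - rta 0.449ms\n",
    from_socket[1],
]

socket_log = [fingerprint(m) for m in from_socket]
queue_log = [fingerprint(m) for m in to_queue]
print(first_divergence(socket_log, queue_log))  # prints 0
```

If the fingerprints already disagree with what the sender believes it sent, the corruption happened before the socket; if they match at the socket but differ at the queue, the reader/writer function is the suspect.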

Nagios Support Forum
