Just noticed there's yet another patch. I will try that.
I came back to this thread because I hit this exact scenario naturally today. After having a SAN meltdown (again!) I thought I had everything back to normal. I did get an alert from Pnp RRD Health for my SharePoint server: "Application_Log.rrd: found extra data on update argument: 0:0:0:0". This was odd. When I went to look at the Application Log check, I saw that it held the results of a disk check. Looking at the actual application log in Event Viewer, there were thousands of error messages, several a minute, about a database issue. Yet another green check on what should have been a critical result.
NDO2DB Issue out of the blue
Re: NDO2DB Issue out of the blue
Grumpy Olde IT Guy
Re: NDO2DB Issue out of the blue
Forced the event log check to return too much data and immediately started getting results from other checks in the Application Log check. The debug.log is attached. The hostname is SQL2k8, the check description is Application Log, and the actual check is check_wmi_plus checkevents.
Grumpy Olde IT Guy
-
jfrickson
Re: NDO2DB Issue out of the blue
rseiwert wrote: Forced the event log check to return too much data and immediately started getting results from other checks in the Application Log check. The debug.log is attached. The hostname is SQL2k8, the check description is Application Log, and the actual check is check_wmi_plus checkevents.
Great, thanks! I'll take a look at it.
-
jfrickson
Re: NDO2DB Issue out of the blue
Edit: Oops, looked at the wrong thing. The messages I'm looking for are in there.
rseiwert wrote: The debug.log is attached.
Hmm, there's only INSERT statements in there. Did you update the config file?
-
jfrickson
Re: NDO2DB Issue out of the blue
The good news is that the buffers and messages look like they're all being processed correctly. I looked through the log and everything looks like it worked ok.
The bad news is that after the HUGE messages, there were only 9 additional messages, none of which appeared to be a failure status.
Re: NDO2DB Issue out of the blue
Yep, it appears the failure is upstream. This is so much better than the message queues crashing and Nagios not updating. Below is an Application Log check from a domain controller, cut from ndo2db.debug, and it contains the results of a ping check. The Application Log check should have been critical, not OK. I did update the ndo2db.conf as instructed to log everything verbosely.
213:
1=1202
2=0
3=0
4=1442416432.828432
53=DC1
114=Application Log
95=OK - 10.1.5.151: rta 0.449ms, lost 0%
125=
99=rta=0.449ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=0.800ms;;;; rtmin=0.250ms;;;;
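For anyone who wants to scan a large debug file for records like this one, here is a minimal Python sketch that splits such a dump into numbered fields and flags the mismatch. The field meanings (53 as host, 114 as check description, 95 as output) are only inferred from the sample above, not taken from the NDO protocol definitions, and the `looks_mismatched` heuristic is a hypothetical helper that only recognizes ping-style output.

```python
def parse_record(text):
    """Split a dumped record into (record_type, {field_number: value})."""
    record_type = None
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first line of the dump is the record type followed by a colon.
        if record_type is None and line.endswith(":") and line[:-1].isdigit():
            record_type = int(line[:-1])
            continue
        # Split on the first "=" only, since values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep and key.isdigit():
            fields[int(key)] = value
    return record_type, fields


def looks_mismatched(fields):
    """Hypothetical heuristic: ping-style output on a non-ping check.

    Assumes field 114 is the check description and field 95 the plugin
    output, as suggested by the sample record (not verified elsewhere).
    """
    description = fields.get(114, "").lower()
    output = fields.get(95, "")
    return "ping" not in description and "rta" in output and "lost" in output


SAMPLE = """213:
1=1202
2=0
3=0
4=1442416432.828432
53=DC1
114=Application Log
95=OK - 10.1.5.151: rta 0.449ms, lost 0%
125=
99=rta=0.449ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=0.800ms;;;; rtmin=0.250ms;;;;
"""

record_type, fields = parse_record(SAMPLE)
if looks_mismatched(fields):
    print("Suspicious record for host", fields.get(53), "check", fields.get(114))
```

Splitting on the first "=" keeps the perfdata value in field 99 intact, since it contains its own "=" characters.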
Grumpy Olde IT Guy
-
jfrickson
Re: NDO2DB Issue out of the blue
rseiwert wrote: Below is an Application Log check from a domain controller, cut from ndo2db.debug, and it contains the results of a ping check.
Yup, it's definitely bad coming in off the IPC queue. So it's either being sent to the socket corrupted, or the function that reads from the socket and writes it to the queue is messing it up.
The attached patch adds logging of the data coming in over the socket and being written to the IPC queue. Please run this when you have a chance and send me the output.