Nagios Plugin: Monitoring Windows Clock Sync.

BiFo
Posts: 8
Joined: Fri Oct 15, 2010 9:34 am

Nagios Plugin: Monitoring Windows Clock Sync.

Post by BiFo »

Hi,
I'm running Nagios Core version 3.2.3 on Red Hat Enterprise Linux Server release 5.4 (Tikanga).
I wrote a script (VBScript) to monitor the Windows date and time. As I understand it, the Nagios plugin exit statuses are the following:
0 = OK
1 = Warning
2 = Critical
3 = Unknown

My script never returns 3 as its exit status; it only returns 0, 1 or 2, depending on whether the clock is synchronized with another server.

The script works fine most of the time. However, at random moments (approximately once every 3 or 4 hours) Nagios reports things like this:

Code: Select all

[06-10-2011 04:11:00] SERVICE ALERT: MyServer;Win Clock;OK;SOFT;2;NTP OK: Offset 0 secs.
[06-10-2011 04:10:00] SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 0 secs.
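The two log entries above use the semicolon-separated SERVICE ALERT layout: host, service, state, state type, attempt and plugin output. As a rough illustration (not part of the original post), the Python sketch below splits such a line and flags the case where the state Nagios recorded does not match the status word in the plugin output. The names `parse_service_alert` and `state_matches_output` are assumptions made for this example.

```python
import re

# Hypothetical helper: split a Nagios "SERVICE ALERT" log line into its fields.
ALERT_RE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] SERVICE ALERT: (?P<body>.*)$")


def parse_service_alert(line):
    """Return the fields of a SERVICE ALERT line as a dictionary."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        raise ValueError("not a SERVICE ALERT line: %r" % line)
    # The plugin output is the last field and may itself contain ';',
    # so split at most five times.
    host, service, state, state_type, attempt, output = match.group("body").split(";", 5)
    return {
        "timestamp": match.group("timestamp"),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


def state_matches_output(alert):
    """True when the state recorded by Nagios also appears as a word
    in the part of the plugin output before the first ':'."""
    return alert["state"] in alert["output"].split(":")[0].split()
```

Applied to the UNKNOWN entry above, `state_matches_output` returns False, which is exactly the contradiction being asked about: the plugin text says OK while the recorded state is UNKNOWN.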
The code of my script:

Code: Select all

Option Explicit
On Error Resume Next
Dim strCommand, objProc, objShell, input, strOutput, myRegExp, myMatches, myMatch, Status, result, Args, warn, crit, serverlist, criteria, i
Set Args = WScript.Arguments
criteria="least"
i=0
Err = 0

' Function that captures the arguments.
Function CaptureArguments()
	for i=0 to WScript.Arguments.Count-1
		Select Case Args.Item(i)
			Case "-H" , "-h", "--help", "--HELP", "--Help", "/?", "/Help", "/HELP", "/help"
				ShowHelp
				Salir
			Case "-R", "-r", "--readme", "--README", "--ReadMe", "/r", "/R", "/readme", "/README", "/ReadMe"
				ShowReadMe
				Salir
			Case "-V", "-v", "--version", "--VERSION", "--Version", "/V", "/v", "/Version", "/VERSION", "/version"
				ShowVersion
				Salir
			Case "-S", "-s", "--serverlist", "--SERVERLIST", "--ServerList", "/S", "/s", "/ServerList", "/SERVERLIST", "/ServerList"
				CheckNextParameter
				serverlist = Args.Item(i+1)
				i=i+1
			Case "-W", "-w", "--warn", "--WARN", "--Warn", "--warning", "--WARNING", "--Warning", "/W", "/w", "/warn", "/WARN", "/Warn", "/warning", "/WARNING", "/Warning"
				CheckNextParameter
				warn = Args.Item(i+1)
				i=i+1
			Case "-C", "-c", "--crit", "--CRIT", "--Crit", "--critical", "--CRITICAL", "--Critical", "/C", "/c", "/crit", "/CRIT", "/Crit", "/critical", "/CRITICAL", "/Critical"
				CheckNextParameter
				crit = Args.Item(i+1)
				i=i+1
			Case "-B", "-b", "/B", "/b"
				criteria="biggest"
			Case Else
				wscript.echo "ERROR: An unknown or unexpected argument was supplied. Use Help (/H) or ReadMe (/R)."
				Err = 1
				Salir
		End Select
	Next
End Function

' Function that validates the argument that follows the current parameter.
Function CheckNextParameter()
	' Is there an argument for the parameter?
	if (i+1 > WScript.Arguments.Count-1) then
		' No, show an error and exit.
		WScript.echo "ERROR: Missing argument for parameter " & Args.Item(i) & "."
		Err = 1
		Salir
	end if
	' Does the argument contain a "-" or a "/" (which indicates the start of a new parameter)?
	if instr(Args.Item(i+1),"-") OR instr(Args.Item(i+1),"/") then
		' Yes, show an error and exit.
		WScript.echo "ERROR: The argument " & Args.Item(i+1) & " is invalid for parameter " & Args.Item(i) & "."
		Err = 1
		Salir
	end if
End Function

' Function that validates the arguments.
Function CheckArguments()
	' Is the warn parameter defined?
	If IsNull(warn) OR (warn = "") Then
		WScript.echo "ERROR: The warn parameter (/W) is not defined."
		Err = 1
		Salir
	End If
	' Is the crit parameter defined?
	If IsNull(crit) OR (crit = "") Then
		WScript.echo "ERROR: The crit parameter (/C) is not defined."
		Err = 1
		Salir
	End If
	' Is the serverlist parameter defined?
	If IsNull(serverlist) OR (serverlist = "")  Then
		WScript.echo "ERROR: The serverlist parameter (/S) is not defined."
		Err = 1
		Salir
	End If
	' Is the warn argument numeric?
	If IsNumeric(warn) Then
		' Yes. Is the warn argument an integer (check whether it contains a ".") greater than 0?
		If (not instr(warn,".") = 0) OR (warn <= 0) Then
			' No, show an error and exit.
			WScript.Echo "ERROR: The warn argument (" & warn & ") is not an integer greater than 0."
			Err = 1
			Salir
		Else
			' Yes, convert it to Double.
			warn=CDbl(warn)
		End If
	Else
		' No, show an error and exit.
		WScript.Echo "ERROR: The warn argument (" & warn & ") is not numeric."
		Err = 1
		Salir
	End if
	' Is the crit argument numeric?
	If IsNumeric(crit) Then
		' Yes. Is the crit argument an integer (check whether it contains a ".") greater than 0?
		If (not instr(crit,".") = 0) OR (crit <= 0) Then
			' No, show an error and exit.
			WScript.Echo "ERROR: The crit argument (" & crit & ") is not an integer greater than 0."
			Err = 1
			Salir
		Else
			' Yes, convert it to Double.
			crit=CDbl(crit)
		End If
	Else
		' No, show an error and exit.
		WScript.Echo "ERROR: The crit argument (" & crit & ") is not numeric."
		Err = 1
		Salir
	End if
	' Is the crit argument greater than or equal to the warn argument?
	If crit < warn then
		' No, show an error and exit.
		WScript.Echo "ERROR: The crit argument (" & crit & ") is less than the warn argument (" & warn & ")"
		Err = 1
		Salir
	End if
End Function

' Function that shows the HELP.
Function ShowHelp()
	Wscript.Echo ""
	Wscript.Echo "Parameters		Description"
	Wscript.Echo "/S serverlist	 	One or more servers separated by commas."
	Wscript.Echo "/W warn 		Warning offset in seconds."
	Wscript.Echo "/C crit 		Critical offset in seconds."
	Wscript.Echo "/B			If several NTPd servers are specified, the one with the biggest offset is used."
	Wscript.Echo ""
End Function

' Function that shows the latest version.
Function ShowVersion()
	WScript.Echo "*****"
	WScript.Echo " Modified by: Fabian Olender"
	WScript.Echo " Version: 1.2"
	WScript.Echo " Date: 13/01/2011"
	WScript.Echo " Description: Translation into Spanish and bug fixes for cases where it did not work."
	WScript.Echo "*****"
End Function

' Function that shows the ReadMe.
Function ShowReadMe()
	WScript.Echo ""
	WScript.Echo " ------- "
	WScript.Echo "| Usage |"
	WScript.Echo " ------- "
	WScript.Echo "Usage: "
	WScript.Echo "	cscript //NoLogo check_time.vbs /S serverlist /W warn /C crit [/B]"
	WScript.Echo "Usage examples: "
	Wscript.Echo "	cscript //NoLogo check_time.vbs /S Server1,Server2 /W 1 /C 5 /B"
	Wscript.Echo "	cscript //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 10 /C 120"
	Wscript.Echo "	cscript //T:30 //NoLogo check_time.vbs /S YourDomain.net /W 120 /C 300"
	WScript.Echo ""
	WScript.Echo " ------------ "
	WScript.Echo "| Parameters |"
	WScript.Echo " ------------ "
	Wscript.Echo ""
	Wscript.Echo "-H, -h, --help, --HELP, --Help, /?, /Help, /HELP, /help"
	Wscript.Echo "	Default: Not set"
	Wscript.Echo "	Description: Shows the script's Help, which explains how it works."
	Wscript.Echo ""
	Wscript.Echo "-R, -r, --readme, --README, --ReadMe, /r, /R, /readme, /README, /ReadMe"
	Wscript.Echo "	Default: Not set"
	Wscript.Echo "	Description: Shows the ReadMe you are reading now."
	Wscript.Echo ""
	Wscript.Echo "-V, -v, --version, --VERSION, --Version, /V, /v, /Version, /VERSION, /version"
	Wscript.Echo "	Default: Not set"
	Wscript.Echo "	Description: Shows the current version of the script."
	Wscript.Echo ""
	Wscript.Echo "-S, -s, --serverlist, --SERVERLIST, --ServerList, /S, /s, /ServerList, /SERVERLIST, /ServerList"
	Wscript.Echo "	Default: Not set. Required to run the script."
	Wscript.Echo "	Description: List of servers running NTPd."
	Wscript.Echo ""
	Wscript.Echo "-W, -w, --warn, --WARN, --Warn, --warning, --WARNING, --Warning, /W, /w, /warn, /WARN, /Warn, /warning, /WARNING, /Warning"
	Wscript.Echo "	Default: Not set. Required to run the script."
	Wscript.Echo "	Description: Offset in seconds above which a desynchronization Warning is raised."
	Wscript.Echo "	Requirements: Integer value greater than 0."
	Wscript.Echo ""
	Wscript.Echo "-C, -c, --crit, --CRIT, --Crit, --critical, --CRITICAL, --Critical, /C, /c, /crit, /CRIT, /Crit, /critical, /CRITICAL, /Critical"
	Wscript.Echo "	Default: Not set. Required to run the script."
	Wscript.Echo "	Description: Offset in seconds above which a desynchronization Critical is raised."
	Wscript.Echo "	Requirements: Integer value greater than 0."
	Wscript.Echo ""
	Wscript.Echo "-B, -b, /B, /b"
	Wscript.Echo "	Default: Disabled. By default the smallest offset is used. Optional parameter."
	Wscript.Echo "	Description: If several NTPd servers are specified, the one with the biggest offset is used."
	Wscript.Echo ""
	WScript.Echo " ---------- "
	WScript.Echo "| Versions |"
	WScript.Echo " ---------- "
	WScript.Echo " Modified by: Freddy"
	WScript.Echo " Version: 1.1"
	WScript.Echo " Date: 02/12/2010"
	WScript.Echo " Description: Modification to improve performance of the pnp4nagios addon and suggestions to fix certain cases of malfunction."
	WScript.Echo ""
	WScript.Echo " Modified by: Dmitry Vayntrub (dvayntrub@yahoo.com)"
	WScript.Echo " Version: 1.01"
	WScript.Echo " Date: 17/01/2010"
	WScript.Echo " Description: General modification and renaming of the script."
	WScript.Echo ""
	WScript.Echo " Created by: Mattias Ryrlén (check_ad_time.vbs)"
	WScript.Echo " Version: 1.0"
	WScript.Echo " Date: -"
	WScript.Echo " Description: Checks the time offset of a Windows client against one or more NTPd servers."
	WScript.Echo ""
End Function

' Function that runs the W32TM command and parses its output.
Function W32TM()
	' Run the w32tm.exe command.
	Set objShell = CreateObject("Wscript.Shell")
	strCommand = "%SystemRoot%\System32\w32tm.exe /monitor /computers:" & serverlist
	set objProc = objShell.Exec(strCommand)
	' Parse the output to obtain only the NTP offset.
	input = ""
	strOutput = ""
	Do While Not objProc.StdOut.AtEndOfStream
		input = objProc.StdOut.ReadLine
		If InStr(input, "NTP") Then
			strOutput = strOutput & input
		End If
	Loop
	Set myRegExp = New RegExp
	myRegExp.IgnoreCase = True
	myRegExp.Global = True
	myRegExp.Pattern = " NTP: ([+-][0-9]+\.[0-9]+)s"
	Set myMatches = myRegExp.Execute(strOutput)
	result = ""
	If myMatches(0).SubMatches(0) <> "" Then
		result = myMatches(0).SubMatches(0)
	End If
	For Each myMatch in myMatches
		If myMatch.SubMatches(0) <> "" Then
			If criteria = "biggest" Then
				If abs(result) < Abs(myMatch.SubMatches(0)) Then
					result = myMatch.SubMatches(0)
				End If
			Else
				If abs(result) > Abs(myMatch.SubMatches(0)) Then
					result = myMatch.SubMatches(0)
				End If
			End If
		End If
	'	Wscript.Echo myMatch.SubMatches(0) & " -debug"
	Next
End Function

' Function that shows the script's result in the proper format.
Function ShowResult()
	' Drop what follows the ".", drop the sign ("+" or "-") and get the number.
	result=Left(result,instr(result,".")-1)
	result=Right(result,Len(result)-1)
	result=CDbl(result)
	If (result > 0 AND result > crit ) OR (result < 0 AND result < crit) Then
		Err = 2
		status = "CRITICAL"
	Else
		If (warn > 0 AND result > warn ) OR (warn < 0 AND result < warn) Then
			Err = 1
			status = "WARNING"
		else
			Err = 0
			status = "OK"		
		End If
	End If
	WScript.Echo "NTP " & status & ": Offset " & result & " secs."
	Salir
End Function

Function Salir()
	WScript.Quit(Err)
End Function

' ########
' # Main #
' ########
CaptureArguments
CheckArguments
W32TM
ShowResult
Salir
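For readers who do not use VBScript, the core of the script (the `W32TM` and `ShowResult` functions) can be approximated in Python. This is only a sketch of the same logic, not the script that runs on the server. The function names `pick_offset` and `classify` are invented here, and the sample text in the usage part is made up to fit the script's regular expression rather than copied from real w32tm output.

```python
import re

# Same pattern the VBScript uses to pull signed offsets out of the w32tm output.
OFFSET_RE = re.compile(r" NTP: ([+-][0-9]+\.[0-9]+)s", re.IGNORECASE)


def pick_offset(text, biggest=False):
    """Return the smallest (default) or biggest absolute offset in whole seconds.

    Returns None when no offset is found, a case the VBScript
    does not handle explicitly.
    """
    offsets = [abs(float(value)) for value in OFFSET_RE.findall(text)]
    if not offsets:
        return None
    chosen = max(offsets) if biggest else min(offsets)
    # Like the VBScript, drop the sign and everything after the ".".
    return int(chosen)


def classify(offset, warn, crit):
    """Map an offset in seconds to an (exit code, status) pair."""
    if offset > crit:
        return 2, "CRITICAL"
    if offset > warn:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    # Made-up input shaped to match the regular expression, not real w32tm output.
    sample = " NTP: +0.0312000s\n NTP: -136.4000000s"
    offset = pick_offset(sample, biggest=True)
    code, status = classify(offset, warn=120, crit=300)
    print("NTP %s: Offset %d secs." % (status, offset))
```

Note that the None branch is a deliberate difference: with `On Error Resume Next` in effect, the VBScript keeps running when w32tm prints no offset at all, so an explicit "nothing parsed" result may be easier to reason about.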
How to run it from the command line (DOS):

Code: Select all

cscript.exe //T:30 //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 120 /C 300
 * www.xxx.yyy.zzz = NTP Server
"NSC.ini":

Code: Select all

...
[External Scripts]
check_windows_time=cscript.exe //T:30 //NoLogo check_time.vbs /S 10.45.225.11 /W 120 /C 300
...
;[LUA Scripts]
command[check_windows_time]=cscript.exe //T:30 //NoLogo check_time.vbs /S 10.45.225.11 /W 120 /C 300
...
Service definition in Nagios Server:

Code: Select all

define service{
           use                  generic-service
           host_name            MyServer
           service_description  Win Clock
           check_command        check_nrpe!check_windows_time
}
I've set "nagios.cfg" to the maximum debug level ("debug_level=-1" and "debug_verbosity=2"). This is what I saw in "nagios.debug" for the "Win Clock" service defined on host "MyServer" when an "Unknown" result appears:

Code: Select all

[1307617373.175579] [016.2] [pid=377] Found a check result (#4) to handle...
[1307617373.175615] [016.1] [pid=377] Handling check result for service 'Win Clock' on host 'MyServer'...
[1307617373.175636] [001.0] [pid=377] handle_async_service_check_result()
[1307617373.175657] [016.0] [pid=377] ** Handling check result for service 'Win Clock' on host 'MyServer'...
[1307617373.175676] [016.1] [pid=377] HOST: MyServer, SERVICE: Win Clock, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 3, OUTPUT: NTP OK: Offset 1 secs.\n
[1307617373.175755] [016.2] [pid=377] Parsing check output...
[1307617373.175776] [016.2] [pid=377] Short Output: NTP OK: Offset 1 secs.
[1307617373.175822] [016.2] [pid=377] Long Output:  NULL
[1307617373.175840] [016.2] [pid=377] Perf Data:    NULL
[1307617373.175858] [016.2] [pid=377] ST: HARD  CA: 1  MA: 3  CS: 3  LS: 0  LHS: 0
[1307617373.175886] [016.2] [pid=377] Service has changed state since last check!
[1307617373.175918] [016.1] [pid=377] Service is in a non-OK state!
[1307617373.175939] [016.1] [pid=377] Host is currently UP, so we'll recheck its state to make sure...
[1307617373.175959] [001.0] [pid=377] run_async_host_check_3x()
[1307617373.175976] [016.0] [pid=377] ** Running async check of host 'MyServer'...
[1307617373.175996] [001.0] [pid=377] check_host_check_viability_3x()
[1307617373.176017] [001.0] [pid=377] check_time_against_period()
[1307617373.176043] [001.0] [pid=377] check_host_dependencies()
[1307617373.176065] [064.1] [pid=377] Making callbacks (type 14)...
[1307617373.176101] [064.2] [pid=377] Callback #1 (type 14) return code = 0
[1307617373.176122] [016.0] [pid=377] Checking host 'MyServer'...
[1307617373.176141] [001.0] [pid=377] adjust_host_check_attempt_3x()
[1307617373.176161] [016.2] [pid=377] Adjusting check attempt number for host 'MyServer': current attempt=1/10, state=0, state type=1
[1307617373.176230] [016.2] [pid=377] New check attempt number = 1
[1307617373.176347] [001.0] [pid=377] get_raw_command_line()
[1307617373.176396] [2320.2] [pid=377] Raw Command Input: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
[1307617373.176417] [2320.2] [pid=377] Expanded Command Output: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
[1307617373.176438] [001.0] [pid=377] process_macros()
[1307617373.176457] [2048.1] [pid=377] **** BEGIN MACRO PROCESSING ***********
[1307617373.176477] [2048.1] [pid=377] Processing: '$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.176498] [2048.2] [pid=377]   Processing part: ''
[1307617373.176515] [2048.2] [pid=377]   Not currently in macro.  Running output (0): ''
[1307617373.176536] [2048.2] [pid=377]   Processing part: 'USER1'
[1307617373.176560] [2048.2] [pid=377]   Processed 'USER1', Clean Options: 0, Free: 0
[1307617373.176588] [2048.2] [pid=377]   Processed 'USER1', Clean Options: 0, Free: 0
[1307617373.176614] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.176636] [2048.2] [pid=377]   Uncleaned macro.  Running output (25): '/usr/local/nagios/libexec'
[1307617373.176655] [2048.2] [pid=377]   Just finished macro.  Running output (25): '/usr/local/nagios/libexec'
[1307617373.176675] [2048.2] [pid=377]   Processing part: '/check_ping -H '
[1307617373.176749] [2048.2] [pid=377]   Not currently in macro.  Running output (40): '/usr/local/nagios/libexec/check_ping -H '
[1307617373.176788] [2048.2] [pid=377]   Processing part: 'HOSTADDRESS'
[1307617373.176827] [2048.2] [pid=377]   macro_x[2] (HOSTADDRESS) match.
[1307617373.176854] [2048.2] [pid=377]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1307617373.176874] [2048.2] [pid=377]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1307617373.176894] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.176915] [2048.2] [pid=377]   Uncleaned macro.  Running output (52): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address'
[1307617373.176936] [2048.2] [pid=377]   Just finished macro.  Running output (52): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address'
[1307617373.176971] [2048.2] [pid=377]   Processing part: ' -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.176996] [2048.2] [pid=377]   Not currently in macro.  Running output (86): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.177016] [2048.1] [pid=377]   Done.  Final output: '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.177036] [2048.1] [pid=377] **** END MACRO PROCESSING *************
[1307617373.177126] [016.1] [pid=377] Check result output will be written to '/usr/local/nagios/var/spool/checkresults/checkX3Szdk' (fd=8)
[1307617373.177237] [064.1] [pid=377] Making callbacks (type 14)...
[1307617373.177277] [064.2] [pid=377] Callback #1 (type 14) return code = 0
[1307617373.178927] [016.2] [pid=377] Host check is executing in child process (pid=9253)
[1307617373.182785] [001.0] [pid=9253] process_macros()
[1307617373.183092] [001.0] [pid=9253] process_macros()
[1307617373.183119] [001.0] [pid=9253] process_macros()
[1307617373.183792] [001.0] [pid=9253] process_macros()
[1307617373.183859] [001.0] [pid=9253] process_macros()
[1307617373.183888] [001.0] [pid=9253] process_macros()
[1307617373.196901] [016.1] [pid=377] Current/Max Attempt(s): 1/3
[1307617373.196944] [016.1] [pid=377] Host is UP, so we'll retry the service check...
[1307617373.197046] [001.0] [pid=377] process_macros()
[1307617373.197084] [2048.1] [pid=377] **** BEGIN MACRO PROCESSING ***********
[1307617373.197130] [2048.1] [pid=377] Processing: 'SERVICE ALERT: MyServer;Win Clock;$SERVICESTATE$;$SERVICESTATETYPE$;$SERVICEATTEMPT$;NTP OK: Offset 1 secs.
[1307617373.197170] [2048.2] [pid=377]   Processing part: 'SERVICE ALERT: MyServer;Win Clock;'
[1307617373.197228] [2048.2] [pid=377]   Not currently in macro.  Running output (31): 'SERVICE ALERT: MyServer;Win Clock;'
[1307617373.197251] [2048.2] [pid=377]   Processing part: 'SERVICESTATE'
[1307617373.197359] [2048.2] [pid=377]   macro_x[4] (SERVICESTATE) match.
[1307617373.197383] [2048.2] [pid=377]   Processed 'SERVICESTATE', Clean Options: 0, Free: 1
[1307617373.197417] [2048.2] [pid=377]   Processed 'SERVICESTATE', Clean Options: 0, Free: 1
[1307617373.197445] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197503] [2048.2] [pid=377]   Uncleaned macro.  Running output (38): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN'
[1307617373.197526] [2048.2] [pid=377]   Just finished macro.  Running output (38): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN'
[1307617373.197547] [2048.2] [pid=377]   Processing part: ';'
[1307617373.197566] [2048.2] [pid=377]   Not currently in macro.  Running output (39): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;'
[1307617373.197586] [2048.2] [pid=377]   Processing part: 'SERVICESTATETYPE'
[1307617373.197608] [2048.2] [pid=377]   macro_x[42] (SERVICESTATETYPE) match.
[1307617373.197629] [2048.2] [pid=377]   Processed 'SERVICESTATETYPE', Clean Options: 0, Free: 1
[1307617373.197649] [2048.2] [pid=377]   Processed 'SERVICESTATETYPE', Clean Options: 0, Free: 1
[1307617373.197670] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197726] [2048.2] [pid=377]   Just finished macro.  Running output (43): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT'
[1307617373.197747] [2048.2] [pid=377]   Processing part: ';'
[1307617373.197766] [2048.2] [pid=377]   Not currently in macro.  Running output (44): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;'
[1307617373.197787] [2048.2] [pid=377]   Processing part: 'SERVICEATTEMPT'
[1307617373.197808] [2048.2] [pid=377]   macro_x[6] (SERVICEATTEMPT) match.
[1307617373.197831] [2048.2] [pid=377]   Processed 'SERVICEATTEMPT', Clean Options: 0, Free: 1
[1307617373.197851] [2048.2] [pid=377]   Processed 'SERVICEATTEMPT', Clean Options: 0, Free: 1
[1307617373.197871] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197893] [2048.2] [pid=377]   Uncleaned macro.  Running output (45): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1'
[1307617373.197911] [2048.2] [pid=377]   Just finished macro.  Running output (45): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1'
[1307617373.197931] [2048.2] [pid=377]   Processing part: ';NTP OK: Offset 1 secs.'
[1307617373.197953] [2048.2] [pid=377]   Not currently in macro.  Running output (69): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 1 secs.'
[1307617373.197973] [2048.1] [pid=377]   Done.  Final output: 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 1 secs.'
[1307617373.197994] [2048.1] [pid=377] **** END MACRO PROCESSING *************
[1307617373.198388] [064.1] [pid=377] Making callbacks (type 9)...
[1307617373.198523] [064.2] [pid=377] Callback #1 (type 9) return code = 0
[1307617373.198555] [001.0] [pid=377] handle_service_event()
[1307617373.198575] [064.1] [pid=377] Making callbacks (type 30)...
[1307617373.198619] [064.2] [pid=377] Callback #1 (type 30) return code = 0
[1307617373.198647] [001.0] [pid=377] run_global_service_event_handler()
[1307617373.198669] [001.0] [pid=377] check_for_external_commands()
[1307617373.198695] [016.1] [pid=377] Rescheduling next check of service at Thu Jun  9 08:03:45 2011
[1307617373.198715] [001.0] [pid=377] get_next_valid_time()
[1307617373.198734] [001.0] [pid=377] check_time_against_period()
[1307617373.198762] [001.0] [pid=377] schedule_service_check()
[1307617373.198783] [016.0] [pid=377] Scheduling a non-forced, active check of service 'Win Clock' on host 'MyServer' @ Thu Jun  9 08:03:45 2011
[1307617373.198837] [016.2] [pid=377] Scheduling new service check event.
[1307617373.198859] [001.0] [pid=377] reschedule_event()
[1307617373.198877] [001.0] [pid=377] add_event()
[1307617373.198900] [064.1] [pid=377] Making callbacks (type 8)...
[1307617373.198935] [064.2] [pid=377] Callback #1 (type 8) return code = 0
[1307617373.198990] [064.1] [pid=377] Making callbacks (type 20)...
[1307617373.199051] [064.2] [pid=377] Callback #1 (type 20) return code = 0
[1307617373.199075] [064.1] [pid=377] Making callbacks (type 13)...
[1307617373.199114] [064.2] [pid=377] Callback #1 (type 13) return code = 0
[1307617373.199171] [064.1] [pid=377] Making callbacks (type 20)...
[1307617373.199230] [064.2] [pid=377] Callback #1 (type 20) return code = 0
[1307617373.199275] [001.0] [pid=377] check_for_service_flapping()
[1307617373.199309] [016.1] [pid=377] Checking service 'Win Clock' on host 'MyServer' for flapping...
[1307617373.199331] [001.0] [pid=377] check_for_host_flapping()
[1307617373.199361] [016.1] [pid=377] Checking host 'MyServer' for flapping...
[1307617373.199414] [016.2] [pid=377] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1307617373.199440] [016.1] [pid=377] Host is not flapping (0.00% state change).
[1307617373.199487] [016.1] [pid=377] Deleted check result file '/usr/local/nagios/var/spool/checkresults/creM40o'
I'm sure that there's only one instance of Nagios running (checked with "# ps faux", restarting Nagios and even rebooting the system).

I've already tried increasing the cscript timeout ("cscript.exe //T:XXX ..." in "NSC.ini") and the NRPE timeout ("...check_nrpe -t XXX..." in "commands.cfg"), but the problem remains.

Does anyone know why this could be happening? I can't understand why Nagios Core reports an "Unknown" state for my "Win Clock" service on host "MyServer" when my script never returns an exit status of 3.

Please tell me if more information (like the "commands.cfg" or something else) is needed to help me solve this issue.

Thanks in advance!

PS: Sorry for my poor English.

Re: Nagios Plugin: Monitoring Windows Clock Sync.

Post by BiFo »

I've updated NSClient++ to its latest version (0.3.9), but the problem remains.

The NSClient++ logs show this when the UNKNOWN response occurs:

Code: Select all

YYY-MM-DD hh:mm:ss: error:modules\CheckExternalScripts\CheckExternalScripts.cpp:214: The command (cscript.exe) returned an invalid return code: 128
Looking around on Google, I've noticed that the "128" return code means: There are no child processes to wait for.

My script doesn't create any child process that I'm aware of, so I'm really confused now...

Thanks in advance.
Bye!

Re: Nagios Plugin: Monitoring Windows Clock Sync.

Post by BiFo »

Hi.
I'm still looking for a solution to this error.
Is there anything I can do to debug this a little more?

Thanks in advance.
Bye!
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios Plugin: Monitoring Windows Clock Sync.

Post by jsmurphy »

BiFo wrote:
Looking around on Google, I've noticed that the "128" return code means: There are no child processes to wait for.

My script doesn't create any child process that I'm aware of, so I'm really confused now...

NSClient++ creates a child process to execute your script in, so what's happening is that NSClient++ is failing to execute your script. Usually 128 is the result of a permissions issue, an incorrect path to the script, or the script failing before it properly starts (normally compile issues). I have no idea why it would be doing any of the above intermittently, though.

Re: Nagios Plugin: Monitoring Windows Clock Sync.

Post by BiFo »

It's not a permissions issue, because I've given FULL PERMISSIONS to EVERYONE on the whole Nagios folder (my scripts included) and the problem remains.
Using the debugger, the log file shows the following when the error occurs:

Code: Select all

2012-03-18 11:03:03: debug:NSClient++.cpp:1144: Injecting: check_windows_time: 
2012-03-18 11:03:08: error:modules\CheckExternalScripts\CheckExternalScripts.cpp:214: The command (cscript.exe) returned an invalid return code: 128
2012-03-18 11:03:08: debug:NSClient++.cpp:1180: Injected Result: WARNING 'NTP WARNING: Offset 136 secs.'
2012-03-18 11:03:08: debug:NSClient++.cpp:1181: Injected Performance Result: ''
I've also found these two threads about the same issue:
* http://www.nsclient.org/nscp/discussion/topic/849
* http://www.nsclient.org/nscp/discussion/topic/530
Both are still unsolved.

To test whether the code is the problem, I've created a very simple "intermediate script" that sits between my code and NSClient++ itself: a simple batch script that calls the VBScript and works around that nasty "128" error code when it happens:

Code: Select all

@echo off
C:\WINDOWS\system32\cscript.exe //T:30 //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 120 /C 300
REM "IF ERRORLEVEL n" is true when the exit code is n or higher, so test from highest to lowest.
IF ERRORLEVEL 128 GOTO SKIP
IF ERRORLEVEL 3 exit /b 3
IF ERRORLEVEL 2 exit /b 2
IF ERRORLEVEL 1 exit /b 1
exit /b 0

REM Exit code 128 returned, unknown (NSClient++ bug with no fix as of 18/03/2012). Return exit code 0 (OK).
:SKIP
exit /b 0
It simply exits with the same exit code that the VBScript returns; however, when 128 happens, it ends with exit status 0 (OK).
The 128 error code still appears in the NSClient++ logs, and the UNKNOWN status still appears randomly...
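One way to see what the interpreter itself returns is to run the check through a small wrapper that records the raw exit code and output before normalizing them. The Python sketch below is an illustration under stated assumptions, not something taken from this thread: the helper name `run_check`, the log file name, and the choice to map out-of-range codes to 3 (UNKNOWN) are all hypothetical, and it presumes a Python interpreter is available on the monitored host.

```python
import subprocess
import sys
import time

VALID_CODES = (0, 1, 2, 3)  # OK, WARNING, CRITICAL, UNKNOWN


def run_check(command, log_path="check_wrapper.log", timeout=30):
    """Run a plugin command, log its raw exit code, and return (code, output)."""
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
        raw_code, output = proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        raw_code, output = None, "check timed out"
    # Keep a trace of what really came back, for comparison with the agent log.
    with open(log_path, "a") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write("%s raw_code=%r output=%r\n" % (stamp, raw_code, output))
    # Assumption: anything outside 0..3 is reported as UNKNOWN rather than hidden.
    code = raw_code if raw_code in VALID_CODES else 3
    return code, output


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the real check as arguments, e.g. the cscript.exe command line.
    exit_code, text = run_check(sys.argv[1:])
    print(text)
    sys.exit(exit_code)
```

Comparing the wrapper's log with the NSClient++ log for the same timestamps would show whether the 128 is produced by the check command or appears only on the agent side.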

Attached are some logs taken from the "/usr/local/nagios/var/nagios.debug" file (at maximum debug level) from the moment it starts...

Bye!

Re: Nagios Plugin: Monitoring Windows Clock Sync.

Post by jsmurphy »

Hmmmm, I've never seen that before. If you haven't already, it might be a good idea to post it on the NSClient++ bug tracker. Thanks for the follow-up.