warn- and critical level
Posted: Mon Jan 02, 2012 9:52 am
by nagiosnext
Hello Supporter,
I'm currently using Nagios 3.3.1 with the 1.4.5 plugins, monitoring around 500 services of different types.
I recently added another SQL Server to monitor. The check doesn't throw errors. For a German SQL Server Express 2003 I use:
Code: Select all
define command{
command_name check_nt_bb_sql_user_connections_german
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen","SQL Server User Connections: %.f" -w $ARG1$ -c $ARG2$
}
("MSSQL\$BES" is correct. I checked the performance counters on that machine and I have used the same spelling for other counters.)
along with:
Code: Select all
define service{
use generic-service,srv-pnp
host_name HOSTNAME
service_description SQL Server User Connections
check_command check_nt_bb_sql_user_connections_german!10!20
}
On the command line, the output shows the warning and critical levels. For example:
Code: Select all
/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen","User Connections: %.f" -w 10 -c 20
User Connections: 118 | 'User Connections: %.f'=118.000000%;10.000000;20.000000;
But the web interface shows:
Code: Select all
Performance Data: 'SQL Server User Connections %.f'=118.000000%;0.000000;0.000000;
So I assume the perfdata warning and critical levels get lost somewhere, but I have no clue where.
Any ideas?
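One way to compare the two outputs is to split each perfdata item into its fields and look at the thresholds side by side. The following is a minimal Python sketch, assuming the usual 'label'=value[UOM];warn;crit layout and plain numeric thresholds (no range syntax). `parse_perfdata` is a hypothetical helper name, not part of Nagios or the plugins.

```python
import re

# Assumed layout: 'label'=value[UOM];warn;crit (further fields are ignored).
PERFDATA_RE = re.compile(
    r"'(?P<label>[^']*)'=(?P<value>-?[0-9.]+)(?P<uom>[^;]*)"
    r"(?:;(?P<warn>[^;]*))?(?:;(?P<crit>[^;]*))?"
)


def parse_perfdata(item):
    """Split one quoted perfdata item into label, value, unit, warn and crit."""
    match = PERFDATA_RE.match(item.strip())
    if match is None:
        raise ValueError("not a quoted perfdata item: " + repr(item))

    def to_float(text):
        # An absent or empty threshold field becomes None.
        return float(text) if text else None

    return {
        "label": match.group("label"),
        "value": float(match.group("value")),
        "uom": match.group("uom"),
        "warn": to_float(match.group("warn")),
        "crit": to_float(match.group("crit")),
    }


# The two perfdata strings quoted in this post.
cli = parse_perfdata("'User Connections: %.f'=118.000000%;10.000000;20.000000;")
web = parse_perfdata("'SQL Server User Connections %.f'=118.000000%;0.000000;0.000000;")

for name in ("value", "warn", "crit"):
    print(name, cli[name], web[name])
```

Both strings carry the same value, so only the threshold fields differ between the manual run and the scheduled check.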
Re: warn- and critical level
Posted: Mon Jan 02, 2012 2:00 pm
by mguthrie
Try setting your -w and -c values to floats instead of integers. 10.0 instead of 10.
Re: warn- and critical level
Posted: Mon Jan 02, 2012 2:02 pm
by nagiosnext
Hello mguthrie,
I will try it, thanks for your answer.
Just wondering why it works with other checks...
Re: warn- and critical level
Posted: Mon Jan 02, 2012 2:09 pm
by mguthrie
I'm guessing it's an issue with that particular plugin related to data types. int vs float. This isn't so much an issue with Nagios as it is with the individual check plugin returning data correctly.
Re: warn- and critical level
Posted: Tue Jan 03, 2012 1:48 am
by nagiosnext
Hello mguthrie and supporter,
I need to raise the issue again. Running:
Code: Select all
/usr/local/nagios/libexec/check_nt -H HOSTNAME -p PORT -v COUNTER -l "\\MSSQL\$BES:Datenbanken(_Total)\\Größe der Datendatei(en) (KB)","SQL Server Datafile size total is %.f" -w 150000.000000 -c 200000.000000
results in:
Code: Select all
SQL Server Datafile size total is 115264 | 'SQL Server Datafile size total is %.f'=115264.000000%;150000.000000;200000.000000;
So it shows my warning and critical levels.
The web interface, however, shows:
Code: Select all
Status Information: SQL Server Datafile size total is 115264
Performance Data: 'SQL Server Datafile size total is %.f'=115264.000000%;0.000000;0.000000;
My command definition:
Code: Select all
define command{
command_name check_nt_bb_sql_database_file_size_german
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Datenbanken(_Total)\\Größe der Datendatei(en) (KB)","SQL Server Datafile size total is %.f" -w $ARG1$ -c $ARG2$
}
My service definition:
Code: Select all
define service{
use generic-service,srv-pnp
host_name NEXTBB
service_description BB Overall DB file size
check_command check_nt_bb_sql_database_file_size_german!150000.000000!200000.000000
}
Using the same plugin (check_nt) on other servers and SQL services works fine. For example:
Code: Select all
define command{
command_name check_nt_sql_database_files_size
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\SQLServer:Databases(_Total)\\Data File(s) Size (KB)","SQL Server Datafile size total is %.f" -w $ARG1$ -c $ARG2$
}
along with:
Code: Select all
define service{
use generic-service,srv-pnp
host_name HOSTNAME
service_description Overall DB file size
check_command check_nt_sql_database_files_size!5500000!6000000
}
results in correct warn + crit levels, so perfdata is working there.
Do you see any issues? Any mistyped words?
Do I have any further debugging options for this particular plugin, or for the path to the web interface?
Re: warn- and critical level
Posted: Fri Jan 06, 2012 3:46 am
by nagiosnext
Does anyone have an idea what I'm doing wrong?
Re: warn- and critical level
Posted: Fri Jan 06, 2012 11:22 am
by mguthrie
\\MSSQL\$BES:Datenbanken
Any chance the dollar sign is throwing something off? The $ is a special character for Nagios, and typically to have a dollar sign pass correctly you need to use \\$$
You could also try single quotes instead of double quotes so that the shell won't interpret anything inside of them.
Re: warn- and critical level
Posted: Mon Jan 09, 2012 2:41 am
by nagiosnext
Ok, I've tried:
Code: Select all
$USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l '\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen',"SQL Server User Connections %.f" -w $ARG1$ -c $ARG2$
using:
check_nt_bb_sql_user_connections_german!150.000000!200.000000
which results in:
Code: Select all
Performance Data: 'SQL Server User Connections %.f'=0.000000%;0.000000;0.000000;
Code: Select all
$USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l '\\MSSQL\\$$BES:Allgemeine Statistik\\Benutzerverbindungen',"SQL Server User Connections %.f" -w $ARG1$ -c $ARG2$
using:
check_nt_bb_sql_user_connections_german!150.000000!200.000000
which results in:
Code: Select all
Performance Data: 'SQL Server User Connections %.f'=0.000000%;150.000000;200.000000;
Code: Select all
$USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen","SQL Server User Connections %.f" -w $ARG1$ -c $ARG2$
using:
check_nt_bb_sql_user_connections_german!150.000000!200.000000
which results in:
Code: Select all
Performance Data: 'SQL Server User Connections %.f'=120.000000%;0.000000;0.000000;
This is the only variant that returns the correct value from the SQL Server, here "120", but it doesn't pass the warning and critical levels.
The original name of the instance is "MSSQL$BES", so I assume it is correct to use "MSSQL\$BES".
I have also passed float values for ARG1 and ARG2, so this is getting odd.
Using this command on the command line works fine.
Code: Select all
$USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\\$$BES:Allgemeine Statistik\\Benutzerverbindungen","SQL Server User Connections %.f" -w $ARG1$ -c $ARG2$
using:
check_nt_bb_sql_user_connections_german!150.000000!200.000000
which results in:
Code: Select all
Performance Data: 'SQL Server User Connections %.f'=0.000000%;150.000000;200.000000;
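To see what the plugin might actually receive for each quoting variant, the quoted string can be handed to a shell and echoed back. This is only a sketch under several assumptions: the command ends up being run through `sh`, Nagios has already collapsed `$$` to a single `$`, no environment variable named BES is set, and the counter path is shortened to the placeholder `X`. `shell_sees` is a hypothetical helper name.

```python
import os
import subprocess


def shell_sees(quoted_arg):
    """Return the argument as sh would hand it to a program."""
    # Keep only PATH so that $BES is guaranteed to be unset in the child shell.
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    result = subprocess.run(
        ["sh", "-c", "printf '%s' " + quoted_arg],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Quoting variants as the shell would see them ("X" stands in for the counter path).
variants = [
    r"'\\MSSQL\$BES:X'",    # single quotes, backslash before $
    r"'\\MSSQL\\$BES:X'",   # single quotes, after $$ was collapsed to $
    r'"\\MSSQL\$BES:X"',    # double quotes, backslash before $
    r'"\\MSSQL\\$BES:X"',   # double quotes, after $$ was collapsed to $
]

for variant in variants:
    print(variant, "->", shell_sees(variant))
```

Under these assumptions only the third variant should deliver the instance name MSSQL$BES with single backslashes, which would fit it being the only one that returned a counter value.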
Re: warn- and critical level
Posted: Mon Jan 09, 2012 3:40 am
by nagiosnext
Found a topic which might explain why I see no warning and critical levels (?):
http://forum.centreon.com/showthread.ph ... ce-counter
Got the same issue here.
I think it's because rrd isn't able to build file(s) with something which has a "%" in the name.
Have to deal with it for now, I didn't find any way to resolve it.
I will try a new search and see if I find a solution for that.
Re: warn- and critical level
Posted: Mon Jan 09, 2012 5:12 am
by nagiosnext
I activated debugging and got the following output:
[1326099991.112527] [008.0] [pid=18373] ** Service Check Event ==> Host: 'HOSTNAME', Service: 'BB SQL Server User Connections', Options: 0, Latency: 0.112000 sec
[1326099991.112549] [001.0] [pid=18373] run_scheduled_service_check() start
[1326099991.112563] [016.0] [pid=18373] Attempting to run scheduled check of service 'BB SQL Server User Connections' on host 'HOSTNAME': check options=0, latency=0.112000
[1326099991.112581] [001.0] [pid=18373] run_async_service_check()
[1326099991.112596] [001.0] [pid=18373] check_service_check_viability()
[1326099991.112612] [001.0] [pid=18373] check_time_against_period()
[1326099991.112659] [001.0] [pid=18373] check_service_dependencies()
[1326099991.112836] [064.1] [pid=18373] Making callbacks (type 13)...
[1326099991.112855] [016.0] [pid=18373] Checking service 'BB SQL Server User Connections' on host 'HOSTNAME'...
[1326099991.112949] [001.0] [pid=18373] get_raw_command_line_r()
[1326099991.112988] [2320.2] [pid=18373] Raw Command Input: $USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2$
[1326099991.113029] [001.0] [pid=18373] process_macros_r()
[1326099991.113046] [2048.1] [pid=18373] **** BEGIN MACRO PROCESSING ***********
[1326099991.113060] [2048.1] [pid=18373] Processing: '150.000000'
[1326099991.113094] [2048.2] [pid=18373] Processing part: '150.000000'
[1326099991.113111] [2048.2] [pid=18373] Not currently in macro. Running output (10): '150.000000'
[1326099991.113126] [2048.1] [pid=18373] Done. Final output: '150.000000'
[1326099991.113140] [2048.1] [pid=18373] **** END MACRO PROCESSING *************
[1326099991.113154] [001.0] [pid=18373] process_macros_r()
[1326099991.113168] [2048.1] [pid=18373] **** BEGIN MACRO PROCESSING ***********
[1326099991.113182] [2048.1] [pid=18373] Processing: '200.000000'
[1326099991.113217] [2048.2] [pid=18373] Processing part: '200.000000'
[1326099991.113233] [2048.2] [pid=18373] Not currently in macro. Running output (10): '200.000000'
[1326099991.113248] [2048.1] [pid=18373] Done. Final output: '200.000000'
[1326099991.113262] [2048.1] [pid=18373] **** END MACRO PROCESSING *************
[1326099991.113276] [2320.2] [pid=18373] Expanded Command Output: $USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2$
[1326099991.113291] [001.0] [pid=18373] process_macros_r()
[1326099991.113318] [2048.1] [pid=18373] Processing: '$USER1$/check_nt -H $HOSTADDRESS$ -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2$'
[1326099991.113403] [2048.2] [pid=18373] Processing part: ''
[1326099991.113419] [2048.2] [pid=18373] Not currently in macro. Running output (0): ''
[1326099991.113434] [2048.2] [pid=18373] Processing part: 'USER1'
[1326099991.113474] [2048.2] [pid=18373] Processed 'USER1', Clean Options: 0, Free: 0
[1326099991.113490] [2048.2] [pid=18373] Processed 'USER1', Clean Options: 0, Free: 0
[1326099991.113505] [2048.2] [pid=18373] Cleaning options: global=0, local=0, effective=0
[1326099991.113540] [2048.2] [pid=18373] Uncleaned macro. Running output (25): '/usr/local/nagios/libexec'
[1326099991.113556] [2048.2] [pid=18373] Just finished macro. Running output (25): '/usr/local/nagios/libexec'
[1326099991.113571] [2048.2] [pid=18373] Processing part: '/check_nt -H '
[1326099991.113604] [2048.2] [pid=18373] Not currently in macro. Running output (38): '/usr/local/nagios/libexec/check_nt -H '
[1326099991.113621] [2048.2] [pid=18373] Processing part: 'HOSTADDRESS'
[1326099991.113638] [2048.2] [pid=18373] macros[2] (HOSTADDRESS) match.
[1326099991.113766] [2048.2] [pid=18373] Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1326099991.113784] [2048.2] [pid=18373] Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1326099991.113799] [2048.2] [pid=18373] Cleaning options: global=0, local=0, effective=0
[1326099991.113814] [2048.2] [pid=18373] Uncleaned macro. Running output (52): '/usr/local/nagios/libexec/check_nt -H IP'
[1326099991.113829] [2048.2] [pid=18373] Just finished macro. Running output (52): '/usr/local/nagios/libexec/check_nt -H IP'
[1326099991.113844] [2048.2] [pid=18373] Processing part: ' -p PORT -v COUNTER -l "\\MSSQL\'
[1326099991.113859] [2048.2] [pid=18373] Not currently in macro. Running output (85): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\'
[1326099991.113874] [2048.2] [pid=18373] Processing part: 'BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w '
[1326099991.113913] [2048.0] [pid=18373] WARNING: Could not find a macro matching 'BES'!
[1326099991.113947] [2048.2] [pid=18373] Processed 'BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w ', Clean Options: 0, Free: 1
[1326099991.113962] [2048.0] [pid=18373] WARNING: An error occurred processing macro 'BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w '!
[1326099991.113977] [2048.2] [pid=18373] Non-macro. Running output (85): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\'
[1326099991.113994] [2048.2] [pid=18373] Processing part: 'ARG1'
[1326099991.114009] [2048.2] [pid=18373] Not currently in macro. Running output (177): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1'
[1326099991.114024] [2048.2] [pid=18373] Processing part: ' -c '
[1326099991.114040] [2048.0] [pid=18373] WARNING: Could not find a macro matching ' -c '!
[1326099991.114055] [2048.2] [pid=18373] Processed ' -c ', Clean Options: 0, Free: 1
[1326099991.114070] [2048.0] [pid=18373] WARNING: An error occurred processing macro ' -c '!
[1326099991.114084] [2048.2] [pid=18373] Non-macro. Running output (177): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1'
[1326099991.114123] [2048.2] [pid=18373] Processing part: 'ARG2'
[1326099991.114140] [2048.2] [pid=18373] Not currently in macro. Running output (187): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2'
[1326099991.114154] [2048.2] [pid=18373] Processing part: ''
[1326099991.114170] [2048.0] [pid=18373] WARNING: Could not find a macro matching ''!
[1326099991.114185] [2048.2] [pid=18373] Processed '', Clean Options: 0, Free: 1
[1326099991.114200] [2048.0] [pid=18373] WARNING: An error occurred processing macro ''!
[1326099991.114214] [2048.2] [pid=18373] Escaped $. Running output (187): '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2'
[1326099991.114230] [2048.1] [pid=18373] Done. Final output: '/usr/local/nagios/libexec/check_nt -H IP -p PORT -v COUNTER -l "\\MSSQL\$BES:Allgemeine Statistik\\Benutzerverbindungen",'SQL Server User Connections %.f' -w $ARG1$ -c $ARG2$'
So, what I see is that it splits my MSSQL\$BES into 2 pieces:
[1326099991.113844] [2048.2] [pid=18373] Processing part: ' -p PORT -v COUNTER -l "\\MSSQL\'
but how do I prevent that?
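The debug output suggests that the command line is cut at every `$`, with the pieces treated alternately as plain text and as macro names. The following Python toy model is written only from that log and is not Nagios's actual code. `expand_macros` is a hypothetical name and the counter path is shortened to the placeholder `X`. It tries to show how a single unpaired `$` could shift the pairing so that `$ARG1$` and `$ARG2$` are passed through unexpanded, while `$$` keeps the pairs aligned.

```python
def expand_macros(command_line, macros):
    """Toy model: pieces between '$' alternate between text and macro names."""
    parts = command_line.split("$")
    out = []
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if index % 2 == 0:
            out.append(part)            # outside a $...$ pair: plain text
        elif part in macros:
            out.append(macros[part])    # known macro: substitute its value
        elif part == "":
            out.append("$")             # "$$": emit one literal dollar sign
        else:
            # Unknown "macro": put the text back with its dollar signs.
            out.append("$" + part + ("" if is_last else "$"))
    return "".join(out)


# Macro values taken from the debug log; "X" stands in for the counter path.
macros = {
    "USER1": "/usr/local/nagios/libexec",
    "HOSTADDRESS": "IP",
    "ARG1": "150.000000",
    "ARG2": "200.000000",
}

single = r'$USER1$/check_nt -H $HOSTADDRESS$ -l "\\MSSQL\$BES:X" -w $ARG1$ -c $ARG2$'
doubled = r'$USER1$/check_nt -H $HOSTADDRESS$ -l "\\MSSQL\$$BES:X" -w $ARG1$ -c $ARG2$'

print(expand_macros(single, macros))
print(expand_macros(doubled, macros))
```

If this reading is right, the literal `-w $ARG1$ -c $ARG2$` reaching the shell might explain the 0.000000 thresholds. Writing the instance as `\$$BES` inside double quotes (one backslash, two dollar signs) might be worth a try, since the shell would then see the same `"\\MSSQL\$BES:..."` form that works on the command line. This has not been tested against a real Nagios installation.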