Return code of 137 is out of bounds

lamy2008
Posts: 2
Joined: Thu Feb 16, 2012 10:57 am

Post by lamy2008

Hi,

I have installed Nagios, the Nagios plugins and the NSCA module on Solaris 10 SPARC (a test server) and added a host (Windows Server 2008) to monitor. I defined all the services that I need to monitor and used the default commands that already exist in commands.cfg:

For the Nagios server (localhost):

```
# 'check_local_disk' command definition
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }

# 'check_local_load' command definition
define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }

# 'check_local_procs' command definition
define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
        }

# 'check_local_users' command definition
define command{
        command_name    check_local_users
        command_line    $USER1$/check_users -w $ARG1$ -c $ARG2$
        }

# 'check_local_swap' command definition
define command{
        command_name    check_local_swap
        command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$
        }
```
For the Windows server (client) I use check_nt:

```
# 'check_nt' command definition
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
        }
```
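To see what Nagios actually executes for a service, it helps to expand the macros by hand: the first "!"-separated field of check_command is the command name, the next field becomes $ARG1$, the one after that $ARG2$, and so on, while $USER1$ comes from the resource file and $HOSTADDRESS$ from the host definition. The sketch below is a simplified, hypothetical imitation of that expansion for illustration only (Python 3.9+); expand_command and the placeholder address 192.0.2.10 are made up, and real Nagios supports many more macros than the ones handled here.

```python
import re


def expand_command(command_line: str, check_command: str, macros: dict[str, str]) -> str:
    """Hypothetical, simplified imitation of Nagios macro expansion (illustration only)."""
    # Field 0 of check_command is the command name; the remaining '!'-separated
    # fields become $ARG1$, $ARG2$, ... in order.
    _command_name, *args = check_command.split("!")
    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"ARG{position}"] = arg

    def substitute(match) -> str:
        # A macro with no value (e.g. an $ARGn$ that was never supplied) expands to "".
        return values.get(match.group(1), "")

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


# The check_nt command line from commands.cfg above; 192.0.2.10 is a placeholder address.
CHECK_NT_LINE = "$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$"
MACROS = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"}

print(expand_command(CHECK_NT_LINE, "check_nt!CPULOAD!-l 2,90", MACROS))
# /usr/local/nagios/libexec/check_nt -H 192.0.2.10 -p 12489 -v CPULOAD -l 2,90
```

The expanded string is the command to paste into a shell when comparing command-line results with what the web interface shows.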
I defined all the services like this:

```
define service{
        name                            check_NT_SERVICE_eventlog
        service_description             check_NT_SERVICE_eventlog
        use                             generic-service
        host_name                       win2k8r2
        check_command                   check_nt!SERVICESTATE!-d SHOWALL -l eventlog
        notification_options            c,r
        register                        0
}

define service{
#hostgroup_name                  Windows_Servers
        service_description             NagiosEventlogService
        use                             check_NT_SERVICE_eventlog
        notification_options            w,c,r
        notifications_enabled           1
}

# CPU load
# CRITICAL if load > 90% for more than 2 minutes

define service{
        name                            check_NT_CPU
        service_description             check_NT_CPU
        use                             generic-service
        host_name                       win2k8r2
        check_command                   check_nt!CPULOAD!-l 2,90
        notifications_enabled           1
        notification_options            c,r
}

define service{
        name                            check_NT_DISK_C
        service_description             check_NT_DISK_C
        use                             generic-service
        host_name                       win2k8r2
        check_command                   check_nt!USEDDISKSPACE!-l C -c95
        notification_options            c,r
        notifications_enabled           1
       #active_checks_enabled           1

define service{
        name                            check_NT_MEMUSE
        service_description             check_NT_MEMUSE
        use                             generic-service
        host_name                       win2k8r2
        check_command                   check_nt!MEMUSE! -c90
        check_period                    24x7
        notification_options            c,r
        notifications_enabled           1
        register                        0
}
```
USER1=/usr/local/nagios/libexec

Things seem OK, but I have two major problems in the web interface:

- All services (for both localhost and win2k8r2) have the status CRITICAL, with the message "Return code of 137 is out of bounds".
- I can see only NagiosEventlogService and check_NT_CPU, even though I also defined MEMUSE, disk, etc., as you can see above.

Can you help me resolve this issue, please?
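Nagios only understands the plugin return codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), so any other value is reported as out of bounds. The thread never explains where 137 comes from; one common reading, based on the general POSIX shell convention rather than anything confirmed here, is 128 + 9, meaning the plugin process was killed by signal 9 (SIGKILL) before it could exit normally. The hypothetical describe_return_code helper below is only a sketch of that decoding.

```python
import signal

# The four return codes Nagios understands from a plugin.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code: int) -> str:
    """Explain a plugin's exit status (sketch; assumes POSIX shell conventions)."""
    if code in NAGIOS_STATES:
        return NAGIOS_STATES[code]
    # A shell reports a child killed by signal N as 128 + N;
    # Python's subprocess module reports the same event as -N.
    if code < 0:
        sig = -code
    elif 128 < code < 128 + signal.NSIG:
        sig = code - 128
    else:
        return "out of bounds"
    try:
        name = signal.Signals(sig).name
    except ValueError:
        name = f"signal {sig}"
    return f"out of bounds: killed by {name}"


print(describe_return_code(137))  # on Linux: out of bounds: killed by SIGKILL
```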
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy

The most glaring omissions are:
- check_NT_MEMUSE will never display (register 0 is set).
- The check_NT_DISK_C has no closing bracket; it also has no warning threshold. (It shouldn't even be passing verification without the bracket.)
- check_NT_CPU has no critical threshold.
- check_NT_MEMUSE has no warning threshold.
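The first two points above (a register 0 template and a definition missing its closing brace) can be spotted mechanically. Nagios's own pre-flight check, nagios -v followed by the path to nagios.cfg, is the authoritative test; the hypothetical lint_objects sketch below (Python 3.9+) only illustrates the two rules, and it assumes one directive per line with the opening brace on the define line, as in the configuration posted above.

```python
def lint_objects(text: str) -> list[str]:
    """Hypothetical mini-checker for Nagios object definitions (a sketch,
    not a replacement for Nagios's own 'nagios -v' pre-flight check)."""
    problems: list[str] = []
    start = None                 # line number where the open 'define' block began
    fields: dict[str, str] = {}

    def label() -> str:
        return fields.get("name") or fields.get("service_description") or "?"

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue             # skip blank lines and comments
        if line.startswith("define"):
            if start is not None:    # a new block opened before '}' was seen
                problems.append(f"line {start}: '{label()}' is missing its closing '}}'")
            start, fields = lineno, {}
        elif line.startswith("}"):
            if start is None:
                problems.append(f"line {lineno}: '}}' without a matching 'define'")
            elif fields.get("register") == "0":
                problems.append(f"line {start}: '{label()}' has register 0 (a template, never displayed)")
            start = None
        elif start is not None:
            key, *rest = line.split(None, 1)
            fields[key] = rest[0].strip() if rest else ""

    if start is not None:
        problems.append(f"line {start}: '{label()}' is missing its closing '}}'")
    return problems


# Excerpt modelled on the posted services, trimmed for the example.
SAMPLE = """define service{
        name                            check_NT_MEMUSE
        use                             generic-service
        register                        0
}

define service{
        name                            check_NT_DISK_C
        check_command                   check_nt!USEDDISKSPACE!-l C -c95
       #active_checks_enabled           1
"""

for problem in lint_objects(SAMPLE):
    print(problem)
```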

Run check_nt from the *nix command line (it is usually located in /usr/local/nagios/libexec/) and look at the parameters it expects to be passed for each of the different checks. You can also use the command line to test whether a check will work: fill in the parameters yourself and see what output you get.
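The same command-line test can be scripted so that the exit status is shown next to the plugin's output. A minimal sketch, assuming Python 3.9+ on the Nagios server and ideally run as the same user as the Nagios daemon; run_check is a hypothetical helper, and the stand-in "plugin" below is just the Python interpreter printing a status line, to be replaced with the fully expanded check_nt command.

```python
import subprocess
import sys


def run_check(argv: list[str], timeout: float = 10.0) -> tuple[int, str]:
    """Run one plugin command and return (exit status, first line of output).

    Hypothetical helper. A negative status means the process was killed by a
    signal (subprocess reports signal N as -N).
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child; report it as UNKNOWN (3).
        return 3, f"no answer within {timeout} seconds"
    lines = (result.stdout or result.stderr).splitlines()
    return result.returncode, (lines[0] if lines else "")


# Stand-in "plugin": the Python interpreter printing a status line and exiting 2.
# Replace this list with the fully expanded check_nt command line, e.g.
# ["/usr/local/nagios/libexec/check_nt", "-H", "<address>", "-p", "12489", "-v", "CLIENTVERSION"].
demo = [sys.executable, "-c", "print('CRITICAL - demo'); raise SystemExit(2)"]
print(run_check(demo))  # (2, 'CRITICAL - demo')
```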
lamy2008
Posts: 2
Joined: Thu Feb 16, 2012 10:57 am

Post by lamy2008

Thanks for the reply, jsmurphy.

jsmurphy wrote: check_NT_MEMUSE will never display (register 0 is set).
I compared the service definitions in services.cfg. The difference between check_NT_CPU and the other services is this line:

```
register                                0
```
I removed it and it works: I can see all the defined services in the interface. But the status is still CRITICAL. I have no idea whether it's a problem with CGI authentication, the check libraries or something like that.

jsmurphy wrote: The check_NT_DISK_C has no closing bracket
I made a mistake when I copied this part of the configuration. I have a closing bracket for each definition in my config file.

jsmurphy wrote: check_NT_CPU has no critical threshold
The threshold is 90%: check_command check_nt!CPULOAD!-l 2,90

jsmurphy wrote: check_NT_MEMUSE has no warning threshold
I'm not interested in warning alerts, just critical ones.

I already tested running check_nt and it works. That's why I posted my message: it's strange not to get the same results in the interface.

```
# ./check_nt -H xx.xx.xx.xx -p 12489 -v CLIENTVERSION
NSClient++ 0.3.8.76 2010-05-27
```

```
./check_nt -H xx.xx.xx.xx -p 12489 -v USEDDISKSPACE -l c
c:\ - total: 31.90 Gb - used: 12.77 Gb (40%) - free 19.13 Gb (60%) | 'c:\ Used Space'=12.77Gb;0.00;0.00;0.00;31.90
```
Also, I get this result in the status information for the Windows resources (CPU, disk, memory):

```
non-recoverable name resolution failure
```
This happens even though I put the right name for the client server. Just to verify one thing: must I put the server's name in the alias option? What does this option represent?

For localhost (the Nagios server), must I put "127.0.0.1" or the server's IP address?

In the status information of check_ping I have this:

```
check_ping: Invalid hostname/address - xx.xx.xx.x
```
(xx.xx.xx.x is the server's IP address, which I masked here for security reasons.)

For "Current Load" and "Current Users", things seem OK:

```
Current Load  OK 03-09-2012 13:35:33 0d 3h 13m 7s 1/4 OK - load average: 0.03, 0.02, 0.02
Current Users OK 03-09-2012 13:36:37 0d 3h 17m 3s 1/4 USERS OK - 0 users currently logged in
```
For check_disk I get the status CRITICAL and:

```
DISK CRITICAL: / not found
```
That's strange. Any ideas?

Thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson

lamy2008 wrote: check_NT_CPU has no critical threshold

The threshold is 90%: check_command check_nt!CPULOAD!-l 2,90
Not correct: it takes 3 parameters, and the first one is the minutes range. From the help file:

```
CPULOAD =
Average CPU load on last x minutes.
Request a -l parameter with the following syntax:
-l <minutes range>,<warning threshold>,<critical threshold>.
```
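Because the CPULOAD -l value is positional, a short parser makes the problem with -l 2,90 visible: it supplies only a minute range and one threshold, not a full <minutes range>,<warning threshold>,<critical threshold> group. A minimal sketch following the syntax in the check_nt help text (groups of three, minute range below 24*60, at most 10 groups per request), assuming Python 3.9+; parse_cpuload is hypothetical, and 2,80,90 is only an example value, not a recommendation made in this thread.

```python
def parse_cpuload(value: str) -> list[tuple[int, int, int]]:
    """Split a check_nt CPULOAD -l value into (minutes, warning, critical) groups."""
    numbers = [int(part) for part in value.split(",")]
    if len(numbers) % 3 != 0:
        raise ValueError(
            f"'{value}' has {len(numbers)} value(s); expected groups of "
            "<minutes range>,<warning threshold>,<critical threshold>"
        )
    groups = [
        (numbers[i], numbers[i + 1], numbers[i + 2]) for i in range(0, len(numbers), 3)
    ]
    if len(groups) > 10:
        raise ValueError("at most 10 groups can be requested in one shot")
    for minutes, _warning, _critical in groups:
        if minutes >= 24 * 60:
            raise ValueError(f"minute range {minutes} should be less than 24*60")
    return groups


print(parse_cpuload("60,90,95,120,90,95"))  # [(60, 90, 95), (120, 90, 95)]
try:
    parse_cpuload("2,90")                   # the value used for check_NT_CPU
except ValueError as err:
    print(err)
```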
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy

OK, I think we need to focus a little more on single problems here, because I can't tell whether you are talking about a single Windows server, multiple Windows servers, or any Linux/Windows server (e.g. the check_disk comment, which is a Linux-only check).

Is a single host failing to resolve via DNS when you run the check on the command line? Multiple hosts? If you just run a ping, does it resolve the hostname then? Let's pick a single check and work with that, because right now, based on your symptoms, it could be:
- DNS resolution failure
- Network configuration issues
- Nagios Configuration problems
- File permission problems
- Plugin package not compiled properly for that OS.
Or any combination of the above.
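For the first item on that list, a quick test is whether the Nagios server itself can resolve the name or address used in the host definition. A minimal sketch using Python's standard socket.getaddrinfo, which asks the system resolver (Python 3.9+); resolves is a hypothetical helper, and host.invalid is a deliberately unresolvable name from the reserved .invalid domain.

```python
import socket


def resolves(host: str) -> tuple[bool, str]:
    """Ask the system resolver for host; return (True, first address) or (False, reason)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return True, infos[0][4][0]


for name in ("127.0.0.1", "host.invalid"):
    ok, detail = resolves(name)
    print(f"{name}: {'resolves to ' + detail if ok else 'FAILED - ' + detail}")
```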

Pick a service and give me all the relevant information for that one service: the host config, service config and command definition, what happens when you run it from the command line for a single host, and what's shown in the Nagios UI. I think once we diagnose one, the rest should begin to fall into place.
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Post by jbruyet

I apologize for the thread hijack, but, scottwilkerson, what help file? I've been looking for documentation that would help me figure out how to monitor other services (on HP ProLiant and ProCurve devices, to start with), but I can't find a man page for check_nt, CPULOAD or anything else I'm trying to use. I searched the Nagios Core Version 3.x Documentation and found nothing there. Any help would be greatly appreciated.

Thanks, and again I apologize for the hijack,

Joe B
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson

It wasn't very good of me not to include the link.
You can find it here, but you can also just add the -h argument at the command line:

```
# ./check_nt -h
check_nt v1991 (nagios-plugins 1.4.13)
Copyright (c) 2000 Yves Rubin (rubiyz@yahoo.com)
Copyright (c) 2000-2007 Nagios Plugin Development Team
<nagiosplug-devel@lists.sourceforge.net>

This plugin collects data from the NSClient service running on a
Windows NT/2000/XP/2003 server.


Usage:check_nt -H host -v variable [-p port] [-w warning] [-c critical][-l params] [-d SHOWALL] [-t timeout]

Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
Options:
-H, --hostname=HOST
Name of the host to check
-p, --port=INTEGER
Optional port number (default: 1248)
-s <password>
Password needed for the request
-w, --warning=INTEGER
Threshold which will result in a warning status
-c, --critical=INTEGER
Threshold which will result in a critical status
-t, --timeout=INTEGER
Seconds before connection attempt times out (default: 10)
-h, --help
Print this help screen
-V, --version
Print version information
-v, --variable=STRING
Variable to check

Valid variables are:
CLIENTVERSION = Get the NSClient version
If -l <version> is specified, will return warning if versions differ.
CPULOAD =
Average CPU load on last x minutes.
Request a -l parameter with the following syntax:
-l <minutes range>,<warning threshold>,<critical threshold>.
<minute range> should be less than 24*60.
Thresholds are percentage and up to 10 requests can be done in one shot.
ie: -l 60,90,95,120,90,95
UPTIME =
Get the uptime of the machine.
No specific parameters. No warning or critical threshold
USEDDISKSPACE =
Size and percentage of disk use.
Request a -l parameter containing the drive letter only.
Warning and critical thresholds can be specified with -w and -c.
MEMUSE =
Memory use.
Warning and critical thresholds can be specified with -w and -c.
SERVICESTATE =
Check the state of one or several services.
Request a -l parameters with the following syntax:
-l <service1>,<service2>,<service3>,...
You can specify -d SHOWALL in case you want to see working services
in the returned string.
PROCSTATE =
Check if one or several process are running.
Same syntax as SERVICESTATE.
COUNTER =
Check any performance counter of Windows NT/2000.
Request a -l parameters with the following syntax:
-l "\\<performance object>\\counter","<description>
The <description> parameter is optional and is given to a printf
output command which requires a float parameter.
If <description> does not include "%%", it is used as a label.
Some examples:
"Paging file usage is %%.2f %%%%"
"%%.f %%%% paging file used."
INSTANCES =
Check any performance counter object of Windows NT/2000.
Syntax: check_nt -H <hostname> -p <port> -v INSTANCES -l <counter object>
<counter object> is a Windows Perfmon Counter object (eg. Process),
if it is two words, it should be enclosed in quotes
The returned results will be a comma-separated list of instances on
the selected computer for that object.
The purpose of this is to be run from command line to determine what instances
are available for monitoring without having to log onto the Windows server
to run Perfmon directly.
It can also be used in scripts that automatically create Nagios service
configuration files.
Some examples:
check_nt -H 192.168.1.1 -p 1248 -v INSTANCES -l Process

Notes:
- The NSClient service should be running on the server to get any information
(http://nsclient.ready2run.nl).
- Critical thresholds should be lower than warning thresholds
- Default port 1248 is sometimes in use by other services. The error
output when this happens contains "Cannot map xxxxx to protocol number".
One fix for this is to change the port to something else on check_nt
and on the client service it's connecting to.

Send email to nagios-users@lists.sourceforge.net if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net
```
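The help text notes that INSTANCES output can feed scripts that create Nagios service configuration files. A minimal sketch of that idea, assuming the output is the plain comma-separated list described above; services_from_instances, the sample instance names and the check_nt!PROCSTATE!-l {instance}.exe command template are all hypothetical and should be checked against your own NSClient++ setup before use. The host name win2k8r2 and the generic-service template are taken from the configuration earlier in this thread.

```python
def services_from_instances(instances_csv: str, host: str, command_template: str,
                            use: str = "generic-service") -> str:
    """Hypothetical generator: one Nagios service definition per instance name.

    instances_csv is assumed to be the comma-separated list printed by
    check_nt -v INSTANCES; command_template must contain '{instance}'.
    """
    names: list[str] = []
    for raw in instances_csv.split(","):
        name = raw.strip()
        # Skip blanks, the aggregate '_Total' instance and repeated names.
        if name and name != "_Total" and name not in names:
            names.append(name)

    blocks = []
    for name in names:
        blocks.append(
            "define service{\n"
            f"        use                             {use}\n"
            f"        host_name                       {host}\n"
            f"        service_description             Process {name}\n"
            f"        check_command                   {command_template.format(instance=name)}\n"
            "        }\n"
        )
    return "\n".join(blocks)


# Made-up sample input and command template, for illustration only.
print(services_from_instances("explorer,svchost,svchost,_Total", "win2k8r2",
                              "check_nt!PROCSTATE!-l {instance}.exe"))
```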
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Post by jsmurphy

jbruyet wrote: I apologize for the thread hijack, but, scottwilkerson, what help file? I've been looking for documentation that would help me figure out how to monitor other services (on HP ProLiant and ProCurve devices, to start with), but I can't find a man page for check_nt, CPULOAD or anything else I'm trying to use. I searched the Nagios Core Version 3.x Documentation and found nothing there. Any help would be greatly appreciated.

Thanks, and again I apologize for the hijack,

Joe B
The NSClient++ site also has some good information, but it can be difficult to navigate at times: http://www.nsclient.org/nscp/wiki/doc/u ... s/nsclient
jbruyet
Posts: 235
Joined: Wed Dec 28, 2011 12:14 pm

Post by jbruyet

Thanks, jsmurphy. I'm in your debt.

Thanks,

Joe B