Possible client issues?

mobiledataforce
Posts: 68
Joined: Fri Mar 15, 2013 1:35 pm

I have a few remaining issues in wrapping up my Nagios configuration. I have two problem hosts that are not monitoring properly. One of these hosts does not monitor at all: it shows the host and its services in a "Critical - Socket timeout after 10 seconds" state. I can monitor custom ports successfully on this host, but I assume the client isn't used to verify that a port is open.

The second host monitors intermittently. It sometimes monitors correctly and shows the host and all of its services in an OK state, but most of the time it shows a "Warning - No data was received from host" state. This host is currently showing the Ping monitor as OK, and thus the host as up, but this has been intermittent. More often it shows the host and all services in a Warning or Critical state.

I am monitoring several other hosts/servers without these issues. I have opened the following on the firewall for all hosts:
echo, echo-reply, 5666, 5667, 12489
I have also enabled the NSClient service to interact with the desktop, tried uninstall/reinstall of the client, and disabled the local firewalls for these two problem hosts. Nothing I have done has corrected the issue of the host that ALWAYS shows the host/services in a critical state and I have no idea why the other host sometimes works, but more often does not.

I have two other issues unrelated to this, but I would very much like to get them all wrapped up so I can get approval for the license purchase. If I need to split these into another topic, I will. Otherwise:
For some reason I cannot receive SMS messages on a Verizon device. I have tested T-Mobile successfully, but something is holding Verizon back.
I have one problem service that is not seen for some reason. I have configured this to monitor the service name, just as I have all the others, which are monitoring properly. I see nothing unique about this service that would explain why Nagios is not seeing it. It happens to be a SQL Express service, MSSQL$SQLEXPRESS in this case, but as I mentioned, this host is monitoring other services properly and I am not having any issues with any of the other SQL services.

Any help or guidance would be much appreciated.
Thanks!
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Let's start here:

mobiledataforce wrote:
I have a few remaining issues in wrapping up my Nagios configuration. I have two problem hosts that are not monitoring properly. One of these hosts does not monitor at all: it shows the host and its services in a "Critical - Socket timeout after 10 seconds" state. I can monitor custom ports successfully on this host, but I assume the client isn't used to verify that a port is open.

The second host monitors intermittently. It sometimes monitors correctly and shows the host and all of its services in an OK state, but most of the time it shows a "Warning - No data was received from host" state. This host is currently showing the Ping monitor as OK, and thus the host as up, but this has been intermittent. More often it shows the host and all services in a Warning or Critical state.

First question: What are you trying to monitor with this service? If you run the command Nagios XI is using from the command line, what does it look like? And what does it output?

Second question: Is this also a Windows server? Which check is the service running? As with the first question, please show us the command Nagios is running to return the check data, and its output when you run it from the command line. Also, are you using SSL?
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Issue 1: What checks are you using that are timing out? More than likely the checks will allow a -t flag or similar that can extend the timeout period, so that if network latency or server load causes a delay, you still receive an accurate result. By client are you referring to nsclient or another remote client?

Issue 2: More than likely this is a network or firewall issue. As with above, are you using local active checks on the nagios machine or a remote client on the other device?

What version of nsclient are you using presently?

As for sending to vzw, have you confirmed that it is using the correct domain name? I believe it is [number]@vztext.net, last time I looked. Also have you tried both with a 1 and without prior to the number? This would of course be assuming you are in the US, otherwise it would be another country code.

NSClient does have some quirks with service names. I would not be surprised if the $ needs to be escaped in some way to resolve the issue. Can you confirm whether any of the other service names that are working have spaces or non-alphanumeric characters?
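
One thing that may be worth ruling out when testing by hand: in a shell, an unquoted or double-quoted `$SQLEXPRESS` is treated as a variable and usually expands to nothing, so the plugin would be asked about a service named `MSSQL`. Below is a minimal sketch of the difference, using the service name from the original post. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Illustrative only: shows how shell quoting changes the service name
# that would be handed to a plugin such as check_nt.

# Make sure the variable is unset so the demonstration is predictable.
unset SQLEXPRESS

# Double quotes: the shell expands $SQLEXPRESS (to an empty string here).
double_quoted="MSSQL$SQLEXPRESS"

# Single quotes: the dollar sign is kept literally.
single_quoted='MSSQL$SQLEXPRESS'

echo "double quotes -> ${double_quoted}"
echo "single quotes -> ${single_quoted}"
```

Inside a Nagios object definition the shell is not the only thing interpreting `$`. Nagios treats `$...$` as a macro, and its macro documentation describes writing `$$` for a literal dollar sign, so `MSSQL$$SQLEXPRESS` in the service arguments may be worth a try.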
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
mobiledataforce
Posts: 68
Joined: Fri Mar 15, 2013 1:35 pm

I am trying to monitor many different things: CPU usage, disk space, memory usage, uptime, services, ports, etc., the things you would imagine would be monitored. These are Windows servers. I cannot run the Nagios commands from Windows, as they are obviously not recognized commands there, and running them from the Nagios server directly results in "event not found". I am almost certain this issue stems from something on the Windows servers, as I have MANY others that are working properly with this EXACT same configuration. I am not using SSL for these connections.

An example of one of the commands is check_xi_service_nsclient!PASSWORD!UPTIME. I don't see the need to list them all; they are exactly what you would imagine.

Combined posts:

Yes, I am referring to the nsclient, as I believe the Windows servers I am trying to monitor are the issue, and not my Nagios configuration. I agree it looks like a firewall issue, but AV and firewall are disabled on the Windows server and I am CERTAIN the proper ports are opened on the outside firewalls. The client version is 0.4.0.172. I am using local active checks which send to my remote Nagios server.

I can confirm that I am monitoring other service names with non-alphanumeric characters. Again, I am monitoring MANY other hosts successfully; it is something on the Windows servers that is the issue.

I have @vtext.com, I'll try the .net and see if that works.

All the commands look the same on working and non-working monitoring. I do see !!!!!! appended to some of the commands; I am unsure where this comes from, and I see it on both working and non-working services.
Last edited by slansing on Mon Apr 08, 2013 11:38 am, edited 1 time in total.
Reason: Merged posts, please do not double post.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

As was stated, we need additional information:

slansing wrote:
If you run the command Nagios XI is using from the command line, what does it look like? And what does it output?

That is, the command Nagios XI is using, run from the Nagios XI server's command line.

It is difficult to imagine the commands without you telling us what you are using, since by default NSClient has two methods of running active checks that can time out: check_nt and check_nrpe. The purpose of this question is also twofold; the other side is that we need a command you are familiar with that we can ask you to run, to see whether changes we make during troubleshooting are working properly.

What do you mean by:

mobiledataforce wrote:
running them from the nagios server directly results in event not found

That sounds like an event log error, which would go against what was previously stated and is why we need to know the command you ran.

We need to know the exact output when running the command, as, in some cases, it would have to be altered to run properly through the XI UI, though you have multiple servers configured the same way so we should be fine on that end.

Let's start with the basics. Run the following from the Nagios server and return the output to us:

Code: Select all

telnet <Windows server's IP> 5666

telnet <Windows server's IP> 12489

This is to make sure that we can connect on those ports. Latency could be causing the timeout, so we need to know whether the ports are reachable normally.
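
If telnet is not available on the Nagios server, a similar reachability test can be sketched with bash's built-in `/dev/tcp` redirection. This is only an illustration: `probe_port` is a hypothetical helper name, the 5-second default is an arbitrary choice, and it assumes a bash built with `/dev/tcp` support plus the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Sketch: probe a TCP port using bash's /dev/tcp redirection,
# for systems where telnet is not installed.
# probe_port is a hypothetical helper name, not a Nagios tool.

probe_port() {
    local host="$1" port="$2" wait_seconds="${3:-5}"
    # Run the connection attempt in a child bash so `timeout` can kill it
    # if the remote end silently drops packets.
    if timeout "$wait_seconds" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "closed or unreachable"
        return 1
    fi
}

# Example usage (replace 192.0.2.10 with the Windows server's address):
# probe_port 192.0.2.10 5666
# probe_port 192.0.2.10 12489
```

A result of "open" only shows that something accepted the TCP connection. It does not prove that NSClient answered with valid data.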
mobiledataforce
Posts: 68
Joined: Fri Mar 15, 2013 1:35 pm

Wow.

Like I already said, the command is check_xi_service_nsclient!PASSWORD!UPTIME. Running them from the Nagios server directly means running commands from the terminal of the CentOS server running Nagios XI. When I try to run Nagios commands there, what is returned is "event not found". Trying to telnet from the Nagios server returns "Telnet: command not found".
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

You see though, that is not the command. That is the shorthand version that service definitions use so it is not a valid "command" when running it through a console.

If you were to open the Core Config manager, and look at one of your services it will display the entire command as well as portions that would be filled in by your arguments.
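
To make the distinction concrete, here is a rough sketch of how a `check_command` value such as `check_xi_service_nsclient!PASSWORD!UPTIME` is split on `!` into arguments and substituted into a full plugin command line. The template used here is an assumption modelled on the `check_nt` options; the authoritative definition is whatever the Core Config Manager shows on your system. `expand_check` is a hypothetical helper and `192.0.2.10` is a placeholder address.

```shell
#!/usr/bin/env bash
# Sketch: expand a Nagios check_command shorthand into a plugin command line.
# The template below is an ASSUMPTION for illustration; check the real
# command definition in the Core Config Manager.

expand_check() {
    local shorthand="$1" host="$2"
    local name arg1 arg2 rest
    # Fields are separated by "!": command name, then $ARG1$, $ARG2$, ...
    IFS='!' read -r name arg1 arg2 rest <<EOF
$shorthand
EOF
    echo "/usr/local/nagios/libexec/check_nt -H ${host} -s \"${arg1}\" -p 12489 -v ${arg2}"
}

# Single quotes stop an interactive bash from treating "!" as history expansion.
expand_check 'check_xi_service_nsclient!PASSWORD!UPTIME' '192.0.2.10'
```

Typing the shorthand unquoted at an interactive bash prompt is also a likely source of an "event not found" message, because bash reads `!PASSWORD` as a reference to its command history.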

When Nagios runs an active check against a host, it returns the same information you would see when running its command manually. For instance, a basic NRPE check would be as follows:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H <IP of windows host>

Since you do not have telnet installed, we can test whether port 5666 is reachable by running the above against the Windows server, as it will attempt to reach the server over port 5666.

In the case of UPTIME, you are using check_nt to check against the Windows server, so we could use the following to check whether the port you are using for check_nt is open:

Code: Select all

/usr/local/nagios/libexec/check_nt -H <IP of windows host> -p 12489 -v UPTIME

This assumes you are using the standard 12489 port.
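
Because one of the hosts fails only intermittently, a single manual run may not show the problem. A small loop that repeats the same command and counts non-zero exits can help tell a constant failure from an occasional one. This is only a sketch: `sample_check` is a hypothetical helper, and the run count and the `-t 30` timeout in the usage comment are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: run a check command several times and count failures, to see
# whether timeouts are constant or intermittent.
# sample_check is a hypothetical helper name.

sample_check() {
    local runs="$1"; shift
    local failures=0 i=0
    while [ "$i" -lt "$runs" ]; do
        # Any non-zero exit (WARNING, CRITICAL, UNKNOWN, timeout) counts as a failure.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "${failures}/${runs} runs failed"
}

# Example usage, with the placeholders filled in for your host:
# sample_check 10 /usr/local/nagios/libexec/check_nt -H <IP of windows host> -p 12489 -s <password> -v UPTIME -t 30
```

The `-s` and `-t` options in the usage comment are the password and timeout options listed in the `check_nt -h` output.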
mobiledataforce
Posts: 68
Joined: Fri Mar 15, 2013 1:35 pm

For some reason it is telling me these commands are not found. I have navigated to /usr/local/nagios/libexec/ and I can see the commands if I list the folder contents, but they won't run.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Please run "ll /usr/local/nagios/libexec" and return the output. Also, can you cd to that directory and try ./check_nt -h?
mobiledataforce
Posts: 68
Joined: Fri Mar 15, 2013 1:35 pm

Here are the outputs:

Code: Select all

[root@localhost ~]# ll /usr/local/nagios/libexec
total 4688
-rwxr-xr-x. 1 root   root    65068 Aug 29  2011 check_apt
-rwxr-xr-x. 1 root   root     6897 Aug 29  2011 check_asterisk.pl
-rwxr-xr-x  1 nagios users    4173 Oct  9 18:09 check_bl
-rwxr-xr-x  1 nagios users    2289 Oct  9 18:09 check_bpi.php
-rwxr-xr-x. 1 root   root     2274 Aug 29  2011 check_breeze
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_clamd -> check_tcp
-rwxr-xr-x. 1 root   root    40575 Aug 29  2011 check_cluster
-r-sr-xr-x. 1 root   root    73301 Aug 29  2011 check_dhcp
-rwxr-xr-x. 1 root   root    71777 Aug 29  2011 check_dig
-rwxr-xr-x. 1 root   root   101991 Aug 29  2011 check_disk
-rwxr-xr-x. 1 root   root     8163 Aug 29  2011 check_disk_smb
-rwxr-xr-x. 1 root   root    79570 Aug 29  2011 check_dns
-rwxr-xr-x. 1 root   root    36412 Aug 29  2011 check_dummy
-rwxr-xr-x  1 nagios users    5625 Oct  9 18:09 check_em01.pl
-rwxr-xr-x  1 nagios users   38345 Oct  9 18:09 check_email_delivery
-rwxr-xr-x  1 nagios users   20511 Oct  9 18:09 check_email_delivery_epn
-rwxr-xr-x. 1 root   root    20039 Aug 29  2011 check_email_loop.pl
-rwxr-xr-x  1 nagios users   82841 Mar 14 11:00 check_esx3.pl
-rwxr-xr-x. 1 root   root     3143 Aug 29  2011 check_file_age
-rwxr-xr-x. 1 root   root     6395 Aug 29  2011 check_flexlm
-rwxr-xr-x. 1 root   root    74960 Aug 29  2011 check_fping
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_ftp -> check_tcp
-rwxr-xr-x  1 nagios users    3446 Jan 31 11:37 check_ftp_fully
-rwxr-xr-x. 1 root   root    73830 Aug 29  2011 check_hpjd
-rwxr-xr-x. 1 root   root   171978 Aug 29  2011 check_http
-r-sr-xr-x. 1 root   root    80061 Aug 29  2011 check_icmp
-rwxr-xr-x. 1 root   root    47064 Aug 29  2011 check_ide_smart
-rwxr-xr-x. 1 root   root    15310 Aug 29  2011 check_ifoperstatus
-rwxr-xr-x. 1 root   root    12853 Aug 29  2011 check_ifstatus
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_imap -> check_tcp
-rwxr-xr-x  1 nagios users   35413 Oct  9 18:09 check_imap_receive
-rwxr-xr-x  1 nagios users   15576 Oct  9 18:09 check_imap_receive_epn
-rwxr-xr-x. 1 root   root     7429 Aug 29  2011 check_ircd
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_jabber -> check_tcp
-rwxr-xr-x. 1 root   root    59134 Aug 29  2011 check_ldap
lrwxrwxrwx. 1 root   root       10 Aug 29  2011 check_ldaps -> check_ldap
-rwxr-xr-x. 1 root   root    59194 Aug 29  2011 check_load
-rwxr-xr-x. 1 root   root     6062 Aug 29  2011 check_log
-rwxr-xr-x. 1 root   root    20367 Aug 29  2011 check_mailq
-rwxr-xr-x. 1 root   root    47300 Aug 29  2011 check_mrtg
-rwxr-xr-x. 1 root   root    46703 Aug 29  2011 check_mrtgtraf
-rwxr-xr-x. 1 root   root    12537 Oct  9 18:09 check_mssql
-rwxr-xr-x  1 nagios users   13759 Jan 31 11:39 check_mssql_database.py
-rwxr-xr-x  1 nagios users   20336 Jan 31 11:39 check_mssql_server.py
-rwxr-xr-x. 1 root   root    78008 Aug 29  2011 check_mysql
-rwxr-xr-x. 1 root   root    98711 Aug 29  2011 check_mysql_health
-rwxr-xr-x. 1 root   root    71648 Aug 29  2011 check_mysql_query
-rwxr-xr-x. 1 root   root    59417 Aug 29  2011 check_nagios
-rwxr-xr-x  1 nagios users    6364 Oct  9 18:09 check_nagios_performance.php
-rwxr-xr-x  1 nagios users   17443 Jan 31 11:39 check_nagiosxiserver.php
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_nntp -> check_tcp
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_nntps -> check_tcp
-rwxrwxr-x  1 nagios nagios  69180 Aug 22  2012 check_nrpe
-rwxr-xr-x. 1 root   root    74851 Aug 29  2011 check_nt
-rwxr-xr-x. 1 root   root    76550 Aug 29  2011 check_ntp
-rwxr-xr-x. 1 root   root    68987 Aug 29  2011 check_ntp_peer
-rwxr-xr-x. 1 root   root    66180 Aug 29  2011 check_ntp_time
-rwxr-xr-x. 1 root   root   102418 Aug 29  2011 check_nwstat
-rwxr-xr-x. 1 root   root     8366 Aug 29  2011 check_oracle
-rwxr-xr-x. 1 root   root    60153 Aug 29  2011 check_overcr
-rwxr-xr-x. 1 root   root    55894 Aug 29  2011 check_pgsql
-rwxr-xr-x. 1 root   root    80896 Aug 29  2011 check_ping
-rwxr-xr-x  1 nagios nagios   6183 Dec 19  2011 check_pnp_rrds.pl
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_pop -> check_tcp
-rwxr-xr-x. 1 root   root   281655 Oct  9 18:09 check_postgres.pl
-rwxr-xr-x. 1 root   root    80817 Aug 29  2011 check_procs
-rwxr-xr-x  1 nagios users   23327 Jan 31 11:40 check_radius_adv
-rwxr-xr-x. 1 root   root    57100 Aug 29  2011 check_real
-rwxr-xr-x. 1 root   root     9707 Aug 29  2011 check_rpc
-rwxr-xr-x. 1 root   root     9232 Aug 29  2011 check_rrdtraf
-rwxr-xr-x. 1 root   root     5299 Aug 29  2011 check_rrdtraf.php
-rwxr-xr-x. 1 root   root     1176 Aug 29  2011 check_sensors
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_simap -> check_tcp
-rwxr-xr-x. 1 root   root     7599 Aug 29  2011 check_sip
-rwxr-xr-x. 1 root   root   119329 Aug 29  2011 check_smtp
-rwxr-xr-x  1 nagios users   20226 Oct  9 18:09 check_smtp_send
-rwxr-xr-x  1 nagios users   10440 Oct  9 18:09 check_smtp_send_epn
-rwxr-xr-x. 1 root   root   119321 Aug 29  2011 check_snmp
-rwxr-xr-x. 1 root   root    10951 Aug 29  2011 check_snmp_boostedge.pl
-rwxr-xr-x. 1 root   root    17866 Aug 29  2011 check_snmp_cpfw.pl
-rwxr-xr-x. 1 root   root     8747 Aug 29  2011 check_snmp_css_main.pl
-rwxr-xr-x. 1 root   root    16786 Aug 29  2011 check_snmp_css.pl
-rwxr-xr-x. 1 root   root    33562 Aug 29  2011 check_snmp_env.pl
-rwxr-xr-x  1 nagios users   23464 Jan 31 11:38 check_snmp_generic.pl
-rwxr-xr-x. 1 root   root    31919 Aug 29  2011 check_snmp_int.pl
-rwxr-xr-x. 1 root   root    10108 Aug 29  2011 check_snmp_linkproof_nhr.pl
-rwxr-xr-x. 1 root   root    22839 Oct  9 18:09 check_snmp_load.pl
-rwxr-xr-x  1 nagios users   22845 Jan 31 11:38 check_snmp_load_wizard.pl
-rwxr-xr-x. 1 root   root    18734 Aug 29  2011 check_snmp_mem.pl
-rwxr-xr-x. 1 root   root    11898 Aug 29  2011 check_snmp_nsbox.pl
-rwxr-xr-x. 1 root   root    26182 Oct  9 18:09 check_snmp_process.pl
-rwxr-xr-x  1 nagios users   26183 Jan 31 11:38 check_snmp_process_wizard.pl
-rwxr-xr-x. 1 root   root    25483 Oct  9 18:09 check_snmp_storage.pl
-rwxr-xr-x  1 nagios users   25484 Jan 31 11:38 check_snmp_storage_wizard.pl
-rwxr-xr-x. 1 root   root    14489 Aug 29  2011 check_snmp_vrrp.pl
-rwxr-xr-x. 1 root   root    12058 Oct  9 18:09 check_snmp_win.pl
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_spop -> check_tcp
-rwxr-xr-x. 1 root   root    54402 Aug 29  2011 check_ssh
-rwxr-xr-x  1 nagios users    8337 Oct  9 18:09 check_ssh_expect.pl
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_ssmtp -> check_tcp
-rwxr-xr-x. 1 root   root    61634 Aug 29  2011 check_swap
-rwxr-xr-x  1 apache apache    971 Mar 14 13:06 check_synology_disk
-rwxr-xr-x  1 apache apache    645 Mar 14 13:06 check_synology_raid
-rwxr-xr-x  1 apache apache   1563 Mar 14 13:04 check_synology_status
-rwxr-xr-x. 1 root   root   105714 Aug 29  2011 check_tcp
-rwxr-xr-x. 1 root   root    57050 Aug 29  2011 check_time
lrwxrwxrwx. 1 root   root        9 Aug 29  2011 check_udp -> check_tcp
-rwxr-xr-x. 1 root   root    67522 Aug 29  2011 check_ups
-rwxr-xr-x. 1 root   root    56145 Aug 29  2011 check_users
-rwxr-xr-x  1 nagios users     169 Mar 14 10:52 check_vmware_config_vcenter01_example
-rwxr-xr-x  1 apache apache  59991 Mar 14 10:48 check_vmware.pl
-rwxr-xr-x. 1 root   root     3019 Aug 29  2011 check_wave
-rwxr-xr-x. 1 root   root      307 Aug 29  2011 check_webinject.sh
-rwxr-xr-x  1 nagios users    7065 Jan 31 11:41 check_win_snmp_disk.pl
-rwxr-xr-x  1 nagios users    2405 Jan 31 11:41 check_wmi_plus.conf
-rwxr-xr-x  1 nagios users   46543 Jan 31 11:41 check_wmi_plus.ini
-rwxr-xr-x  1 nagios users  237155 Jan 31 11:41 check_wmi_plus.pl
-rwxr-xr-x. 1 root   root    60559 Aug 29  2011 negate
-rwxr-xr-x  1 nagios nagios  42724 Dec 19  2011 process_perfdata.pl
-rwxr-xr-x. 1 root   root    48481 Aug 29  2011 send_nsca
-rwxr-xr-x. 1 root   root    55590 Aug 29  2011 urlize
-rwxr-xr-x. 1 root   root     2035 Aug 29  2011 utils.pm
-rwxr-xr-x. 1 root   root      862 Aug 29  2011 utils.sh

Code: Select all


[root@localhost libexec]# ./check_nt -h
check_nt v1991 (nagios-plugins 1.4.13)
Copyright (c) 2000 Yves Rubin ([email protected])
Copyright (c) 2000-2007 Nagios Plugin Development Team
        <[email protected]>

This plugin collects data from the NSClient service running on a
Windows NT/2000/XP/2003 server.


Usage:check_nt -H host -v variable [-p port] [-w warning] [-c critical][-l params] [-d SHOWALL] [-t timeout]

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
Options:
 -H, --hostname=HOST
   Name of the host to check
 -p, --port=INTEGER
   Optional port number (default: 1248)
 -s <password>
   Password needed for the request
 -w, --warning=INTEGER
   Threshold which will result in a warning status
 -c, --critical=INTEGER
   Threshold which will result in a critical status
 -t, --timeout=INTEGER
   Seconds before connection attempt times out (default: 10)
 -h, --help
   Print this help screen
 -V, --version
   Print version information
 -v, --variable=STRING
   Variable to check

Valid variables are:
 CLIENTVERSION = Get the NSClient version
  If -l <version> is specified, will return warning if versions differ.
 CPULOAD =
  Average CPU load on last x minutes.
  Request a -l parameter with the following syntax:
  -l <minutes range>,<warning threshold>,<critical threshold>.
  <minute range> should be less than 24*60.
  Thresholds are percentage and up to 10 requests can be done in one shot.
  ie: -l 60,90,95,120,90,95
 UPTIME =
  Get the uptime of the machine.
  No specific parameters. No warning or critical threshold
 USEDDISKSPACE =
  Size and percentage of disk use.
  Request a -l parameter containing the drive letter only.
  Warning and critical thresholds can be specified with -w and -c.
 MEMUSE =
  Memory use.
  Warning and critical thresholds can be specified with -w and -c.
 SERVICESTATE =
  Check the state of one or several services.
  Request a -l parameters with the following syntax:
  -l <service1>,<service2>,<service3>,...
  You can specify -d SHOWALL in case you want to see working services
  in the returned string.
 PROCSTATE =
  Check if one or several process are running.
  Same syntax as SERVICESTATE.
 COUNTER =
  Check any performance counter of Windows NT/2000.
  Request a -l parameters with the following syntax:
  -l "\\<performance object>\\counter","<description>
  The <description> parameter is optional and is given to a printf
  output command which requires a float parameter.
  If <description> does not include "%%", it is used as a label.
  Some examples:
  "Paging file usage is %%.2f %%%%"
  "%%.f %%%% paging file used."
 INSTANCES =
  Check any performance counter object of Windows NT/2000.
  Syntax: check_nt -H <hostname> -p <port> -v INSTANCES -l <counter object>
  <counter object> is a Windows Perfmon Counter object (eg. Process),
  if it is two words, it should be enclosed in quotes
  The returned results will be a comma-separated list of instances on
   the selected computer for that object.
  The purpose of this is to be run from command line to determine what instances
   are available for monitoring without having to log onto the Windows server
    to run Perfmon directly.
  It can also be used in scripts that automatically create Nagios service
   configuration files.
  Some examples:
  check_nt -H 192.168.1.1 -p 1248 -v INSTANCES -l Process

Notes:
 - The NSClient service should be running on the server to get any information
   (http://nsclient.ready2run.nl).
 - Critical thresholds should be lower than warning thresholds
 - Default port 1248 is sometimes in use by other services. The error
   output when this happens contains "Cannot map xxxxx to protocol number".
   One fix for this is to change the port to something else on check_nt
   and on the client service it's connecting to.

Send email to [email protected] if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to [email protected]
Last edited by abrist on Wed Apr 10, 2013 2:26 pm, edited 1 time in total.
Reason: code wraps save scroll wheels