NCPA Socket Timeouts

Posted: Tue Oct 19, 2021 9:14 am
by meganwilliford
Hello, I'm looking for additional troubleshooting steps for cases where we cannot reach the NCPA agent, primarily on Linux devices.

Here is the command and response:

/usr/local/nagios/libexec/check_http -H <remote host name> -s "Success" -f ok -u '/testconnect/?token=<token>' -S --sni -p 5693
CRITICAL - Socket timeout

We've verified that the server is accessible and the port is open, and we have restarted the services. Are there any other troubleshooting steps we can perform to figure out why the connection is timing out?

Re: NCPA Socket Timeouts

Posted: Tue Oct 19, 2021 3:39 pm
by pbroste
Hello @meganwilliford

Thanks for reaching out about the socket errors.

Are you able to open the NCPA web console and run the check from its API menu?

Code:

https://yourhostaddresshere:5693

Copy the 'check_http' plugin to the plugin directory on the host that you want to check, then run the following:

Code:

/usr/local/nagios/libexec/check_ncpa.py -H yourhostaddresshere -t yourtokenhere -M 'plugins/check_http' <your args here>

Also make sure the Nagios server is allowed in 'ncpa.cfg', found in the '..../nagios/ncpa/etc/' directory:

allowed_hosts = xxx.xxx.xxx.0/24
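
To double-check that the Nagios server's address actually falls inside the allowed_hosts value, a minimal Python sketch like the one below may help. It is only an illustration: the function name `is_allowed` and the sample addresses are made up, and treating an empty value as "no restriction" is an assumption, not confirmed NCPA behavior.

```python
import ipaddress


def is_allowed(client_ip, allowed_hosts):
    """Return True if client_ip matches any comma-separated entry in allowed_hosts."""
    entries = [e.strip() for e in allowed_hosts.split(",") if e.strip()]
    if not entries:
        # Assumption: an empty allowed_hosts value means "no restriction".
        return True
    addr = ipaddress.ip_address(client_ip)
    # strict=False lets an entry such as 10.0.0.5/24 be read as the 10.0.0.0/24 network.
    return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)


# Hypothetical values: substitute the Nagios server IP and your real allowed_hosts line.
print(is_allowed("192.168.1.25", "192.168.1.0/24"))  # True
```
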
Let us know how things look,
Perry

Re: NCPA Socket Timeouts

Posted: Tue Oct 19, 2021 4:01 pm
by meganwilliford
Thanks for those steps! Is there any difference in stability or success rate between reaching the agent by adding the plugin to the remote host (plugins/check_http) and checking the remote host's agent API (https://<remote host name>:5693/api/?token=<token>)?

I ask because we are seeing two scenarios: in one, the connection to the remote host agent is constantly flapping, and in the other, we can't reach the agent at all. Both scenarios result in socket timeouts. Do you know what would cause the flapping, and what would cause the timeouts, when all the configurations look okay?
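
One way to put numbers on the flapping is to probe the agent's /testconnect/ endpoint at a fixed interval and log each result with a timestamp. The Python sketch below is a rough illustration of that idea, not an official Nagios tool. The names `probe`, `count_flaps` and `monitor` are hypothetical, and it assumes the agent uses a self-signed certificate, so verification is turned off.

```python
import ssl
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return 'ok' or a short failure label for one HTTPS GET."""
    # Assumption: the agent certificate is self-signed, so verification is disabled.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = resp.read().decode("utf-8", "replace")
        return "ok" if "Success" in body else "unexpected_body"
    except urllib.error.HTTPError as exc:
        return "http_%d" % exc.code
    except OSError as exc:  # URLError, socket timeouts and TLS errors are all OSError
        return "error: %s" % exc


def count_flaps(results):
    """Count transitions between ok and not-ok in a sequence of probe results."""
    states = [r == "ok" for r in results]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def monitor(url, attempts=20, interval=30.0):
    """Probe `url` repeatedly, print a timestamped log, and return the results."""
    results = []
    for _ in range(attempts):
        result = probe(url)
        results.append(result)
        print(time.strftime("%H:%M:%S"), result)
        time.sleep(interval)
    print("state changes:", count_flaps(results))
    return results


# Example call (placeholders as in the thread):
# monitor("https://yourhostaddresshere:5693/testconnect/?token=yourtokenhere")
```

A log like this can then be lined up against events on the remote host (backups, mounts, load spikes) to see whether the failures cluster around anything.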

Thanks!

Re: NCPA Socket Timeouts

Posted: Tue Oct 19, 2021 4:39 pm
by pbroste
Hello @meganwilliford

Thanks for following up. Basically, the API and the backend command perform the same function.

I'd suggest running the NCPA checks in verbose mode to get more information about the 'socket error' and help pin down what is going on:

Code:

/usr/local/nagios/libexec/check_ncpa.py -H yourhostaddresshere -t yourtokenhere -T <increasetimeout> --verbose -M 'plugins/check_http'  <your args here>

  -T TIMEOUT, --timeout=TIMEOUT
                        Enforced timeout, will terminate plugins after this
                        amount of seconds. [60]
  -v, --verbose         Print more verbose error messages.
  -D, --debug           Print LOTS of error messages. Used mostly for
                        debugging.

Please let us know how things look,
Perry

Re: NCPA Socket Timeouts

Posted: Wed Oct 20, 2021 9:55 am
by meganwilliford
-D was an invalid option and -v only returned "CRITICAL - Socket timeout".

I instead tried this: curl -k "https://<ip address>:5693/testconnect/?token=<token>" -v.

A successful return looks like this:

[root@<nagios xi host> ~]# curl -k "https://<remote host>:5693/testconnect/?token=<token>" -v
* About to connect() to <remote host> port 5693 (#0)
* Trying <remote host IP>...
* Connected to <remote host> (<remote host IP>) port 5693 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=<remote host>,OU=Development,O="Nagios Enterprises, LLC",L=St. Paul,ST=Minnesota,C=US
* start date: Dec 16 04:39:33 2019 GMT
* expire date: Dec 13 04:39:33 2029 GMT
* common name: <remote host>
* issuer: CN=<remote host>,OU=Development,O="Nagios Enterprises, LLC",L=St. Paul,ST=Minnesota,C=US
> GET /testconnect/?token=<token> HTTP/1.1
> User-Agent: curl/7.29.0
> Host: <remote host>:5693
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 25
< X-Frame-Options: SAMEORIGIN
< Content-Security-Policy: frame-ancestors 'self'
< Date: Wed, 20 Oct 2021 14:26:01 GMT
<
{
"value": "Success."
* Connection #0 to host <remote host> left intact

I then ran it again right after the successful attempt, and it timed out:

[root@<nagios xi host> ~]# curl -k "https://<remote host>:5693/testconnect/?token=<token>" -v
* About to connect() to <remote host> port 5693 (#0)
* Trying <remote host IP>...
* Connected to <remote host> (<remote host IP>) port 5693 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* Operation timed out after 300417 milliseconds with 0 out of 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 300417 milliseconds with 0 out of 0 bytes received

Any ideas why the connection is intermittent? Is there a restriction on the number of consecutive connections? The timeout happens after about 5 minutes at this step: Initializing NSS with certpath: sql:/etc/pki/nssdb.
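
In the failed curl run the TCP connection is established and the stall comes afterwards, before any TLS details are printed. To confirm which phase hangs without waiting five minutes, the two phases can be timed separately. Below is a minimal Python sketch of that idea, under the assumption that certificate verification can be skipped as with `curl -k`. The function name `probe_phases` and the result keys are hypothetical.

```python
import socket
import ssl
import time


def probe_phases(host, port=5693, timeout=10.0):
    """Time the TCP connect and the TLS handshake separately."""
    result = {"tcp": None, "tls": None, "failed_at": None}

    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        result["failed_at"] = "tcp: %s" % exc
        return result
    result["tcp"] = time.monotonic() - start

    # Verification is disabled, as with `curl -k` (the agent cert is self-signed).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        # wrap_socket performs the handshake; the socket timeout set above still applies.
        with ctx.wrap_socket(sock, server_hostname=host):
            result["tls"] = time.monotonic() - start
    except OSError as exc:  # covers socket timeouts and ssl.SSLError
        result["failed_at"] = "tls: %s" % exc
        sock.close()
    return result


# Example call (placeholder host as in the thread):
# print(probe_phases("yourhostaddresshere", 5693, timeout=10.0))
```

If `tcp` is filled in but `failed_at` starts with `tls`, the agent process accepted the connection at the kernel level but did not answer the handshake, which points at the agent itself being blocked rather than at the network or firewall.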

Re: NCPA Socket Timeouts

Posted: Wed Oct 20, 2021 12:18 pm
by pbroste
Hello @meganwilliford

Thanks for following up. We'd like to get a copy of your System Profile to see what is going on.

To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message.

Thanks,
Perry

Re: NCPA Socket Timeouts

Posted: Wed Oct 20, 2021 12:47 pm
by meganwilliford
Okay, thanks. I will send that over in a PM. In the meantime, I think there has been some resolution. We found a few hung file system mounts. Once those were cleared, the monitoring started working again. Is there anything that could help explain that? Is NCPA dependent on being able to access the file system it's installed on?

Re: NCPA Socket Timeouts

Posted: Wed Oct 20, 2021 3:50 pm
by pbroste
Hello @meganwilliford

It sounds like NCPA's read/write access to its config and log files on that mount point was affected.
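
To spot hung mounts like these on a Linux host before they affect monitoring, each mount point can be probed with a time limit. The following is a minimal Python sketch under a few assumptions: it reads /proc/mounts, it treats a mount as hung when `os.statvfs` does not return within the timeout, and the function names are hypothetical. It is not part of NCPA.

```python
import os
import threading


def parse_mount_points(mounts_text):
    """Extract mount points (second field) from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # /proc/mounts escapes spaces in paths as \040.
            points.append(fields[1].replace("\\040", " "))
    return points


def responds(path, timeout=5.0):
    """Return True if os.statvfs(path) finishes within `timeout` seconds."""
    done = threading.Event()

    def worker():
        try:
            os.statvfs(path)
        except OSError:
            pass  # An error is still a response; only a hang counts as a failure.
        done.set()

    # Daemon thread: a call stuck on a hung mount cannot be cancelled from Python.
    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)


def find_hung_mounts(timeout=5.0):
    """Return the mount points from /proc/mounts that did not respond in time."""
    with open("/proc/mounts") as fh:
        points = parse_mount_points(fh.read())
    return [p for p in points if not responds(p, timeout)]


# Example call on the monitored host:
# print(find_hung_mounts(timeout=5.0))
```

One caveat: a thread blocked in uninterruptible I/O on a hung mount stays blocked, so the script itself may be slow to exit until the mount recovers.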

Please let us know how things look,
Perry