Hello, I'm looking for some additional troubleshooting steps for not being able to reach the NCPA agent, primarily on Linux devices.
Here is the command and response:
/usr/local/nagios/libexec/check_http -H <remote host name> -s "Success" -f ok -u '/testconnect/?token=<token>' -S --sni -p 5693
CRITICAL - Socket timeout
We've verified the server is accessible, port is open, and have restarted services. Are there any other troubleshooting steps that can be performed to figure out why the connection is timing out?
NCPA Socket Timeouts
Re: NCPA Socket Timeouts
Hello @meganwilliford
Thanks for reaching out about the socket errors.
Are you able to view the 'ncpa' web console and run it via the [api] menu:
Code:
https://yourhostaddresshere:5693
Copy the 'check_http' plugin to the plugin directory of the host that you want to check and run the following:
Code:
/usr/local/nagios/libexec/check_ncpa.py -H yourhostaddresshere -t yourtokenhere -M 'plugins/check_http' <your args here>
Make sure the Nagios server is covered by 'allowed_hosts' in the 'ncpa.cfg' found in the '..../nagios/ncpa/etc/' directory:
Code:
allowed_hosts = xxx.xxx.xxx.0/24
Let us know how things look,
Perry
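One way to sanity-check an allowed_hosts value like the one above is to test the Nagios server's address against it offline. The sketch below is only an illustration: the helper name host_is_allowed is hypothetical, and it assumes that allowed_hosts is a comma-separated list of IP addresses and CIDR ranges and that an empty value means no restriction. It does not read ncpa.cfg and is not NCPA's own matching logic.

```python
import ipaddress


def host_is_allowed(client_ip, allowed_hosts):
    """Return True if client_ip falls inside any entry of allowed_hosts.

    Assumption: allowed_hosts is a comma-separated string of IP addresses
    and/or CIDR ranges, and an empty string means "no restriction".
    """
    entries = [entry.strip() for entry in allowed_hosts.split(",") if entry.strip()]
    if not entries:
        return True  # assumed behaviour for an empty setting
    address = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as 192.168.1.5/24 with host bits set.
    return any(
        address in ipaddress.ip_network(entry, strict=False) for entry in entries
    )
```

If this returns False for the address the Nagios XI server connects from, the allowed_hosts line is worth a second look before digging further into the timeouts.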
-
meganwilliford
- Posts: 101
- Joined: Tue Aug 06, 2019 7:49 am
Re: NCPA Socket Timeouts
Thanks for those steps! So what is the difference in stability or success of reaching the agent by adding the plugin to the remote host (plugins/check_http) vs checking the remote host agent api (https://<remote host name>:5693/api/?token=<token>)?
I ask because we are seeing a couple of scenarios: one where the remote host agent connection is constantly flapping, and another where we can't reach the agent at all. Both scenarios result in Socket Timeouts. Do you know what would cause the flapping and what would cause the timeouts when all the configurations look okay?
Thanks!
Re: NCPA Socket Timeouts
Hello @meganwilliford
Thanks for following up. Basically, the API and the backend command perform the same function.
I'd suggest running the NCPA checks with verbose output to get more information about the socket error and help pin down what is going on:
Code:
/usr/local/nagios/libexec/check_ncpa.py -H yourhostaddresshere -t yourtokenhere -T <increasetimeout> --verbose -M 'plugins/check_http' <your args here>
  -T TIMEOUT, --timeout=TIMEOUT
                        Enforced timeout, will terminate plugins after this
                        amount of seconds. [60]
  -v, --verbose         Print more verbose error messages.
  -D, --debug           Print LOTS of error messages. Used mostly for
                        debugging.
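Because the failures in this thread are intermittent, it can help to run the same check several times and record how each attempt ends. The following is a minimal sketch, not part of NCPA or Nagios XI: the function name sample_check and the "TIMEOUT" label are made up for illustration, and the only outside fact it relies on is the standard Nagios plugin exit codes (0 to 3).

```python
import subprocess
import time

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def sample_check(cmd, attempts=5, timeout=60.0, pause=1.0):
    """Run cmd several times; return a list of (state, seconds) tuples."""
    results = []
    for attempt in range(attempts):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            state = STATES.get(proc.returncode, "UNKNOWN")
        except subprocess.TimeoutExpired:
            # The command did not finish within this script's own deadline.
            state = "TIMEOUT"
        results.append((state, round(time.monotonic() - start, 2)))
        if attempt < attempts - 1:
            time.sleep(pause)
    return results


# Example call, using the placeholders from the command above:
# sample_check(["/usr/local/nagios/libexec/check_ncpa.py",
#               "-H", "yourhostaddresshere", "-t", "yourtokenhere",
#               "--verbose", "-M", "plugins/check_http"])
```

A run that alternates between OK and CRITICAL, with the CRITICAL attempts taking roughly the full timeout, would point at the agent stalling rather than at a configuration problem.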
Please let us know how things look,
Perry
meganwilliford
Re: NCPA Socket Timeouts
-D was an invalid option and -v only returned "CRITICAL - Socket timeout".
I instead tried this: curl -k "https://<ip address>:5693/testconnect/?token=<token>" -v.
A successful return looks like this:
[root@<nagios xi host> ~]# curl -k "https://<remote host>:5693/testconnect/?token=<token>" -v
* About to connect() to <remote host> port 5693 (#0)
* Trying <remote host IP>...
* Connected to <remote host> (<remote host IP>) port 5693 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=<remote host>,OU=Development,O="Nagios Enterprises, LLC",L=St. Paul,ST=Minnesota,C=US
* start date: Dec 16 04:39:33 2019 GMT
* expire date: Dec 13 04:39:33 2029 GMT
* common name: <remote host>
* issuer: CN=<remote host>,OU=Development,O="Nagios Enterprises, LLC",L=St. Paul,ST=Minnesota,C=US
> GET /testconnect/?token=<token> HTTP/1.1
> User-Agent: curl/7.29.0
> Host: <remote host>:5693
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 25
< X-Frame-Options: SAMEORIGIN
< Content-Security-Policy: frame-ancestors 'self'
< Date: Wed, 20 Oct 2021 14:26:01 GMT
<
{
"value": "Success."
}
* Connection #0 to host <remote host> left intact
And then I ran it again after it was successful and it timed out:
[root@<nagios xi host> ~]# curl -k "https://<remote host>:5693/testconnect/?token=<token>" -v
* About to connect() to <remote host> port 5693 (#0)
* Trying <remote host IP>...
* Connected to <remote host> (<remote host IP>) port 5693 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* Operation timed out after 300417 milliseconds with 0 out of 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 300417 milliseconds with 0 out of 0 bytes received
Any ideas why the connection is intermittent? Is there a limit on the number of consecutive connections? The timeout happens after about 5 minutes at this step: Initializing NSS with certpath: sql:/etc/pki/nssdb.
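In the failing curl run above, the "Connected to" line appears but the "SSL connection using" line never does, which suggests the TCP connection is accepted and the stall happens during the TLS handshake. The sketch below separates those two stages with a short timeout. It is an illustration only: probe_stage and its stage labels are hypothetical names, and certificate verification is switched off to mirror curl -k.

```python
import socket
import ssl


def probe_stage(host, port=5693, timeout=10.0):
    """Return (stage, detail), where stage is the step that failed or "ok"."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # refused, unreachable, or timed out
        return "tcp_connect", str(exc)

    # Skip certificate checks, like curl -k (the agent uses a self-signed cert).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        # The handshake inherits the timeout set on the TCP socket.
        tls = context.wrap_socket(sock, server_hostname=host)
    except OSError as exc:  # ssl.SSLError and timeouts are both OSError
        sock.close()
        return "tls_handshake", str(exc)

    with tls:
        return "ok", tls.version()
```

A result of "tls_handshake" with a timeout message would match what curl showed: the port is open, but the agent process is not answering on it.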
Re: NCPA Socket Timeouts
Hello @meganwilliford
Thanks for following up. We'd like to get a copy of your environment's System Profile to see what is going on.
To send us your system profile:
1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and share it in a private message.
Thanks,
Perry
meganwilliford
Re: NCPA Socket Timeouts
Okay thanks, I will send that over in a PM. In the meantime, I think there has been some resolution. We found there were a few hung file system mounts. Those got cleared and the monitoring started working again. Is there anything that could help explain that? Is the NCPA dependent on being able to access the file system it's installed in?
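For anyone hitting the same symptom, hung mounts like the ones described above can be spotted by checking whether a stat call on each mount point returns in reasonable time. This is a rough sketch under stated assumptions, not an NCPA feature: the function names are hypothetical, the input is expected to be the text of /proc/mounts on Linux, and mount points containing escaped characters (such as \040 for a space) are not decoded.

```python
import subprocess
import sys
import time


def mount_points(mounts_text):
    """Extract mount points (second field) from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def path_responds(path, timeout=5.0):
    """Return True if an os.stat() on path completes within timeout seconds."""
    # Stat in a child process so a hung mount cannot block this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True  # finished (even with an error), so not hung
        time.sleep(0.05)
    child.kill()  # best effort; a process stuck in uninterruptible I/O may linger
    return False


def find_hung_mounts(mounts_text, timeout=5.0):
    """Return the mount points whose stat did not finish in time."""
    return [p for p in mount_points(mounts_text) if not path_responds(p, timeout)]
```

On a Linux host this could be called as find_hung_mounts(open("/proc/mounts").read()); any path it returns is a candidate for the kind of stale mount that was cleared here.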
Re: NCPA Socket Timeouts
Hello @meganwilliford
It sounds like NCPA's read/write access to its config and logs on that mount point was affected.
Please let us know how things look,
Perry