check_docker.py is not working

Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am


Post by Amit_Alone »

Hi,

I'm trying to monitor Docker, but it is not working as expected. I went through many related forum threads, but none of them resolved my case.

I am attaching a screenshot of the service status for my server from Nagios XI.

One of the service alerts shows an OK status, but the status information is missing. Below is the command output from the terminal.

Code: Select all

[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_cpu' -C 'AGASPCTNVT0009-*******-mcm4387-151074492723' -t 30 --networks-use-avg -w '90:' -c '90:' --debug
containers
['AGASPCTNVT0009-*******-mcm4387-151074492723']
End selection + type
hit containers_list_to_dict
hit is_docker_container_ID
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&filters={%22id%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=1' -k -g -f
hit is_docker_container_name
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&filters={%22name%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=true' -k -g -f
hit get_container_IDs_from_names
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&filters={%22name%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=true' -k -g -f
selection before assigning thresholds
{'AGASPCTNVT0009-*******-mcm4387-151074492723': [u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35']}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'AGASPCTNVT0009-*******-mcm4387-151074492723': {'value': 0, 'warning': (90.0, inf, False), 'critical': (90.0, inf, False), 'name': 'AGASPCTNVT0009-*******-mcm4387-151074492723', 'containers': [u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35']}}
hit get_container_IDs
hit do_check
hit check_containers_CPU
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35/stats?&stream=false' -k -g -f
hit process_value
hit process_usage
container_id_to_usage is
{u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35': 0.0030965983866722406}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
hit check_all_values_against_thresholds
hit check_against_thresholds
hit check_against_threshold
hit nagios_exit
CRITICAL:  |
AGASPCTNVT0009-*******-mcm4387-151074492723 returned CRITICAL (value 0.00309659838667%)
 | AGASPCTNVT0009-*******-mcm4387-151074492723=0.00309659838667%;90.0:;90.0:
Please help me resolve this.
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: check_docker.py is not working

Post by dchurch »

That's odd that it's stating critical, but the web interface says OK. Something is not right there. Is --debug in the configured check command?
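
One possible explanation for the CRITICAL result on the CLI is the threshold syntax. In the Nagios plugin range notation, '90:' means "alert when the value is below 90", which matches the (90.0, inf, False) tuples in the debug output above. The sketch below is a minimal illustration of that notation; parse_range and alerts are hypothetical names, not functions taken from check_docker.py, and the plugin's own parsing may differ in detail.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range string into (low, high, inside).

    inside=True (leading '@') means: alert when the value is INSIDE the range.
    Otherwise the alert fires when the value is OUTSIDE the range.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
        if low_s == "~":
            low = -math.inf      # '~' stands for negative infinity
        elif low_s == "":
            low = 0.0            # an omitted start defaults to 0
        else:
            low = float(low_s)
        high = float(high_s) if high_s else math.inf
    else:
        low, high = 0.0, float(spec)  # a bare number N means the range 0..N
    return low, high, inside


def alerts(value, spec):
    """Return True if value should raise an alert for the given range."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range
```

Under this reading, a CPU value of 0.003% with -c '90:' alerts because it is below 90, while -c '90' (no trailing colon) would alert only above 90. If the intent is to alert on high CPU usage, the form without the colon is probably the one to try.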

AFAIK dockerd doesn't use SSL unless you configure it to.

Can you post the output from the following commands (run these on the XI server):

Code: Select all

curl http://10.173.14.214:2376

Code: Select all

curl https://10.173.14.214:2376

Code: Select all

openssl s_client -connect 10.173.14.214:2376 </dev/null |openssl x509 -text
(This one might involve installing nmap using yum install nmap or the like)

Code: Select all

nmap -PN -p 2376 10.173.14.214
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: check_docker.py is not working

Post by Amit_Alone »

dchurch wrote: Is --debug in the configured check command?
No, it is configured without --debug.
dchurch wrote: AFAIK dockerd doesn't use SSL unless you configure it to.
Correct, the shared URL is accessible only over HTTPS. Because of that, I modified the script to add the -k option to the curl command.

Code: Select all

cmd = "curl %s '%s' -k -g -f %s %s %s" % (options.socket, full_url, options.cert, options.key, options.cacert)
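
As a side note on the line above: building the command by string interpolation depends on shell quoting, and -k turns off certificate verification entirely. A minimal Python 3 sketch of the same idea using an argument list is shown here. build_curl_argv and its parameters are hypothetical and not part of check_docker.py (which appears to be Python 2, judging by the u'' strings in its debug output); only the curl flags themselves (-k, -g, -f, --cert, --key, --cacert, --unix-socket) are real.

```python
def build_curl_argv(full_url, insecure=False, cert=None, key=None,
                    cacert=None, unix_socket=None):
    """Build a curl argument list (no shell quoting needed).

    All parameter names are illustrative assumptions.
    """
    argv = ["curl", "-g", "-f"]   # -g: no URL globbing, -f: fail on HTTP errors
    if insecure:
        argv.append("-k")         # skips TLS certificate verification
    if unix_socket:
        argv += ["--unix-socket", unix_socket]
    for flag, value in (("--cert", cert), ("--key", key), ("--cacert", cacert)):
        if value:
            argv += [flag, value]
    argv.append(full_url)
    return argv
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True) without shell=True. Where the CA certificate is available, passing cacert instead of insecure=True keeps verification on.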
As requested, I am attaching the output of the requested commands.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: check_docker.py is not working

Post by ssax »

Based on the plugin output from the CLI, the plugin seems to be working.

You could be hitting a bug depending on what XI version you are running. Please PM me a copy of your profile, you can download it from Admin > System Profile by clicking the Download Profile button.

Does the Nagios Core interface display it properly? Use your XI credentials here and check:

Code: Select all

http://YOURXISERVER/nagios/
The way the plugin is written, it seems to show the full status only on the service details page, so go back to your screenshot and click on the service name to view the full output.

Thank you!
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: check_docker.py is not working

Post by Amit_Alone »

As requested, I have sent you the system profile by PM.

I am attaching a screenshot of the Nagios Core GUI as suggested. The GUI displays as expected.

To try other possibilities, I deleted the existing configuration and set it up again with one change: under the monitor option I selected All visible containers. Previously I had specified an individual container name/ID to monitor. To my surprise, after this change all the services showed an OK state. However, container health showed UNKNOWN. Below is the output of the Docker - Containers Are Healthy service in debug mode from the terminal.

Code: Select all

[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 --ignore-no-healthcheck -l -w '50:' -c '30:' --debug
all
{'total_usage': []}
End selection + type
selection before assigning thresholds
{'total_usage': []}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'total_usage': {'value': 0, 'warning': (50.0, inf, False), 'critical': (30.0, inf, False), 'name': 'total_usage', 'containers': []}}
hit get_all_container_IDs
hit get_all_containers
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&all=1' -k -g -f
hit do_check
hit check_containers_healthy
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}' -k -g -f
ERR   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 500 Internal Server Error

STDOUT
hit nagios_exit
UNKNOWN: cURL call failed
[e5613751@avgdlnxvp126 libexec]$
My concern is why the individual containers were not showing status information, whereas everything works as expected when I select the All visible containers option. I ask because in the future we may be required to monitor only selected containers.

Also, could you please help me understand why the health status shows UNKNOWN?
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: check_docker.py is not working

Post by ssax »

Sorry, what I meant is that you need to view the service status in the Core interface by clicking the links there, and see whether the status of the service displays properly:

Code: Select all

http://YOURXISERVER/nagios/
The way the plugin is written you'll need to view the service status details in order to see the multi-line output. If you want that functionality changed I would need to submit a feature request to development to change the way the plugin works.
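
To illustrate why the status column shows only "OK:" or "CRITICAL:", here is a minimal sketch of how standard Nagios plugin output is divided: the first line is the short status text (with optional perfdata after a '|'), and the following lines are long output until another '|' starts more perfdata. split_plugin_output is a hypothetical helper written for this illustration, not code from Nagios or from check_docker.py.

```python
def split_plugin_output(raw):
    """Split raw plugin stdout into (short_text, long_text, perfdata)."""
    lines = raw.strip("\n").split("\n")

    # First line: short status text, optionally followed by '|' and perfdata.
    first, _, perf_first = lines[0].partition("|")
    short_text = first.strip()
    perf_parts = [perf_first.strip()] if perf_first.strip() else []

    long_lines = []
    in_perf = False
    for line in lines[1:]:
        if in_perf:
            # Everything after the second '|' is perfdata.
            perf_parts.append(line.strip())
            continue
        text, sep, perf = line.partition("|")
        if text.strip():
            long_lines.append(text.strip())
        if sep:
            in_perf = True
            if perf.strip():
                perf_parts.append(perf.strip())
    return short_text, "\n".join(long_lines), " ".join(perf_parts)
```

Applied to the CLI output earlier in the thread, the short text is just "CRITICAL:", and the "returned CRITICAL (value ...)" line lands in the long output, which is what the service details page displays.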

It's getting a 500 error which means the remote system didn't like what was sent (or is misconfigured somehow).

Does it work if you do this?

Code: Select all

./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 -l -w '50:' -c '30:' --debug
What is the output of these commands?

Code: Select all

curl  -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'

curl  -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: check_docker.py is not working

Post by Amit_Alone »

As suggested, I logged in to the Nagios Core GUI and observed that the status information shows only the OK state. However, after clicking on the service, it displayed the full information.

Code: Select all

OK:
AGASPCTNVT0008-AMEXCBP-mcm4388-151074492754 returned OK (value 0.00619319677334%)
I'm not sure what is being sent. Also, I have not made any changes to the configuration.

Below is the output of the commands you requested.

Code: Select all

[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 -l -w '50:' -c '30:' --debug
all
{'total_usage': []}
End selection + type
selection before assigning thresholds
{'total_usage': []}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'total_usage': {'value': 0, 'warning': (50.0, inf, False), 'critical': (30.0, inf, False), 'name': 'total_usage', 'containers': []}}
hit get_all_container_IDs
hit get_all_containers
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&all=1' -k -g -f
hit do_check
hit check_containers_healthy
hit talk_to_docker
curl  'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}' -k -g -f
ERR   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 500 Internal Server Error

STDOUT
hit nagios_exit
UNKNOWN: cURL call failed
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$ curl  -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
* About to connect() to 10.173.14.214 port 2376 (#0)
*   Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=AGASPVCHVT0003,O=default
*       start date: Sep 30 11:36:47 2020 GMT
*       expire date: Oct 01 11:36:47 2021 GMT
*       common name: AGASPVCHVT0003
*       issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$ curl  -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
* About to connect() to 10.173.14.214 port 2376 (#0)
*   Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=AGASPVCHVT0003,O=default
*       start date: Sep 30 11:36:47 2020 GMT
*       expire date: Oct 01 11:36:47 2021 GMT
*       common name: AGASPVCHVT0003
*       issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 libexec]$
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_docker.py is not working

Post by cdienger »

The server at 10.173.14.214 is returning a 500 Internal Server Error when the request is made. It may not like the request, or there may be another issue on the server that is causing this error. We can try submitting the request a bit differently. Try running:

Code: Select all

curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
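
Another variant that could be tried is a fully percent-encoded filter. The Docker Engine API expects the filters parameter to be a JSON-encoded map of string lists, and the requests above leave the braces and colon unencoded. The sketch below builds such a URL; containers_url is a hypothetical helper, not part of check_docker.py, and whether this particular server accepts the health filter at all is not something the thread establishes.

```python
import json
from urllib.parse import quote


def containers_url(base, filters=None, show_all=False):
    """Build a /containers/json URL with a percent-encoded JSON filters map."""
    query = []
    if show_all:
        query.append("all=1")
    if filters:
        # Compact JSON, then encode every reserved character (safe="").
        encoded = quote(json.dumps(filters, separators=(",", ":")), safe="")
        query.append("filters=" + encoded)
    url = base.rstrip("/") + "/containers/json"
    return url + ("?" + "&".join(query) if query else "")
```

For example, containers_url("https://10.173.14.214:2376/", {"health": ["unhealthy"]}) yields a URL that can be passed to curl in place of the hand-written one.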
Amit_Alone
Posts: 89
Joined: Fri May 08, 2020 11:47 am

Re: check_docker.py is not working

Post by Amit_Alone »

Here is the requested command output.

Code: Select all

[e5613751@avgdlnxvp126 ~]$ curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
* About to connect() to 10.173.14.214 port 2376 (#0)
*   Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=AGASPVCHVT0003,O=default
*       start date: Sep 30 11:36:47 2020 GMT
*       expire date: Oct 01 11:36:47 2021 GMT
*       common name: AGASPVCHVT0003
*       issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={"health":["unhealthy"]} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 ~]$ curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
* About to connect() to 10.173.14.214 port 2376 (#0)
*   Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*       subject: CN=AGASPVCHVT0003,O=default
*       start date: Sep 30 11:36:47 2020 GMT
*       expire date: Oct 01 11:36:47 2021 GMT
*       common name: AGASPVCHVT0003
*       issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={"health":["unhealthy"]} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_docker.py is not working

Post by cdienger »

Does it take a long time to get the results? If so, that would indicate a timeout occurring. Time it with:

Code: Select all

time curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
Also see if it returns anything when you run:

Code: Select all

curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?all=1'
It may produce a lot of output, which you can save to a file with:

Code: Select all

curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?all=1' > output.txt
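
If the timing needs to be compared against the plugin's -t 30 timeout repeatedly, the same measurement can be scripted. This is a minimal sketch; timed_run is a hypothetical helper, and the example times a trivial child process as a stand-in for the curl command above.

```python
import subprocess
import sys
import time


def timed_run(argv, timeout=30):
    """Run argv and return (elapsed_seconds, returncode).

    returncode is None if the command exceeded the timeout.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        returncode = proc.returncode
    except subprocess.TimeoutExpired:
        returncode = None
    return time.perf_counter() - start, returncode


# Example: a trivial child process; substitute the curl argument list.
elapsed, code = timed_run([sys.executable, "-c", "pass"])
print("%.3fs, exit code %s" % (elapsed, code))
```

An elapsed time close to the timeout, or a returncode of None, would point toward the timeout theory; a quick failure with curl's exit code 22 would not.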