check_docker.py is not working
Posted: Tue Nov 17, 2020 11:36 am
by Amit_Alone
Hi,
I'm trying to monitor Docker, but it is not working as expected. I went through many related forum threads, but none of them resolved my case.
I've attached a screenshot of the service status for my server in Nagios XI.
One of the services shows an OK status, but its status information is empty. Below is the command output from the terminal.
Code: Select all
[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_cpu' -C 'AGASPCTNVT0009-*******-mcm4387-151074492723' -t 30 --networks-use-avg -w '90:' -c '90:' --debug
containers
['AGASPCTNVT0009-*******-mcm4387-151074492723']
End selection + type
hit containers_list_to_dict
hit is_docker_container_ID
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&filters={%22id%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=1' -k -g -f
hit is_docker_container_name
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&filters={%22name%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=true' -k -g -f
hit get_container_IDs_from_names
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&filters={%22name%22:%5B%22AGASPCTNVT0009-*******-mcm4387-151074492723%22%5D}&all=true' -k -g -f
selection before assigning thresholds
{'AGASPCTNVT0009-*******-mcm4387-151074492723': [u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35']}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'AGASPCTNVT0009-*******-mcm4387-151074492723': {'value': 0, 'warning': (90.0, inf, False), 'critical': (90.0, inf, False), 'name': 'AGASPCTNVT0009-*******-mcm4387-151074492723', 'containers': [u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35']}}
hit get_container_IDs
hit do_check
hit check_containers_CPU
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35/stats?&stream=false' -k -g -f
hit process_value
hit process_usage
container_id_to_usage is
{u'5e6759a22520f7793cbb7b7df546b82e2a3751825153ad5f795c737514974a35': 0.0030965983866722406}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
hit check_all_values_against_thresholds
hit check_against_thresholds
hit check_against_threshold
hit nagios_exit
CRITICAL: |
AGASPCTNVT0009-*******-mcm4387-151074492723 returned CRITICAL (value 0.00309659838667%)
| AGASPCTNVT0009-*******-mcm4387-151074492723=0.00309659838667%;90.0:;90.0:
Please help me resolve this.
Re: check_docker.py is not working
Posted: Tue Nov 17, 2020 5:27 pm
by dchurch
It's odd that the plugin returns CRITICAL here while the web interface says OK; something is not right there. Is --debug in the configured check command?
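Separately, the CRITICAL result itself is consistent with how the thresholds are written. In the standard Nagios plugin range format, a value like '90:' means the OK range is 90 to infinity, so anything below 90 alerts; the debug output shows exactly that parsed tuple, (90.0, inf, False). A minimal Python 3 sketch of those range semantics, assuming the standard Nagios format (parse_range and alerts are hypothetical helpers, not the plugin's own functions):

```python
import math


def parse_range(spec):
    """Parse a Nagios-style threshold range into (start, end, invert).

    Assumed semantics (standard Nagios plugin range format):
      "10"     OK if 0 <= value <= 10
      "10:"    OK if value >= 10
      "~:10"   OK if value <= 10
      "10:20"  OK if 10 <= value <= 20
      "@10:20" alert if 10 <= value <= 20 (inverted)
    """
    invert = spec.startswith("@")
    if invert:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
        start = -math.inf if start_s == "~" else float(start_s or 0)
        end = math.inf if end_s == "" else float(end_s)
    else:
        start, end = 0.0, float(spec)
    return start, end, invert


def alerts(value, spec):
    """True if `value` should trigger an alert for the range `spec`."""
    start, end, invert = parse_range(spec)
    inside = start <= value <= end
    # Normal ranges alert when the value falls OUTSIDE; "@" ranges alert INSIDE.
    return inside if invert else not inside


# The CPU value from the debug output above, checked against -c '90:' and -c '90'
print(parse_range("90:"))                     # (90.0, inf, False)
print(alerts(0.0030965983866722406, "90:"))   # True  -> CRITICAL
print(alerts(0.0030965983866722406, "90"))    # False -> OK
```

If the intent is to alert when CPU usage rises above 90%, a plain '90' (OK range 0 to 90) is presumably what was meant, though it is worth confirming against the plugin's help text.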
AFAIK dockerd doesn't use SSL unless you configure it to.
Can you post the output from the following commands (run these on the XI server):
Code: Select all
openssl s_client -connect 10.173.14.214:2376 </dev/null |openssl x509 -text
(This one might involve installing nmap using yum install nmap or the like.)
Re: check_docker.py is not working
Posted: Wed Nov 18, 2020 5:13 am
by Amit_Alone
Is --debug in the configured check command?
No, it is configured without --debug.
AFAIK dockerd doesn't use SSL unless you configure it to.
Correct, the shared URLs are only accessible over HTTPS. Because of that, I modified the script to pass the -k option to curl.
Code: Select all
cmd = "curl %s '%s' -k -g -f %s %s %s" % (options.socket, full_url, options.cert, options.key, options.cacert)
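For context on that change: curl's -k (--insecure) skips verification of the server certificate, while --cacert, --cert and --key let curl verify the server against a CA and present a client certificate; the script's command line already has slots for options.cacert, options.cert and options.key. A rough Python 3 standard-library sketch of the same two modes (make_context and get_json are hypothetical helpers, not part of check_docker.py):

```python
import json
import ssl
import urllib.request


def make_context(cacert=None, cert=None, key=None):
    """Build an SSL context for talking to the Docker daemon.

    cacert=None mimics curl -k: no certificate or hostname checks.
    With cacert set, the server certificate is verified against that CA
    (like curl --cacert); hostname checking then requires the URL host to
    match a name in the server certificate.
    cert/key mimic curl --cert/--key for daemons that require client certs.
    """
    if cacert:
        ctx = ssl.create_default_context(cafile=cacert)
    else:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be turned off before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx


def get_json(base_url, path, ctx, timeout=30):
    """GET a Docker Engine API path and decode the JSON response.

    Raises urllib.error.HTTPError on 4xx/5xx, much like curl -f fails.
    """
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return json.load(resp)
```

Usage would look like get_json("https://10.173.14.214:2376/", "/containers/json?all=1", make_context()). Verifying against the daemon's CA is generally preferable to -k where the CA file is available.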
As requested, I've attached the output of the requested command.
Re: check_docker.py is not working
Posted: Wed Nov 18, 2020 5:31 pm
by ssax
Based on the plugin output from the CLI, the plugin seems to be working.
You could be hitting a bug, depending on which XI version you are running. Please PM me a copy of your system profile; you can download it from Admin > System Profile by clicking the Download Profile button.
Does the Nagios Core interface show it properly? Log in with your XI credentials here and check:
Because of the way the plugin is written, it seems to show the full status only on the service details page, so go back to your screenshot and click the service name to view the full output.
Thank you!
Re: check_docker.py is not working
Posted: Thu Nov 19, 2020 11:26 am
by Amit_Alone
As requested, I have sent the system profile by PM.
As suggested, I've attached a screenshot of the Nagios Core GUI. The GUI is displaying as expected.
To try different possibilities, I deleted the existing configuration and set up the same configuration again with only one change: under the monitor option, I selected "All visible containers". Previously I had specified individual container names/IDs to monitor. To my surprise, after this change all the services were displaying an OK state. However, the container health service was showing UNKNOWN. Below is the debug-mode terminal output of the "Docker - Containers Are Healthy" service.
Code: Select all
[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 --ignore-no-healthcheck -l -w '50:' -c '30:' --debug
all
{'total_usage': []}
End selection + type
selection before assigning thresholds
{'total_usage': []}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'total_usage': {'value': 0, 'warning': (50.0, inf, False), 'critical': (30.0, inf, False), 'name': 'total_usage', 'containers': []}}
hit get_all_container_IDs
hit get_all_containers
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&all=1' -k -g -f
hit do_check
hit check_containers_healthy
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}' -k -g -f
ERR % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 500 Internal Server Error
STDOUT
hit nagios_exit
UNKNOWN: cURL call failed
[e5613751@avgdlnxvp126 libexec]$
My concern here is why the individual containers were not showing status information, whereas selecting the "All visible containers" option works as expected. I'm asking because in the future we may get a requirement to monitor only selected containers.
Also, could you please help me understand why the health status is showing UNKNOWN?
Re: check_docker.py is not working
Posted: Thu Nov 19, 2020 6:08 pm
by ssax
Sorry, what I meant is that you need to view the service status in the Core interface, by clicking through the links here, and check whether the service status displays properly there:
Because of the way the plugin is written, you'll need to view the service status details to see the multi-line output. If you want that behavior changed, I would need to submit a feature request to development to change how the plugin works.
The check is getting a 500 error, which means the remote system didn't like what was sent (or is misconfigured somehow).
Does it work if you do this?
Code: Select all
./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 -l -w '50:' -c '30:' --debug
What is the output of these commands?
Code: Select all
curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
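A side note on these commands: with -f, curl outputs nothing from the response body when the server returns an HTTP error, and the Docker Engine API normally explains a failure in a JSON body of the form {"message": "..."}. Running the same curl command without -f should therefore show the daemon's reason for the 500. The same idea as a Python 3 sketch (fetch_with_error_body and error_message are hypothetical helpers; certificate checks are disabled to match -k):

```python
import json
import ssl
import urllib.error
import urllib.request


def fetch_with_error_body(url, timeout=30):
    """Return (status, body) for a GET, including the body of HTTP errors."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # same effect as curl -k
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Unlike curl -f, keep the error body: it usually says what went wrong.
        return err.code, err.read().decode("utf-8", "replace")


def error_message(body):
    """Pull the "message" field out of a Docker error body, if there is one."""
    try:
        return json.loads(body).get("message", body)
    except (ValueError, AttributeError):
        return body
```

For example, calling fetch_with_error_body on the health-filter URL above and passing the body to error_message would print the daemon's explanation instead of a bare "500 Internal Server Error".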
Re: check_docker.py is not working
Posted: Mon Nov 23, 2020 8:55 am
by Amit_Alone
As suggested, I logged in to the Nagios Core GUI and observed that the status information shows only the OK state. However, after clicking on the service, it displays the full information.
Code: Select all
OK:
AGASPCTNVT0008-AMEXCBP-mcm4388-151074492754 returned OK (value 0.00619319677334%)
I'm not sure what is being sent. Also, I have not made any changes to the configuration.
Below is the output of the commands you requested.
Code: Select all
[e5613751@avgdlnxvp126 libexec]$ ./check_docker.py -H https://10.173.14.214:2376/ --check-type 'containers_healthy' --all -t 30 -l -w '50:' -c '30:' --debug
all
{'total_usage': []}
End selection + type
selection before assigning thresholds
{'total_usage': []}
hit threshold_string_to_tuple
hit threshold_string_to_tuple
threshold maps
{'total_usage': {'value': 0, 'warning': (50.0, inf, False), 'critical': (30.0, inf, False), 'name': 'total_usage', 'containers': []}}
hit get_all_container_IDs
hit get_all_containers
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&all=1' -k -g -f
hit do_check
hit check_containers_healthy
hit talk_to_docker
curl 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}' -k -g -f
ERR % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 500 Internal Server Error
STDOUT
hit nagios_exit
UNKNOWN: cURL call failed
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$ curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
* About to connect() to 10.173.14.214 port 2376 (#0)
* Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=AGASPVCHVT0003,O=default
* start date: Sep 30 11:36:47 2020 GMT
* expire date: Oct 01 11:36:47 2021 GMT
* common name: AGASPVCHVT0003
* issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 libexec]$
[e5613751@avgdlnxvp126 libexec]$ curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D}'
* About to connect() to 10.173.14.214 port 2376 (#0)
* Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=AGASPVCHVT0003,O=default
* start date: Sep 30 11:36:47 2020 GMT
* expire date: Oct 01 11:36:47 2021 GMT
* common name: AGASPVCHVT0003
* issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={%22health%22:%5B%22unhealthy%22%5D} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 libexec]$
Re: check_docker.py is not working
Posted: Tue Nov 24, 2020 2:47 pm
by cdienger
The server at 10.173.14.214 is returning a 500 Internal Server Error when the request is made. It may not like the request, or there may be another issue on the server causing this error. We can try submitting the request a bit differently. Try running:
Code: Select all
curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
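For reference, the %22, %5B and %5D sequences in the plugin's URL are just percent-encoded double quotes and square brackets, so both forms carry the same JSON filter, {"health":["unhealthy"]}. A small Python 3 sketch that builds a fully percent-encoded URL from a dict, which may help rule out encoding as the cause (containers_url is a hypothetical helper; passing filters as JSON in the `filters` query parameter is how the Docker Engine API's /containers/json endpoint accepts them):

```python
import json
import urllib.parse


def containers_url(base_url, filters=None, all_containers=False):
    """Build a Docker Engine API /containers/json URL.

    `filters` is a dict of filter name -> list of values; the API expects
    it serialized as JSON and passed in the `filters` query parameter.
    """
    params = {}
    if all_containers:
        params["all"] = "1"
    if filters:
        params["filters"] = json.dumps(filters, separators=(",", ":"))
    query = urllib.parse.urlencode(params)  # percent-encodes { } " [ ] :
    url = base_url.rstrip("/") + "/containers/json"
    return url + "?" + query if query else url


print(containers_url("https://10.173.14.214:2376/", {"health": ["unhealthy"]}))
# https://10.173.14.214:2376/containers/json?filters=%7B%22health%22%3A%5B%22unhealthy%22%5D%7D
```

If this encoded URL also returns a 500, the request format is unlikely to be the problem and the daemon side is the next place to look.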
Re: check_docker.py is not working
Posted: Wed Nov 25, 2020 5:11 am
by Amit_Alone
Here is the output of the requested commands.
Code: Select all
[e5613751@avgdlnxvp126 ~]$ curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
* About to connect() to 10.173.14.214 port 2376 (#0)
* Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=AGASPVCHVT0003,O=default
* start date: Sep 30 11:36:47 2020 GMT
* expire date: Oct 01 11:36:47 2021 GMT
* common name: AGASPVCHVT0003
* issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={"health":["unhealthy"]} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
[e5613751@avgdlnxvp126 ~]$ curl -k -L -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
* About to connect() to 10.173.14.214 port 2376 (#0)
* Trying 10.173.14.214...
* Connected to 10.173.14.214 (10.173.14.214) port 2376 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=AGASPVCHVT0003,O=default
* start date: Sep 30 11:36:47 2020 GMT
* expire date: Oct 01 11:36:47 2021 GMT
* common name: AGASPVCHVT0003
* issuer: CN=AGASPVCHVT0003,O=default
> GET /containers/json?&filters={"health":["unhealthy"]} HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.173.14.214:2376
> Accept: */*
>
* The requested URL returned error: 500 Internal Server Error
* Closing connection 0
curl: (22) The requested URL returned error: 500 Internal Server Error
Re: check_docker.py is not working
Posted: Wed Nov 25, 2020 5:40 pm
by cdienger
Does it take a long time to get the results? If so, that would indicate a timeout is occurring. Time it with:
Code: Select all
time curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?&filters={"health":["unhealthy"]}'
Also, see whether it returns anything when you run:
Code: Select all
curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?all=1'
It may produce a lot of output, so you can save it to a file with:
Code: Select all
curl -k -g -f -vvv 'https://10.173.14.214:2376/containers/json?all=1' > output.txt
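If the all=1 call succeeds, output.txt should contain a JSON array with one object per container (the -vvv trace goes to stderr, so it stays out of the file). A rough Python 3 sketch for summarizing that file instead of reading it by hand; it assumes the usual /containers/json fields (Names, State, Status) and that Docker appends a health marker such as "(healthy)" to Status for containers that define a HEALTHCHECK. The function names are hypothetical:

```python
import json
import re

# Health markers Docker appends to the Status text, e.g. "Up 2 hours (healthy)".
HEALTH_RE = re.compile(r"\((healthy|unhealthy|health: starting)\)")


def container_health(containers):
    """Return (name, state, health) tuples from a /containers/json response."""
    rows = []
    for c in containers:
        names = c.get("Names") or ["?"]
        name = names[0].lstrip("/")  # API names carry a leading slash
        match = HEALTH_RE.search(c.get("Status", ""))
        health = match.group(1) if match else "none"
        rows.append((name, c.get("State", ""), health))
    return rows


def summarize_file(path="output.txt"):
    """Load a saved response (e.g. output.txt) and print one line per container."""
    with open(path) as fh:
        for name, state, health in container_health(json.load(fh)):
            print(f"{name:50} {state:10} {health}")
```

Containers reported as "none" have no health marker in their Status, which would be consistent with them having no HEALTHCHECK defined.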