Bug #4839
docker.io sometimes returns EOF, breaking our builds
100%
Description
We have plenty of situations where docker.io seemingly returns EOF (i.e. nothing) when pulling a base image like debian:stretch. The failure to pull will cause our jenkins job (e.g. a TTCN3 test) to fail, despite no failure on our side.
This has appeared even before docker introduced rate limiting today, so it is unrelated to that.
Related issues
History
#1 Updated by laforge 3 months ago
- Status changed from New to In Progress
- % Done changed from 0 to 10
I tried to use our docker/registry instance at registry.sysmocom.de as a 'pull-through cache' as documented at https://docs.docker.com/registry/recipes/mirror/
This is broken; it has been a known bug in docker since 2016, see https://github.com/docker/distribution/issues/1486 and many other reports like https://www.reddit.com/r/docker/comments/bek6yv/how_do_you_do_registrymirror_with_auth/
So what we are moving towards is a setup where:
- one jenkins job does a daily pull of all our base images from docker.io, and pushes them to the private registry
- our jenkins jobs will then always pull directly from that private registry instead of the public one
If the pull from docker.io then fails occasionally, it will fail that re-sync jenkins job, but the (ttcn3 and other) jobs that verify osmocom software will not fail, and simply use the 1..N days old base image.
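The re-sync job described above could look roughly like the sketch below. This is not the actual job script: REGISTRY, src and dst follow the variable names visible in the job log further down in this ticket, while the function names (mirror_name, mirror_image, mirror_all) and the DOCKER override are hypothetical, added only to make the steps explicit and to allow a dry run with DOCKER=echo. The tag and push steps are assumed from the description; only the pull is confirmed by the log.

```shell
#!/bin/sh
# Sketch of a daily base-image re-sync (assumed structure, not the real job).

REGISTRY="${REGISTRY:-registry.osmocom.org}"
DOCKER="${DOCKER:-docker}"   # hypothetical override, e.g. DOCKER=echo for a dry run

# Map a public image name to its name in the private registry.
mirror_name() {
    echo "$REGISTRY/$1"
}

# Pull one image from docker.io, re-tag it and push it to the private registry.
# Returns non-zero as soon as one step fails.
mirror_image() {
    src="$1"
    dst="$(mirror_name "$src")"
    $DOCKER pull "$src" || return 1
    $DOCKER tag "$src" "$dst" || return 1
    $DOCKER push "$dst" || return 1
}

# Mirror every image given as an argument; the first failure aborts the run,
# which fails only this re-sync job and leaves the older mirrored image in place.
mirror_all() {
    for image in "$@"; do
        mirror_image "$image" || return 1
    done
}
```

A job would then call something like `mirror_all debian:stretch debian:buster` as its last step, so a failed pull from docker.io marks only this job as failed.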
#2 Updated by laforge 3 months ago
https://gerrit.osmocom.org/c/docker-playground/+/21019 prepares our Dockerfiles with a way to override the registry when building images.
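One common way to make the registry overridable at build time is a build argument declared before the FROM line. The sketch below assumes that approach; it is not copied from the patch, and the function name build_with_registry, the DOCKER override and the ./my-image directory are hypothetical.

```shell
# Assumed Dockerfile head (illustration only, not taken from the patch):
#   ARG REGISTRY=docker.io
#   FROM ${REGISTRY}/debian:stretch

DOCKER="${DOCKER:-docker}"   # hypothetical override, e.g. DOCKER=echo for a dry run

# Build the image in directory $1, taking the base image from registry $2.
# Without a second argument the public docker.io is used.
build_with_registry() {
    dir="$1"
    registry="${2:-docker.io}"
    $DOCKER build --build-arg "REGISTRY=$registry" -t "$(basename "$dir")" "$dir"
}
```

With this, `build_with_registry ./my-image registry.osmocom.org` would build against the private registry, while `build_with_registry ./my-image` keeps the default.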
#3 Updated by laforge 3 months ago
- Status changed from In Progress to Resolved
- % Done changed from 10 to 100
Related patches all merged, hopefully those problems are now gone.
- https://gerrit.osmocom.org/c/docker-playground/+/21019
- https://gerrit.osmocom.org/c/osmo-ci/+/21021
- https://gerrit.osmocom.org/c/osmo-ci/+/21023
I've manually verified that the registry-update-base-images job works, and also executed ttcn3-stp-test once to see if it actually pulls from registry.osmocom.org now.
#4 Updated by laforge 3 months ago
- Related to Feature #4840: migrate osmo-gsm-tester docker images to registry.osmocom.org added
#5 Updated by laforge 3 months ago
And of course, on day 1 of this new mechanism, we see:
- the docker image update job failing:
[registry-update-base-images] $ /bin/sh -xe /tmp/jenkins5987388568045535390.sh
+ REGISTRY=registry.osmocom.org
+ IMAGES=debian:stretch debian:buster debian:jessie debian:sid ubuntu:zesty centos:centos8
+ src=debian:stretch
+ dst=registry.osmocom.org/debian:stretch
+ echo
+ echo ======= debian:stretch
======= debian:stretch
+ docker pull debian:stretch
Error response from daemon: Get https://registry-1.docker.io/v2/library/debian/manifests/stretch: EOF
Build step 'Execute shell' marked build as failure
while all other builds succeed, using base images from registry.osmocom.org.
yay.