Bug #5447
closed
docker "failed to get digest"
100%
Description
Every so often we have spontaneous build failures on our Jenkins slave that look like this:
failed to get digest sha256:e8b3f56b281aa832fb0664d1553d17d2dc93217ece10b440bd9c2c492107c31a: open /var/lib/docker/image/overlay2/imagedb/content/sha256/e8b3f56b281aa832fb0664d1553d17d2dc93217ece10b440bd9c2c492107c31a: no such file or directory
../make/Makefile:87: recipe for target 'docker-build' failed
make: *** [docker-build] Error 1
from https://jenkins.osmocom.org/jenkins/view/All%20no%20Gerrit/job/nplab-m3ua-test/1677/console
To me this looks like something purged intermediate docker layers while the docker build was running. Perhaps one of our docker cleanup tasks?
The example job above ran at 3am (UTC?) on host2-deb9build-ansible.
There are more examples with similar problems in the recent build history:
https://jenkins.osmocom.org/jenkins/view/All%20no%20Gerrit/job/nplab-m3ua-test/1667/console
Updated by osmith about 2 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 90
Yes, this is caused by the docker-cleanup script. This will be resolved with the patches in https://gerrit.osmocom.org/q/topic:docker-clean. With them, the script removes only the images that have not been used for the longest time, and stops once a size limit is reached. Images that are currently in use should not get removed anymore.
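To illustrate the policy described above, here is a minimal Python sketch of "remove the least recently used images until a size limit is reached". It is not the code from the linked patches. The `Image` record, the `pick_images_to_remove` function and the reading of "size limit" as a cap on the total size of all images are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Image:
    # Hypothetical record; the real script reads this information from docker.
    name: str
    size_bytes: int
    last_used: float  # Unix timestamp of the last use
    in_use: bool = False


def pick_images_to_remove(images, size_limit_bytes):
    """Return the images to delete, least recently used first.

    Assumption: cleanup stops as soon as the total size of the remaining
    images is at or below size_limit_bytes. Images in use are never picked.
    """
    total = sum(img.size_bytes for img in images)
    to_remove = []
    for img in sorted(images, key=lambda i: i.last_used):
        if total <= size_limit_bytes:
            break
        if img.in_use:
            continue
        to_remove.append(img)
        total -= img.size_bytes
    return to_remove
```

Note that a policy like this only protects images it can recognise as in use, which matters for layers of a build that is still running.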
Updated by osmith about 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Updated by osmith about 2 years ago
- Status changed from Resolved to In Progress
- % Done changed from 100 to 90
This still happens, even with the new cleanup script. As I understand it: while an image is being built, every intermediate step is a dangling image. Only when the last image gets tagged at the very end do the steps stop being dangling. So when the cleanup script runs after the image build has started, but before the last step has finished, it removes the images of the steps finished so far, and the build fails with the error message above.
I've adjusted the timer by 10 minutes; this should fix it.
https://gerrit.osmocom.org/c/osmo-ci/+/27349
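The race described in this comment can be written down as a simple interval check. This is only a sketch: the ticket does not state the build duration, the timer values or the direction of the 10 minute shift, so all times below are made up for illustration and `cleanup_interrupts_build` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def cleanup_interrupts_build(build_start, build_end, cleanup_at):
    """True if the cleanup fires while intermediate layers are still dangling,
    i.e. after the build started but before the final image was tagged."""
    return build_start < cleanup_at < build_end


# Hypothetical values, not taken from the ticket.
build_start = datetime(2022, 3, 1, 3, 0)
build_end = build_start + timedelta(minutes=8)
cleanup_old = datetime(2022, 3, 1, 3, 5)
cleanup_new = cleanup_old + timedelta(minutes=10)

print(cleanup_interrupts_build(build_start, build_end, cleanup_old))  # True
print(cleanup_interrupts_build(build_start, build_end, cleanup_new))  # False
```

With these assumed times, shifting the timer moves the cleanup out of the build window. The fix holds only as long as no build overlaps the new cleanup time.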
Updated by osmith about 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Applied in changeset osmo-ci|f5ab1346db7243dba9a4450f1051e6491272db41.