Bug #5139

closed

rpi4build3 out of disk space

Added by laforge almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
04/29/2021
Due date:
% Done:

100%

Spec Reference:

Description

osmo-bts builds started failing
https://jenkins.osmocom.org/jenkins/job/gerrit-osmo-pcu/1937/FIRMWARE_VERSION=master,WITH_MANUALS=0,label=rpi4-raspbian10,with_dsp=none,with_vty=False/console

root@rpi4build3:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        59G   56G     0 100% /

It looks like the jobs that normally run periodically to clean up space (e.g. docker system prune) on our build slaves are not running on this slave. Please investigate and check whether that also applies to the other rpi4build slaves.

I've manually started a 'docker system prune' after 'docker image ls' showed lots of images up to 7 weeks old. It's still running, so I don't know yet how much space this manual run will recover.
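For illustration, the manual intervention above can be sketched as a small shell script. This is a hedged sketch only: the 90% threshold and the prune flags are my assumptions, not the actual osmo-ci docker-cleanup.sh.

```shell
#!/bin/sh
# Sketch (assumed logic, not osmo-ci's docker-cleanup.sh): if the root
# filesystem is nearly full, prune unused docker data.
THRESHOLD=90  # percent full; assumed value, not from the ticket

# Extract the Use% column for / from POSIX df output, e.g. "100%" -> "100"
usage=$(df -P / | awk 'NR==2 { gsub(/%/, "", $5); print $5 }')

if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "/ is ${usage}% full, pruning unused docker data"
    # -a removes all unused images (not only dangling ones), -f skips the
    # prompt; guard with command -v so the sketch also runs without docker
    command -v docker >/dev/null && docker system prune -a -f
else
    echo "/ is ${usage}% full, below the ${THRESHOLD}% threshold"
fi
```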

Actions #1

Updated by laforge almost 3 years ago

Already 28GB recovered by 'docker system prune' by now, and it's still running.

Actions #2

Updated by laforge almost 3 years ago

Total reclaimed space: 38.19GB

Actions #3

Updated by osmith almost 3 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 90

There is an ansible rule that sets up a cron job to run docker-cleanup.sh: https://gerrit.osmocom.org/c/osmo-ci/+/7716/

This has been set up on the rpi nodes successfully:

# cat /var/spool/cron/crontabs/osmocom-build
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontabPVPUxS installed on Mon Mar  8 09:17:27 2021)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
#Ansible: cleanup-docker-images
0 */3 * * * test -x /home/osmocom-build/osmo-ci/scripts/docker-cleanup.sh && /home/osmocom-build/osmo-ci/scripts/docker-cleanup.sh >/dev/null

However, the cron job did not get executed, because the file was owned by the wrong uid (1001 instead of 1000):

# ls -l /var/spool/cron/crontabs/
total 4
-rw------- 1 1001 crontab 366 Mar  8 09:17 osmocom-build

On the raspbian nodes this happened because:
  • there was a "pi" user with uid=1000
  • existing ansible rules created "osmocom-build" without specifying the uid, so it became uid=1001
  • running the docker role created the crontab with uid=1001
  • as docker-playground.git assumes that its containers run as uid=1000, I had modified the ansible rules to remove the "pi" user and set "osmocom-build"'s uid to 1000
  • after running the updated rule, ansible changed the uid of the already created user to 1000 but did not change ownership of the crontab file
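The mismatch described above could also be spotted mechanically. A hedged sketch (the helper is mine, not part of osmo-ci or the ansible rules): compare each crontab file's numeric owner against the uid of the user it is named after.

```shell
#!/bin/sh
# Illustrative helper (not from osmo-ci): report crontab files whose owner
# uid differs from the uid of the user they are named after.
check_crontab_owners() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        user=$(basename "$f")
        # skip entries that are not actual users on this node
        user_uid=$(id -u "$user" 2>/dev/null) || continue
        file_uid=$(stat -c %u "$f")
        [ "$file_uid" = "$user_uid" ] ||
            echo "MISMATCH: $f owned by uid $file_uid, but $user is uid $user_uid"
    done
}

# On a Debian/Raspbian node this would be invoked as:
#   check_crontab_owners /var/spool/cron/crontabs
```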

So it's a problem specific to these three already provisioned nodes, and it won't happen when we provision new ones. I've manually changed the ownership of the file on these three:

# chown osmocom-build /var/spool/cron/crontabs/osmocom-build
# systemctl restart cron

I'll check back later to verify that the cron job actually ran, and then I'll close this issue.
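That later verification could be scripted along these lines. A sketch only: the helper name is mine, and the path is the one from the listing above.

```shell
#!/bin/sh
# Illustrative check (not from osmo-ci): succeed only when a crontab file
# is owned by the user it belongs to.
crontab_owned_by() {
    # $1 = crontab file, $2 = expected owning user
    [ "$(stat -c %U "$1")" = "$2" ]
}

# After the chown above, one would expect this to print OK on each node:
#   crontab_owned_by /var/spool/cron/crontabs/osmocom-build osmocom-build && echo OK
```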

Actions #4

Updated by osmith almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100

Working as expected.
