Bug #5802

closed

don't globally block/conflict TTCN3 jobs; just don't run same job on same node multiple times

Added by laforge 2 months ago. Updated 21 days ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 11/29/2022
Due date:
% Done: 100%
Spec Reference:

Description

We have jobs, particularly the ttcn3 dockerized test jobs, which cannot run multiple times on the same node, as they use static IP addresses and docker refuses to have the same addresses in multiple isolated networks/namespaces.

Right now the "workaround" is to globally block the job. So if any of our TTCN3 BTS tests (e.g. master) is running anywhere, we won't run another BTS test (e.g. latest, centos) on any other executor. Since those BTS/BSC tests take hours to execute, this causes significant delays.

There are multiple suggestions on how to achieve the desired behavior at
https://stackoverflow.com/questions/36454130/how-do-i-prevent-two-pipeline-jenkins-jobs-of-the-same-type-to-run-in-parallel-o
I of course don't know whether any of them work, but it might be worth having a look.

Actions #1

Updated by laforge 2 months ago

It looks like https://plugins.jenkins.io/throttle-concurrents/ could do the trick: throttling the number of concurrent builds of a project running per node...
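To make the difference between the current global block and per-node throttling concrete, here is a minimal Python model of the two policies. It is not the Throttle Concurrent Builds plugin or any Jenkins API; `ThrottleModel`, the category name `ttcn3-bts` and the node names are hypothetical, and each category is assumed to be limited to one running build (either globally or per node).

```python
from collections import defaultdict


class ThrottleModel:
    """Toy model of a one-build-per-category limit (hypothetical, not Jenkins code)."""

    def __init__(self, per_node):
        self.per_node = per_node
        # node name -> categories currently running on that node
        self.running = defaultdict(set)

    def can_start(self, category, node):
        if self.per_node:
            # per-node throttling: conflict only with a build on the same node
            return category not in self.running[node]
        # global blocking: conflict with a build of this category on any node
        return not any(category in cats for cats in self.running.values())

    def start(self, category, node):
        if not self.can_start(category, node):
            return False
        self.running[node].add(category)
        return True

    def finish(self, category, node):
        self.running[node].discard(category)


global_block = ThrottleModel(per_node=False)
print(global_block.start("ttcn3-bts", "node-1"))  # True
print(global_block.start("ttcn3-bts", "node-2"))  # False: blocked everywhere

per_node = ThrottleModel(per_node=True)
print(per_node.start("ttcn3-bts", "node-1"))  # True
print(per_node.start("ttcn3-bts", "node-2"))  # True: different node
print(per_node.start("ttcn3-bts", "node-1"))  # False: same node
```

With the global policy a second BTS variant waits for hours even when other executors are idle; with the per-node policy it only waits if it would land on a node that already runs one.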

Actions #2

Updated by osmith about 2 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 70

> docker refuses to have the same addresses in multiple isolated networks/namespaces.

I've changed the logic in the jenkins.sh scripts to automatically find a free subnet. This way blocking shouldn't be necessary anymore.

Patches: https://gerrit.osmocom.org/q/topic:ttcn3-subnet
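A minimal sketch of the free-subnet idea, written in Python rather than the shell used by the actual jenkins.sh scripts, so it is not the logic from the patches. The candidate range `172.18.N.0/24` and the function name `find_free_subnet` are hypothetical; the list of subnets already in use would have to be gathered separately (e.g. from the existing docker networks), which is not shown here.

```python
import ipaddress


def find_free_subnet(used, first=10, last=200, template="172.18.{}.0/24"):
    """Return the first candidate /24 that overlaps none of the used subnets.

    used:     iterable of CIDR strings already taken (hypothetical input,
              e.g. collected from existing docker networks)
    template: hypothetical naming scheme for candidate subnets
    """
    used_nets = [ipaddress.ip_network(u) for u in used]
    for n in range(first, last + 1):
        candidate = ipaddress.ip_network(template.format(n))
        # overlaps() also catches a larger used network (e.g. a /16)
        # that contains the candidate /24
        if not any(candidate.overlaps(u) for u in used_nets):
            return candidate
    raise RuntimeError("no free subnet in the candidate range")


# Example: two test jobs already occupy .10 and .11 on this node
print(find_free_subnet(["172.18.10.0/24", "172.18.11.0/24"]))  # 172.18.12.0/24
```

Another job could grab the same subnet between the check and the network creation, so a real script would still need to handle `docker network create` rejecting an overlapping subnet, for instance by retrying with the next candidate.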

(I'll hold off on rolling this out until after the holidays, as this is a major change.)

Actions #3

Updated by osmith 22 days ago

  • % Done changed from 70 to 80

Actions #4

Updated by osmith 21 days ago

There's some fallout: multiple ttcn3 jobs are failing now. Looking into it.

Actions #5

Updated by osmith 21 days ago

  • % Done changed from 80 to 90

Actions #6

Updated by osmith 21 days ago

After the fixups, it looks like all testsuites are running properly again. I've restarted them and removed today's failed entry.

One more patch to increase a timeout that was being exceeded: https://gerrit.osmocom.org/c/docker-playground/+/31001

Actions #7

Updated by osmith 21 days ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100