Bug #3660

TTCN3 Jenkins jobs leave Docker containers running when stopped

Added by osmith about 1 year ago. Updated 13 days ago.

Status: Stalled
Priority: Normal
Assignee:
Target version: -
Start date: 10/17/2018
Due date:
% Done: 20%
Spec Reference:
Tags:

Description

The jenkins.sh scripts start multiple containers and stop them at the end of the script. When a job is aborted through the web interface, the script gets killed and cannot run its cleanup. Jenkins kills all child processes, but not the containers, as those are spawned by the Docker daemon.

Ideas to resolve this:
  • open a bug report at Jenkins asking not to kill processes with -9, then wait for a stable release with the fix, then implement cleanup on receiving SIGABRT in jenkins.sh
  • try out the PostBuildScript Plugin (or Post build task, that should work for sure), see if they allow executing code on abort of the job
  • try to attach the environment variable that jenkins is using to decide which processes will be killed (see link above) to the processes in the Docker containers

Checklist

  • fix lands in jenkins master
  • fix lands in jenkins release
  • release with fix is deployed on jenkins.osmocom.org
  • verify that trap with cleanup function on TERM is working now
  • write proper cleanup function to stop docker containers and use it in the jenkins.sh scripts of docker-playground.git
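
The last two checklist items could take roughly the following shape. This is a minimal sketch, assuming the fixed Jenkins delivers SIGTERM and waits before killing the script. The helper names `track_container` and `cleanup` and the `DOCKER` override variable are hypothetical and not taken from docker-playground.git. Using `docker kill` instead of `docker stop` is also an assumption, made so that cleanup finishes quickly within the grace period.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual jenkins.sh from docker-playground.git.

# Allow overriding the docker binary (assumption, handy for dry runs).
DOCKER="${DOCKER:-docker}"
CONTAINERS=""

# Remember a container name so that cleanup can stop it later.
track_container() {
    CONTAINERS="$CONTAINERS $1"
}

# Kill every tracked container. Errors are ignored so that one missing
# container does not prevent the remaining ones from being handled.
cleanup() {
    for c in $CONTAINERS; do
        "$DOCKER" kill "$c" >/dev/null 2>&1 || true
    done
    CONTAINERS=""
}

# Run cleanup on normal exit and when the job is aborted.
# 143 = 128 + 15, the conventional exit status after SIGTERM.
trap cleanup EXIT
trap 'cleanup; exit 143' TERM INT
```

In a jenkins.sh script, `track_container` would be called right after each container is started, with the same name that was passed to `docker run --name`.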

History

#1 Updated by laforge 7 months ago

Please at the very least submit a bug report at Jenkins to get the discussion going upstream. If we had done that 7 months ago, a fix might already exist today.

#2 Updated by osmith 7 months ago

Sorry for the delay, I've now done the research to prepare a bug report. It turns out that a related pull request was merged to Jenkins master in August 2018 (so it must already be in the version we are running):

[JENKINS-17116] - When aborting a build, wait up to 2min for process termination

When a build is aborted by the user, Jenkins will now gracefully terminate involved processes by giving it up to 30 seconds time to exit after having received SIGTERM (on Linux) or Ctrl+C on Windows.
...

https://github.com/jenkinsci/jenkins/pull/3414

Here is the related issue:
https://issues.jenkins-ci.org/browse/JENKINS-17116

The issue is in the "in review" state, which, as I understand it, means they are waiting for people to confirm that it works as expected. I tested this on our Jenkins, and it does not work, so I've replied in detail to that issue:
https://issues.jenkins-ci.org/browse/JENKINS-17116?focusedCommentId=366455&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-366455

Test job result:
https://jenkins.osmocom.org/jenkins/job/TEST_trap_in_jenkins_job/8/console
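
A trap test of this kind can be sketched roughly as follows. This is a hypothetical minimal version, not the script of the actual TEST_trap_in_jenkins_job; the function name `trap_test` and the messages are made up for illustration. It sleeps in the background and waits on it, because a shell runs a trap handler only once the current foreground command has returned, whereas `wait` is interrupted by the signal.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual TEST_trap_in_jenkins_job script.

# Wait for up to $1 seconds (default 600) and report whether SIGTERM arrives.
trap_test() {
    # The handler stops the background sleep, reports, and exits cleanly.
    trap 'kill "$sleep_pid" 2>/dev/null; echo "got SIGTERM"; exit 0' TERM

    # Sleep in the background: `wait` returns as soon as a trapped signal
    # arrives, while a foreground sleep would delay the handler.
    sleep "${1:-600}" &
    sleep_pid=$!
    wait "$sleep_pid"
    echo "finished without SIGTERM"
}
```

Calling `trap_test` as the last step of a job and then aborting the job from the web interface should print "got SIGTERM" to the console if Jenkins sends SIGTERM and waits. If the script is killed with -9, neither message appears.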

#3 Updated by osmith 2 months ago

  • Checklist item fix lands in jenkins master added
  • Checklist item fix lands in jenkins release added
  • Checklist item release with fix is deployed on jenkins.osmocom.org added
  • Checklist item verify that trap with cleanup function on TERM is working now added
  • Checklist item write proper cleanup function to stop docker containers and use it in the jenkins.sh scripts of docker-playground.git added
  • % Done changed from 0 to 20

The ticket was marked as solved, and a patch has been merged that should make it possible to write a cleanup function and run it from a trap:

https://github.com/jenkinsci/jenkins/pull/4225

#4 Updated by laforge 13 days ago

  • Status changed from New to Stalled

The fix was introduced in Jenkins 2.199 and rejected for 2.190.2, which we are running. Seems like we need to wait for a major Jenkins upgrade on our side :/
