Bug #3660
closed
TTCN3 Jenkins jobs leave Docker containers running when stopped
100%
Description
The jenkins.sh scripts start multiple containers and stop them at the end of the script. When a job is aborted through the web interface, the script gets killed and cannot do the cleanup. Jenkins kills all child processes, but not the containers, as they are spawned by the Docker daemon.
Ideas to resolve this:
- open a bug report at Jenkins asking not to kill processes with -9, then wait for a stable release with the fix, then implement cleanup on receiving SIGABRT in jenkins.sh
- try out the PostBuildScript Plugin (or Post build task, that should work for sure), see if they allow executing code on abort of the job
- try to attach the environment variable that jenkins is using to decide which processes will be killed (see link above) to the processes in the Docker containers
Checklist
- fix lands in jenkins master
- fix lands in jenkins release
- release with fix is deployed on jenkins.osmocom.org
- verify that trap with cleanup function on TERM is working now
- write proper cleanup function to stop docker containers and use it in the jenkins.sh scripts of docker-playground.git
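A minimal sketch of what the last two checklist items could look like in a jenkins.sh, assuming Jenkins sends SIGTERM on abort. The helper names (track_container, cleanup), the CONTAINERS variable and the DOCKER override are hypothetical and not taken from docker-playground.git. Note that a trap can never catch SIGKILL (-9), so this only helps once Jenkins terminates the script gracefully.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual docker-playground.git code.
# DOCKER can be overridden, e.g. for testing without a Docker daemon.
DOCKER="${DOCKER:-docker}"
CONTAINERS=""

# Remember a container name so cleanup can stop it later.
track_container() {
	CONTAINERS="$CONTAINERS $1"
}

# Stop every tracked container. Errors are ignored so that a container
# which has already exited does not abort the remaining cleanup.
cleanup() {
	for c in $CONTAINERS; do
		$DOCKER stop "$c" >/dev/null 2>&1 || true
	done
	CONTAINERS=""
}

# Run cleanup on normal exit and when the job is aborted with SIGTERM.
# 143 = 128 + 15 (SIGTERM), the conventional exit status for that signal.
trap cleanup EXIT
trap 'cleanup; exit 143' TERM
```

In a real script, each "docker run -d --name foo ..." would be followed by "track_container foo". cleanup clears the list after stopping, so it is harmless when the EXIT trap runs it a second time after the TERM trap.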
Updated by laforge almost 5 years ago
please at the very least submit a bug report at Jenkins to get the discussion going upstream. If we had done that 7 months ago, a fix might already exist today.
Updated by osmith almost 5 years ago
Sorry for the delay, I've done the research to prepare a bug report now. Turns out that a related pull request has been merged to jenkins master in August 2018 (so it must be in the version we are running already):
[JENKINS-17116] - When aborting a build, wait up to 2min for process termination
When a build is aborted by the user, Jenkins will now gracefully terminate involved processes by giving it up to 30 seconds time to exit after having received SIGTERM (on Linux) or Ctrl+C on Windows.
...
https://github.com/jenkinsci/jenkins/pull/3414
Here is the related issue:
https://issues.jenkins-ci.org/browse/JENKINS-17116
The issue is in the "in review" state, which, as I understand it, means they are waiting for people to confirm that it works as expected. I have tested whether this works on our jenkins, and it does not. So I've replied in detail to that issue:
https://issues.jenkins-ci.org/browse/JENKINS-17116?focusedCommentId=366455&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-366455
Test job result:
https://jenkins.osmocom.org/jenkins/job/TEST_trap_in_jenkins_job/8/console
Updated by osmith over 4 years ago
- Checklist item fix lands in jenkins master added
- Checklist item fix lands in jenkins release added
- Checklist item release with fix is deployed on jenkins.osmocom.org added
- Checklist item verify that trap with cleanup function on TERM is working now added
- Checklist item write proper cleanup function to stop docker containers and use it in the jenkins.sh scripts of docker-playground.git added
- % Done changed from 0 to 20
The ticket was marked as solved, and a patch had been merged that should make it possible to write a cleanup function and trap on it:
Updated by laforge over 4 years ago
- Status changed from New to Stalled
the fix was introduced to jenkins 2.199, and rejected for 2.190.2 which we are running on. Seems like we need to wait for a major jenkins upgrade on our side :/
Updated by osmith 9 months ago
- Status changed from Stalled to Resolved
- % Done changed from 20 to 100
- started a testsuite job
- checked with "docker ps" over ssh that a new container is running
- stopped the jenkins job
- checked with "docker ps" that the docker container was stopped
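The manual "docker ps" checks above could be scripted roughly like this. It is only a sketch: the function name container_running and the DOCKER override are hypothetical, and the container name has to be whatever the testsuite job actually uses.

```shell
#!/bin/sh
# Hypothetical helper, assuming the container name is known in advance.
# DOCKER can be overridden, e.g. for testing without a Docker daemon.
DOCKER="${DOCKER:-docker}"

# Succeeds if a running container with exactly this name exists.
# "docker ps" lists only running containers; the name filter matches
# substrings, so grep -x is used to require an exact match.
container_running() {
	$DOCKER ps --filter "name=$1" --format '{{.Names}}' | grep -qxF "$1"
}
```

Run over ssh on the build node, "container_running NAME" should succeed while the job is running and fail after the job has been stopped.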