Bug #3471

host2.osmocom.org not starting all services on reboot.

Added by zecke 4 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/18/2018
Due date:
% Done: 0%
Spec Reference:

Description

git.osmocom.org didn't respond, and neither did ICMP echo. Hetzner didn't have a known outage for this, so I used the hardware reset option in the console. The machine became available again. I couldn't see anything in /var/log/messages (the machine must have been dead for ~3h). After the reboot many of the services didn't run.

After checking the logs I saw "docker-compose pull" failing and was able to reproduce it. The image configs for cgit/gerrit seem to be wrong and need to be fixed. I don't know the intention behind these two changes and have not modified the file. For cgit we most likely don't want to stay with the previous version.

  • A failing pull should not prevent us from starting the containers we already have (and were happy to run). E.g. by changing:
    ExecStartPre=/usr/local/bin/docker-compose pull --quiet --parallel
    to
    ExecStartPre=-/usr/local/bin/docker-compose pull --quiet --parallel
    
  • We should have some sanity check around the docker-compose file. Maintain in git and have "docker-compose pull" tested in some way? Something ansible?
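One possible shape for such a sanity check, as a minimal sketch rather than a finished tool: scan the compose file for image references and ask the registry whether each one still has a manifest. The function names, the line-based scan (instead of a real YAML parser) and the reliance on a Docker CLI that provides "docker manifest inspect" are all assumptions made for illustration.

```python
import re
import subprocess
from typing import Callable, List

# Naive scan for "image: <reference>" lines. Assumption: one image per line,
# no YAML anchors or multi-line values; a real check should parse the YAML.
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*['\"]?([^'\"\s#]+)", re.MULTILINE)


def image_references(compose_text: str) -> List[str]:
    """Return every image reference found in the compose file text."""
    return IMAGE_LINE.findall(compose_text)


def manifest_exists(image: str) -> bool:
    """Return True if the registry serves a manifest for `image`.

    Assumption: the installed Docker CLI supports `docker manifest inspect`
    and is already logged in to any private registry. Raises OSError if the
    docker binary is missing.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_missing_images(
    compose_text: str,
    exists: Callable[[str], bool] = manifest_exists,
) -> List[str]:
    """Return the image references for which no manifest could be found."""
    return [ref for ref in image_references(compose_text) if not exists(ref)]
```

Run from a git hook or a CI job against the compose file, a non-empty result from find_missing_images() would have flagged the two ":previous" tags before the reboot did. The `exists` parameter is only there so the check can be exercised without a registry.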

Log:

Aug 18 14:52:58 host2.osmocom.org systemd[4607]: docker-compose.service: Executing: /usr/local/bin/docker-compose pull --quiet --parallel
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling cgit         ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling registry     ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling jenkins      ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling redmine-db   ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling gitolite     ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling patchwork-db ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling patchwork    ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling dns          ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling git          ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling mumble       ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling gerrit       ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling redmine      ...
Aug 18 14:52:58 host2.osmocom.org docker-compose[4607]: Pulling nginx        ...
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: [569B blob data]
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: ERROR: for gerrit  manifest for registry.sysmocom.de/gerrit:previous not found
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: ERROR: for cgit  manifest for registry.sysmocom.de/cgit:previous not found
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: Traceback (most recent call last):
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:   File "/usr/local/bin/docker-compose", line 11, in <module>
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:     sys.exit(main())
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:   File "/usr/local/lib/python3.5/dist-packages/compose/cli/main.py", line 71, in main
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:     command()
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:   File "/usr/local/lib/python3.5/dist-packages/compose/cli/main.py", line 127, in perform_command
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:     handler(command, command_options)
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:   File "/usr/local/lib/python3.5/dist-packages/compose/cli/main.py", line 716, in pull
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:     include_deps=options.get('--include-deps'),
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:   File "/usr/local/lib/python3.5/dist-packages/compose/project.py", line 558, in pull
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]:     raise ProjectError(b"\n".join(errors.values()))
Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: TypeError: sequence item 0: expected a bytes-like object, str found
Aug 18 14:53:01 host2.osmocom.org systemd[1]: docker-compose.service: Child 4607 belongs to docker-compose.service
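Note that the traceback ends in a second, unrelated failure: the two "manifest ... not found" errors are the real problem, and the TypeError comes from the error reporting itself, which joins str messages with a bytes separator. The snippet below is a minimal reproduction of that expression under Python 3, plus one hypothetical way to make the join tolerant. It is an illustration only, not the actual docker-compose code or its upstream fix.

```python
# Error messages as they appear in the log above (str, not bytes).
errors = {
    "gerrit": "manifest for registry.sysmocom.de/gerrit:previous not found",
    "cgit": "manifest for registry.sysmocom.de/cgit:previous not found",
}


def join_as_bytes(messages):
    """Mirror the failing expression from compose/project.py in the traceback."""
    return b"\n".join(messages)


def join_as_text(messages):
    """Hypothetical repair: decode any bytes items, then join everything as str."""
    return "\n".join(
        m.decode("utf-8", "replace") if isinstance(m, bytes) else m
        for m in messages
    )


try:
    join_as_bytes(errors.values())
    masked = None
except TypeError as exc:
    # Same message as the last docker-compose line in the log.
    masked = str(exc)

combined = join_as_text(errors.values())
```

Here `masked` holds the TypeError text seen in the log, while `combined` contains both manifest errors, which is what the caller presumably meant to report.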

History

#1 Updated by laforge 4 months ago

On Sat, Aug 18, 2018 at 01:21:12PM +0000, zecke [REDMINE] wrote:

> After checking the logs I saw "docker-compose pull" failing and was able to reproduce it. The image configs for cgit/gerrit seem to be wrong and need to be fixed. I don't know the intention behind these two changes and have not modified the file.

AFAIR, the new version of the related containers didn't work when I tried to update them, so I had to roll back (on July 23rd).

> For cgit we most likely don't want to stay with the previous version.

>   • A failing pull should not prevent us from starting the containers we already have (and were happy to run). E.g. by changing:
>     ExecStartPre=/usr/local/bin/docker-compose pull --quiet --parallel
>     to
>     ExecStartPre=-/usr/local/bin/docker-compose pull --quiet --parallel

I've implemented this change now.

>   • We should have some sanity check around the docker-compose file. Maintain in git and have "docker-compose pull" tested in some way? Something ansible?

The git repo already exists due to etckeeper. We could simply push that to our git server to make it accessible.

> Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: ERROR: for gerrit  manifest for registry.sysmocom.de/gerrit:previous not found
> Aug 18 14:53:01 host2.osmocom.org docker-compose[4607]: ERROR: for cgit  manifest for registry.sysmocom.de/cgit:previous not found

I've now pushed those two tags to the registry as a work-around.

I'll investigate. At least for cgit I think the latest build/image also works.
