Bug #5191

people.osmocom.org is down

Added by domi 3 months ago. Updated 3 months ago.

Status: In Progress
Priority: High
Assignee:
Category: -
Target version: -
Start date: 07/02/2021
Due date:
% Done: 90%
Spec Reference:

Description

people.osmocom.org seems to be down. I have checked it from multiple internet connections and also with DownDetector and the result was always the same.

I noticed this when I tried to watch a recording of the OsmoDevCall about SS7 and SIGTRAN - the link to the video timed out.

History

#1 Updated by laforge 3 months ago

  • Status changed from New to In Progress

The physical server (one of my "personal" machines co-located in a data centre in Nuernberg, not one of the rented servers on which most of osmocom.org runs) is no longer reachable at all.

I've asked the colocation provider to physically power-cycle it. Let's see whether that brings it back or whether it's some kind of hardware damage that needs further investigation.

If this cannot be resolved quickly, we might upload at least the videos elsewhere. downloads.osmocom.org has plenty of space these days, for example.

#2 Updated by laforge 3 months ago

  • Assignee changed from domi to laforge

#3 Updated by tnt 3 months ago

I have all the renders locally if needed.

#4 Updated by laforge 3 months ago

  • % Done changed from 0 to 80

Back up and running.

It is not yet clear why the machine crashed (it could still be a hardware defect), but it is operational again. https://people.osmocom.org/ is reachable after the reboot.

There is nothing in the syslog about the incident. The RAID-1 seems good. I might do some SMART checks.

#5 Updated by laforge 3 months ago

I'll probably set up a mirror of people.osmocom.org on downloads.sysmocom.org. If this should happen again, we'd at least have a live mirror.

#6 Updated by domi 3 months ago

Thank you very much, laforge. It is indeed back to normal. It's a pity that no kernel panic logs or similar could be harvested from the instance. I think the mirror idea is great. Thanks for fixing it so quickly.

#7 Updated by tnt 3 months ago

And ... it's down again.

#8 Updated by laforge 3 months ago

I'm currently restoring a backup to a different physical machine, which should take about two hours. After that, some configuration is required, but we should have people.osmocom.org live again later today.

#9 Updated by laforge 3 months ago

The current state of the recovery (not all files yet present) can be observed at http://[2a01:4f8:201:344a::1:3]/

#10 Updated by laforge 3 months ago

  • % Done changed from 80 to 90

Services have meanwhile been fully restored. The old IP address is being forwarded (http, https, ssh), so it works even without any DNS updates.
