Feature #3304

Feature #3303: migrate openmoko.org into "archive mode" and remove dedicated server for it

create static archive of openmoko.org web pages

Added by laforge 5 months ago. Updated 12 days ago.

Status: Rejected
Priority: High
Assignee:
Category: -
Target version: -
Start date: 05/29/2018
Due date:
% Done: 10%
Spec Reference:

Description

from http://lists.openmoko.org/pipermail/community/2017-October/069832.html

b) get rid of the existing server, by the following strategy:

   * web: convert the dynamically-generated media-wiki, trac, svnweb, gitweb,
     etc. pages into static renderings that can be served from a static
     web server.  This could be done by something like a recursive wget
     through a http cache.  This would remove the need to run trac,
     mediawiki and apache mod_svn, mysql, ... - and drastically reduce
     the CPU and Memory requirements.  In the end, it would be a bunch
     of static HTML pages rendered by nginx or lighttpd somewhere on a
     virtual server or shared server.

   * svn: discontinue svn service and simply have
     * caches of the rendered html pages (for old hyperlinks to work),
     [...]

   * git: discontinue git service and simply have
     * caches of the rendered gitweb html pages (for old hyperlinks to
       work), and
       [...]

Next to the fact of basically reducing our hosting
requirements to zero, it also has the advantage that we don't have to
worry about keeping trac,mediawiki,etc. installations secure and
updated.  Also, when moving to major new versions, there's always the
risk of some issues with migrating the old data, some wiki rendering
errors, etc.  - conserving the generated output saves us from all of
that.

If we go for 'b', this would include us releasing SQL dumps of the
trac, mediawiki, svn, etc. databases (probably clearing any passwords /
password hashes), so that the raw information can be restored by anyone
who has an interest in it.

Please take care of devising a process by which we can
  • generate the static cache/archive for anything web-visible on *.osmocom.org
  • serve the static cache/archive continuously in the future

Related issues

Related to Osmocom.org Servers - Feature #3085: where to migrate openmoko-backup? (In Progress, 2018-03-19)

History

#1 Updated by roh 5 months ago

After some research I did a dump of the MediaWiki with wget, including the discussion pages but without any special pages or history.
The plugins for MediaWiki seem to be very complicated and broken.

The dump seems to work on its own but has some (ignorable) rendering bugs. The links to the discussion pages are somehow broken; I guess the ':' in the URL causes trouble when it is used as a relative link.
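The guess about the ':' fits how relative references are resolved: under RFC 3986 (section 4.2), a relative link whose first path segment contains a colon, such as `Talk:Main_Page`, is parsed as if `Talk` were a URL scheme unless it is written as `./Talk:Main_Page`. A minimal sketch of a post-processing step for the dumped pages follows. The function names, the list of "real" schemes and the double-quoted `href="..."` assumption are illustrative and were not taken from the actual dump.

```python
import re

# Schemes treated as genuine absolute URLs. This list is an assumption;
# extend it if the dump contains other link types.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "irc", "news", "javascript", "data"}

# Assumes attributes are written as href="..." with double quotes.
HREF_RE = re.compile(r'(href=")([^"]*)(")')


def fix_colon_href(href: str) -> str:
    """Prefix './' when the first path segment of a relative link contains ':'."""
    first_segment = re.split(r"[/?#]", href, maxsplit=1)[0]
    if ":" not in first_segment:
        return href
    scheme = first_segment.split(":", 1)[0].lower()
    if scheme in KNOWN_SCHEMES:
        return href  # a real absolute URL, leave it alone
    return "./" + href


def fix_html(html: str) -> str:
    """Apply fix_colon_href to every double-quoted href in a page."""
    return HREF_RE.sub(
        lambda m: m.group(1) + fix_colon_href(m.group(2)) + m.group(3), html
    )
```

Running `fix_html` over each saved page would leave absolute and root-relative links untouched and only rewrite links of the `Namespace:Page` form.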

I use this in /etc/hosts to test for now (HTTP only):

217.197.86.133    wiki.openmoko.org

todo for wiki:
- cut out 'login etc header' (the whole <div class="portlet" id="p-personal"> ... </div>)
- move to new server
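
The first todo item could be handled by a small filter over the dumped HTML files. The sketch below is one possible approach and not what was actually run: it finds the opening `<div ... id="p-personal" ...>` tag and counts nested `<div>`/`</div>` pairs to locate the matching close. The function name is hypothetical, and the code assumes reasonably well-formed markup with a double-quoted `id` attribute.

```python
import re

# Opening tag of the login/personal-tools portlet (id taken from the todo above).
OPEN_RE = re.compile(r'<div\b[^>]*\bid="p-personal"[^>]*>', re.IGNORECASE)
# Any opening or closing div tag, used to track nesting depth.
DIV_TAG_RE = re.compile(r'<div\b[^>]*>|</div\s*>', re.IGNORECASE)


def strip_personal_portlet(html: str) -> str:
    """Remove the whole p-personal div, including nested divs, from one page."""
    start = OPEN_RE.search(html)
    if start is None:
        return html  # nothing to strip on this page
    depth = 1
    for tag in DIV_TAG_RE.finditer(html, start.end()):
        if tag.group(0).startswith("</"):
            depth -= 1
        else:
            depth += 1
        if depth == 0:
            # Found the close that matches the portlet's opening tag.
            return html[:start.start()] + html[tag.end():]
    return html  # unbalanced markup: leave the page untouched
```

Pages without the portlet, or with unbalanced div tags, are returned unchanged, so the filter can be run over the whole dump without risk of truncating a page.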

#2 Updated by laforge 4 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

I've been playing around a bit with httrack. Unfortunately, even with tuning various options, it appears to be very slow.

Something like httrack openmoko.org wiki.openmoko.org docs.openmoko.org -O /foo/bar/websites/openmoko -s0 -w -bN -*p3 -%c50 -%p -%e0 -%k -c50 -%! -A9999999 -@i4 -v -iC1 -iC1 -v seems to be rendering pretty good results so far. Let's wait until it completes and test with some static web server.

As a side-note: I just removed the google-analytics.com link that was still present in the mediawiki skin/theme. That should have been removed 10 years ago :/

It luckily appeared to be the only third-party resource that I could find, and now at least we won't have it in the static archive.
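
A check like this could also be repeated mechanically on the finished archive. The following is a rough sketch, assuming resources are referenced through double-quoted `src`/`href` attributes on `script`, `img`, `link` and `iframe` tags; the function name and the `own_suffix` parameter are made up for illustration. It will not catch URLs assembled inside inline JavaScript.

```python
import re
from urllib.parse import urlparse

# Tags that make the browser load a resource, with their URL attribute.
RESOURCE_RE = re.compile(
    r'<(?:script|img|link|iframe)\b[^>]*?\b(?:src|href)="([^"]+)"',
    re.IGNORECASE,
)


def third_party_hosts(html: str, own_suffix: str = "openmoko.org") -> set:
    """Return the hosts of loaded resources that lie outside own_suffix."""
    hosts = set()
    for url in RESOURCE_RE.findall(html):
        host = urlparse(url).hostname  # None for relative URLs
        if host and host != own_suffix and not host.endswith("." + own_suffix):
            hosts.add(host)
    return hosts
```

Collecting the union of the returned sets over all archived pages gives a short list of external hosts to review by hand.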

#3 Updated by laforge 4 months ago

http://netpreserve.org/web-archiving/tools-and-software/ seems to be a good collection of tools. https://webarchive.jira.com/wiki/spaces/Heritrix/overview is what the Internet Archive appears to be using as their own crawler.

#4 Updated by laforge 4 months ago

  • Related to Feature #3085: where to migrate openmoko-backup? added

#5 Updated by laforge 4 months ago

  • Priority changed from Normal to High

#6 Updated by laforge 2 months ago

ping? This has been dragging on too long, while we continue to have to pay for the server ....

#7 Updated by laforge 12 days ago

  • Status changed from In Progress to Rejected

After many failed attempts, and with no solution available online for archiving mediawikis into static pages (what a shame), I decided to upgrade the mediawiki to a dockerized setup using the latest stable mediawiki.
