Bug #6381

open

libosmocore changes take too long until they propagate through OBS master

Added by laforge 4 months ago. Updated 13 days ago.

Status:
In Progress
Priority:
Low
Assignee:
Target version:
-
Start date:
03/01/2024
Due date:
% Done:

60%

Spec Reference:

Description

(05:23:15 PM) hwelte: I don't understand why https://gerrit.osmocom.org/c/libosmo-netif/+/35069 keeps failing even hours after the libosmocore patch has been merged. I even checked that the OBS package for libosmocore had been rebuilt meanwhile
(05:29:17 PM) hwelte: the debian10/debian12 builds fail despite the libosmocore on which the changes depend has long been built on obs.
(05:54:43 PM) osmith: hwelte, it says here: build jobs exist (gear icon): https://obs.osmocom.org/package/show/osmocom:master/libosmocore
(05:55:04 PM) hwelte: osmith: yes, but that's for much more recent libosmocore commits
(05:55:04 PM) osmith: the builders just seem to be busy building everything, I guess because of multiple libosmocore merges in a row, each triggering a rebuild of everything
(05:55:12 PM) osmith: https://obs.osmocom.org/monitor <- builders being busy
(05:55:34 PM) osmith: it might be that OBS doesn't publish a repository before all packages for a distribution are built
(05:58:26 PM) osmith: hwelte, so in summary: libosmocore changes cause a lot of packages to be built, and the repo (probably) only gets published once all packages are built. unfortunately it seems that the builders were not able to build everything within the last hours since the merge, or that it started building additional packages because of more merges.

So it somehow seems that if a longer series of patches is committed to libosmocore, each of them gets built as an OBS package for each arch/distribution, and that's what takes so long? Can we somehow get it to only build the most recent commit at a time, and first build that all the way to the end and make sure those packages are published/available for further downstream builds?

It's not exactly a rare situation that we add something to one library, and then there are also patches that use those additions elsewhere. So either
  • we can achieve the above, or
  • we can make sure that repos/feeds are updated after building [only] the debian10/debian12 on amd64 which we use for gerrit build verification [and even prioritize those]?
  • we can find some other way to make sure the (in this case) libosmocore package is somehow available for gerrit build verification after it has been built, and not only after the entire universe has been re-built?

Files

obs.png (65.3 KB) osmith, 03/01/2024 09:36 AM
Actions #1

Updated by osmith 4 months ago

  • File obs.png added
  • Status changed from New to Feedback
  • Assignee changed from osmith to laforge

So it somehow seems that if a longer series of patches is committed to libosmocore, each of them gets built as an OBS package for each arch/distribution, and that's what takes so long? Can we somehow get it to only build the most recent commit at a time, and first build that all the way to the end and make sure those packages are published/available for further downstream builds?

From what I have observed, it works like this:
  • new source package gets pushed to OBS for libosmocore
  • OBS starts building libosmocore, and queues all reverse depends to be rebuilt afterwards
  • when the libosmocore package is done, it does not yet get published because that would render the repository in an inconsistent state - the other packages are not rebuilt yet against this version
  • OBS continues building packages in the queue
  • when a new source package gets pushed to OBS before the queue is done, it repeats with building libosmocore again. the packages are not published until the whole queue is done once for one distribution/arch combination
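The observed behaviour above can be sketched as a small simulation. This is only a toy model under simplifying assumptions that are not from OBS itself: time is counted in abstract ticks, one full rebuild of libosmocore plus its reverse dependencies takes a fixed `queue_len` ticks, and every new source package restarts that queue from the beginning. The function name and parameters are hypothetical.

```python
def first_publish_tick(queue_len, push_ticks, horizon=1000):
    """Return the tick at which the repository is first published, or None.

    queue_len:  ticks needed to rebuild libosmocore and all reverse depends
                (assumed constant for this toy model)
    push_ticks: set of ticks at which a new source package reaches OBS
    horizon:    upper bound on simulated ticks
    """
    remaining = None  # None means nothing is queued
    for tick in range(horizon):
        if tick in push_ticks:
            # A new source package arrives: the queue starts over, and
            # nothing gets published in the meantime.
            remaining = queue_len
        elif remaining is not None:
            remaining -= 1
            if remaining == 0:
                # The whole queue finished once without interruption.
                return tick
    return None


if __name__ == "__main__":
    print(first_publish_tick(10, {0}))
    print(first_publish_tick(10, {0, 4, 8}))
```

With a single push at tick 0 and a queue of 10 ticks, the repository is published at tick 10. With pushes at ticks 0, 4 and 8, publication slips to tick 18, which matches the symptom of a series of merges delaying everything.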

It's not exactly a rare situation that we add something to one library, and then there are also patches that use those additions elsewhere.

For some more context, I've attached the graph of the previous days. At least it does not seem to happen every day that it gets stuck like this.

Can we somehow get it to only build the most recent commit at a time, and first build that all the way to the end and make sure those packages are published/available for further downstream builds?

I guess in theory we could check whether OBS is currently building source packages, and in that case, not send further source packages until it is completely done. In some scenarios you will get a libosmocore build containing the patch you need sooner, but in others you will get it later. And the total number of packages built will increase.

faster case:

  • push libosmocore patch 1
  • reverse deps get rebuilt (queue 1)
  • before queue 1 is done, libosmocore patch 2 gets pushed, but we don't push a new source package to OBS yet because queue 1 is not done
  • queue 1 is complete
  • we push libosmocore patch 2
  • queue 2 starts building
  • osmo-bsc patch that needs libosmocore patch 1 gets verified in gerrit
  • it can already be verified, because we don't need to wait for queue 2

slower case:

  • push libosmocore patch 1
  • reverse deps get rebuilt (queue 1)
  • before queue 1 is done, libosmocore patch 2 gets pushed, but we don't push a new source package to OBS yet because queue 1 is not done
  • queue 1 is complete
  • we push libosmocore patch 2
  • queue 2 starts building
  • osmo-bsc patch that needs libosmocore patch 2 gets verified in gerrit
  • it fails, because queue 2 is not done yet

so all in all, it will cause more package builds, it may be slower depending on luck / what you need to build against, and it makes the logic for pushing packages a bit more complicated... IMHO we should try to optimize other things instead.
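The faster and slower cases can be compared with a short sketch. It assumes a fixed queue duration in abstract ticks, that an ungated push restarts the queue, and that a gated push waits until the running queue is complete and then carries everything merged so far. `publish_times` and its parameters are hypothetical names used only for illustration.

```python
def publish_times(merge_ticks, queue_len, gated):
    """Return, per merged patch, the tick at which it is published.

    merge_ticks: ticks at which patches are merged
    queue_len:   ticks for one full rebuild queue (assumed constant)
    gated:       if True, hold back new source packages while a queue runs
    """
    merges = sorted(merge_ticks)
    n = len(merges)
    result = [None] * n

    if not gated:
        # Each push restarts the queue, so a patch is only published once
        # some later (or its own) queue runs to the end uninterrupted.
        for i in range(n):
            j = i
            while j + 1 < n and merges[j + 1] < merges[j] + queue_len:
                j += 1
            result[i] = merges[j] + queue_len
        return result

    free_at = 0  # tick at which the current queue is complete
    i = 0
    while i < n:
        start = max(merges[i], free_at)
        # Everything merged up to the push is part of this source package.
        j = i
        while j < n and merges[j] <= start:
            j += 1
        done = start + queue_len
        for k in range(i, j):
            result[k] = done
        free_at = done
        i = j
    return result


if __name__ == "__main__":
    print(publish_times([0, 4], 10, gated=False))
    print(publish_times([0, 4], 10, gated=True))
```

For patches merged at ticks 0 and 4 with a queue of 10 ticks, the ungated policy publishes both at tick 14, while the gated policy publishes patch 1 at tick 10 and patch 2 at tick 20. So a change that only needs patch 1 is verified earlier with gating, and one that needs patch 2 is verified later.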

we can find some other way to make sure the (in this case) libosmocore package is somehow available for gerrit build verification after it has been built, and not only after the entire universe has been re-built?

I guess this would require a hack of extracting the not-yet-published package from OBS somehow, and installing that instead of the real repository. IMHO too much effort.

we can make sure that repos/feeds are updated after building [only] the debian10/debian12 on amd64 which we use for gerrit build verification [and even prioritize those]?

As I understand it, the feeds do get updated after one distro + arch combination is complete. However OBS builds multiple distros in parallel, so they more or less finish at the same time.

We have built osmocom:master for the following distros, all x86_64 only:
  • Debian_10
  • Debian_11
  • Debian_12
  • CentOS_8
  • openSUSE_Tumbleweed

I just removed CentOS_8 and openSUSE_Tumbleweed. We don't use them in CI verification (CentOS8 not anymore since recently), so that should already speed it up significantly.

(We do use debian 10, 11 and 12. Most projects are build-tested against 10 and 12, but some that don't work on 10 are tested against 11 and 12.)

Another thing to look into could be adding additional workers. I think it would be good to add another 8 workers on build5 once we have set it up (#6186).

And yet another thing could be making the package builds faster, e.g. by pre-installing dependencies for osmo-gsm-manuals... I think it is possible to build in a pre-built image, but haven't looked into this in detail yet.

Actions #2

Updated by laforge 18 days ago

  • Assignee changed from laforge to osmith

Ok, let's see how well it works after your most recent set of changes.

It would be great if we had some kind of observability, i.e. some tracking of how long it takes from merging something to a project in gerrit until the time that the obs packages show up. If we had that on a per-project basis as metrics in prometheus, it would be a dream. But of course I understand that might be a lot of effort. So I'm also happy to settle on something less fancy, but having some kind of indication of the delay would be useful for everyone, I think.

We could also consider something like adding a comment to the gerrit patch once the debian package containing that patch has shown up in the "master" feed. I guess from the user (== developer here) point of view that's most useful, as we can see in gerrit once it makes sense to trigger the patches that depend on a change?

Actions #4

Updated by osmith 16 days ago

  • Status changed from Feedback to In Progress
  • % Done changed from 0 to 50

laforge wrote in #note-2:

Ok, let's see how well it works after your most recent set of changes.

It would be great if we had some kind of observability, i.e. some tracking of how long it takes from merging something to a project in gerrit until the time that the obs packages show up. If we had that on a per-project basis as metrics in prometheus, it would be a dream. But of course I understand that might be a lot of effort. So I'm also happy to settle on something less fancy, but having some kind of indication of the delay would be useful for everyone, I think.

I think this is a good idea. I wrote a simple prometheus exporter with metrics output like the following, which uses the OBS API (like the osc command). After merging a patch to master, the "published" number goes down and "building" goes up, until eventually "building" is 0 and everything is "published" again. The numbers indicate the number of repository-arch combinations (Debian_12-x86_64, Debian_11-x86_64, ...).

# HELP obs_project_status OBS project status as count of repository-arch combinations
# TYPE obs_project_status gauge
obs_project_status{project="osmocom:master",status="building"} 0.0
obs_project_status{project="osmocom:master",status="broken"} 0.0
obs_project_status{project="osmocom:master",status="published"} 4.0
obs_project_status{project="osmocom:master",status="parser_error"} 0.0
obs_project_status{project="osmocom:nightly",status="building"} 0.0
obs_project_status{project="osmocom:nightly",status="broken"} 0.0
obs_project_status{project="osmocom:nightly",status="published"} 21.0
obs_project_status{project="osmocom:nightly",status="parser_error"} 0.0
obs_project_status{project="osmocom:nightly:asan",status="building"} 0.0
obs_project_status{project="osmocom:nightly:asan",status="broken"} 0.0
obs_project_status{project="osmocom:nightly:asan",status="published"} 1.0
obs_project_status{project="osmocom:nightly:asan",status="parser_error"} 0.0
obs_project_status{project="osmocom:latest",status="building"} 0.0
obs_project_status{project="osmocom:latest",status="broken"} 0.0
obs_project_status{project="osmocom:latest",status="published"} 19.0
obs_project_status{project="osmocom:latest",status="parser_error"} 0.0
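As a rough illustration of how such output could be produced, here is a minimal standard-library sketch. It is not the code of osmo-obs-exporter. The sample XML is made up, and its shape (a `resultlist` with one `result` element per repository-arch combination carrying a `state` attribute) is an assumption about what the OBS build result API returns. `count_states` and `render_metrics` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample; the real data would come from the OBS API.
SAMPLE_RESULT_XML = """<resultlist>
  <result project="osmocom:master" repository="Debian_10" arch="x86_64" state="published"/>
  <result project="osmocom:master" repository="Debian_11" arch="x86_64" state="published"/>
  <result project="osmocom:master" repository="Debian_12" arch="x86_64" state="building"/>
  <result project="osmocom:master" repository="Raspbian_12" arch="armv7l" state="published"/>
</resultlist>"""

# Statuses taken from the metrics output shown above.
STATUSES = ("building", "broken", "published", "parser_error")


def count_states(xml_text):
    """Count repository-arch combinations per state (assumed attribute name)."""
    counts = Counter()
    for res in ET.fromstring(xml_text).iter("result"):
        counts[res.get("state")] += 1
    return counts


def render_metrics(project, counts):
    """Render the counts in the Prometheus text exposition format."""
    lines = [
        "# HELP obs_project_status OBS project status as count of "
        "repository-arch combinations",
        "# TYPE obs_project_status gauge",
    ]
    for status in STATUSES:
        value = float(counts[status])  # missing states count as 0.0
        lines.append(
            f'obs_project_status{{project="{project}",status="{status}"}} {value}'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_metrics("osmocom:master", count_states(SAMPLE_RESULT_XML)))
```

A real exporter would fetch the XML over HTTP with credentials and serve the text on a port for prometheus to scrape; both parts are left out here to keep the sketch self-contained.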

https://gitea.osmocom.org/osmith/osmo-obs-exporter/

Next I'll add it to our prometheus/grafana setup.

We could also consider something like adding a comment to the gerrit patch once the debian package containing that patch has shown up in the "master" feed. I guess from the user (== developer here) point of view that's most useful, as we can see in gerrit once it makes sense to trigger the patches that depend on a change?

IMHO the most useful for users would be, if jenkins just builds the dependency debian packages from source if they are outdated (as proposed in https://osmocom.org/issues/6478#note-3). Then the user needs to wait a few minutes longer for their CI job, but it will pass after merging the depending patch and retriggering CI. I think doing this would be feasible.

Actions #5

Updated by osmith 13 days ago

  • Assignee changed from osmith to laforge
  • % Done changed from 50 to 60

osmith wrote in #note-4:

laforge wrote in #note-2:

Ok, let's see how well it works after your most recent set of changes.

It would be great if we had some kind of observability, i.e. some tracking of how long it takes from merging something to a project in gerrit until the time that the obs packages show up. If we had that on a per-project basis as metrics in prometheus, it would be a dream. But of course I understand that might be a lot of effort. So I'm also happy to settle on something less fancy, but having some kind of indication of the delay would be useful for everyone, I think.

I think this is a good idea. I wrote a simple prometheus exporter with metrics output like the following, which uses the OBS API (like the osc command). After merging a patch to master, the "published" number goes down and "building" goes up, until eventually "building" is 0 and everything is "published" again. The numbers indicate the number of repository-arch combinations (Debian_12-x86_64, Debian_11-x86_64, ...).

[...]

https://gitea.osmocom.org/osmith/osmo-obs-exporter/

Next I'll add it to our prometheus/grafana setup.

I have done that now, documented it here:
https://osmocom.org/projects/osmocom-servers/wiki/OBS_server_setup#osmo-obs-exporter

Added it to the prometheus.yml:

  - job_name: obs
    static_configs:
      - targets: ['192.168.111.4:9123']
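Once the gauge is scraped, the delay between a merge and the packages being published could be estimated from the "building" series. The following is a minimal sketch that assumes the samples have already been fetched as (unix time, building count) pairs sorted by time. How they are fetched from prometheus is left out, and `rebuild_durations` is a hypothetical helper, not part of the exporter.

```python
def rebuild_durations(samples):
    """Return the length in seconds of each period with building > 0.

    samples: list of (unix_time, building_count) tuples, sorted by time.
    A period starts at the first sample with building > 0 and ends at the
    next sample where building is back to 0, so the result is only as
    precise as the scrape interval.
    """
    durations = []
    started = None
    for ts, building in samples:
        if building > 0 and started is None:
            started = ts  # rebuild period begins
        elif building == 0 and started is not None:
            durations.append(ts - started)  # everything published again
            started = None
    return durations


if __name__ == "__main__":
    samples = [(0, 0), (60, 2), (120, 3), (180, 0), (240, 0), (300, 1), (360, 0)]
    print(rebuild_durations(samples))
```

A period that is still running at the end of the samples is not reported, since its final length is not known yet.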

I don't have permissions to create dashboards in grafana (https://grafana.osmocom.org/profile says "Role": "Viewer"). But in theory it should work now.

Some fallout from deploying that, which is resolved again:
  • Before routing prometheus through the LXC lan, I first considered going through the public network with nginx and an nftables rule to limit the IP, as done with other hosts. I attempted to restart docker-compose, but it failed. It turns out it had not been restarted since docker was upgraded recently.
    • /usr/local/bin/docker-compose was gone (I've linked it to /usr/bin/docker-compose now)
    • The docker-compose config was not valid anymore ("invalid subnet 2a01:4f8:201:344a:1000::2/80: it should be 2a01:4f8:201:344a:1000::/80"), causing docker-compose to not start up anymore. I've fixed that.
  • DNS did not work anymore in the grafana LXC after restarting docker-compose and nftables services (not exactly sure how it would be caused by that...). I have changed /etc/resolv.conf of the grafana LXC and put in the same nameserver 192.168.111.1 as in the OBS LXC, now DNS works again. The previous entries are commented out.
Actions #6

Updated by laforge 13 days ago

On Mon, Jun 17, 2024 at 12:13:56PM +0000, osmith wrote:

I don't have permissions to create dashboards in grafana [...]

I tried to change this, but the option is greyed out. Grafana states that your permission of "viewer"
on the Osmocom organization is automatically synchronized from the auth provider. Interestingly, e.g.
@dwillmann or pespin have editor permissions. I have no idea how that happens and what to do about it.
