OsmoPCU uses wrong CellID in BSSGP
While writing some unrelated PCU tests, I discovered that somehow the CellID on the BSSGP side is wrong.
Whatever the BTS passes in on the PCU interface side ends uo endian-swapped in the BVC-RESET on BSSGP.
This is really odd, as all of PCUIF is in host byte order.
Looking at the commitlog, it appears that the following patch is to be blamed:
commit bdc55fad62e333951a0eb1fc7b96aaaec16dd6ff Author: Neels Hofmeyr <email@example.com> Date: Wed Feb 21 00:39:07 2018 +0100 implement support for 3-digit MNC with leading zeros Receive the mnc_3_digits flag from the PCU interface. Bump the PCU interface to 9. This is one part of the three identical pcuif_proto.h patches: - I49cd762c3c9d7ee6a82451bdf3ffa2a060767947 (osmo-bts) - I787fed84a7b613158a5618dd5cffafe4e4927234 (osmo-pcu) - I78f30aef7aa224b2e9db54c3a844d8f520b3aee0 (osmo-bsc) Add 3-digit flags and use the new RAI and LAI API from libosmocore throughout the code base to be able to handle an MNC < 100 that has three digits (leading zeros). Depends: Id2240f7f518494c9df6c8bda52c0d5092f90f221 (libosmocore), Ib7176b1d65a03b76f41f94bc9d3293a8a07d24c6 (libosmocore) Change-Id: I787fed84a7b613158a5618dd5cffafe4e4927234
And with all due respect, it's a prime example of how we fail
- a completely unrelated change (changing the endinanness of cell_id) is made as part of implementing 3 digits MNC. This should clearly have been unrelated patches
- the reviewer (myself in this case) didn't spot or at least not reject this "making two unrelated changes in one commit"
- not having test coverage for OsmoPCU
So for more than one year, we've been reporting the wrong CellID to the SGSN:/
We should learn from this that we absolutely never accept unrelated changes in the same patch. In order to introduce 3 digit MNC support, there was no need to shift the htons() around.
- Status changed from Resolved to In Progress
- Assignee changed from laforge to osmith
- % Done changed from 80 to 50
Unfortunately, this issue is appearing again. I can see in both wireshark, and in the ttcn3 testsuite logs, that the CellID is endian-swapped again in BSSGP. That is, with osmo-pcu master, and strangely, after reverting your patch that fixed the issue earlier, the CellID is not swapped anymore.
I read through both your fix and bdc55f (Neels' patch linked above), and it is obvious that the swap was introduced into osmo-pcu in bdc55f and the fix does what it should.
This means, that some other component in the "osmo-bsc -- osmo-bts -- osmo-pcu" chain is mixing up the CellID. Looking at the osmo-pcu.log confirms, that osmo-pcu is already receiving the wrong cell ID from the BTS socket. I've looked at the RSL traces between osmo-bsc and osmo-bts, and there the right CellID is present. So it seems that the error happens in osmo-bts.
Then I searched through the code for mentions of cell_id in osmo-bts, but found only two places, and with debug prints added, they always showed the wrong ID.
My current guess is, that this is a bug in libosmocore, possibly related to the "struct osmo_cell_global_id", that internally combines mcc, mnc, lac and cell_id to pass all of them at once to various functions.
I can't simply try old versions or bisect libosmocore for this problem, because the various components won't build against much older libosmocore versions.