Bug #6191
openosmo-pcu from OBS latest feeds transmits Dummy blocks a few more FNs after last TBF in TS was freed
100%
Description
After new release 1.3.0/1.3.1, it was spotted that the following tests pass in ttcn3-pcu-test (nightly) but not in ttcn3-pcu-test-latest (running1.3.1).
TC_n3105_max_t3195 TC_pcuif_suspend_active_tbf TC_pdch_energy_saving TC_ta_ptcch_idle TC_ul_tbf_reestablish_with_pkt_resource_req_t3168
I can reproduce this locally using the docker setup with TC_n3105_max_t3195. Using default IMAGE_SUFFIX=master, the test passes. However, using IMAGE_SUFFIX=latest, the test fails.
This is strange since current master has no real big changes over latest release 1.3.1, only a commit modifying a bit a log message.
In fact, I tried running with IMAGE_SUFFIX=master and OSMO_PCU_BRANCH pointing to commit in 1.3.1, and then it passes too. So that means it has something to do with how osmo-pcu binary in the OBS package was built.
I attach a successful pcap and a failing pcap as reference.
Among other thing, the test expects that after T_3195 times out it should receive an empty dl data block from the PCU because there's no more TBFs active in that TS. This seems to work as expected when running against master or 1.3.1 compiled by the docket container, as seen in the good pcap. After the last DUMMY dl block is sent, one can see how the MS and TBF is freed and hence no more DUMMY dl blocks are sent.
However, when running against the package from OBS ("latest"), one can see that after the TBF and MS is freed, the PCU still submits DUMMY DL DATA blocks for a few (3-4) more FN rounds, which is wrong.
The following patch was submitted in the ttcn3 test to more clearly show the problem:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34536 pcu: Fail immediately in TC_n3105_max_t3195
So all in all, it looks like:
1- Bug in the compiler used in OBS
2- Some bug of ours only appeareing when compiled against specific compile flags or compiler version
The logic on whether to submit a DL Dummy block or an empty one, is done in osmo-pcu at gprs_rlcmac_rcv_rts_block():
/* Prio 3: send dummy control message if need to poll or USF */ else { /* If there's no TBF attached to this PDCH, we can submit an empty * data_req since there's nothing to transmit nor to poll/USF. This * way we help BTS energy saving (on TRX!=C0) by sending nothing * instead of a dummy block. The early return is done here and * not at the start of the function because the condition below * (num_tbfs==0) may not be enough, because temporary dummy TBFs * created to send Imm Ass Rej (see handle_tbf_reject()) don't * have a TFI assigned and hence are not attached to the PDCH * TS, so they don't show up in the count below. */ const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF) + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF); bool skip_idle = (num_tbfs == 0); #ifdef ENABLE_DIRECT_PHY /* In DIRECT_PHY mode we want to always submit something to L1 in * TRX0, since BTS is not preparing dummy bursts on idle TS for us */ skip_idle = skip_idle && trx != 0; #endif if (!skip_idle && (msg = sched_dummy())) { /* increase counter */ gsmtap_cat = PCU_GSMTAP_C_DL_DUMMY; } else { msg = NULL; /* submit empty frame */ } }
In theory ENABLE_DIRECT_PHY should not be enabled in the OBS build, so in summary, the dummy message should only be generated based on:
const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF) + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF); bool skip_idle = (num_tbfs == 0); if (!skip_idle) msg = sched_dummy()
Files
Updated by pespin 2 months ago
Since a while ago (last release 1.3.0/1.3.1), debian/rules contain:
debian/rules 27: dh_auto_configure -- --with-systemdsystemunitdir=/lib/systemd/system --enable-manuals --enable-er-e1-ccu
Mind the "--enable-er-e1-ccu" part.
This activates the following in configure.ac:
AC_MSG_CHECKING([whether to enable direct E1 CCU access for PDCH of Ericsson RBS]) AC_ARG_ENABLE(er-e1-ccu, AC_HELP_STRING([--enable-er-e1-ccu], [enable code for Ericsson RBS E1 CCU [default=no]]), [enable_er_e1_ccu="$enableval"],[enable_er_e1_ccu="no"]) AC_MSG_RESULT([$enable_er_e1_ccu]) AM_CONDITIONAL(ENABLE_ER_E1_CCU, test "x$enable_er_e1_ccu" = "xyes") AS_IF([test "x$enable_er_e1_ccu" = "xyes"], [ PKG_CHECK_MODULES(LIBOSMOABIS, libosmoabis >= 1.5.0) PKG_CHECK_MODULES(LIBOSMOTRAU, libosmotrau >= 1.5.0) ])
which in turn activates:
if ENABLE_ER_E1_CCU AM_CPPFLAGS += -DENABLE_DIRECT_PHY endif
So in summary, after latest release, define ENABLE_DIRECT_PHY is enabled, while it wasn't before. This means ENABLE_DIRECT_PHY is being used by people potentially using the osmo-pcu binary against an osmo-bts-trx, which is unexpected.
Updated by pespin 2 months ago
As a result of ENABLE_DIRECT_PHY being enabled, Dummy dl blocks are always transmitted in TRX0, which is the case here. This unexpected when submitting the data to an osmo-bts-trx, breaks the testsuite and it's a regression which needs to be fixed and a new patch release made.
dexter this code path should only be enabled when transmitting against a BTS which is going through direct phy, and never when connected to an osmo-bts-trx:
#ifdef ENABLE_DIRECT_PHY /* In DIRECT_PHY mode we want to always submit something to L1 in * TRX0, since BTS is not preparing dummy bursts on idle TS for us */ skip_idle = skip_idle && trx != 0; #endif
Commit which introduced the regression:
commit 4bac39892399e2056fc03981140d654eb65b6e54 Author: Philipp Maier <pmaier@sysmocom.de> Date: Fri Feb 3 16:52:42 2023 +0100 support for Ericsson RBS E1 CCU Ericsson RBS series BTSs do not have a built in PCU. Rather than having the PCU on board the PCU is co-located to the BSC in those cases. Just like the MGW the PCU would connect to the CCU (Channel Coding Unit) via E1 line. The PCU is connected via an unix domain socket (pcu_sock) to the BSC to receive RACH requests and to control Paging and TBF assignment. This patch adds all the required functionality to run an Ercisson RBS in GPRS/EGPRS mode. It supports 16k I.460 E1 subslots and full 64k E1 timeslots (recommended) Change-Id: I5c0a76667339ca984a12cbd2052f5d9e5b0f9c4d Related: OS#5198
Updated by pespin 2 months ago
- Status changed from New to Feedback
- Assignee changed from pespin to dexter
dexter , first of all you should check if RBS works well if we send empty blocks (or no blocks) to it. It should generate dummy bursts as explained here:
/* If there's no TBF attached to this PDCH, we can submit an empty * data_req since there's nothing to transmit nor to poll/USF. This * way we help BTS energy saving (on TRX!=C0) by sending nothing * instead of a dummy block. The early return is done here and * not at the start of the function because the condition below * (num_tbfs==0) may not be enough, because temporary dummy TBFs * created to send Imm Ass Rej (see handle_tbf_reject()) don't * have a TFI assigned and hence are not attached to the PDCH * TS, so they don't show up in the count below. */ const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF) + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF); bool skip_idle = (num_tbfs == 0); #ifdef ENABLE_DIRECT_PHY /* In DIRECT_PHY mode we want to always submit something to L1 in * TRX0, since BTS is not preparing dummy bursts on idle TS for us */ skip_idle = skip_idle && trx != 0; #endif
/* If there's no TBF attached to this PDCH, we can skip Tx of PTCCH * since there's nothing worthy of being transmitted. This way BTS can * identify idle blocks and send nothing or dumy blocks with reduced * energy for the sake of energy saving. */ const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF) + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF); bool skip_idle = (num_tbfs == 0); #ifdef ENABLE_DIRECT_PHY /* In DIRECT_PHY mode we want to always submit something to L1 in * TRX0, since BTS is not preparing dummy bursts on idle TS for us: */ skip_idle = skip_idle && trx != 0; #endif
Then, we should add some way for osmo-pcu to know if a given BTS is an RBS, to know what to do based on which BTS it is generating a block for. sysmobts seems to be using:
if ((info_ind->flags & PCU_IF_FLAG_SYSMO) && info_ind->trx[trx_nr].hlayer1) {
Updated by dexter 2 months ago
- % Done changed from 0 to 30
When I understand the problem correctly than the define constant ENABLE_DIRECT_PHY is used incorrectly. ENABLE_DIRECT_PHY only means that the support for direct phy access is enabled, it does not tell anything about if we actually accessing the phy directly during runtime. I have uploaded a fix to gerrit, but I didn't test it with TTCN3 yet:
https://gerrit.osmocom.org/c/osmo-pcu/+/34575 gprs_rlcmac_sched: check if we really use direct phy
Updated by dexter 2 months ago
To make it more visible what PCU_IF_FLAG_SYSMO is actually for, I have renamed it to PCU_IF_FLAG_DIRECT_PHY
https://gerrit.osmocom.org/c/osmo-pcu/+/34582 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-bsc/+/34583 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-bts/+/34584 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34585 rename sysmo_direct_dsp to direct_phy [NEW]
Updated by dexter about 2 months ago
Unfortunately we found out that we cannot decide so easily if we have to generate dummy dl blocks or not as it really depends on the BTS model and not on the fact whether we use direct phy access or not. This means we have to signal the BTS model to the PCU. The following patches implement such a mechanism. Once they made it into master we can easily use the BTS model info to fix dl-block generation problem.
https://gerrit.osmocom.org/c/osmo-pcu/+/34647 pcu_l1_if: signal BTS model via PCUIF
https://gerrit.osmocom.org/c/osmo-bts/+/34648 pcuif_proto: signal BTS model via PCUIF
https://gerrit.osmocom.org/c/osmo-bsc/+/34649 pcuif_proto: signal BTS model via PCUIF
Updated by dexter 30 days ago
The patches above have enough review points to get merged, but we still cannot merge them right now. The TTCN3 testsuite also needs to be updated. Eventually it is also very important that the docker tests are aware of the PCUIF version change so that the configs are changed on the fly to the correct PCUIF version number (depending on whether we test on latest or on master).
This means that the following two patches must be merged together with the patches above at once:
https://gerrit.osmocom.org/c/docker-playground/+/34998 ttcn3-bts/pcu-test: Use PCUIF v12 for current master and PCUIF v11 for latest
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34934 PCUIF: upgrade to PCUIF v12
Updated by dexter 20 days ago
The patches for PCU, BTS and BSC are ready to get merged. However we must not merge them without merging the corresponding patches for TTCN3 and docker. Those are the two patches mentioned in #9 (above). I have tested everything in docker locally to make sure there is no fallout. Everything looks good and I would not expect any problems.
Updated by laforge 20 days ago
On Mon, Nov 20, 2023 at 02:00:37PM +0000, dexter wrote:
The patches for PCU, BTS and BSC are ready to get merged. However we must not merge them without merging the corresponding patches for TTCN3 and docker. Those are the two patches mentioned in #9 (above). I have tested everything in docker locally to make sure there is no fallout. Everything looks good and I would not expect any problems.
if the review is sufficient, go ahead and merge all of them on a day when you are prepared to fix any
potential fall-out the next 1-2 days. So I think now is a good time as the week still has a few days
to catch fall-out or revert.
Updated by dexter 18 days ago
The patches are now all merged. Unfortunately there is a problem in ttcn3-bts-test-latest, the following patch should fix it:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35090 PCUIF_Types: make PCUIF_info_ind field bts_model optional
Updated by dexter 18 days ago
- % Done changed from 30 to 50
I have now revisited the older patches and fixed everything so that it uses the new bts->bts_model flag to decide whether we have to generate dummy blocks or not:
https://gerrit.osmocom.org/c/osmo-pcu/+/35096 gprs_rlcmac_sched: fix condition for generating dummy blocks on idle
Updated by dexter 12 days ago
The few testcases that currently fail were indeed related to PCUIF (not protocol wise, but we didn't set a realistic BTS model yet)
I hope this fixes the problem:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35144 PCU_Tests: tell PCU a more realistic BTS model
Updated by fixeria 4 days ago
For the record, we had a Jitsi call with jolly who volunteered to do some testing. We decided that it would be best to confirm that things are working as expected by sniffing the Um interface.
For this purpose, I hacked up a GPRS sniffer based on the Sylvain's burst_ind hack:
https://cgit.osmocom.org/osmocom-bb/log/?h=fixeria/burst_ind_gprs
All you need to do is:
- patch osmocom, in case you're using CP2102 (grep for
I_HAVE_A_CP210x
); - patch timeslot number / TSC in app_ccch_scan.c (grep for
RSL_CHAN_OSMO_PDCH
); - compile the firmware and load the 'layer1';
- run
ccch_scan -i 127.0.0.1 -a ARFCN
.
Updated by fixeria 4 days ago
fixeria wrote in #note-18:
All you need to do is:
- run
ccch_scan -i 127.0.0.1 -a ARFCN
.
This app writes raw BURST.ind into a file and additionally decodes PDTCH blocks on the fly, so you should see GSMTAP in wireshark.
The BURST.ind captures (looking like bursts_20231106_2218_1018_1767319_c7.dat
) can later be decoded using the gprsdecode tool:
fixeria@LEGION:/home/fixeria/projects/osmocom/burst_ind/src/host/gprsdecode$ ./gprsdecode -i 127.0.0.1 -c /tmp/bursts_20231106_2218_1018_1766812_c7.dat
Updated by fixeria 4 days ago
Together with jolly we found in the code that in theory sysmoBTS (and the likes) can handle empty PDCH DATA.req. Here is the related code:
static int ph_data_req(struct gsm_bts_trx *trx, struct msgb *msg,
struct osmo_phsap_prim *l1sap, bool use_cache)
{
struct femtol1_hdl *fl1 = trx_femtol1_hdl(trx);
struct msgb *l1msg = l1p_msgb_alloc();
struct gsm_lchan *lchan;
uint32_t u32Fn;
uint8_t u8Tn, subCh, u8BlockNbr = 0, sapi = 0;
uint8_t chan_nr, link_id;
int len;
if (!msg) {
LOGP(DL1C, LOGL_FATAL, "PH-DATA.req without msg. "
"Please fix!\n");
abort();
}
len = msgb_l2len(msg); // <-- (!) segfault, jolly has a patch
// ...
/* convert l1sap message to GsmL1 primitive, keep payload */
if (len) {
// ...
} else {
/* empty frame */
GsmL1_Prim_t *l1p = msgb_l1prim(l1msg);
empty_req_from_l1sap(l1p, fl1, u8Tn, u32Fn, sapi, subCh, u8BlockNbr);
}
Using the GPRS Um sniffer it should be possible to see what exactly it's transmitting in this case.
Updated by pespin 1 day ago
- Assignee changed from pespin to osmith
I think this ticket can be closed whenever a new osmo-pcu is released.
We should check that ttcn3-pcu-test-latest PCU_Tests.TC_n3105_max_t3195 turns green after releasing.
Assigning to osmith since I think he's tracking some ticket already about those releases.