Bug #5020


osmo-pcu: Poll and SBA allocation improvements and fixes

Added by pespin about 3 years ago. Updated almost 3 years ago.

Target version:
Start date:
Due date:
% Done:


Spec Reference:


Current implementations are quite basic and as far as i can tell prone to errors and sometimes unable to schedule CTRL polling on TBFs (RRBP) or SBAs (RACH).

  • SBA blocks allocated (reserved) are stored in SBAController class under gprs_rlcmac_bts object [bts_sba(bts)]. It sets the SBA block on FN + "AGCH_START_OFFSET=52" (also theoretically linked to timer X2002).
  • SBAController doesn't seem to have into account that a TBF may have already reserved that FN when allocating a SBA (hence either there may be a collision at that time).
  • Current allocation of TBF polls consist on always allocating poll on RRBP as FN+13. This has the advantage that there's no need to check for collisions between different TBFs on a given PDCH, since on a given FN only a TBF is selected/executed and hence only that one can even reserve FN+13 (because the +13 is fixed offset). However, it may happen sometimes that a TBF tries to allocate FN+13 (tbf->check_polling()/set_polling()), but that FNa+13 is already taken by a previous SBA which allocated it to FNb+52 such that FNb=FNa-52+13, and hence the TBF fails to allocate a poll for that CTRL message, which in the end it means it ends up failing to construct the CTRL message and the scheduler ends up sending a RLCMAC Dummy Block instead.
  • Since that can happen, the tbf->check_polling() should try harder finding a poll FN by trying other RRBP values than FN+13.
  • TBF class implementation of polling seems to only support 1 active poll at a time. That means, If for instance scheduler selects the same TBF at let's say FNa=10 and immediatelly afeter at FNb=10+4=14, If both where to request poll allocation, it would fail in tbf->check_polling() due to "poll_state != GPRS_RLCMAC_POLL_NONE" being hit. That's basically because the TBF class stores the active poll FN in class attribute "tbf->poll_fn/ts". So ideally there should be some sort of linkedlist or circular buffer or whatever supporting several active pollings. This becomes more important at the time we start using bigger values for RRBP, where the "poll active" state can be there for longer periods and hence probability to hti this issue increments.

Related specs:
3GPP TS 44.060 "10.4.5 Relative Reserved Block Period (RRBP) field"

Related issues

Related to OsmoPCU - Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF pollResolvedpespin02/17/2021

Related to OsmoPCU - Feature #5122: Choose SBA allocation offset based on AGCH queue load in the BTSNewpespin04/20/2021

Actions #1

Updated by pespin about 3 years ago

The last point (TBF only supporting 1 active poll FN) is worse than expected, because PollController seems to be timing out non-acked polls for a larger period of time than the FN+13 one:

void bts_set_current_frame_number(struct gprs_rlcmac_bts *bts, int fn)
    /* The UL frame numbers lag 3 behind the DL frames and the data
     * indication is only sent after all 4 frames of the block have been
     * received. Sometimes there is an idle frame between the end of one
     * and start of another frame (every 3 blocks).  So the timeout should
     * definitely be there if we're more than 8 frames past poll_fn. Let's
     * stay on the safe side and say 13 or more. An additional delay can
     * happen due to the block processing time in the DSP, so the delay of
     * decoded blocks relative to the timing clock can be much larger.
     * Values up to 50 frames have been observed under load. */
    const static int max_delay = 60;

    bts->cur_fn = fn;
    bts->pollController->expireTimedout(bts->cur_fn, max_delay);

For the ACKed one's is fine, since gprs_rlcmac_pdch::rcv_control_ack() calls:


Actions #2

Updated by pespin about 3 years ago

  • Related to Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF poll added
Actions #3

Updated by pespin almost 3 years ago

Some more facts after further investigation:

  • bts->cur_fn holds the current/last MAC block received on the UL, so it drives the clock.
  • The bts->cur_fn ield is updated in 2 code paths:
    • bts_set_current_frame_number: the main way, triggered by PCUIF time_ind received on every start of MAC block FN from BTS (even when direct phy is used, TIME.IND in lower layers are only received in BTS)
    • bts_set_current_block_frame_number: used only by sysmo direct PHY (not used by other direct phy like oc2g) to correct the clock (bts->cur_fn) in case there was some drift detected between what comes from PCUIF info_ind and its own direct phy.

So, the idea here would be to:
1- Make sure generic osmo-pcu code supports receiving data_len=0 minimally: "pcu_rx_data() => pcu_rx_data_ind_pdtch() => pdch->rcv_block()" This should already be fine OK since mcs_get_by_size_ul() will return UNKNOWN=0 and early return printing an error. So this first step is done.
2- Make sure all phys always send data_ind even if no data received, or corrupted, by setting data_len = 0 (aka NOPE.ind)
3- In osmo-pcu, ignore received time_ind messages (at most cross-check the FN and print warning otherwise), and update the clock (bts_set_current_frame_number) during data_ind instead.
4- In osmo-pcu upper layers, stored scheduled USF for that BTS+TRX+TS+FN and if data_ind data_len=0 (NOPE.ind), then increment N3101 for that TBF.

Step 2 (backend sending NOPE.ind data_len=0) requires:
2.1- Allow direct phy osmo-pcu code to submit corrupt/empty data_ind. For example, for sysmo direct phy:

handle_ph_data_ind (sysmo_l1_if.c)
        bts_set_current_block_frame_number    <--- RX FN number is corrected here
            bts->pollController->expireTimedout(fn, max_delay);
    if (data_ind->msgUnitParam.u8Size == 0)
        return -1;                              <--- NOPE.ind is discarded here
    " /* drop incomplete UL block */ if (data_ind->msgUnitParam.u8Buffer[0] != GsmL1_PdtchPlType_Full) break;    <--- NOPE.ind is discarded here

2.1- Allow indrect phy from osmo-bts to send PCUIF INFO_IND with data_len=0 (osmo-bts.git/src/common/l1sap.c):
    if (ts_is_pdch(&trx->ts[tn])) {
        /* don't send bad frames to PCU */
        if (len == 0)
            return -EINVAL;            <--- NOPE.ind is discarded here
        /* drop incomplete UL block */
        if (pr_info != PRES_INFO_BOTH)
            return 0;                  <--- NOPE.ind is discarded here
        pcu_tx_data_ind(...)               <----- This function contains a memcpy(x, y, len) which must be done conditonal on len>0 (undefined behavior of memcpy for len=0).
        reutrn 0;

Actions #4

Updated by pespin almost 3 years ago

I updated osmo-bts-trx, osmo-pcu and TTCN3 PCUIF_Components to send all DATA.ind even if len=0. Related patches:

osmo-bts.git: osmo-pcu.git: osmo-ttcn3-hacks.git: docker-plaground.git:

I did some manual tests with osmo-bts-trx and osmo-pcu patches applied and everything looks good.
I also checked that TTCN3 patches still work fine with older versions of osmo-pcu.

osmo-ttcn3-hacks + docker can already be merged. The first osmo-pcu patch silently ignorig data_len=0 too. Other ones I want to first implement it for all BTS types and are marked as WIP in gerrit.

Actions #5

Updated by pespin almost 3 years ago

pespin wrote:

  • TODO: implement change for lower layers of other bts types!

Actually the code in osmo-bts.git is OK and already letting pass of size=0 and alike, due to previous work we already did with TCH channels.
The missing changes are in osmo-pcu for the direct phy case. I submitted a patch here: direct_phy: Support submitting DATA.ind with len=0 to upper layers

Actions #6

Updated by pespin almost 3 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 60

remote: Fix: left shift cannot be repesented in type int [NEW]
remote: sched: Fix scheduling UL TBF not matching conditions [NEW]
remote: sched: Simplify usf selection code [NEW]
remote: Set matching USF if available when polling a UL TBF [NEW]
remote: pdch: Add mising pdch_ulc_release_node in Rx Cell Change Notif [NEW]
remote: pdch_ulc: Create helper API pdch_ulc_release_node [NEW]
remote: Track scheduled UL blocks through USF [NEW]
remote: Properly implement N3101 [NEW]

After this bunch of patches become merged, we already track all sort of UL blocks in a unified way. What's missing regarding this ticket:
  • Improve SBA scheduler to offer something else than N+52 (AGCH_START_OFFSET)
  • Improve SBA/POLL scheduler to offer something else than N+13
Actions #7

Updated by pespin almost 3 years ago

So, as per SBA allocation (Imm Assignment on CCCH):

The scheduled FN for the SBA is currently hardcoded to 52 (sba.c):

/* starting time for assigning single slot
 * This offset must be a multiple of 13. */

struct gprs_rlcmac_sba *sba_alloc(void *ctx, struct gprs_rlcmac_pdch *pdch, uint8_t ta)
    struct gprs_rlcmac_sba *sba;
    sba = talloc_zero(ctx, struct gprs_rlcmac_sba);
    if (!sba)
        return NULL;

    sba->pdch = pdch;
    sba->ta = ta;

    /* TODO: request ULC for next available FN instead of hardcoded AGCH_START_OFFSET */
    sba->fn = next_fn(pdch->last_rts_fn, AGCH_START_OFFSET);

    pdch_ulc_reserve_sba(pdch->ulc, sba);
    return sba;

Then, that FN is fed into Imm Assignment when encoding it (bts.cpp bts_rcv_rach):

sba = bts_alloc_sba(bts, ta);
sb_fn = sba->fn;
        plen = Encoding::write_immediate_assignment(
            &bts->trx[trx_no].pdch[ts_no], tbf, bv,
            false, rip->ra, Fn, ta, usf, false, sb_fn,
            bts_get_ms_pwr_alpha(bts), bts->pcu->vty.gamma, -1,

And sba_Fn is used here in Encoding::write_immediate_assignment as ref_fn to indicate the offset to the MS to start using it:

    bitvec_write_field(dest, &wp,(ref_fn / (26 * 51)) % 32,5); // T1'
    bitvec_write_field(dest, &wp,ref_fn % 51,6);               // T3
    bitvec_write_field(dest, &wp,ref_fn % 26,5);               // T2

Now, the question is why is 52 being used, and why it must be multiple of 13... Need to find that in the spec.

The symbol AGCH_START_OFFSET and the related comment about being multiple of 13 was added here, without much explanation:

commit 07e97cf8a551b05d7f5f3f9583b68b2eff0f1c23
Author: Andreas Eversberg <>
Date:   Tue Aug 7 16:00:56 2012 +0200

    Adding single block allocation

    It is mandatory to support it because MS may request a single block.
    In this case the network must assign a single block.

    It is possible to force single block allocation for all uplink requests
    on RACH. (VTY option)

Actions #8

Updated by pespin almost 3 years ago

After these patches, FN for SBA and RRBP poll is scheduled based on available/unreserved FNs in the UL scheduler:
remote: Pick unreserved UL FN when allocating an SBA
remote: pdch_ulc: Support picking RRBP other than N+13 [NEW]

So, after those are merged, what's TODO:
  • modify SBA allocator offset in sba_alloc() based on AGCH queue load in the BTS
  • Update gprs_rlcmac_tbf poll_state/poll_fn/poll_ts implementation to support multiple concurrent polls allocated.
Actions #9

Updated by pespin almost 3 years ago

  • % Done changed from 60 to 80

I submitted a bunch of commits to get rid of old tbf's poll_fn/poll_ts infra which only supported 1 concurrent poll request per tbf here:
remote: pdch_ulc: Store TBF poll reason [NEW]
remote: tbf: Get rid of unneeded poll_scheduled() [NEW]
remote: tbf: Allow multiple concurrent polls [NEW]
remote: Remove unneeded poll_state check [NEW]
remote: tbf: get rid of poll_state completely [NEW]
remote: Get rid of param 'poll' with constant value [NEW]
remote: tbf: Get rid of attribute poll_fn [NEW]
remote: tbf: Get rid of attribute poll_ts [NEW]

Actions #10

Updated by pespin almost 3 years ago

  • Related to Feature #5122: Choose SBA allocation offset based on AGCH queue load in the BTS added
Actions #11

Updated by pespin almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Patches were merged.

I created a separate ticket to track the SBA allocation fixed delay here:

Hence, closing this ticket. Other detailed ticket can be created in the future to track specific issues/improvements.


Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)