Project

General

Profile

Bug #5020

osmo-pcu: Poll and SBA allocation improvements and fixes

Added by pespin 24 days ago. Updated 1 day ago.

Status:
New
Priority:
Normal
Assignee:
Target version:
-
Start date:
02/11/2021
Due date:
% Done:

0%

Spec Reference:

Description

Current implementations are quite basic and as far as i can tell prone to errors and sometimes unable to schedule CTRL polling on TBFs (RRBP) or SBAs (RACH).

  • SBA blocks allocated (reserved) are stored in SBAController class under gprs_rlcmac_bts object [bts_sba(bts)]. It sets the SBA block on FN + "AGCH_START_OFFSET=52" (also theoretically linked to timer X2002).
  • SBAController doesn't seem to have into account that a TBF may have already reserved that FN when allocating a SBA (hence either there may be a collision at that time).
  • Current allocation of TBF polls consist on always allocating poll on RRBP as FN+13. This has the advantage that there's no need to check for collisions between different TBFs on a given PDCH, since on a given FN only a TBF is selected/executed and hence only that one can even reserve FN+13 (because the +13 is fixed offset). However, it may happen sometimes that a TBF tries to allocate FN+13 (tbf->check_polling()/set_polling()), but that FNa+13 is already taken by a previous SBA which allocated it to FNb+52 such that FNb=FNa-52+13, and hence the TBF fails to allocate a poll for that CTRL message, which in the end it means it ends up failing to construct the CTRL message and the scheduler ends up sending a RLCMAC Dummy Block instead.
  • Since that can happen, the tbf->check_polling() should try harder finding a poll FN by trying other RRBP values than FN+13.
  • TBF class implementation of polling seems to only support 1 active poll at a time. That means, If for instance scheduler selects the same TBF at let's say FNa=10 and immediatelly afeter at FNb=10+4=14, If both where to request poll allocation, it would fail in tbf->check_polling() due to "poll_state != GPRS_RLCMAC_POLL_NONE" being hit. That's basically because the TBF class stores the active poll FN in class attribute "tbf->poll_fn/ts". So ideally there should be some sort of linkedlist or circular buffer or whatever supporting several active pollings. This becomes more important at the time we start using bigger values for RRBP, where the "poll active" state can be there for longer periods and hence probability to hti this issue increments.

Related specs:
3GPP TS 44.060 "10.4.5 Relative Reserved Block Period (RRBP) field"


Related issues

Related to OsmoPCU - Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF pollNew02/17/2021

Associated revisions

Revision 464627c0 (diff)
Added by pespin 1 day ago

pdch: Silently ignore DATA.ind with len=0

Since recently (see Depends below), BTS side submits DATA.ind with len=0
to announce nothing was received on that UL block FN. This will allow
osmo-pcu track time more accurately, and use this information to quickly
find out if a UL block was expected as requested by RRBP or USF poll and
increment counters such as N3101 (finally being able to properly
implement timers such as T3619).

Depends: osmo-bts.git Change-Id I343c7a721dab72411edbca816c8864926bc329fb
Related: OS#5020
Change-Id: I17c28abf63b153448b533971ac5cf2e48daadea8

Revision 7a14d81b (diff)
Added by pespin 1 day ago

Track TDMA clock with DATA.ind instead of TIME.ind

Since recently (see Depends below), BTS side submits DATA.ind with len=0
to announce nothing was received on that UL block FN. This will allow
osmo-pcu track time more accurately, and use this information to quickly
find out if a UL block was expected as requested by RRBP or USF poll and
increment counters such as N3101 (finally being able to properly
implement timers such as T3619).

Depends: osmo-bts.git Change-Id I343c7a721dab72411edbca816c8864926bc329fb
Related: OS#5020

Change-Id: Ibc495173119465e74f726ddc36e312334e6dc0fd

Revision 60ed844b (diff)
Added by pespin 1 day ago

pcu: transmit PCUIF DATA.ind with len=0 when no UL data to transmit

PCUIF will be updated to always send DATA.ind for each expected block FN
on any activated PDCH slot, irrespective of whether valid data was
received or not, similarly to what's done already for TRXDv1 NOPE.ind in
TRXD and TCH channels in OsmoBTS. The aim at this change is to be able to
track TDMA clock in an accurate way without hops, and hence be able to
detect on time whether expected UL blocks (SF, RRBP poll) didn't arrive.

Older osmo-pcu versions can cope well with this change, they will simply
print an error upon ach data_len=0 messages received and submit a GSMTAP
block, then discard it, so tests still pass.
Nevertheless, a new module parameter is added to disable this new
behavior in order to avoid logs and pcap files ending up clogged with
uneeded information until a new osmo-pcu release appears.

Related: OS#5020
Change-Id: Ib4f97a9bcfa68230945effeb6412218faa64ec78

History

#1 Updated by pespin 24 days ago

The last point (TBF only supporting 1 active poll FN) is worse than expected, because PollController seems to be timing out non-acked polls for a larger period of time than the FN+13 one:

void bts_set_current_frame_number(struct gprs_rlcmac_bts *bts, int fn)
{
    /* The UL frame numbers lag 3 behind the DL frames and the data
     * indication is only sent after all 4 frames of the block have been
     * received. Sometimes there is an idle frame between the end of one
     * and start of another frame (every 3 blocks).  So the timeout should
     * definitely be there if we're more than 8 frames past poll_fn. Let's
     * stay on the safe side and say 13 or more. An additional delay can
     * happen due to the block processing time in the DSP, so the delay of
     * decoded blocks relative to the timing clock can be much larger.
     * Values up to 50 frames have been observed under load. */
    const static int max_delay = 60;

    bts->cur_fn = fn;
    bts->pollController->expireTimedout(bts->cur_fn, max_delay);
}

For the ACKed one's is fine, since gprs_rlcmac_pdch::rcv_control_ack() calls:

TBF_POLL_SCHED_UNSET(tbf);

#2 Updated by pespin 18 days ago

  • Related to Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF poll added

#3 Updated by pespin 3 days ago

Some more facts after further investigation:

  • bts->cur_fn holds the current/last MAC block received on the UL, so it drives the clock.
  • The bts->cur_fn ield is updated in 2 code paths:
    • bts_set_current_frame_number: the main way, triggered by PCUIF time_ind received on every start of MAC block FN from BTS (even when direct phy is used, TIME.IND in lower layers are only received in BTS)
    • bts_set_current_block_frame_number: used only by sysmo direct PHY (not used by other direct phy like oc2g) to correct the clock (bts->cur_fn) in case there was some drift detected between what comes from PCUIF info_ind and its own direct phy.

So, the idea here would be to:
1- Make sure generic osmo-pcu code supports receiving data_len=0 minimally: "pcu_rx_data() => pcu_rx_data_ind_pdtch() => pdch->rcv_block()" This should already be fine OK since mcs_get_by_size_ul() will return UNKNOWN=0 and early return printing an error. So this first step is done.
2- Make sure all phys always send data_ind even if no data received, or corrupted, by setting data_len = 0 (aka NOPE.ind)
3- In osmo-pcu, ignore received time_ind messages (at most cross-check the FN and print warning otherwise), and update the clock (bts_set_current_frame_number) during data_ind instead.
4- In osmo-pcu upper layers, stored scheduled USF for that BTS+TRX+TS+FN and if data_ind data_len=0 (NOPE.ind), then increment N3101 for that TBF.

Step 2 (backend sending NOPE.ind data_len=0) requires:
2.1- Allow direct phy osmo-pcu code to submit corrupt/empty data_ind. For example, for sysmo direct phy:

handle_ph_data_ind (sysmo_l1_if.c)
    pcu_rx_block_time
        bts_set_current_block_frame_number    <--- RX FN number is corrected here
            bts->pollController->expireTimedout(fn, max_delay);
    if (data_ind->msgUnitParam.u8Size == 0)
        return -1;                              <--- NOPE.ind is discarded here
    " /* drop incomplete UL block */ if (data_ind->msgUnitParam.u8Buffer[0] != GsmL1_PdtchPlType_Full) break;    <--- NOPE.ind is discarded here
    pcu_rx_data_ind_pdtch

2.1- Allow indrect phy from osmo-bts to send PCUIF INFO_IND with data_len=0 (osmo-bts.git/src/common/l1sap.c):
l1sap_ph_data_ind:
    if (ts_is_pdch(&trx->ts[tn])) {
        ...
        /* don't send bad frames to PCU */
        if (len == 0)
            return -EINVAL;            <--- NOPE.ind is discarded here
        /* drop incomplete UL block */
        if (pr_info != PRES_INFO_BOTH)
            return 0;                  <--- NOPE.ind is discarded here
        pcu_tx_data_ind(...)               <----- This function contains a memcpy(x, y, len) which must be done conditonal on len>0 (undefined behavior of memcpy for len=0).
        reutrn 0;
    }

#4 Updated by pespin 1 day ago

I updated osmo-bts-trx, osmo-pcu and TTCN3 PCUIF_Components to send all DATA.ind even if len=0. Related patches:

osmo-bts.git: osmo-pcu.git: osmo-ttcn3-hacks.git: docker-plaground.git:

I did some manual tests with osmo-bts-trx and osmo-pcu patches applied and everything looks good.
I also checked that TTCN3 patches still work fine with older versions of osmo-pcu.

osmo-ttcn3-hacks + docker can already be merged. The first osmo-pcu patch silently ignorig data_len=0 too. Other ones I want to first implement it for all BTS types and are marked as WIP in gerrit.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)