Bug #6208
closedmodem: "UL_TBF[0x55a5d78de0]{ASSIGN}: Event RX_UL_ACK_NACK not permitted"
0%
Description
I am observing this error while testing on the mssdr-ms host:
20231005173418051 DLLC DEBUG llc.c:145 modem_llc_prim_down_cb(): Rx GRR-UNITDATA.request ll=[01 c0 09 08 16 08 09 10 10 00 00 00 10 10 2e d7 b8 ] 20231005173418051 DRLCMAC INFO rlcmac_prim.c:522 Rx from upper layers: GRR-UNITDATA.request 20231005173418092 DRLCMAC INFO fsm.c:456 UL_TBF[0x55a5d72f00]{NEW}: Allocated 20231005173418092 DRLCMAC INFO fsm.c:456 UL_TBF_ASS[0x55a5d725d0]{IDLE}: Allocated 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:711 UL_TBF_ASS[0x55a5d725d0]{IDLE}: Received Event START_FROM_DL_TBF 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:301 UL_TBF[0x55a5d72f00]{NEW}: Received Event UL_ASS_START 20231005173418092 DRLCMAC INFO tbf_ul_fsm.c:148 UL_TBF[0x55a5d72f00]{NEW}: state_chg to ASSIGN 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:303 UL_TBF_ASS[0x55a5d725d0]{IDLE}: state_chg to WAIT_PKT_UL_ASS 20231005173418188 DRLCMAC INFO rlcmac.c:793 TS=3 FN=1551883 Rx Pkt UL ACK/NACK 20231005173418189 DRLCMAC INFO tbf_ul.c:317 UL_TBF[0x55a5d72f00]{ASSIGN}: Received Event RX_UL_ACK_NACK 20231005173418189 DRLCMAC ERROR tbf_ul.c:317 UL_TBF[0x55a5d72f00]{ASSIGN}: Event RX_UL_ACK_NACK not permitted 20231005173418189 DRLCMAC INFO tbf_ul.c:273 TBF(UL:NR-2:TLLI-91223344) Tx CS update: CS-1 -> CS-2 20231005173418230 DRLCMAC INFO rlcmac.c:793 TS=3 FN=1551892 Rx Pkt DL Dummy Ctrl Block 20231005173418231 DRLCMAC INFO fsm.c:568 UL_TBF_ASS[0x55a5d725d0]{WAIT_PKT_UL_ASS}: Deallocated 20231005173418231 DRLCMAC INFO tbf_ul_fsm.c:74 UL_TBF[0x55a5d72f00]{ASSIGN}: Send L1CTL-CFG_UL_TBF.req ul_tbf_nr=2 (release) 20231005173418231 DRLCMAC INFO fsm.c:568 UL_TBF[0x55a5d72f00]{ASSIGN}: Deallocated
Please find the PCAPs (BTS side + MS side) attached.
Files
Related issues
Updated by fixeria 7 months ago
As can be seen from the MS side PCAP (GSMTAP logging), there is another weird problem:
20231005173417913 DRLCMAC INFO tbf_ul.c:1161 UL_TBF[0x55a5d72f00]{FLOW}: Received Event FIRST_UL_DATA_SENT 20231005173417913 DRLCMAC INFO tbf_ul_fsm.c:179 UL_TBF[0x55a5d72f00]{FLOW}: First UL block sent, stop T3164 20231005173417913 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x55a5d72f00]{FLOW}: Received Event LAST_UL_DATA_SENT 20231005173417913 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x55a5d72f00]{FLOW}: Last UL block sent (CV=0), start T3182 20231005173417913 DRLCMAC INFO tbf_ul_fsm.c:201 UL_TBF[0x55a5d72f00]{FLOW}: state_chg to FINISHED 20231005173417949 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x55a5d72f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231005173417949 DRLCMAC INFO tbf_ul_fsm.c:241 UL_TBF[0x55a5d72f00]{FINISHED}: Data block with CV=0 retransmit attempts=1 20231005173417949 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x55a5d72f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231005173417972 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x55a5d72f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231005173417972 DRLCMAC INFO tbf_ul_fsm.c:241 UL_TBF[0x55a5d72f00]{FINISHED}: Data block with CV=0 retransmit attempts=2 20231005173417973 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x55a5d72f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231005173417991 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x55a5d72f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231005173417991 DRLCMAC INFO tbf_ul_fsm.c:241 UL_TBF[0x55a5d72f00]{FINISHED}: Data block with CV=0 retransmit attempts=3 20231005173417991 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x55a5d72f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231005173418009 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x55a5d72f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231005173418009 DRLCMAC INFO tbf_ul_fsm.c:241 UL_TBF[0x55a5d72f00]{FINISHED}: Data block with CV=0 retransmit attempts=4 20231005173418009 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x55a5d72f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231005173418010 DRLCMAC NOTICE tbf_ul_fsm.c:245 UL_TBF[0x55a5d72f00]{FINISHED}: TBF establishment failure (Data block with CV=0 retransmit attempts=4) 20231005173418010 DRLCMAC INFO fsm.c:568 UL_TBF_ASS[0x55a5d725d0]{IDLE}: Deallocated 20231005173418010 DRLCMAC INFO tbf_ul_fsm.c:74 UL_TBF[0x55a5d72f00]{FINISHED}: Send L1CTL-CFG_UL_TBF.req ul_tbf_nr=1 (release) 20231005173418010 DRLCMAC INFO fsm.c:568 UL_TBF[0x55a5d72f00]{FINISHED}: Deallocated
Updated by pespin 7 months ago
Regarding your 1st comment, it's another issue some I'm ignoring it here now. Describing the issue from the initial ticket, which also has a pcap mssdr_gprs_bts.pcapng:
- frame_nr=44323 MS sends first and last UL block (CV=0) of the UL TBF containing the GMM Identity Respnse (IMEI). Network uses the already existing DL TBF to send the next GMM Identity Request (IMSI).
- frame_nr=44393 PCU sends PKT UL ACK/NACK FinalACK=1 [RRBP] at gsm_fn=1551849 telling the MS it received all the data, and expects PKT CTRL ACK at N+13=1551862
- [Missing: PKT CTRL ACK transmitted by MS to PCU at gsm_fn=1551862, it was probably dropped]
- frame_nr=44860 PCU sends PKT UPLINK ACK/NACK [RRBP] which confirms the MS didn't receive the PKT CTRL ACK, so it is trying to ack everything again.
- frame_nr=44875 MS uses the PKT DL ACK/NACK to set "Channel_request_description" field and hence ask for a new UL TBF.
- frame_nr=44938 PCU sends PKT UL ASS [RRBP] to assign a new UL TBF.
- frame_nr=45329 MS PKT CTRL ACKs the PKT UL ASS.
So the main problem here is that the MS is finishing the UL TBF after sending the PKT CTRL ACK, and if the PCU doesn't received that PKT CTRL ACK, both sides become out of sync: MS thinks the UL TBF is finished, and PCU tries to ACK everything again because it doesn't know if the PKT CTRL ACK was lost or if MS didn't receive the PKT UL ACK/NACK in first place.
So while PCU is trying to finish the UL TBF, MS considers it finished and asks PCU for a new UL TBF using the existing DL TBF (through DL ACK/NACK in frame_nr=44875). That frame_nr=44875 (PKT DL ACK/NACK) is the responisble for triggering this:
20231005173418051 DLLC DEBUG llc.c:145 modem_llc_prim_down_cb(): Rx GRR-UNITDATA.request ll=[01 c0 09 08 16 08 09 10 10 00 00 00 10 10 2e d7 b8 ] 20231005173418051 DRLCMAC INFO rlcmac_prim.c:522 Rx from upper layers: GRR-UNITDATA.request 20231005173418092 DRLCMAC INFO fsm.c:456 UL_TBF[0x55a5d72f00]{NEW}: Allocated 20231005173418092 DRLCMAC INFO fsm.c:456 UL_TBF_ASS[0x55a5d725d0]{IDLE}: Allocated 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:711 UL_TBF_ASS[0x55a5d725d0]{IDLE}: Received Event START_FROM_DL_TBF 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:301 UL_TBF[0x55a5d72f00]{NEW}: Received Event UL_ASS_START 20231005173418092 DRLCMAC INFO tbf_ul_fsm.c:148 UL_TBF[0x55a5d72f00]{NEW}: state_chg to ASSIGN 20231005173418092 DRLCMAC INFO tbf_ul_ass_fsm.c:303 UL_TBF_ASS[0x55a5d725d0]{IDLE}: state_chg to WAIT_PKT_UL_ASS
Then, since the PCU still keeps to ack the old UL TBF, MS says this when reciving it:
20231005173418188 DRLCMAC INFO rlcmac.c:793 TS=3 FN=1551883 Rx Pkt UL ACK/NACK 20231005173418189 DRLCMAC INFO tbf_ul.c:317 UL_TBF[0x55a5d72f00]{ASSIGN}: Received Event RX_UL_ACK_NACK 20231005173418189 DRLCMAC ERROR tbf_ul.c:317 UL_TBF[0x55a5d72f00]{ASSIGN}: Event RX_UL_ACK_NACK not permitted
Because that's indeed for the older UL TBF.
Updated by pespin 7 months ago
TS 44.060 9.3.2.4.2 :
If the PACKET UPLINK ACK/NACK message has the Final Ack Indicator bit set to "1" and the mobile station does not initiate the establishment of a new uplink TBF according to one of the procedures described above, the mobile station shall transmit the PACKET CONTROL ACKNOWLEDGEMENT message and release the TBF.
so the MS is acting correctly in the sense that it's freeing the UL TBF after theoretically having sent the PKT CTRL ACK (which doesn't seem to arrive to the PCU).
The question to investigate is: why didn't the PKT CTRL ACK reach the PCU?
Updated by pespin 7 months ago
Logs from osmo-bts/osmo-pcu regarding the never received PKT CTRL ACK in gsm_fn=1551862
44767 17:34:18.095240221 Oct 5, 2023 19:34:18.095240221 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 211 1551862/1170/00/34/38 (bts=0,trx=0,ts=3) PDTCH: Received PDTCH bid=0 44772 17:34:18.099987467 Oct 5, 2023 19:34:18.099987467 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 211 1551863/1170/01/35/39 (bts=0,trx=0,ts=3) PDTCH: Received PDTCH bid=1 44779 17:34:18.104640189 Oct 5, 2023 19:34:18.104640189 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 211 1551864/1170/02/36/40 (bts=0,trx=0,ts=3) PDTCH: Received PDTCH bid=2 44807 17:34:18.109678724 Oct 5, 2023 19:34:18.109678724 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 211 1551865/1170/03/37/41 (bts=0,trx=0,ts=3) PDTCH: Received PDTCH bid=3 44808 17:34:18.109869726 Oct 5, 2023 19:34:18.109869726 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 245 1551865/1170/03/37/41 (bts=0,trx=0,ts=3) PDTCH: Received bad data (rc=-5, BER 0/0) ending at fn=81/104 44809 17:34:18.109960165 Oct 5, 2023 19:34:18.109960165 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 218 1551862/1170/00/34/38 Rx PH-DATA.ind chan_nr=PDCH on TS3 link_id=0x00 len=0 44810 17:34:18.110042684 Oct 5, 2023 19:34:18.110042684 CEST 127.0.0.1 54671 127.0.0.1 4729 GSMTAP 202 Sending data indication: sapi=PDTCH arfcn=979 block=6 data=
Updated by pespin 7 months ago
After looking at the MS pcap, it seems both issues reported in this ticket are actually related. The "TBF establishment failure (Data block with CV=0 retransmit attempts=4)" is the culprit.
That line basically says that the UL TBF is freed, at frame_nr=788.
Then, at frame_nr=801:
801 17:34:18.032486 Oct 5, 2023 19:34:18.032486000 CEST 127.0.0.1 42239 127.0.1.1 4729 GSMTAP 201 TS=3 FN=1551849 Rx Pkt UL ACK/NACK: UL_TBF TFI=0 not found
Since the TBF is gone due to retransmit problem of course it won't send a PKT CTRL ACK for that UL ACK/NACK.
So the main issue here is the fn-advance making the CV=0 retransmit 4 times and hence free the UL TBF.
Updated by pespin 7 months ago
- Status changed from New to Feedback
- Assignee changed from pespin to fixeria
The problem of the MS freeing the UL TBF after retransmitting 4 CV=0 blocks should be fixed here:
https://gerrit.osmocom.org/c/libosmo-gprs/+/34679 tbf_ul_fsm: Avoid freeing tbf after 4 CV=0 retrans in acknowledged mode
That limitation really only applied when in Unacknowledged operation mode, which we don't really support/use.
There's still the more general problem of huge delay between UL/DL (fn-advance/rts-advance), which means up to 4-5 DL blocks can be scheduled before having received the answer to the first one in PCU, which is quite a lot and for sure undesired. Assigning this ticket to fixeria since there was some extra testing he wanted to do regarding this topic.
Updated by fixeria 7 months ago
- Status changed from Feedback to Resolved
Thanks, I cherry-picked the patch and deployed it on the mssdr-ms:
20231006213325367 DRLCMAC INFO tbf_ul.c:1161 UL_TBF[0x557d247f00]{FLOW}: Received Event FIRST_UL_DATA_SENT 20231006213325367 DRLCMAC INFO tbf_ul_fsm.c:179 UL_TBF[0x557d247f00]{FLOW}: First UL block sent, stop T3164 20231006213325368 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x557d247f00]{FLOW}: Received Event LAST_UL_DATA_SENT 20231006213325368 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x557d247f00]{FLOW}: Last UL block sent (CV=0), start T3182 20231006213325368 DRLCMAC INFO tbf_ul_fsm.c:201 UL_TBF[0x557d247f00]{FLOW}: state_chg to FINISHED 20231006213325404 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x557d247f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231006213325404 DRLCMAC INFO tbf_ul_fsm.c:238 UL_TBF[0x557d247f00]{FINISHED}: Data block with CV=0 retransmit attempts=1 20231006213325404 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x557d247f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231006213325427 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x557d247f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231006213325427 DRLCMAC INFO tbf_ul_fsm.c:238 UL_TBF[0x557d247f00]{FINISHED}: Data block with CV=0 retransmit attempts=2 20231006213325427 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x557d247f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231006213325445 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x557d247f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231006213325446 DRLCMAC INFO tbf_ul_fsm.c:238 UL_TBF[0x557d247f00]{FINISHED}: Data block with CV=0 retransmit attempts=3 20231006213325446 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x557d247f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231006213325464 DRLCMAC INFO tbf_ul.c:1163 UL_TBF[0x557d247f00]{FINISHED}: Received Event LAST_UL_DATA_SENT 20231006213325464 DRLCMAC INFO tbf_ul_fsm.c:238 UL_TBF[0x557d247f00]{FINISHED}: Data block with CV=0 retransmit attempts=4 20231006213325464 DRLCMAC INFO tbf_ul_fsm.c:105 UL_TBF[0x557d247f00]{FINISHED}: Last UL block sent (CV=0), start T3182 20231006213325487 DRLCMAC INFO rlcmac.c:793 TS=3 FN=1655277 Rx Pkt UL ACK/NACK 20231006213325487 DRLCMAC INFO tbf_ul.c:317 UL_TBF[0x557d247f00]{FINISHED}: Received Event RX_UL_ACK_NACK 20231006213325487 DRLCMAC INFO tbf_ul_fsm.c:232 UL_TBF[0x557d247f00]{FINISHED}: state_chg to RELEASING
Things seem to look a lot better now. We were cutting it really close :)
- The
Rx Pkt UL ACK/NACK
happens ~13 ms after the 4th retransmit attempt. - Without your patch we used to free the TBF right after the 4th retransmit attempt.