Bug #5262
closedBTS_Tests_LAPDm: TC_rr_response_frame_loss fails if run after TC_t200_n200
100%
Description
If run alone, TC_rr_response_frame_loss passes fine. However, if it is run after TC_t200_n200 (usual way as per control()), then it fails.
TC_rr_response_frame_loss(11): fail (none -> fail) reason: ""BTS_Tests.ttcn:700 : Tguard timeout""
Files
Updated by pespin over 2 years ago
In f_establish_mo() we send a SABM, we properly receive the EstInd in RSL but then we fail to match:
/* UA: SAPI = 0, R = 0, F = 1, M = 0, L = L of SABM. */ LAPDM.receive(t_PH_DATA(0, is_sacch, tr_LAPDm_UA(sapi, cr_MT_RSP, f:=true, l3:=l3_mo)));
TTCN3 logs:
16:45:07.828382 11 BTS_Tests_LAPDm.ttcn:277 Message enqueued on LAPDM from 12 @LAPDm_RAW_PT.LAPDm_ph_data : { sacch := false, sapi := 0, lapdm := { ab := { addr := { spare := '0'B, lpd := 0, sapi := 0, c_r := true, ea := true }, ctrl := { u := { u := '000'B, p_f := false, u2 := '00'B, spare := '11'B } }, len := 0, m := false, el := 1, payload := ''O } } } id 5 16:45:07.828396 12 LAPDm_RAW_PT.ttcn:514 Matching on port L1CTL succeeded. 16:45:07.828422 12 LAPDm_RAW_PT.ttcn:514 Receive operation on port L1CTL succeeded, message from system(): @L1CTL_Types.L1ctlDlMessage: { header := { msg_type := L1CTL_DATA_IND (3), flags := { padding := '0000000'B, f_done := false }, padding := '0000'O }, dl_info := { chan_nr := { u := { ch0 := RSL_CHAN_NR_Bm_ACCH (1) }, tn := 1 }, link_id := { c := FACCH_SDCCH (0), na := false, prio := SAPI0_PRIO_NORMAL (0), sapi := 0 }, arfcn := { pcs := false, uplink := false, spare := '0000'B, arfcn := 871 }, frame_nr := 160, rx_level := 0, snr := 0, num_biterr := 0, fire_crc := 2 }, payload := omit } id 18 16:45:07.828445 11 BTS_Tests_LAPDm.ttcn:277 Matching on port LAPDM .lapdm.ab.addr.c_r := true with false unmatched.lapdm.ab.ctrl.u.u := '000'B with '011'B unmatched.lapdm.ab.ctrl.u.p_f := false with true unmatched.lapdm.ab.payload := ''O with '77D30B2993'O unmatched: First message in the queue does not match the template:
Updated by pespin over 2 years ago
It seems osmo-bts is crashing when receiving that Lapdm message (I was also seeing a crash in nightly ttcn3-bts-tests but I didn't figure out yet where did occured):
20211013164507661 DRSL <0000> rsl.c:3636 (bts=0,trx=0,ts=0,ss=0) Rx RSL BCCH_INFO 20211013164507661 DRSL <0000> rsl.c:521 Rx RSL BCCH INFO (SI4, 23 bytes) 20211013164507661 DPAG <0005> paging.c:720 Paging SI update 20211013164507677 DRSL <0000> rsl.c:3702 (bts=0,trx=0,ts=1,pchan=TCH/F) ss=0 Rx RSL CHAN_ACTIV 20211013164507677 DRSL <0000> rsl.c:1854 (bts=0,trx=0,ts=1,ss=0) chan_nr=TCH/F on TS1 type=0x00=INITIAL mode=SIGNALLING 20211013164507677 DL1C <0006> l1sap.c:2038 (bts=0,trx=0,ts=1,ss=0) Activating channel TCH/F on TS1 20211013164507677 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating TCH/F 20211013164507677 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating SACCH/TF 20211013164507677 DL1C <0006> scheduler.c:1170 (bts=0,trx=0,ts=1) Set mode for TCH/F (rsl_cmode=3, tch_mode=0, handover=0) 20211013164507677 DTRX <000b> trx_if.c:256 phy0.0: Enqueuing TRX control command 'CMD NOHANDOVER 1 0' 20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for TCH/F 20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for SACCH/TF 20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for TCH/F 20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for SACCH/TF 20211013164507677 DL1C <0006> l1sap.c:839 (bts=0,trx=0,ts=1,ss=0) activate confirm chan_nr=TCH/F on TS1 trx=0 20211013164507677 DRSL <0000> rsl.c:1327 (bts=0,trx=0,ts=1,pchan=TCH/F) (ss=0) TCH_F Tx CHAN ACT ACK 20211013164507677 DTRX <000b> trx_if.c:672 phy0.0: Response message: 'RSP NOHANDOVER 0 1 0' 20211013164507777 DL1P <0007> sched_lchan_tchf.c:216 000159/00/03/06/03 (bts=0,trx=0,ts=1) TCH/F: Received bad data (55/104) 20211013164507796 DLLAPD <0011> lapd_core.c:949 ((bts=0,trx=0,ts=1,ss=0)[DCCH][0]) Store content res. 20211013164507796 DRSL <0000> rsl.c:3603 (bts=0,trx=0,ts=1,ss=0) Fwd RLL msg EST_IND from LAPDm to A-bis 0: stopped pid 8 with status 139 sleeping 1 seconds...
Updated by pespin over 2 years ago
If test TC_rr_response_frame_loss is run alone, osmo-bts doesn't crash.
Updated by pespin over 2 years ago
I enabled ASan:
20211013172315371 DRSL <0000> rsl.c:3702 (bts=0,trx=0,ts=1,pchan=TCH/F) ss=0 Rx RSL CHAN_ACTIV 20211013172315371 DRSL <0000> rsl.c:1854 (bts=0,trx=0,ts=1,ss=0) chan_nr=TCH/F on TS1 type=0x00=INITIAL mode=SIGNALLING 20211013172315371 DL1C <0006> l1sap.c:2038 (bts=0,trx=0,ts=1,ss=0) Activating channel TCH/F on TS1 20211013172315371 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating TCH/F 20211013172315371 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating SACCH/TF 20211013172315371 DL1C <0006> scheduler.c:1170 (bts=0,trx=0,ts=1) Set mode for TCH/F (rsl_cmode=3, tch_mode=0, handover=0) 20211013172315371 DTRX <000b> trx_if.c:256 phy0.0: Enqueuing TRX control command 'CMD NOHANDOVER 1 0' 20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for TCH/F 20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for SACCH/TF 20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for TCH/F 20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for SACCH/TF 20211013172315371 DL1C <0006> l1sap.c:839 (bts=0,trx=0,ts=1,ss=0) activate confirm chan_nr=TCH/F on TS1 trx=0 20211013172315372 DRSL <0000> rsl.c:1327 (bts=0,trx=0,ts=1,pchan=TCH/F) (ss=0) TCH_F Tx CHAN ACT ACK 20211013172315372 DTRX <000b> trx_if.c:672 phy0.0: Response message: 'RSP NOHANDOVER 0 1 0' 20211013172315442 DL1P <0007> sched_lchan_tchf.c:216 000159/00/03/06/03 (bts=0,trx=0,ts=1) TCH/F: Received bad data (55/104) 20211013172315461 DLLAPD <0011> lapd_core.c:949 ((bts=0,trx=0,ts=1,ss=0)[DCCH][0]) Store content res. 20211013172315461 DRSL <0000> rsl.c:3603 (bts=0,trx=0,ts=1,ss=0) Fwd RLL msg EST_IND from LAPDm to A-bis rbtree.c:82:24: runtime error: member access within null pointer of type 'struct rb_node' ASAN:DEADLYSIGNAL ================================================================= ==8==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7ff53be4cc96 bp 0x7ffd03c21950 sp 0x7ffd03c21920 T0) #0 0x7ff53be4cc95 in rb_insert_color (/usr/local/lib/libosmocore.so.17+0x100c95) #1 0x7ff53be04256 in __add_timer (/usr/local/lib/libosmocore.so.17+0xb8256) #2 0x7ff53be04460 in osmo_timer_add (/usr/local/lib/libosmocore.so.17+0xb8460) #3 0x7ff53be04856 in osmo_timer_schedule (/usr/local/lib/libosmocore.so.17+0xb8856) #4 0x7ff53be42ee5 in rate_ctr_timer_cb (/usr/local/lib/libosmocore.so.17+0xf6ee5) #5 0x7ff53be05b10 in osmo_timers_update (/usr/local/lib/libosmocore.so.17+0xb9b10) #6 0x7ff53be09395 in _osmo_select_main (/usr/local/lib/libosmocore.so.17+0xbd395) #7 0x7ff53be09446 in osmo_select_main (/usr/local/lib/libosmocore.so.17+0xbd446) #8 0x55a27d03205e in bts_main (/usr/local/bin/osmo-bts-trx+0x23205e) #9 0x55a27cf598e9 in main (/usr/local/bin/osmo-bts-trx+0x1598e9) #10 0x7ff53a8952e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #11 0x55a27cf58729 in _start (/usr/local/bin/osmo-bts-trx+0x158729) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (/usr/local/lib/libosmocore.so.17+0x100c95) in rb_insert_color ==8==ABORTING 0: stopped pid 8 with status 1
Updated by pespin over 2 years ago
- File osmo-bts.log osmo-bts.log added
I attach full osmo-bts.log file while running both tests.
Updated by pespin over 2 years ago
So to me it looks like some struct is freed at some point which still holds a timer set and which is not removed when freed.
Updated by pespin over 2 years ago
Since the crash happens in rate_ctr_timer_cb(), it's probably some rate_ctr_group which is somehow freed but not removed properly. So probably automatically freed by talloc_free() but no rate_ctr_group_free() is called on it.
Updated by pespin over 2 years ago
The most plausible ctr_group having issues is:
src/common/scheduler.c 619: l1ts->ctrs = rate_ctr_group_alloc(ts->trx, src/common/scheduler.c 663: rate_ctr_group_free(l1ts->ctrs);
This is called twice per TS (non-shadow and shadow TS). trx_sched_init_ts() does the alloc and trx_sched_clean_ts() does the free.
Updated by pespin over 2 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 90
The issue was lchan->lapdm_ch not being released under some conditions. Fixed by:
https://gerrit.osmocom.org/c/osmo-bts/+/25792
Updated by pespin over 2 years ago
Patch was merged, waiting for complete ttcn3-bts-tests nightly to run before closing.
Updated by pespin over 2 years ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100
tests are fixed now, closing.