Project

General

Profile

Bug #5262

BTS_Tests_LAPDm: TC_rr_response_frame_loss fails if run after TC_t200_n200

Added by pespin 13 days ago. Updated 8 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
10/13/2021
Due date:
% Done:

100%

Spec Reference:

Description

If run alone, TC_rr_response_frame_loss passes fine. However, if it is run after TC_t200_n200 (usual way as per control()), then it fails.

TC_rr_response_frame_loss(11): fail (none -> fail) reason: ""BTS_Tests.ttcn:700 : Tguard timeout"" 
osmo-bts.log osmo-bts.log 1.84 MB pespin, 10/13/2021 05:26 PM

History

#1 Updated by pespin 13 days ago

  • Description updated (diff)

#2 Updated by pespin 13 days ago

In f_establish_mo() we send a SABM, we properly receive the EstInd in RSL but then we fail to match:

    /* UA: SAPI = 0, R = 0, F = 1, M = 0, L = L of SABM. */
    LAPDM.receive(t_PH_DATA(0, is_sacch, tr_LAPDm_UA(sapi, cr_MT_RSP, f:=true, l3:=l3_mo)));

TTCN3 logs:

16:45:07.828382 11 BTS_Tests_LAPDm.ttcn:277 Message enqueued on LAPDM from 12 @LAPDm_RAW_PT.LAPDm_ph_data : { sacch := false, sapi := 0, lapdm := { ab := { addr := { spare := '0'B, lpd := 0, sapi := 0, c_r := true, ea := true }, ctrl := { u := { u := '000'B, p_f := false, u2 := '00'B, spare := '11'B } }, len := 0, m := false, el := 1, payload := ''O } } } id 5
16:45:07.828396 12 LAPDm_RAW_PT.ttcn:514 Matching on port L1CTL succeeded.
16:45:07.828422 12 LAPDm_RAW_PT.ttcn:514 Receive operation on port L1CTL succeeded, message from system(): @L1CTL_Types.L1ctlDlMessage: { header := { msg_type := L1CTL_DATA_IND (3), flags := { padding := '0000000'B, f_done := false }, padding := '0000'O }, dl_info := { chan_nr := { u := { ch0 := RSL_CHAN_NR_Bm_ACCH (1) }, tn := 1 }, link_id := { c := FACCH_SDCCH (0), na := false, prio := SAPI0_PRIO_NORMAL (0), sapi := 0 }, arfcn := { pcs := false, uplink := false, spare := '0000'B, arfcn := 871 }, frame_nr := 160, rx_level := 0, snr := 0, num_biterr := 0, fire_crc := 2 }, payload := omit } id 18
16:45:07.828445 11 BTS_Tests_LAPDm.ttcn:277 Matching on port LAPDM .lapdm.ab.addr.c_r := true with false unmatched.lapdm.ab.ctrl.u.u := '000'B with '011'B unmatched.lapdm.ab.ctrl.u.p_f := false with true unmatched.lapdm.ab.payload := ''O with '77D30B2993'O unmatched: First message in the queue does not match the template:

#3 Updated by pespin 13 days ago

It seems osmo-bts is crashing when receiving that Lapdm message (I was also seeing a crash in nightly ttcn3-bts-tests but I didn't figure out yet where did occured):

20211013164507661 DRSL <0000> rsl.c:3636 (bts=0,trx=0,ts=0,ss=0) Rx RSL BCCH_INFO
20211013164507661 DRSL <0000> rsl.c:521  Rx RSL BCCH INFO (SI4, 23 bytes)
20211013164507661 DPAG <0005> paging.c:720 Paging SI update
20211013164507677 DRSL <0000> rsl.c:3702 (bts=0,trx=0,ts=1,pchan=TCH/F) ss=0 Rx RSL CHAN_ACTIV
20211013164507677 DRSL <0000> rsl.c:1854 (bts=0,trx=0,ts=1,ss=0) chan_nr=TCH/F on TS1 type=0x00=INITIAL mode=SIGNALLING
20211013164507677 DL1C <0006> l1sap.c:2038 (bts=0,trx=0,ts=1,ss=0) Activating channel TCH/F on TS1
20211013164507677 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating TCH/F
20211013164507677 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating SACCH/TF
20211013164507677 DL1C <0006> scheduler.c:1170 (bts=0,trx=0,ts=1) Set mode for TCH/F (rsl_cmode=3, tch_mode=0, handover=0)
20211013164507677 DTRX <000b> trx_if.c:256 phy0.0: Enqueuing TRX control command 'CMD NOHANDOVER 1 0'
20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for TCH/F
20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for SACCH/TF
20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for TCH/F
20211013164507677 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for SACCH/TF
20211013164507677 DL1C <0006> l1sap.c:839 (bts=0,trx=0,ts=1,ss=0) activate confirm chan_nr=TCH/F on TS1 trx=0
20211013164507677 DRSL <0000> rsl.c:1327 (bts=0,trx=0,ts=1,pchan=TCH/F) (ss=0) TCH_F Tx CHAN ACT ACK
20211013164507677 DTRX <000b> trx_if.c:672 phy0.0: Response message: 'RSP NOHANDOVER 0 1 0'
20211013164507777 DL1P <0007> sched_lchan_tchf.c:216 000159/00/03/06/03 (bts=0,trx=0,ts=1) TCH/F: Received bad data (55/104)
20211013164507796 DLLAPD <0011> lapd_core.c:949 ((bts=0,trx=0,ts=1,ss=0)[DCCH][0]) Store content res.
20211013164507796 DRSL <0000> rsl.c:3603 (bts=0,trx=0,ts=1,ss=0) Fwd RLL msg EST_IND from LAPDm to A-bis
0: stopped pid 8 with status 139
sleeping 1 seconds...

#4 Updated by pespin 13 days ago

If test TC_rr_response_frame_loss is run alone, osmo-bts doesn't crash.

#5 Updated by pespin 13 days ago

I enabled ASan:

20211013172315371 DRSL <0000> rsl.c:3702 (bts=0,trx=0,ts=1,pchan=TCH/F) ss=0 Rx RSL CHAN_ACTIV
20211013172315371 DRSL <0000> rsl.c:1854 (bts=0,trx=0,ts=1,ss=0) chan_nr=TCH/F on TS1 type=0x00=INITIAL mode=SIGNALLING
20211013172315371 DL1C <0006> l1sap.c:2038 (bts=0,trx=0,ts=1,ss=0) Activating channel TCH/F on TS1
20211013172315371 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating TCH/F
20211013172315371 DL1C <0006> scheduler.c:1107 (bts=0,trx=0,ts=1,ss=0) Activating SACCH/TF
20211013172315371 DL1C <0006> scheduler.c:1170 (bts=0,trx=0,ts=1) Set mode for TCH/F (rsl_cmode=3, tch_mode=0, handover=0)
20211013172315371 DTRX <000b> trx_if.c:256 phy0.0: Enqueuing TRX control command 'CMD NOHANDOVER 1 0'
20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for TCH/F
20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 uplink for SACCH/TF
20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for TCH/F
20211013172315371 DL1C <0006> scheduler.c:1236 (bts=0,trx=0,ts=1,ss=0) Set A5/0 downlink for SACCH/TF
20211013172315371 DL1C <0006> l1sap.c:839 (bts=0,trx=0,ts=1,ss=0) activate confirm chan_nr=TCH/F on TS1 trx=0
20211013172315372 DRSL <0000> rsl.c:1327 (bts=0,trx=0,ts=1,pchan=TCH/F) (ss=0) TCH_F Tx CHAN ACT ACK
20211013172315372 DTRX <000b> trx_if.c:672 phy0.0: Response message: 'RSP NOHANDOVER 0 1 0'
20211013172315442 DL1P <0007> sched_lchan_tchf.c:216 000159/00/03/06/03 (bts=0,trx=0,ts=1) TCH/F: Received bad data (55/104)
20211013172315461 DLLAPD <0011> lapd_core.c:949 ((bts=0,trx=0,ts=1,ss=0)[DCCH][0]) Store content res.
20211013172315461 DRSL <0000> rsl.c:3603 (bts=0,trx=0,ts=1,ss=0) Fwd RLL msg EST_IND from LAPDm to A-bis
rbtree.c:82:24: runtime error: member access within null pointer of type 'struct rb_node'
ASAN:DEADLYSIGNAL
=================================================================
==8==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7ff53be4cc96 bp 0x7ffd03c21950 sp 0x7ffd03c21920 T0)
    #0 0x7ff53be4cc95 in rb_insert_color (/usr/local/lib/libosmocore.so.17+0x100c95)
    #1 0x7ff53be04256 in __add_timer (/usr/local/lib/libosmocore.so.17+0xb8256)
    #2 0x7ff53be04460 in osmo_timer_add (/usr/local/lib/libosmocore.so.17+0xb8460)
    #3 0x7ff53be04856 in osmo_timer_schedule (/usr/local/lib/libosmocore.so.17+0xb8856)
    #4 0x7ff53be42ee5 in rate_ctr_timer_cb (/usr/local/lib/libosmocore.so.17+0xf6ee5)
    #5 0x7ff53be05b10 in osmo_timers_update (/usr/local/lib/libosmocore.so.17+0xb9b10)
    #6 0x7ff53be09395 in _osmo_select_main (/usr/local/lib/libosmocore.so.17+0xbd395)
    #7 0x7ff53be09446 in osmo_select_main (/usr/local/lib/libosmocore.so.17+0xbd446)
    #8 0x55a27d03205e in bts_main (/usr/local/bin/osmo-bts-trx+0x23205e)
    #9 0x55a27cf598e9 in main (/usr/local/bin/osmo-bts-trx+0x1598e9)
    #10 0x7ff53a8952e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #11 0x55a27cf58729 in _start (/usr/local/bin/osmo-bts-trx+0x158729)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/local/lib/libosmocore.so.17+0x100c95) in rb_insert_color
==8==ABORTING
0: stopped pid 8 with status 1

#6 Updated by pespin 13 days ago

I attach full osmo-bts.log file while running both tests.

#7 Updated by pespin 13 days ago

So to me it looks like some struct is freed at some point which still holds a timer set and which is not removed when freed.

#8 Updated by pespin 13 days ago

Since the crash happens in rate_ctr_timer_cb(), it's probably some rate_ctr_group which is somehow freed but not removed properly. So probably automatically freed by talloc_free() but no rate_ctr_group_free() is called on it.

#9 Updated by pespin 13 days ago

The most plausible ctr_group having issues is:

src/common/scheduler.c
619:    l1ts->ctrs = rate_ctr_group_alloc(ts->trx,
src/common/scheduler.c
663:    rate_ctr_group_free(l1ts->ctrs);

This is called twice per TS (non-shadow and shadow TS). trx_sched_init_ts() does the alloc and trx_sched_clean_ts() does the free.

#10 Updated by pespin 12 days ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 90

The issue was lchan->lapdm_ch not being released under some conditions. Fixed by:
https://gerrit.osmocom.org/c/osmo-bts/+/25792

#11 Updated by pespin 12 days ago

Patch was merged, waiting for complete ttcn3-bts-tests nightly to run before closing.

#12 Updated by pespin 8 days ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

tests are fixed now, closing.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)