Bug #5324

closed

MULTI BSS Handover: Target BTS is NULL, sigsegv in chan_counts_for_bts()

Added by keith over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Handover
Target version: -
Start date: 11/23/2021
Due date:
% Done: 100%
Spec Reference:

Description

It looks like something with Multi BSS handover is broken:

DHODEC handover_decision_2.c:1470 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) MEASUREMENT REPORT (1 neighbors)                               
DHODEC handover_decision_2.c:1475 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f)   0: arfcn=247 bsic=63 neigh_idx=0 rxlev=63 flags=0            
DHODEC handover_decision_2.c:1522 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) Avg RX level = -47 dBm, +0 dBm AFS bias = -47 dBm; Avg RX quality = 0, +0 AFS bias = 0
DHODEC handover_logic.c:241 (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) HO-none: There are explicit neighbors configured for this cell                                      
DHODEC handover_logic.c:254 (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) HO-none: Found remote target cell(s) CGI[1]:{334-07-274-101}                                        

Program received signal SIGSEGV, Segmentation fault.
0x00005555555b98de in chan_counts_for_bts (bts_counts=bts_counts@entry=0x7fffffff7a70, bts=0x0) at chan_counts.c:133  

From a previous run:

#0  chan_counts_for_bts (bts_counts=bts_counts@entry=0x7fffffff7a10, bts=0x0) at chan_counts.c:137
#1  0x00005555555c0cad in candidate_set_free_tch (c=c@entry=0x7fffffff8240) at handover_decision_2.c:1030
#2  0x00005555555c2bd7 in collect_handover_candidate (lchan=lchan@entry=0x7ffff7e9f1f0, nmp=nmp@entry=0x7ffff7e9f36c, clist=clist@entry=0x7fffffff89b0,
    candidates=candidates@entry=0x7fffffff899c, include_weaker_rxlev=include_weaker_rxlev@entry=false, rxlev_current=rxlev_current@entry=63, neighbors_count=0x7fffffff8914)
    at handover_decision_2.c:1146
#3  0x00005555555c5813 in collect_candidates_for_lchan (lchan=lchan@entry=0x7ffff7e9f1f0, clist=clist@entry=0x7fffffff89b0, candidates=candidates@entry=0x7fffffff899c,
    _rxlev_current=_rxlev_current@entry=0x7fffffff8998, include_weaker_rxlev=include_weaker_rxlev@entry=false) at handover_decision_2.c:1224
#4  0x00005555555c6af4 in find_alternative_lchan (lchan=lchan@entry=0x7ffff7e9f1f0, include_weaker_rxlev=include_weaker_rxlev@entry=false, request_upgrade_to_tch_f=false)
    at handover_decision_2.c:1303
#5  0x00005555555c7f8f in on_measurement_report (mr=0x7ffff7e9f540) at handover_decision_2.c:1577
#6  0x00005555555d2647 in ho_meas_rep (mr=0x7ffff7e9f540) at handover_logic.c:95
#7  ho_logic_sig_cb (subsys=<optimized out>, signal=<optimized out>, handler_data=<optimized out>, signal_data=<optimized out>) at handover_logic.c:316
#8  0x00007ffff72e3ca4 in osmo_signal_dispatch (subsys=subsys@entry=3, signal=signal@entry=8, signal_data=signal_data@entry=0x7fffffffd170) at signal.c:118
#9  0x0000555555582a72 in send_lchan_signal (resp=0x7ffff7e9f540, lchan=<optimized out>, sig_no=8) at abis_rsl.c:67
#10 rsl_rx_meas_res (msg=msg@entry=0x555555bc0350) at abis_rsl.c:1455
#11 0x00005555555879e5 in abis_rsl_rx_dchan (msg=0x555555bc0350) at abis_rsl.c:1544
#12 abis_rsl_rcvmsg (msg=0x555555bc0350) at abis_rsl.c:3056
#13 0x00007ffff6eac542 in handle_ts1_read () from /usr/local/lib/libosmoabis.so.10
#14 0x00007ffff6eaca2b in ipaccess_fd_cb () from /usr/local/lib/libosmoabis.so.10
#15 0x00007ffff72e36fc in poll_disp_fds (n_fd=<optimized out>) at select.c:361
#16 _osmo_select_main (polling=<optimized out>) at select.c:393
#17 0x00007ffff72e37e6 in osmo_select_main_ctx (polling=<optimized out>) at select.c:449
#18 0x0000555555575909 in main (argc=<optimized out>, argv=<optimized out>) at osmo_bsc_main.c:1087

In candidate_set_free_tch ():

(gdb) p c->target
$13 = {ab = {arfcn = 249, bsic = 63 '?'}, cell_ids = {id_discr = CELL_IDENT_WHOLE_GLOBAL, id_list = {{global = {lai = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false},
            lac = 274}, cell_identity = 102}, lac_and_ci = {lac = 334, ci = 7}, ci = 334, lai_and_lac = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false}, lac = 274}, lac = 334,
        global_ps = {rai = {lac = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false}, lac = 274}, rac = 102 'f'}, cell_identity = 0}}, {global = {lai = {plmn = {mcc = 0, mnc = 0,
              mnc_3_digits = false}, lac = 0}, cell_identity = 0}, lac_and_ci = {lac = 0, ci = 0}, ci = 0, lai_and_lac = {plmn = {mcc = 0, mnc = 0, mnc_3_digits = false}, lac = 0},
        lac = 0, global_ps = {rai = {lac = {plmn = {mcc = 0, mnc = 0, mnc_3_digits = false}, lac = 0}, rac = 0 '\000'}, cell_identity = 0}} <repeats 126 times>}, id_list_len = 1},
  bts = 0x0, rxlev = 63, rxlev_afs_bias = 0, free_tchf = 0, min_free_tchf = 0, free_tchh = 0, min_free_tchh = 0, next_tchf_reduces_tchh = 0, next_tchh_reduces_tchf = 0}

Related issues

Related to OsmoBSC - Bug #5246: sigsegv in bts_count_free_ts() (New, keith, 10/03/2021)
Related to OsmoBSC - Bug #5385: Segmentation fault in chan_counts_for_bts() (Resolved, pespin, 01/05/2022)
Related to OsmoBSC - Bug #5525: Multi BSS Handover: gsm_bts_cell_id() passed NULL bts (Resolved, neels, 04/12/2022)

Actions #1

Updated by keith over 2 years ago

  • Related to Bug #5246: sigsegv in bts_count_free_ts() added
Actions #2

Updated by keith over 2 years ago

In handover_decision_2.c:1129:

    /* For cells in a remote BSS, we cannot query the target cell's handover config, and hence
     * instead assume the local BTS' config to apply. */
    neigh_cfg = (neighbor_bts ? : bts)->ho;

So, IIUC, we expect that neighbor_bts may be NULL at this point.
In that case, we then go on to define c as a struct ho_candidate with member .target.bts = 0x0.

We pass &c to candidate_set_free_tch(), which calls chan_counts_for_bts(&bts_counts, c->target.bts) at line 1030.

That function dereferences the NULL pointer: llist_for_each_entry(trx, &bts->trx_list, list) -> BOOM!
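
The chain described above can be reproduced in a minimal, self-contained sketch. The names below (`sketch_bts`, `sketch_ho_cfg`, `sketch_candidate`, `sketch_make_candidate`) are hypothetical simplifications and not the osmo-bsc definitions. The only assumption taken from this report is that a neighbor cell in a remote BSS is represented by a NULL BTS pointer, as the gdb output shows.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the osmo-bsc structures. */
struct sketch_ho_cfg {
	int rxlev_avg_win;
};

struct sketch_bts {
	struct sketch_ho_cfg *ho;
};

struct sketch_candidate {
	struct sketch_bts *target_bts;    /* NULL for a cell in a remote BSS */
	struct sketch_ho_cfg *neigh_cfg;  /* config used to evaluate the target */
};

/* Mirrors the "(neighbor_bts ? : bts)->ho" fallback using the portable
 * form of the conditional operator: the config falls back to the local
 * BTS, but the target pointer itself stays NULL for a remote cell. */
static struct sketch_candidate sketch_make_candidate(struct sketch_bts *local_bts,
						     struct sketch_bts *neighbor_bts)
{
	struct sketch_candidate c = {
		.target_bts = neighbor_bts,
		.neigh_cfg = (neighbor_bts ? neighbor_bts : local_bts)->ho,
	};
	return c;
}
```

In this sketch the config lookup is safe for a remote cell, but any later code that dereferences `target_bts` without a check will crash, which matches the `bts=0x0` argument seen in the backtrace.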

Actions #3

Updated by keith over 2 years ago

I also notice that a few lines further down in handover_decision_2.c (line 1172) we do:

    if (neighbor_bts) {
        check_requirements(&c);
    } else
        check_requirements_remote_bss(&c);

So maybe something like this is enough?

--- a/src/osmo-bsc/handover_decision_2.c
+++ b/src/osmo-bsc/handover_decision_2.c
@@ -1143,7 +1143,8 @@ static void collect_handover_candidate(struct gsm_lchan *lchan, struct neigh_mea
                        .rxlev = neigh_meas_avg(nmp, ho_get_hodec2_rxlev_neigh_avg_win(bts->ho)),
                },
        };
-       candidate_set_free_tch(&c);
+       if (neighbor_bts)
+               candidate_set_free_tch(&c);
Actions #4

Updated by keith over 2 years ago

I've tested the patch above (#5324-3) and am successfully running a multi-BSS system with HO working.

neels, do you want to take a look and see whether this simple check is enough?

Looks like it was introduced in
https://osmocom.org/projects/osmobsc/repository/osmo-bsc/revisions/d946e5b280dbce0131234a10d28524d910c76553

Thanks

Actions #5

Updated by neels over 2 years ago

Thanks for reporting this!

I'll try to find out why the inter-BSC HO ttcn3 testing doesn't catch this problem.
I also have an alternative patch that makes candidate_set_free_tch() safe to call for inter-BSC candidates.
Still testing...

At first I thought the bug was introduced by the recent channel counting refactoring,
but indeed you are right that the problem was introduced much earlier, almost a year ago, when we added
sane handover target channel selection -- which of course doesn't apply for inter-BSC HO.
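
The idea of making the counting step safe for inter-BSC candidates could look roughly like the sketch below. This is not the patch submitted for this issue; the types and the function `demo_count_free_tch()` are hypothetical, and the sketch assumes that reporting zero free channels is acceptable when the target cell belongs to a remote BSS and cannot be inspected locally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the osmo-bsc definitions. */
struct demo_trx {
	int free_tchf;
	int free_tchh;
	struct demo_trx *next;
};

struct demo_bts {
	struct demo_trx *trx_list;
};

struct demo_counts {
	int free_tchf;
	int free_tchh;
};

/* Sum the free TCH of all TRX of a local BTS. For a NULL bts (target cell
 * in a remote BSS) there is nothing to count locally, so report zeros
 * instead of dereferencing the pointer. */
static void demo_count_free_tch(struct demo_counts *counts, const struct demo_bts *bts)
{
	const struct demo_trx *trx;

	memset(counts, 0, sizeof(*counts));
	if (!bts)
		return;

	for (trx = bts->trx_list; trx; trx = trx->next) {
		counts->free_tchf += trx->free_tchf;
		counts->free_tchh += trx->free_tchh;
	}
}
```

Guarding inside the callee protects every caller, whereas the check in the caller proposed in #5324-3 protects only that one call site.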

Actions #6

Updated by neels over 2 years ago

neels wrote in #note-5:

I'll try to find out why the inter-BSC HO ttcn3 testing doesn't catch this problem.

Of course: in ttcn3 we so far just trigger a handover via VTY directly instead of letting
measurements cause HO candidate selection... That's why we didn't catch the bug.

f_vty_transceive(BSCVTY, "handover any to arfcn 123 bsic any");

Given the serious nature of the bug, I'll try to find a way to test that code path...

Actions #7

Updated by neels over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to neels
  • % Done changed from 0 to 90

I am able to reproduce the bug in https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26354
and submitted an alternative fix in https://gerrit.osmocom.org/c/osmo-bsc/+/26352

Actions #8

Updated by neels over 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100
Actions #9

Updated by pespin over 2 years ago

  • Related to Bug #5385: Segmentation fault in chan_counts_for_bts() added
Actions #10

Updated by keith about 2 years ago

  • Related to Bug #5525: Multi BSS Handover: gsm_bts_cell_id() passed NULL bts added