Project

General

Profile

Bug #3713

osmo-bsc abort()s in ttcn3-bsc-sccplite TC_ho_int testcase

Added by daniel 18 days ago. Updated 17 days ago.

Status:
Feedback
Priority:
High
Assignee:
Category:
Handover
Target version:
-
Start date:
11/27/2018
Due date:
% Done:

0%

Spec Reference:

Description

I can not reproduce this if TC_ho_int is the first test I run. Running the whole suite with docker does cause this issue so it seems some lingering state triggers this.

The core file shows the following:

(gdb) bt full
#0  __GI_abort () at abort.c:125                             
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {18446744073709551615 <repeats 16 times>}},
          sa_flags = 0, sa_restorer = 0x0}                                                                                                    
        sigs = {__val = {32, 0 <repeats 15 times>}}                    
#1  0x00007f8033abd020 in osmo_panic () from /usr/lib/x86_64-linux-gnu/libosmocore.so.11
No symbol table info available.                                  
#2  0x0000000000432dca in conn_get_bts (conn=<optimized out>) at ../../include/osmocom/bsc/gsm_data.h:1242                                    
No locals.                                                                                                                                    
#3  ho_or_as (vty=0xf0d2d0, argv=0xf0dcb0, argc=6) at bsc_vty.c:1668  
        conn = 0xf0dcb0
        new_bts = 0x7fff8a84a590                                                                                                              
        bts_nr = 0                       
        trx_nr = 0
        ss_nr = 856498279                                       
        bts_nr_new = 514                          
        action = 0x6 <error: Cannot access memory at address 0x6>
#4  0x00007f80343567c5 in ?? () from /usr/lib/x86_64-linux-gnu/libosmovty.so.4
No symbol table info available.                                                                                                               
#5  0x00007f803435b429 in vty_read () from /usr/lib/x86_64-linux-gnu/libosmovty.so.4
No symbol table info available.                  
#6  0x00007f803435d4d9 in ?? () from /usr/lib/x86_64-linux-gnu/libosmovty.so.4
No symbol table info available.
#7  0x00007f8033ab38f0 in osmo_select_main () from /usr/lib/x86_64-linux-gnu/libosmocore.so.11
No symbol table info available.

The code in ho_or_as iterates through the subscr_conns and just assumes there is an lchan present for all of them. I guess this is no longer the case and we should prepend a conn->lchan && to the if statement?

osmo-bsc.log osmo-bsc.log 5.03 MB daniel, 11/28/2018 05:06 PM
BSC_Tests.TC_paging_resp_unsol.pcap BSC_Tests.TC_paging_resp_unsol.pcap 8.31 KB daniel, 11/28/2018 05:15 PM
BSC_Tests.TC_paging_resp_unsol.merged BSC_Tests.TC_paging_resp_unsol.merged 105 KB daniel, 11/28/2018 05:15 PM

History

#1 Updated by neels 18 days ago

  • Status changed from New to Feedback
  • Assignee changed from neels to daniel

hmm, the conclusion seems right, but let's first try to find out which conn it is that has no lchan, and whether that makes sense at all.
The fact that this testcase alone cannot trigger the failure indicates that some previous test leaves a conn lingering behind.

On branch neels/hack I pushed a patch that might help to find out:

commit e92ce3f0c6a03084c9f1037204c94c9a44c37bdd (origin/neels/hack, neels/hack)
Author: Neels Hofmeyr <neels@hofmeyr.de>
Date:   Tue Nov 27 18:19:33 2018 +0100

    NOT FOR MERGE: try to find out why there is a conn without an lchan

    in ttcn3 tests, we currently face a crash where the vty 'handover any' command
    assumes all conns have an lchan.

    Don't access a NULL lchan.

    But first we want to find out why there is a NULL lchan, so log a visible error.

    Change-Id: If5909b3890cf3f6327b271e53e458a417cebf73f

diff --git a/src/osmo-bsc/bsc_vty.c b/src/osmo-bsc/bsc_vty.c
index 983dcb9b3..947ba67df 100644
--- a/src/osmo-bsc/bsc_vty.c
+++ b/src/osmo-bsc/bsc_vty.c
@@ -1665,6 +1665,11 @@ static int ho_or_as(struct vty *vty, const char *argv[], int argc)

        /* Find the connection/lchan that we want to handover */
        llist_for_each_entry(conn, &net->subscr_conns, entry) {
+               if (!conn->lchan) {
+                       LOGP(DHO, LOGL_ERROR, "XXXXX found a conn without an lchan: %s %s\n",
+                            osmo_fsm_inst_name(conn->fi), bsc_subscr_name(conn->bsub));
+                       continue;
+               }
                if (conn_get_bts(conn)->nr == bts_nr &&
                    conn->lchan->ts->trx->nr == trx_nr &&
                    conn->lchan->ts->nr == ts_nr && conn->lchan->nr == ss_nr) {

daniel can you try to reproduce the problem with this patch applied? Then we might see an error log with XXXXX telling us the FSM instance name of a conn, and hence we might be able to associate that with some earlier logging from osmo-bsc to find out why that conn is still around.

#2 Updated by daniel 17 days ago

With your patch the only failing test is currently TC_paging_resp_unsol:
f_expect_chan_rel(): expect_rll_rel_req=true got_rll_rel_req=false

I can see the log message you added:

Wed Nov 28 14:37:47 2018 DHO <0008> bsc_vty.c:1670 XXXXX found a conn without an lchan: SUBSCR_CONN[0x21d7010] unknown

Lines 32073 of osmo-bsc.log show where this came from:

32073 Wed Nov 28 14:36:11 2018 DRSL <0003> osmo_bsc_main.c:283 bootstrapping RSL for BTS/TRX (0/0) on ARFCN 871 using MCC-MNC 001-01 LAC=1 CID=0  32073  BSIC=10
32074 Wed Nov 28 14:36:12 2018 DRSL <0003> abis_rsl.c:1357 (bts=0) CHAN RQD: reason: answer to paging (ra=0x23, neci=0x01, chreq_reason=0x01)
32075 Wed Nov 28 14:36:12 2018 DMSC <0007> fsm.c:299 SUBSCR_CONN[0x21d7010]{INIT}: Allocated
32076 Wed Nov 28 14:36:12 2018 DLCLS <000f> fsm.c:299 LCLS[0x21d7140]{NO_LCLS}: Allocated
32077 Wed Nov 28 14:36:12 2018 DLCLS <000f> fsm.c:329 LCLS[0x21d7140]{NO_LCLS}: is child of SUBSCR_CONN[0x21d7010]
32078 Wed Nov 28 14:36:12 2018 DMSC <0007> gsm_08_08.c:328 Got paged but no subscriber found.
32079 Wed Nov 28 14:36:12 2018 DMSC <0007> gsm_08_08.c:363 Failed to find a MSC for a connection.
32080 Wed Nov 28 14:36:12 2018 DMSC <0007> lchan_fsm.c:1390 SUBSCR_CONN[0x21d7010]{INIT}: lchan lchan(0-0-0-CCCH_SDCCH4-0)[0x21d8a10] detaches from conn

The failing test TC_paging_resp_unsol seems to be responsible for the that.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)