Bug #6180

closed

ASSERT in l1sap_tch_ind

Added by keith 8 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 09/14/2023
Due date:
% Done: 100%
Spec Reference:

Description

On a sysmoBTS running the latest release, I regularly hit this ASSERT at https://gerrit.osmocom.org/c/osmo-bts/+/33832/5/src/common/l1sap.c#1982

        case RSL_CMOD_SPD_SIGN:
        default: /* shall not happen */
            OSMO_ASSERT(0);
        }

I've patched to avoid this ASSERT, based on fixeria's advice on IRC, to get this BTS back up and running, so I can't reproduce it right now.
(I've only installed the release on one system.)

Here's a backtrace I did grab from the terminal scrollback, probably not very useful:

((*))                                                                                                                                                                              
  |                                                                                                                                                                                
 / \ OsmoBTS                                                                                                                                                                       
Assert failed 0 ../../../git/src/common/l1sap.c:1983                                                                                                                               
backtrace() returned 0 addresses                                                                                                                                                   

Program received signal SIGABRT, Aborted.                                                                                                                                          
0x432dcf74 in raise () from /lib/libc.so.6                                                                                                                                         
(gdb) bt                                                                                                                                                                           
#0  0x432dcf74 in raise () from /lib/libc.so.6                                                                                                                                     
#1  0x432de358 in abort () from /lib/libc.so.6                                                                                                                                     
#2  0xb6e778d4 in osmo_panic_default (args=..., fmt=0x0)                                                                                                                           
    at /usr/src/debug/libosmocore/1.9.0+gitrAUTOINC+aca2c724ae-r2.18.0/git/src/core/panic.c:45                                                                                     
#3  osmo_panic (fmt=0x5f448 "Assert failed %s %s:%d\n")                                                                                                                            
    at /usr/src/debug/libosmocore/1.9.0+gitrAUTOINC+aca2c724ae-r2.18.0/git/src/core/panic.c:80                                                                                     
#4  0x0005098c in l1sap_tch_ind (tch_ind=<optimized out>, l1sap=<optimized out>, trx=0xb6b5a038)                                                                                   
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:1983                                                                                       
#5  l1sap_up (trx=trx@entry=0xb6b5a038, l1sap=<optimized out>)                                                                                                                     
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:2184                                                                                       
#6  0x00051b34 in add_l1sap_header (trx=trx@entry=0xb6b5a038, rmsg=<optimized out>, lchan=<optimized out>,                                                                         
    chan_nr=<optimized out>, fn=5084, ber10k=1184, lqual_cb=102, rssi=-104 '\230', ta_offs=192, is_sub=0 '\000')                                                                   
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:179                                                                                        
#7  0x00020b8c in l1if_tch_rx (trx=trx@entry=0xb6b5a038, chan_nr=chan_nr@entry=26 '\032',                                                                                          
    l1p_msg=l1p_msg@entry=0x187060)                                                                                                                                                
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/tch.c:611                                                                                  
#8  0x000188e0 in handle_ph_data_ind (l1p_msg=0x187060, data_ind=0x187128, fl1=<optimized out>)                                                                                    
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/l1_if.c:976                                                                                
#9  l1if_handle_ind (fl1=<optimized out>, msg=0x187060)                                                                                                                            
    at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/l1_if.c:1139                                                                               
---Type <return> to continue, or q <return> to quit---q                          
Actions #1

Updated by fixeria 7 months ago

  • Status changed from New to In Progress
  • Assignee set to fixeria
  • Priority changed from Normal to High
Actions #2

Updated by fixeria 7 months ago

keith, thanks for reporting! I would still like to figure out exactly why this is happening. A TCH.ind is certainly not expected when a logical channel is in signalling mode. My first thought was that it's caused by buffering: some TCH frames may still be emitted by the PHY right after the channel mode change. But that would only be possible when changing from speech to signalling, which is highly unusual... unless you're using dynamic timeslots. Are you? With dynamic timeslots it's quite possible: for instance, a voice call gets terminated and the timeslot switches from TCHF (or TCHH) to PDCH (or SDCCH8).
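
The buffering hypothesis can be illustrated with a toy model. This is only a sketch under assumptions: `phy_fifo`, `phy_push()` and `count_stale_tch_ind()` are hypothetical names, not osmo-bts code, and the FIFO depth is made up. The PHY queues a few TCH frames, the timeslot is then reconfigured to a non-speech mode, and the frames drained afterwards reach a channel that no longer expects them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel modes for this sketch (not the RSL constants). */
enum chan_mode { MODE_SIGN, MODE_SPEECH };

#define FIFO_DEPTH 4 /* made-up PHY buffer depth */

struct phy_fifo {
	size_t len; /* number of TCH frames still buffered in the PHY */
};

/* The PHY buffers one TCH frame; returns 0 on success, -1 if full. */
static int phy_push(struct phy_fifo *f)
{
	assert(f != NULL);
	if (f->len >= FIFO_DEPTH)
		return -1;
	f->len++;
	return 0;
}

/* Drain the FIFO towards the upper layer and count the TCH.ind
 * primitives that arrive while the channel is not in speech mode. */
static size_t count_stale_tch_ind(struct phy_fifo *f, enum chan_mode current)
{
	size_t stale = 0;

	assert(f != NULL);
	while (f->len > 0) {
		f->len--;
		if (current != MODE_SPEECH)
			stale++;
	}
	return stale;
}
```

In this model, every frame buffered before the mode change and drained after it counts as stale; if the channel is still in speech mode when the FIFO is drained, none do.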

Actions #3

Updated by fixeria 7 months ago

keith wrote:

Here's a backtrace I did grab from the terminal scrollback, probably not very useful:

It's indeed not useful, unfortunately. All I see is that some TCH.ind primitive traversed up the stack and crashed the process because it was not expected. If you still have the coredump file, it might be interesting to see the payload of this primitive, but the respective pointer is <optimized out>, so... What would be useful is some logging preceding the crash (if you still have it, of course), so that we could reconstruct the sequence of events.

I've patched to avoid this ASSERT, based on fixeria's advice on IRC, to get this BTS back up and running, so I can't reproduce it right now.

Good to hear. So we know how to fix this, but I am still curious to know what's causing this :)

Actions #4

Updated by fixeria 7 months ago

  • Status changed from In Progress to Feedback

Here is a patch:

https://gerrit.osmocom.org/c/osmo-bts/+/34446 l1sap: l1sap_tch_ind(): fix segfault on stale TCH.ind [NEW]

Once merged, we should back-port it to osmo-bts v1.7.0 and tag a patch release.
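
For illustration only, the general idea of tolerating a stale TCH.ind can be sketched as follows. This is not the Gerrit patch itself: the enums and `handle_tch_ind()` are hypothetical stand-ins for the `RSL_CMOD_SPD_*` handling in `l1sap_tch_ind()`. The point is that an indication arriving in signalling mode is logged and dropped rather than reaching `OSMO_ASSERT(0)`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the RSL channel modes. */
enum cmod { CMOD_SIGN, CMOD_SPEECH, CMOD_DATA };

/* Hypothetical outcome of handling one TCH.ind. */
enum tch_result { TCH_FORWARDED, TCH_DROPPED };

static enum tch_result handle_tch_ind(enum cmod mode)
{
	switch (mode) {
	case CMOD_SPEECH:
	case CMOD_DATA:
		/* Expected case: pass the frame on to the upper layer. */
		return TCH_FORWARDED;
	case CMOD_SIGN:
	default:
		/* Stale frame, e.g. right after a dynamic timeslot switch:
		 * log and drop it instead of aborting the process. */
		fprintf(stderr, "dropping stale TCH.ind (mode=%d)\n", (int)mode);
		return TCH_DROPPED;
	}
}
```

With this shape, a late frame costs one log line instead of a process abort.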

Actions #5

Updated by keith 7 months ago

Hi. Thanks for looking at it, and sorry for the delay in replying; I have been AFK for a week's rest.
Yes, DYN timeslots were in use.

Although I didn't try the suspect code on a sysmoBTS at my desk, it really didn't take long for the bug to be triggered in the production scenario: less than one minute. I wonder if running a non-optimized binary on a sysmoBTS would trigger it and give the debug info needed to shed more light on what is causing it? IIRC, fixeria doesn't have a sysmoBTS to hand. I'll see if I can get around to taking a look in the next few days.

Actions #6

Updated by fixeria 7 months ago

  • Assignee changed from fixeria to keith

Hi Keith,

keith wrote in #note-5:

Yes, DYN timeslots were in use.

OK, this confirms my assumption.

Although I didn't try the suspect code on a sysmoBTS at my desk, it really didn't take long for the bug to be triggered in the production scenario: less than one minute.

I will try running a -trx based setup with dynamic timeslots configured.

I wonder if running a non-optimized binary on a sysmoBTS would trigger it and give the debug info needed to shed more light on what is causing it?
IIRC, fixeria doesn't have a sysmoBTS to hand. I'll see if I can get around to taking a look in the next few days.

No, I don't have a sysmoBTS [anymore]. Assigning to you for feedback.

Actions #7

Updated by fixeria 7 months ago

fixeria wrote in #note-6:

I will try running a -trx based setup with dynamic timeslots configured.

I was not able to reproduce the problem with osmo-bts-trx and all timeslots (except TS0) set to phys_chan_config DYNAMIC/OSMOCOM.

keith, I think we can just merge the fix and tag a patch release, given that you said adding a break fixed the problem for you.

Actions #8

Updated by fixeria 7 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100