Project

General

Profile

Actions

Bug #5925

closed

dtx downlink causing ABORT in osmo-bts-sysmo

Added by keith about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
osmo-bts-sysmo
Target version:
-
Start date:
02/27/2023
Due date:
% Done:

100%

Spec Reference:

Description

With param dtx downlink and making a call, causes

==24631== Process terminating with default action of signal 6 (SIGABRT)                                                                                                                 
==24631==    at 0x6612E87: raise (raise.c:51)                                                                                                                                           
==24631==    by 0x66147F0: abort (abort.c:79)                                                                                                                                           
==24631==    by 0x5D648AF: osmo_panic (in /usr/lib/x86_64-linux-gnu/libosmocore.so.20.0.0)                                                                                              
==24631==    by 0x114E8E: msgb_pull (msgb.h:407)                                                                                                                                        
==24631==    by 0x116751: ph_tch_req (l1_if.c:508)                                                                                                                                      
==24631==    by 0x116A08: ph_tch_req (l1_if.c:564)                                                                                                                                      
==24631==    by 0x116CE9: bts_model_l1sap_down (l1_if.c:632)                                                                                                                            
==24631==    by 0x164A1F: l1sap_down (l1sap.c:1849)                                                                                                                                     
==24631==    by 0x16285D: l1sap_tch_rts_ind (l1sap.c:1324)                                                                                                                              
==24631==    by 0x164821: l1sap_up (l1sap.c:1809)                                                                                                                                       
==24631==    by 0x117637: handle_ph_readytosend_ind (l1_if.c:886)                                                                                                                       
==24631==    by 0x118438: l1if_handle_ind (l1_if.c:1119)

ph_tch_req() calls itself passing l1sap->oph.msg as msg
then
msgb_pull(msg, sizeof(*l1sap));
is called.

Actions #1

Updated by laforge about 1 year ago

  • Category set to osmo-bts-sysmo

can you elaborate a bit more in detail? what does "make a call" mean in this context? MO call? MT call? both legs of a call? Which codec was used?

The code path is the downlink transmit path, when the PH-RTS.ind for TCH arrives, and a TCH.req is sent down from l1sap. l1sap_tch_rts_ind is dequeuing a RTP message from the RTP socket and passing it down. So what this looks like is we're passing a short/truncated message from RTP down into the BTS model?

The function enqueueing message buffers is l1sap_rtp_rx_cb(). That function
  • allocates using l1sap_msgb_alloc
  • copies the rtp payload into the msgb
  • pulls the osmo_phsap_prim from the start of the msgb (which l1sap_msgb_alloc allocated there)
  • then enqueues it into the dl_tch_queue
  • l1sap_tch_rts_ind dequeues the mesasge and pushes a osmo_phsap_prim to its start
  • ph_tch_req then pulls that very osmo_phsap_prim - and thats's when it ABORTs.

This is rather odd, as I currently don't see how even a zero-length RTP message would affect what we do to the header of the msgb.

Actions #2

Updated by laforge about 1 year ago

needless to say, pcap file of the RTP traffic and osmo-bts logs at the time of the bug would help debugging this.

Actions #3

Updated by laforge about 1 year ago

  • Assignee set to fixeria

maybe fixeria can have a quick look, maybe he sees something I don't see.

Actions #4

Updated by fixeria about 1 year ago

  • Status changed from New to In Progress

laforge wrote in #note-1:

So what this looks like is we're passing a short/truncated message from RTP down into the BTS model? [...]
This is rather odd, as I currently don't see how even a zero-length RTP message would affect what we do to the header of the msgb.

ph_tch_req() is a recursive function and conditionally calls itself at the very bottom:

        if (dtx_recursion(lchan)) /* DTX: send voice after ONSET was sent */
                return ph_tch_req(trx, l1sap->oph.msg, l1sap, true, false);

This can be seen in the backtrace:

==24631==    by 0x116751: ph_tch_req (l1_if.c:508)
==24631==    by 0x116A08: ph_tch_req (l1_if.c:564)

The problem it that it does msgb_pull(msg, sizeof(*l1sap)) twice:

        /* create new message and fill data */
        if (msg) {
                msgb_pull(msg, sizeof(*l1sap));

so first time it pulls sizeof(*l1sap) == 56 bytes, then it attempts to pull 56 bytes again, but fails. I am not familiar with this code, and a bit surprised that queuing more than one DATA.req in response to a single RTS is legal. Looking at this function more closely, I think pull()ing is not really necessary and a potential fix would be to avoid doing so. I will prepare a patch.

Actions #5

Updated by fixeria about 1 year ago

  • Status changed from In Progress to Feedback
  • Assignee changed from fixeria to keith
  • % Done changed from 0 to 20

fixeria wrote in #note-4:

Looking at this function more closely, I think pull()ing is not really necessary and a potential fix would be to avoid doing so. I will prepare a patch.

keith please give this patch a try and let us know if it solves the problem:

https://gerrit.osmocom.org/c/osmo-bts/+/31991 osmo-bts-{sysmo,lc15,oc2g}: fix segfault in ph_tch_req() [NEW]

I have no osmo-bts-sysmo at hand (only remote access, which will take time to setup), so I cannot test this patch myself.

Actions #6

Updated by keith about 1 year ago

LGTM, feel free to close when the patch is merged.

Actions #7

Updated by fixeria about 1 year ago

  • Status changed from Feedback to Resolved
  • % Done changed from 20 to 100
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)