Project

General

Profile

Actions

Bug #5645

closed

osmo-bts-trx crashes when using gsmtap option -i

Added by dexter over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
08/15/2022
Due date:
% Done:

100%

Spec Reference:

Description

When osmo-bts-trx is started with option -i (osmo-bts-trx -c ./osmo-bts.cfg -i 127.0.0.1), then it fails with SIGABRT (or Segmentation Fault without GDB). The problem is most likely related to an msgb that is too small.

Mon Aug 15 14:43:21 2022 <0001> nm_channel_fsm.c:152 NM_CHAN_OP(bts0-trx0-ts5)[0x55555576f660]{DISABLED_OFFLINE}: state_chg to ENABLED
Mon Aug 15 14:43:21 2022 <0001> oml.c:354 OC=CHANNEL INST=(00,00,05) AVAIL STATE Off line -> OK
Mon Aug 15 14:43:21 2022 <0001> oml.c:362 OC=CHANNEL INST=(00,00,05) OPER STATE Disabled -> Enabled
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:242 Sending info
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:257 BTS is up
Mon Aug 15 14:43:21 2022 <0001> oml.c:144 OC=CHANNEL(03) INST=(00,00,05): Tx State Changed Event Report
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:135 127.0.0.1:3002 connected read
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:56 127.0.0.1:3002 message received
Mon Aug 15 14:43:21 2022 <0001> oml.c:1017 OC=CHANNEL(03) INST=(00,00,06): Rx OPSTART
Mon Aug 15 14:43:21 2022 <0001> l1_if.c:588 NM_CHAN_OP(bts0-trx0-ts6)[0x55555576fcb0]{DISABLED_OFFLINE}: Received Event OPSTART_ACK
Mon Aug 15 14:43:21 2022 <0001> oml.c:144 OC=CHANNEL(03) INST=(00,00,06): Tx Opstart Ack
Mon Aug 15 14:43:21 2022 <0001> nm_channel_fsm.c:152 NM_CHAN_OP(bts0-trx0-ts6)[0x55555576fcb0]{DISABLED_OFFLINE}: state_chg to ENABLED
Mon Aug 15 14:43:21 2022 <0001> oml.c:354 OC=CHANNEL INST=(00,00,06) AVAIL STATE Off line -> OK
Mon Aug 15 14:43:21 2022 <0001> oml.c:362 OC=CHANNEL INST=(00,00,06) OPER STATE Disabled -> Enabled
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:242 Sending info
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:257 BTS is up
Mon Aug 15 14:43:21 2022 <0001> oml.c:144 OC=CHANNEL(03) INST=(00,00,06): Tx State Changed Event Report
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:135 127.0.0.1:3002 connected read
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:56 127.0.0.1:3002 message received
Mon Aug 15 14:43:21 2022 <0001> oml.c:1017 OC=CHANNEL(03) INST=(00,00,07): Rx OPSTART
Mon Aug 15 14:43:21 2022 <0001> l1_if.c:588 NM_CHAN_OP(bts0-trx0-ts7)[0x555555770300]{DISABLED_OFFLINE}: Received Event OPSTART_ACK
Mon Aug 15 14:43:21 2022 <0001> oml.c:144 OC=CHANNEL(03) INST=(00,00,07): Tx Opstart Ack
Mon Aug 15 14:43:21 2022 <0001> nm_channel_fsm.c:152 NM_CHAN_OP(bts0-trx0-ts7)[0x555555770300]{DISABLED_OFFLINE}: state_chg to ENABLED
Mon Aug 15 14:43:21 2022 <0001> oml.c:354 OC=CHANNEL INST=(00,00,07) AVAIL STATE Off line -> OK
Mon Aug 15 14:43:21 2022 <0001> oml.c:362 OC=CHANNEL INST=(00,00,07) OPER STATE Disabled -> Enabled
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:242 Sending info
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:257 BTS is up
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:221 (bts=0,trx=0) PDCH on ts=7 is available (tsc=6 hopping=no arfcn=868)
Mon Aug 15 14:43:21 2022 <0001> oml.c:144 OC=CHANNEL(03) INST=(00,00,07): Tx State Changed Event Report
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:863 Activate request received: TRX=0 TS=7
Mon Aug 15 14:43:21 2022 <0006> l1sap.c:1942 (bts=0,trx=0,ts=7,ss=0) Activating channel PDCH on TS7
Mon Aug 15 14:43:21 2022 <0006> scheduler.c:1099 (bts=0,trx=0,ts=7,ss=0) Activating PDTCH
Mon Aug 15 14:43:21 2022 <0006> scheduler.c:1099 (bts=0,trx=0,ts=7,ss=0) Activating PTCCH
Mon Aug 15 14:43:21 2022 <0006> lchan.c:271 (bts=0,trx=0,ts=7,ss=0) state NONE -> ACTIVE
Mon Aug 15 14:43:21 2022 <0006> l1sap.c:837 (bts=0,trx=0,ts=7,ss=0) activate confirm chan_nr=PDCH on TS7 trx=0
Mon Aug 15 14:43:21 2022 <0000> rsl.c:1389 (bts=0,trx=0,ts=7,ss=0) not sending CHAN ACT ACK
Mon Aug 15 14:43:21 2022 <0011> input/ipa.c:139 127.0.0.1:3002 connected write
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:448 Sending rts request: is_ptcch=0 arfcn=868 block=11
Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:680 Data request received: sapi=PDTCH arfcn=868 block=11 data=
msgb(0x5555558a3a30): Not enough tailroom msgb_put (allocated 52904, head at 0, len 16, tailroom 53024 < want tailroom 1432604448)
backtrace() returned 22 addresses
/usr/local/lib/libosmocore.so.19(osmo_generate_backtrace+0x18) [0x7ffff7e15e23]
/usr/local/lib/libosmocore.so.19(+0x29b01) [0x7ffff7e15b01]
/usr/local/lib/libosmocore.so.19(osmo_panic+0xca) [0x7ffff7e15bd0]
/usr/local/lib/libosmocore.so.19(+0x28fd0) [0x7ffff7e14fd0]
/usr/local/lib/libosmocore.so.19(gsmtap_makemsg_ex+0x110) [0x7ffff7e1533f]
/usr/local/lib/libosmocore.so.19(gsmtap_send_ex+0x88) [0x7ffff7e1563e]
/usr/local/lib/libosmocore.so.19(gsmtap_send+0x79) [0x7ffff7e156fa]
/usr/local/bin/osmo-bts-trx(+0x51806) [0x5555555a5806]
/usr/local/bin/osmo-bts-trx(+0x56493) [0x5555555aa493]
/usr/local/bin/osmo-bts-trx(+0x566c4) [0x5555555aa6c4]
/usr/local/bin/osmo-bts-trx(+0x4998e) [0x55555559d98e]
/usr/local/bin/osmo-bts-trx(+0x4a740) [0x55555559e740]
/usr/local/bin/osmo-bts-trx(+0x4b174) [0x55555559f174]
/usr/local/bin/osmo-bts-trx(+0x4b3b9) [0x55555559f3b9]
/usr/local/lib/libosmocore.so.19(+0x1082c) [0x7ffff7dfc82c]
/usr/local/lib/libosmocore.so.19(+0x10939) [0x7ffff7dfc939]
/usr/local/lib/libosmocore.so.19(osmo_select_main+0x15) [0x7ffff7dfc955]
/usr/local/bin/osmo-bts-trx(+0x5a5f1) [0x5555555ae5f1]
/usr/local/bin/osmo-bts-trx(+0xd5ed) [0x5555555615ed]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7ffff7c1309b]
/usr/local/bin/osmo-bts-trx(+0xd14a) [0x55555556114a]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) 

Related issues

Related to OsmoBTS - Bug #5765: osmo-bts-trx crashes when using gsmtap-sapi pdtchResolvedlaforge11/11/2022

Actions
Actions #1

Updated by pespin over 1 year ago

Mon Aug 15 14:43:21 2022 <0009> pcu_sock.c:680 Data request received: sapi=PDTCH arfcn=868 block=11 data=
msgb(0x5555558a3a30): Not enough tailroom msgb_put (allocated 52904, head at 0, len 16, tailroom 53024 < want tailroom 1432604448)

Looks like something is wrong with that size 1432604448. This size should be rejected even before trying to allocate a msgb for it.
So the problem is not the msgb being too small, it's a size totally unrealistic being passed. Maybe a negative number converted to unsigned?

Actions #2

Updated by pespin over 1 year ago

dexter can you provide more information on the versions, setup, etc. you were using when having this crash?

Actions #3

Updated by dexter over 1 year ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 60

I have traced the problem down to a problem with msgb_l2hlen(). This function returns a value, even when l2h is not populated. This is the case sometimes. When I fix the problem it no longer segfaults. As far as I know, the reason why l2h is not populated in some cases is simply because of idle blocks that (see also l1sap_pdch_req()) and those won't have l2h populated.

See also: https://gerrit.osmocom.org/c/libosmocore/+/29591

Actions #4

Updated by dexter over 1 year ago

  • % Done changed from 60 to 80

We have decied to put asserts in the msgb_lXhlen() functions, so if we call it with an msgb that does not have the related lXh pointer populated the program will stop. I have now fixed the location that caused the problems with gsmtap. I also found one more.

https://gerrit.osmocom.org/c/osmo-bts/+/29650 l1sap: do not call msgb_l2hlen without checking
https://gerrit.osmocom.org/c/osmo-bts/+/29651 measurement: do not call msgb_l3len without checking

Actions #5

Updated by fixeria over 1 year ago

  • Priority changed from Normal to High

The libosmocore patch has been merged earlier than the respective osmo-bts patches. I assume this is why we have +35 failures in https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/1799/. Bumping priority to high, let's get the remaining patches merged ASAP.

Actions #6

Updated by dexter over 1 year ago

I am on this now. I think this is in particular related to the problem in measurement.c, which I unfortunately noticed late. Patch https://gerrit.osmocom.org/c/osmo-bts/+/29651 should fix it. So far I haven't seen any problems with other osmocom programs.

Actions #7

Updated by dexter over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

This is now fixed and all patches are merged.

Actions #8

Updated by fixeria over 1 year ago

  • Related to Bug #5765: osmo-bts-trx crashes when using gsmtap-sapi pdtch added
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)