Bug #6180
open
ASSERT in l1sap_tch_ind
0%
Description
On a sysmoBTS with the latest release, I regularly hit this ASSERT at https://gerrit.osmocom.org/c/osmo-bts/+/33832/5/src/common/l1sap.c#1982:
case RSL_CMOD_SPD_SIGN:
default: /* shall not happen */
OSMO_ASSERT(0);
}
I've patched around this ASSERT based on fixeria's advice on IRC to get this BTS back up and running, so I can't reproduce it right now.
(I've only installed the release on one system)
Here's a backtrace I did grab from the terminal scrollback, probably not very useful:
((*))
  |
 / \ OsmoBTS
Assert failed 0 ../../../git/src/common/l1sap.c:1983
backtrace() returned 0 addresses
Program received signal SIGABRT, Aborted.
0x432dcf74 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x432dcf74 in raise () from /lib/libc.so.6
#1 0x432de358 in abort () from /lib/libc.so.6
#2 0xb6e778d4 in osmo_panic_default (args=..., fmt=0x0)
   at /usr/src/debug/libosmocore/1.9.0+gitrAUTOINC+aca2c724ae-r2.18.0/git/src/core/panic.c:45
#3 osmo_panic (fmt=0x5f448 "Assert failed %s %s:%d\n")
   at /usr/src/debug/libosmocore/1.9.0+gitrAUTOINC+aca2c724ae-r2.18.0/git/src/core/panic.c:80
#4 0x0005098c in l1sap_tch_ind (tch_ind=<optimized out>, l1sap=<optimized out>, trx=0xb6b5a038)
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:1983
#5 l1sap_up (trx=trx@entry=0xb6b5a038, l1sap=<optimized out>)
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:2184
#6 0x00051b34 in add_l1sap_header (trx=trx@entry=0xb6b5a038, rmsg=<optimized out>, lchan=<optimized out>, chan_nr=<optimized out>, fn=5084, ber10k=1184, lqual_cb=102, rssi=-104 '\230', ta_offs=192, is_sub=0 '\000')
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/common/l1sap.c:179
#7 0x00020b8c in l1if_tch_rx (trx=trx@entry=0xb6b5a038, chan_nr=chan_nr@entry=26 '\032', l1p_msg=l1p_msg@entry=0x187060)
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/tch.c:611
#8 0x000188e0 in handle_ph_data_ind (l1p_msg=0x187060, data_ind=0x187128, fl1=<optimized out>)
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/l1_if.c:976
#9 l1if_handle_ind (fl1=<optimized out>, msg=0x187060)
   at /usr/src/debug/osmo-bts/1.7.0+gitAUTOINC+e97834f2db-r0.18/git/src/osmo-bts-sysmo/l1_if.c:1139
---Type <return> to continue, or q <return> to quit---q
Updated by fixeria 10 days ago
keith thanks for reporting! I would still be interested to figure out why exactly this is happening. TCH.ind is certainly not expected when a logical channel is in signalling mode. My first thought was that it's caused by buffering: some TCH frames may still be emitted by the PHY right after the channel mode change. But that would only be possible when changing from speech to signalling, which is highly unusual... unless you're using dynamic timeslots, are you? With dynamic timeslots it's quite possible. For instance, a voice call gets terminated and the timeslot switches from TCHF (or TCHH) to PDCH (or SDCCH8).
Updated by fixeria 10 days ago
keith wrote:
Here's a backtrace I did grab from the terminal scrollback, probably not very useful:
It's indeed not useful, unfortunately. All I see is that some TCH.ind primitive traversed up the stack and crashed the process because it was not expected. If you still have the coredump file, it may be interesting to see the payload of this primitive, but the respective pointer is <optimized out>, so... What would be useful is some logging preceding the crash (if you still have it, of course), so that we could reconstruct the sequence of events.
I've patched around this ASSERT based on fixeria's advice on IRC to get this BTS back up and running, so I can't reproduce it right now.
Good to hear. So we know how to fix this, but I am still curious to know what's causing this :)
Updated by fixeria 9 days ago
- Status changed from In Progress to Feedback
Here is a patch:
https://gerrit.osmocom.org/c/osmo-bts/+/34446 l1sap: l1sap_tch_ind(): fix segfault on stale TCH.ind [NEW]
Once merged, we should back-port it to osmo-bts v1.7.0 and tag a patch release.
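To illustrate the idea discussed in this thread (this is not the actual Gerrit change), here is a minimal, self-contained sketch of a TCH.ind handler that logs and drops a frame arriving while the channel is in signalling mode, instead of asserting. All names (toy_chan_mode, toy_lchan, toy_tch_ind) are hypothetical stand-ins and do not exist in osmo-bts; the real code uses RSL_CMOD_* and its own lchan structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the channel modes; NOT the real RSL_CMOD_* values. */
enum toy_chan_mode {
	TOY_CMOD_SIGN,   /* signalling */
	TOY_CMOD_SPEECH, /* speech */
};

/* Hypothetical, heavily reduced logical channel state. */
struct toy_lchan {
	enum toy_chan_mode mode;
	unsigned int forwarded; /* TCH.ind passed on as speech */
	unsigned int dropped;   /* stale TCH.ind discarded */
};

/* Handle one TCH.ind. Returns 0 if the frame was forwarded,
 * -1 if it was dropped because the channel is not in speech mode. */
int toy_tch_ind(struct toy_lchan *lchan, const unsigned char *payload, size_t len)
{
	assert(lchan != NULL);
	(void)payload; /* the payload is not inspected in this sketch */

	switch (lchan->mode) {
	case TOY_CMOD_SPEECH:
		lchan->forwarded++;
		return 0;
	case TOY_CMOD_SIGN:
	default:
		/* A frame buffered by the PHY may still arrive right after a
		 * mode change (e.g. a dynamic timeslot leaving TCH mode).
		 * Log and drop it rather than aborting the whole process. */
		fprintf(stderr, "dropping stale TCH.ind (%zu bytes), mode=%d\n",
			len, (int)lchan->mode);
		lchan->dropped++;
		return -1;
	}
}
```

Calling toy_tch_ind() on a channel in TOY_CMOD_SPEECH forwards the frame; switching the channel to TOY_CMOD_SIGN and calling it again only increments the drop counter, so a late frame no longer takes the process down.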
Updated by keith 1 day ago
Hi. Thanks for looking at it, and sorry for the delay in replying; I have been AFK for a week's rest.
Yes, DYN timeslots were in use.
Although I didn't try the suspect code on a sysmoBTS at my desk, it really didn't take long for the bug to be triggered in the production scenario: less than 1 minute. I wonder if running a non-optimized binary on a sysmoBTS would trigger it and give us the debug info needed to shed more light on what is causing it? IIRC, fixeria doesn't have a sysmoBTS to hand. I'll see if I can get around to taking a look in the next few days.