Bug #4832
osmo-bsc hard-releases lchan if no MSC is found
90%
Description
As described in #4829:
So.... the problem is something else altogether, or at least a large part of the problem.
For some reason, osmo-bsc these days immediately hard-releases the lchan if there is no MSC. There is no signaling sent back to the MS about this:- no spoofing of MM/CM (LU REJECT, CM SERV REJECT, ...) which is still understandable, as it's a layering violation and those messages normally originat e at the MSC
- No RR CHANNEL RELEASE is sent to the MS. That is a big fat bug.
Instead, we simply immedaitely close the lchan on the BTS side. This of course means that from that point onwards there re only bit-errors in downlink in the UE.
DRSL INFO <0003> ../../../git/src/osmo-bsc/abis_rsl.c:1453 (bts=0) CHAN RQD: reason: Location updating (ra=0x0f, neci=0x01, chreq_reason=0x03) DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_select.c:247 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Selected DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1657 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) MS: Channel Request: reason=LOCATION_UPDATE ra=0x0f ta=0 DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:329 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: Received Event LCHAN_EV_ACTIVATE DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:548 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: state_chg to WAIT_TS_READY DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:861 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update requested: 15 dBm DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:893 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update (power class 0): 0 -> 7 DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:626 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) Activation requested: FOR_MS_CHANNEL_REQUEST voice=no MGW-ci=none type=SDCCH tch-mode=SIGNALLING encr-alg=A5/0 ck=none DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/timeslot_fsm.c:106 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: Received Event LCHAN_EV_TS_READY DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: state_chg to WAIT_ACTIV_ACK DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:476 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4,state=IN_USE) Tx RSL Channel Activate with act_type=INITIAL DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Rx CHAN_ACTIV_ACK DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1164 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: Received Event LCHAN_EV_RSL_CHAN_ACTIV_ACK DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:772 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Tx RR Immediate Assignment DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:822 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: state_chg to WAIT_RLL_RTP_ESTABLISH DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1899 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) SAPI=0 ESTABLISH INDICATION DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1932 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: Received Event LCHAN_EV_RLL_ESTABLISH_IND DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED DRSL ERROR <0003> ../../../git/src/osmo-bsc/gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: No suitable MSC for this Complete Layer 3 request found DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:644 (bts=0,trx=0,ts=0,ss=0) DEACTivate SACCH CMD DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:778 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0] detaches lchan (primary lchan) DMSC ERROR <0007> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:153 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]{INIT}: Unable to deliver BSSMAP Clear Request message, no MSC for this conn DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: lchan detaches from conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0] DLINP ERROR <0017> ../../git/src/input/ipaccess.c:412 Bad signalling message, sign_link returned error: No such file or directory. DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: (type=SDCCH) Rx RF_CHAN_REL_ACK DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1184 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: Received Event LCHAN_EV_RSL_RF_CHAN_REL_ACK DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1206 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: state_chg to WAIT_AFTER_ERROR DLSS7 ERROR <0021> ../../git/src/m3ua.c:507 XUA_AS(as-clnt-A-0-m3ua)[0x253d70]{AS_DOWN}: Event AS-TRANSFER.req not permitted DCHAN DEBUG <000f> ../../git/src/fsm.c:322 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: Timeout of X3111 DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1550 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: state_chg to UNUSED DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:382 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Clearing lchan state
We should at the very minimum send a proper RR CHANNEL RELEASE to the MS, so it knows the channel is closed. That's the least we can do. However, it also means that the MS will likely start re-trying again and again. After all, it did not receive any reject with a cause value that would tell it to back off.
So I think we actually should either simply not hard-fast-close but let things run into some timeout (send SCCP connect request but then no response received?), or do the effort of spoofing a LU REJECT / CM SERVICE REJECt with a suitable Reject Cause IE ("Network Failure" or "Service option temporarily out of order" look good to me).
Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).
Related issues
Associated revisions
History
#5 Updated by neels 4 months ago
Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).
This is not exactly related to this specific case, rather refer to #4701
#6 Updated by neels 4 months ago
- Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added
#7 Updated by neels 4 months ago
- File 0001-spoof-LU-reject-when-there-is-no-MSC.patch 0001-spoof-LU-reject-when-there-is-no-MSC.patch added
- File spoof-LU-reject-when-there-is-no-MSC.pcapng spoof-LU-reject-when-there-is-no-MSC.pcapng added
- Status changed from In Progress to Feedback
- Assignee changed from neels to laforge
- % Done changed from 50 to 90
tested just now the behavior of a Samsung Galaxy S4m:
- letting the LU Request time out:
- The lchan stays open for close to 20 seconds.
- The MS launches the next LU attempt about 15 seconds later.
- releasing immediately (including RR Release with above fix https://gerrit.osmocom.org/c/osmo-bsc/+/20949 )
- lchan stays open for 0.33 seconds
- The MS launches the next LU attempt about 15 seconds later.
- when spoofing a Location Updating Reject
- lchan stays open for 0.33 seconds
- The MS launches the next LU attempt about 15 seconds later.
No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Letting the LU time out is bad because it occupies the lchan for a long time.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).
Conclusion from these tests would be to just properly release the lchan and not care about spoofing a reject.
So above patch to simply add an RR Release should do it.
For later reference, attaching the patch that I used to test the spoofing behavior, and a pcap showing it in operation.
#9 Updated by fixeria 4 months ago
While looking at the capture you attached, I stumbled upon packet number 13 "(RR) Channel Release" with cause "Unknown (32)". I checked in 3GPP TS 44.018, section 10.5.2.31 "RR Cause", and yes, Wireshark is right - there is no such cause value. Where this value is coming from?
#10 Updated by laforge 4 months ago
For a LU REJECT with cause "network failure" (which likely is the right
cause here), T3211 applies and that is a 15s timer, indeed. We could
send a "permanent" cause, but that would be wrong.
No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).
If we wanted to make the MS back off for a longer time, the LU reject would need to
include the optional T3246 ("Extended wait time") IE.
#11 Updated by laforge 23 days ago
- Status changed from Feedback to Stalled
- Assignee changed from laforge to neels
- Priority changed from High to Low
I think in order to avoid signaling overload (every single MS retrying every 15s), we should spoof a LU REJECT with the T3246 "extended wait time" IE. The value we send probably should scale with either the amount of time the MSC is already unreachable, or the rate of LU REQ we get, so we have some kind of back-off.
This could also be done in two steps:- start with spoofing LU with a fixed back-off
- adjusting that back-off automatically.
re-prioritizing as low.
fix missing RR release when there is no MSC
Related: OS#4832
Related: I4ffcfd4be551e0647abe00c4eaa8e9c490887190 (ttcn3 test case)
Change-Id: I697ec1287f2e813b99d95e2855d0184b14eb2783