Bug #6197 (open)
"Cannot handle SM for unknown MM CTX"
Description
I am observing relatively long PDP Context activation with Sony Ericsson K800i and recent osmo-sgsn:
osmo-sgsn 1.11.0
osmo-pcu 1.3.1.1-c1b0
I don't remember if this was the case before, most likely not.
As can be seen from the attached PCAP, the MS orders a PDP Context activation right after completing the Attach (frame 259):
130 16.160906108 127.0.0.1 → 127.0.0.1 GPRS-LLC 107 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Attach Request
156 16.161190149 127.0.0.1 → 127.0.0.1 GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Identity Request
157 16.797408334 127.0.0.1 → 127.0.0.1 GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 1(DTAP) (GMM) Identity Response
173 16.797533278 127.0.0.1 → 127.0.0.1 GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 1(DTAP) (GMM) Identity Request
181 17.200282825 127.0.0.1 → 127.0.0.1 GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 2(DTAP) (GMM) Identity Response
226 17.225222456 127.0.0.1 → 127.0.0.1 GPRS-LLC 113 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 2(DTAP) (GMM) Attach Accept
239 17.697504097 127.0.0.1 → 127.0.0.1 GPRS-LLC 77 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 3(DTAP) (GMM) Attach Complete
259 17.739056217 127.0.0.1 → 127.0.0.1 GPRS-LLC 136 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 4(DTAP) (SM) Activate PDP Context Request <-- (!)
274 17.739270297 127.0.0.1 → 127.0.0.1 GPRS-LLC 76 SAPI: LLGMM, U, XID
275 17.739280476 127.0.0.1 → 127.0.0.1 GPRS-LLC 75 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Detach Request <-- (!)
The SGSN is responding with GMM Detach Request (frame 275), here is the related logging:
259 17.739056217 127.0.0.1 → 127.0.0.1 GPRS-LLC 136 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 4(DTAP) (SM) Activate PDP Context Request
260 17.739109847 127.0.0.1 → 127.0.5.1 GSMTAP 180 NSE(00101)-NSVC(00101) Rx NS-UNITDATA
261 17.739128332 127.0.0.1 → 127.0.5.1 GSMTAP 263 GPRS-NS2-VC(UDP-NSE00101-NSVC00101-0_0_0_0:23000-127_0_0_1:23023)[0x55e69ba253d0]{UNBLOCKED}: Received Event RX-UNITDATA
262 17.739139873 127.0.0.1 → 127.0.5.1 GSMTAP 180 NSE(00101)-NSVC(00101) Rx NS-UNITDATA
263 17.739148439 127.0.0.1 → 127.0.5.1 GSMTAP 183 BSSGP TLLI=0x85c79efb Rx UPLINK-UNITDATA
264 17.739181411 127.0.0.1 → 127.0.5.1 GSMTAP 236 LLME(ffffffff/85c79efb){UNASSIGNED} LLC RX: unknown TLLI 0x85c79efb, creating LLME on the fly
265 17.739188394 127.0.0.1 → 127.0.5.1 GSMTAP 193 LLC SAPI=1 C U GEA0 IOV-UI=0x000000 FCS=0x3ef88c
266 17.739193033 127.0.0.1 → 127.0.5.1 GSMTAP 149 CMD=UI
267 17.739196720 127.0.0.1 → 127.0.5.1 GSMTAP 147 DATA
268 17.739200627 127.0.0.1 → 127.0.5.1 GSMTAP 143
269 17.739212048 127.0.0.1 → 127.0.5.1 GSMTAP 214 LLME(ffffffff/85c79efb){UNASSIGNED} Cannot handle SM for unknown MM CTX
270 17.739224792 127.0.0.1 → 127.0.5.1 GSMTAP 198 LLME(ffffffff/85c79efb){UNASSIGNED} LLGM Reset (SAPI=1)
271 17.739243207 127.0.0.1 → 127.0.5.1 GSMTAP 180 NSE(00101)-NSVC(00101) Tx NS-UNITDATA
272 17.739252294 127.0.0.1 → 127.0.5.1 GSMTAP 215 <- GMM DETACH REQ (type: re-attach required, cause: Implicitly detached)
273 17.739257293 127.0.0.1 → 127.0.5.1 GSMTAP 180 NSE(00101)-NSVC(00101) Tx NS-UNITDATA
274 17.739270297 127.0.0.1 → 127.0.0.1 GPRS-LLC 76 SAPI: LLGMM, U, XID
275 17.739280476 127.0.0.1 → 127.0.0.1 GPRS-LLC 75 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Detach Request
The MS repeats the request (frame 517) 30 seconds after the first attempt, and this time the PDP Context gets activated.
The key difference between frames 259 (first attempt) and 517 (second attempt) is the TLLI indicated in the BSSGP header.
Files
Updated by pespin 7 months ago
- Status changed from New to Feedback
- Assignee set to fixeria
fixeria, the main problem seems to me to be that the MS keeps using the TLLI it was using before the GMM Attach even after the GMM Attach procedure has finished. The SM Activate PDP Ctx Req is transmitted in BSSGP with the old TLLI=0x85c79efb.
See how MS signals the up-to-then-current TLLI=0x85c79efb and PTMSI=0xc5c79efb during GMM Attach Req in frame_nr 130.
Then, SGSN assigns a new PTMSI=0xe7ba3193 in GMM Attach Accept in frame_nr 226.
When the MS confirms the Attach with GMM Attach Complete, you can see the SGSN applying the TLLI derived from the PTMSI which was generated during GMM Attach Accept:
253 14:15:03.476231029 Sep 28, 2023 16:15:03.476231029 CEST 127.0.0.1 42590 127.0.5.1 4729 GSMTAP 215 LLME(85c79efb/e7ba3193){ASSIGNED} LLGM Assign pre (ffffffff => e7ba3193)
254 14:15:03.476234726 Sep 28, 2023 16:15:03.476234726 CEST 127.0.0.1 42590 127.0.5.1 4729 GSMTAP 216 LLME(ffffffff/e7ba3193){ASSIGNED} LLGM Assign post (ffffffff => e7ba3193)
So from that point on, osmo-sgsn no longer expects TLLI=85c79efb, which was used at the start of the Attach procedure.
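The TLLI values in this trace can be reproduced from the P-TMSI values. The sketch below assumes the TLLI structure from 3GPP TS 23.003 (bits 29..0 copied from the P-TMSI, bits 31..30 set to 11 for a local TLLI and 10 for a foreign TLLI); the function and constant names are made up for illustration and are not osmo-sgsn APIs.

```python
# Sketch only: helper names are hypothetical, not taken from osmo-sgsn.
TLLI_LOCAL_BITS = 0xC0000000    # bits 31..30 = 11 (local TLLI)
TLLI_FOREIGN_BITS = 0x80000000  # bits 31..30 = 10 (foreign TLLI)
PTMSI_LOW_MASK = 0x3FFFFFFF     # bits 29..0 are copied from the P-TMSI


def local_tlli(ptmsi: int) -> int:
    """Derive the local TLLI for a P-TMSI."""
    return TLLI_LOCAL_BITS | (ptmsi & PTMSI_LOW_MASK)


def foreign_tlli(ptmsi: int) -> int:
    """Derive the foreign TLLI for a P-TMSI."""
    return TLLI_FOREIGN_BITS | (ptmsi & PTMSI_LOW_MASK)


old_ptmsi = 0xC5C79EFB  # signalled by the MS in GMM Attach Req (frame 130)
new_ptmsi = 0xE7BA3193  # assigned by the SGSN in GMM Attach Accept (frame 226)

print(f"{foreign_tlli(old_ptmsi):#010x}")  # 0x85c79efb, the TLLI the MS keeps using
print(f"{local_tlli(new_ptmsi):#010x}")    # 0xe7ba3193, the TLLI the SGSN expects
```

Under that assumption, the old TLLI is the foreign TLLI of the old P-TMSI, and the TLLI applied by the SGSN is the local TLLI of the new one.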
However, in the followup SM Activate PDP Context Req, the MS still uses that one instead of the new one, and that makes it fail:
264 14:15:03.517812062 Sep 28, 2023 16:15:03.517812062 CEST 127.0.0.1 42590 127.0.5.1 4729 GSMTAP 236 LLME(ffffffff/85c79efb){UNASSIGNED} LLC RX: unknown TLLI 0x85c79efb, creating LLME on the fly
269 14:15:03.517842699 Sep 28, 2023 16:15:03.517842699 CEST 127.0.0.1 42590 127.0.5.1 4729 GSMTAP 214 LLME(ffffffff/85c79efb){UNASSIGNED} Cannot handle SM for unknown MM CTX
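The effect seen in the two log excerpts can be modelled with a toy lookup table. This is a simplified sketch and not the osmo-sgsn implementation: the `ToyLlme` class and `find_llme` helper are hypothetical, and the only thing taken from the logs is the `LLME(old/new)` pair and the `ffffffff` value used for an unassigned TLLI.

```python
# Toy model, not osmo-sgsn code: all names here are hypothetical.
TLLI_UNASSIGNED = 0xFFFFFFFF


class ToyLlme:
    """Tracks the old/new TLLI pair shown in the logs as LLME(old/new)."""

    def __init__(self, old_tlli: int, tlli: int):
        self.old_tlli = old_tlli
        self.tlli = tlli

    def matches(self, tlli: int) -> bool:
        # An unassigned slot never matches an incoming TLLI.
        return tlli != TLLI_UNASSIGNED and tlli in (self.old_tlli, self.tlli)

    def assign(self, old_tlli: int, new_tlli: int) -> None:
        self.old_tlli = old_tlli
        self.tlli = new_tlli


def find_llme(llmes, tlli):
    """Return the first entry matching the TLLI, or None if it is unknown."""
    for llme in llmes:
        if llme.matches(tlli):
            return llme
    return None


# During the Attach both TLLIs are accepted: LLME(85c79efb/e7ba3193).
llme = ToyLlme(old_tlli=0x85C79EFB, tlli=0xE7BA3193)
print(find_llme([llme], 0x85C79EFB) is llme)  # True

# After Attach Complete the old TLLI is dropped: LLME(ffffffff/e7ba3193).
llme.assign(TLLI_UNASSIGNED, 0xE7BA3193)
print(find_llme([llme], 0x85C79EFB))  # None, i.e. "unknown TLLI"
```

In this model, the Activate PDP Context Request arriving with 0x85c79efb finds no entry after the assignment, which corresponds to the LLME being created on the fly with no MM context behind it.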
We'd need to check a couple things here:
- The TLLI is in BSSGP, which means it's filled in by osmo-pcu. So the problem may be in osmo-pcu not using the new TLLI when needed.
- We need to check in the specs the exact moment from which the old TLLI can no longer be used by the MS/PCU.
For the 1st one, we probably need a new pcap which also includes osmo-pcu RLC/MAC. I personally use the following:
pcu
 gsmtap-remote-host 192.168.30.1
 gsmtap-category dl-unknown
 gsmtap-category dl-ctrl
 gsmtap-category dl-data-gprs
 gsmtap-category dl-data-egprs
 gsmtap-category dl-agch
 gsmtap-category dl-pch
 gsmtap-category ul-unknown
 gsmtap-category ul-ctrl
 gsmtap-category ul-data-gprs
 gsmtap-category ul-data-egprs
 gsmtap-category ul-rach
in osmo-bts:
gsmtap-remote-host 127.0.0.1
gsmtap-sapi enable-all
no gsmtap-sapi pdtch
no gsmtap-sapi ptcch
no gsmtap-sapi pacch
Updated by pespin 7 months ago
3GPP TS 24.008 4.7.3.1.3 GPRS attach accepted by the network:
The P-TMSI reallocation may be part of the GPRS attach procedure. When the ATTACH REQUEST includes the IMSI or IMEI, the SGSN shall allocate the P-TMSI. The P-TMSI that shall be allocated is then included in the ATTACH ACCEPT message together with the routing area identifier. The network shall, in this case, change to state GMM-COMMON-PROCEDURE-INITIATED and shall start timer T3350 as described in subclause 4.7.6. Furthermore, the network may assign a P-TMSI signature for the GMM context which is then also included in the ATTACH ACCEPT message.
[...]
If the message contains a P-TMSI, the MS shall use this P-TMSI as the new temporary identity for GPRS services. In this case, an ATTACH COMPLETE message is returned to the network. The MS shall delete its old P-TMSI and shall store the new one. If no P-TMSI has been included by the network in the ATTACH ACCEPT message, the old P-TMSI, if any available, shall be kept.
If the message contains a P-TMSI signature, the MS shall use this P-TMSI signature as the new temporary signature for the GMM context. The MS shall delete its old P-TMSI signature, if any is available, and shall store the new one. If the message contains no P-TMSI signature, the old P-TMSI signature, if available, shall be deleted.
So from that spec fragment, I think it's clear the PCU/MS is misbehaving.
We need a pcap with RLC/MAC to find out where the problem is.
Updated by pespin 7 months ago
4.7.3.1.6 Abnormal cases on the network side
The following abnormal cases can be identified:
a) Lower layer failure
If a low layer failure occurs before the message ATTACH COMPLETE has been received from the MS and a new P-TMSI (or a new P-TMSI and a new P-TMSI signature) has been assigned, the network shall consider both the old and new P-TMSI each with its corresponding P-TMSI-signature as valid until the old P-TMSI can be considered as invalid by the network (see subclause 4.7.1.5) or the GMM context which has been marked as detached in the network is released, and shall not resent the message ATTACH ACCEPT. During this period the network may:
- use the identification procedure followed by a P-TMSI reallocation procedure if the old P-TMSI is used by the MS in a subsequent message.
Updated by fixeria 7 months ago
- File osmo_sgsn_k800i_pdp_ctx_rlcmac.pcapng.gz added
- Priority changed from Normal to Low
For the record, the firmware version is R1GP001 (prgCXC1250210_GENERIC_WI).
I found a switch in the settings menu ("Connectivity" -> "Data communication" -> "Preferred service"), which controls whether the MS stays GMM-attached all the time ("PS and CS") or does the GMM attach/detach every time a PDP context is activated/deactivated ("CS only"). The problem was observed with "CS only", and is gone after I switched to "PS and CS". The phone is now always GMM-attached and PDP Context activation works fine. Thus lowering the priority.
pespin please find a PCAP with RLC/MAC traces attached.
Updated by fixeria 7 months ago
- Assignee changed from fixeria to pespin
The RLC/MAC traces reveal several interesting things:
- When sending GMM Attach Complete, the phone is using CS-4 on the Uplink;
- Frame 493 (RLC/MAC UL DATA) carries not one, but two LLC segments:
  - the first segment (8 bytes, 01c00d0803551cea) containing the Attach Complete message;
  - the second segment (41 bytes) is a part of the PDP Context Activation request;
- Frame 499 (RLC/MAC UL DATA) carries another LLC segment, containing the remaining 26 bytes.
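The 8-byte first segment can be decoded by hand. The sketch below assumes the LLC UI frame layout from 3GPP TS 44.064 (one address octet, a two-octet UI control field carrying N(U), E and PM, and a 3-octet FCS at the end) and the GMM message type values from TS 24.008; `decode_llc_ui` is a hypothetical helper written for this note, and it does not verify the FCS.

```python
# Sketch only: minimal LLC UI frame decoder, FCS is extracted but not checked.
GMM_MSG_TYPES = {
    0x01: "Attach Request",
    0x02: "Attach Accept",
    0x03: "Attach Complete",
}


def decode_llc_ui(frame: bytes) -> dict:
    """Decode address, UI control field and the first two payload octets."""
    addr, ctrl0, ctrl1 = frame[0], frame[1], frame[2]
    if ctrl0 & 0xE0 != 0xC0:  # UI format starts with bits 110
        raise ValueError("not an LLC UI frame")
    payload = frame[3:-3]     # strip the header and the 3-octet FCS
    return {
        "sapi": addr & 0x0F,
        "nu": ((ctrl0 & 0x07) << 6) | (ctrl1 >> 2),
        "ciphered": bool(ctrl1 & 0x02),   # E bit
        "protected": bool(ctrl1 & 0x01),  # PM bit
        "pd": payload[0] & 0x0F,          # protocol discriminator, 8 = GMM
        "msg_type": payload[1],
        "fcs": frame[-3:].hex(),
    }


info = decode_llc_ui(bytes.fromhex("01c00d0803551cea"))
print(info["sapi"], info["nu"], info["protected"], info["ciphered"])
print(GMM_MSG_TYPES.get(info["msg_type"], "unknown"))
```

With those assumptions the segment decodes as SAPI 1 (LLGMM), N(U) = 3, protected and non-ciphered, carrying GMM message type 0x03, which matches the Attach Complete seen with N(U) = 3 in the first capture.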
So what the phone is doing is actually re-using the same TBF to initiate the SM procedure. And the PCU is just including TLLI of that same TBF.
Updated by pespin 7 months ago
fixeria agreeing with your analysis.
There's no way that the PCU can get to know the new TLLI in a reasonable way in that scenario, because it is not going through contention resolution.
I'd say that's actually a bug in the MS stack, though it's difficult to say. It should be requesting a new TBF if the TLLI changes, or at least provide the new TLLI when it keeps using the existing TBF. We may want to check TS 44.060 on the matter.
Updated by pespin 7 months ago
TS 44.060 5.5.1.8 TLLI management:
After contention resolution the mobile station shall apply new TLLI in RLC/MAC control block if the mobile has received a new P-TMSI.
The GMM Attach Accept is sent in gsm fn=348625, pcap frame_nr=404.
MS sends UL CTRL blocks in:
428 15:37:49.623898459 Sep 28, 2023 17:37:49.623898459 CEST 127.0.0.1 34344 127.0.1.3 4729 GSM RLC/MAC 81 GPRS UL CTRL: PACKET_DOWNLINK_ACK_NACK
437 15:37:49.665406374 Sep 28, 2023 17:37:49.665406374 CEST 127.0.0.1 34344 127.0.1.3 4729 GSM RLC/MAC 81 GPRS UL CTRL: PACKET_CONTROL_ACKNOWLEDGEMENT
442 15:37:49.684257074 Sep 28, 2023 17:37:49.684257074 CEST 127.0.0.1 34344 127.0.1.3 4729 GSM RLC/MAC 81 GPRS UL CTRL: PACKET_DOWNLINK_ACK_NACK
The 2 DL_ACK_NACK blocks contain no TLLI (not even sure if they can contain a TLLI field, need to check TS 44.060).
The PKT CONTROL ACK in frame_nr 437 contains a TLLI=0xb9518db1, which IIUC is the old one?
So according to the specs above the MS should have sent the new TLLI instead. (Another topic is whether we'd update it correctly in osmo-pcu if it had sent the new one, needs to be checked).
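One quick sanity check on whether a TLLI is "old" is to look at its type bits. The sketch below assumes the TLLI type encoding from 3GPP TS 23.003 (top bits 11 = local, 10 = foreign, 01111 = random, 01110 = auxiliary); `tlli_type` is a hypothetical helper, not an osmo-pcu function.

```python
# Sketch only: classify a TLLI by its most significant bits.
def tlli_type(tlli: int) -> str:
    """Return the TLLI type name based on the top bits of a 32-bit TLLI."""
    top2 = (tlli >> 30) & 0x3
    if top2 == 0b11:
        return "local"
    if top2 == 0b10:
        return "foreign"
    top5 = (tlli >> 27) & 0x1F
    if top5 == 0b01111:
        return "random"
    if top5 == 0b01110:
        return "auxiliary"
    return "other"


print(tlli_type(0xB9518DB1))  # TLLI in the PKT CONTROL ACK, frame_nr 437
print(tlli_type(0x85C79EFB))  # old TLLI from the first capture
print(tlli_type(0xE7BA3193))  # new TLLI from the first capture
```

0xb9518db1 classifies as a foreign TLLI, like the old TLLI in the first capture, while the TLLI the SGSN applied there was a local one. That is consistent with frame_nr 437 carrying the old TLLI, but it is not proof on its own, since the new P-TMSI of this second capture is not quoted here.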
Updated by pespin 7 months ago
We do update the TLLI in that scenario in osmo-pcu already, so if the MS had sent the new TLLI, then we'd have updated it properly AFAICT:
void gprs_rlcmac_pdch::rcv_control_ack(Packet_Control_Acknowledgement_t *packet, uint32_t fn)
{
    ...
    uint32_t tlli = packet->TLLI;
    ...
    ms_update_announced_tlli(tbf->ms(), tlli);
    /* Gather MS from TBF again, since it may be NULL or may have been merged during ms_update_announced_tlli */
    ms = tbf->ms();
Updated by fixeria 7 months ago
fixeria wrote in #note-11:
It needs to be checked though if the TEMS firmware exhibits the same behavior when operating in the "CS only" mode.
TEMS firmware (CXC1722434_TEMS R2B) for K800i does exhibit the same behavior in "CS only".
This is good news, because we can get the MS side packet traces.
Updated by fishpike about 2 months ago
I observe the same situation in my setup: I cannot activate a PDP context because of missing RA-CAPABILITY-UPDATE and RA-CAPABILITY-UPDATE-ACK support in osmoSGSN.
My setup is a Samsung Galaxy S9 and S23, a real BTS/BSC from a well-known vendor, and an Osmocom core (SGSN/GGSN/HLR/MSC/STP). Importantly, I'm not using osmo-pcu.
Problem description:
S9 attach procedure (CS+PS):
1. After a UE reboot I see the MM Location Request and Location Update being processed correctly (in DMtool). CS calls, if needed, then work properly (connection to osmoMSC).
2. After that, the UE sends a GMM Attach Request and gets an Attach Accept, followed by Attach Complete.
3. Then the UE sends Activate PDP Context, but the SGSN rejects it with cause:
"Cannot handle SM for unknown MM CTX"
4. After that, the UE tries to activate the PDP context a second time (after the timer expires) and finally gets a PDP accept (ping/traffic work), same as in this thread. After deep inspection I found that the first PDP activation is rejected because the TLLI is not updated (the SGSN doesn't know it, which causes "Cannot handle SM for unknown MM CTX"): in the BSSGP pcap I see that the BSC sends RA-CAPABILITY-UPDATE with the new TLLI, and osmoSGSN responds with "Unknown PDU" (protocol unspecified) where it should send RA-CAPABILITY-UPDATE-ACK. Another thing I don't understand is that the message from the BSC comes in on BVCI 127, but the response goes to BVCI 0 (I didn't define BVCI 0; the BSC-SGSN link uses only BVCI 127). I also think the signalling BVCI 0 cannot be changed to BVCI 127 in osmoSGSN.
To summarize, the attach procedure is long: it takes 30-50 seconds because of this first reject.
S23 attach
Same as the S9, but after the first failure it does not make a second attempt and stays stuck in the CS domain without PS forever. It's not a problem with the MS stack.
I tried a lot of workarounds to make the S23 attach, but without success. Changing to PS-only does not work, and AT commands do not work. I also tried Huawei E3372 and E3131 modems, with the same behaviour.
I connected this setup to another 2G core simulator (not open source) to compare, and there it is processed normally for both UEs and modems: they get PDP contexts after 6-8 seconds. The difference is that here I get the RA-CAPABILITY-UPDATE/RA-CAPABILITY-UPDATE-ACK messages, because I have a real BSC and not osmo-pcu.
When I look into the code I cannot see RA-CAPABILITY-UPDATE and RA-CAPABILITY-UPDATE-ACK support, so my questions are:
1. Are RA-CAPABILITY-UPDATE and RA-CAPABILITY-UPDATE-ACK supported in osmoSGSN, and if not, is there a plan to add them?
2. Is there any other workaround in configuration (changing timers, TMSI assignment, or anything to change in the BSC) to get the S23/modems attached? Theoretically, if the UE can be set to PS-only and a PDP context activated by AT command, it should work, but I never managed to get these AT commands to work on my terminals.
3. I can provide both pcaps and UE logs to show what's missing, if that helps to add it.
4. Is it possible to add support for these two messages to the osmoSGSN code? I can test a patch if you provide one.
Updated by laforge about 2 months ago
fishpike wrote in #note-13:
Problem description: [...]
can you please provide a matching set of
- pcap files for the Gb interface traffic between osmo-pcu and osmo-sgsn
- osmo-pcu log output
- osmo-sgsn log output
all from the same time, showing your test procedure?
If you'd like, you can configure GSMTAP logging and will get all three in one pcap file.