Bug #3178
closedgbproxy: failed to parse invalid BSSGP-UNITDATA message
0%
Description
Today while checking log output (debug all) of osmo-gbproxy 8343b4adbbe9afd4294232429696ee9736ec1004 in a testing/production set up:
<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada <000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108) <0012> gprs_llc_parse.c:82 LLC SAPI=9 C U GEA? IOV-UI=0x000000 FCS=0x62a1a7 CMD=UI DATA <0012> gprs_gb_parse.c:408 Got LLC message, CRC: 62a1a7 (computed 62a1a7) <000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0 <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438) <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada) <000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada <000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108) <000f> gprs_gb_parse.c:535 BSSGP: Got BSSGP RAID 274-8-4120-0, BSSGP PTMSI f3687732, IMSI 274080000004803 <000f> gb_proxy_patch.c:184 Patching BSSGP P-TMSIs: Replacing f3687732 -> e155cfea <000f> gb_proxy.c:1080 NSEI=9108(SGSN) BSSGP PAGING routing by RAI to peer BVCI=9108 <000f> gb_proxy.c:791 NSEI=9108 proxying SGSN->BSS (NS_BVCI=0, NSEI=110) <0012> gprs_llc_parse.c:82 LLC SAPI=1 C U GEA? IOV-UI=0x000000 FCS=0x38436c CMD=UI DATA <0012> gprs_gb_parse.c:408 Got LLC message, CRC: 38436c (computed 38436c) <000f> gb_proxy.c:575 NSEI=110(BSS) patching: failed to parse invalid BSSGP-UNITDATA message <000f> gprs_gb_parse.c:535 BSSGP-UNITDATA: Got TLLI 00000000, BSSGP RAID 274-8-4120-0 <000f> gb_proxy.c:579 NSEI=110(BSS) invalid message was: [L2]> 00 00 23 94 01 e1 55 cf ea 00 00 04 08 88 72 f4 80 10 18 00 9c 40 00 80 0e 00 06 01 c0 41 6c 43 38 <0012> gprs_llc_parse.c:82 LLC SAPI=9 C U GEA? IOV-UI=0x000000 FCS=0xa65d4d CMD=UI DATA <0012> gprs_gb_parse.c:408 Got LLC message, CRC: a65d4d (computed a65d4d) <000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0 <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438) <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada) <000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada <000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108) <0012> gprs_llc_parse.c:82 LLC SAPI=9 C U GEA? IOV-UI=0x000000 FCS=0x2bbfe2 CMD=UI DATA <0012> gprs_gb_parse.c:408 Got LLC message, CRC: 2bbfe2 (computed 2bbfe2) <000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0 <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438) <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada) <000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada <000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108) <0012> gprs_llc_parse.c:82 LLC SAPI=9 R U GEA? IOV-UI=0x000000 FCS=0x376834 CMD=UI DATA <0012> gprs_gb_parse.c:408 Got LLC message, CRC: 376834 (computed 376834) <000f> gprs_gb_parse.c:535 LLC: Got TLLI da1d5ada, IMSI 274085000030459 <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada) <000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438) <000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing da1d5ada -> e5225438
So it seems there are some messages we are not handling correctly.
Updated by stsp about 6 years ago
I have written a small test to reproduce the problem: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9500
The parsing failure happens when this part of the message is parsed as TLV type TLV_TYPE_TvLV:
ea 00 00 04 08 88 72 f4 80 10 18 00 9c 40 00 80 0e 00 06 01 c0 41 6c 43 38
libosmocore's TLV parser (src/gsm/tlv_parser.c) ends up "return -2" in line 214 at the end of this block of code:
194 case TLV_TYPE_TvLV: 195 if (*(buf+1) & 0x80) { 196 /* like TLV, but without highest bit of len */ 197 if (buf + 1 > buf + buf_len) 198 return -1; 199 *o_val = buf+2; 200 *o_len = *(buf+1) & 0x7f; 201 len = *o_len + 2; 202 if (len > buf_len) 203 return -2; 204 break; 205 } 206 /* like TL16V, fallthrough */ 207 case TLV_TYPE_TL16V: 208 if (2 > buf_len) 209 return -1; 210 *o_val = buf+3; 211 *o_len = *(buf+1) << 8 | *(buf+2); 212 len = *o_len + 3; 213 if (len > buf_len) 214 return -2;
In gdb, we see:
(gdb) p len $14 = 2187 (gdb) p buf_len $15 = 22
Updated by stsp about 6 years ago
Manually decoded message according to:
ETSI TS 148 018 V10.6.0 (2012 07) 96
3GPP TS 48.018 version 10.6.0 Release 10
Table 10.2.2: UL-UNITDATA PDU content
00 - PDU type UL-UNITDATA (ok) 11.3.35 Temporary logical link Identity (TLLI) 00 - TLLI[0] 23 - TLLI[1] 94 - TLLI[2] 01 - TLLI[3] TLLI == "00239401" e1 - QOS[0] (bit rate MSB) 55 - QOS[1] (bit rate LSB) bit rate = "57685" (57685*100000 bit/s per PBRG) cf - QOS[2] PBRG = 11 (bit rate is expressed in 100000 bit/s increments), C/R 0 (contains LLC ACK/SACK), T 0 (contains signalling), A 1 (radio if uses MAC/UNITDATA, Precedence 111 (reserved value) ea - CELL_ID[0] (TLV IEI: wrong, should be 0x08) 00 - CELL_ID[1] (length 1) 00 - CELL_ID[2] (length 2) lenth == 0 04 -- CELL_ID[3] 08 -- CELL_ID[4] 88 -- CELL_ID[5] 72 -- CELL_ID[6] f4 -- CELL_ID[7] 80 -- CELL_ID[8] 10 -- CELL_DI[9] 18 -- QOSP[0] OoS Profile IEI not allowed in BSSGP Userdata 00 -- QOSP[1] 9c -- QOSP[2] 40 -- QOSP[3] 00 -- QOSP[4] 80 -- IEI for "E-UTRAN Inter RAT Handover Info" not allowed in BSSGP Userdata 0e -- length (14 bytes -- only 8 bytes remain) 00 06 01 c0 41 6c 43 38
EDIT: typos
Updated by stsp about 6 years ago
As best as I can tell this message is invalid.
Apart from tne first byte (and perhaps the TLLI) none of the data makes sense.
It would be nice to get a live pcap of such a message with lower-layer headers so we can tell where it originated. Would it be possible to reproduce this problem and record a pcap file?
Updated by pespin about 6 years ago
We can most probably reproduce it by connecting to any machine currently using osmo-gbproxy in production.
Once we know if this message is uplink or downlink, I guess it's worth looking at related code generating this kind of message in osmo-pcu or osmo-sgsn to see if we can find some code malforming the packet there.
Updated by stsp about 6 years ago
We already know that the message is uplink, based on where it was logged.
EDIT: To be clearer: The message processing failure was logged in gb_proxy.c:575 which is processing BSSGP uplink messages.
Updated by stsp about 6 years ago
Invalid BSSGP messages could slip through Rx in osmo-pcu and libosmocore as well.
See https://gerrit.osmocom.org/#/c/osmo-pcu/+/9728/
and https://gerrit.osmocom.org/#/c/libosmocore/+/9729
Updated by stsp about 6 years ago
I suggest we run another test with the above patches applied and see if more invalid messages show up.
I don't have a test setup ready. Can someone either test this on my behalf or guide me through the process of getting a test setup going, with my patches?
Updated by stsp about 6 years ago
We could also return an error notification to the sender if parsing fails:
https://gerrit.osmocom.org/#/c/libosmocore/+/9730/
Updated by stsp almost 6 years ago
This issue is still waiting for tests. I will need to coordinate with Pau to see how I could run such a test.
Updated by pespin almost 6 years ago
Running some tests with the patches mentioned above already applied we got the following:
<000f> gprs_ns.c:584 NSEI=18033 Tx NS ALIVE_ACK (NSVCI=18033) <0011> gprs_llc_parse.c:81 LLC SAPI=1 C U GEA? IOV-UI=0x000000 FCS=0x63f476 CMD=UI DATA <0011> gprs_gb_parse.c:408 Got LLC message, CRC: 63f476 (computed 63f476) <0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x46 for protocol discriminator SM. <000e> gprs_gb_parse.c:535 GMM: Got TLLI f820483d, BSSGP RAID 274-08-8276-0 <000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = f820483d, assigned = 00000000, net_vld = 1, bss_vld = 1}, f820483d) <000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = c5101308, assigned = 00000000, net_vld = 1, bss_vld = 1}, c5101308) <000e> gb_proxy_patch.c:161 Patching TLLIs: Replacing f820483d -> c5101308 <000e> gb_proxy.c:770 NSEI=110 proxying BTS->SGSN (NS_BVCI=18033, NSEI=18033) <0011> gprs_llc_parse.c:81 LLC SAPI=1 R U GEA? IOV-UI=0x000000 FCS=0xc39c48 CMD=UI DATA <0011> gprs_gb_parse.c:408 Got LLC message, CRC: c39c48 (computed c39c48) <0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x47 for protocol discriminator SM. <------------------------ SEE THIS LINE <000e> gprs_gb_parse.c:535 GMM: Got TLLI c5101308, IMSI 274080000008245 <000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = c5101308, assigned = 00000000, net_vld = 1, bss_vld = 1}, c5101308) <000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = f820483d, assigned = 00000000, net_vld = 1, bss_vld = 1}, f820483d) <000e> gb_proxy_patch.c:161 Patching TLLIs: Replacing c5101308 -> f820483d
Updated by stsp almost 6 years ago
<0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x47 for protocol discriminator SM
I see no evidence that this message is related to the original problem described in this issue. I think we should treat it as a separate problem until there is such evidence.
This message appeared with a Sony Ericsson phone which had trouble getting online.
I tried to find the corresponding frame in wireshark for quite some time yesterday but failed to locate it. I didn't keep a copy of captured frames unfortunately. But I believe I know how to reproduce this.
The message did not appear with a Samsung Galaxy S2 I tried later on.
The original error "gb_proxy.c:575 NSEI=110(BSS) patching: failed to parse invalid BSSGP-UNITDATA message" did not appear with either phone I tried.
Updated by stsp almost 6 years ago
The meaning of message type 0x47 is "DEACT PDP ACK"
From gsm_04_08_gprs.h:
/* Table 10.4a, GPRS Session Management (GSM) */ #define GSM48_MT_GSM_ACT_PDP_REQ 0x41 #define GSM48_MT_GSM_ACT_PDP_ACK 0x42 #define GSM48_MT_GSM_ACT_PDP_REJ 0x43 #define GSM48_MT_GSM_REQ_PDP_ACT 0x44 #define GSM48_MT_GSM_REQ_PDP_ACT_REJ 0x45 #define GSM48_MT_GSM_DEACT_PDP_REQ 0x46 #define GSM48_MT_GSM_DEACT_PDP_ACK 0x47
The function gprs_gb_parse_dtap() doesn't handle this message.
Updated by stsp almost 6 years ago
It would be good to have more information about the conditions in which the original problem occurred. Right now, I see two ways forward:
1) Keep trying various phones in the lab and hope the issue occurs eventually.
2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.
Frankly, from my point of view, the quality of bug reports we are filing in Osmocom's tracker is relatively low. I am under the impression that, in general, many bug reports lack descriptions of the environmental conditions a problem was found in. This is especially important information when problems are observed with hardware rather than a reproducible test case. A bug report which contains a single error message is not much to go on. In other projects I am involved in I see issues being closed regularly based on lack of such information.
Updated by laforge almost 6 years ago
On Wed, Aug 15, 2018 at 12:42:41PM +0000, stsp [REDMINE] wrote:
2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.
Full ACK to that. Set it to stalled and low priority, or simlpy set it
to feedback and assign to pespin. It's not worth spending too much time
on something that's not obvious to reproduce. We have no shortage of
other issues where we can use our time more productively than hard to
reproduce issues.
Frankly, from my point of view, the quality of bug reports we are
filing in Osmocom's tracker is relatively low. I am under the
impression that, in general, many bug reports lack descriptions of the
environmental conditions a problem was found in. This is especially
important information when problems are observed with hardware rather
than a reproducible test case.
I agree in this case. pespin, please take that to heart, at least when taking
this current bug report as an example. From your report here we don't
see what kind of phone you were using, what you were doing with the
phone at the time, ...
Even if one files a bug report assigned to oneself, chances are high
that 4 months later you don't remember any of the context that's not in
the ticket.
However, there are also plenty of better bug reports in the tracker in general.
I don't see this as a general problem.
Updated by pespin almost 6 years ago
laforge wrote:
On Wed, Aug 15, 2018 at 12:42:41PM +0000, stsp [REDMINE] wrote:
2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.
Full ACK to that. Set it to stalled and low priority, or simlpy set it
to feedback and assign to pespin. It's not worth spending too much time
on something that's not obvious to reproduce. We have no shortage of
other issues where we can use our time more productively than hard to
reproduce issues.
Agree. I think we can still look into the error message we got a few days ago which I recorded here and make sure we handle those type of messages, or create a task explaining what's needed. If it's expected that w do nothing with those message, we shouldn't print error messages which distracts people operating the software. stsp you can do that or otherwise re-assign the ticket to me.
I agree in this case. pespin, please take that to heart, at least when taking
this current bug report as an example. From your report here we don't
see what kind of phone you were using, what you were doing with the
phone at the time, ...Even if one files a bug report assigned to oneself, chances are high
that 4 months later you don't remember any of the context that's not in
the ticket.
I saw that message while busy operating the software doing some other work, so I decided to log it here for later investigation, that's why I didn't provide much information on first place.
Updated by stsp almost 6 years ago
I have created #3466 about the mysterious 0x47 message.
I want to add that I didn't mean to specifically point fingers at Pau for lack of details in this report. Filing informative bug reports isn't easy, especially when one can't afford sufficient mental focus while doing so. I've certainly filed my own share of bugs lacking vital information as well :)
Updated by stsp almost 6 years ago
- Status changed from In Progress to Feedback
- Assignee changed from stsp to pespin
Updated by laforge about 4 years ago
- Project changed from OsmoSGSN to osmo-gbproxy
Updated by pespin over 3 years ago
- Status changed from Feedback to Rejected
- It didn't contain enough information, not obvious to reproduce
- We don't have access to the setup anymore
- There's a more specific ticket created from stsp from stuff he found.