Bug #3727

closed

SGSN segfaults on network type change

Added by manatails over 5 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
12/12/2018
Due date:
% Done:

100%

Spec Reference:

Description

When the phone changes its network type between GSM and UMTS, osmo-sgsn crashes with the following log:

<0012> gprs_llc_parse.c:81 LLC SAPI=1 C U GEA0 IOV-UI=0x000000 FCS=0x760d06 CMD=UI DATA
<0002> gprs_gmm.c:1609 -> GMM RA UPDATE REQUEST type="RA updating"
<0002> gprs_gmm.c:1685 MM Looked up by matching TLLI and P_TMSI. BSSGP TLLI: b99cab1e, P-TMSI: f99cab1e (00000000), TLLI: 00000000 (00000000), RA: 450-09-1-1

Program received signal SIGSEGV, Segmentation fault.
0x0000000000409667 in gsm48_gmm_authorize (ctx=0x758600) at gprs_gmm.c:1051
1051 if (ctx->ran_type == MM_CTX_T_UTRAN_Iu && !ctx->iu.ue_ctx->integrity_active) {
(gdb)


Related issues

Related to OsmoSGSN - Bug #3995: OsmoSGSN doesn't close SCCP connection after successful LU over IuPS (Closed, lynxis, 05/10/2019)

Related to OsmoSGSN - Bug #1977: 3G IuPS is unreliable (Closed, lynxis, 03/09/2017)

Actions #1

Updated by manatails over 5 years ago

ctx->iu.ue_ctx is NULL at the time of the crash.

Actions #2

Updated by laforge almost 5 years ago

  • Assignee set to lynxis
Actions #3

Updated by lynxis almost 5 years ago

Can you create a backtrace when this problem happens (gdb CLI: bt)? It would also be nice if you could provide a pcap trace.
I would guess this problem happens when an MS/UE moves from 3G to 2G. Not sure if the SGSN also crashes the other way around :).

I should write a TTCN-3 test first to cover this.

Actions #4

Updated by laforge almost 5 years ago

Actions #5

Updated by manatails almost 5 years ago


Program received signal SIGSEGV, Segmentation fault.
0x000000000040ad17 in gsm48_gmm_authorize (ctx=0x764350) at gprs_gmm.c:1058
1058            if (ctx->ran_type == MM_CTX_T_UTRAN_Iu && !ctx->iu.ue_ctx->integrity_active) {
(gdb) bt
#0  0x000000000040ad17 in gsm48_gmm_authorize (ctx=0x764350) at gprs_gmm.c:1058
#1  0x000000000040b6c5 in gsm48_rx_gmm_ra_upd_req (mmctx=0x764350, mmctx@entry=0x0, msg=msg@entry=0x760690, llme=llme@entry=0x762430) at gprs_gmm.c:1800
#2  0x000000000040c45e in gsm0408_rcv_gmm (mmctx=mmctx@entry=0x0, msg=msg@entry=0x760690, llme=llme@entry=0x762430, drop_cipherable=drop_cipherable@entry=false) at gprs_gmm.c:2008
#3  0x000000000040d352 in gsm0408_gprs_rcvmsg_gb (msg=msg@entry=0x760690, llme=0x762430, drop_cipherable=drop_cipherable@entry=false) at gprs_gmm.c:2933
#4  0x000000000041c10b in gprs_llc_rcvmsg (msg=0x760690, tv=<optimized out>) at gprs_llc.c:997
#5  0x0000000000415ead in bssgp_prim_cb (oph=oph@entry=0x1, ctx=ctx@entry=0x0) at sgsn_main.c:125
#6  0x00007ffff7758ec0 in bssgp_rx_ul_ud (ctx=<optimized out>, ctx=<optimized out>, tp=<optimized out>, msg=<optimized out>) at gprs_bssgp.c:414
#7  bssgp_rx_ptp (bctx=<optimized out>, tp=<optimized out>, msg=<optimized out>) at gprs_bssgp.c:873
#8  bssgp_rcvmsg (msg=0x760690) at gprs_bssgp.c:1096
#9  0x00007ffff7752cea in gprs_ns_rx_unitdata (msg=0x760690, nsvc=0x761380) at gprs_ns.c:1139
#10 gprs_ns_process_msg (nsi=nsi@entry=0x73a040, msg=msg@entry=0x760690, nsvc=nsvc@entry=0x7fffffffe260) at gprs_ns.c:1774
#11 0x00007ffff775482a in gprs_ns_rcvmsg (nsi=nsi@entry=0x73a040, msg=msg@entry=0x760690, saddr=saddr@entry=0x7fffffffe2c0, ll=ll@entry=GPRS_NS_LL_UDP) at gprs_ns.c:1523
#12 0x00007ffff7754995 in handle_nsip_read (bfd=0x73a070) at gprs_ns.c:1989
#13 nsip_fd_cb (bfd=0x73a070, what=1) at gprs_ns.c:2022
#14 0x00007ffff7303e37 in osmo_fd_disp_fds (_eset=0x7fffffffe430, _wset=0x7fffffffe3b0, _rset=0x7fffffffe330) at select.c:223
#15 osmo_select_main (polling=polling@entry=0) at select.c:263
#16 0x0000000000405097 in main (argc=2, argv=<optimized out>) at sgsn_main.c:524
(gdb)

Sorry for the late reply.

The backtrace above was taken when going from 3G->2G.

Moving from 2G->3G causes the crash as well; its backtrace follows:

Program received signal SIGSEGV, Segmentation fault.
gsm48_parse_ra (raid=raid@entry=0x7636c8, buf=buf@entry=0x0) at gsm48.c:788
788     {
(gdb) bt
#0  gsm48_parse_ra (raid=raid@entry=0x7636c8, buf=buf@entry=0x0) at gsm48.c:788
#1  0x00007ffff7758639 in bssgp_parse_cell_id (raid=raid@entry=0x7636c8, buf=0x0) at gprs_bssgp.c:239
#2  0x000000000040b705 in gsm48_rx_gmm_ra_upd_req (mmctx=0x763670, mmctx@entry=0x0, msg=msg@entry=0x767d60, llme=llme@entry=0x0) at gprs_gmm.c:1756
#3  0x000000000040c45e in gsm0408_rcv_gmm (mmctx=0x0, msg=0x767d60, llme=0x0, drop_cipherable=<optimized out>) at gprs_gmm.c:2008
#4  0x00007ffff60b2e1e in ranap_handle_co_initial_ue (ies=<optimized out>, ctx=0x7fffffffdf80) at iu_client.c:373
#5  cn_ranap_handle_co_initial (ctx=0x7fffffffdf80, message=<optimized out>) at iu_client.c:517
#6  0x00007ffff60b1908 in ranap_cn_rx_co (cb=cb@entry=0x7ffff60b2af0 <cn_ranap_handle_co_initial>, ctx=ctx@entry=0x7fffffffdf80, data=<optimized out>, len=<optimized out>) at ranap_common_cn.c:307
#7  0x00007ffff60b379a in sccp_sap_up (oph=0x766f98, _scu=0x760550) at iu_client.c:803
#8  0x00007ffff73086b7 in _osmo_fsm_inst_dispatch (fi=0x766cf0, event=5, data=data@entry=0x7611d0, file=file@entry=0x7ffff639711d "sccp_scoc.c", line=line@entry=1677) at fsm.c:818
#9  0x00007ffff6387059 in sccp_scoc_rx_from_scrc (inst=inst@entry=0x760350, xua=xua@entry=0x7611d0) at sccp_scoc.c:1677
#10 0x00007ffff638444a in scrc_node_6 (inst=inst@entry=0x760350, xua=xua@entry=0x7611d0, called=0x7fffffffe130, called=0x7fffffffe130) at sccp_scrc.c:348
#11 0x00007ffff6384b8d in scrc_rx_mtp_xfer_ind_xua (inst=inst@entry=0x760350, xua=0x7611d0) at sccp_scrc.c:468
#12 0x00007ffff6387c45 in mtp_user_prim_cb (oph=0x765d58, ctx=0x760350) at sccp_user.c:176
#13 0x00007ffff637fbff in m3ua_rx_xfer (xua=0x760b40, asp=0x75f030) at m3ua.c:586
#14 m3ua_rx_msg (asp=asp@entry=0x75f030, msg=msg@entry=0x764fe0) at m3ua.c:739
#15 0x00007ffff638e30b in xua_cli_read_cb (conn=0x75fe70) at osmo_ss7.c:1650
#16 0x00007ffff50bede3 in osmo_stream_cli_read (cli=0x75fe70) at stream.c:213
#17 osmo_stream_cli_fd_cb (ofd=<optimized out>, what=1) at stream.c:297
#18 0x00007ffff7303e37 in osmo_fd_disp_fds (_eset=0x7fffffffe430, _wset=0x7fffffffe3b0, _rset=0x7fffffffe330) at select.c:223
#19 osmo_select_main (polling=polling@entry=0) at select.c:263
#20 0x0000000000405097 in main (argc=2, argv=<optimized out>) at sgsn_main.c:524
(gdb)

The backtrace above is from the 2G->3G case.
Actions #6

Updated by manatails almost 5 years ago

Actions #7

Updated by laforge over 4 years ago

  • Priority changed from Normal to High
Actions #8

Updated by lynxis over 4 years ago

I've tested the TTCN-3 branch laforge/iu, rebased it, and pushed it to Gerrit as lynxis/sgsn_iu.
The next step is writing a test case:
  • do a GMM attach via GERAN
  • do a LU via IuPS
Actions #9

Updated by lynxis over 4 years ago

  • Status changed from New to In Progress
Actions #10

Updated by lynxis over 4 years ago

GMM Attach via Iu succeeds (with a connection-close patch).

Actions #11

Updated by lynxis over 4 years ago

I could reproduce the crash for GERAN -> UTRAN.

Actions #12

Updated by fixeria over 4 years ago

IMHO, gsm48_rx_gmm_ra_upd_req() needs to be refactored. It does not check whether the received msgb actually contains any data (as we do in OsmoMSC), so an incorrect or incomplete message can crash OsmoSGSN.

Actions #13

Updated by lynxis over 4 years ago

  • Related to Bug #3995: OsmoSGSN doesn't close SCCP connection after successful LU over IuPS added
Actions #14

Updated by laforge over 4 years ago

Is there any status update on this one? How did you handle this at CCCamp2019? I think this is a rather important bug to resolve, if possible without rewriting all of the SGSN :)

Actions #15

Updated by pespin over 4 years ago

I think I saw some related patches in the osmo-sgsn.git branch "cccamp2019"; they will be submitted soon, probably after the refactoring patches are merged.

Actions #16

Updated by lynxis over 4 years ago

Actions #17

Updated by lynxis over 4 years ago

  • % Done changed from 0 to 50
Actions #18

Updated by lynxis over 4 years ago

  • Related to Bug #1977: 3G IuPS is unreliable added
Actions #19

Updated by laforge about 4 years ago

  • Status changed from In Progress to Stalled
  • Assignee changed from lynxis to 4368
Actions #20

Updated by pespin about 4 years ago

Hi lynxis, can you write a short summary of the status of what you did here?

From what I understand, all the patches you shared links to are merged except this one, which is not ready yet:
https://gerrit.osmocom.org/c/osmo-sgsn/+/15487

You added SGSN_Tests_Iu to osmo-ttcn3-hacks.git; it currently runs only one test (TC_iu_attach) in the dockerized Jenkins setup, and that test passes.

I also see that there are more tests not enabled by default in SGSN_Tests_Iu.ttcn's control(). These are probably the tests you were using for the not-yet-merged patch, which is expected to support RAN swapping:
TC_iu_attach_geran_rau
TC_geran_attach_iu_rau

Can you share your thoughts, in case I'm missing something?

Actions #21

Updated by daniel about 4 years ago

I looked at the patches mentioned and I think the one that would fix this issue is:

https://gerrit.osmocom.org/c/osmo-sgsn/+/15487 which is still WIP

Actions #22

Updated by lynxis about 4 years ago

pespin: The only parts missing are resolving your comments and checking whether the commented-out test cases now succeed.

Actions #23

Updated by daniel about 4 years ago

  • Status changed from Stalled to In Progress
  • Assignee changed from 4368 to daniel

Okay, I cherry-picked it to master and am testing with the disabled tests now. Will look at the review comments as well.

Actions #24

Updated by daniel about 4 years ago

Some improvement: the SGSN doesn't crash anymore, and one of the tests passes:

  <testcase classname='SGSN_Tests_Iu' name='TC_iu_attach_geran_rau' time='2.081220'/>
  <testcase classname='SGSN_Tests_Iu' name='TC_geran_attach_iu_rau' time='2.093987'>
    <error type='DTE'></error>
  </testcase>
Actions #25

Updated by daniel about 4 years ago

  • % Done changed from 50 to 60

Both tests pass sometimes. The failure seems to be an issue with shutting down the test: sometimes the RAU Accept is sent on a closed port/connection after the test has already passed. It seems that when receiving a SecurityModeCmd, the function f_routing_area_update doesn't wait for the RAU Accept but returns after receiving the CommonId.

16:30:54.720097 41 SGSN_Tests_Iu.ttcn:70 setverdict(pass): none -> pass
[...]
16:30:54.724956 36 RAN_Emulation.ttcnpp:608 Sent on CLIENT to TC_geran_attach_iu_rau(41) @RAN_Emulation.PDU_DTAP_PS_MT : { dlci := '00'O, dtap := { discriminator := '1000'B, tiOrSkip := { skipIndicator := '0000'B }, msgs := { gprs_mm := { routingAreaUpdateAccept := { messageType := '00001001'B, forceToStandby := { forceToStandbyValue := '000'B, spare := '0'B }, updateResult := { valueField := '000'B, fOP_l3 := '0'B }, raUpdateTimer := { timerValue := '01010'B, unit := '001'B }, routingAreaId := { mccDigit1 := '2'H, mccDigit2 := '6'H, mccDigit3 := '2'H, mncDigit3 := 'F'H, mncDigit1 := '4'H, mncDigit2 := '2'H, lac := '334F'O ("3O"), rac := '00'O }, ptmsiSignature := omit, allocatedPTMSI := { elementIdentifier := '0011000'B, spare1 := '0'B, mobileIdentityLV := { lengthIndicator := 5, mobileIdentityV := { typeOfIdentity := '100'B, oddEvenInd_identity := { tmsi_ptmsi := { oddevenIndicator := '0'B, fillerDigit := '1111'B, octets := 'CE24AD0C'O } } } } }, msIdentity := omit, receiveNPDUNumbers := omit, readyTimer := { elementIdentifier := '17'O, gprsTimerV := { timerValue := '10110'B, unit := '000'B } }, gmmCause := omit, t3302 := omit, cellNotification := omit, equivalentPLMNs := omit, pdpContextStatus := omit, networkFeatureSupport := omit, emergencyNumberList := omit, mBMS_ContextStatus := omit, requestedMSInformation := omit, t3319 := omit, t3323 := omit, t3312_ExtendedValue := omit, additionalNetworkFeatureSupport := omit, t3324 := omit, extendedDRXParameters := omit } } } } }
16:30:54.725344 36 RAN_Emulation.ttcnpp:608 Dynamic test case error: Sending data on the connection of port CLIENT to 41:BSSAP failed. (Broken pipe)
16:30:54.725393 36 RAN_Emulation.ttcnpp:608 setverdict(error): none -> error

See change https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/16983 for a fix

Actions #26

Updated by daniel about 4 years ago

  • % Done changed from 60 to 80

A bit back and forth about the proper way to address this issue, but there is progress.

See
https://gerrit.osmocom.org/q/topic:%22OS%25233727%22+(status:open%20OR%20status:merged)
for a list of changes related to this issue

Actions #27

Updated by daniel about 4 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

The important patches, notably https://gerrit.osmocom.org/c/osmo-sgsn/+/15487, got merged.

The one remaining change, https://gerrit.osmocom.org/c/osmo-sgsn/+/17080, is not really relevant to this segfault, so I'm closing this issue.
