https://osmocom.org/https://osmocom.org/favicon.ico?16647414092019-04-24T12:24:35ZOpen Source Mobile CommunicationsOsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141222019-04-24T12:24:35Zkeith
<ul></ul><p>Maybe helps:</p>
<pre>
(gdb) p *(struct gprs_sndcp_comp *)llme->comp->data
$113 = {
list = {
next = 0x5555559f9b90,
prev = 0x5555559f9b90
},
entity = 128,
nsapi_len = 0 '\000',
nsapi = "\000\000\000\060\001\000\000\000\000\000",
comp_len = 0 '\000',
comp = '\000' <repeats 15 times>, "\060",
algo = {
pcomp = (RFC_2507 | unknown: 21844),
dcomp = (V44 | unknown: 21844)
},
compclass = SNDCP_XID_VERSION_NUMBER,
state = 0x0
}
(gdb) p *(struct gprs_sndcp_comp *)llme->comp->proto
$114 = {
list = {
next = 0x5555559b2e10,
prev = 0x5555559b2e10
},
entity = 0,
nsapi_len = 240 '\360',
nsapi = "\001\000\000\220\233\237UUU\000",
comp_len = 144 '\220',
comp = "\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000",
algo = {
pcomp = RFC_1144,
dcomp = V42BIS
},
compclass = 1436523920,
state = 0x0
}
(gdb)
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141242019-04-24T13:14:16Zlaforge
<ul></ul> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141482019-04-26T20:06:46Zkeith
<ul></ul><p>with commit aed46ec97d22c1eccae3578c0010840c3acede0e</p>
<pre>
==13652== Invalid read of size 4
==13652== at 0x127428: _bssgp_tx_dl_ud (gprs_llc.c:375)
==13652== by 0x127428: gprs_llc_tx_ui (gprs_llc.c:809)
==13652== by 0x113047: mmctx_timer_cb (gprs_gmm.c:2160)
==13652== by 0x56F1525: osmo_timers_update (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.0.1)
==13652== by 0x56F1D99: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.0.1)
==13652== by 0x10F9E6: main (sgsn_main.c:524)
==13652== Address 0x9aa8454 is 116 bytes inside a block of size 4,408 free'd
==13652== at 0x4C29E90: free (vg_replace_malloc.c:473)
==13652== by 0x7055522: _talloc_free (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.2)
==13652== by 0x12769F: llme_free (gprs_llc.c:580)
==13652== by 0x12769F: gprs_llgmm_assign (gprs_llc.c:1084)
==13652== by 0x115FF0: gsm48_rx_gmm_ra_upd_req (gprs_gmm.c:1811)
==13652== by 0x1170A5: gsm0408_rcv_gmm (gprs_gmm.c:2007)
==13652== by 0x118021: gsm0408_gprs_rcvmsg_gb (gprs_gmm.c:2932)
==13652== by 0x12710C: gprs_llc_rcvmsg (gprs_llc.c:997)
==13652== by 0x52B2F5A: bssgp_rcvmsg (in /usr/lib/x86_64-linux-gnu/libosmogb.so.6.2.0)
==13652== by 0x52ACE39: gprs_ns_process_msg (in /usr/lib/x86_64-linux-gnu/libosmogb.so.6.2.0)
==13652== by 0x52AE7F9: gprs_ns_rcvmsg (in /usr/lib/x86_64-linux-gnu/libosmogb.so.6.2.0)
==13652== by 0x52AE973: nsip_fd_cb (in /usr/lib/x86_64-linux-gnu/libosmogb.so.6.2.0)
==13652== by 0x56F1DD3: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.0.1)
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141492019-04-27T09:21:20Zpespin
<ul></ul><p>So the problem is basically way down the stack (gsm0408_gprs_rcvmsg_gb) llme is assigned to mmctx->gb.llme, and then llme pointer is passed through all the stack as a parameter. Way up the stack, in gsm48_rx_gmm_ra_upd_req()=>gprs_llgmm_unassign()=>gprs_llgmm_assign(), function llme_free() is called on parameter pointer llme, but mmctx->gb.llme is not set to NULL. As a result, when mmctx->gb.llme is later used after a timer triggers, it will use alreadyfreed memory.</p>
Several problems I see:
<ul>
<li>gprs_llgmm_assign(llme) frees llme only on one code path inside it, and it's impossible to distinguish it because it doesn't return a specific code in that code path. So there's no way currently for the caller to know whether llme is freed or not after calling the function. IMHO llme should probably not be freed there and just return a specific return code and freed in the caller.</li>
<li>In the stack trace where gprs_llgmm_assign(llme) is called (gsm48_rx_gmm_ra_upd_req()=>gprs_llgmm_unassign()=>gprs_llgmm_assign()), mmctx is NULL (it was assigned NULL a bit upwards in gsm48_rx_gmm_ra_upd_req), so we cannot set mmctx->gb.llme=NULL. Thus, this assignment must be done before we do mmctx=NULL.</li>
</ul>
<p>Following patch is proposed which seems to fix the invalid read/crash for now, but probably bigger refactoring of the code is needed to avoid other kind of issues or incorrect logic:</p>
<pre>
diff --git a/src/gprs/gprs_gmm.c b/src/gprs/gprs_gmm.c
index 358bff90..f8bc80cf 100644
--- a/src/gprs/gprs_gmm.c
+++ b/src/gprs/gprs_gmm.c
@@ -1727,6 +1727,9 @@ static int gsm48_rx_gmm_ra_upd_req(struct sgsn_mm_ctx *mmctx, struct msgb *msg,
"The MM context cannot be used, RA: %03d-%0*d-%d-%d\n",
mmctx->ra.mcc, mmctx->ra.mnc_3_digits, mmctx->ra.mnc,
mmctx->ra.lac, mmctx->ra.rac);
+ if (gprs_llgmm_unassign(llme) == 1) {
+ mmctx->gb.llme = NULL;
+ llme = NULL;
+ }
mmctx = NULL;
}
diff --git a/src/gprs/gprs_llc.c b/src/gprs/gprs_llc.c
index acf4b547..6cd26cd1 100644
--- a/src/gprs/gprs_llc.c
+++ b/src/gprs/gprs_llc.c
@@ -372,7 +372,7 @@ static int _bssgp_tx_dl_ud(struct msgb *msg, struct sgsn_mm_ctx *mmctx)
dup.ms_ra_cap.v = mmctx->ms_radio_access_capa.buf;
/* make sure we only send it to the right llme */
- if (!(msgb_tlli(msg) == mmctx->gb.llme->tlli
+ if (!mmctx->gb.llme || !(msgb_tlli(msg) == mmctx->gb.llme->tlli
|| msgb_tlli(msg) == mmctx->gb.llme->old_tlli)) {
LOGP(DLLC, LOGL_ERROR,
"_bssgp_tx_dl_ud(): Attempt to send Downlink Unitdata to wrong LLME:"
@@ -1082,6 +1082,7 @@ int gprs_llgmm_assign(struct gprs_llc_llme *llme,
l->state = GPRS_LLES_UNASSIGNED;
}
llme_free(llme);
+ return 1;
} else
return -EINVAL;
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141502019-04-27T10:50:22Zkeith
<ul></ul><p>New backtrace:</p>
<pre>
Program received signal SIGABRT, Aborted.
0x00007ffff5403067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff5403067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff5404448 in __GI_abort () at abort.c:89
#2 0x00007ffff7314310 in osmo_panic_default (args=0x7fffffffe060, fmt=<optimized out>) at panic.c:49
#3 osmo_panic (fmt=<optimized out>) at panic.c:84
#4 0x000055555556c32f in gprs_sndcp_dcomp_term (comp_entity=0x555555b839d0) at gprs_sndcp_dcomp.c:130
#5 0x000055555556b9ee in gprs_sndcp_comp_free (comp_entities=0x555555d16ba0) at gprs_sndcp_comp.c:172
#6 0x000055555557ab82 in llme_free (llme=0x555555d7f6c0) at gprs_llc.c:583
#7 0x000055555557bda2 in gprs_llgmm_assign (llme=0x555555d7f6c0, old_tlli=3166033600, new_tlli=4294967295) at gprs_llc.c:1090
#8 0x0000555555566c82 in sgsn_mm_ctx_cleanup_free (mm=0x0) at gprs_sgsn.c:365
#9 0x000055555555d658 in mm_ctx_cleanup_free (ctx=0x555555999c80, log_text=0x555555589396 "T3350") at gprs_gmm.c:326
#10 0x0000555555562d45 in mmctx_timer_cb (_mm=0x555555999c80) at gprs_gmm.c:2156
#11 0x00007ffff7308526 in osmo_timers_update () at timer.c:257
#12 0x00007ffff7308d9a in osmo_select_main (polling=0) at select.c:260
#13 0x0000555555572c9f in main (argc=1, argv=0x7fffffffe618) at sgsn_main.c:524
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141512019-04-27T10:53:27Zkeith
<ul></ul><pre>
(gdb) print *comp_entity
$1 = {list = {next = 0x415454415f4d4d47, prev = 0x465f5145525f4843}, entity = 1730694483, nsapi_len = 98 'b', nsapi = "_gmm_req)[0", comp_len = 120 'x',
comp = "555555e02750]\000\000\240", algo = {pcomp = RFC_1144, dcomp = V42BIS}, compclass = 128, state = 0x555555bbabb0}
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141522019-04-27T10:59:23Zkeith
<ul><li><strong>File</strong> <a href="/attachments/3674">core_bin.tgz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/3674/core_bin.tgz">core_bin.tgz</a> added</li></ul> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141532019-04-27T12:40:08Zlaforge
<ul></ul><p>On Sat, Apr 27, 2019 at 09:21:21AM +0000, pespin [REDMINE] wrote:</p>
<blockquote>
<p>Following patch is proposed which seems to fix the invalid read/crash for now, but probably bigger refactoring of the code is needed to avoid other kind of issues or incorrect logic:</p>
</blockquote>
<p>as you seem to udnerstand the issue in detail, I'm still wondering about what exactly needs<br />to be done to reproduce the isuse. I've tried for quite some time now various combinations<br />of mising an IMSI attach on one PCU/RA-ID with a RA Update on another PCU/RA-ID, and couldn't<br />reproduce any issue that valgrind would show.</p>
<p>Before merging any patches we should at least have a test to reproduce the issue.</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141642019-04-29T16:00:30Zkeith
<ul></ul><p>With the binary mentioned in <a class="external" href="https://osmocom.org/issues/3964#note-2">https://osmocom.org/issues/3964#note-2</a></p>
<p>the latest is SIGSEGV at gprs_sndcp_comp_free.c:161</p>
<pre>
(gdb) print *comp_entities
$2 = {next = 0xdfa46, prev = 0x1}
</pre>
<p>backtrace:</p>
<pre>
#0 0x000055555556b918 in gprs_sndcp_comp_free (comp_entities=0x5555559476f0) at gprs_sndcp_comp.c:161
#1 0x000055555557ac04 in llme_free (llme=0x555555924d30) at gprs_llc.c:583
#2 0x000055555557be24 in gprs_llgmm_assign (llme=0x555555924d30, old_tlli=2378990690, new_tlli=4294967295) at gprs_llc.c:1090
#3 0x0000555555566c82 in sgsn_mm_ctx_cleanup_free (mm=0x0) at gprs_sgsn.c:365
#4 0x000055555555d658 in mm_ctx_cleanup_free (ctx=0x5555558fc950, log_text=0x555555589416 "T3350") at gprs_gmm.c:326
#5 0x0000555555562d45 in mmctx_timer_cb (_mm=0x5555558fc950) at gprs_gmm.c:2156
#6 0x00007ffff7308526 in osmo_timers_update () at timer.c:257
#7 0x00007ffff7308d9a in osmo_select_main (polling=0) at select.c:260
#8 0x0000555555572d21 in main (argc=1, argv=0x7fffffffe618) at sgsn_main.c:524
</pre> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141652019-04-29T16:09:23Zkeith
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-2 priority-default" href="/issues/3964">Bug #3964</a>: SIGSEGV in sndcp_sm_deactivate_ind()</i> added</li></ul> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=141862019-05-01T13:33:12Zlaforge
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li><li><strong>Assignee</strong> set to <i>pespin</i></li></ul><p>Assigning to <a class="user active" href="https://osmocom.org/users/30187">pespin</a> for feedback on how this can be reproduced. I'm happy to write the test case if somebody can explain to me how to actually reproduce the issue. I tried various different things at OsmoDevCon2019 without success :/</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=149722019-06-25T10:25:44Zkeith
<ul></ul><p><a class="user active" href="https://osmocom.org/users/7">laforge</a> Just a note.. that this one (I think) was provoked by a "typo" in the osmo-nitb config</p>
<p>With a sysmobts 2050,</p>
<p>BTS 0 has gprs routing area 0<br />BTS 1 has gprs routing area 1</p>
<p>This causes much RA update of course.. Does that help to reproduce, or indeed write a test?</p>
<p>that said, I filed a number of bug reports and I now am not even sure which config or even in some cases the exact binary that was used.</p>
<p>We might just close all these from osmodevcon time and I will start again with a fresh view on it?</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=149732019-06-25T10:28:11Zkeith
<ul></ul><p>Ah sorry.. reading again above comments about routing area ID, I think what I just wrote was already quite clear.</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=149752019-06-25T10:39:12Zkeith
<ul></ul><p>There are in fact 4 BTS in this config</p>
<p>BTS 0-3</p>
<p>for each BTS, the relevant config was:</p>
<p>gprs routing area 0<br />gprs cell bvci [BTS_NUMBER + 2]<br />gprs nsei [BTS_NUMBER + 2]<br />gprs nsvc 0 nsvci [BTS_NUMBER + 2]</p>
<p>except bts 3 that had routing area 1</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=155772019-08-12T17:16:11Zpespin
<ul></ul><p>I submitted a pair of commits which may help find or exclude some causes of the crash:<br />remote: <a class="external" href="https://gerrit.osmocom.org/c/osmo-sgsn/+/15166">https://gerrit.osmocom.org/c/osmo-sgsn/+/15166</a> gprs_sgsn.c: Warn upon llme free unexpected scenarios<br />remote: <a class="external" href="https://gerrit.osmocom.org/c/osmo-sgsn/+/15167">https://gerrit.osmocom.org/c/osmo-sgsn/+/15167</a> gprs_gmm: Introduce assert to guard against unexpected condition</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=164052019-11-08T17:00:02Zpespin
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-3 priority-high3 closed" href="/issues/4245">Bug #4245</a>: osmo-sgsn hitting assert during osmo-gsm-tester ping.py test</i> added</li></ul> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=164072019-11-08T17:08:54Zpespin
<ul></ul><p>It seems we have catched this one (or related one) in osmo-gsm-tester, see ticket <a class="issue tracker-1 status-3 priority-3 priority-high3 closed" title="Bug: osmo-sgsn hitting assert during osmo-gsm-tester ping.py test (Resolved)" href="https://osmocom.org/issues/4245">#4245</a> were the assert I added here is triggered: <a class="external" href="https://gerrit.osmocom.org/c/osmo-sgsn/+/15167">https://gerrit.osmocom.org/c/osmo-sgsn/+/15167</a></p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=164092019-11-08T18:31:09Zpespin
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>90</i></li></ul><p>Should be fixed by:<br /><a class="external" href="https://gerrit.osmocom.org/c/osmo-sgsn/+/16015">https://gerrit.osmocom.org/c/osmo-sgsn/+/16015</a> gmm: Fix assertion hit during RA UPD REQ before completting gmm attach</p>
<p>And TTCN3 test triggering the issue:<br /><a class="external" href="https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/16016">https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/16016</a> sgsn: Introduce test TC_attach_req_id_req_ra_update</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=164982019-11-21T16:53:03Zpespin
<ul><li><strong>Assignee</strong> changed from <i>pespin</i> to <i>keith</i></li></ul><p>Patch was merged, assigning to <a class="user active" href="https://osmocom.org/users/4282">keith</a> for him to give it a try with the faulty setup.</p> OsmoSGSN - Bug #3957: ABORT from gprs_sndcp_comp_free()https://osmocom.org/issues/3957?journal_id=229002021-11-04T18:33:52Zkeith
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Closed</i></li><li><strong>Assignee</strong> deleted (<del><i>keith</i></del>)</li></ul>