https://osmocom.org/https://osmocom.org/favicon.ico?16647414092020-06-18T15:43:46ZOpen Source Mobile CommunicationsOsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=187802020-06-18T15:43:46Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>the error seems to happen at different times, depending on the timing(?)</p>
<p>jenkins shows:</p>
<pre>
[0;m[1;31m20200618083656230 [1;31mDRLL[0;m[1;31m <0000> mgcp_client_fsm.c:526 MGCP_CONN(to-BTS)[0x555555b32910]{ST_DLCX_RESP}: MGW/DLCX: abrupt FSM termination with connections still present, sending unconditional DLCX...
[0;m20200618083656230 [1;31mDLMGCP[0;m <0023> mgcp_client.c:701 Failed to read: r=172.18.12.203:2427<->l=172.18.12.20:2727: 111='Connection refused'
[0;mAssert failed conn->lchan ../../include/osmocom/bsc/gsm_data.h:1376
</pre>
<p>while a local run shows:<br /><pre>
20200618153242817 DRLL ERROR MGCP_CONN(to-BTS)[0x55f0a6491e80]{ST_DLCX_RESP}: MGW/DLCX: abrupt FSM termination with connections still present, sending unconditional DLCX... (mgcp_client_fsm.c:526)
20200618153242817 DRLL DEBUG MGCP_CONN(to-BTS)[0x55f0a6491e80]{ST_DLCX_RESP}: Freeing instance (mgcp_client_fsm.c:496)
20200618153242817 DRLL DEBUG MGCP_CONN(to-BTS)[0x55f0a6491e80]{ST_DLCX_RESP}: Deallocated (fsm.c:573)
20200618153242817 DLMGCP ERROR Failed to read: r=172.18.12.203:2427<->l=172.18.12.20:2727: 111='Connection refused' (mgcp_client.c:701)
20200618153243361 DHODEC DEBUG (BTS 0) No congestion check: no minimum for free TCH/F nor TCH/H set (handover_decision_2.c:1846)
20200618153243361 DHODEC DEBUG (BTS 1) No congestion check: no minimum for free TCH/F nor TCH/H set (handover_decision_2.c:1846)
20200618153243361 DHODEC DEBUG (BTS 2) No congestion check: no minimum for free TCH/F nor TCH/H set (handover_decision_2.c:1846)
20200618153243362 DHODEC DEBUG (BTS 3) No congestion check: TRX 0 not usable (handover_decision_2.c:1830)
20200618153243362 DHODEC DEBUG HO algorithm 2: next periodical congestion check in 10 seconds (handover_decision_2.c:162)
20200618153244799 DAS DEBUG assignment(conn25_0-0-1-TCH_F-0)[0x55f0a64979f0]{WAIT_RR_ASS_COMPLETE}: Timeout of T10 (fsm.c:322)
Assert failed conn->lchan ../../include/osmocom/bsc/gsm_data.h:1376
</pre></p>
<p>i.e. one during MGCP client teardown, the other during assignment FSM teardown a bit later.</p>
<p>It is obvious that the OSMO_ASSERT(conn->lchan) is too drastic,<br />but I'd also like to understand the root cause for things going wrong in the first place.</p> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=187812020-06-18T15:47:05Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>in <a class="issue tracker-1 status-3 priority-3 priority-high3 closed" title="Bug: ttcn3-bsc-test: all LCLS test cases broken since build #987 (Resolved)" href="https://osmocom.org/issues/4619">#4619</a>, fixeria wrote:</p>
<p>"</p>
<blockquote>
<p>Assert failed conn->lchan ../../../../src/osmo-bsc/include/osmocom/bsc/gsm_data.h:1376</p>
</blockquote>
<p>According to the backtrace, the crash originates from <em>assignment_fsm_timer_cb()</em>. This callback tries to update some counters:</p>
<pre><code class="c syntaxhl"><span class="kt">int</span> <span class="nf">assignment_fsm_timer_cb</span><span class="p">(</span><span class="k">struct</span> <span class="n">osmo_fsm_inst</span> <span class="o">*</span><span class="n">fi</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">gsm_subscriber_connection</span> <span class="o">*</span><span class="n">conn</span> <span class="o">=</span> <span class="n">assignment_fi_conn</span><span class="p">(</span><span class="n">fi</span><span class="p">);</span>
<span class="n">assignment_count_result</span><span class="p">(</span><span class="n">CTR_ASSIGNMENT_TIMEOUT</span><span class="p">);</span> <span class="c1">// New: b5ccf09fc4 2020-06-08 14:36:46</span>
<span class="n">assignment_fail</span><span class="p">(</span><span class="n">GSM0808_CAUSE_EQUIPMENT_FAILURE</span><span class="p">,</span> <span class="s">"Timeout"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
<p>so trying to follow (Ctrl+] in Vim) <em>assignment_count_result()</em> brings us to the macros hell:</p>
<pre><code class="c syntaxhl"><span class="cp">#define assignment_count_result(counter) do { \
if (!conn->assignment.result_rate_ctr_done) { \
assignment_count(counter); </span><span class="cm">/* <--- We go here */</span><span class="cp"> \
conn->assignment.result_rate_ctr_done = true; \
} else \
LOG_ASSIGNMENT(conn, LOGL_DEBUG, \
"result rate counter already recorded, NOT counting as: %s %s\n", \
bsc_ctr_description[BSC_##counter].name, \
bsc_ctr_description[BSC_##counter].description); \
} while(0)
</span></code></pre>
<p>following the rabbit hole, we end up here:</p>
<pre><code class="c syntaxhl"><span class="cm">/* Assume presence of local var 'conn' as struct gsm_subscriber_connection */</span>
<span class="cp">#define assignment_count(counter) do { \
struct gsm_bts *bts = conn_get_bts(conn); </span><span class="cm">/* <--- Here is the party pooper! */</span><span class="cp"> \
LOG_ASSIGNMENT(conn, LOGL_DEBUG, "incrementing rate counter: %s %s\n", \
bsc_ctr_description[BSC_##counter].name, \
bsc_ctr_description[BSC_##counter].description); \
rate_ctr_inc(&conn->network->bsc_ctrs->ctr[BSC_##counter]); \
rate_ctr_inc(&bts->bts_ctrs->ctr[BTS_##counter]); \
} while(0)
</span></code></pre>
<p>so I think I found culprit:</p>
<pre>
commit b5ccf09fc4042c7fb1fdaaa6263961c40b32564e
Author: Daniel Willmann <dwillmann@sysmocom.de>
Date: Mon Jun 8 14:36:46 2020 +0200
Count assignment rates per BTS as well
This adds the assignment counters for the BTS as well and changes the
assignment_count() macro to increase both the counters for the BSC as
well as the BTS.
Related: SYS#4877
Change-Id: I0009e51d4caf68e762138d98e2e23d49acc3cc1a
</pre>
<p>"</p> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188132020-06-21T13:05:10Zlaforge
<ul><li><strong>Subject</strong> changed from <i>ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f</i> to <i>ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")</i></li><li><strong>Assignee</strong> set to <i>d8g11qnrlqjn</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>neels wrote:</p>
<blockquote>
<p>so I think I found culprit:</p>
<p>[...]</p>
</blockquote>
<p>If Daniel recently authored a commit, it definitely helps to add him as a watcher to this issue <strong>before</strong> makign the related issue update to ensure he gets notified about it. Alternatively, you could directly assign it to him. A ticket with no watchers and no assignee will not be seen by anyone but those few insane enough to watch everything everywhere.</p> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188142020-06-21T13:05:21Zlaforge
<ul><li><strong>Assignee</strong> changed from <i>d8g11qnrlqjn</i> to <i>daniel</i></li></ul> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188152020-06-21T13:05:52Zlaforge
<ul></ul> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188182020-06-21T13:12:40Zfixeria
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li></ul><p>There is already a fix for this issue (the result of our collaborative investigation):</p>
<p><a class="external" href="https://gerrit.osmocom.org/c/osmo-bsc/+/18907">https://gerrit.osmocom.org/c/osmo-bsc/+/18907</a> fix crashes due to OSMO_ASSERT(conn->lchan)</p>
<p>I had to spend time on this because this segfault was blocking my <a class="issue tracker-1 status-3 priority-3 priority-high3 closed" title="Bug: ttcn3-bsc-test: all LCLS test cases broken since build #987 (Resolved)" href="https://osmocom.org/issues/4619">#4619</a> investigation.</p> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188302020-06-21T13:50:42Zfixeria
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-2 priority-default closed" href="/issues/3744">Bug #3744</a>: include/osmocom/bsc/gsm_data.h: OSMO_ASSERT(conn->lchan); failed in conn_get_bts()</i> added</li></ul> OsmoBSC - Bug #4620: ttcn3-bsc-sccplite: crash in BSC_Tests.TC_assignment_codec_amr_f (introduced by "count assignment rates per bts")https://osmocom.org/issues/4620?journal_id=188972020-06-23T13:42:21Zdaniel
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>Fix merged, thanks!</p>