https://osmocom.org/https://osmocom.org/favicon.ico?16647414092021-08-07T06:33:37ZOpen Source Mobile CommunicationsOsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224342021-08-07T06:33:37Zlaforge
<ul></ul><p>ok, so it clearly shows that the uplink TBF is leaking, as we shouldn't have 6860 of those</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224352021-08-07T06:50:07Zlaforge
<ul></ul><p>I've looked at the git log back to march 2021 and I couldn't immediately find something that might have affected ul_tbf memory release.</p>
<p>What was the last osmo-pcu version that you were using?</p>
<p>Completely random wild guess (my pcu knowledge is limited): Is your TRX implementing TRXDv1 or v2 fully? Not that the lack of indications for some frame numbers is creating downstream problems by code not being executed...</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224362021-08-07T07:08:15Zlaforge
<ul></ul><p>some notes:</p>
<p>UL tbf <strong>allocation</strong> happens via (inverse call chain):</p>
<ul>
<li>tbf_ul.cpp: tbf_alloc_ul_tbf()
<ul>
<li>tbf_ul.cpp: tbf_alloc_ul_pacch()
<ul>
<li>pdch.cpp: sched_ul_ass_or_rej()</li>
<li>pdch.cpp: gprs_rlcmac_pdch::rcv_resource_request()</li>
</ul>
</li>
<li>tbf_ul.cpp: tbf_alloc_ul_ccch()
<ul>
<li>bts.cpp: bts_rcv_rach()</li>
</ul></li>
</ul></li>
</ul>
<p>UL tbf <strong>free</strong> happens from:</p>
<ul>
<li>gprs_ms.c: ms_merge_and_clear_ms()
<ul>
<li>when we discover we already have a MS object for the given MS</li>
</ul>
</li>
<li>pcu_if.c: pcu_rx_susp_req()
<ul>
<li>when we receive a suspend request from the CS domain</li>
</ul>
</li>
<li>pdch.cpp: pdch_free_all_tbf()
<ul>
<li>pdch.cpp: gprs_rlcmac_pdch::free_resources()
<ul>
<li>pdch.cpp: gprs_rlcmac_pdch::disable()
<ul>
<li>pcu_rx_info_ind() - when the BSC has disabled a PDCH</li>
</ul>
</li>
</ul>
</li>
<li>bts.cpp: bts_trx_free_all_tbf()
<ul>
<li>osmobts_sock.c: pcu_sock_close()</li>
</ul>
</li>
</ul>
</li>
<li>pdch.cpp: gprs_rlcmac_pdch::rcv_control_ack()
<ul>
<li>after receiving the last CTRL ACK for a UL TBF</li>
</ul>
</li>
<li>pdch.cpp: gprs_rlcmac_pdch::rcv_resource_request()
<ul>
<li>when a new UL TBF is established for that MS, drop the old one</li>
</ul></li>
</ul>
So what I can see from all of this is that
<ul>
<li>most of the free paths are related to deactivation of PDCH/TRX/... which only happens in dynamic timeslots</li>
<li>there is no guard timer or anything like that which would make a TBF time out</li>
<li>normal free seems to happen via receiving the last CTRL ACK for that TBF</li>
<li>I currently don't see the situation that somehow the MS is gone and we never receive that last CTRL ACK.</li>
</ul>
<p>Keep in mind: I have not looked at the PCU code in a very long time, just wanted to give it a brief initial fresh set of eyes.</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224372021-08-07T07:16:26Zlaforge
<ul></ul><p><a class="user active" href="https://osmocom.org/users/732038">iedemam</a> are you seeing any log messages about invalid state transitions? We introduced osmo_fsm in osmo-pcu in May and maybe some state change doesn't happen as expected, causing code paths not to be reached? In this case you should see "transition to state %s not permitted!\n", for TBF related objects.</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224382021-08-07T07:23:39Zlaforge
<ul></ul><p>laforge wrote:</p>
<blockquote>
<ul>
<li>there is no guard timer or anything like that which would make a TBF time out</li>
</ul>
</blockquote>
<p>I missed that gprs_rlcmac_ul_tbf is derived from gprs_rlcmac_tbf, and the latter has plenty of timeout handling, and every timeout is leading to a free. So neither T3141, nor T3169, T3191, T3193, T3195 could have been running, as a timeout of any of those would lead to an immediate tbf_free().</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224512021-08-12T12:32:48Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>File</strong> <a href="/attachments/4692">osmo-pcu_TC_mt_ping_pong.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4692/osmo-pcu_TC_mt_ping_pong.log">osmo-pcu_TC_mt_ping_pong.log</a> added</li><li><strong>File</strong> <a href="/attachments/4693">_PCU_Tests.TC_mt_ping_pong.merged.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4693/_PCU_Tests.TC_mt_ping_pong.merged.gz">_PCU_Tests.TC_mt_ping_pong.merged.gz</a> added</li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>50</i></li></ul><p>Status update on how far I got with this issue:</p>
<p>I've added a check to the end of (almost) all TTCN3 PCU_Tests,<br />which counts the DL and UL TBF still allocated by fetching the talloc report via VTY.<br />If nonzero, it waits 5 seconds and checks again, repeats 10 times.</p>
<p>From this, I can identify a number of tests that still have UL TBF assigned when the test exits,<br />which don't seem to ever time out:</p>
<p>_PCU_Tests.TC_dl_egprs_data_no_llc_ui_dummy.merged<br />_PCU_Tests.TC_dl_gprs_data_no_llc_ui_dummy.merged<br />_PCU_Tests.TC_dl_multislot_tbf_ms_class_from_2phase.merged<br />_PCU_Tests.TC_dl_no_ack_retrans_imm_ass.merged<br />_PCU_Tests.TC_force_two_phase_access.merged<br />_PCU_Tests.TC_mo_ping_pong_with_ul_racap_egprs_only.merged<br />_PCU_Tests.TC_mo_ping_pong_with_ul_racap.merged<br />_PCU_Tests.TC_mt_ping_pong.merged<br />_PCU_Tests.TC_mt_ping_pong_with_dl_racap.merged<br />_PCU_Tests.TC_multiplex_dl_gprs_egprs.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_dup2.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_dup.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_twice2.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_twice3.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_twice4.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_twice5.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_twice.merged<br />_PCU_Tests.TC_nacc_outbound_pkt_cell_chg_notif_unassigned_dl_tbf.merged<br />_PCU_Tests.TC_nacc_outbound_rac_ci_resolve_conn_refused.merged<br />_PCU_Tests.TC_nacc_outbound_rac_ci_resolve_fail_parse_response.merged<br />_PCU_Tests.TC_nacc_outbound_success.merged<br />_PCU_Tests.TC_nacc_outbound_success_no_ctrl_ack.merged<br />_PCU_Tests.TC_nacc_outbound_success_twice.merged<br />_PCU_Tests.TC_nacc_outbound_success_twice_nocache.merged<br />_PCU_Tests.TC_paging_cs_multi_ms_imsi.merged<br />_PCU_Tests.TC_paging_cs_multi_ms_imsi_tmsi.merged<br />_PCU_Tests.TC_paging_cs_multi_ms_tmsi.merged<br />_PCU_Tests.TC_pcuif_fh_imm_ass_dl.merged<br />_PCU_Tests.TC_pcuif_fh_pkt_ass_dl.merged<br />_PCU_Tests.TC_pcuif_fh_pkt_ass_ul.merged<br />_PCU_Tests.TC_ul_multislot_tbf_ms_class_from_2phase.merged</p>
<p>I am now focusing on TC_mt_ping_pong and looking at its TBFs.<br />(There is a DL TBF, which gets deallocated after T3193.)<br />And there is an UL TBF which apparently never gets any timer set, and lingers after data is apparently transferred.</p>
<p>There is an osmo_fsm for the UL TBF which stays in state "ASSIGN".<br />There are no state timeouts defined on the osmo_fsm instances for TBF,<br />instead there is a list of timers gprs_rlcmac_tbf.Tarr[]</p>
<p>At the moment my main problem is lack of knowledge about how it <strong>should</strong> be timed out.<br />My usual approach would be to add a sanity timeout to the ASSIGN state of the TBF osmo_fsm instance.<br />Here though it seems that the design intends to not use any timeouts in the FSM.</p>
<p>Attaching the osmo-pcu log and the ttcn3 TC_mt_ping_pong log, taken with code from branches:<br /> osmo-pcu neels/segv<br /> osmo-ttcn3-hacks neels/pcu</p>
<p>I will continue to look into the life cycle of an UL TBF.<br />If anyone else has an idea from looking at the logs, help is welcome.</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224522021-08-12T15:19:13Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>i notice now that there is a huge branch pespin/fsm in osmo-pcu, it's quite probable that Pau has covered the leak on his branch / that maybe the leak is caused by the branch being partially merged?</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224532021-08-12T15:29:34Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>When I test for the UL TBF leak from TC_mt_ping_pong with Pau's branch,<br />the UL TBF is properly released with "Timeout of X2001".</p>
<p>When I test the 'latest' (v 0.9.0), there is no leak of UL TBF.</p>
<p>So this is a regression in nightly builds, and Pau's work will fix that.<br />I'll bisect to find the commit that introduces the leak and see if i can get an interim fix applied.</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224542021-08-12T16:10:34Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>bisect identifies this commit as the one introducing the UL TBF leak:</p>
<pre>
commit 846fd248dc49c06441da6d7c3cd85df479810f1a
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date: Thu Apr 22 20:40:10 2021 +0200
Stop abusing T3169
Now that we finally handle N3101 and N3103 correctly, we can fix abuse
of T3169 we were doing to make sure TBFs were freed.
According to 3GPP TS 44.060, T3169 should be armed:
* N3101_MAX reached
* N3103_MAX reached
Furthermore, when T3169 is enabled, the tbf should be in state
RELEASING so that its USF is not used.
See full description: https://osmocom.org/issues/5033#note-2
Related: OS#5033
Change-Id: I2cec531e2633281b88f69ba065c0105580c81076
</pre> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224562021-08-12T19:32:48Zlaforge
<ul></ul><p>On Thu, Aug 12, 2021 at 04:10:34PM +0000, neels [REDMINE] wrote:</p>
<blockquote>
<p>bisect identifies this commit as the one introducing the UL TBF leak:</p>
</blockquote>
<p>nice catch! Are you looking to fix that commit, or are you simply going to revert it?</p>
<p>Looking forward to seeing something related in gerrit. Thanks for diving into this.</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224572021-08-13T09:19:11Zneelsnhofmeyr@sysmocom.de
<ul></ul><p>I have reverted above commit in <a class="external" href="https://gerrit.osmocom.org/c/osmo-pcu/+/25193">https://gerrit.osmocom.org/c/osmo-pcu/+/25193</a><br />and the TTCN3 PCU_Tests do pass with this revert applied.<br />Disclaimer: I'm not really familiar with the PCU and "don't know what I'm doing".</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=224602021-08-16T09:16:04Ziedemam
<ul></ul><p>Apologies for not being responsive here. My spam filter was extra greedy it seems.</p>
<p>I see this change has also already been merged. I'll report back if we see the same behavior after upgrades. Otherwise, looks good.</p>
<p>Thanks!</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=225402021-09-15T09:26:29Zlaforge
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Assignee</strong> changed from <i>neels</i> to <i>pespin</i></li></ul> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=225642021-09-15T12:38:36Zpespin
<ul><li><strong>Assignee</strong> changed from <i>pespin</i> to <i>iedemam</i></li></ul><p>Same as <a class="issue tracker-1 status-3 priority-3 priority-high3 closed" title="Bug: Crash: SEGV on current master 945be910 (Resolved)" href="https://osmocom.org/issues/5205">#5205</a>, let's ask iedemam if he stills sees it with recent versions of osmo-pcu after FSM refactoring was merged (end of August).</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=228082021-10-25T06:02:37Zlaforge
<ul></ul><p>iedemam wrote:</p>
<blockquote>
<p>Apologies for not being responsive here. My spam filter was extra greedy it seems.</p>
<p>I see this change has also already been merged. I'll report back if we see the same behavior after upgrades. Otherwise, looks good.</p>
<p>Thanks!</p>
</blockquote>
<p>Any update here? Can the issue be closed?</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=228172021-10-25T08:39:36Ziedemam
<ul></ul><p>Close me, I'm fixed.<br />-The Bug</p> OsmoPCU - Bug #5209: Memory usage on current master 945be910https://osmocom.org/issues/5209?journal_id=231962021-12-09T21:11:33Zlaforge
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>50</i> to <i>100</i></li></ul>