Bug #5337

closed

ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc

Added by fixeria 10 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 12/05/2021
Due date:
% Done: 100%
Spec Reference:

Description

After running ttcn3-bsc-test (actually, a few hours later), I see some ghost subscribers:

OsmoBSC# show subscriber all 
 IMSI             TMSI      Use
 001019876543210  ffffffff  3 (3*paging-start)
 001010000100001  ffffffff  1 (conn)
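
The Use column can be decoded mechanically. A minimal Python sketch, assuming the column has the form "TOTAL (ENTRY,ENTRY,...)" where each entry is either "name" or "N*name". Only single-entry values appear in the output above, so the comma-separated form is an assumption, and parse_use is a hypothetical helper, not part of any Osmocom tool:

```python
import re

def parse_use(use):
    """Decode a Use column such as '3 (3*paging-start)' into (total, {name: count})."""
    m = re.fullmatch(r"\s*(\d+)\s*\((.*)\)\s*", use)
    if not m:
        raise ValueError(f"unrecognized use string: {use!r}")
    total = int(m.group(1))
    tokens = {}
    # Assumption: entries are comma-separated, with an optional 'N*' multiplier.
    for entry in filter(None, (e.strip() for e in m.group(2).split(","))):
        count, sep, name = entry.partition("*")
        if sep:
            tokens[name] = int(count)
        else:
            tokens[entry] = 1
    return total, tokens

if __name__ == "__main__":
    print(parse_use("3 (3*paging-start)"))
    print(parse_use("1 (conn)"))
```

Any row that is still present after the test suite has finished holds at least one such token that was never released.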

Here are the relevant talloc chunks:

$ osmo_interact_vty.py -H 127.0.0.1 -p 4242 -c "en; show talloc-context application full filter subscr" 
full talloc report on 'osmo-bsc' (total 1068914 bytes in 898 blocks)
      SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000100001)[0x562b2895a4a0] contains     86 bytes in   1 blocks (ref 0) 0x562b2897f090
      msc4294967295-conn4294967295_subscr-IMSI-001010000100001 contains     57 bytes in   1 blocks (ref 0) 0x562b28978a00
    struct gsm_subscriber_connection contains   6776 bytes in   1 blocks (ref 0) 0x562b289656a0
      struct bsc_subscr              contains    152 bytes in   3 blocks (ref 0) 0x562b28970aa0
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b289712f0
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b2893c3b0
      struct bsc_subscr              contains    232 bytes in   5 blocks (ref 0) 0x562b28973ba0
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b28975690
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b28981fb0
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b2897d850
        struct osmo_use_count_entry    contains     40 bytes in   1 blocks (ref 0) 0x562b2897f3d0
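
To check for leftovers without reading the report by eye, the chunk lines can be counted per name. A minimal Python sketch, assuming the line layout shown above ("NAME contains N bytes in M blocks (ref R) ADDR"); count_chunks is a hypothetical helper, not an existing tool:

```python
import re
from collections import Counter

# Matches one chunk line of the talloc report; the header line has no
# 'contains' keyword and is therefore skipped.
CHUNK_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+contains\s+(?P<bytes>\d+) bytes in\s+(?P<blocks>\d+) blocks"
    r" \(ref (?P<ref>\d+)\) (?P<addr>0x[0-9a-f]+)\s*$"
)

def count_chunks(report):
    """Return a Counter mapping talloc chunk name -> number of chunks with that name."""
    counts = Counter()
    for line in report.splitlines():
        m = CHUNK_RE.match(line)
        if m:
            counts[m.group("name")] += 1
    return counts
```

Feeding it the report above would give a count for 'struct bsc_subscr' that can be compared against the expected value of zero after a test run.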

Related issues

Related to OsmoBSC - Bug #4832: osmo-bsc hard-releases lchan if no MSC is found (Stalled, neels, 10/25/2020)

Related to OsmoBSC - Bug #5355: ttcn3-bsc-test: leaked struct bsc_subscr in LCS tests (Resolved, neels, 12/13/2021)

Related to OsmoBSC - Bug #5444: ttcn3-bsc-test-vamos: leaked 'struct bsc_subscr' (Resolved, neels, 02/07/2022)

Related to Cellular Network Infrastructure - Feature #5446: correlate git version to ttcn3 tests (New, 02/07/2022)

Related to OsmoBSC - Feature #2781: Extend OsmBSC TTCN-3 test coverage regarding resource leaks (Feedback, neels, 12/22/2017)

Actions #1

Updated by laforge 10 months ago

  • Assignee set to fixeria
Actions #2

Updated by fixeria 10 months ago

  • Status changed from New to In Progress
  • Priority changed from Normal to Low
  • % Done changed from 0 to 10

The following leak:

  IMSI             TMSI      Use
  001010000100001  ffffffff  1 (conn)

can be reproduced by running the BSC_Tests.TC_no_msc.

Actions #3

Updated by fixeria 10 months ago

  • Related to Bug #4832: osmo-bsc hard-releases lchan if no MSC is found added
Actions #4

Updated by fixeria 10 months ago

fixeria wrote in #note-2:

The following leak:

[...]

can be reproduced by running the BSC_Tests.TC_no_msc.

So in gsm_08_08.c/bsc_compl_l3() we allocate:

  • a 'struct bsc_subscr' with IMSI=001010000100001, and
  • a 'struct gsm_subscriber_connection' for the allocated subscriber.
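
The "1 (conn)" entry means one use token named 'conn' is still held on that subscriber. As a toy model only (this is not the libosmocore use count API and not a diagnosis of the actual bug), the Python sketch below shows how a subscriber stays listed when whoever took the 'conn' token never gives it back. All class, method and function names are hypothetical:

```python
class Subscr:
    """Toy subscriber that removes itself from the registry once no token is held."""

    def __init__(self, imsi, registry):
        self.imsi = imsi
        self.use = {}            # token name -> count
        self.registry = registry
        registry[imsi] = self

    def get(self, token):
        self.use[token] = self.use.get(token, 0) + 1

    def put(self, token):
        if self.use.get(token, 0) <= 0:
            raise RuntimeError(f"unbalanced put of {token!r}")
        self.use[token] -= 1
        if self.use[token] == 0:
            del self.use[token]
        if not self.use:
            # Last token released: the subscriber disappears from the list.
            del self.registry[self.imsi]

def run(release_token):
    """Return the IMSIs still listed after a connection came and went."""
    registry = {}
    subscr = Subscr("001010000100001", registry)
    subscr.get("conn")           # connection takes a reference
    if release_token:
        subscr.put("conn")       # connection teardown gives it back
    return sorted(registry)
```

In this model run(True) leaves the list empty, while run(False) leaves the IMSI behind, which is the shape of the leftover row above.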

I was interested to see if the new connection can be listed using the 'show conns' command, and boom!

bsc_vty.c:725:2: runtime error: member access within null pointer of type 'struct bsc_msc_data'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==711471==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000003c (pc 0x55c82025b5b6 bp 0x7ffc86eacc50 sp 0x7ffc86eacc20 T0)
==711471==The signal is caused by a READ memory access.
==711471==Hint: address points to the zero page.
    #0 0x55c82025b5b6 in dump_one_subscr_conn /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:725
    #1 0x55c82025bf31 in show_subscr_conn /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:757
    #2 0x7fca696030d2 in cmd_execute_command_real ../../../../src/libosmocore/src/vty/command.c:2604
    #3 0x7fca69606448 in vty_command ../../../../src/libosmocore/src/vty/vty.c:464
    #4 0x7fca69606448 in vty_execute ../../../../src/libosmocore/src/vty/vty.c:729
    #5 0x7fca69606448 in vty_read ../../../../src/libosmocore/src/vty/vty.c:1471
    #6 0x7fca69608e6d in client_data ../../../../src/libosmocore/src/vty/telnet_interface.c:154
    #7 0x7fca695c9907 in poll_disp_fds ../../../src/libosmocore/src/select.c:361
    #8 0x7fca695c9907 in _osmo_select_main ../../../src/libosmocore/src/select.c:393
    #9 0x7fca695c9a0e in osmo_select_main_ctx ../../../src/libosmocore/src/select.c:449
    #10 0x55c8201250e3 in main /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/osmo_bsc_main.c:1087
    #11 0x7fca68998b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #12 0x55c82011b12d in _start (/home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/osmo-bsc+0x74512d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:725 in dump_one_subscr_conn
==711471==ABORTING
Actions #5

Updated by fixeria 10 months ago

  • % Done changed from 10 to 20

The following leak:

 IMSI             TMSI      Use
 001019876543210  ffffffff  3 (3*paging-start)

can be reproduced by running:

  • BSC_Tests.TC_lcs_loc_req_for_active_ms_ta_req,
  • BSC_Tests.TC_lcs_loc_req_for_active_ms_le_timeout2,
  • BSC_Tests.TC_cm_service_during_lcs_loc_req.

In order to identify them, I hacked BSC_Tests.f_gen_test_hdlr_pars() to randomize the IMSI for each test case, because all test cases use the same IMSI '001019876543210'H by default. It might be a good idea to get this patch merged:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26506 BSC_Tests: ramdomize IMSI in f_gen_test_hdlr_pars() [NEW]
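
The actual change is TTCN-3 (see the Gerrit link above); the idea can be sketched in Python. Assumptions: the leading digits "00101" stay fixed and only the remaining ten digits are randomized, and random_imsi is a hypothetical name, not the function from the patch:

```python
import random

def random_imsi(prefix="00101", total_len=15, rng=None):
    """Return an IMSI string: fixed prefix followed by random decimal digits."""
    rng = rng or random.Random()
    suffix_len = total_len - len(prefix)
    if suffix_len < 0 or not prefix.isdigit():
        raise ValueError("prefix must be decimal digits and fit into total_len")
    suffix = "".join(rng.choice("0123456789") for _ in range(suffix_len))
    return prefix + suffix

if __name__ == "__main__":
    # One IMSI per test case makes a leftover subscriber traceable to its test.
    for test_case in ("TC_no_msc", "TC_cm_service_during_lcs_loc_req"):
        print(test_case, random_imsi())
```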

Actions #6

Updated by fixeria 10 months ago

  • Status changed from In Progress to New
  • Assignee changed from fixeria to neels

I don't feel competent enough to fix this myself, so handing this ticket over to neels.

Actions #7

Updated by fixeria 10 months ago

Looks like we have even more leaks.

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26506 BSC_Tests: ramdomize IMSI in f_gen_test_hdlr_pars() [NEW]

With this patch applied I am getting the following output:

OsmoBSC# show subscriber all 
 IMSI             TMSI      Use
 001018839845904  ffffffff  1 (paging-start)    # TC_lcs_loc_req_for_active_ms_ta_req
 001015247946574  ffffffff  1 (paging-start)    # TC_lcs_loc_req_for_active_ms_le_timeout2
 001019330051280  ffffffff  1 (paging-start)    # TC_lcs_loc_req_for_active_ms_ta_req
 001019060050196  ffffffff  1 (paging-start)    # TC_lcs_loc_req_for_active_ms_le_timeout2
 001017749471063  ffffffff  1 (paging-start)    # TC_cm_service_during_lcs_loc_req
 001010000100001  ffffffff  1 (conn)            # TC_no_msc

So both TC_lcs_loc_req_for_active_ms_{ta_req,le_timeout2} trigger two subscriber leaks each.
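
The tally can also be produced mechanically. A minimal Python sketch, assuming the "# TC_..." annotations were added by hand to the VTY output as shown above (OsmoBSC itself does not print them); leaks_per_test is a hypothetical helper:

```python
from collections import Counter

def leaks_per_test(vty_output):
    """Count leftover subscriber rows per annotated test case name."""
    counts = Counter()
    for line in vty_output.splitlines():
        row, sep, test = line.partition("#")
        fields = row.split()
        # Only rows that start with an all-digit IMSI and carry an annotation
        # are counted; the prompt line and the table header are skipped.
        if sep and fields and fields[0].isdigit():
            counts[test.strip()] += 1
    return counts
```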

Actions #8

Updated by neels 10 months ago

  • Subject changed from ttcn3-bsc-test: leaked struct bsc_subscr to ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc

splitting up reported leaks into separate issues

Actions #9

Updated by fixeria 8 months ago

  • Status changed from New to Feedback
  • Priority changed from Low to Normal

Adding a quick status update here:

This is nice, but (as expected) we started to see regressions in several Jenkins jobs:

As a quick solution, we can disable memleak checking for everything except '-master'. I prepared a change for osmo-ttcn3-hacks:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/27073 BSC_Tests: add module parameter 'mp_verify_talloc_count' [NEW]
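
The idea of such a switch, sketched in Python rather than TTCN-3. Only the parameter name is taken from the patch title; the function name and the before/after count comparison are hypothetical illustrations, not the code in the patch:

```python
def verify_talloc_count(before, after, mp_verify_talloc_count=True):
    """Return leak descriptions for chunk names whose count grew during a test.

    before/after map a talloc chunk name to the number of chunks seen.
    With the switch turned off, the check is skipped entirely.
    """
    if not mp_verify_talloc_count:
        return []
    leaks = []
    for name in sorted(after):
        grown = after[name] - before.get(name, 0)
        if grown > 0:
            leaks.append(f"{name}: +{grown}")
    return leaks
```

A job testing '-master' would leave the switch on, and the other jobs would turn it off in their configuration.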

However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?

As this problem was brought up several times during the weekly review, I am setting the priority to Normal.

Actions #10

Updated by fixeria 8 months ago

  • Related to Bug #5355: ttcn3-bsc-test: leaked struct bsc_subscr in LCS tests added
Actions #11

Updated by laforge 8 months ago

On Sat, Feb 05, 2022 at 04:08:50PM +0000, wrote:

As a quick solution, we can disable memleak checking for everything except '-master'. I prepared a change for osmo-ttcn3-hacks:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/27073 BSC_Tests: add module parameter 'mp_verify_talloc_count' [NEW]

However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?

The question is how many those are and how much risk we think those patches pose.

Actions #12

Updated by fixeria 8 months ago

  • Related to Bug #5444: ttcn3-bsc-test-vamos: leaked 'struct bsc_subscr' added
Actions #13

Updated by neels 8 months ago

  • Related to Feature #5446: correlate git version to ttcn3 tests added
Actions #14

Updated by neels 8 months ago

However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?

My opinion on this is found here: https://osmocom.org/issues/5446

pasting:

This happens a lot: we improve ttcn3 testing and enhance osmo-foo master, and then
obviously the latest binaries cannot possibly pass the tests. We invent and manage
shims in jenkins.sh and ttcn3 config to not do something or other on latest.

A way to not have this burden would be that the ttcn3 test suite for a program
is correlated to the git version being tested, for example if the ttcn3 is kept
in the same git tree as the program. We should use the 'latest' version of
ttcn3 tests for a latest binary. With an implicit correlation, we can always
use exactly the tests that match the specific git revision that was built, no
matter if it was released or we're just rebasing a local branch.

I think in the long run it would save us a lot of grunt work, and it would
definitely save a lot of code cruft to make specific parts of the tests
optional.

Actions #15

Updated by fixeria 6 months ago

  • Related to Feature #2781: Extend OsmBSC TTCN-3 test coverage regarding resource leaks added
Actions #16

Updated by neels about 1 month ago

  • Assignee changed from neels to fixeria

Re-reading this issue, which is assigned to me, I find that I'm not sure of its status.
Vadim, do you recall more about this issue?

Actions #17

Updated by fixeria about 1 month ago

  • Status changed from Feedback to Resolved
  • % Done changed from 20 to 100

fixeria wrote in #note-9:

This is nice, but (as expected) we started to see regressions in several Jenkins jobs:

osmo-bsc v1.9.0 was tagged 8 weeks ago, so -latest already contains the memleak fixes. No regressions are seen anymore.

  • The respective CentOS jobs are affected too, as well as 2021q1 and 2021q4

We stopped running TTCN-3 tests for 2021q1 and 2021q4; 2022q2 is not affected.
