Project

General

Profile

Actions

Bug #5248

closed

Memory leaks across A-bis/RSL connection

Added by fixeria over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
10/06/2021
Due date:
% Done:

100%

Spec Reference:

Description

While working on ttcn3-bts-test, I left osmo-bts running for a while and ran out of memory. Below are the biggest talloc chunks:

OsmoBTS# show talloc-context application full tree 0x608000000380
full talloc report on 'OsmoBTS context' (total 25932818 bytes in 2379 blocks)
  abis                           contains 197460 bytes in 1481 blocks (ref 0) 0x608000000380
    unixsocket                     contains      1 bytes in   1 blocks (ref 0) 0x60b00014f060
    ipa                            contains   1213 bytes in  13 blocks (ref 0) 0x60b00014efb0
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x6120000501a0
        127.0.1.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b000190350
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x6120000405a0
        127.0.2.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b00018b0d0
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x61200003bda0
        127.0.1.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b00018a1b0
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x61200000e920
        127.0.1.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b000184170
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x612000029620
        127.0.1.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b00017d6e0
      struct ipa_client_conn         contains    202 bytes in   2 blocks (ref 0) 0x612000018220
        127.0.1.1                      contains     10 bytes in   1 blocks (ref 0) 0x60b000176620
...

OsmoBTS# show talloc-context application 2 tree 0x627000000160                                                                                                           
talloc report on 'OsmoBTS context' (total 598284656 bytes in 48753 blocks)                                                                                               
  struct gsm_bts                 contains 598181420 bytes in 48417 blocks (ref 0) 0x627000000160
    struct tlv_parsed              contains   4131 bytes in  16 blocks (ref 0) 0x6210016a4560
    struct tlv_parsed              contains   4106 bytes in   3 blocks (ref 0) 0x6210016a3160
    struct tlv_parsed              contains   4117 bytes in   7 blocks (ref 0) 0x6210016a1d60
    struct tlv_parsed              contains   4116 bytes in   4 blocks (ref 0) 0x6210016a0960    
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x62100169f560
...
    struct gsm_bts_trx             contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7df16860
    struct gsm_bts_trx             contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e116860
    struct gsm_bts_trx             contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e316860
    struct gsm_bts_trx             contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e816860

How to reproduce?

In ttcn3-bts-test environment start everything except the testsuite, so that osmo-bts can establish A-bis/OML connections, but not the A-bis/RSL.

Software versions

libosmocore   ca5ce0d84966c998e353b606a7054f8bc8366cae
libosmo-abis  d2d28d83a437f7478a4dfff0c6cae5305801b881
osmo-bts      93fbcdfb39674eb3b5b7768919e14fbc8c455fc4
Actions #1

Updated by pespin over 2 years ago

I guess the "..." from your traces mean there were hell a lot of entries of those types right?

Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):

src/common/oml.c
582:    tp_merged = osmo_tlvp_copy(bts->mo.nm_attr, bts);
583:    osmo_tlvp_merge(tp_merged, &tp);
739:    tp_merged = osmo_tlvp_copy(trx->mo.nm_attr, trx->bts);
740:    osmo_tlvp_merge(tp_merged, &tp);
953:    tp_merged = osmo_tlvp_copy(ts->mo.nm_attr, bts);
954:    osmo_tlvp_merge(tp_merged, &tp);
1397:            tp_merged = osmo_tlvp_copy(mo->nm_attr, bts);
1398:            osmo_tlvp_merge(tp_merged, &tp);
Actions #2

Updated by pespin over 2 years ago

All the uses of nm_attr look sane to me. I also reviewed osmo_tlvp_copy() and osmo_tlvp_merge(). It is expected though to have lots of those "struct tlv_parsed" since there's 1 kept allocated per BTS, 2 per TRX (rcarrier and bbtransc mo), and 1 per TS.

Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28

Actions #3

Updated by pespin over 2 years ago

Regarding nm_attr, we need to free them whenever reconnecting Abis, since those attributes are for some reason always merged with previous ones.

Actions #4

Updated by fixeria over 2 years ago

I guess the "..." from your traces mean there were hell a lot of entries of those types right?

Not necessary, it's just because I didn't want to paste everything.

Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):
[...]
Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28

This is fine. The number of 'struct tlv_parsed' does not change across the A-bis/RSL reconnects.

Actions #5

Updated by fixeria over 2 years ago

  • % Done changed from 0 to 10

What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.

[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]

Actions #6

Updated by fixeria over 2 years ago

[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]

Here is another patch that finally stops 'struct gsm_bts_trx' from growing more and more in size:

https://gerrit.osmocom.org/c/osmo-bts/+/25733 gsm_lchan_name_update(): use shadow timeslot as talloc context [NEW]

However, I am not sure if this patch is the right approach... In the end, I feels like having the shadow timeslots allocated in the same way as primary timeslots would simplify the code at a price of higher memory consumption. Making it as WIP for now.

Actions #7

Updated by fixeria over 2 years ago

  • % Done changed from 10 to 60

What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.

So the 'b)' part is more or less fixed. The 'a)' still needs to be investigated. pespin, I believe you know the IPA stuff better, so I leave it up to you ;)

Actions #8

Updated by pespin over 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 60 to 80

I merged your shadow TS memleak fixes.
I also submitted these 2 commits, which follow similar approach to your WIP patch:
https://gerrit.osmocom.org/c/osmo-bts/+/25740 Add new gsm_bts_trx_free_shadow_ts() function
https://gerrit.osmocom.org/c/osmo-bts/+/25741 Make sure lchan allocated memory from shadow_ts is properly freed

I also agree that we should really fix at some point the shadow TS mess where shadow ones are talloc contexts while others are not. They should all be allocated the same way. I'm fine with either allocating all them as talloc ctx or simply as part of bts_trx struct (array of ts).

Fix for ipa_clien_conn memleak:
https://gerrit.osmocom.org/c/libosmo-abis/+/25742 ipaccess: e1inp_ipa_bts_rsl_connect: Fix memleak recreating ipa_client_conn

Actions #9

Updated by pespin over 2 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

This should be fixed now that the patch was merged, closing.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)