Bug #5248
closed
Memory leaks across A-bis/RSL connection
Added by fixeria over 2 years ago.
Updated over 2 years ago.
Description
While working on ttcn3-bts-test, I left osmo-bts running for a while and ran out of memory. Below are the biggest talloc chunks:
OsmoBTS# show talloc-context application full tree 0x608000000380
full talloc report on 'OsmoBTS context' (total 25932818 bytes in 2379 blocks)
abis contains 197460 bytes in 1481 blocks (ref 0) 0x608000000380
unixsocket contains 1 bytes in 1 blocks (ref 0) 0x60b00014f060
ipa contains 1213 bytes in 13 blocks (ref 0) 0x60b00014efb0
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x6120000501a0
127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000190350
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x6120000405a0
127.0.2.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00018b0d0
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x61200003bda0
127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00018a1b0
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x61200000e920
127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000184170
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x612000029620
127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00017d6e0
struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x612000018220
127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000176620
...
OsmoBTS# show talloc-context application 2 tree 0x627000000160
talloc report on 'OsmoBTS context' (total 598284656 bytes in 48753 blocks)
struct gsm_bts contains 598181420 bytes in 48417 blocks (ref 0) 0x627000000160
struct tlv_parsed contains 4131 bytes in 16 blocks (ref 0) 0x6210016a4560
struct tlv_parsed contains 4106 bytes in 3 blocks (ref 0) 0x6210016a3160
struct tlv_parsed contains 4117 bytes in 7 blocks (ref 0) 0x6210016a1d60
struct tlv_parsed contains 4116 bytes in 4 blocks (ref 0) 0x6210016a0960
struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x62100169f560
...
struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7df16860
struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e116860
struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e316860
struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e816860
How to reproduce?¶
In ttcn3-bts-test environment start everything except the testsuite, so that osmo-bts can establish A-bis/OML connections, but not the A-bis/RSL.
Software versions¶
libosmocore ca5ce0d84966c998e353b606a7054f8bc8366cae
libosmo-abis d2d28d83a437f7478a4dfff0c6cae5305801b881
osmo-bts 93fbcdfb39674eb3b5b7768919e14fbc8c455fc4
I guess the "..." from your traces mean there were hell a lot of entries of those types right?
Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):
src/common/oml.c
582: tp_merged = osmo_tlvp_copy(bts->mo.nm_attr, bts);
583: osmo_tlvp_merge(tp_merged, &tp);
739: tp_merged = osmo_tlvp_copy(trx->mo.nm_attr, trx->bts);
740: osmo_tlvp_merge(tp_merged, &tp);
953: tp_merged = osmo_tlvp_copy(ts->mo.nm_attr, bts);
954: osmo_tlvp_merge(tp_merged, &tp);
1397: tp_merged = osmo_tlvp_copy(mo->nm_attr, bts);
1398: osmo_tlvp_merge(tp_merged, &tp);
All the uses of nm_attr look sane to me. I also reviewed osmo_tlvp_copy() and osmo_tlvp_merge(). It is expected though to have lots of those "struct tlv_parsed" since there's 1 kept allocated per BTS, 2 per TRX (rcarrier and bbtransc mo), and 1 per TS.
Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28
Regarding nm_attr, we need to free them whenever reconnecting Abis, since those attributes are for some reason always merged with previous ones.
I guess the "..." from your traces mean there were hell a lot of entries of those types right?
Not necessary, it's just because I didn't want to paste everything.
Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):
[...]
Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28
This is fine. The number of 'struct tlv_parsed' does not change across the A-bis/RSL reconnects.
- % Done changed from 0 to 10
What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.
[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]
[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]
Here is another patch that finally stops 'struct gsm_bts_trx' from growing more and more in size:
https://gerrit.osmocom.org/c/osmo-bts/+/25733 gsm_lchan_name_update(): use shadow timeslot as talloc context [NEW]
However, I am not sure if this patch is the right approach... In the end, I feels like having the shadow timeslots allocated in the same way as primary timeslots would simplify the code at a price of higher memory consumption. Making it as WIP for now.
- % Done changed from 10 to 60
What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.
So the 'b)' part is more or less fixed. The 'a)' still needs to be investigated. pespin, I believe you know the IPA stuff better, so I leave it up to you ;)
- Status changed from New to In Progress
- % Done changed from 60 to 80
I merged your shadow TS memleak fixes.
I also submitted these 2 commits, which follow similar approach to your WIP patch:
https://gerrit.osmocom.org/c/osmo-bts/+/25740 Add new gsm_bts_trx_free_shadow_ts() function
https://gerrit.osmocom.org/c/osmo-bts/+/25741 Make sure lchan allocated memory from shadow_ts is properly freed
I also agree that we should really fix at some point the shadow TS mess where shadow ones are talloc contexts while others are not. They should all be allocated the same way. I'm fine with either allocating all them as talloc ctx or simply as part of bts_trx struct (array of ts).
Fix for ipa_clien_conn memleak:
https://gerrit.osmocom.org/c/libosmo-abis/+/25742 ipaccess: e1inp_ipa_bts_rsl_connect: Fix memleak recreating ipa_client_conn
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
This should be fixed now that the patch was merged, closing.
Also available in: Atom
PDF