Bug #5248
closedMemory leaks across A-bis/RSL connection
100%
Description
While working on ttcn3-bts-test, I left osmo-bts running for a while and ran out of memory. Below are the biggest talloc chunks:
OsmoBTS# show talloc-context application full tree 0x608000000380 full talloc report on 'OsmoBTS context' (total 25932818 bytes in 2379 blocks) abis contains 197460 bytes in 1481 blocks (ref 0) 0x608000000380 unixsocket contains 1 bytes in 1 blocks (ref 0) 0x60b00014f060 ipa contains 1213 bytes in 13 blocks (ref 0) 0x60b00014efb0 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x6120000501a0 127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000190350 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x6120000405a0 127.0.2.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00018b0d0 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x61200003bda0 127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00018a1b0 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x61200000e920 127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000184170 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x612000029620 127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b00017d6e0 struct ipa_client_conn contains 202 bytes in 2 blocks (ref 0) 0x612000018220 127.0.1.1 contains 10 bytes in 1 blocks (ref 0) 0x60b000176620 ... OsmoBTS# show talloc-context application 2 tree 0x627000000160 talloc report on 'OsmoBTS context' (total 598284656 bytes in 48753 blocks) struct gsm_bts contains 598181420 bytes in 48417 blocks (ref 0) 0x627000000160 struct tlv_parsed contains 4131 bytes in 16 blocks (ref 0) 0x6210016a4560 struct tlv_parsed contains 4106 bytes in 3 blocks (ref 0) 0x6210016a3160 struct tlv_parsed contains 4117 bytes in 7 blocks (ref 0) 0x6210016a1d60 struct tlv_parsed contains 4116 bytes in 4 blocks (ref 0) 0x6210016a0960 struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x62100169f560 ... struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7df16860 struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e116860 struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e316860 struct gsm_bts_trx contains 111271976 bytes in 8983 blocks (ref 0) 0x7fcd7e816860
How to reproduce?¶
In ttcn3-bts-test environment start everything except the testsuite, so that osmo-bts can establish A-bis/OML connections, but not the A-bis/RSL.
Software versions¶
libosmocore ca5ce0d84966c998e353b606a7054f8bc8366cae libosmo-abis d2d28d83a437f7478a4dfff0c6cae5305801b881 osmo-bts 93fbcdfb39674eb3b5b7768919e14fbc8c455fc4
Updated by pespin over 2 years ago
I guess the "..." from your traces mean there were hell a lot of entries of those types right?
Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):
src/common/oml.c 582: tp_merged = osmo_tlvp_copy(bts->mo.nm_attr, bts); 583: osmo_tlvp_merge(tp_merged, &tp); 739: tp_merged = osmo_tlvp_copy(trx->mo.nm_attr, trx->bts); 740: osmo_tlvp_merge(tp_merged, &tp); 953: tp_merged = osmo_tlvp_copy(ts->mo.nm_attr, bts); 954: osmo_tlvp_merge(tp_merged, &tp); 1397: tp_merged = osmo_tlvp_copy(mo->nm_attr, bts); 1398: osmo_tlvp_merge(tp_merged, &tp);
Updated by pespin over 2 years ago
All the uses of nm_attr look sane to me. I also reviewed osmo_tlvp_copy() and osmo_tlvp_merge(). It is expected though to have lots of those "struct tlv_parsed" since there's 1 kept allocated per BTS, 2 per TRX (rcarrier and bbtransc mo), and 1 per TS.
Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28
Updated by pespin over 2 years ago
Regarding nm_attr, we need to free them whenever reconnecting Abis, since those attributes are for some reason always merged with previous ones.
Updated by fixeria over 2 years ago
I guess the "..." from your traces mean there were hell a lot of entries of those types right?
Not necessary, it's just because I didn't want to paste everything.
Regarding the heap-allocated "struct tlv_parsed", I could only find so far the following candidates for leak (most uses are done allocating in stack):
[...]
Hence, with 3 TRX as per ttcn3-bts-test, there should be around: 1 + 2*3 + 1*3*7 = 1 + 6 + 21 = 28
This is fine. The number of 'struct tlv_parsed' does not change across the A-bis/RSL reconnects.
Updated by fixeria over 2 years ago
- % Done changed from 0 to 10
What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.
[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]
Updated by fixeria over 2 years ago
[1] https://gerrit.osmocom.org/c/osmo-bts/+/25731 trx_sched_clean_ts(): also free() the associated 'struct l1sched_ts'
[2] https://gerrit.osmocom.org/c/osmo-bts/+/25732 trx_sched_clean(): also free() the shadow timeslot [NEW]
Here is another patch that finally stops 'struct gsm_bts_trx' from growing more and more in size:
https://gerrit.osmocom.org/c/osmo-bts/+/25733 gsm_lchan_name_update(): use shadow timeslot as talloc context [NEW]
However, I am not sure if this patch is the right approach... In the end, I feels like having the shadow timeslots allocated in the same way as primary timeslots would simplify the code at a price of higher memory consumption. Making it as WIP for now.
Updated by fixeria over 2 years ago
- % Done changed from 10 to 60
What definitely deserves attention is a) the children of 'ipa' (0x60b00014efb0 in my report below): you'll see lots of 'struct ipa_client_conn'; and b) the children of 'struct gsm_bts_trx' (e.g. 0x7fcd7df16860 in my report below): you'll see lots of 'struct l1sched_ts' [1], lots of 'struct gsm_bts_trx_ts' [2], and lots of string chunks with lchan names. I already fixed some of them.
So the 'b)' part is more or less fixed. The 'a)' still needs to be investigated. pespin, I believe you know the IPA stuff better, so I leave it up to you ;)
Updated by pespin over 2 years ago
- Status changed from New to In Progress
- % Done changed from 60 to 80
I merged your shadow TS memleak fixes.
I also submitted these 2 commits, which follow similar approach to your WIP patch:
https://gerrit.osmocom.org/c/osmo-bts/+/25740 Add new gsm_bts_trx_free_shadow_ts() function
https://gerrit.osmocom.org/c/osmo-bts/+/25741 Make sure lchan allocated memory from shadow_ts is properly freed
I also agree that we should really fix at some point the shadow TS mess where shadow ones are talloc contexts while others are not. They should all be allocated the same way. I'm fine with either allocating all them as talloc ctx or simply as part of bts_trx struct (array of ts).
Fix for ipa_clien_conn memleak:
https://gerrit.osmocom.org/c/libosmo-abis/+/25742 ipaccess: e1inp_ipa_bts_rsl_connect: Fix memleak recreating ipa_client_conn
Updated by pespin over 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
This should be fixed now that the patch was merged, closing.