Bug #5325
closed
ttcn3-bts-test[-latest] provokes a segfault
100%
Description
Build artifacts of the recent ttcn3-bts-test[-latest] runs contain core dumps:
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/1479/artifact/logs/bts/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test-latest/1153/artifact/logs/bts/
Looks like a double free to me:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f60624f642a in __GI_abort () at abort.c:89
#2 0x00007f606334bd7c in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
#3 0x00007f606334b949 in _talloc_free () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
#4 0x0000557d09d77752 in bts_smscb_state_reset (bts_ss=bts_ss@entry=0x557d0c068ef8) at cbch.c:336
#5 0x0000557d09d77d90 in bts_cbch_reset (bts=bts@entry=0x557d0c065bc0) at cbch.c:341
#6 0x0000557d09d7acfa in st_op_disabled_notinstalled_on_enter (fi=<optimized out>, prev_state=<optimized out>) at nm_bts_fsm.c:64
#7 0x00007f6062c9491f in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#8 0x00007f6062c94c3d in _osmo_fsm_inst_state_chg () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#9 0x00007f6062c94e14 in _osmo_fsm_inst_dispatch () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#10 0x0000557d09d7aa54 in ev_dispatch_children (event=6, site_mgr=0x557d0c065d30) at nm_bts_sm_fsm.c:48
#11 nm_bts_sm_allstate (fi=0x557d0c069250, event=<optimized out>, data=<optimized out>) at nm_bts_sm_fsm.c:135
#12 0x00007f6062c94e14 in _osmo_fsm_inst_dispatch () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#13 0x0000557d09d72f1d in st_exit_on_enter (fi=0x557d0c069020, prev_state=<optimized out>) at bts_shutdown_fsm.c:164
#14 0x00007f6062c9491f in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#15 0x00007f6062c94c3d in _osmo_fsm_inst_state_chg () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#16 0x0000557d09d72b57 in st_wait_trx_closed (fi=0x557d0c069020, event=<optimized out>, data=<optimized out>) at bts_shutdown_fsm.c:155
#17 0x00007f6062c94de4 in _osmo_fsm_inst_dispatch () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#18 0x0000557d09d7318c in bts_model_trx_close_cb (trx=<optimized out>, rc=<optimized out>) at bts_shutdown_fsm.c:277
#19 0x0000557d09d546cb in trx_prov_fsm_apply_close (plink=0x557d0c06fec0, rc=0) at trx_provision_fsm.c:316
#20 0x00007f6062c94de4 in _osmo_fsm_inst_dispatch () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#21 0x0000557d09d49fc2 in trx_ctrl_rx_rsp_poweroff (rsp=0x7ffdbf202540, rsp=0x7ffdbf202540, l1h=0x557d0c07d790) at trx_if.c:518
#22 trx_ctrl_rx_rsp (tcm=0x557d0c404bc0, rsp=0x7ffdbf202540, l1h=0x557d0c07d790) at trx_if.c:637
#23 trx_ctrl_read_cb (ofd=<optimized out>, what=<optimized out>) at trx_if.c:733
#24 0x00007f6062c906fc in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#25 0x00007f6062c907a6 in osmo_select_main () from /usr/lib/x86_64-linux-gnu/libosmocore.so.18
#26 0x0000557d09d79284 in bts_main (argc=3, argv=0x7ffdbf202db8) at main.c:437
#27 0x00007f60624e22e1 in __libc_start_main (main=0x557d09d48aa0 <main>, argc=3, argv=0x7ffdbf202db8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdbf202da8) at ../csu/libc-start.c:291
#28 0x0000557d09d48d0a in _start ()
Updated by fixeria over 2 years ago
Here is a more detailed backtrace (with libosmocore-dbg installed):
(gdb) frame bt
No symbol "bt" in current context.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f60624f642a in __GI_abort () at abort.c:89
#2 0x00007f606334bd7c in ?? () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
#3 0x00007f606334b949 in _talloc_free () from /usr/lib/x86_64-linux-gnu/libtalloc.so.2
#4 0x0000557d09d77752 in bts_smscb_state_reset (bts_ss=bts_ss@entry=0x557d0c068ef8) at cbch.c:336
#5 0x0000557d09d77d90 in bts_cbch_reset (bts=bts@entry=0x557d0c065bc0) at cbch.c:341
#6 0x0000557d09d7acfa in st_op_disabled_notinstalled_on_enter (fi=<optimized out>, prev_state=<optimized out>) at nm_bts_fsm.c:64
#7 0x00007f6062c9491f in state_chg (fi=0x557d0c069480, new_state=<optimized out>, keep_timer=keep_timer@entry=false, timeout_ms=0, T=<optimized out>, file=<optimized out>, line=159) at fsm.c:699
#8 0x00007f6062c94c3d in _osmo_fsm_inst_state_chg (fi=<optimized out>, new_state=<optimized out>, timeout_secs=<optimized out>, T=<optimized out>, file=<optimized out>, line=<optimized out>) at fsm.c:748
#9 0x00007f6062c94e14 in _osmo_fsm_inst_dispatch (fi=0x557d0c069480, event=event@entry=6, data=data@entry=0x0, file=file@entry=0x557d09d968e0 "nm_bts_sm_fsm.c", line=line@entry=48) at fsm.c:865
#10 0x0000557d09d7aa54 in ev_dispatch_children (event=6, site_mgr=0x557d0c065d30) at nm_bts_sm_fsm.c:48
#11 nm_bts_sm_allstate (fi=0x557d0c069250, event=<optimized out>, data=<optimized out>) at nm_bts_sm_fsm.c:135
#12 0x00007f6062c94e14 in _osmo_fsm_inst_dispatch (fi=0x557d0c069250, event=event@entry=6, data=data@entry=0x0, file=file@entry=0x557d09d8d48f "bts_shutdown_fsm.c", line=line@entry=164) at fsm.c:865
#13 0x0000557d09d72f1d in st_exit_on_enter (fi=0x557d0c069020, prev_state=<optimized out>) at bts_shutdown_fsm.c:164
#14 0x00007f6062c9491f in state_chg (fi=fi@entry=0x557d0c069020, new_state=new_state@entry=3, keep_timer=keep_timer@entry=false, timeout_ms=0, T=<optimized out>, file=<optimized out>, line=155) at fsm.c:699
#15 0x00007f6062c94c3d in _osmo_fsm_inst_state_chg (fi=fi@entry=0x557d0c069020, new_state=new_state@entry=3, timeout_secs=<optimized out>, T=<optimized out>, file=<optimized out>, line=<optimized out>) at fsm.c:748
#16 0x00007f6062ca5a32 in _osmo_tdef_fsm_inst_state_chg (fi=fi@entry=0x557d0c069020, state=state@entry=3, timeouts_array=timeouts_array@entry=0x557d09d8d520 <bts_shutdown_fsm_timeouts>, tdefs=<optimized out>, default_timeout=93995561029664, default_timeout@entry=-1, file=file@entry=0x557d09d8d48f "bts_shutdown_fsm.c", line=155) at tdef.c:357
#17 0x0000557d09d72b57 in st_wait_trx_closed (fi=0x557d0c069020, event=<optimized out>, data=<optimized out>) at bts_shutdown_fsm.c:155
#18 0x00007f6062c94de4 in _osmo_fsm_inst_dispatch (fi=0x557d0c069020, event=event@entry=2, data=0x7f6064067070, file=file@entry=0x557d09d8d48f "bts_shutdown_fsm.c", line=line@entry=277) at fsm.c:877
#19 0x0000557d09d7318c in bts_model_trx_close_cb (trx=<optimized out>, rc=rc@entry=0) at bts_shutdown_fsm.c:277
#20 0x0000557d09d546cb in trx_prov_fsm_apply_close (plink=0x557d0c06fec0, rc=0) at trx_provision_fsm.c:316
#21 0x00007f6062c94de4 in _osmo_fsm_inst_dispatch (fi=0x557d0c07d930, event=16, data=0x0, file=0x557d09d84d37 "trx_provision_fsm.c", line=59) at fsm.c:877
#22 0x0000557d09d49fc2 in trx_ctrl_rx_rsp_poweroff (rsp=0x7ffdbf202540, rsp=0x7ffdbf202540, l1h=0x557d0c07d790) at trx_if.c:518
#23 trx_ctrl_rx_rsp (tcm=0x557d0c404bc0, rsp=0x7ffdbf202540, l1h=0x557d0c07d790) at trx_if.c:637
#24 trx_ctrl_read_cb (ofd=<optimized out>, what=<optimized out>) at trx_if.c:733
#25 0x00007f6062c906fc in poll_disp_fds (n_fd=<optimized out>) at select.c:361
#26 _osmo_select_main (polling=polling@entry=0) at select.c:393
#27 0x00007f6062c907a6 in osmo_select_main (polling=polling@entry=0) at select.c:432
#28 0x0000557d09d79284 in bts_main (argc=3, argv=0x7ffdbf202db8) at main.c:437
#29 0x00007f60624e22e1 in __libc_start_main (main=0x557d09d48aa0 <main>, argc=3, argv=0x7ffdbf202db8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdbf202da8) at ../csu/libc-start.c:291
#30 0x0000557d09d48d0a in _start ()
Updated by laforge over 2 years ago
I think the problem is that the list heads of bts->smscb_basic.queue and bts->smscb_extended.queue are not initialized anywhere?
Updated by laforge over 2 years ago
laforge wrote in #note-2:
I think the problem is that the list heads of bts->smscb_basic.queue and bts->smscb_extended.queue are not initialized anywhere?
Never mind, typo in my grep. bts_main() calls bts_init(), which initializes those fields.
I think the problem is likely related to
commit ae606d69a46a59cab1415502ffee24020bce515b
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date:   Wed Oct 20 16:11:54 2021 +0200

    Reset CBCH state after BTS shutdown

    Related: OS#5273
    Change-Id: Ib01d38c59ba9fa083fcc0682009c13d2db3664fe
Updated by laforge over 2 years ago
- Status changed from New to Feedback
- Assignee set to fixeria
- % Done changed from 0 to 60
Expected to be fixed by https://gerrit.osmocom.org/c/osmo-bts/+/26351
commit 7ea684723ca9ac8acf47302921e1af988c567322
Author: Harald Welte <laforge@osmocom.org>
Date:   Wed Nov 24 14:47:23 2021 +0100

    cbch: Fix dangling cur_msg leading to double-free in bts_cbch_reset()

    If a new default message is installed via RSL, and the old default
    message is currently being transmitted, we must set cur_msg to NULL.

    The old default message must be talloc_free()d unconditionally
    whenever a new default message is being set. We can do that by
    using the TALLOC_FREE macro.

    Change-Id: Id32c2074b61cd1f09957b9d1558ffb3a7691a8e0
    Closes: OS#5325

diff --git a/src/common/cbch.c b/src/common/cbch.c
index addd68c9..46774803 100644
--- a/src/common/cbch.c
+++ b/src/common/cbch.c
@@ -233,10 +233,10 @@ int bts_process_smscb_cmd(struct gsm_bts *bts, struct rsl_ie_cb_cmd_type cmd_typ
 		rate_ctr_inc2(bts_ss->ctrs, CBCH_CTR_RCVD_QUEUED);
 		break;
 	case RSL_CB_CMD_TYPE_DEFAULT:
-		/* old default msg will be free'd in get_smscb_block() if it is currently in transit
-		 * and we set a new default_msg here */
+		/* clear the cur_msg pointer if it is the old default message */
 		if (bts_ss->cur_msg && bts_ss->cur_msg == bts_ss->default_msg)
-			talloc_free(bts_ss->cur_msg);
+			bts_ss->cur_msg = NULL;
+		talloc_free(bts_ss->default_msg);
 		if (cmd_type.def_bcast == RSL_CB_CMD_DEFBCAST_NORMAL)
 			/* def_bcast == 0: normal message */
 			bts_ss->default_msg = scm;
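The ownership rule described in this commit message can be illustrated with a minimal sketch. It is not the osmo-bts code: it uses plain calloc()/free() instead of talloc, and the names struct msg, struct smscb_state, msg_free(), set_default_msg() and the demo function are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the osmo-bts structures. */
struct msg { int id; };

struct smscb_state {
	struct msg *cur_msg;     /* message currently being transmitted */
	struct msg *default_msg; /* message sent when the queue is empty */
};

static int free_count; /* counts releases so the demo can detect a double free */

static void msg_free(struct msg *m)
{
	if (m) {
		free(m);
		free_count++;
	}
}

/* Install a new default message, releasing the old one exactly once. */
static void set_default_msg(struct smscb_state *ss, struct msg *new_msg)
{
	/* If the old default is in transit, drop the alias first so that
	 * cur_msg never points at freed memory. */
	if (ss->cur_msg && ss->cur_msg == ss->default_msg)
		ss->cur_msg = NULL;
	msg_free(ss->default_msg);
	ss->default_msg = new_msg;
}

/* Demo: replace the default message while it is being transmitted.
 * Returns the number of frees performed by set_default_msg(). */
int demo_replace_default_in_transit(void)
{
	struct smscb_state ss = { 0 };
	struct msg *old_msg = calloc(1, sizeof(*old_msg));
	struct msg *new_msg = calloc(1, sizeof(*new_msg));
	int frees;

	assert(old_msg && new_msg);
	ss.default_msg = old_msg;
	ss.cur_msg = old_msg; /* cur_msg aliases default_msg */

	free_count = 0;
	set_default_msg(&ss, new_msg);
	assert(ss.cur_msg == NULL);
	assert(ss.default_msg == new_msg);
	frees = free_count;

	msg_free(ss.default_msg); /* clean up the new default message */
	return frees;
}
```

In this sketch the old message is released once through default_msg, and the alias in cur_msg is cleared rather than freed, so no later code path can free it a second time.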
Updated by laforge over 2 years ago
- Status changed from Feedback to In Progress
- Assignee changed from fixeria to laforge
The problem still persists even with the patch:
(gdb) frame 7
#7 0x000055555589e445 in bts_smscb_state_reset (bts_ss=0x627000003498) at cbch.c:336
336 TALLOC_FREE(bts_ss->default_msg);
(gdb) p bts_ss->default_msg
$3 = (struct smscb_msg *) 0x6110000092e0
(gdb) p *bts_ss->default_msg
$4 = {list = {next = 0x0, prev = 0x0}, is_schedule = false, msg = "\001\002\003\004\005\006\a\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025", '+' <repeats 66 times>, num_segs = 1 '\001'}
(gdb) p *bts_ss
$5 = {queue = {next = 0x627000003498, prev = 0x627000003498}, queue_len = 0, ctrs = 0x6160000051e0, cur_msg = 0x0, default_msg = 0x6110000092e0}
Updated by laforge over 2 years ago
There was another problem with the recently introduced CBCH state-clearing code; it's now fixed in https://gerrit.osmocom.org/c/osmo-bts/+/26355
commit 79f21c4ed172eadf1e3b046446cdec48ccce6a99
Author: Harald Welte <laforge@osmocom.org>
Date:   Wed Nov 24 20:00:29 2021 +0100

    cbch: Fix bts_smscb_state_reset() to avoid double-free

    If the currently transmitted message is the default message,
    bts_ss->cur_msg == bts_ss->default_msg. In this case we cannot
    simply talloc_free() both of them, as it would result in a
    double-free.

    Change-Id: I2d3645e34d31507b012a53ffe12d14223682f808
    Closes: OS#5325
    Fixes: Ib01d38c59ba9fa083fcc0682009c13d2db3664fe

diff --git a/src/common/cbch.c b/src/common/cbch.c
index addd68c9..a3e12961 100644
--- a/src/common/cbch.c
+++ b/src/common/cbch.c
@@ -332,7 +332,10 @@ static void bts_smscb_state_reset(struct bts_smscb_state *bts_ss)
 	}
 	bts_ss->queue_len = 0;
 	rate_ctr_group_reset(bts_ss->ctrs);
-	TALLOC_FREE(bts_ss->cur_msg);
+	/* avoid double-free of default_msg in case cur_msg == default_msg */
+	if (bts_ss->cur_msg && bts_ss->cur_msg != bts_ss->default_msg)
+		talloc_free(bts_ss->cur_msg);
+	bts_ss->cur_msg = NULL;
 	TALLOC_FREE(bts_ss->default_msg);
 }
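The aliasing case handled by this change can be reproduced in isolation. The following is a rough sketch under stated assumptions, not the osmo-bts implementation: it replaces talloc with calloc()/free(), and struct msg, struct smscb_state, msg_free(), smscb_state_reset() and demo_reset() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the osmo-bts structures. */
struct msg { int id; };

struct smscb_state {
	struct msg *cur_msg;     /* message currently being transmitted */
	struct msg *default_msg; /* may be the very same object as cur_msg */
};

static int free_count; /* counts releases so the demo can detect a double free */

static void msg_free(struct msg *m)
{
	if (m) {
		free(m);
		free_count++;
	}
}

/* Reset the state, freeing each distinct message exactly once. */
static void smscb_state_reset(struct smscb_state *ss)
{
	/* Only free cur_msg when it is a separate object; when it aliases
	 * default_msg, the release below already covers it. */
	if (ss->cur_msg && ss->cur_msg != ss->default_msg)
		msg_free(ss->cur_msg);
	ss->cur_msg = NULL;

	msg_free(ss->default_msg);
	ss->default_msg = NULL;
}

/* Demo: build a state with or without aliasing, reset it, and return
 * the number of frees the reset performed. */
int demo_reset(int aliased)
{
	struct smscb_state ss = { 0 };

	ss.default_msg = calloc(1, sizeof(*ss.default_msg));
	assert(ss.default_msg);
	if (aliased) {
		ss.cur_msg = ss.default_msg;
	} else {
		ss.cur_msg = calloc(1, sizeof(*ss.cur_msg));
		assert(ss.cur_msg);
	}

	free_count = 0;
	smscb_state_reset(&ss);
	assert(ss.cur_msg == NULL && ss.default_msg == NULL);
	return free_count;
}
```

With aliasing the sketch performs one free, and with two distinct messages it performs two. Freeing both pointers unconditionally in the aliased case would release the same object twice, which is the kind of abort seen in the backtraces above.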
Updated by laforge over 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 60 to 100
Applied in changeset osmo-bts|79f21c4ed172eadf1e3b046446cdec48ccce6a99.