Bug #4948
closed
ns2/framerelay segfault when running ttcn3-gbproxy-test-fr
100%
Description
Not sure what caused this; the most recent change was moving the SGSN side to SNS-IP.
gdb session:
(gdb) bt
#0 osmo_fr_tx_dlc (msg=0x55d55835bcd0) at frame_relay.c:789
#1 0x00007fba9b8cfdc1 in gprs_ns2_st_alive_onenter (fi=<optimized out>, old_state=0) at gprs_ns2_vc_fsm.c:366
#2 0x00007fba9b464bdf in state_chg (fi=0x55d55835b940, new_state=<optimized out>, keep_timer=keep_timer@entry=false, timeout_ms=3000, T=<optimized out>, file=<optimized out>, line=227) at fsm.c:699
#3 0x00007fba9b464efd in _osmo_fsm_inst_state_chg (fi=<optimized out>, new_state=<optimized out>, timeout_secs=<optimized out>, T=<optimized out>, file=<optimized out>, line=<optimized out>) at fsm.c:748
#4 0x00007fba9b4650a4 in _osmo_fsm_inst_dispatch (fi=0x55d55835b940, event=event@entry=0, data=data@entry=0x0, file=file@entry=0x7fba9b8e5dd4 "gprs_ns2_vc_fsm.c", line=line@entry=641) at fsm.c:877
#5 0x00007fba9b8d03a6 in gprs_ns2_vc_fsm_start (nsvc=nsvc@entry=0x55d55835b290) at gprs_ns2_vc_fsm.c:641
#6 0x00007fba9b8cba26 in gprs_ns2_start_alive_all_nsvcs (nse=nse@entry=0x55d558341e40) at gprs_ns2.c:1200
#7 0x00007fba9b8d275e in ns_sns_st_config_sgsn_ip4 (data=<optimized out>, event=4, fi=0x55d558345f80) at gprs_ns2_sns.c:907
#8 ns2_sns_st_config_sgsn (fi=0x55d558345f80, event=4, data=<optimized out>) at gprs_ns2_sns.c:984
#9 0x00007fba9b4650a4 in _osmo_fsm_inst_dispatch (fi=0x55d558345f80, event=event@entry=4, data=data@entry=0x7fff9e8018a0, file=file@entry=0x7fba9b8e5ee9 "gprs_ns2_sns.c", line=line@entry=1457) at fsm.c:877
#10 0x00007fba9b8d2e05 in gprs_ns2_sns_rx (nsvc=nsvc@entry=0x55d55834d9f0, msg=msg@entry=0x55d5583587b0, tp=tp@entry=0x7fff9e8018a0) at gprs_ns2_sns.c:1457
#11 0x00007fba9b8cb5c4 in ns2_recv_vc (nsvc=0x55d55834d9f0, msg=msg@entry=0x55d5583587b0) at gprs_ns2.c:1047
#12 0x00007fba9b8cc976 in handle_nsip_read (bfd=0x55d558345dd0) at gprs_ns2_udp.c:243
#13 nsip_fd_cb (bfd=0x55d558345dd0, what=1) at gprs_ns2_udp.c:261
#14 0x00007fba9b460a18 in poll_disp_fds (n_fd=<optimized out>) at select.c:350
#15 _osmo_select_main (polling=<optimized out>) at select.c:378
#16 0x00007fba9b460a96 in osmo_select_main (polling=<optimized out>) at select.c:417
#17 0x000055d5570792d1 in main (argc=3, argv=0x7fff9e802bc8) at gb_proxy_main.c:332
(gdb) p msg
$1 = (struct msgb *) 0x55d55835bcd0
(gdb) p *msg
$2 = {list = {next = 0x0, prev = 0x0}, {dst = 0xcf1, trx = 0xcf1}, lchan = 0x0, l1h = 0x0, l2h = 0x55d55835bd6c "\n", l3h = 0x0, l4h = 0x0, cb = {0, 0, 0, 0, 0}, data_len = 3072, len = 1, head = 0x55d55835bd58 "", tail = 0x55d55835bd6d "", data = 0x55d55835bd6c "\n", _data = 0x55d55835bd58 ""}
(gdb) p msg->dst
$3 = (void *) 0xcf1
(gdb) p dlc
$4 = (struct osmo_fr_dlc *) 0xcf1
(gdb) p *dlc
Cannot access memory at address 0xcf1
Updated by daniel over 3 years ago
It seems to be using the SNS NSVC and not one used by Frame Relay.
(gdb) p *priv->nsvc
$24 = {list = {next = 0x55d55835a850, prev = 0x55d558341e60}, blist = {next = 0x55d55834a9c0, prev = 0x55d558347d08}, nse = 0x55d558341e40, bind = 0x55d558347cf0, persistent = false, nsvci = 0, sig_weight = 1 '\001', sig_counter = 0 '\000', data_weight = 1 '\001', priv = 0x55d55835bbe0, nsvci_is_valid = false, sns_only = false, ctrg = 0x55d55835b360, statg = 0x55d55835b7b0, mode = NS2_VC_MODE_ALIVE, fi = 0x55d55835b940}
(gdb) p *priv->nsvc->nse
$25 = {nsei = 101, nsi = 0x55d558316b40, list = {next = 0x55d558316b60, prev = 0x55d55834a690}, nsvc = {next = 0x55d55835b290, prev = 0x55d55834d9f0}, nsvc_count = 0, persistent = true, first = true, alive = false, ll = GPRS_NS2_LL_UDP, dialect = NS2_DIALECT_SNS, bss_sns_fi = 0x55d558345f80}
(gdb)
Notice nsvci_is_valid = false on the NSVC, and ll = GPRS_NS2_LL_UDP, dialect = NS2_DIALECT_SNS on its NSE.
Updated by daniel over 3 years ago
- % Done changed from 0 to 20
So our IP-SNS NSVC uses a Framerelay bind...
(gdb) p *priv->nsvc->bind
$30 = {name = 0x55d558347c70 "hdlcnet1", list = {next = 0x55d558345c98, prev = 0x55d5583482e8}, nsvc = {next = 0x55d55835b2a0, prev = 0x55d55834a9c0}, priv = 0x55d558347dc0, nsi = 0x55d558316b40, driver = 0x7fba9baf7c10 <vc_driver_fr>, accept_ipaccess = false, accept_sns = false, transfer_capability = 2, ll = GPRS_NS2_LL_FR, send_vc = 0x7fba9b8ce8e0 <fr_vc_sendmsg>, free_vc = 0x7fba9b8cea10 <free_vc>, dump_vty = 0x7fba9b8ce7f0 <dump_vty>}
Updated by laforge over 3 years ago
Just to clarify: This is what has been breaking all TTCN3 tests for gbproxy-fr for the past three builds/nights: https://jenkins.osmocom.org/jenkins/job/ttcn3-gbproxy-test-fr/test_results_analyzer/
Updated by laforge over 3 years ago
(04:52:13 PM) laforge: dwillmann: https://git.osmocom.org/libosmocore/tree/src/gb/gprs_ns2_sns.c#n287
(04:52:26 PM) laforge: /* for every bind, create a connection if bind type == IP */
(04:52:36 PM) laforge: but it actually doesn't skip non-IP types
(04:53:45 PM) laforge: I would suspect that's it, please check
(04:54:08 PM) laforge: same in https://git.osmocom.org/libosmocore/tree/src/gb/gprs_ns2_sns.c#n258 for ipv4
Updated by daniel over 3 years ago
- % Done changed from 20 to 60
There were multiple checks missing for the bind type when adding NSVCs in the SNS code.
Patch is here: https://gerrit.osmocom.org/c/libosmocore/+/22193
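To illustrate the kind of check that was missing, here is a minimal, self-contained sketch. It is not the code from the patch: the demo_* types and the demo_select_ip_binds function are hypothetical stand-ins for the real bind structures in libosmocore. The idea is simply that a loop over all binds has to skip every bind whose link layer is not IP/UDP before an SNS NSVC is created on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real libosmocore definitions. */
enum demo_ll { DEMO_LL_UDP, DEMO_LL_FR };

struct demo_bind {
	const char *name;
	enum demo_ll ll;
};

/* Collect only the binds that IP-SNS may use. Returns the number of
 * binds written to out[]. Real code would create an NSVC per selected
 * bind instead of storing a pointer. */
static size_t demo_select_ip_binds(const struct demo_bind *binds, size_t n,
				   const struct demo_bind **out, size_t out_max)
{
	size_t used = 0;

	assert(binds != NULL && out != NULL);

	for (size_t i = 0; i < n && used < out_max; i++) {
		/* This is the check that was missing: skip non-IP binds
		 * such as a Frame Relay bind. */
		if (binds[i].ll != DEMO_LL_UDP)
			continue;
		out[used++] = &binds[i];
	}
	return used;
}
```

With a Frame Relay bind such as "hdlcnet1" next to two UDP binds, only the two UDP binds would be selected.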
Updated by daniel over 3 years ago
14/01/21 | 16:54:41 laforge: might make sense to unify that somehow if possible
14/01/21 | 16:55:09 laforge: like derive the 'remote' from the gprs_nse_ie_ip*_elem and then call a common function
14/01/21 | 17:00:27 laforge: dwillmann: it might also make sense to check if we should add some ASSERT that the bind-type matches the caller
14/01/21 | 17:00:47 dwillmann: yeah
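The ASSERT idea from the chat could look roughly like the sketch below. All chk_* names are hypothetical and only model the situation. A driver-specific entry point verifies that the NSVC it was handed really sits on a bind of its own link-layer type, instead of trusting the private pointers. Plain assert() stands in for whatever assertion macro the real code would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real libosmocore definitions. */
enum chk_ll { CHK_LL_UDP, CHK_LL_FR };

struct chk_bind { enum chk_ll ll; };
struct chk_nsvc { const struct chk_bind *bind; };

/* True if the NSVC exists and its bind has the expected link layer. */
static bool chk_bind_matches(const struct chk_nsvc *nsvc, enum chk_ll expected)
{
	return nsvc != NULL && nsvc->bind != NULL && nsvc->bind->ll == expected;
}

/* Soft variant: refuse to send on a mismatching bind.
 * Returns 0 on success, -1 if the NSVC does not belong to a FR bind. */
static int chk_fr_send(const struct chk_nsvc *nsvc)
{
	if (!chk_bind_matches(nsvc, CHK_LL_FR))
		return -1;
	/* ... a real driver would transmit the message here ... */
	return 0;
}

/* Hard variant: abort on a mismatch, so the caller's bug shows up at
 * the point of misuse rather than as a wild pointer dereference. */
static void chk_fr_send_strict(const struct chk_nsvc *nsvc)
{
	assert(chk_bind_matches(nsvc, CHK_LL_FR));
	/* ... a real driver would transmit the message here ... */
}
```

Either variant would have flagged the mismatch seen in the gdb session before the invalid dlc pointer was dereferenced.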
Updated by daniel over 3 years ago
- Status changed from In Progress to Feedback
- % Done changed from 60 to 90
Refactoring and sanity checks are addressed here:
https://gerrit.osmocom.org/c/libosmocore/+/22234
https://gerrit.osmocom.org/c/libosmocore/+/22235
Gerrit ttcn3 tests pass now after https://gerrit.osmocom.org/c/libosmocore/+/22193 got merged.
Updated by daniel over 3 years ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100