Bug #5862
Closed: segfault during HNBGW_Tests.TC_rab_assign_mgcp_to
% Done: 100
Description
It appears that Jenkins does not trigger this segfault, but
I get a consistently occurring segfault in osmo-hnbgw on my machine
when running HNBGW_Tests.TC_rab_assign_mgcp_to().
bisect shows that this commit introduces the segfault:
e62af4d46a74af4a98dc9399082f4277fb6379e5 is the first bad commit
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Introduce support for libosmo-mgcp-client MGW pooling
...
Related: SYS#5091
Related: SYS#5987
Change-Id: I371dc773b58788ee21037dc25d77f556c89c6b61
I am pretty sure this same osmo-hnbgw version ran fine on my machine at an earlier time,
and I recently upgraded my Debian 'unstable' system. So perhaps the upgrade exposes a fault that an earlier gcc did not trigger.
I am running gcc (Debian 12.2.0-14) 12.2.0
The log leading up to the segfault, and the AddressSanitizer backtrace:
20230117224546362 DLMGCP DEBUG MGCP_CONN(mgw-fsm-14429752-0)[0x612000003ca0]{ST_CRCX}: state_chg to ST_CRCX_RESP (mgcp_client_fsm.c:239)
20230117224546362 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: rtpbridge/*@mgw Sent messages: 1 (mgcp_client_endpoint_fsm.c:920)
20230117224546362 DLMGCP DEBUG MGW(mgw) Tx MGCP: r=127.0.0.1:2427<->l=127.0.0.1:35409: len=92 'CRCX 1 rtpbridge/*@mgw MGCP 1.0\r\nC: dc2e38'... (mgcp_client.c:742)
20230117224550365 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Timeout of T1 (fsm.c:317)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Terminating (cause = OSMO_FSM_TERM_REGULAR) (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Removing from parent mgw-endp(mgw-fsm-14429752-0)[0x612000003b20] (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGW(mgw) Canceled transaction 1 (mgcp_client.c:1106)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Freeing instance (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Deallocated (fsm.c:568)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Received Event MGW Response for CI #0 (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: rtpbridge/*@mgw CI in use: 0, waiting for response: 0 (mgcp_client_endpoint_fsm.c:864)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Terminating (cause = OSMO_FSM_TERM_REGULAR) (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Removing from parent mgw(mgw-fsm-14429752-0)[0x612000003820] (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Freeing instance (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Deallocated (fsm.c:568)
20230117224550366 DMGW DEBUG mgw(mgw-fsm-14429752-0)[0x612000003820]{MGW_ST_CRCX_HNB}: Received Event MGW_EV_MGCP_TERM (mgcp_client_endpoint_fsm.c:869)
=================================================================
==255699==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000000260 at pc 0x7f282a6ee143 bp 0x7fff0d9bcae0 sp 0x7fff0d9bcad8
READ of size 8 at 0x62b000000260 thread T0
    #0 0x7f282a6ee142 in osmo_mgcpc_ep_client ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:223
    #1 0x55e2a84f1889 in mgw_fsm_allstate_action ../../../../src/osmo-hnbgw/src/osmo-hnbgw/mgw_fsm.c:504
    #2 0x7f2829d50c56 in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:863
    #3 0x7f2829d55a08 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:962
    #4 0x7f282a72679a in osmo_mgcpc_ep_fsm_check_state_chg_after_response ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:869
    #5 0x7f282a6f1869 in on_failure ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:414
    #6 0x7f282a727ac6 in osmo_mgcpc_ep_fsm_handle_ci_events ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:935
    #7 0x7f2829d5177b in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:875
    #8 0x7f2829d55a08 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:962
    #9 0x7f282a6e90b6 in fsm_timeout_cb ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_fsm.c:509
    #10 0x7f2829d45000 in fsm_tmr_cb ../../../src/libosmocore/src/fsm.c:320
    #11 0x7f2829d24cb4 in osmo_timers_update ../../../src/libosmocore/src/timer.c:269
    #12 0x7f2829d299b1 in _osmo_select_main ../../../src/libosmocore/src/select.c:394
    #13 0x7f2829d29b5d in osmo_select_main_ctx ../../../src/libosmocore/src/select.c:455
    #14 0x55e2a84b849f in main ../../../../src/osmo-hnbgw/src/osmo-hnbgw/hnbgw.c:840
    #15 0x7f2829246189 (/lib/x86_64-linux-gnu/libc.so.6+0x27189)
    #16 0x7f2829246244 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27244)
    #17 0x55e2a84b0120 in _start (/usr/local/bin/osmo-hnbgw+0x9f120)
The ttcn3 test runs a CS RAB Assignment, but does not respond to osmo-hnbgw's CRCX request.
The CRCX times out, and the MGCP_CONN FSM terminates (libosmo-mgcp-client).
In turn, the parent mgw-endp FSM terminates (libosmo-mgcp-client).
This dispatches an MGW_EV_MGCP_TERM event to the mgw_fsm (osmo-hnbgw).
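The mgw_fsm event handler (mgw_fsm_allstate_action(), calling osmo_mgcpc_ep_client(); frames #1 and #0) then attempts to retrieve a pointer from the mgw_fsm state:
mgw_fsm_priv->mgcpc_ep->mgcp_client
where the middle one, mgcpc_ep, is the 'mgw-endp' instance that was already deallocated above.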
It appears that osmo-hnbgw's mgw_fsm.c should keep a separate pointer to the mgcp_client, so that it can call mgcp_client_pool_put() on it.
It should not look up the mgcp_client via mgcpc_ep, because that object is deallocated independently.
pespin, I am making sense, right?
Updated by neels about 1 year ago
- Status changed from New to Feedback
- Assignee set to pespin
This patch solves the problem for me:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/31008
Updated by neels almost 1 year ago
- Status changed from Feedback to Resolved
- % Done changed from 70 to 100
Indeed, the patch is merged.