Bug #5862

closed

segfault during HNBGW_Tests.TC_rab_assign_mgcp_to

Added by neels about 1 year ago. Updated almost 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 01/17/2023
Due date:
% Done: 100%
Spec Reference:

Description

Jenkins does not appear to trigger this segfault, but
I consistently get a segfault in osmo-hnbgw on my machine
when running HNBGW_Tests.TC_rab_assign_mgcp_to()

bisect shows that this commit introduces the segfault:

e62af4d46a74af4a98dc9399082f4277fb6379e5 is the first bad commit
Author: Pau Espin Pedrol <>
Introduce support for libosmo-mgcp-client MGW pooling
...
Related: SYS#5091
Related: SYS#5987
Change-Id: I371dc773b58788ee21037dc25d77f556c89c6b61

I am pretty sure this same osmo-hnbgw version ran ok at an earlier time on my machine,
and I recently upgraded my Debian 'unstable' system. So maybe the upgrade exposes a fault that earlier gcc versions did not trigger.
I am running gcc (Debian 12.2.0-14) 12.2.0

The segfault bt:

20230117224546362 DLMGCP DEBUG MGCP_CONN(mgw-fsm-14429752-0)[0x612000003ca0]{ST_CRCX}: state_chg to ST_CRCX_RESP (mgcp_client_fsm.c:239)
20230117224546362 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: rtpbridge/*@mgw Sent messages: 1 (mgcp_client_endpoint_fsm.c:920)
20230117224546362 DLMGCP DEBUG MGW(mgw) Tx MGCP: r=127.0.0.1:2427<->l=127.0.0.1:35409: len=92 'CRCX 1 rtpbridge/*@mgw MGCP 1.0\r\nC: dc2e38'... (mgcp_client.c:742)
20230117224550365 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Timeout of T1 (fsm.c:317)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Terminating (cause = OSMO_FSM_TERM_REGULAR) (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Removing from parent mgw-endp(mgw-fsm-14429752-0)[0x612000003b20] (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGW(mgw) Canceled transaction 1 (mgcp_client.c:1106)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Freeing instance (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG MGCP_CONN(to-HNB)[0x612000003ca0]{ST_CRCX_RESP}: Deallocated (fsm.c:568)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Received Event MGW Response for CI #0 (mgcp_client_fsm.c:509)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: rtpbridge/*@mgw CI in use: 0, waiting for response: 0 (mgcp_client_endpoint_fsm.c:864)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Terminating (cause = OSMO_FSM_TERM_REGULAR) (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Removing from parent mgw(mgw-fsm-14429752-0)[0x612000003820] (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Freeing instance (mgcp_client_endpoint_fsm.c:869)
20230117224550366 DLMGCP DEBUG mgw-endp(mgw-fsm-14429752-0)[0x612000003b20]{WAIT_MGW_RESPONSE}: Deallocated (fsm.c:568)
20230117224550366 DMGW DEBUG mgw(mgw-fsm-14429752-0)[0x612000003820]{MGW_ST_CRCX_HNB}: Received Event MGW_EV_MGCP_TERM (mgcp_client_endpoint_fsm.c:869)
=================================================================
==255699==ERROR: AddressSanitizer: heap-use-after-free on address 0x62b000000260 at pc 0x7f282a6ee143 bp 0x7fff0d9bcae0 sp 0x7fff0d9bcad8
READ of size 8 at 0x62b000000260 thread T0
    #0 0x7f282a6ee142 in osmo_mgcpc_ep_client ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:223
    #1 0x55e2a84f1889 in mgw_fsm_allstate_action ../../../../src/osmo-hnbgw/src/osmo-hnbgw/mgw_fsm.c:504
    #2 0x7f2829d50c56 in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:863
    #3 0x7f2829d55a08 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:962
    #4 0x7f282a72679a in osmo_mgcpc_ep_fsm_check_state_chg_after_response ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:869
    #5 0x7f282a6f1869 in on_failure ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:414
    #6 0x7f282a727ac6 in osmo_mgcpc_ep_fsm_handle_ci_events ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_endpoint_fsm.c:935
    #7 0x7f2829d5177b in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:875
    #8 0x7f2829d55a08 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:962
    #9 0x7f282a6e90b6 in fsm_timeout_cb ../../../../src/osmo-mgw/src/libosmo-mgcp-client/mgcp_client_fsm.c:509
    #10 0x7f2829d45000 in fsm_tmr_cb ../../../src/libosmocore/src/fsm.c:320
    #11 0x7f2829d24cb4 in osmo_timers_update ../../../src/libosmocore/src/timer.c:269
    #12 0x7f2829d299b1 in _osmo_select_main ../../../src/libosmocore/src/select.c:394
    #13 0x7f2829d29b5d in osmo_select_main_ctx ../../../src/libosmocore/src/select.c:455
    #14 0x55e2a84b849f in main ../../../../src/osmo-hnbgw/src/osmo-hnbgw/hnbgw.c:840
    #15 0x7f2829246189  (/lib/x86_64-linux-gnu/libc.so.6+0x27189)
    #16 0x7f2829246244 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27244)
    #17 0x55e2a84b0120 in _start (/usr/local/bin/osmo-hnbgw+0x9f120)

The ttcn3 test runs a CS RAB Assignment, but does not respond to osmo-hnbgw's CRCX request.
The CRCX times out, MGCP_CONN fsm terminates (libosmo-mgcp-client).
In turn the parent mgw-endp fsm terminates (libosmo-mgcp-client).
This generates an MGW_EV_MGCP_TERM event to the mgw_fsm (osmo-hnbgw).
The mgw_fsm then attempts to retrieve a pointer from its state:
mgw_fsm_priv->mgcpc_ep->mgcp_client
where the middle one, mgcpc_ep, is the 'mgw-endp' that was already deallocated above.

It appears that osmo-hnbgw's mgw_fsm.c should keep a separate pointer to the mgcp_client, in order to call mgcp_client_pool_put() on it.
It should not rely on looking up the mgcp_client via the mgcpc_ep, because the mgcpc_ep is deallocated independently.
pespin, I am making sense, right?
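
To illustrate the idea, here is a minimal standalone sketch. It does not use the real libosmo-mgcp-client types or API and is not the actual patch: the structs, the `mgcpc` field, `pool_put()`, `ep_timeout()` and `demo_timeout_path()` are hypothetical stand-ins. It only shows the parent keeping its own client pointer, so the termination handler never has to dereference the already freed endpoint.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real libosmo-mgcp-client types. */
struct mgcp_client {
	int pool_refs; /* how many users currently hold this client */
};

struct mgcpc_ep {
	struct mgcp_client *mgcp_client;
};

/* Hypothetical stand-in for mgcp_client_pool_put(): drop one reference. */
static void pool_put(struct mgcp_client *c)
{
	assert(c->pool_refs > 0);
	c->pool_refs--;
}

struct mgw_fsm_priv {
	struct mgcpc_ep *mgcpc_ep;  /* may be deallocated before we are notified */
	struct mgcp_client *mgcpc;  /* separate pointer, valid independently of mgcpc_ep */
};

/* Parent handler for the "endpoint terminated" event (MGW_EV_MGCP_TERM in the log). */
static void on_mgcp_term(struct mgw_fsm_priv *priv)
{
	/* The endpoint is already gone: forget the pointer, never dereference it. */
	priv->mgcpc_ep = NULL;
	/* Release the client via the separately stored pointer. */
	pool_put(priv->mgcpc);
	priv->mgcpc = NULL;
}

/* Timeout path: the endpoint frees itself first, then the parent is notified. */
static void ep_timeout(struct mgw_fsm_priv *priv)
{
	free(priv->mgcpc_ep);
	on_mgcp_term(priv);
}

/* Runs the timeout path once and returns the remaining reference count. */
int demo_timeout_path(void)
{
	struct mgcp_client client = { .pool_refs = 0 };
	struct mgw_fsm_priv priv;

	client.pool_refs++; /* stand-in for taking the client from the pool */
	priv.mgcpc = &client;
	priv.mgcpc_ep = malloc(sizeof(*priv.mgcpc_ep));
	if (!priv.mgcpc_ep)
		return -1;
	priv.mgcpc_ep->mgcp_client = &client;

	ep_timeout(&priv);
	return client.pool_refs;
}
```

With the original lookup (`priv->mgcpc_ep->mgcp_client`) inside `on_mgcp_term()`, the same sequence would read freed memory, which is what AddressSanitizer reports above.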

Actions #1

Updated by neels about 1 year ago

  • Status changed from New to Feedback
  • Assignee set to pespin

This patch solves the problem for me:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/31008

Actions #2

Updated by neels about 1 year ago

  • % Done changed from 0 to 70

Actions #3

Updated by fixeria almost 1 year ago

I guess this ticket can be closed?

Actions #4

Updated by neels almost 1 year ago

  • Status changed from Feedback to Resolved
  • % Done changed from 70 to 100

indeed, patch is merged
