Bug #5123

coredump nightly mgw on 3g voicecall startup

Added by roh 17 days ago. Updated about 3 hours ago.

Status: In Progress
Priority: High
Assignee:
Category: -
Target version: -
Start date: 04/20/2021
Due date:
% Done: 0%
Spec Reference:

Description

The nightly osmo-mgw build dumped core on me while trying to start a voice call:

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:CB4F498E Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:CB4F498E CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:CB4F498E In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x810aa80, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) quit

/etc/osmocom/osmo-mgw.cfg

mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 31
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp

osmo-mgw 1.8.1+gitr0+9ffaba7c1b-r2.18.0.24

mgw.log (6.59 KB) roh, 04/20/2021 01:29 PM
mgw2.pcap (52.6 KB) roh, 04/20/2021 01:29 PM
mgw.log (6.24 KB) roh, 04/20/2021 01:57 PM
mgw3.pcap (26.9 KB) roh, 04/20/2021 01:57 PM
my_pcap.pcapng.gz (42.6 KB) pespin, 04/20/2021 04:09 PM

Related issues

Related to OsmoMGW - Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data (New, 04/18/2021)

History

#1 Updated by laforge 17 days ago

  • Related to Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data added

#2 Updated by laforge 17 days ago

  • Assignee set to dexter
  • Priority changed from Normal to High

In general, no matter what happens at a remote implementation that sends packets to us, we must never OSMO_ASSERT(). This is a serious problem. OSMO_ASSERT() is to guard against conditions entirely under control of our implementation (mgw in this case).

Any remote user, even a malicious one, must always be able to send us anything without us running into OSMO_ASSERT(). If a remote user can trigger this, it's a denial of service vulnerability.

#3 Updated by laforge 17 days ago

The pcap file shows UDP packets from 10.23.24.192 to the MGW at 10.23.24.1 port 4002. Those are definitely IPv4 packets, so AF_INET.

Can you go to "frame 4" and then print the two values that trigger the assert at libosmo-mgcp/mgcp_network.c:1272?

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb) frame 4
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
(gdb) p from_addr->u.sa.sa_family

#4 Updated by roh 17 days ago

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:933CE96A Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:933CE96A CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:933CE96A In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x81274e0, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) frame 4
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
1272    /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c: No such file or directory.
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
(gdb) p from_addr->u.sa.sa_family
value has been optimized out
(gdb) 

#5 Updated by laforge 17 days ago

mgcp traffic is not in the pcap file.

#6 Updated by roh 17 days ago

tcpdump -s0 -w mgw3.pcap port not 22 -i any

#7 Updated by laforge 17 days ago

So the

(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0

already tells us that it's neither AF_INET (2) nor AF_INET6 (10 on Linux), but either uninitialized or AF_UNSPEC (0), while the received packet is of course AF_INET...
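To illustrate what that gdb value means, here is a minimal C sketch. It assumes the remote address sits in a zeroed sockaddr union until SDP fills it in. The type `addr_sketch` and both functions are hypothetical and not taken from osmo-mgw; only `struct sockaddr` and the `AF_*` constants come from the standard socket headers. The code sticks to the symbolic constants because their numeric values are platform-specific.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the union an address is stored in. */
union addr_sketch {
	struct sockaddr sa;
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
};

/* Return a printable name for an address family value. */
const char *family_name(int family)
{
	switch (family) {
	case AF_UNSPEC:
		return "AF_UNSPEC";
	case AF_INET:
		return "AF_INET";
	case AF_INET6:
		return "AF_INET6";
	default:
		return "other";
	}
}

/* Family of a remote address that was zeroed but never filled in. */
int unset_remote_family(void)
{
	union addr_sketch remote;

	memset(&remote, 0, sizeof(remote));
	return remote.sa.sa_family;
}
```

On Linux, `family_name(unset_remote_family())` yields "AF_UNSPEC", which can never equal the AF_INET family of a packet received over IPv4. That is exactly the comparison the assert makes.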

#8 Updated by laforge 17 days ago

tentative fix in https://gerrit.osmocom.org/c/osmo-mgw/+/23812 but I don't understand enough of osmo-mgw to know if it's the correct way to solve or not. It seems more reasonable that after CRCX the conn->u.rtp.end.addr.u.sa.sa_family is properly initialized?

#9 Updated by pespin 17 days ago

Indeed, the problem is similar to that of "A]" in SYS#5435. That is, nano3g starts sending data to us really quickly, immediately after receiving the RAB-Assignment Request and before answering with the RAB-Assignment Response (I actually see neither of those in the pcap trace I took myself...)

So the problem is that mgw receives RTP traffic on the endpoint at a time when it has only gone through CRCX + CRCX ACK, which sets up the local address, but has never received an MDCX from osmo-msc (due to no Assignment Response?) to set the remote address, so the remote address family is still unset (AF_UNSPEC).

#10 Updated by pespin 17 days ago

I also add a pcap I took myself while seeing the issue in roh's setup.

# /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:B520FAE4 Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:B520FAE4 CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:B520FAE4 In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb763f49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7632633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb76326a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Aborted (core dumped)
# cat /etc/osmocom/osmo-mgw.cfg
!
! MGCP configuration example
!
log file /home/root/mgw.log
  logging filter all 1
  logging color 1
  logging print category-hex 1
  logging print category 0
  logging timestamp 1
  logging print file 1
  logging level set-all debug
mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 512
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp

#11 Updated by pespin 17 days ago

The address field (addr) whose family triggers the assert is set in this code path:

mgcp_parse_sdp_data:
    case 'c':
        if (audio_ip_from_sdp(&rtp->addr, line) < 0)

That is, when osmo-msc/bsc sends CRCX or MDCX with SDP and "c" option set.
In the pcap trace causing the crash, it can be seen that only 1 CRCX is sent before receiving the RTP packet which triggers the assert, and this CRCX contains no "c" option.

I would simply drop that ASSERT since it's not useful at all and only causes problems.
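As a rough illustration of what a non-aborting check could look like, here is a minimal C sketch. It is not the actual fix from gerrit and not osmo-mgw code: `rtp_addr_sketch`, `rtp_src_verdict` and `check_rtp_source()` are hypothetical names, and plain `assert()` stands in for OSMO_ASSERT(). The idea is that anything derived from a received packet yields a return value the caller can act on, while asserts stay reserved for our own invariants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the address union; not the real type. */
union rtp_addr_sketch {
	struct sockaddr sa;
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
};

enum rtp_src_verdict {
	RTP_SRC_OK,              /* remote is set and the families match */
	RTP_SRC_REMOTE_UNSET,    /* no "c=" seen yet, remote is AF_UNSPEC */
	RTP_SRC_FAMILY_MISMATCH, /* remote is set but the families differ */
};

/* Classify a received packet's source against the configured remote.
 * Never aborts on packet-derived data; the caller decides what to do. */
enum rtp_src_verdict check_rtp_source(const union rtp_addr_sketch *remote,
				      const union rtp_addr_sketch *from)
{
	/* NULL here would be a bug in our own code, so asserting is fine. */
	assert(remote != NULL && from != NULL);

	if (remote->sa.sa_family == AF_UNSPEC)
		return RTP_SRC_REMOTE_UNSET;
	if (remote->sa.sa_family != from->sa.sa_family)
		return RTP_SRC_FAMILY_MISMATCH;
	return RTP_SRC_OK;
}
```

With a zeroed remote and an AF_INET source, as in this crash, the function returns `RTP_SRC_REMOTE_UNSET`. Whether the caller then drops the packet or still accepts it (for example in loopback mode) is a policy decision this sketch leaves open.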

It should be fairly simple to create a TTCN3 MGCP_Tests test case that triggers the crash: send a CRCX without the "c=" option to the MGW, receive the CRCX ACK containing the MGW-side RTP socket, and send an RTP packet there. With current osmo-mgw master it should then crash. Correct behavior can then be checked by sending an MDCX with "c=" after the first RTP packet and waiting for the MDCX ACK (it wouldn't send us an ACK if it had crashed beforehand). Leaving that to dexter if he feels like adding that test.

#12 Updated by dexter about 3 hours ago

  • Status changed from New to In Progress
