Open Source Mobile Communications https://osmocom.org/ 2021-04-20T13:33:29Z
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21943 2021-04-20T13:33:29Z laforge
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-3 priority-high3 closed" href="/issues/5119">Bug #5119</a>: mgcp_client.c should not assert on unexpected codec name in the input data</i> added</li></ul>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21944 2021-04-20T13:35:29Z laforge
<ul><li><strong>Assignee</strong> set to <i>dexter</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>In general, no matter what happens at a remote implementation that sends packets to us, we must never OSMO_ASSERT(). This is a serious problem. OSMO_ASSERT() is to guard against conditions entirely under control of our implementation (mgw in this case).</p>
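<p>A minimal sketch of that distinction, using hypothetical names (demo_conn and demo_check_rx are invented for illustration; this is neither the osmo-mgw code nor the eventual fix): the assertion guards an invariant that only our own callers control, while a mismatch caused by a received packet takes an ordinary error path and the packet is dropped.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for an RTP connection; not the osmo-mgw struct. */
struct demo_conn {
	struct sockaddr_storage remote; /* ss_family stays AF_UNSPEC (0) until configured */
};

/* Returns 0 if the packet may be processed, -1 if it must be dropped. */
static int demo_check_rx(const struct demo_conn *conn, const struct sockaddr_storage *from)
{
	/* Internal invariant: callers inside our own code always pass valid pointers. */
	assert(conn != NULL && from != NULL);

	/* Remote-controlled condition: handle it as an error, never assert on it. */
	if (conn->remote.ss_family != from->ss_family) {
		fprintf(stderr, "dropping packet: address family %d, expected %d\n",
			(int)from->ss_family, (int)conn->remote.ss_family);
		return -1;
	}
	return 0;
}
```

<p>With a zero-initialized demo_conn and a packet from an AF_INET source, demo_check_rx() returns -1 and logs the mismatch instead of aborting the process.</p>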
<p>Any remote user, even a malicious one, must always be able to send us anything without us running into OSMO_ASSERT(). If a remote user can trigger this, it's a denial of service vulnerability.</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21946 2021-04-20T13:42:57Z laforge
<ul></ul><p>The pcap file shows UDP packets from 10.23.24.192 to the MGW at 10.23.24.1 port 4002. Those are definitely IPv4 packets, so AF_INET.</p>
<p>Can you go to "frame 4" and then print the two values that trigger the assert (at libosmo-mgcp/mgcp_network.c:1272)?<br /><pre>
Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb) frame 4
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
(gdb) p from_addr->u.sa.sa_family
</pre></p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21947 2021-04-20T13:46:19Z roh
<ul></ul><pre>
Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:933CE96A Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:933CE96A CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:933CE96A In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2 0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3 osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4 0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5 0x0804ed44 in rx_rtp (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6 rtp_data_net (fd=0x81274e0, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7 0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8 _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9 0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) frame 4
#4 0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
1272 /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c: No such file or directory.
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
(gdb) p from_addr->u.sa.sa_family
value has been optimized out
(gdb)
</pre>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21948 2021-04-20T13:48:09Z laforge
<ul></ul><p>The MGCP traffic is not in the pcap file.</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21949 2021-04-20T13:58:02Z roh
<ul><li><strong>File</strong> <a href="/attachments/4648">mgw3.pcap</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4648/mgw3.pcap">mgw3.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/4647">mgw.log</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4647/mgw.log">mgw.log</a> added</li></ul><p>tcpdump -s0 -w mgw3.pcap port not 22 -i any</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21951 2021-04-20T16:00:12Z laforge
<ul></ul><p>So the<br /><pre>
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
</pre></p>
<p>already tells us that it's neither AF_INET (2) nor AF_INET6 (10), but either uninitialized or AF_UNSPEC, while the received packet is of course AF_INET...</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21952 2021-04-20T16:04:01Z laforge
<ul></ul><p>Tentative fix in <a class="external" href="https://gerrit.osmocom.org/c/osmo-mgw/+/23812">https://gerrit.osmocom.org/c/osmo-mgw/+/23812</a>, but I don't understand enough of osmo-mgw to know whether it's the correct way to solve this. Wouldn't it be more reasonable for conn->u.rtp.end.addr.u.sa.sa_family to be properly initialized after the CRCX?</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21953 2021-04-20T16:05:10Z pespin
<ul></ul><p>Indeed, the problem is similar to that of "A]" in SYS#5435. That is, the nano3g starts sending data to us very quickly, immediately after receiving the RAB-Assignment Request and before answering with the RAB-Assignment Response (I actually see neither of those in the pcap trace I took myself...)</p>
<p>So the problem is that the MGW receives RTP traffic on the endpoint at a time when it has only gone through CRCX + CRCX ACK, which sets up the local address, but has not yet received an MDCX from osmo-msc (due to no Assignment Response?) to set the remote address, whose family is therefore still unset (AF_UNSPEC).</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21954 2021-04-20T16:10:18Z pespin
<ul><li><strong>File</strong> <a href="/attachments/4649">my_pcap.pcapng.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4649/my_pcap.pcapng.gz">my_pcap.pcapng.gz</a> added</li></ul><p>I'm also adding a pcap I took myself while observing the issue in roh's setup.</p>
<pre>
# /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:B520FAE4 Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:B520FAE4 CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:B520FAE4 In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb763f49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7632633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb76326a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Aborted (core dumped)
</pre>
<pre>
# cat /etc/osmocom/osmo-mgw.cfg
!
! MGCP configuration example
!
log file /home/root/mgw.log
logging filter all 1
logging color 1
logging print category-hex 1
logging print category 0
logging timestamp 1
logging print file 1
logging level set-all debug
mgcp
bind ip 10.23.24.1
rtp port-range 4002 16000
rtp bind-ip 10.23.24.1
rtp ip-probing
rtp ip-tos 184
bind port 2427
sdp audio payload number 98
sdp audio payload name GSM
number endpoints 512
loop 0
force-realloc 1
rtcp-omit
rtp-patch ssrc
rtp-patch timestamp
</pre>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=21955 2021-04-20T17:09:57Z pespin
<ul></ul><p>The address field (addr) whose state triggers the assert is set in this code path:<br /><pre>
mgcp_parse_sdp_data:
    case 'c':
        if (audio_ip_from_sdp(&rtp->addr, line) < 0)
</pre></p>
<p>That is, when osmo-msc/bsc sends CRCX or MDCX with SDP and "c" option set.<br />In the pcap trace causing the crash, it can be seen that only 1 CRCX is sent before receiving the RTP packet which triggers the assert, and this CRCX contains no "c" option.</p>
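<p>To illustrate why the family stays at 0 when no "c=" line arrives, here is a simplified, hypothetical parser (demo_addr_from_sdp_c_line is an invented name and does not reproduce audio_ip_from_sdp()). It assumes the address is zero-initialized beforehand and only writes the family once a connection line has been parsed successfully, so without such a line the address remains AF_UNSPEC.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical, simplified parser for an SDP connection line such as
 * "c=IN IP4 10.23.24.192". Not the osmo-mgw implementation. */
static int demo_addr_from_sdp_c_line(struct sockaddr_storage *out, const char *line)
{
	char ipver[4];               /* "IP4" or "IP6" */
	char host[INET6_ADDRSTRLEN]; /* 46 bytes: room for %45s plus NUL */

	/* Internal invariant: our own callers pass valid pointers. */
	assert(out != NULL && line != NULL);

	/* Anything that is not a well-formed "c=" line leaves *out untouched. */
	if (sscanf(line, "c=IN %3s %45s", ipver, host) != 2)
		return -1;

	if (strcmp(ipver, "IP4") == 0) {
		struct in_addr ip4;
		struct sockaddr_in *sin = (struct sockaddr_in *)out;
		if (inet_pton(AF_INET, host, &ip4) != 1)
			return -1;
		sin->sin_family = AF_INET;
		sin->sin_addr = ip4;
		return 0;
	}
	if (strcmp(ipver, "IP6") == 0) {
		struct in6_addr ip6;
		struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)out;
		if (inet_pton(AF_INET6, host, &ip6) != 1)
			return -1;
		sin6->sin6_family = AF_INET6;
		sin6->sin6_addr = ip6;
		return 0;
	}
	return -1;
}
```

<p>Feeding it only non-"c=" lines, as in the CRCX from the trace, leaves a zeroed sockaddr_storage with ss_family 0, which is the value gdb printed for the connection's remote address.</p>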
<p>I would simply drop that ASSERT since it's not useful at all and only causes problems.</p>
<p>It should be fairly simple to create a TTCN3 MGCP_Test test case that triggers the crash by sending a CRCX without the "c=" option to the MGW, receiving the CRCX ACK with the MGW-side RTP socket, and sending an RTP packet there. With current osmo-mgw master it should then crash. Correct behavior can then be checked by sending an MDCX with "c=" after the first RTP packet and receiving an MDCX ACK (the MGW wouldn't send us an ACK if it had crashed beforehand). Leaving that to <a class="user active" href="https://osmocom.org/users/15">dexter</a> if he feels like adding that test.</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=22050 2021-05-07T16:16:15Z dexter
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=22051 2021-05-07T21:17:21Z dexter
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>90</i></li></ul><p>I think I have fixed the problem now. The following TTCN3 test triggers the problem:</p>
<p><a class="external" href="https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24173">https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24173</a> MGCP_Test: test LOOPBACK with implicit destination addr</p>
<p>I have now dropped the OSMO_ASSERT(), but I do not understand why it was even there, since it defeats the purpose of the code. When the call agent does not specify the destination address in loopback mode, the sa_family is of course not initialized and differs from that of the from-address. So it is indeed correct to remove the OSMO_ASSERT().</p>
<p>I also noticed that there is a problem with writing the sa_family. I do not fully understand it, but I think it is better to copy the address as a whole anyway. Since the event happens only once and is a bit unusual, I think it's a good idea to add a log statement.</p>
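<p>A rough sketch of the "copy the address as a whole and log it" idea, under stated assumptions: demo_rtp_end and demo_adopt_remote_addr are hypothetical names, and the merged patch in osmo-mgw may well look different. The remote address is adopted from the first packet only while it is still unset, and the whole sockaddr (family, IP, port) is copied in one step rather than patching individual fields.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified RTP endpoint state; not the osmo-mgw struct. */
struct demo_rtp_end {
	struct sockaddr_storage addr; /* remote address, AF_UNSPEC while unset */
};

/* Returns 1 if the source address was adopted, 0 if a remote address was already set. */
static int demo_adopt_remote_addr(struct demo_rtp_end *end, const struct sockaddr_storage *from)
{
	/* Internal invariant: our own callers pass valid pointers. */
	assert(end != NULL && from != NULL);

	if (end->addr.ss_family != AF_UNSPEC)
		return 0;

	/* Copy family, IP and port together so the fields can never disagree. */
	memcpy(&end->addr, from, sizeof(end->addr));

	/* The event is unusual and happens once per connection, so log it. */
	fprintf(stderr, "loopback: remote address not set, using source address of first packet\n");
	return 1;
}
```

<p>A second packet from a different source leaves the adopted address untouched, because the family is no longer AF_UNSPEC at that point.</p>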
<p>See also:<br /><a class="external" href="https://gerrit.osmocom.org/c/osmo-mgw/+/24174">https://gerrit.osmocom.org/c/osmo-mgw/+/24174</a> mgcp_network: fix implicit address loopback</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=22117 2021-05-17T10:47:18Z dexter
<ul></ul><p>The patch for osmo-mgw is merged but TC_one_crcx_loopback_rtp_implicit is still failing. This needs to be checked.</p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=22120 2021-05-17T16:48:01Z dexter
<ul></ul><p>It turned out that the problem with TC_one_crcx_loopback_rtp_implicit was IPv6 related. The MGW returns an IPv6 address when no local address is sent with the first CRCX. I have now changed TC_one_crcx_loopback_rtp_implicit so that it expects IPv6 instead of IPv4.</p>
<p>See also: <a class="external" href="https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24250">https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24250</a></p>
OsmoMGW - Bug #5123: coredump nightly mgw on 3g voicecall startup https://osmocom.org/issues/5123?journal_id=22187 2021-06-02T19:21:07Z dexter
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li><li><strong>Assignee</strong> changed from <i>dexter</i> to <i>roh</i></li><li><strong>% Done</strong> changed from <i>90</i> to <i>100</i></li></ul><p>The problems with the OSMO_ASSERT are resolved and the TTCN3 tests pass, so I think this can be closed.</p>
<p>(assigning this back to roh, so he can have a look himself and retest if he thinks this is necessary)</p>