Bug #5123

coredump nightly mgw on 3g voicecall startup

Added by roh 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 04/20/2021
Due date:
% Done: 100%
Spec Reference:

Description

-nightly dumped core on me trying to start a voicecall:

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:CB4F498E Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:CB4F498E CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:CB4F498E In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x810aa80, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) quit

/etc/osmocom/osmo-mgw.cfg

mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 31
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp

osmo-mgw 1.8.1+gitr0+9ffaba7c1b-r2.18.0.24

mgw.log (6.59 KB) roh, 04/20/2021 01:29 PM
mgw2.pcap (52.6 KB) roh, 04/20/2021 01:29 PM
mgw.log (6.24 KB) roh, 04/20/2021 01:57 PM
mgw3.pcap (26.9 KB) roh, 04/20/2021 01:57 PM
my_pcap.pcapng.gz (42.6 KB) pespin, 04/20/2021 04:09 PM

Related issues

Related to OsmoMGW - Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data (Resolved, 04/18/2021)

Associated revisions

Revision 97a9312b (diff)
Added by dexter 5 months ago

mgcp_network: fix implicit address loopback

A call agent may send a CRCX to create a connection in LOOPBACK mode but
without specifying the destination address. In those cases the MGW
should deduce the destination address from the first incoming RTP
packet.

Unfortunately this is currently blocked by an OSMO_ASSERT that checks the
current sa_family against the sa_family from the incoming packet. This
makes no sense since the current sa_family is still uninitialized, which
is expected and not an error since the code that follows will initialize
it.

It also makes sense not to access the osmo_sockaddr struct members
individually but rather copy the address as a whole.

Since the event only happens once and since it is also somewhat special
it makes sense to log the event as well.

Change-Id: I2dbd6f62170a7f62e5287d04a4ee6716b8786c26
Related: OS#5123

Revision 1ac1398d (diff)
Added by dexter 5 months ago

MGCP_Test: test LOOPBACK with implicit destination addr

Test what happens when the MGW gets a CRCX that creates a connection in
LOOPBACK mode but does not specify an RTP destination address. The MGW
is expected to deduce the destination address from the first incoming
RTP packet and loop it back to its originating address.

Change-Id: I7baf827fb0c3f33e13ccbaffd37ba0eb4e20c304
Related: OS#5123

Revision eba70db6 (diff)
Added by dexter 5 months ago

MGCP_Test: fix TC_one_crcx_loopback_rtp_implicit

The testcase TC_one_crcx_loopback_rtp_implicit uses
f_TC_one_crcx_loopback_rtp, which creates the RTP flow with IPv4
addresses but since we do not send a local RTP IP address with the CRCX
to the MGW, the MGW will prefer IPv6, which means that we get an IPv6
address back while the RTP stream is IPv4 on the TTCN3 side.

Related: OS#5123
Change-Id: I80498737d5b32f28b62e0c17cce1969b54af948c

Revision 37965088 (diff)
Added by dexter 5 months ago

MGCP_Test: avoid crash in latest (TC_one_crcx_loopback_rtp_implicit)

The testcase TC_one_crcx_loopback_rtp_implicit triggers a bug in older
osmo-mgw versions that eventually leads to a crash of osmo-mgw. This
also means that all tests after TC_one_crcx_loopback_rtp_implicit will
also fail. Let's move TC_one_crcx_loopback_rtp_implicit to the end of the
control section to postpone the crash to the very end of the testrun.

Change-Id: I25abf30f8c49e580c46e7a61e887bd0add9a4cd4
Related: OS#5123

History

#1 Updated by laforge 6 months ago

  • Related to Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data added

#2 Updated by laforge 6 months ago

  • Assignee set to dexter
  • Priority changed from Normal to High

In general, no matter what happens at a remote implementation that sends packets to us, we must never OSMO_ASSERT(). This is a serious problem. OSMO_ASSERT() is to guard against conditions entirely under control of our implementation (mgw in this case).

Any remote user, even a malicious one, must always be able to send us anything without us running into OSMO_ASSERT(). If a remote user can trigger this, it's a denial of service vulnerability.
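The distinction drawn above can be sketched in C as follows. This is only an illustration under stated assumptions: `rtp_source_acceptable()` is a hypothetical helper, not an osmo-mgw function, and plain `assert()` stands in for OSMO_ASSERT(). The assert guards an invariant that only our own code can violate (NULL arguments), while anything the peer controls is checked, logged and dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of osmo-mgw): decide whether a packet
 * received from `from` may be accepted for a connection whose configured
 * remote address is `remote`. */
static bool rtp_source_acceptable(const struct sockaddr *remote,
                                  const struct sockaddr *from)
{
    /* Internal invariant: callers in our own code never pass NULL.
     * This is the kind of condition an assert is meant for. */
    assert(remote != NULL && from != NULL);

    /* Remote address not configured yet: nothing to compare against. */
    if (remote->sa_family == AF_UNSPEC)
        return true;

    /* A family mismatch is caused by the peer, so log and drop the
     * packet instead of aborting the process. */
    if (remote->sa_family != from->sa_family) {
        fprintf(stderr, "dropping packet: family %d, expected %d\n",
                (int)from->sa_family, (int)remote->sa_family);
        return false;
    }
    return true;
}
```

A receive path would call this helper for each incoming packet and simply return early when it yields false, so a misbehaving peer costs one log line rather than the whole process.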

#3 Updated by laforge 6 months ago

The pcap file shows UDP packets from 10.23.24.192 to the MGW at 10.23.24.1 port 4002. Those are definitely IPv4 packets, so AF_INET.

Can you go to "frame 4" and then print the two values that trigger the assert at libosmo-mgcp/mgcp_network.c:1272?

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb) frame 4
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
(gdb) p from_addr->u.sa.sa_family

#4 Updated by roh 6 months ago

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:933CE96A Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:933CE96A CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:933CE96A In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x81274e0, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) frame 4
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
1272    /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c: No such file or directory.
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
(gdb) p from_addr->u.sa.sa_family
value has been optimized out
(gdb) 

#5 Updated by laforge 6 months ago

mgcp traffic is not in the pcap file.

#6 Updated by roh 6 months ago

tcpdump -s0 -w mgw3.pcap port not 22 -i any

#7 Updated by laforge 6 months ago

So the

(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0

already tells us that it's neither AF_INET (2) nor AF_INET6 (10), but either uninitialized or AF_UNSPEC, while the received packet is of course AF_INET...
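To make the reading of that gdb value concrete, here is a small sketch, assuming Linux, where AF_UNSPEC is 0 and AF_INET is 2. A zero-filled address structure therefore reports AF_UNSPEC, which is indistinguishable from "never set". `family_name()` is a hypothetical helper written for this illustration only.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: name the address family stored in `ss`. */
static const char *family_name(const struct sockaddr_storage *ss)
{
    assert(ss != NULL);
    switch (ss->ss_family) {
    case AF_UNSPEC: return "AF_UNSPEC"; /* also what a zeroed struct holds on Linux */
    case AF_INET:   return "AF_INET";
    case AF_INET6:  return "AF_INET6";
    default:        return "other";
    }
}
```

Calling `family_name()` on a structure that was only cleared with `memset(&ss, 0, sizeof(ss))` yields "AF_UNSPEC" on Linux, which matches the `$1 = 0` seen in the debugger.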

#8 Updated by laforge 6 months ago

Tentative fix in https://gerrit.osmocom.org/c/osmo-mgw/+/23812, but I don't understand enough of osmo-mgw to know whether it's the correct way to solve this. It seems more reasonable that conn->u.rtp.end.addr.u.sa.sa_family is properly initialized after the CRCX?

#9 Updated by pespin 6 months ago

Indeed, the problem is similar to that of "A]" in SYS#5435. That is, nano3g is starting to send data to us really quickly, immediately after receiving RAB-Assignment Request and before answering with RAB-Assignment Response (I actually see none of those in the pcap trace I took myself...)

So, the problem is that mgw is receiving RTP traffic on the endpoint at a time when it has only gone through CRCX + CRCX ACK, setting up the local address, but never got an MDCX from osmo-msc (due to no Assignment Response?) to set the remote address, hence the AF_UNSPEC.

#10 Updated by pespin 6 months ago

I also add a pcap I took myself while seeing the issue in roh's setup.

# /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:B520FAE4 Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:B520FAE4 CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:B520FAE4 In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb763f49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7632633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb76326a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Aborted (core dumped)
# cat /etc/osmocom/osmo-mgw.cfg
!
! MGCP configuration example
!
log file /home/root/mgw.log
  logging filter all 1
  logging color 1
  logging print category-hex 1
  logging print category 0
  logging timestamp 1
  logging print file 1
  logging level set-all debug
mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 512
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp

#11 Updated by pespin 6 months ago

The related address bits which trigger the crash from the assert (addr) are set in code path:

mgcp_parse_sdp_data:
    case 'c':
        if (audio_ip_from_sdp(&rtp->addr, line) < 0)

That is, when osmo-msc/bsc sends CRCX or MDCX with SDP and "c" option set.
In the pcap trace causing the crash, it can be seen that only 1 CRCX is sent before receiving the RTP packet which triggers the assert, and this CRCX contains no "c" option.
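For readers unfamiliar with the SDP "c=" line, the following is a minimal sketch of how such a line determines the address family of the remote end. It is not the osmo-mgw implementation of audio_ip_from_sdp(); `parse_sdp_connection()` is a hypothetical stand-in that only handles the `c=IN IP4 <addr>` and `c=IN IP6 <addr>` forms. The point is that without a "c=" line nothing ever writes the family, so it stays at its zero value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the role audio_ip_from_sdp() plays.
 * Parses "c=IN IP4 <addr>" or "c=IN IP6 <addr>" into `out`.
 * Returns 0 on success, -1 on any other input; `out` is left
 * untouched on failure. */
static int parse_sdp_connection(struct sockaddr_storage *out, const char *line)
{
    char type[4];
    char addr[INET6_ADDRSTRLEN]; /* 46 bytes, so %45s cannot overflow */
    struct sockaddr_storage tmp;

    assert(out != NULL && line != NULL);

    if (sscanf(line, "c=IN %3s %45s", type, addr) != 2)
        return -1;

    memset(&tmp, 0, sizeof(tmp));
    if (strcmp(type, "IP4") == 0) {
        struct sockaddr_in *sin = (struct sockaddr_in *)&tmp;
        if (inet_pton(AF_INET, addr, &sin->sin_addr) != 1)
            return -1;
        sin->sin_family = AF_INET;
    } else if (strcmp(type, "IP6") == 0) {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&tmp;
        if (inet_pton(AF_INET6, addr, &sin6->sin6_addr) != 1)
            return -1;
        sin6->sin6_family = AF_INET6;
    } else {
        return -1;
    }

    *out = tmp; /* commit only after the whole line parsed cleanly */
    return 0;
}
```

Feeding it only lines such as `m=audio 4002 RTP/AVP 98` leaves a zero-initialized `out` at AF_UNSPEC, which corresponds to the state the endpoint was in when the first RTP packet arrived.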

I would simply drop that ASSERT since it's not useful at all and only causes problems.

It should be fairly simple to create a TTCN3 MGCP_Test testcase that triggers the crash by sending a CRCX without the "c=" option to the MGW, receiving the CRCX ACK with the mgw-side RTP socket and sending an RTP packet there. With current osmo-mgw master it should then crash. Correct behavior can be checked by sending an MDCX with "c=" after sending the first RTP packet and receiving an MDCX ACK (it wouldn't send us an ACK if it had crashed beforehand). Leaving that to dexter if he feels like adding that test.

#12 Updated by dexter 6 months ago

  • Status changed from New to In Progress

#13 Updated by dexter 6 months ago

  • % Done changed from 0 to 90

I think I have fixed the problem now. The following TTCN3 test triggers the problem:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24173 MGCP_Test: test LOOPBACK with implicit destination addr

I have now dropped the OSMO_ASSERT() but I do not understand why the OSMO_ASSERT() is even there since it defeats the purpose of the code. When the call agent does not specify the destination address in loopback mode then the sa_family is of course not initialized and different from the from-address. So it's indeed correct to remove the OSMO_ASSERT().

I also noticed that there is a problem with writing the sa_family. I do not understand this fully, but I think it is better to copy the address as a whole anyway. Since the event happens only once and is a bit unusual, I think it's a good idea to add a log statement.

See also:
https://gerrit.osmocom.org/c/osmo-mgw/+/24174 mgcp_network: fix implicit address loopback
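The approach described in this comment (no assert, copy the address as a whole, log once) could look roughly like the sketch below. It is not the merged patch: `struct loop_conn` and `learn_remote_once()` are hypothetical, simplified names, and the real osmo-mgw code works on its own connection and osmo_sockaddr structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical, simplified connection state; the real osmo-mgw
 * structures differ. */
struct loop_conn {
    struct sockaddr_storage remote; /* AF_UNSPEC until learned */
};

/* Learn the remote address from the first incoming packet.
 * Returns true only when the address was learned by this call. */
static bool learn_remote_once(struct loop_conn *conn,
                              const struct sockaddr_storage *from)
{
    /* Internal invariant only; nothing here depends on peer input. */
    assert(conn != NULL && from != NULL);

    /* Already set, either by SDP or by an earlier packet. */
    if (conn->remote.ss_family != AF_UNSPEC)
        return false;

    /* Copy the whole address in one assignment rather than writing
     * family, port and IP member by member. */
    conn->remote = *from;

    /* The event happens once per connection, so logging it is cheap. */
    fprintf(stderr, "loopback: learned remote address (family %d)\n",
            (int)from->ss_family);
    return true;
}
```

Copying the structure in one assignment avoids the situation where the family has been updated but the remaining members still hold stale or zero values.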

#14 Updated by dexter 5 months ago

The patch for osmo-mgw is merged but TC_one_crcx_loopback_rtp_implicit is still failing. This needs to be checked.

#15 Updated by dexter 5 months ago

It turned out that the problem with TC_one_crcx_loopback_rtp_implicit was IPv6 related. The MGW returns an IPv6 address when no local address is sent with the first CRCX. I have changed TC_one_crcx_loopback_rtp_implicit so that it now expects IPv6 instead of IPv4.

See also: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24250

#16 Updated by dexter 5 months ago

  • Status changed from In Progress to Resolved
  • Assignee changed from dexter to roh
  • % Done changed from 90 to 100

The problems with the OSMO_ASSERT are resolved and the TTCN3 tests pass, so I think this can be closed.

(assigning this back to roh, so he can have a look himself and retest if he thinks this is necessary)
