Bug #5123
coredump nightly mgw on 3g voicecall startup
Status: closed
% Done: 100%
Description
-nightly dumped core on me trying to start a voicecall:
Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:CB4F498E Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:CB4F498E CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:CB4F498E In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2 0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3 osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4 0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5 0x0804ed44 in rx_rtp (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6 rtp_data_net (fd=0x810aa80, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7 0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8 _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9 0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) quit
/etc/osmocom/osmo-mgw.cfg
mgcp
 bind ip 10.23.24.1
 rtp port-range 4002 16000
 rtp bind-ip 10.23.24.1
 rtp ip-probing
 rtp ip-tos 184
 bind port 2427
 sdp audio payload number 98
 sdp audio payload name GSM
 number endpoints 31
 loop 0
 force-realloc 1
 rtcp-omit
 rtp-patch ssrc
 rtp-patch timestamp
osmo-mgw 1.8.1+gitr0+9ffaba7c1b-r2.18.0.24
Files
Related issues
Updated by laforge almost 3 years ago
- Related to Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data added
Updated by laforge almost 3 years ago
- Assignee set to dexter
- Priority changed from Normal to High
In general, no matter what happens at a remote implementation that sends packets to us, we must never OSMO_ASSERT(). This is a serious problem. OSMO_ASSERT() is to guard against conditions entirely under control of our implementation (mgw in this case).
Any remote user, even a malicious one, must always be able to send us anything without us running into OSMO_ASSERT(). If a remote user can trigger this, it's a denial of service vulnerability.
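A minimal sketch of that distinction in plain C, not taken from osmo-mgw: `struct peer` and `rx_check_family()` are hypothetical names used only for illustration. Conditions that only our own code controls are asserted; a condition that depends on what arrives from the network is reported to the caller, which can then log and drop the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for per-connection state; not the osmo-mgw struct. */
struct peer {
	struct sockaddr_storage remote; /* ss_family stays AF_UNSPEC until configured */
};

/* Returns true if a packet received from 'from' matches the configured
 * address family, false if the caller should drop it. */
bool rx_check_family(const struct peer *p, const struct sockaddr_storage *from)
{
	/* Internal invariant: our own callers never pass NULL. */
	assert(p != NULL && from != NULL);

	/* Remote-controlled condition: reject instead of asserting. */
	if (p->remote.ss_family != from->ss_family)
		return false;
	return true;
}
```

With this shape, a packet that arrives before the remote address is known produces a `false` return value rather than an abort of the whole process.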
Updated by laforge almost 3 years ago
The pcap file shows UDP packets from 10.23.24.192 to the MGW at 10.23.24.1 port 4002. Those are definitely IPv4 packets, so AF_INET.
Can you go to "frame 4" and then print the two values that trigger the assert (libosmo-mgcp/mgcp_network.c:1272)? For example:
Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb) frame 4
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
(gdb) p from_addr->u.sa.sa_family
Updated by roh almost 3 years ago
Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:933CE96A Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:933CE96A CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:933CE96A In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51 /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1 0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2 0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3 osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4 0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5 0x0804ed44 in rx_rtp (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6 rtp_data_net (fd=0x81274e0, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7 0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8 _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9 0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) frame 4
#4 0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
1272 /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c: No such file or directory.
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
(gdb) p from_addr->u.sa.sa_family
value has been optimized out
(gdb)
Updated by laforge almost 3 years ago
So the
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
already tells us that it's neither AF_INET (2) nor AF_INET6 (10 on Linux), but either uninitialized or AF_UNSPEC (0), while the received packet is of course AF_INET...
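When reading such raw values in a debugger, a small helper that maps them back to the symbolic constants can avoid guessing. The sketch below is illustrative only; `af_name()` is a hypothetical helper, not part of osmo-mgw or libosmocore. The numeric value of AF_INET6 differs between platforms, so comparing against the constants from `<sys/socket.h>` is safer than comparing against literals.

```c
#include <sys/socket.h>

/* Hypothetical helper: map an address family to a printable name.
 * Anything other than the three families below is reported as "other". */
const char *af_name(int family)
{
	switch (family) {
	case AF_UNSPEC:
		return "AF_UNSPEC";
	case AF_INET:
		return "AF_INET";
	case AF_INET6:
		return "AF_INET6";
	default:
		return "other";
	}
}
```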
Updated by laforge almost 3 years ago
Tentative fix in https://gerrit.osmocom.org/c/osmo-mgw/+/23812, but I don't understand enough of osmo-mgw to know whether it's the correct way to solve this. It seems more reasonable to expect that conn->u.rtp.end.addr.u.sa.sa_family is properly initialized after CRCX?
Updated by pespin almost 3 years ago
Indeed, the problem is similar to that of "A]" in SYS#5435. That is, the nano3g starts sending data to us very quickly, immediately after receiving the RAB-Assignment Request and before answering with the RAB-Assignment Response (I actually see none of those in the pcap trace I took myself...)
So, the problem is that mgw is receiving RTP traffic on the endpoint at a time when it has only gone through CRCX + CRCX ACK, which sets up the local address, but has never received an MDCX from osmo-msc (due to no Assignment Response?) to set the remote address, hence the AF_UNSPEC.
Updated by pespin almost 3 years ago
- File my_pcap.pcapng.gz added
I am also adding a pcap that I took myself while observing the issue in roh's setup.
# /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:B520FAE4 Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:B520FAE4 CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:B520FAE4 In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb763f49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7632633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb76326a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Aborted (core dumped)
# cat /etc/osmocom/osmo-mgw.cfg
!
! MGCP configuration example
!
log file /home/root/mgw.log
 logging filter all 1
 logging color 1
 logging print category-hex 1
 logging print category 0
 logging timestamp 1
 logging print file 1
 logging level set-all debug
mgcp
 bind ip 10.23.24.1
 rtp port-range 4002 16000
 rtp bind-ip 10.23.24.1
 rtp ip-probing
 rtp ip-tos 184
 bind port 2427
 sdp audio payload number 98
 sdp audio payload name GSM
 number endpoints 512
 loop 0
 force-realloc 1
 rtcp-omit
 rtp-patch ssrc
 rtp-patch timestamp
Updated by pespin almost 3 years ago
The address field (addr) whose family triggers the assert is set in the following code path:
mgcp_parse_sdp_data:
	case 'c':
		if (audio_ip_from_sdp(&rtp->addr, line) < 0)
That is, when osmo-msc/bsc sends CRCX or MDCX with SDP and "c" option set.
In the pcap trace causing the crash, it can be seen that only 1 CRCX is sent before receiving the RTP packet which triggers the assert, and this CRCX contains no "c" option.
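To illustrate why a missing "c=" line leaves the remote address family at 0, here is a simplified sketch of an SDP connection-line scan. It is not the osmo-mgw parser: `sdp_find_ipv4_conn()` is a hypothetical helper, it handles only IPv4, and the SDP bodies one would feed it are made-up examples rather than data from the pcap.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: scan an SDP body for "c=IN IP4 <addr>" and fill 'out'.
 * Returns 0 if an address was found, -1 otherwise ('out' is left untouched,
 * so a zero-initialized 'out' keeps sin_family == AF_UNSPEC). */
int sdp_find_ipv4_conn(const char *sdp, struct sockaddr_in *out)
{
	const char *line = sdp;

	while (line && *line) {
		char ip[INET_ADDRSTRLEN];
		struct in_addr a;

		if (sscanf(line, "c=IN IP4 %15s", ip) == 1 &&
		    inet_pton(AF_INET, ip, &a) == 1) {
			memset(out, 0, sizeof(*out));
			out->sin_family = AF_INET;
			out->sin_addr = a;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

For a body without any "c=" line the function returns -1 and the caller's address stays as it was, which in the scenario above means a family of AF_UNSPEC at the time the first RTP packet arrives.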
I would simply drop that ASSERT since it's not useful at all and only causes problems.
It should be fairly simple to create a TTCN3 MGCP_Tests test that triggers the crash by sending a CRCX without the "c=" option to the MGW, receiving the CRCX ACK with the mgw-side RTP socket and sending an RTP packet there. With current osmo-mgw master it should then crash. Correct behavior can then be checked by sending an MDCX with "c=" after sending the first RTP packet and receiving an MDCX ACK (it wouldn't send us an ACK if it had crashed beforehand). Leaving that to dexter if he feels like adding that test.
Updated by dexter almost 3 years ago
- % Done changed from 0 to 90
I think I have fixed the problem now. The following TTCN3 test triggers the problem:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24173 MGCP_Test: test LOOPBACK with implicit destination addr
I have now dropped the OSMO_ASSERT(), but I do not understand why it was even there, since it defeats the purpose of the code. When the call agent does not specify the destination address in loopback mode, the sa_family is of course not initialized and differs from that of the from-address. So it's indeed correct to remove the OSMO_ASSERT().
I also noticed that there is a problem with writing the sa_family. I do not understand this fully, but I think it is better to copy the address as a whole anyway. Since the event happens only once and is a bit unusual, I think it's a good idea to add a log statement.
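The idea of copying the address as a whole can be sketched as follows. This is a standalone illustration under assumptions, not the merged patch: `union addr` and `adopt_remote_if_unset()` are hypothetical names, and the log line goes to stderr instead of the Osmocom logging framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical address container, loosely modelled on a sockaddr union. */
union addr {
	struct sockaddr sa;
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
};

/* If no remote address has been configured yet, take over the sender's
 * address as a whole (family, IP and port) instead of patching single
 * fields. Returns true if the address was adopted. */
bool adopt_remote_if_unset(union addr *remote, const union addr *from)
{
	assert(remote != NULL && from != NULL); /* internal invariant only */

	if (remote->sa.sa_family != AF_UNSPEC)
		return false; /* already configured, leave it untouched */

	memcpy(remote, from, sizeof(*remote));
	fprintf(stderr, "remote address not set, adopting source address\n");
	return true;
}
```

Because the family is set by the copy, a second call with the same arguments returns `false`, so the log line is emitted only once per connection.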
See also:
https://gerrit.osmocom.org/c/osmo-mgw/+/24174 mgcp_network: fix implicit address loopback
Updated by dexter almost 3 years ago
The patch for osmo-mgw is merged but TC_one_crcx_loopback_rtp_implicit is still failing. This needs to be checked.
Updated by dexter almost 3 years ago
It turned out that the problem with TC_one_crcx_loopback_rtp_implicit was IPv6 related. The MGW returns an IPv6 address when no local address is sent with the first CRCX. I have now changed TC_one_crcx_loopback_rtp_implicit so that it expects IPv6 instead of IPv4.
See also: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24250
Updated by dexter almost 3 years ago
- Status changed from In Progress to Resolved
- Assignee changed from dexter to roh
- % Done changed from 90 to 100
The problems with the OSMO_ASSERT are resolved and the TTCN3 tests pass, so I think this can be closed.
(assigning this back to roh, so he can have a look himself and retest if he thinks this is necessary)