Bug #6074

closed

Current master as of 2023-06-25 broken in my environment

Added by falconia 10 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 06/27/2023
Due date:
% Done: 0%
Spec Reference:

Description

Let me preface this report by saying that it is certainly not up to the quality standard for proper bug reports, and as a developer I am expected to do better. However, in my defense I have to point out that I have only one working BTS, and when I reached the limit of how much downtime I could allow on my production network, I had to abort the investigation and go back to a stable build.

On 2023-06-25 (Themyscira time zone, GMT-08:00 all year round, no DST) I made an attempt to update my production network (specifically the software that runs on my Slackware Linux server, namely OsmoHLR, OsmoSTP, OsmoMSC, OsmoBSC and one OsmoMGW for both MSC and BSC) from the now-elderly 2021-11 release to what was current master as of this attempt two days ago. (The osmo-bts-sysmo process running on the sysmoBTS box is already up to recent master, keeping up with changes I've been submitting and getting mainlined in OsmoBTS, but the other processes running on the Slackware server are a different story.) Once I got the new software version up and running, I saw this behavior: the two test MS I had on hand at the time registered successfully (good LU), USSD also worked as expected, but voice calls were completely broken, both in my production config with external MNCC and even when I switched to internal MNCC for a test. I never tested SMS.

My first test call was made in the production-style setup with ThemWi external MNCC, using the themwi-test-mtc command-line program, which makes a single-leg (MT only, no MO leg) test call to a single connected MS. Result: the test phone started ringing and then stopped a fraction of a second later; the output from OsmoMSC on the MNCC socket was MNCC_CALL_CONF_IND, then MNCC_RTP_CREATE, and then immediately MNCC_REL_IND. I looked in syslog (that's how I get logs from OsmoCNI components); the excerpt corresponding to this test call is attached as log-frag1.txt. The logging verbosity levels were unchanged from my previously stable production setup running the 2021-11 release. Lines 9, 10, 12 and 13 of the log-frag1.txt attachment drew my attention.

For the next experiment I took themwi-system-sw out of the equation by switching to internal MNCC (thereby reducing the setup to pure Osmocom, without any ThemWi components) and dialing a test call from one MS to another. This time the destination phone never rang at all, and the calling phone indicated call failure right away. Seeing the same errors in syslog as in the previous case of ThemWi external MNCC, I started looking closer at OsmoMGW and MGCP. I enabled higher logging verbosity in OsmoMGW, and I ran tcpdump on udp port 2427 (MGCP). log-frag2.txt and mgcp-debug.pcap attachments correspond to this test run. Looking at the pcap in particular, something definitely looks amiss, beginning with the second captured packet where the SDP response from MGW throws the two codecs together into a single invalid rtpmap line - but I don't know enough about this protocol to tell if the bug is on OsmoMSC or OsmoMGW side.

After the above experiment I reached the limit of how long I could keep my production network down for debug chases, and I switched to 2023-02 stable release for production use for the time being. I accomplished one goal of updating from the now-elderly 2021-11 release, but I didn't get all the way to current master. In my current setup osmo-hlr and osmo-msc have some local patches, published as branch falconia/production in the respective repos, but if you look at those local patches, you will see that they are very minor. All other components and libraries are stock 2023-02 release.

As for debugging the issue with current master, I have to pause until I can acquire a second working BTS: I need to set up a test network separate from the Themyscira production network, and that requires another BTS.


Files

log-frag1.txt (1.78 KB), falconia, 06/27/2023 02:11 PM
log-frag2.txt (26.4 KB), falconia, 06/27/2023 02:11 PM
mgcp-debug.pcap (3.16 KB), falconia, 06/27/2023 02:11 PM

Related issues

Related to OsmoMSC - Bug #6080: ERROR ptmap contains illegal mapping: codec=4294967295 (Resolved, 06/29/2023)

Related to OsmoMGW - Bug #6081: osmo-mgw fails to parse the semicolon separator in MGCP header like "L: a: GSM-EFR;GSM" (Resolved, neels, 06/30/2023)

#1

Updated by osmith 10 months ago

AFAICT this relates to the codecs improvements that were added to OsmoMSC in 2023-03. With patches around https://gerrit.osmocom.org/c/osmo-msc/+/30126, it seems OsmoMSC should now be able to send multiple codecs in the CRCX towards the MGW. From the pcap, packet 1 (MGCP CRCX):

Codecs (a): GSM-EFR;GSM

falconia wrote:

Looking at the pcap in particular, something definitely looks amiss, beginning with the second captured packet where the SDP response from MGW throws the two codecs together into a single invalid rtpmap line - but I don't know enough about this protocol to tell if the bug is on OsmoMSC or OsmoMGW side.

Packet 2 (MGCP OK):

Media Attribute (a): rtpmap:96 GSM-EFR;GSM

Looks like OsmoMGW isn't parsing the CRCX correctly and generates this invalid OK.
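For comparison, SDP (RFC 4566) carries one a=rtpmap attribute per payload type, so a response offering both codecs would normally list both payload types on the m= line and contain two separate rtpmap lines, rather than "96 GSM-EFR;GSM". A minimal Python sketch of that shape follows; the payload type numbers are illustrative assumptions (3 is the static RTP payload type for GSM per RFC 3551, 110 is an arbitrary dynamic value for GSM-EFR) and are not necessarily what OsmoMGW assigns. The port number is arbitrary as well.

```python
def rtpmap_lines(codecs):
    """Return one SDP a=rtpmap attribute per (payload type, encoding name) pair.

    Both GSM and GSM-EFR use an 8000 Hz RTP clock rate.
    """
    return [f"a=rtpmap:{pt} {name}/8000" for pt, name in codecs]


def media_line(port, codecs):
    """Return an SDP m= line listing every offered payload type, in order."""
    pts = " ".join(str(pt) for pt, _ in codecs)
    return f"m=audio {port} RTP/AVP {pts}"


# Illustrative payload types only; a real MGW picks its own values.
offer = [(110, "GSM-EFR"), (3, "GSM")]
print(media_line(4002, offer))
for line in rtpmap_lines(offer):
    print(line)
```

Each codec gets its own attribute, so no semicolon ever appears inside an rtpmap line.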

neels, dexter: can you take a look? Is the support for multiple codecs incomplete in OsmoMGW?

From log-frag2.txt:

Jun 25 19:15:22 sentinel OsmoMSC: <0024> mgcp_client.c:130 ptmap contains illegal mapping: codec=4294967295 maps to pt=96

It seems OsmoMSC passes "GSM-EFR;GSM" to map_str_to_codec() in libosmo-mgcp-client (from osmo-mgw.git) and doesn't check its return code, which leads to the error message above:

https://gerrit.osmocom.org/c/osmo-mgw/+/33527
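The number 4294967295 in that log line is 2^32 - 1, i.e. the bit pattern of -1 read back as an unsigned 32-bit value. That is consistent with an unchecked "unknown codec" result of -1 ending up in the ptmap and later being printed as unsigned. The Python sketch below only reproduces that arithmetic; lookup_codec() and its table are hypothetical stand-ins, not the real map_str_to_codec() from libosmo-mgcp-client.

```python
import ctypes

# Hypothetical name-to-codec table; the values are made up for illustration.
HYPOTHETICAL_CODECS = {"GSM": 3, "GSM-EFR": 110}


def lookup_codec(name):
    """Hypothetical lookup that returns -1 for an unknown codec name."""
    return HYPOTHETICAL_CODECS.get(name, -1)


codec = lookup_codec("GSM-EFR;GSM")  # the joined string matches no entry, so -1
# Reinterpret the int as a C uint32_t, as an unsigned field printed with %u would show it.
as_unsigned = ctypes.c_uint32(codec).value
print(f"codec={as_unsigned} maps to pt=96")
```

Checking the lookup result before storing it is what keeps such a value out of the ptmap.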

#2

Updated by laforge 10 months ago

  • Assignee set to osmith

I do recall seeing plenty of those odd "codec=4294967295 maps to pt=96" messages in the logs during our exhibit/demo at the HAM radio 2023 conference over the weekend. This didn't prevent the (FR) calls from working between multiple MS in the same osmo-* network (using internal MNCC).

#3

Updated by laforge 10 months ago

  • Status changed from New to In Progress
  • Assignee changed from osmith to dexter

#4

Updated by neels 10 months ago

Hi,

The kind of prose quoted below is not a good way to report a bug.
Please stick to the facts relevant to the bug and to reproduction recipes.
It's good to give context, but either keep it short or put it at the end.

The rationale here is to allow a reader to quickly understand the issue.
It is super annoying to have to filter out noise when you are pressed for time.

Deliberate noise is not only rude; in the issue tracker in particular, it is unacceptable.

Thanks for being helpful in that regard!

falconia wrote:

Let me preface this report by saying that it is certainly not up to the quality standard for proper bug reports, and as a developer I am expected to do better. However, in my defense I have to point out that I have only one working BTS, and when I reached the limit of how much downtime I could allow on my production network, I had to abort the investigation and go back to a stable build.

#5

Updated by neels 10 months ago

Ah, what a coincidence: I just reported #6080 about that 'ptmap contains illegal mapping: codec=4294967295 maps to pt=96'.
Oh, and I missed that we had already merged a fix for it.

I am also currently busy trying to get a voice call to work with the current master versions,
on 3G hardware other than what I used when implementing the codec patches.
I'm battling with the modems, but should have some feedback on that soon...

#6

Updated by neels 10 months ago

  • Related to Bug #6080: ERROR ptmap contains illegal mapping: codec=4294967295 added

#7

Updated by neels 10 months ago

I can reproduce the problem:

20230629220457698 DLMGCP ERROR Failed to parse SDP parameter, can't parse codec in rtpmap: "a=rtpmap:96 AMR;AMR-WB" (mgcp_client.c:397)
20230629220457698 DLMGCP ERROR MGCP_CONN(RTP_TO_CN)[0x559e944dde10]{ST_CRCX_RESP}: MGW/CRCX: Cannot parse CRCX response (mgcp_client_fsm.c:291)
20230629220457698 DLMGCP ERROR MGW(mgw) Empty endpoint name, can not generate MGCP message (mgcp_client.c:1444)                                

Basically, libosmo-mgcp-client breaks down when there is more than one codec in the MGCP header, like "AMR;AMR-WB" (in the header, not in the SDP part). The codecs in the header are pretty insignificant, because we rely on the SDP part to set up the codecs.

A workaround for you, falconia, could be to configure osmo-msc.cfg and/or osmo-bsc.cfg to exclude one of the two codecs you are seeing. In your pcap I see "a: GSM-EFR;GSM". If you forbid GSM-EFR (aka FR2), chances are that the codec filter leaves only GSM, so libosmo-mgcp-client can parse the string.

The problem is that our osmo-mgw code simply cannot parse multiple codecs in the "L: a:xx" header, as specified in RFC 2705:

   * The preferred type of compression algorithm, encoded as the
     keyword "a", followed by a colon and a character string. If the
     Call Agent specifies a list of values, these values will be
     separated by a semicolon.

It expects a single string and treats "GSM-EFR;GSM" or in my case "AMR;AMR-WB" as a single codec that it does not know.

I'm still figuring out whether we should

  • not list more than one codec there at all, because what would be the use of that?
    All the important codec bits are done via SDP.
  • and/or have libosmo-mgcp-client ignore anything past the first codec,
  • or have libosmo-mgcp-client parse them correctly, put them in the ptmap, and then let SDP overwrite that.
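The difference between treating the "L: a:" value as one codec name and splitting it on semicolons, as the RFC 2705 text quoted earlier describes, can be sketched in a few lines of Python. KNOWN_CODECS is a hypothetical subset built from the names mentioned in this ticket, not osmo-mgw's codec table; first_codec() corresponds to the "ignore anything past the first codec" option, and parse_codec_list() to parsing them all.

```python
# Hypothetical subset of codec names, taken from the examples in this ticket.
KNOWN_CODECS = {"GSM", "GSM-EFR", "AMR", "AMR-WB"}


def parse_codec_list(value):
    """Split an MGCP 'L: a:' value on ';' and drop surrounding whitespace."""
    return [name.strip() for name in value.split(";") if name.strip()]


def first_codec(value):
    """Keep only the first codec of the list, ignoring the rest."""
    names = parse_codec_list(value)
    return names[0] if names else None


header_value = "GSM-EFR;GSM"
# Treated as a single name, the whole string is unknown:
print(header_value in KNOWN_CODECS)
# Split on ';', each entry is a known name:
print([(name, name in KNOWN_CODECS) for name in parse_codec_list(header_value)])
print(first_codec("AMR;AMR-WB"))
```

With the list split up, the parsed names could then seed the ptmap and be overridden by the SDP part, as in the last option above.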

#8

Updated by neels 10 months ago

It's a bit more intricate than I thought, and it's an osmo-mgw bug.
I filed #6081 for it.

#9

Updated by neels 10 months ago

  • Related to Bug #6081: osmo-mgw fails to parse the semicolon separator in MGCP header like "L: a: GSM-EFR;GSM" added

#10

Updated by falconia 10 months ago

neels wrote in #note-7:

A workaround for you, falconia, could be to configure osmo-msc.cfg and/or osmo-bsc.cfg to exclude one of the two codecs you are seeing. In your pcap I see "a: GSM-EFR;GSM". If you forbid GSM-EFR (aka FR2), chances are that the codec filter leaves only GSM, so libosmo-mgcp-client can parse the string.

This workaround could potentially work for a functionally limited test network (currently a moot point, as I have yet to acquire the necessary second BTS), but I won't be able to migrate the ThemWi prod network to a post-SDP-merge OsmoCNI version until all bugs are properly fixed, and I won't discover those bugs until I get that second BTS, build that separate test network, and do a ton of testing. On the prod network, supporting both FRv1 and EFR is an absolute must: phones that support EFR need to be connected in EFR for better voice quality, but super-old phones that support only FRv1 must also be supported by the network.

I see that you got the actual bug isolated in #6081 and already got a fix in review, so it will likely be fixed before I acquire the needed hw to proceed further on my end.

#11

Updated by neels 10 months ago

falconia wrote:

I see that you got the actual bug isolated in #6081 and already got a fix in review, so it will likely be fixed before I acquire the needed hw to proceed further on my end.

Agreed =)

#12

Updated by dexter 9 months ago

  • Assignee changed from dexter to neels

#13

Updated by neels 9 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from neels to falconia

I assume this issue is resolved?

#14

Updated by falconia 9 months ago

  • Status changed from Feedback to Resolved

I won't be able to retest current master until I set up a separate test network, which is probably still some 2-3 months away. I got some extra BTS hardware, but it will be easier for me to set up this test network after I move my current production CNI instance to a different server, which is one of my work items in addition to SMSC. But I am OK with closing the present ticket: once I can test current master again, if I still have issues, I'll open another ticket.
