Bug #5742

closed

Gaps in RTP stream originating from BTS cause problems for downstream MGW

Added by falconia over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 11/04/2022
Due date:
% Done: 0%
Spec Reference:

Description

With current osmo-bts code the RTP stream that originates from the BTS for the call uplink intentionally pauses (i.e., the BTS keeps track of RTP time base, but intentionally chooses to not send packets) under any of the following conditions:

  • whenever radio errors on UL prevent decoding
  • whenever TCH bursts are stolen for FACCH
  • whenever the MS exercises DTX on UL
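As a rough illustration of how a downstream receiver might tell such an intentional pause apart from packet loss, here is a minimal sketch. It assumes (this is an assumption, not something verified against osmo-bts here) that the RTP sequence number advances only for packets actually sent, while the timestamp keeps advancing by 160 units per 20 ms frame at the 8 kHz RTP clock. The function name and the result labels are hypothetical.

```python
FRAME_TS = 160  # 20 ms of audio at the 8 kHz RTP clock used for GSM FR/EFR


def classify_gap(prev, cur):
    """Classify the gap between two consecutively received RTP packets.

    prev and cur are (sequence_number, timestamp) tuples.
    Returns (label, missing_frames); the labels are hypothetical names.
    Assumption: the sender increments the sequence number only for packets
    it actually sends, but keeps its timestamp running during a pause.
    """
    seq_delta = (cur[0] - prev[0]) & 0xFFFF        # 16-bit wraparound
    ts_delta = (cur[1] - prev[1]) & 0xFFFFFFFF     # 32-bit wraparound
    missing_frames = ts_delta // FRAME_TS - 1
    if seq_delta == 1 and missing_frames == 0:
        return ("continuous", 0)
    if seq_delta == 1:
        # No sequence numbers skipped, yet time has jumped: sender was silent.
        return ("sender_pause", missing_frames)
    # Sequence numbers were skipped: at least one packet was lost in transit.
    return ("packet_loss", missing_frames)
```

For example, `classify_gap((10, 1600), (11, 2400))` returns `("sender_pause", 4)`. Note that this classification is only possible after the next packet has arrived, so it does not by itself tell the receiver when to emit a packet during the gap.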

However, this design creates a problem for downstream transcoding MGW implementations that rely on this RTP stream as their timing source. Consider a configuration in which the RTP stream originating from an IP-based BTS such as sysmoBTS passes through the two required OsmoMGW instances (one for BSC, one for MSC) and then goes to an external MGW that performs transcoding for a PSTN interface. Furthermore, suppose that the interface to PSTN is SIP+RTP, as opposed to a TDM trunk, hence the transcoding function is pure software without any TDM hardware to act as a timing pacer. (In an alternative scenario, the problem at hand would remain the same if the transcoding function were to be integrated into MSC-serving OsmoMGW itself.)

In this scenario, whenever the RTP stream from the BTS is flowing, the best course of action that the transcoding MGW can take (in terms of latency minimization) is to forward each RTP packet (with the necessary transcoding) as soon as it arrives from the BTS, forwarded by OsmoMGW instances - any other course of action by the transcoding MGW, such as resynchronizing to its own time base, would only add latency while providing no benefit. But if the RTP stream originating from the BTS suddenly stops, what then?

Now let us further suppose that the G.711 RTP stream toward PSTN is expected to be continuous, without any breaks or pauses except those caused by Internet packet loss, and that the transcoding MGW is responsible for generating comfort noise toward PSTN whenever the mobile user is silent and his/her MS is exercising uplink DTX. But if the transcoding MGW uses the RTP stream from the BTS as its timing source, how will it pace its own generated comfort noise packets if this timing source goes away when the BTS decides to pause its RTP output? Even if the MGW has its own local clock and even if that clock is very good (e.g., ntpd using a local GPS timing receiver), it will be impossible to implement switching between BTS RTP stream timing and local clock timing without causing a noticeable timing discontinuity in the RTP stream going toward PSTN, which is not acceptable.

The only solution I can think of is to make the RTP stream from the BTS strictly continuous, without any intentional gaps or pauses, such that gaps will only occur as a result of IP packet loss. Packet loss events are still unavoidable, but they are an error condition that can be expected (and assumed at the system engineering level) to be infrequent, whereas uplink DTX is a fully expected normal operation condition.
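The difference between the current behaviour and the proposed one can be sketched with a small model. This is only an illustration under stated assumptions: each 20 ms uplink slot is represented by a payload string, or by None when the BTS has no frame to send (radio errors, FACCH stealing, or DTX silence), and the "no-data" placeholder is a hypothetical stand-in for whatever marker packet the BTS would emit.

```python
FRAME_TS = 160  # RTP timestamp units per 20 ms frame at 8 kHz


def uplink_rtp(slots, continuous, first_seq=0, first_ts=0):
    """Model the RTP packets a BTS emits for a list of 20 ms uplink slots.

    slots: list of payloads, None meaning "nothing decodable in this slot".
    continuous: False models the pausing behaviour described above,
                True models the proposed strictly continuous stream.
    Returns a list of (sequence_number, timestamp, payload) tuples.
    """
    packets = []
    seq = first_seq
    for i, payload in enumerate(slots):
        ts = (first_ts + i * FRAME_TS) & 0xFFFFFFFF  # time base always advances
        if payload is None:
            if not continuous:
                continue  # pausing behaviour: slot passes, nothing is sent
            payload = "no-data"  # hypothetical placeholder packet
        packets.append((seq, ts, payload))
        seq = (seq + 1) & 0xFFFF
    return packets
```

With `slots = ["speech", None, None, "sid", None, "speech"]`, the pausing model yields three packets with timestamps 0, 480 and 800, whereas the continuous model yields six packets spaced exactly 160 timestamp units apart, which gives a downstream MGW one timing event per frame.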

Looking at the current osmo-bts code, I see two places where the RTP stream is paused intentionally, as opposed to lacking implementation:

  • In the osmo-bts-trx version, in the rx_tchf_fn() and rx_tchh_fn() functions, right after the bfi: goto label, there is a check for DTXu - if the last valid frame received from the MS was a SID, then RTP sending (ECU or BFI) is intentionally suppressed.
  • In all versions, in the l1sap_tch_ind() function in common/l1sap.c, there is a check for (tch_ind->lqual_cb >= bts->min_qual_norm) - if this condition isn't met, RTP output will be suppressed even if the lower layer provided a frame to send out, presumably from the ECU.

In the case of osmo-bts-sysmo (the version of primary interest to me given the hardware I have to work with), there is the additional complication that the code for ECU and BFI packets that already exists in the osmo-bts-trx version is missing there - but porting this code from osmo-bts-trx to osmo-bts-sysmo will be easy in comparison to the more fundamental problem of apparently conflicting intentions.

Actions #1

Updated by laforge over 1 year ago

falconia wrote:

With current osmo-bts code the RTP stream that originates from the BTS for the call uplink intentionally pauses (i.e., the BTS keeps track of RTP time base, but intentionally chooses to not send packets) under any of the following conditions:

  • whenever radio errors on UL prevent decoding
  • whenever TCH bursts are stolen for FACCH
  • whenever the MS exercises DTX on UL

However, this design creates a problem for downstream transcoding MGW implementations that rely on this RTP stream as their timing source.

Well, let me use this opportunity to point out that osmo-bts is deployed in various production cellular networks, and not a single transcoding MGW of any vendor has ever had problems with this behavior. And these days those MGWs are not TDM MGWs but pure software, with other cellular or VoIP networks on the other side.

Whatever RTP timing that is recovered from the RTP flow needs to be "held over" during RTP stream pauses. That is also what happens during any kind of packet loss, which can always happen in any case. Or what about hand-over? In that case you definitely have some gaps / interruptions in the RTP flow...

The only solution I can think of is to make the RTP stream from the BTS strictly continuous, without any intentional gaps or pauses, such that gaps will only occur as a result of IP packet loss. Packet loss events are still unavoidable, but they are an error condition that can be expected (and assumed at the system engineering level) to be infrequent, whereas uplink DTX is a fully expected normal operation condition.

I'm not certain that this is what's needed. If you're willing to contribute a clean patch, with a vty config option to enable it, we may be able to merge it.

In the case of osmo-bts-sysmo (the version of primary interest to me given the hardware I have to work with), there is the additional complication that the code for ECU and BFI packets that already exists in the osmo-bts-trx version is missing there - but porting this code from osmo-bts-trx to osmo-bts-sysmo will be easy in comparison to the more fundamental problem of apparently conflicting intentions.

The reason for the absence of the ECU/BFI code is that the DSP already does that part in the sysmoBTS, so there is no need to port it over.

Actions #2

Updated by falconia over 1 year ago

laforge wrote in #note-1:

Well, let me use this opportunity to point out that osmo-bts is deployed in various production cellular networks, and not a single transcoding MGW of any vendor has ever had problems with this behavior. And these days those MGWs are not TDM MGWs but pure software, with other cellular or VoIP networks on the other side.

OK, so maybe the people who wrote those other MGWs are way smarter than I am, or there is some other difference between my environment and theirs - perhaps their environment is such that they are allowed to let pauses in the RTP stream propagate to the other side, with no requirement for the MGW to fill them in.

Whatever RTP timing that is recovered from the RTP flow needs to be "held over" during RTP stream pauses.

And how would you actually implement this idea in practice? I am not smart enough to come up with a way that won't make the MGW overly sensitive to the slightest timing jitter on its input and won't make it introduce extra jitter of its own on the output.

Suppose RTP stream packet number N arrives at time T. Packet N+1 is expected to arrive at T+20ms. Let's say at time T, as we have received, transcoded and forwarded packet N, we start a timer of exactly 20 ms, and if that timer expires without packet N+1 having arrived, then we assume that packet N+1 was either suppressed or lost and synthesize our own error muting or comfort noise packet in its place - and if packet N+1 does arrive a little late, too bad. This approach would be horrible in terms of jitter sensitivity - even the slightest jitter in the path from the BTS to the MGW will cause the MGW to throw out perfectly good speech frames from the BTS and activate error handling in their place.

So to increase jitter tolerance, we change the packet loss-or-suppression detection timer from 20 ms to some higher value - but what happens then? If we increase the timer to 21 ms in order to tolerate up to 1 ms of jitter from the BTS, we deliberately introduce 1 ms of jitter into the MGW output stream (21 ms from transcoded packet N to locally synthesized packet N+1); if we increase the timer to 23 ms to tolerate 3 ms of input jitter, then we generate 3 ms of intentional output jitter, and so on. All bad in my opinion - instead, changing the BTS to always send out an RTP packet no matter what, be it rain or shine, is a more robust solution all around.
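The trade-off described here can be made concrete with a minimal simulation sketch. It is a simplified model, not an actual MGW implementation: the function names are hypothetical, arrival times are in milliseconds, None marks a packet that never arrives, and one extra assumption is made that the text does not spell out, namely that after a locally synthesized packet the next timer runs for the nominal 20 ms rather than the extended timeout.

```python
FRAME_MS = 20.0  # nominal spacing between speech frames


def run_timer_mgw(arrivals, timeout_ms):
    """Simulate a timer-driven MGW output for one input RTP stream.

    arrivals: arrival time in ms of each input packet, None if it never
              arrives (suppressed or lost); the first packet must arrive.
    timeout_ms: how long to wait after forwarding a packet before giving
                up on the next one and synthesizing a replacement.
    Returns a list of (send_time_ms, kind) tuples for the output stream,
    where kind is "forwarded" or "synthesized".
    """
    out = [(arrivals[0], "forwarded")]
    for arrival in arrivals[1:]:
        prev_time, prev_kind = out[-1]
        # Assumption: extended timeout only after a real packet; after a
        # synthesized packet, fall back to the nominal 20 ms spacing.
        wait = timeout_ms if prev_kind == "forwarded" else FRAME_MS
        deadline = prev_time + wait
        if arrival is not None and arrival <= deadline:
            out.append((arrival, "forwarded"))
        else:
            # Timer expired: a late packet for this slot is discarded.
            out.append((deadline, "synthesized"))
    return out


def intervals(out):
    """Spacing in ms between consecutive output packets."""
    return [round(b[0] - a[0], 3) for a, b in zip(out, out[1:])]
```

With a 20 ms timer, `run_timer_mgw([0, 20.5, 40], 20)` replaces the second packet with a synthesized one even though it was only 0.5 ms late. With a 23 ms timer that packet is forwarded, but a pause such as `run_timer_mgw([0, None, None, 60], 23)` produces output intervals of 23, 20 and 17 ms, i.e. the 3 ms of input jitter tolerance reappears as 3 ms of output jitter at the start and end of the pause.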

That is also what happens during any kind of packet loss, which can always happen in any case.

Whenever an infrequent, unintentional packet loss event happens in the path from the BTS to the MGW, I have no problem with propagating this stream disturbance to the PSTN side of my MGW, i.e., letting the stream toward PSTN experience a dropout. Rationale: packet loss events on the public Internet path from my MGW to whatever PSTN switch is on the other end of the call are far more likely than packet loss events on my internal network, thus causing the latter to behave like the former does not seem to be a problem.

Or what about hand-over? In that case you definitely have some gaps / interruptions in the RTP flow...

Whether it is packet loss or RAN handover, having a momentary disruption in the RTP stream upon these rare, infrequent events is not a problem - I am OK with having my MGW abruptly stop any in-band DTMF generation, making its comfort noise output less smooth than it would be otherwise, etc. What I do object to are sustained pauses in the RTP stream during fully expected, normal-operation conditions of uplink DTX.

I'm not certain that this is what's needed. If you're willing to contribute a clean patch, with a vty config option to enable it, we may be able to merge it.

Producing a configurable option, acting on OsmoBTS but controlled from OsmoBSC, would be beyond my current ability, at my current level of familiarity with the CNI realm. fixeria previously told me that the current behavior can be considered a bug, but given the new input from laforge, it appears that I will have to produce a non-mergeable patch for the time being, to be published by me and applied locally by whoever (if there is any other such person) finds themselves in a situation similar to mine.

The reason for the absence of the ECU/BFI code is that the DSP already does that part in the sysmoBTS, so there is no need to port it over.

Now this part is really interesting! So which osmo-bts-sysmo code path is responsible then for the intentional pauses I see in the RTP stream from the BTS during times of uplink DTX? Is it the (data_ind->msgUnitParam.u8Size < 1) condition in l1if_tch_rx() in src/osmo-bts-sysmo/tch.c? If so, would you be able to tell me under exactly what conditions the DSP sends such empty payload packets instead of its own ECU? Are these empty payload packets the DSP's idea of BFI?

Actions #3

Updated by falconia over 1 year ago

I have done some work on this issue, and I have a working solution implemented for the GSM FR codec, also expected to work the same for EFR once I port the codec implementation from ETSI into a proper library (move all global variables into a state structure, etc.). The solution I've implemented consists of a non-standard extension to the RTP payload format for FR and EFR codecs, a non-mergeable patch to osmo-bts, a new library implementing Rx DTX handler functions for GSM FR as a front-end to classic libgsm, and a massive clean-up to themwi-mgw, using the new BFI protocol and the new Rx DTX handler.

The patch to osmo-bts lives here:

https://www.freecalypso.org/hg/themwi-system-sw/file/tip/osmo-patches/osmo-bts-rtp-bfi.patch

and the accompanying explanation (rationale, other considered solutions, current limitations) lives here:

https://www.freecalypso.org/hg/themwi-system-sw/file/tip/doc/RTP-BFI-extension

The new library code (Rx DTX handler for GSM FR already there, EFR library coming soon) lives here:

https://www.freecalypso.org/hg/gsm-codec-lib/

Actions #4

Updated by falconia about 1 year ago

  • Status changed from New to Resolved

This issue is an old duplicate of #5975, which has now been implemented in osmo-bts master.
