Gaps in RTP stream originating from BTS cause problems for downstream MGW
With the current osmo-bts code, the RTP stream that originates from the BTS for the call uplink intentionally pauses (i.e., the BTS keeps track of the RTP time base, but deliberately chooses not to send packets) under any of the following conditions:
- whenever radio errors on UL prevent decoding
- whenever TCH bursts are stolen for FACCH
- whenever the MS exercises DTX on UL
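To illustrate what "keeps track of RTP time base, but chooses not to send packets" means in practice, here is a toy sketch (all names are mine, not actual osmo-bts code): the 20 ms frame clock keeps running during a pause, so the first packet after the pause carries a timestamp jump of pause_frames * 160, while the sequence number advances by only one.

```c
#include <stdint.h>

/* Hypothetical illustration, not osmo-bts code: per-call RTP TX state. */
struct rtp_tx_state {
	uint16_t seq;       /* increments only for packets actually sent */
	uint32_t timestamp; /* advances every 20 ms frame, sent or not */
};

/* Called once per 20 ms frame interval; 'send' is 0 while output is
 * paused (radio error, FACCH stealing, or uplink DTX silence). */
static int rtp_frame_tick(struct rtp_tx_state *st, int send)
{
	st->timestamp += 160;	/* time base advances unconditionally */
	if (!send)
		return 0;	/* no packet emitted during the pause */
	st->seq++;		/* seq counts only real packets */
	return 1;
}
```

A receiver thus sees consecutive sequence numbers but a timestamp gap, which is exactly the "intentional pause" signature described above.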
However, this design creates a problem for downstream transcoding MGW implementations that rely on this RTP stream as their timing source. Consider a configuration in which the RTP stream originating from an IP-based BTS such as sysmoBTS passes through the two required OsmoMGW instances (one for BSC, one for MSC) and then goes to an external MGW that performs transcoding for a PSTN interface. Furthermore, suppose that the interface to PSTN is SIP+RTP, as opposed to a TDM trunk, hence the transcoding function is pure software without any TDM hardware to act as a timing pacer. (In an alternative scenario, the problem at hand would remain the same if the transcoding function were to be integrated into MSC-serving OsmoMGW itself.)
In this scenario, whenever the RTP stream from the BTS is flowing, the best course of action the transcoding MGW can take (in terms of latency minimization) is to forward each RTP packet (with the necessary transcoding) as soon as it arrives from the BTS, relayed through the OsmoMGW instances. Any other course of action by the transcoding MGW, such as resynchronizing to its own time base, would only add latency while providing no benefit. But if the RTP stream originating from the BTS suddenly stops, what then?
Now let us further suppose that the G.711 RTP stream toward PSTN is expected to be continuous, without any breaks or pauses except those caused by Internet packet loss, and that the transcoding MGW is responsible for generating comfort noise toward PSTN whenever the mobile user is silent and his/her MS is exercising uplink DTX. But if the transcoding MGW uses the RTP stream from the BTS as its timing source, how will it pace its own generated comfort noise packets if this timing source goes away when the BTS decides to pause its RTP output? Even if the MGW has its own local clock and even if that clock is very good (e.g., ntpd using a local GPS timing receiver), it will be impossible to implement switching between BTS RTP stream timing and local clock timing without causing a noticeable timing discontinuity in the RTP stream going toward PSTN, which is not acceptable.
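To make the discontinuity concrete, here is a toy model (the numbers and function names are mine, purely for illustration): for 8 kHz G.711, RTP timestamps advance by 160 units per 20 ms frame. Pacing from received BTS packets gives a timestamp of frame_count * 160; pacing from a local wall clock gives elapsed_ms * 8. Unless the two clock bases happen to agree exactly, switching from one to the other produces a jump.

```c
#include <stdint.h>

/* Timestamp as derived from the count of frames received from the BTS. */
static uint32_t ts_from_bts(uint32_t frames_received)
{
	return frames_received * 160;	/* 160 samples per 20 ms frame */
}

/* Timestamp as derived from the MGW's own local clock. */
static uint32_t ts_from_local_clock(uint32_t elapsed_ms)
{
	return elapsed_ms * 8;		/* 8000 Hz / 1000 ms */
}
```

If, say, 100 frames have arrived from the BTS but the local clock reads 2003 ms (a mere 3 ms of skew), the two bases disagree by 24 timestamp units, and that offset appears as an audible discontinuity at the moment of switching.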
The only solution I can think of is to make the RTP stream from the BTS strictly continuous, without any intentional gaps or pauses, such that gaps will only occur as a result of IP packet loss. Packet loss events are still unavoidable, but they are an error condition that can be expected (and assumed at the system engineering level) to be infrequent, whereas uplink DTX is a fully expected normal operation condition.
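A sketch of what "strictly continuous" would mean for the BTS transmit path (names and structure are hypothetical, not a patch against osmo-bts): every 20 ms frame interval produces exactly one RTP packet, with the payload chosen among decoded speech, a SID/comfort-noise update, ECU output, or an explicit BFI marker, so the timestamp always advances by exactly 160 and gaps can only come from IP packet loss.

```c
#include <stdint.h>

/* Hypothetical payload classes for the uplink RTP stream. */
enum ul_payload {
	PAYLOAD_SPEECH,	/* good frame decoded from radio UL */
	PAYLOAD_SID,	/* comfort-noise update during uplink DTX */
	PAYLOAD_ECU,	/* error concealment unit output */
	PAYLOAD_BFI,	/* explicit bad-frame indication */
};

struct rtp_state {
	uint32_t timestamp;
};

/* Always advances the timestamp and always selects some payload;
 * there is deliberately no "send nothing" outcome. */
static enum ul_payload next_packet(struct rtp_state *st, int decode_ok,
				   int in_dtx_silence, int have_ecu)
{
	st->timestamp += 160;	/* 20 ms at 8 kHz, unconditionally */
	if (decode_ok)
		return PAYLOAD_SPEECH;
	if (in_dtx_silence)
		return PAYLOAD_SID;
	return have_ecu ? PAYLOAD_ECU : PAYLOAD_BFI;
}
```

With this structure the downstream transcoding MGW can treat every arriving packet as a 20 ms pacing tick, which is the whole point of the proposal.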
Looking at the current osmo-bts code, I see two places where the RTP stream is paused intentionally, as opposed to the sending code simply not being implemented:
- In the osmo-bts-trx version, in the rx_tchf_fn() and rx_tchh_fn() functions right after the bfi: goto label, there is a check for DTXu: if the last valid frame received from the MS was a SID, then RTP sending (ECU or BFI) is intentionally suppressed.
- In all versions, in the l1sap_tch_ind() function in common/l1sap.c, there is a check for (tch_ind->lqual_cb >= bts->min_qual_norm): if this condition isn't met, RTP output is suppressed even if the lower layer provided a frame to send out, presumably from the ECU.
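The second check can be rendered in simplified form like this (the struct and field names follow the text above, but this is a sketch, not the actual osmo-bts code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: minimal stand-ins for the structures involved. */
struct tch_ind { int16_t lqual_cb; };		/* link quality, centibels */
struct bts_cfg { int16_t min_qual_norm; };	/* configured threshold */

/* RTP output goes ahead only if measured link quality meets the
 * configured minimum; otherwise the frame from the lower layer
 * (e.g. ECU output) is dropped and the stream pauses. */
static bool tch_rtp_allowed(const struct tch_ind *tch_ind,
			    const struct bts_cfg *bts)
{
	return tch_ind->lqual_cb >= bts->min_qual_norm;
}
```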
In the case of osmo-bts-sysmo (the version of primary interest to me, given the hardware I have to work with), there is the additional complication that the ECU and BFI packet code that already exists in osmo-bts-trx is missing altogether - but porting that code from osmo-bts-trx to osmo-bts-sysmo will be easy in comparison to the more fundamental problem of apparently conflicting intentions.