Bug #4467

bad voice quality in current osmo-bts-trx master

Added by laforge 14 days ago. Updated 12 days ago.

Status: In Progress
Priority: Urgent
Assignee:
Category: osmo-bts-trx
Target version: -
Start date: 03/20/2020
Due date:
% Done: 50%
Spec Reference:

Description

As has been confirmed by pespin and myself, voice quality is very problematic in current osmo-trx + osmo-bts-master. It works within 1m of the SDR (USRP B2xx in this case), but as soon as the MS are moved further away, considerable audible codec artefacts are observed.

  • the radio link should not be that bad in the first place. RxLev UL/DL are reported better than -70dBm
  • even if there are problems on the radio link, we would expect the bad frames to be dropped and the ECU to smooth over the drop-outs. We would expect interrupted voice, or periods of quiet, but not beepy chirpy crazy codec artefacts.

What's strange is the bad RxQual in uplink, typically between 4..7.

BTS 0, TRX 0, Timeslot 1, Lchan 0: Type TCH_F
  Connection: 1, State: ESTABLISHED
  BS Power: 3 dBm, MS Power: 16 dBm
  Channel Mode / Codec: SPEECH_V1
  No Subscriber
  Bound IP: 127.0.0.1 Port 16412 RTP_TYPE2=0 CONN_ID=0
  Conn. IP: 192.168.103.248 Port 4180 RTP_TYPE=3 SPEECH_MODE=0x00
  Measurement Report:
    Flags:  
    MS Timing Offset: 0
    L1 MS Power: 0 dBm, Timing Advance: 0
    RXL-FULL-dl:  -71 dBm, RXL-SUB-dl:  -72 dBm RXQ-FULL-dl: 0, RXQ-SUB-dl: 0
    RXL-FULL-ul:  -72 dBm, RXL-SUB-ul:  -72 dBm RXQ-FULL-ul: 5, RXQ-SUB-ul: 4
BTS 0, TRX 0, Timeslot 2, Lchan 0: Type TCH_F
  Connection: 1, State: ESTABLISHED
  BS Power: 3 dBm, MS Power: 16 dBm
  Channel Mode / Codec: SPEECH_V1
  Subscriber:
    IMSI: 901700000025130
    TMSI: 0x2100bc7e
    Use count: 1
  Bound IP: 127.0.0.1 Port 16414 RTP_TYPE2=0 CONN_ID=0
  Conn. IP: 192.168.103.248 Port 4188 RTP_TYPE=3 SPEECH_MODE=0x00
  Measurement Report:
    Flags: DLinval 
    RXL-FULL-ul:  -74 dBm, RXL-SUB-ul:  -75 dBm RXQ-FULL-ul: 7, RXQ-SUB-ul: 7

So my working theory is that there's something broken in our decoder/receiver that causes lots of BER (and hence bad RxQual), and then there's a second bug which causes the ECU not to work as expected.
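For readers less familiar with RxQual: it is a 3-bit quantization of the estimated bit error rate, so values of 4..7 mean a BER of roughly 1.6% and upwards. The following is a minimal sketch of that mapping, using the BER ranges tabulated in 3GPP TS 45.008. The function name is made up for illustration, and this is not the code osmo-bts uses to compute RxQual.

```python
import bisect

# Upper BER bounds in percent for RXQUAL 0..6 (3GPP TS 45.008).
# A BER at or above the last bound maps to RXQUAL 7.
RXQUAL_UPPER_BOUNDS = [0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]


def ber_to_rxqual(ber_percent: float) -> int:
    """Map a bit error rate (in percent) to an RXQUAL value 0..7 (illustrative helper)."""
    # bisect_right counts how many bounds are <= ber_percent,
    # which is exactly the index of the RXQUAL band.
    return bisect.bisect_right(RXQUAL_UPPER_BOUNDS, ber_percent)


if __name__ == "__main__":
    for ber in (0.0, 2.63, 5.0, 20.0):
        print(f"BER {ber:5.2f}% -> RXQUAL {ber_to_rxqual(ber)}")
```

Under this mapping a block with 12 bit errors out of 456 (about 2.63%) already lands in RXQUAL 4.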


Related issues

Related to OsmoBTS - Bug #4466: " N(S) sequence error" when operating osmo-bts-trx (Rejected, 03/20/2020)

Related to OsmoBTS - Bug #4465: Incorrect number of SUB measurements detected (New, 03/20/2020)

Related to libosmo-abis - Bug #4464: "osmo_rtp_socket_poll(): ERROR!" messages during normal osmo-bts usage (New, 03/20/2020)

Related to OsmoTRX - Bug #4468: "RSSI offset" default of 0 is not useful (In Progress, 03/21/2020)

History

#1 Updated by laforge 14 days ago

  • Related to Bug #4466: " N(S) sequence error" when operating osmo-bts-trx added

#2 Updated by laforge 14 days ago

  • Related to Bug #4465: Incorrect number of SUB measurements detected added

#3 Updated by laforge 14 days ago

  • Related to Bug #4464: "osmo_rtp_socket_poll(): ERROR!" messages during normal osmo-bts usage added

#4 Updated by ipse 13 days ago

I can confirm that we've been observing this issue for some time now.

#5 Updated by laforge 13 days ago

I tried with osmo-trx(master) and osmo-bts(1.2.0/latest), and the problem is the same. So either it's osmo-trx' fault, or osmo-bts had the problem already before.

#6 Updated by laforge 13 days ago

ipse wrote:

I can confirm that we've been observing this issue for some time now.

What did you do about it? What is the last version you have not observed it with? Any related bug reports?

#7 Updated by laforge 13 days ago

  • % Done changed from 0 to 10

osmo-trx(master) and osmo-bts(1.1.0) seem much better with otherwise exactly identical test situation.

I can still get to audible codec artefacts, but only if I move away much further from the BTS.

#8 Updated by laforge 13 days ago

I bisected the range 1.1.0 ... 1.2.0 and ended up with:

git bisect start
# bad: [ee8f4b0a91dff21c2d7d8361afb6b41637fcdc29] Bump version: 1.1.0.95-24e7-dirty → 1.2.0
git bisect bad ee8f4b0a91dff21c2d7d8361afb6b41637fcdc29
# good: [ca8aa071271eeef2011afa4764df72b811aa61f3] Bump version: 1.0.1 → 1.1.0
git bisect good ca8aa071271eeef2011afa4764df72b811aa61f3
# good: [b378fccef11b41009f224ded1f42bdbf254eee6d] Fix common misspellings and typos
git bisect good b378fccef11b41009f224ded1f42bdbf254eee6d
# good: [c693067b7e99a643da673cb3e2a36162cbd0f59c] Introduce BTS feature BTS_FEAT_MS_PWR_CTRL_DSP
git bisect good c693067b7e99a643da673cb3e2a36162cbd0f59c
# bad: [e3a45309198a44322572dc136bcf4a3e6ed99523] bts-trx: Drop low layer MS Power Control Loop algo
git bisect bad e3a45309198a44322572dc136bcf4a3e6ed99523
# good: [595eb576fc370d54f1a137616b16a5748389a427] osmo-bts-trx/trx_if.c: fix: NOPE.ind also contains C/I field
git bisect good 595eb576fc370d54f1a137616b16a5748389a427
# good: [a070e863e21443aabbc8a39f87538d4ac7cdaad7] pcuif_proto.h: extend RACH.ind with TRX and timeslot number fields
git bisect good a070e863e21443aabbc8a39f87538d4ac7cdaad7
# good: [0d8cd8ce39557f1aeb8e4174cfc01194573fdb92] scheduler_trx.c: cast ptrdiff value to fix printf format
git bisect good 0d8cd8ce39557f1aeb8e4174cfc01194573fdb92
# first bad commit: [e3a45309198a44322572dc136bcf4a3e6ed99523] bts-trx: Drop low layer MS Power Control Loop algo

this would be:

e3a45309198a44322572dc136bcf4a3e6ed99523 is the first bad commit
commit e3a45309198a44322572dc136bcf4a3e6ed99523
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date:   Thu Nov 14 16:58:10 2019 +0100

    bts-trx: Drop low layer MS Power Control Loop algo

    Let's drop it instead of having code duplication from common code in a
    lower layer, and maintain only the one in l1sap for all BTS models.
    As a result, osmo-bts-trx loses feature BTS_FEAT_MS_PWR_CTRL_DSP and
    will only be able to use "ms-power-control osmo" in VTY, which will be
    enabled by default (meaning: change of behavior, now MS Power Control is
    enabled by default in osmo-bts-trx and can only by disabled by BSC).
    Old bts-trx specific VTY command "(no) osmotrx ms-power-loop" is marked
    as deprecated but still working for more usual case (1 TRX configured)
    to avoid breaking backward compatibility.

    TA low level loop is still kept in loops.c and will be moved to l1sap at
    some point too.

    Related: OS#1851
    Change-Id: I0d8b0c981d9ead91d93999df6e45fb06e426aeb9

:040000 040000 7d496033f0f68997d0cf5030bd48665b05de3a73 2a1d7b4e07981631c56284dc3f5e49d1516b61e6 M      include
:040000 040000 954f8f14451c3ae79c7200668e486d68db9645e0 0566c009862c071dcd78edbc51c44da194882e96 M      src

#9 Updated by laforge 13 days ago

With the broken version (after the above-mentioned commit), TEMS shows TxPower starting at ~14 and then going down to 0.
In the BTS side logs I can see:

<0004> l1_if.c:601 959837/723/21/17/41 RX UL measurement for (bts=0,trx=0,ts=1,ss=0) fn=959837 chan_nr=0x09 MS pwr=2dBm rssi=-74.0 dBFS ber=2.63% (12/456 bits) L1_ta=0 rqd_ta=0 toa256=99
<0004> measurement.c:348 959837/723/21/17/41 (bts=0,trx=0,ts=1,ss=0) adding measurement (is_sub=0), num_ul_meas=25, fn_mod=21
<0004> l1_if.c:601 959837/723/21/17/41 RX UL measurement for (bts=0,trx=0,ts=2,ss=0) fn=959837 chan_nr=0x0a MS pwr=6dBm rssi=-68.0 dBFS ber=0.00% (0/378 bits) L1_ta=0 rqd_ta=0 toa256=51
<0004> measurement.c:348 959837/723/21/17/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=29, fn_mod=21

last working version (one commit earlier) shows that TxPower on the TEMS phone stays at 16:

<0004> l1_if.c:601 1020608/769/04/47/36 RX UL measurement for (bts=0,trx=0,ts=1,ss=0) fn=1020608 chan_nr=0x09 MS pwr=16dBm rssi=-58.0 dBFS ber=0.00% (0/378 bits) L1_ta=0 rqd_ta=0 toa256=9
<0004> measurement.c:348 1020608/769/04/47/36 (bts=0,trx=0,ts=1,ss=0) adding measurement (is_sub=1), num_ul_meas=14, fn_mod=56
<0004> l1_if.c:601 1020608/769/04/47/36 RX UL measurement for (bts=0,trx=0,ts=2,ss=0) fn=1020608 chan_nr=0x0a MS pwr=16dBm rssi=-64.0 dBFS ber=0.00% (0/378 bits) L1_ta=0 rqd_ta=0 toa256=96
<0004> measurement.c:348 1020608/769/04/47/36 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=11, fn_mod=56

#10 Updated by laforge 13 days ago

  • Assignee changed from sysmocom to laforge
  • % Done changed from 10 to 30
So, it looks like this:
  • before above-mentioned commit, we have
    • osmo-bts-trx sets BTS_FEAT_MS_PWR_CTRL_DSP
    • common/bts.c:bts_trx_init() does not enable the upper-layer power control (trx->ms_pwr_ctl_soft remains false)
    • plink->u.osmotrx.trx_ms_power_loop is initialized to false in osmo-bts-trx/main.c
    • hence, neither of the two power control loops is active, and the MS stays at its initial power level of '16'
  • after the above-mentioned commit, we have
    • osmo-bts-trx no longer sets BTS_FEAT_MS_PWR_CTRL_DSP
    • common/bts.c:bts_trx_init() does enable the upper-layer power control loop (trx->ms_pwr_ctl_soft = true)
    • hence, the common power control loop is active and the MS transmit power is reduced down to 0/2/4 in order to reach the -75 dBm target

On a "broken" new version, if one sets uplink-power-target -50 via the VTY, the same quality as in previous versions can be achieved.
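To illustrate why the target matters, here is a heavily simplified sketch of a single step of an uplink power control loop. It is not the algorithm implemented in osmo-bts l1sap; the function name, the 0..16 dBm clamp range (taken from the MS power values seen in this ticket) and the one-shot correction are assumptions made for illustration.

```python
def next_ms_power_dbm(current_dbm, rx_level, target=-75.0,
                      min_dbm=0, max_dbm=16):
    """One illustrative power control step.

    current_dbm: MS transmit power currently ordered (dBm)
    rx_level:    uplink level as reported by the receiver (dBFS on an
                 uncalibrated SDR, but treated by the loop as dBm)
    target:      desired uplink receive level
    """
    # How far the received level is from the target.
    diff = target - rx_level
    # Apply the whole difference at once, then clamp to the allowed range.
    wanted = current_dbm + diff
    return max(min_dbm, min(max_dbm, wanted))


if __name__ == "__main__":
    # MS at 16 dBm, received at -58 (as in the "last working version" log).
    print(next_ms_power_dbm(16, -58.0, target=-75.0))  # loop backs the MS off
    print(next_ms_power_dbm(16, -58.0, target=-50.0))  # loop keeps full power
```

With a target of -75 the loop in this sketch drives the MS down to the minimum, while a target of -50 leaves it at 16 dBm, which is consistent with the behaviour described above.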

So what do we learn or re-iterate from this:
  • uncalibrated SDR hardware doesn't provide absolute dBm values, but "dB full-scale". So if we receive a signal at "-70 dBFS", we have no clue about its actual receive level, unless there's some underlying magic in the driver or osmo-trx which would contain calibration tables to convert dBFS into dBm. This is one of my main complaints about GPSDR hardware based BTSs, and any vendor going down that route would have to implement calibration tables and a factory calibration procedure.
  • we desperately need automatic voice quality testing over less-than-ideal radio channels (e.g. osmo-gsm-tester setup already has quite a bit of attenuation exactly for that purpose)

Unrelated to the above, I still think we do have a separate problem related to codec artefacts on bad channels. Changing the uplink target by ~25 dB just makes the same problem appear at much larger distances. If there is lost audio, it should just be gaps/silence, and not funny tones.

#11 Updated by laforge 13 days ago

  • Related to Bug #4468: "RSSI offset" default of 0 is not useful added

#12 Updated by laforge 13 days ago

Rather than changing the uplink power control loop target in osmo-bts-trx, we should probably simply use osmo-trx rssi-offset vty command. Its default should probably be a conservative expected value for the given SDR used. The default then would no longer be 0, but a radio-specific (or possibly radio + band specific) constant.

Anyone wanting to have proper performance out of GPSDR devices would have to implement rx power calibration tables at exactly that point. I created OS#4468 for tracking that.
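As a rough sketch of what a radio-specific (or radio + band specific) default could look like: a lookup table of offsets applied to the dBFS reading before it is handed to the upper layers. The table keys, the offset values and the sign convention below are placeholders invented for illustration; they are not measured values for any real device and not the osmo-trx implementation.

```python
# Placeholder offsets in dB, keyed by (radio, band).
# These numbers are NOT calibration data for any real hardware.
DEFAULT_RSSI_OFFSET_DB = {
    ("example-sdr", "GSM900"): -25.0,
    ("example-sdr", "DCS1800"): -28.0,
}


def rssi_dbm(rssi_dbfs, radio, band):
    """Convert a dBFS reading into an estimated dBm value (illustrative helper).

    Falls back to an offset of 0 (today's default) for unknown radios.
    """
    offset = DEFAULT_RSSI_OFFSET_DB.get((radio, band), 0.0)
    return rssi_dbfs + offset


if __name__ == "__main__":
    print(rssi_dbm(-58.0, "example-sdr", "GSM900"))   # offset applied
    print(rssi_dbm(-58.0, "unknown-radio", "GSM900"))  # falls back to 0
```

A proper calibration table would replace the single constant per radio with values depending on frequency and gain setting.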

#13 Updated by Hoernchen 13 days ago

Can't we just lazily compare spectrograms of the voice data to find major deviations from expected patterns?

#14 Updated by ipse 13 days ago

laforge wrote:

ipse wrote:

I can confirm that we've been observing this issue for some time now.

What did you do about it? What is the last version you have not observed it with? Any related bug reports?

MS power control was always working well for us - we implemented RSSI offset in osmo-trx a long time ago to make it work. So while we had complaints from users once in a while about high-pitch chirping in bad coverage situations, we failed to reproduce it in a lab reliably, couldn't find the source of this and didn't file a bug.

Our users then stopped reporting this and we stopped looking into this. We've got new users reporting the same issue with chirping noise at bad coverage again in our new installation just a few days ago and we haven't had time to look into this yet.

So I should say that the chirping noise issue exists for at least 2-3 years. Maybe it even always was there but we didn't notice it.

#15 Updated by laforge 12 days ago

On Sat, Mar 21, 2020 at 08:25:04PM +0000, Hoernchen [REDMINE] wrote:

Can't we just lazily compare spectrograms of the voice data to find major deviations from expected patterns?

The problem has many different parts to it. Comparing the actual voice content is unlikely
to be the problem - you can presumably simply use the PESQ reference code.

Number one problem is to get to the PCM audio data of cellular modems or phones. See
https://laforge.gnumonks.org/blog/20170902-cellular_modems-voice/

We have built a design and assembled a board with four modems and an XMOS chip. I
spent some time in prototyping a PCM audio slave, but only as an early PoC in the
xmos simulator, never found the time to actually implement the firmware for that.

#16 Updated by laforge 12 days ago

  • % Done changed from 30 to 50
Ok, so there are multiple steps to solve this:
  1. have a proper uplink target by using rssi_offset matching your device (#4468)
  2. make use of the NOPE.ind from TRX to make sure even if no burst was decoded, we drive the TDMA decoder etc (#4661)
  3. properly use the Error Concealment Unit by setting the bfi_flag (bad frame indication) not only if the convolutional decoder returns < 0 (CRC8 error of class 1 bits), but also if the bit-error rate reaches a certain threshold.

Only the last part should be tracked here in this ticket.

I did some simplistic experiments and it seems like there's
  • no significant improvement in drop-out voice quality if we set bfi_flag at > 50% bit errors
  • a significant improvement in drop-out voice quality if we set bfi_flag at > 20% bit errors
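The decision described in step 3 can be sketched as follows. This is only an illustration of the idea, not the osmo-bts-trx scheduler code: the function name and arguments are hypothetical, and the 20% default simply mirrors the threshold from the experiment above.

```python
def bad_frame_indication(decoder_rc, n_errors, n_bits, ber_threshold=0.20):
    """Decide whether a speech frame should be flagged as bad for the ECU.

    decoder_rc:    return code of the channel decoder, < 0 on CRC failure
    n_errors:      estimated number of bit errors in the block
    n_bits:        total number of coded bits in the block
    ber_threshold: BER (as a fraction) above which the frame is flagged
    """
    # Existing behaviour: a CRC error on the class 1 bits marks the frame bad.
    if decoder_rc < 0:
        return True
    # Additional check: the CRC passed, but the bit error rate is too high
    # to trust the frame contents.
    return n_bits > 0 and (n_errors / n_bits) > ber_threshold


if __name__ == "__main__":
    print(bad_frame_indication(-1, 0, 456))     # CRC failure
    print(bad_frame_indication(0, 12, 456))     # about 2.6% BER, frame kept
    print(bad_frame_indication(0, 100, 456))    # about 21.9% BER, frame flagged
```

The threshold is kept as a parameter because the experiments above suggest the result is quite sensitive to it.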

#17 Updated by ipse 12 days ago

laforge Do you assume here that the issue with chirping is in the uplink?

I may be wrong but I think it somehow might be on the downlink, e.g. if there is no voice data to transmit or something like that. I have only a vague recollection of this but I think we heard the chirping when we called a MOH (music on hold), i.e. when we knew that the RTP stream coming from the BSC is good.

But again - it's been a while and I may be wrong, just sharing here in case it helps.

#18 Updated by laforge 12 days ago

On Sun, Mar 22, 2020 at 12:18:18PM +0000, wrote:

laforge Do you assume here that the issue with chirping is in the uplink?

the issue observed here so far is clearly in uplink.

I may be wrong but I think it somehow might be on the downlink, e.g. if there is no voice data to transmit or something like that. I have only a vague recollection of this but I think we heard the chirping when we called a MOH (music on hold), i.e. when we knew that the RTP stream coming from the BSC is good.

that might be another issue then.

#19 Updated by laforge 12 days ago

See also https://gerrit.osmocom.org/c/osmo-bts/+/17565 for a slightly TCH/H related fix

#20 Updated by ipse 12 days ago

laforge wrote:

On Sun, Mar 22, 2020 at 12:18:18PM +0000, wrote:

laforge Do you assume here that the issue with chirping is in the uplink?

the issue observed here so far is clearly in uplink.

I might not completely understand your test setup but it looks like you're making a phone-to-phone call and observe the chirping? In this case, could it be that the uplink issues cause e.g. RTP loss which then causes the chirping. Or have you tried decoding RTP stream in the uplink and the chirping is present there as well?

#21 Updated by keith 12 days ago

If there is lost audio, it should just be gaps/silence, and not funny tones.

I can't say I've heard anything like "funny tones", except for with AMR codec in OA mode getting a BE stream, but then that's a permanent thing, no possibility to hear audio. Nothing to do with this anyway.

I usually am working with one soft SIP phone/ATA and one GSM. I wonder if you did this, would you find that one of your phones is responsible for the chirping sounds (decoding weirdness in the phone?) Assuming they are not the same model. Or try other models? That might explain @ipse's reports from the field that could not be reproduced?

I also feel I have to query something in the original description of this ticket, it's bugging me because I really feel like I am misunderstanding something;

It works within 1m of the SDR, but as soon as the MS are moved further away, considerable audible codec.....

In terms of SDR in the lab, I have only ever used Ettus N210 and LimeSDR-mini, but I have never seen anything like this. Even at the minimum possible MS TX power, I can't imagine that moving one meter away would be an RF problem, even without any SMA antenna attached and no rx-gain, the WBX daughter board in my Ettus N210 would receive more than sufficient signal from the MS at 1m.

I started writing the above yesterday, then I was checking my understanding of GSM power levels and I think, IIUC, the minimum power level in GSM900 is 5 dBm, which is more than in GSM1800, right? That might explain what I observe.

Also, what ipse says makes sense to me - to capture and decode the uplink RTP?

#22 Updated by laforge 12 days ago

On Sun, Mar 22, 2020 at 01:26:50PM +0000, wrote:

I might not completely understand your test setup but it looks like you're making a phone-to-phone call and observe the chirping? In this case, could it be that the uplink issues cause e.g. RTP loss which then causes the chirping. Or have you tried decoding RTP stream in the uplink and the chirping is present there as well?

it is mobile-to-mobile, but one MS is close to the BTS and the other one is further apart. This
way I can see the uplink deteriorating on the "far" MS.

#23 Updated by laforge 12 days ago

Hi Keith,

On Sun, Mar 22, 2020 at 06:29:36PM +0000, keith [REDMINE] wrote:

I usually am working with one soft SIP phone/ATA and one GSM. I wonder if you did this, would you find that one of your phones is responsible for the chirping sounds (decoding weirdness in the phone?) Assuming they are not the same model. Or try other models? That might explain @ipse's reports from the field that could not be reproduced?

I tried a FP2, Nokia 3310 and Sony/Ericsson K800i - they all behaved the same, as far as I can hear.

I also feel I have to query something in the original description of this ticket, it's bugging me because I really feel like I am misunderstanding something;

It works within 1m of the SDR, but as soon as the MS are moved further away, considerable audible codec.....

In terms of SDR in the lab, I have only ever used Ettus N210 and LimeSDR-mini, but I have never seen anything like this. Even at the minimum possible MS TX power, I can't imagine that moving one meter away would be an RF problem, even without any SMA antenna attached and no rx-gain, the WBX daughter board in my Ettus N210 would receive more than sufficient signal from the MS at 1m.

This is due to the fact that

phy 0
instance 0
osmotrx rx-gain 1

was used in the configuration to artificially get into "poor coverage"
without having to insert external attenuators and/or move the MS further
away to trigger the problem.

In the end, it doesn't matter where you put attenuation. At some point
the signal will be weak, and we get
  • lost bursts on the radio interface
  • bursts with failing CRC8 on the class-1 bits of FR
  • bursts with passing CRC8 on the class-1 bits but still high BER

And in all of those situations, the audible artefacts of osmo-bts-trx
are much worse / different than on any of the other BTS models using a
proprietary PHY. This is (next to all the other tickets that have spun
off of this) what I would want to figure out here in this ticket.

I've done some experiments on

  • enabling / disabling the ECU
  • setting the BFI flag on ECU input based on BER, not only CRC errors
  • suppressing the transmission of RTP frames in uplink if CRC fails
    (which is what osmo-bts-sysmo is actually doing).

With BER-based-BFI I've had some success, but not much.

There may also be a difference on the downlink side in case we have a
RTP underflow (no RTP received but GSM TCH/F block must be encoded).

What osmo-bts-trx is doing here is to transmit bursts full of zeroes,
while it's undocumented what the various proprietary PHYs of
osmo-bts-{sysmo,lc15,octphy,...} are doing in that situation.
