Project

General

Profile

Bug #3428

Too many contiguous elapsed fn, dropping...

Added by fixeria 3 months ago. Updated 2 months ago.

Status:
Stalled
Priority:
High
Assignee:
Category:
osmo-bts-trx
Target version:
Start date:
07/28/2018
Due date:
% Done:

40%

Spec Reference:

Description

The following output can be observed with trxcon and FakeTRX:

Too many contiguous elapsed fn, dropping 80375
<0000> ../../../src/common/rsl.c:741 (bts=0,trx=0,ts=1,pchan=SDCCH8) (ss=0) SDCCH Tx CHAN ACT ACK
Too many contiguous elapsed fn, dropping 79538
Too many contiguous elapsed fn, dropping 16
<0011> lapd_core.c:920 Store content res. (dl=0x7f9793b90d68)
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
Too many contiguous elapsed fn, dropping 48
Too many contiguous elapsed fn, dropping 29
Too many contiguous elapsed fn, dropping 16
<0000> ../../../src/common/rsl.c:720 (bts=0,trx=0,ts=1,pchan=SDCCH8) (ss=0) SDCCH Tx CHAN REL ACK

It can be also observed with OsmoTRX, but the most of such messages
in this case caused by false-positive detection of bursts (i.e. noise).


Related issues

Related to OsmoBTS - Bug #2700: Odd RTP behavior in case of bad / missing speech framesIn Progress2017-12-02

Related to Cellular Network Infrastructure - Bug #3370: osmo-bts "substituting all-zero burst" and osmo-pcu "Link quality (0dB) left window [5, 8], modifying uplink CS levels: CS-2 -> CS-1"New2018-06-29

History

#1 Updated by fixeria 3 months ago

  • Status changed from New to In Progress
  • Assignee set to fixeria
  • % Done changed from 0 to 10

With a bit more detailed logging it's clean that the counting of lost
bursts is implemented in a wrong way:

<0012> input/ipa.c:131 127.0.0.1:3003 connection done
<0012> input/ipaccess.c:704 received ID get from 1801/0/0
Too many (2427 >= 10) contiguous elapsed bursts, last fn=0, dropping fn=2427
<0000> common/rsl.c:741 (bts=0,trx=0,ts=1,pchan=SDCCH8) (ss=0) SDCCH Tx CHAN ACT ACK
Too many (2495 >= 10) contiguous elapsed bursts, last fn=0, dropping fn=2495
Too many (16 >= 10) contiguous elapsed bursts, last fn=2498, dropping fn=2514
<0011> lapd_core.c:920 Store content res. (dl=0x7f779e72fd68)
Too many (48 >= 10) contiguous elapsed bursts, last fn=2517, dropping fn=2565
Too many (29 >= 10) contiguous elapsed bursts, last fn=2568, dropping fn=2597
Too many (16 >= 10) contiguous elapsed bursts, last fn=2600, dropping fn=2616
Too many (48 >= 10) contiguous elapsed bursts, last fn=2619, dropping fn=2667
Too many (29 >= 10) contiguous elapsed bursts, last fn=2670, dropping fn=2699
Too many (16 >= 10) contiguous elapsed bursts, last fn=2702, dropping fn=2718
Too many (48 >= 10) contiguous elapsed bursts, last fn=2721, dropping fn=2769
Too many (29 >= 10) contiguous elapsed bursts, last fn=2772, dropping fn=2801
Too many (16 >= 10) contiguous elapsed bursts, last fn=2804, dropping fn=2820
Too many (48 >= 10) contiguous elapsed bursts, last fn=2823, dropping fn=2871
Too many (29 >= 10) contiguous elapsed bursts, last fn=2874, dropping fn=2903
Too many (16 >= 10) contiguous elapsed bursts, last fn=2906, dropping fn=2922
Too many (48 >= 10) contiguous elapsed bursts, last fn=2925, dropping fn=2973
Too many (29 >= 10) contiguous elapsed bursts, last fn=2976, dropping fn=3005
Too many (16 >= 10) contiguous elapsed bursts, last fn=3008, dropping fn=3024
Too many (48 >= 10) contiguous elapsed bursts, last fn=3027, dropping fn=3075
Too many (29 >= 10) contiguous elapsed bursts, last fn=3078, dropping fn=3107
Too many (16 >= 10) contiguous elapsed bursts, last fn=3110, dropping fn=3126
Too many (48 >= 10) contiguous elapsed bursts, last fn=3129, dropping fn=3177
Too many (29 >= 10) contiguous elapsed bursts, last fn=3180, dropping fn=3209
Too many (16 >= 10) contiguous elapsed bursts, last fn=3212, dropping fn=3228
Too many (48 >= 10) contiguous elapsed bursts, last fn=3231, dropping fn=3279
Too many (29 >= 10) contiguous elapsed bursts, last fn=3282, dropping fn=3311
Too many (16 >= 10) contiguous elapsed bursts, last fn=3314, dropping fn=3330
Too many (48 >= 10) contiguous elapsed bursts, last fn=3333, dropping fn=3381
<0000> ../../../src/common/rsl.c:720 (bts=0,trx=0,ts=1,pchan=SDCCH8) (ss=0) SDCCH Tx CHAN REL ACK

#2 Updated by fixeria 3 months ago

  • % Done changed from 10 to 20

Let's have a look at the following line:

Too many (29 >= 10) contiguous elapsed frames, last fn=2568, dropping fn=2597

OsmoBTS thinks that 29 frames elapsed since the last_fn=2568.
What does it mean? Let's have a look at the multiframe layout:

static const struct trx_frame frame_sdcch8[102] = {
    /* dl_chan        dl_bid    ul_chan        ul_bid */
    { TRXC_SDCCH8_0,    0,    TRXC_SACCH8_5,    0 },
    { TRXC_SDCCH8_0,    1,    TRXC_SACCH8_5,    1 },
    { TRXC_SDCCH8_0,    2,    TRXC_SACCH8_5,    2 },
    { TRXC_SDCCH8_0,    3,    TRXC_SACCH8_5,    3 },
    { TRXC_SDCCH8_1,    0,    TRXC_SACCH8_6,    0 },
    { TRXC_SDCCH8_1,    1,    TRXC_SACCH8_6,    1 },
    { TRXC_SDCCH8_1,    2,    TRXC_SACCH8_6,    2 },
    { TRXC_SDCCH8_1,    3,    TRXC_SACCH8_6,    3 },
    { TRXC_SDCCH8_2,    0,    TRXC_SACCH8_7,    0 },
    { TRXC_SDCCH8_2,    1,    TRXC_SACCH8_7,    1 },
    { TRXC_SDCCH8_2,    2,    TRXC_SACCH8_7,    2 },
    { TRXC_SDCCH8_2,    3,    TRXC_SACCH8_7,    3 },
    { TRXC_SDCCH8_3,    0,    TRXC_IDLE,    0 },
    { TRXC_SDCCH8_3,    1,    TRXC_IDLE,    0 },
    { TRXC_SDCCH8_3,    2,    TRXC_IDLE,    0 },

    /**
     * last_fn=2568 % mf_period=102 == 18,
     * so here are the previous 4 bursts
     * SDCCH/8 (ss=0)
     */
    { TRXC_SDCCH8_3,    3,    TRXC_SDCCH8_0,    0 },
    { TRXC_SDCCH8_4,    0,    TRXC_SDCCH8_0,    1 },
    { TRXC_SDCCH8_4,    1,    TRXC_SDCCH8_0,    2 },
    { TRXC_SDCCH8_4,    2,    TRXC_SDCCH8_0,    3 }, // <--- last_fn=2568 (18)

    /* SDCCH/8 (ss=1), another subscriber */
    { TRXC_SDCCH8_4,    3,    TRXC_SDCCH8_1,    0 },
    { TRXC_SDCCH8_5,    0,    TRXC_SDCCH8_1,    1 },
    { TRXC_SDCCH8_5,    1,    TRXC_SDCCH8_1,    2 },
    { TRXC_SDCCH8_5,    2,    TRXC_SDCCH8_1,    3 },

    /* SDCCH/8 (ss=2), another subscriber */
    { TRXC_SDCCH8_5,    3,    TRXC_SDCCH8_2,    0 },
    { TRXC_SDCCH8_6,    0,    TRXC_SDCCH8_2,    1 },
    { TRXC_SDCCH8_6,    1,    TRXC_SDCCH8_2,    2 },
    { TRXC_SDCCH8_6,    2,    TRXC_SDCCH8_2,    3 },

    /* SDCCH/8 (ss=3), another subscriber */
    { TRXC_SDCCH8_6,    3,    TRXC_SDCCH8_3,    0 },
    { TRXC_SDCCH8_7,    0,    TRXC_SDCCH8_3,    1 },
    { TRXC_SDCCH8_7,    1,    TRXC_SDCCH8_3,    2 },
    { TRXC_SDCCH8_7,    2,    TRXC_SDCCH8_3,    3 },

    /* SDCCH/8 (ss=4), another subscriber */
    { TRXC_SDCCH8_7,    3,    TRXC_SDCCH8_4,    0 },
    { TRXC_SACCH8_0,    0,    TRXC_SDCCH8_4,    1 },
    { TRXC_SACCH8_0,    1,    TRXC_SDCCH8_4,    2 },
    { TRXC_SACCH8_0,    2,    TRXC_SDCCH8_4,    3 },

    /* SDCCH/8 (ss=5), another subscriber */
    { TRXC_SACCH8_0,    3,    TRXC_SDCCH8_5,    0 },
    { TRXC_SACCH8_1,    0,    TRXC_SDCCH8_5,    1 },
    { TRXC_SACCH8_1,    1,    TRXC_SDCCH8_5,    2 },
    { TRXC_SACCH8_1,    2,    TRXC_SDCCH8_5,    3 },

    /* SDCCH/8 (ss=6), another subscriber */
    { TRXC_SACCH8_1,    3,    TRXC_SDCCH8_6,    0 },
    { TRXC_SACCH8_2,    0,    TRXC_SDCCH8_6,    1 },
    { TRXC_SACCH8_2,    1,    TRXC_SDCCH8_6,    2 },
    { TRXC_SACCH8_2,    2,    TRXC_SDCCH8_6,    3 },

    /* SDCCH/8 (ss=7), another subscriber */
    { TRXC_SACCH8_2,    3,    TRXC_SDCCH8_7,    0 },
    { TRXC_SACCH8_3,    0,    TRXC_SDCCH8_7,    1 },
    { TRXC_SACCH8_3,    1,    TRXC_SDCCH8_7,    2 },
    { TRXC_SACCH8_3,    2,    TRXC_SDCCH8_7,    3 },

    /**
     * current_fn=2597 % mf_period=102 == 47
     * 
     * so here are the current 4 bursts
     * SACCH/8 (ss=0), here MS sends Measurement Reports
     *
     * current_fn=2597 - last_fn=2568 == elapsed_fn=29
     */
    { TRXC_SACCH8_3,    3,    TRXC_SACCH8_0,    0 }, // <--- current_fn=2597 (47)
    { TRXC_IDLE,        0,    TRXC_SACCH8_0,    1 },
    { TRXC_IDLE,        0,    TRXC_SACCH8_0,    2 },
    { TRXC_IDLE,        0,    TRXC_SACCH8_0,    3 },

    { TRXC_SDCCH8_0,    0,    TRXC_SACCH8_1,    0 },
    { TRXC_SDCCH8_0,    1,    TRXC_SACCH8_1,    1 },
    { TRXC_SDCCH8_0,    2,    TRXC_SACCH8_1,    2 },
    { TRXC_SDCCH8_0,    3,    TRXC_SACCH8_1,    3 },
    { TRXC_SDCCH8_1,    0,    TRXC_SACCH8_2,    0 },
    { TRXC_SDCCH8_1,    1,    TRXC_SACCH8_2,    1 },
    { TRXC_SDCCH8_1,    2,    TRXC_SACCH8_2,    2 },
    { TRXC_SDCCH8_1,    3,    TRXC_SACCH8_2,    3 },
    { TRXC_SDCCH8_2,    0,    TRXC_SACCH8_3,    0 },
    { TRXC_SDCCH8_2,    1,    TRXC_SACCH8_3,    1 },
    { TRXC_SDCCH8_2,    2,    TRXC_SACCH8_3,    2 },
    { TRXC_SDCCH8_2,    3,    TRXC_SACCH8_3,    3 },

    /* ... */
};

Actually "29 elapsed frames" is an expected behaviour, because
they belong to other subscribers (ss != 0)! If all sub-slots are
allocated and actively used by subscribers, this warning wouldn't
appear, because the 'l1ts->mf_last_fn' variable would be updated
on each received frame. Excluding IDLE frames, which anyway would
cause this warning...

So, I think this is not a good idea to track lost bursts/frames
using this 'l1ts->mf_last_fn', which is global for all logical
channels on the current time-slot.

I suggest to change the logic, and track lost bursts/frames
individually for each allocated logical channel.

Do you guys agree?

#3 Updated by fixeria 3 months ago

  • % Done changed from 20 to 40

Just introduced some helpful commands for path loss simulation in FakeTRX.
In particular, now it's possible to ask FakeTRX to drop some amount of UL/DL bursts at runtime.

Please see: https://gerrit.osmocom.org/10305/

#4 Updated by fixeria 3 months ago

Also, I have some draft implementation (still cleaning up..) of per logical channel
TDMA frame loss tracking for OsmoBTS. I am going to write some TTCN-3 test cases
and upload everything to Gerrit...

P.S. I am finally going to drop this hackish while (42) { ... } ;)

#5 Updated by fixeria 3 months ago

  • Status changed from In Progress to Feedback
  • Target version set to osmo-bts-trx refresh
  • % Done changed from 40 to 80

#6 Updated by fixeria 3 months ago

It was noticed that TDMA frame loss detection becomes crazy on PDTCH logical channels.
Most likely, because they are always active (when OsmoPCU is connected), and since I've
increased the detection period, multiple false-positive detections (noise) from OsmoTRX
do trigger the logic more often than on other channels.

A possible solution is not to track the frame loss on PDTCH channels...
Any ideas?

#7 Updated by laforge 2 months ago

On Sat, Aug 04, 2018 at 03:49:57AM +0000, fixeria [REDMINE] wrote:

It was noticed that TDMA frame loss detection becomes crazy on PDTCH logical channels.
Most likely, because they are always active (when OsmoPCU is connected), and since I've
increased the detection period, multiple false-positive detections (noise) from OsmoTRX
do trigger the logic more often than on other channels.

A possible solution is not to track the frame loss on PDTCH channels...
Any ideas?

I think there's two good solutions:

a) make sure we send information for every block on every active timeslot, whether osmo-trx
has received something, or not. I think optimizing this by dropping blocks in case nothing
was received is a false optimization anyway. The CPU of the BTS must be able to handle all
logical channels active at the same time anyway. And not sending the blocks creates an
ambiguity: Were the blocks lost on the UDP/IP/socket_buffer level, or were they simply
below the detection threshold in osmo-trx? The first would be an important problem to resolve,
the second is normal operation.

b) remove the 'lost block' detection in osmo-bts-trx. But then, we don't know if that was due
to loss on the RF path, or whether itw as due to local UDP/IP/socket issues between osmo-trx
and omso-bts-trx

#8 Updated by fixeria 2 months ago

Hi Harald,

I think there's two good solutions:

to be honest, I am not sure both are 100% good solutions. Let's discuss?

a) make sure we send information for every block on every active
timeslot, whether osmo-trx has received something, or not

We can go this way, but I think it would complicate the logic of OsmoTRX.
At least, we need to make OsmoTRX distinguish between logical channels,
as sending such indications on RACH and PD{T|C}CH doesn't make sense.
I am not familiar with DTX, but this may also be affected...

The CPU of the BTS must be able to handle all logical channels active
at the same time anyway. And not sending the blocks creates an ambiguity:
were the blocks lost on the UDP/IP/socket_buffer level, or were they simply
below the detection threshold in osmo-trx?

I think sending such "dummy" bursts would only increase the
chances that some meaningful bursts will be lost, especially
on embedded systems. Furthermore, in case of PD{T|C}CH channels,
which are always active, so every 4th burst will trigger
a decoding attempt...

If a burst or even a few bursts are lost, they are lost. No matter,
due to CPU load, UDP/IP packet loss, or RSSI/ToA treshold,
we are unable to recover them anyway...

And finally, this approach assumes the compatibility problems.

b) remove the 'lost block' detection in osmo-bts-trx.
But then, we don't know if that was due to loss on the RF path,
or whether itw as due to local UDP/IP/socket issues
between osmo-trx and omso-bts-trx

Yep, and additionally if one or even a few blocs (e.g. 4 bursts
for xCCH) are lost, we wouldn't even reflect this in the Uplink
measurements...

At the moment, I would prefer not to modify the logic of OsmoTRX,
and keep the burst loss detection in OsmoBTS. Also, instead
of sending zero-filled bursts to compensate a gap in OsmoBTS,
I would modify the measurements logic to account the amount
of received vs lost bursts (I was WIP on this)...

#9 Updated by laforge 2 months ago

On Sun, Aug 05, 2018 at 05:12:58PM +0000, fixeria [REDMINE] wrote:

The CPU of the BTS must be able to handle all logical channels active
at the same time anyway. And not sending the blocks creates an ambiguity:
were the blocks lost on the UDP/IP/socket_buffer level, or were they simply
below the detection threshold in osmo-trx?

I think sending such "dummy" bursts would only increase the
chances that some meaningful bursts will be lost, especially
on embedded systems.

As indicated before, if your BTS CPU is not able to decode all blocks/bursts,
you are running a broken setup anyway. If we sent all bursts from osmo-trx
to osmo-bts-trx, then it would be visible immediataly at start-up. By not
sending all bursts to osmo-bts, you are only hiding/delaying the visibility
of having insufficient CPU until you're actually under load.

In fact, at the moment it's rather hard (without a "GSM air interface load generator")
to establish whether your BTS CPU is sufficient for a fully loaded BTS or not.

The only argument in favor of skipping/supressing bursts is to save
power (and heat dissipation), which may really be a valid point in case
of solar powerd systems, where you can use any excess energy to charge
batteries for night time operation.

Furthermore, in case of PD{T|C}CH channels,
which are always active, so every 4th burst will trigger
a decoding attempt...

You could still skip the attempt to decode (viterbi) in osmo-bts-trx, if the
receive level is below a certain thershold. What does osmo-trx
currently use to determine whether or not to send a burst?

If a burst or even a few bursts are lost, they are lost. No matter,
due to CPU load, UDP/IP packet loss, or RSSI/ToA treshold,
we are unable to recover them anyway...

yes. But, as indicated, one type of loss is normal, the other
(TCP/IP/socket) is a clear indication of a problem with your
hardware/software setup, which should never ever happen during
production.

And finally, this approach assumes the compatibility problems.

ACK.

b) remove the 'lost block' detection in osmo-bts-trx.
But then, we don't know if that was due to loss on the RF path,
or whether itw as due to local UDP/IP/socket issues
between osmo-trx and omso-bts-trx

Yep, and additionally if one or even a few blocs (e.g. 4 bursts
for xCCH) are lost, we wouldn't even reflect this in the Uplink
measurements...

this is definitely an even worse bug. Uplink measurements must be
correctly computed, whether or not burst data was receivd on all
frames of a given logical chanenl, or not.

At the moment, I would prefer not to modify the logic of OsmoTRX,
and keep the burst loss detection in OsmoBTS. Also, instead
of sending zero-filled bursts to compensate a gap in OsmoBTS,
I would modify the measurements logic to account the amount
of received vs lost bursts (I was WIP on this)...

Fine with me. All non-trx osmo-bts backends receive some kind of
message from the PHY every frame number, which is actually rather nice
in order to drive all the various logic bits...

#10 Updated by fixeria 2 months ago

  • Status changed from Feedback to Stalled
  • % Done changed from 80 to 40

I think we can merge the current change, as it helps to avoid the original problem of this issue.

As was noted at #osmocom:

There is a problem of the current TDMA frame loss tracking approach: we calculate the amount
of lost/processed bursts only when something is received on Uplink, so the algorithm is being
driven by 'unstable source'. As a result, Uplink measurements may not reflect the actual values...

So, it really makes sense to send some kind of indications from TRX that nothing was detected.

At the moment, I have no enough time, so setting to 'Stalled', but will resume ASAP.

#11 Updated by fixeria 2 months ago

What does osmo-trx currently use to determine whether or not to send a burst?

There are several threads, which share a queue of detected bursts in OsmoTRX.
I had a quick look at the source code, and I think it should be easy to
implement the indications we are talking about, please see:

https://git.osmocom.org/osmo-trx/tree/Transceiver52M/Transceiver.cpp#n922

#12 Updated by laforge 2 months ago

On Mon, Aug 06, 2018 at 12:50:19PM +0000, fixeria [REDMINE] wrote:

There are several threads, which share a queue of detected bursts in OsmoTRX.
I had a quick look at the source code, and I think it should be easy to
implement the indications we are talking about, please see:

https://git.osmocom.org/osmo-trx/tree/Transceiver52M/Transceiver.cpp#n922

FYI, AFAICT there are the following cases where pullRadioBurst returns NULL:

  • nothing can be pulled out of the FIFO (really bad)
  • timeslot is off (cannot legally happen if there's an active lchan, as
    it is switched on/off by osmo-bts-trx)
  • energy low (energyDetect() <= -1.0)
  • type == IDLE
  • unable to detect NORMAL or RACH burst (including clipping)

#13 Updated by pespin 28 days ago

  • Related to Bug #2700: Odd RTP behavior in case of bad / missing speech frames added

#14 Updated by fixeria 25 days ago

  • Related to Bug #3370: osmo-bts "substituting all-zero burst" and osmo-pcu "Link quality (0dB) left window [5, 8], modifying uplink CS levels: CS-2 -> CS-1" added

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)