Bug #2975

OsmoBTS doesn't generate measurement indications in absence of uplink bursts

Added by laforge about 3 years ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: osmo-bts-trx
Target version: -
Start date: 02/21/2018
Due date:
% Done: 100%
Spec Reference:

Description

If no uplink bursts are received from OsmoTRX (while timestamp indications are still being sent), osmo-bts-trx doesn't appear to generate RSL MEAS REP messages at every SACCH multiframe (104 frames on TCH, 102 on SDCCH) as it should. Odd.


Related issues

Related to OsmoBTS - Bug #2965: No measurement reports sent for channels other than TCH (Resolved, 02/19/2018)

Related to OsmoBTS - Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign (Resolved, 02/23/2018)

Related to OsmoBTS - Bug #2700: Odd RTP behavior in case of bad / missing speech frames (Closed, 12/02/2017)

Related to OsmoBTS - Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data (Stalled, 02/21/2018)

Related to OsmoBTS - Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report (Closed, 10/23/2018)

Blocked by OsmoBTS - Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver (Resolved, 07/28/2018)

Associated revisions

Revision b5a28bd9 (diff)
Added by dexter over 2 years ago

cosmetic: unify measurement sample handling in one function

In l1sap.c we call lchan_new_ul_meas() and lchan_meas_check_compute()
directly in sequence. Let's unify those two steps inside measurement.c so
that we only need to call one function from l1sap.c.

Change-Id: If48bc7442dfaab8c36b93949f741de6e836e792a
Related: OS#2975

Revision 9feddb7e (diff)
Added by dexter over 2 years ago

measurement: make sure state is reset on chan act.

At the moment only lchan_meas_reset is reset on channel activation.
All other states are not reset. This may lead to irritations in the
first measurement interval if there are still leftover messages from
a previous connection. Let's ensure everything is reset to zero by
zeroing out the whole .meas struct in struct lchan.

- Add a centralized function that does the reset
- Call that function from rsl_tx_chan_act_ack() in rsl.c

Change-Id: I880ae3030df6dcd60c32b7144c3430528429bdea
Related: OS#2975
Related: OS#2987

Revision 4553890d (diff)
Added by dexter over 2 years ago

measurement: make sure measurement interval end is detected

The measurement interval end is detected by using the measurement
indication that is related to the SACCH block as a trigger to start the
computation. If the measurement indication for the SACCH gets lost
because the block could not be received, then the processing is not
executed. This may cause wrong results or, when it happens consecutively,
an overflow of the measurement sample buffer.

- Store the frame number of the last received measurement indication
- Use the stored frame number to check if an interval was crossed when
the next measurement indication is received. If we detect that we
missed the interval, catch up by running the computation and
start the next interval.

Change-Id: I3a86cd8185cc6b94258373fe929f0c2f1cf27cfa
Related: OS#2975

Revision 092e4e85 (diff)
Added by dexter over 2 years ago

measurement: fix measurement interval end detection

For SDCCH4 channels, the detection is not working correctly since the
function uses the lookup table for SDCCH8 interval endings there. This
needs to be corrected. Also there are two unnecessary assignments in
the code which should be removed.

- use correct table (sdcch4_meas_rep_fn102 instead of
sdcch8_meas_rep_fn102).
- remove unnecessary assignments to last_fn_mod

Change-Id: If8a269ecd3f9fa4eeadf379114db816ef5c77d77
Related: OS#2975

Revision 02c79f12 (diff)
Added by dexter over 2 years ago

cosmetic: fix sourcecode formatting

Change-Id: Ia112af0b63478bdcf3cfab2537dc1ba08e03dfb1
Related: OS#2975

Revision 9f5203d2 (diff)
Added by dexter over 2 years ago

cosmetic: remove wrong comment

is_meas_overdue() does not use is_meas_complete() anymore.

Change-Id: I5925fad161843c06e76543d9098c598fe9e72d68
Related: OS#2975

Revision 324a3cd6 (diff)
Added by dexter over 2 years ago

measurement: fix is_meas_overdue() and increase testcoverage

The tests TC_meas_res_sign_sdcch4 and TC_meas_res_sign_sdcch8 are
failing mainly because lchan->ts->nr is confused with lchan->nr.
There is also a small problem with one of the formulas that compute
fn_missed_end.

- Add explanatory comment to the lookup tables on what the index
is referring to
- use lchan->nr instead of lchan->ts->nr when dealing with SDCCH/4/8
- simplify and fix the formula
- increase the test coverage of the unit tests, give SDCCH/4/8 also
a thorough check.

Change-Id: I5d555a21003943bf720c53f3a611029ba45339a9
Related: OS#2975

Revision 42495a15 (diff)
Added by dexter over 2 years ago

cosmetic: rename *_meas_rep_fn10* to *_meas_rep_fn10*_by_*s

The lookup tables that control the measurement interval endings do not
make clear what their indexes refer to. Let's give them more distinct
names.

rename sdcch8_meas_rep_fn102 to sdcch8_meas_rep_fn102_by_ss
rename sdcch4_meas_rep_fn102 to sdcch4_meas_rep_fn102_by_ss
rename tchf_meas_rep_fn104 to tchf_meas_rep_fn104_by_ts
rename tchh0_meas_rep_fn104 to tchh0_meas_rep_fn104_by_ts
rename tchh1_meas_rep_fn104 to tchh1_meas_rep_fn104_by_ts

Change-Id: I3dc891e1860109f803c1bfa46445e8fef35586d9
Related: OS#2975

Revision fb70a2ed (diff)
Added by dexter over 2 years ago

cosmetic: test_is_meas_overdue() does not test is_meas_complete()

The function is_meas_overdue() does not use is_meas_complete() anymore
and therefore the related log output is wrong. Let's correct this.

Change-Id: I9b7aa2f7a7c75bc3eed0c94b6ef9d17e7e36ce96
Related: OS#2975

Revision bf87717c (diff)
Added by dexter over 2 years ago

measurement: add SUB measurements in test_lchan_meas_process_measurement

The unit-test function test_lchan_meas_process_measurement() does not
tag measurements as SUB. Let's make the test function more realistic by
setting the is_sub flag at the correct positions.

- Add SUB-Measurements in the correct position
- Print log lines when adding measurements for sub, also fix
minor bugs in the log printing.

Change-Id: I25c361b21a406c0017ee586f0492c38f2e737e57
Related: OS#3502
Related: OS#2975

Revision c7875905 (diff)
Added by dexter over 2 years ago

measurement: remove missed interval end detection

The function is_meas_overdue() was introduced to allow
lchan_meas_process_measurement() to detect when the end of a measurement
interval has been missed. Interval ends may be missed when the SACCH
block of the related measurement interval gets lost. This is due to the
fact that the SACCH block is used as a trigger to start the measurement
result computation.

The idea behind is_meas_overdue() was to check the frame number of the
current measurement against the frame number of the previous measurement
in order to see if there was a measurement for SACCH in between or not.
Unfortunately SACCH and TCH Voice data is not necessarily processed in
order by each phy. Depending on the phy there may be a jitter between
the timing of SACCH and TCH Voice. Depending on the phy this jitter may
be enough to mess up the timing so that we see a SACCH block earlier
than expected. So we can not use the current frame number of TCH Voice
measurements to check for missed SACCH blocks.

Change-Id: Idfdbf64c1f965f35c12559b3995e2b746c74ee9e
Related: OS#3502
Related: OS#2975

Revision 27a86005 (diff)
Added by dexter over 2 years ago

measurement: fix unit-test test_lchan_meas_process_measurement

The unit test that tests lchan_meas_process_measurement() only inputs
test data to lchan_meas_process_measurement() but it is not checked if
the interval end could be detected or not.

- Add a return code to lchan_meas_process_measurement()
- Ensure that the return code is checked in the unit-test

Change-Id: I9e00ce683e8c44528804f65181dbfed9e85e3aed
Related: OS#2975

Revision 66c17cfc (diff)
Added by dexter over 1 year ago

rsl: ensure measurement reports are sent

osmo-bts currently does not generate a measurement report in case the
SACCH of the related traffic channel is lost. This is a problem because
at the moment when reception gets bad, measurement reporting is crucial
to carry out handover decisions effectively.

The presence of a SACCH block controls the conclusion of the measurement
interval and the sending of the RSL measurement report. The latter
not only requires a measurement indication, it also requires a fully
intact SACCH block.

Let's use the NOPE / IDLE indications from V1 of the TRXD protocol to
ensure a SACCH block is always reported up to l1sap.c. In cases where
the SACCH is bad, trigger the sending of the RSL measurement report
manually without attaching the measurement data from the MS (which we do
not have in this case).

Related: OS#2975
Depends: osmo-ttcn3-hacks Ib2f511991349ab15e02db9c5e45f0df3645835a4
Change-Id: Idfa8ef94e8cf131ff234dac8f93f337051663ae2

Revision 4e07b83a (diff)
Added by laforge about 1 year ago

trx: Use NOPE indications from OsmoTRX for TCH/F and TCH/H

Without using the NOPE indication it might happen that we get
into the following situation:
  • bursts 0,1,2 of a given block are received
  • burst 3 is lost on the radio interface, OsmoTRX sends NOPE
  • osmo-bts-trx doesn't pass the NOPE to the rx_tch*_fn()
  • we never detect the end of the block, never perform decoding
    and even if the burst could be fully decoded, we lose the block

For voice, it can lead to lost RTP frames in uplink, which is also
problematic.

Let's deal with burst_len=0 in rx_tch*_fn() and use it as nope_fn.

Closes: OS#4661
Related: OS#2975
Change-Id: I0fbf4617daf24bd8aecfd9cfe1efd66cf73a277a
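
The failure mode listed in this commit message can be modelled with a small sketch. It is a toy model in Python, not the osmo-bts-trx code: the class name `BlockAssembler`, the four-burst block and the rule "the block ends at burst id 3" are assumptions made for illustration. A burst length of 0 stands for a NOPE indication.

```python
BURSTS_PER_BLOCK = 4  # assumption: a block spans four bursts, numbered bid 0..3


class BlockAssembler:
    """Toy model of a per-channel receive handler; not the osmo-bts-trx code."""

    def __init__(self):
        self.received = [False] * BURSTS_PER_BLOCK

    def rx_burst(self, bid, burst_len):
        """Feed one burst indication; burst_len == 0 models a NOPE indication.

        Returns the number of real bursts in the block when the block ends
        (bid == 3), otherwise None.
        """
        if bid == 0:
            # A new block starts, forget the bursts of the previous one.
            self.received = [False] * BURSTS_PER_BLOCK
        self.received[bid] = burst_len > 0
        if bid != BURSTS_PER_BLOCK - 1:
            return None
        return sum(self.received)
```

If the handler is only called for bursts 0, 1 and 2, it never returns anything but None, so the block end goes unnoticed. If the NOPE for burst 3 is passed on with a length of 0, the block end is seen and decoding can at least be attempted with the three bursts that did arrive.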

Revision e2f9d0ea (diff)
Added by laforge about 1 year ago

trx: Use NOPE indications on SDCCH

Without using the NOPE indication it might happen that we get
into the following situation:
  • bursts 0,1,2 of a given block are received
  • burst 3 is lost on the radio interface, OsmoTRX sends NOPE
  • osmo-bts-trx doesn't pass the NOPE to the rx_tch*_fn()
  • we never detect the end of the block, never perform decoding
    and even if the burst could be fully decoded, we lose the block

Related: OS#4661
Related: OS#2975
Change-Id: Idfc5c9a23db808c5f87ef5646c7e1d1cd3127371

History

#1 Updated by laforge about 3 years ago

  • Related to Bug #2965: No measurement reports sent for channels other than TCH added

#2 Updated by laforge about 3 years ago

The entire measurement computation + reporting process is driven by lchan_meas_check_compute(), which is only called from the l1sap whenever a PRIM_INFO_MEAS is reported up. In absence of bursts/blocks, this primitive is not reported and subsequently no measurement reports are generated.

What we should do instead is track the frame number, and whenever the SACCH multiframe ends, we should trigger an RSL MEAS REP. The missing uplink bursts all have to count as erroneous, i.e. 100% bit errors.
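
A minimal sketch of this proposal, written in Python as a model rather than as osmo-bts code. The report is driven by the frame clock instead of by an incoming measurement primitive, and every sample that is expected but missing is counted as fully erroneous. The names `MeasWindow` and `period_end_mod` are hypothetical; the figure of 25 expected samples per period is taken from the TCH/F log excerpt in comment #15.

```python
from dataclasses import dataclass, field

SACCH_PERIOD = 104      # TDMA frames per SACCH multiframe on TCH
EXPECTED_SAMPLES = 25   # samples per period on TCH/F, as in the log of comment #15


@dataclass
class MeasWindow:
    """Collects per-block bit error ratios (0.0 .. 1.0) for one lchan."""
    period_end_mod: int                          # fn % SACCH_PERIOD at the period end (hypothetical)
    samples: list = field(default_factory=list)

    def add_sample(self, ber):
        self.samples.append(ber)

    def on_frame_tick(self, fn):
        """Call once per TDMA frame; returns the average BER at the period end, else None."""
        if fn % SACCH_PERIOD != self.period_end_mod:
            return None
        missing = max(0, EXPECTED_SAMPLES - len(self.samples))
        # Missing samples count as 100% bit errors.
        padded = self.samples[:EXPECTED_SAMPLES] + [1.0] * missing
        self.samples = []
        return sum(padded) / EXPECTED_SAMPLES
```

With this shape a period without any uplink data still ends on time and yields an average BER of 1.0, instead of producing no report at all.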

The entire dualism of PH-DATA.ind / PH-TCH.ind containing (unused) measurement data, but then having a separate PRIM_INFO_MEAS, is odd to begin with. The measurements should always accompany the PH-DATA.ind / PH-TCH.ind, and PRIM_INFO_MEAS should be abandoned.

#3 Updated by laforge about 3 years ago

  • Status changed from New to In Progress

#4 Updated by laforge about 3 years ago

#5 Updated by laforge about 3 years ago

#6 Updated by laforge about 3 years ago

  • Related to Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign added

#7 Updated by laforge almost 3 years ago

  • Assignee set to dexter

#8 Updated by dexter over 2 years ago

  • % Done changed from 0 to 50

One of the most sensitive parts here is when the SACCH block drops out, because then the measurement computation process is not triggered. As we receive measurement indications, we need to compare the frame number of the currently received one against the frame number of the previous one in order to check whether we have already crossed the boundary of a SACCH interval. I have now added a patch that does exactly that. Now a dropout of the SACCH block will no longer suppress the measurement computation.

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10492
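
The frame number comparison described in this comment can be sketched as follows. This is an illustration in Python, not the patch under review: the function name `interval_end_missed` and the parameter `end_mod` (the position of the interval end within the 104-frame multiframe) are assumptions, and which `end_mod` belongs to which timeslot is left to the caller.

```python
GSM_HYPERFRAME = 2715648   # frame numbers wrap at 26 * 51 * 2048
SACCH_PERIOD = 104         # TDMA frames per SACCH multiframe on TCH


def interval_end_missed(last_fn, fn, end_mod):
    """True if a frame with fn % SACCH_PERIOD == end_mod lies strictly
    between last_fn and fn (both taken from received measurement indications)."""
    elapsed = (fn - last_fn) % GSM_HYPERFRAME      # frames since the last indication
    to_end = (end_mod - last_fn) % SACCH_PERIOD    # frames from last_fn to the next end
    if to_end == 0:
        to_end = SACCH_PERIOD                      # last_fn itself was an interval end
    return to_end < elapsed
```

The hyperframe length is a multiple of 104, so the position within the multiframe stays continuous across the frame number wrap-around. Note that this check assumes the indications arrive in frame number order.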

However, we are not done yet. When we get a complete dropout with no measurements at all (battery died, tunnel, etc.), we have a problem. For this I would propose to use the time indication to implement a timeout. When, let's say, a quarter of a SACCH interval has passed without executing the computation/measurement report, we could forcefully trigger the computation to generate a report. Unfortunately we are still not good at handling intervals with no measurements, so I think it's better to wait until that is fixed. See also #2987.

#9 Updated by dexter over 2 years ago

The patch mentioned above is still in review. I have fixed the review issues now.

I also found out that we are not really resetting the measurement states. Since the lchans are statically allocated (I think so, correct me if I am wrong), the states are not reset when the channel is re-opened by another subscriber. I have now added a centralized function that resets everything and that is called from rsl.c when the channel is acknowledged.

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10554/

#10 Updated by dexter over 2 years ago

Unfortunately Gerrit change 10554 causes problems with the TTCN3 tests TC_meas_res_sign_sdcch4 and TC_meas_res_sign_sdcch8. The test complains ("No MEAS RES received at all") that no measurement reports were received, but when checking the pcap files one can see that there are indeed measurement reports. Presumably there is (also) a problem with the test expectation.

While trying to fix the problems with the TTCN3 tests I still found some remaining problems that need to be fixed, see also:
https://gerrit.osmocom.org/10564

#11 Updated by dexter over 2 years ago

  • % Done changed from 50 to 90

All related patches are merged. Unfortunately there is now a problem with the following two TTCN3 tests.

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

#12 Updated by daniel over 2 years ago

dexter wrote:

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

The pcap shows plenty of measurement reports, but the ttcn3 log also shows quite a few being processed/received. After a while it seems the Measurement Report from LAPDm is not generating a new Measurement Report on RSL.

See https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/221/artifact/logs/bts-tester/BTS_Tests.TC_meas_res_sign_sdcch4.pcap (also attached)
Packet #281 is the last RSL MEAS Rep on RSL while more are coming in from the "MS".

It's easy to filter for measurement reports in wireshark like this:
(gsm_a.dtap.msg_rr_type == 0x15)

If you append && gsm_abis_rsl you can see that 16 measurement reports are being received for SDCCH/4,subchan 0 and then only one for subchan 1 (packet #281). After that any further measurement reports are ignored from the bts it seems.

Looking at the MS side there are 15 MEAS reports for subchan 0 as well as 15 for subchan 1. After timing out on subchan 1 the test aborts, so neither subchan 2 nor 3 is attempted.

It's interesting that the RSL reports number 16 (Measurement result number 0 - 15) while the MS only sends 15.

#13 Updated by dexter over 2 years ago

I have found the problem now. I have confused Subslots and Timeslots for SDCCH/4 and SDCCH/8. This is now fixed and unit tests are added. The TTCN3 tests should be fine again when this is merged.

https://gerrit.osmocom.org/#/c/osmo-bts/+/10654 measurement: fix is_meas_overdue() and increase testcoverage

#14 Updated by dexter over 2 years ago

See also Ticket #3502 as the problem is closely linked to this one.

#15 Updated by dexter over 2 years ago

We have now discussed the timing problem and came to the conclusion that one cannot really rely on the ordering between SACCH and TCH voice, since those are different channels and it may be very vendor specific through which queues the blocks are sent. So at least a slight timing deviation must be accepted here. Unfortunately this renders my approach to detect the SACCH interval end useless.

The only way to fix this seems to be the usage of two buckets. We would collect measurements. By the frame number we can see if the measurement has to go into the bucket for the current interval or if it has to go into the bucket for the next interval. We would then notice the missed interval end by a timeout. If we start getting only measurements for the next-interval bucket for some time, we can flush the current-interval bucket. This is of course a bit complex, so we first need to see if there are other ways around it.
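
The two-bucket idea could look roughly like the following Python sketch. It is only a model of the description above, not an implementation proposal for osmo-bts: the class name, the threshold `FLUSH_AFTER` and the interval start position `start_mod` are hypothetical, and the frame number wrap-around is ignored for brevity.

```python
SACCH_PERIOD = 104   # TDMA frames per SACCH multiframe on TCH
FLUSH_AFTER = 3      # hypothetical: next-interval samples to see before flushing


class TwoBucketCollector:
    def __init__(self, start_mod=0):
        self.start_mod = start_mod   # fn % SACCH_PERIOD at which an interval starts (assumption)
        self.current_idx = None      # index of the interval being collected
        self.current = []            # samples of the current interval
        self.next = []               # samples that already belong to a later interval

    def _interval_index(self, fn):
        # Frame number wrap-around is not handled in this sketch.
        return (fn - self.start_mod) // SACCH_PERIOD

    def add(self, fn, sample):
        """Sort one measurement into a bucket.

        Returns the flushed current-interval samples once enough
        next-interval samples have arrived, otherwise None.
        """
        idx = self._interval_index(fn)
        if self.current_idx is None:
            self.current_idx = idx
        if idx <= self.current_idx:
            self.current.append(sample)   # on time, or late because of jitter
            return None
        self.next.append(sample)
        if len(self.next) < FLUSH_AFTER:
            return None
        flushed = self.current
        self.current, self.next = self.next, []
        self.current_idx = idx
        return flushed
```

A sample that arrives slightly late still lands in the bucket of its own interval, and the interval is closed by the arrival of later samples rather than by the SACCH block, which is what makes the scheme tolerant of reordering between SACCH and TCH.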

Concerning osmo-bts-sysmo, there is good news. The phy has the option to space out unreadable bursts, but we intentionally disabled this functionality, so in theory osmo-bts-sysmo should never lose a block. Even when no block is received, it will still hand over a measurement and data of length zero. In order to verify that, I made an experiment. I set up a call and took the battery out of the phone. This is a measurement period from the time frame where the battery was already out:


<0004> measurement.c:442 025072/18/08/31/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=0
<0007> l1sap.c:1130 025072/18/08/31/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025077/18/13/36/37 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=1
<0007> l1sap.c:1130 025077/18/13/36/37 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025081/18/17/40/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=2
<0007> l1sap.c:1130 025081/18/17/40/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025085/18/21/44/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=3
<0007> l1sap.c:1130 025085/18/21/44/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025090/18/00/49/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=4
<0007> l1sap.c:1130 025090/18/00/49/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025094/18/04/02/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=5
<0007> l1sap.c:1130 025094/18/04/02/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025098/18/08/06/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=6
<0007> l1sap.c:1130 025098/18/08/06/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025103/18/13/11/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=7
<0007> l1sap.c:1130 025103/18/13/11/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025107/18/17/15/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=8
<0007> l1sap.c:1130 025107/18/17/15/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025111/18/21/19/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=9
<0007> l1sap.c:1130 025111/18/21/19/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025116/18/00/24/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=10
<0007> l1sap.c:1130 025116/18/00/24/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025120/18/04/28/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=11
<0007> l1sap.c:1130 025120/18/04/28/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025124/18/08/32/36 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=12
<0007> l1sap.c:1130 025124/18/08/32/36 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025129/18/13/37/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=13
<0007> l1sap.c:1130 025129/18/13/37/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025133/18/17/41/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=14
<0007> l1sap.c:1130 025133/18/17/41/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025137/18/21/45/49 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=15
<0007> l1sap.c:1130 025137/18/21/45/49 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025142/18/00/50/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=16
<0007> l1sap.c:1130 025142/18/00/50/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025146/18/04/03/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=17
<0007> l1sap.c:1130 025146/18/04/03/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025150/18/08/07/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=18
<0007> l1sap.c:1130 025150/18/08/07/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025155/18/13/12/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=19
<0007> l1sap.c:1130 025155/18/13/12/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025159/18/17/16/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=20
<0007> l1sap.c:1130 025159/18/17/16/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025163/18/21/20/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=21
<0007> l1sap.c:1130 025163/18/21/20/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025168/18/00/25/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=22
<0007> l1sap.c:1130 025168/18/00/25/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025172/18/04/29/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=23
<0007> l1sap.c:1130 025172/18/04/29/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025102/18/12/10/14 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=24
<0004> measurement.c:319 (bts=0,trx=0,ts=2,ss=0) meas period end fn:25102, fn_mod:12, status:1, pchan:TCH/F
<0004> measurement.c:658 (bts=0,trx=0,ts=2,ss=0) Calculating measurement results for physical channel:TCH/F
<0004> measurement.c:680 (bts=0,trx=0,ts=2,ss=0) received 25 UL measurements, expected 25
<0004> measurement.c:732 (bts=0,trx=0,ts=2,ss=0) received UL measurements contain 3 SUB measurements, expected 3
<0004> measurement.c:734 (bts=0,trx=0,ts=2,ss=0) replaced 0 measurements with dummy values, from which 0 were SUB measurements
<0004> measurement.c:773 (bts=0,trx=0,ts=2,ss=0) Computed TA256( 171798681) BER-FULL(10.16%), RSSI-FULL(-113dBm), BER-SUB(14.54%), RSSI-SUB(-114dBm)
<0004> measurement.c:786 (bts=0,trx=0,ts=2,ss=0) UL MEAS RXLEV_FULL(0), RXLEV_SUB(0),RXQUAL_FULL(6), RXQUAL_SUB(7), num_meas_sub(3), num_ul_meas(25) 

From what I can see this looks very good. All measurements are there and the period end is detected properly after the 25th measurement. I cannot say too much about the computation result, but shouldn't BER-FULL be somewhere near 100%? Maybe this needs to be checked. I don't know.

Note: What is valid for osmo-bts-sysmo is also valid for osmo-bts-litecell15.

For osmo-bts-trx the behavior is completely different. When I take the RX antenna of the USRP-B200 off and put the phone approx. 1m away, I can already see dropouts, also at the SACCH, with all the consequences of missed measurement intervals.

Our idea is now to realize something similar with osmo-bts-trx. We first need to pinpoint where the bursts/frames/blocks get spaced out. It could be that they are already spaced out at osmo-trx. An idea is to take a look at the mechanism that receives the UDP packets from the TRX and check for lost packets there. In case a packet is missing we could substitute it with a dummy. We think it is a good idea to make the substitution in osmo-bts-trx, since there are already some variants of trx (e.g. fake-trx) around, and checking and patching them all might not be such a good idea.
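
As a rough sketch of the substitution idea in Python (not osmo-bts-trx code): compare the frame number of each received burst indication with the previous one on the same timeslot and list the frame numbers for which a dummy would have to be inserted. The function name `missing_fns` and the cap `MAX_GAP` are hypothetical, and the sketch assumes that an indication is expected for every frame of the timeslot, which ignores idle frames.

```python
GSM_HYPERFRAME = 2715648   # frame numbers wrap at 26 * 51 * 2048
MAX_GAP = 104              # hypothetical cap; larger gaps are treated as a resync


def missing_fns(last_fn, fn):
    """Frame numbers skipped between two consecutive burst indications of one
    timeslot. Returns an empty list for duplicates and for gaps above MAX_GAP."""
    missing = (fn - last_fn - 1) % GSM_HYPERFRAME
    if missing > MAX_GAP:
        return []
    return [(last_fn + 1 + i) % GSM_HYPERFRAME for i in range(missing)]
```

For example, `missing_fns(10, 14)` yields the three frame numbers 11, 12 and 13, and the computation also works across the frame number wrap-around.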

We will now take out the existing interval end detection logic and approach the problem as described above.

#16 Updated by dexter over 2 years ago

In order to have functioning measurement reporting again, I have now removed is_meas_overdue().

https://gerrit.osmocom.org/#/c/osmo-bts/+/10814 measurement: remove missed interval end detection
https://gerrit.osmocom.org/#/c/osmo-bts/+/10815 measurement: fix unit-test test_lchan_meas_process_measurement

During our discussions we realized that a lot of the confusion we experience here comes from the way measurement reports are handled in osmo-bts. The data and the measurement reports are handled on separate paths, but it would actually be more natural to have both in one unit, handled on the same path. There is now an issue about that. See: #3530

#17 Updated by pespin over 2 years ago

  • Related to Bug #2700: Odd RTP behavior in case of bad / missing speech frames added

#18 Updated by fixeria over 2 years ago

  • Related to Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data added

#19 Updated by pespin over 2 years ago

  • Related to Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report added

#20 Updated by pespin over 2 years ago

  • Related to Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver added

#21 Updated by dexter over 2 years ago

  • Status changed from In Progress to Stalled

#22 Updated by dexter over 1 year ago

(I have re-tested this today. The problem is still present)

#23 Updated by fixeria over 1 year ago

  • Related to deleted (Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver)

#24 Updated by fixeria over 1 year ago

  • Blocked by Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver added

#25 Updated by fixeria over 1 year ago

General handling is implemented in https://gerrit.osmocom.org/c/osmo-bts/+/15989.

#26 Updated by dexter over 1 year ago

fixeria Thanks for pointing me to this. I have done an experiment with the code that already exists in master for osmo-trx and osmo-bts-trx. What I did was remove the BTS antenna until the reception got bad enough that dropouts occurred. Unfortunately I never got any TRX_BI_F_NOPE_IND in trx_data_read_cb(). I wonder if the implementation in osmo-trx even supports sending TRX_BI_F_NOPE_IND, or am I missing something here?

#27 Updated by fixeria over 1 year ago

I wonder if the implementation in osmo-trx even supports sending TRX_BI_F_NOPE_IND, or am I missing something here?

I am pretty sure it does. See https://git.osmocom.org/osmo-bts/commit/?id=a1f2b6931ba0e095f571f9715601adb6a819cb63.
Let me check this again with the recent versions of osmo-trx and osmo-bts-trx.

#28 Updated by fixeria over 1 year ago

Let me check this again with the recent versions of osmo-trx and osmo-bts-trx.

Checked. I see NOPE / IDLE indications being sent on each timeslot when the MS is in IDLE mode. You should see them in Wireshark (decode-as OsmoTRXD protocol). This dissector is probably not yet available in the release version, so you can also use trx_toolkit/trx_sniff.py from OsmocomBB.

#29 Updated by dexter over 1 year ago

fixeria I have it running now and I see NOPE indications as intended. In general I think your implementation should give me a good starting point to fix the measurement problems. I wonder if it is possible to set .nope_fn = .ul_fn. At least that's what I am trying at the moment with rx_data_fn. Of course, if I do so, rx_data_fn() will fail to decode the frame, but this can be caught and we can send a measurement indication up anyway. However, this does not fix the missing measurement reports yet. There is still something stuck.

I also wonder if we could also risk another attempt to detect a missing SACCH frame by looking at the TCH frame numbers. The TCH frame numbers were incorrect last time. Now they are correct. However I think using nope_ind frames is the much cleaner solution but as far as I know this would not help with SC5, which still relies on V0 of the TRXD protocol.

#30 Updated by fixeria over 1 year ago

Hi,

I wonder if it is possible to set .nope_fn = .ul_fn. At least that's what I am trying at the moment with rx_data_fn.

yes, but there is an important detail: NOPE / IDLE indications do not carry a burst, only the measurements. This means that both bi->burst[] and bi->burst_len are not initialized (ASAN may not be happy). You probably need a wrapper-function (e.g. rx_nope_fn) that would initialize (memset(bi->burst, 0x00, ...) would be enough) them and call rx_data_fn().

However this does not fix the missing measurement reports yet. There is still something stuck.

This is odd. As far as I can see, rx_data_fn() does call l1if_process_meas_res() even if gsm0503_xcch_decode() fails...

Regarding the testing procedure, feel free to use fake_trx.py (https://osmocom.org/projects/baseband/wiki/FakeTRX). It features the 'FAKE_DROP' TRXC command, which can omit a given number of bursts. Please note that FakeTRX currently does not support sending NOPE / IDLE indications, but you can easily modify FakeTRX::sim_burst_drop() to do that.

#31 Updated by fixeria over 1 year ago

See https://gerrit.osmocom.org/c/osmocom-bb/+/16092/ "trx_toolkit/fake_trx.py: send NOPE.ind in case of path loss simulation".

#32 Updated by dexter over 1 year ago

I have now analyzed the problem again. First of all it is important to understand
how the measurement result computation is triggered. There are two different
triggers required. We need to receive a measurement for the SACCH in order to
terminate the measurement interval and to compute the results. However this only
carries out the result computation but does not trigger sending the measurement
report via RSL. In order to get the result sent via RSL we need an intact SACCH
block. If we do not have that we will never see a measurement result on RSL.

When we use the NOPE indications we can make sure that we never miss a
measurement result. So from that perspective this solves the problems we have
with too few measurements or measurements leaking from one interval into the
next. However, we still have the problem on the RSL side unless we make up a
fake SACCH block in case the SACCH block we received is bad. This is probably
not the best solution. We might be able to circumvent this by triggering the
RSL report somehow when we notice that the interval is complete, but no SACCH
was sent.
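
The two triggers and the possible workaround can be summarized in a small decision sketch. It is a Python model of the description above, not osmo-bts code; the function name `rsl_report` and the flag `force_on_bad_sacch` are hypothetical.

```python
def rsl_report(sacch_meas_seen, sacch_block_ok, ms_report=None,
               force_on_bad_sacch=False):
    """Return a dict describing the RSL measurement report that would be sent,
    or None if nothing is sent.

    sacch_meas_seen    -- a measurement for the SACCH arrived (ends the interval)
    sacch_block_ok     -- the SACCH block itself could be decoded
    force_on_bad_sacch -- workaround: report even without an intact SACCH block
    """
    if not sacch_meas_seen:
        return None   # interval never terminated, nothing computed
    if not sacch_block_ok and not force_on_bad_sacch:
        return None   # results computed, but the report is not triggered
    # Without an intact SACCH block there is no report from the MS to attach.
    return {"uplink": "computed",
            "ms_report": ms_report if sacch_block_ok else None}
```

Without the workaround, a bad SACCH block means no report at all. With it, the uplink results are still sent, just without the measurement data from the MS.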

Some time ago there was an attempt to implement a detection logic that can
detect if a SACCH block was lost by looking at the TCH blocks and their frame
numbers. This idea was discarded because we noticed that the frame numbers
somehow behaved strangely. I think this confusion came from the bugs in
osmo-bts-trx that calculated the frame numbers for the blocks wrongly. I have
now compared the behavior of osmo-bts-trx and osmo-bts-sysmo and I can see that
the frame numbers arrive exactly as expected. One can even see frame 99
arriving after frame 25 that concludes the interval, which is due to the
diagonal interleaving.

========================> NOT COMPLETE =====> fn=19218, fn%104=82
========================> NOT COMPLETE =====> fn=19222, fn%104=86
========================> NOT COMPLETE =====> fn=19227, fn%104=91
========================> NOT COMPLETE =====> fn=19231, fn%104=95
========================> COMPLETE =========> fn=19161, fn%104=25
========================> NOT COMPLETE =====> fn=19235, fn%104=99 <====
========================> NOT COMPLETE =====> fn=19240, fn%104=0
========================> NOT COMPLETE =====> fn=19244, fn%104=4
========================> NOT COMPLETE =====> fn=19248, fn%104=8

In theory we should be able to detect a missing SACCH by just observing the
frame numbers on the TCH.
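
As a rough illustration of that idea (not the osmo-bts implementation), the check could look like the C sketch below. The names fn104_dist, interval_end_crossed, struct sacch_watch and tch_block_detects_sacch_loss are hypothetical, and the interval-end position (25 in the log above) is passed in as a parameter. The sketch assumes that consecutive TCH blocks are less than 104 frames apart and that the SACCH indication for an interval, if there is one, is delivered before the TCH block that crosses the interval end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SACCH_MULTIFRAME 104

/* Forward distance (in frames, modulo 104) from frame 'from' to frame 'to'. */
static unsigned int fn104_dist(uint32_t from, uint32_t to)
{
	return (to % SACCH_MULTIFRAME + SACCH_MULTIFRAME
		- from % SACCH_MULTIFRAME) % SACCH_MULTIFRAME;
}

/* True if the interval-end position end_fn104 lies in (prev_fn, cur_fn]. */
static bool interval_end_crossed(uint32_t prev_fn, uint32_t cur_fn,
				 uint8_t end_fn104)
{
	unsigned int step = fn104_dist(prev_fn, cur_fn);
	unsigned int to_end = fn104_dist(prev_fn, end_fn104);

	assert(end_fn104 < SACCH_MULTIFRAME);
	return to_end >= 1 && to_end <= step;
}

/* Hypothetical per-channel state for this sketch. */
struct sacch_watch {
	uint32_t last_tch_fn;	/* frame number of the previous TCH block */
	bool sacch_seen;	/* a SACCH block arrived in this interval */
};

/* Call for every TCH block. Returns true if the measurement interval
 * ended between the previous and the current TCH block without any
 * SACCH block having been seen. */
static bool tch_block_detects_sacch_loss(struct sacch_watch *w, uint32_t fn,
					 uint8_t end_fn104)
{
	bool lost = false;

	if (interval_end_crossed(w->last_tch_fn, fn, end_fn104)) {
		lost = !w->sacch_seen;
		w->sacch_seen = false;	/* a new interval starts here */
	}
	w->last_tch_fn = fn;
	return lost;
}
```

The SACCH receive path would set sacch_seen whenever a SACCH block arrives. The step from fn=19231 to fn=19235 in the log does not cross position 25, so nothing is reported there.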

Probably we should opt for both methods, especially since the SC5 will probably
not support the new protocol with the NOPE indications, and therefore we will
have to provide enough robustness to fix the problem there.

It certainly makes sense to prevent wrong measurement results when there are
reception problems with the SACCH, but I am not sure if it makes sense to
generate artificial measurement results on total signal loss.

#33 Updated by laforge over 1 year ago

On Tue, Nov 19, 2019 at 03:00:28PM +0000, dexter [REDMINE] wrote:

Issue #2975 has been updated by dexter.

File TCH_F1_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_F1_fn_samples_with_osmo-bts-trx.txt added
File TCH_H1-0_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_H1-0_fn_samples_with_osmo-bts-trx.txt added
File TCH_H1-1_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_H1-1_fn_samples_with_osmo-bts-trx.txt added

I have now analyzed the problem again. First of all it is important to understand
how the measurement result computation is triggered. There are two different
triggers required. We need to receive a measurement for the SACCH in order to
terminate the measurement interval and to compute the results. However this only
carries out the result computation but does not trigger sending the measurement
report via RSL. In order to get the result sent via RSL we need an intact SACCH
block. If we do not have that we will never see a measurement result on RSL.

When we use the NOPE indications we can make sure that we never miss a
measurement result. So from that perspective this solves the problems we have
with too few measurements or measurements leaking from one interval into the
next.

However, we still have the problem on the RSL side unless we make up a
fake SACCH block in case the SACCH block we received is bad. This is probably
not the best solution. We might be able to circumvent this by triggering the
RSL report somehow when we notice that the interval is complete, but no SACCH
was sent.

I don't think having a 'bad frame indication' is a bad idea, like we
have for voice/TCH data? There we also let the RTP code know if the
received frame (codec frame instead of MAC block) was bad.

Some time ago there was an attempt to implement a detection logic that can
detect if a SACCH block was lost by looking at the TCH blocks and their frame
numbers. This idea was discarded because we noticed that the frame numbers
somehow behaved strangely. I think this confusion came from the bugs in
osmo-bts-trx that calculated the frame numbers for the blocks wrongly. I have
now compared the behavior of osmo-bts-trx and osmo-bts-sysmo and I can see that
the frame numbers arrive exactly as expected. One can even see frame 99
arriving after frame 25 that concludes the interval, which is due to the
diagonal interleaving.

this is great.

In theory we should be able to detect a missing SACCH by just observing the
frame numbers on the TCH.

yes, but what if the TCH or SDCCH is in signaling mode and all of the
frames are bad? then the measurement is never sent.

I'm a big fan of event/clock driven design in a TDMA system. So at the
time the TDMA frame number reaches the point where the measurement
should be sent, we should send it - rather than covering up at a later
point in time.
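
A minimal C sketch of such a clock-driven trigger follows. It assumes a per-frame clock callback and uses hypothetical names (struct meas_interval, meas_clock_tick); it is not taken from osmo-bts. The report is emitted when the frame number reaches the interval end, whether or not a valid SACCH block was decoded in that interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-channel measurement state for this sketch. */
struct meas_interval {
	unsigned int num_ul_samples;		/* uplink samples collected */
	bool sacch_ok;				/* valid SACCH block decoded */
	unsigned int reports_sent;		/* all reports emitted */
	unsigned int reports_without_sacch;	/* reports lacking MS data */
};

/* Called once per TDMA frame by the clock. Returns true if a report
 * was emitted for the interval ending at this frame. */
static bool meas_clock_tick(struct meas_interval *mi, uint32_t fn,
			    uint8_t end_fn104)
{
	assert(end_fn104 < 104);

	if (fn % 104 != end_fn104)
		return false;

	/* The interval is over: report now, with or without SACCH data. */
	mi->reports_sent++;
	if (!mi->sacch_ok)
		mi->reports_without_sacch++;

	/* Start the next interval with clean state. */
	mi->num_ul_samples = 0;
	mi->sacch_ok = false;
	return true;
}
```

With this shape one report leaves per 104-frame interval even if no uplink block arrives at all; the receive path only fills in num_ul_samples and sacch_ok.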

Probably we should opt for both methods, especially since the SC5 will probably
not support the new protocol with the NOPE indications, and therefore we will
have to provide enough robustness to fix the problem there.

I'm not sure if that is the best way to spend time on that, maybe the TRXDv1 can
simply be added there.

It certainly makes sense to prevent wrong measurement results when there are
reception problems with the SACCH, but I am not sure if it makes sense to
generate artificial measurement results on total signal loss.

I'm not following here. If there is 'total signal loss' then there will
be low RSSI and high BER, and that should be computed and reported as
normal. Why treat this situation differently from any other situation?

#34 Updated by dexter over 1 year ago

  • Status changed from Stalled to In Progress

I have now implemented the approach that relies on NOPE / IDLE indications. In this mode we can be sure that if we lose a SACCH block, l1sap.c will be informed about this. We can also check if the SACCH was good or bad here. (It's done already and seems to be required for channel timeout.) In cases where the SACCH is bad we will trigger the sending of the RSL measurement report from l1sap.c. The report then of course lacks the DTAP measurement report from the MS.

(A lacking DTAP measurement report in the RSL measurement report makes much more sense to me as it clearly indicates the total signal loss. The idea of faking a DTAP measurement report on total signal loss was a bit confusing.)

https://gerrit.osmocom.org/c/osmo-bts/+/16170 rsl: ensure measurement reports are sent

#35 Updated by laforge over 1 year ago

On Fri, Nov 22, 2019 at 01:31:36PM +0000, dexter [REDMINE] wrote:

(A lacking DTAP measurement report in the RSL measurement report makes much more sense to me as it clearly indicates the total signal loss. The idea of faking a DTAP measurement report on total signal loss was a bit confusing.)

this was a misunderstanding. I only suggested to have 'fake' uplink measurements in case we are
missing uplink blocks/measurements, since the BTS must always report its own uplink measurements.

The downlink measurements from the MS/UE are always optional and can simply be absent, as you wrote.
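
The distinction can be sketched in C as follows. This is an illustration under stated assumptions, not the osmo-bts code: the struct and function names, the dummy values DUMMY_BER10K and DUMMY_RSSI_DBM, and the expected sample count are all hypothetical. Missing uplink samples are padded with worst-case values so the uplink part is always filled in, while the downlink part from the MS is only flagged as present or absent. Averaging dBm values linearly is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative worst-case values substituted for missing samples. */
#define DUMMY_BER10K	10000	/* BER in units of 0.01 %, i.e. 100 % */
#define DUMMY_RSSI_DBM	(-110)

struct ul_sample {
	uint16_t ber10k;	/* bit error rate in units of 0.01 % */
	int8_t rssi_dbm;	/* received signal level in dBm */
};

struct meas_report {
	uint16_t ul_ber10k_avg;	/* always present */
	int8_t ul_rssi_dbm_avg;	/* always present */
	bool has_dl_report;	/* MS measurement report attached? */
};

/* Build a report from 'num' received samples out of 'expected'.
 * Samples that never arrived count as worst-case values. */
static struct meas_report build_report(const struct ul_sample *s, size_t num,
					size_t expected, bool dl_report_received)
{
	struct meas_report r;
	uint32_t ber_sum = 0;
	int32_t rssi_sum = 0;
	size_t i;

	assert(expected > 0 && num <= expected);

	for (i = 0; i < num; i++) {
		ber_sum += s[i].ber10k;
		rssi_sum += s[i].rssi_dbm;
	}
	/* Pad the missing part of the interval with dummy samples. */
	ber_sum += (uint32_t)(expected - num) * DUMMY_BER10K;
	rssi_sum += (int32_t)(expected - num) * DUMMY_RSSI_DBM;

	r.ul_ber10k_avg = (uint16_t)(ber_sum / expected);
	r.ul_rssi_dbm_avg = (int8_t)(rssi_sum / (int32_t)expected);
	r.has_dl_report = dl_report_received;
	return r;
}
```

With no samples at all, the uplink averages simply become the dummy values, which matches the expectation of low RSSI and high BER on total signal loss.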

#36 Updated by dexter over 1 year ago

The patch is still in review. While everything looks fine in manual tests, the TTCN3 tests are not happy at all. Currently I am having problems with BTS_Tests.TC_meas_res_sign_tchf, which expects the measurement result number to count up from 0. When running the test suite I sometimes get, at the very beginning, a measurement result without the RR measurement report. This type of report is not counted by the tests, so the next report that is complete with an RR measurement report has a non-matching measurement result number.

I have traced down the origin of this first incomplete measurement report. It is coming from the SACCH loss detection in scheduler_trx.c. I wonder if this is really a problem. Is it even guaranteed that when a TCH channel is opened the SACCH is immediately present? What if the mobile starts transmitting just a bit later so that the first SACCH interval is bad? Should we suppress those messages or should we change the testcase?

Besides that, I also wonder when exactly a TCH starts. Is the beginning of the TCH somehow aligned with the SACCH interval or does it just start anywhere in the middle of the SACCH interval so that the first SACCH frame may be bad because the interval is chopped off?

#37 Updated by dexter over 1 year ago

Attached one finds a trace from the current situation: Packet 40 is the one that is emitted by the SACCH loss detection in scheduler_trx.c. Then one complete measurement follows, but the test then stops because the measurement result number is off by one. (The test ignores measurement reports without an RR measurement report.)

#38 Updated by laforge over 1 year ago

On Thu, Nov 28, 2019 at 02:39:28PM +0000, dexter [REDMINE] wrote:

I have traced down the origin of this first incomplete measurement report. It is coming from the SACCH loss detection in scheduler_trx.c. I wonder if this is really a problem. Is it even guaranteed that when a TCH channel is opened the SACCH is immediately present? What if the mobile starts transmitting just a bit later so that the first SACCH interval is bad? Should we suppress those messages or should we change the testcase?

the SACCH is activated on the BTS side immediately. However, the MS may need some additional time until it starts transmitting, both on the dedicated (TCH, SDCCH) as well as on the uplink SACCH.

Besides that, I also wonder when exactly a TCH starts. Is the beginning of the TCH somehow aligned with the SACCH interval or does it just start anywhere in the middle of the SACCH interval so that the first SACCH frame may be bad because the interval is chopped off?

I would suppose this can happen, I'm not aware of any alignment of TCH activation.

#39 Updated by dexter over 1 year ago

  • % Done changed from 90 to 100

The relevant patches are merged. I think we can close this now.

#40 Updated by dexter about 1 year ago

(This somehow can not be set to "Resolved")

#41 Updated by fixeria about 1 year ago

  • Status changed from In Progress to Resolved

It was blocked by #3428 which in its turn was blocked by #4006...
