Bug #2975

OsmoBTS doesn't generate measurement indications in absence of uplink bursts

Added by laforge almost 2 years ago. Updated 7 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: osmo-bts-trx
Target version: -
Start date: 02/21/2018
Due date:
% Done: 90%
Spec Reference:

Description

If no uplink bursts are received from OsmoTRX (but timestamp indications are still sent), osmo-bts-trx doesn't appear to generate RSL MEAS REP messages at every SACCH multiframe (104 frames), as expected. Odd.


Related issues

Related to OsmoBTS - Bug #2965: No measurement reports sent for channels other than TCH (Resolved, 02/19/2018)

Related to OsmoBTS - Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign (Stalled, 02/23/2018)

Related to OsmoBTS - Bug #2700: Odd RTP behavior in case of bad / missing speech frames (Closed, 12/02/2017)

Related to OsmoBTS - Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data (In Progress, 02/21/2018)

Related to OsmoBTS - Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report (Stalled, 10/23/2018)

Blocked by OsmoBTS - Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver (Stalled, 07/28/2018)

History

#1 Updated by laforge almost 2 years ago

  • Related to Bug #2965: No measurement reports sent for channels other than TCH added

#2 Updated by laforge almost 2 years ago

The entire measurement computation + reporting process is driven by lchan_meas_check_compute(), which is only called from the l1sap whenever a PRIM_INFO_MEAS is reported up. In absence of bursts/blocks, this primitive is not reported and subsequently no measurement reports are generated.

What we should do instead is track the frame number, and whenever the SACCH multiframe ends, we should trigger an RSL MEAS REP. The missing uplink bursts all have to count as erroneous, i.e. 100% bit errors.
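For illustration, a minimal sketch of the "missing bursts count as 100% bit errors" rule. It is written in Python for brevity rather than the C used by osmo-bts, and the function name and the representation of BER as a fraction are assumptions, not osmo-bts code.

```python
def interval_ber(ber_samples, expected):
    """Average BER over one SACCH multiframe.

    ber_samples: BER (0.0 .. 1.0) of each block that was actually received.
    expected:    number of blocks the interval should contain.
    Blocks that never arrived are counted as 100% bit errors.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    samples = list(ber_samples)[:expected]
    missing = expected - len(samples)
    # Every missing block contributes a BER of 1.0 to the average.
    return (sum(samples) + 1.0 * missing) / expected
```

With no blocks received at all, the result is 1.0, so a report can still be generated at the end of the multiframe.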

The entire dualism of PH_DATA.ind / PH_TCH.ind containing (unused) measurement data, but then having a separate PRIM_INFO_MEAS, is odd to begin with. The measurements should always accompany the PH-DATA.ind / PH-TCH.ind and PRIM_INFO_MEAS should be abandoned.

#3 Updated by laforge almost 2 years ago

  • Status changed from New to In Progress

#4 Updated by laforge almost 2 years ago

#5 Updated by laforge almost 2 years ago

#6 Updated by laforge almost 2 years ago

  • Related to Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign added

#7 Updated by laforge over 1 year ago

  • Assignee set to dexter

#8 Updated by dexter over 1 year ago

  • % Done changed from 0 to 50

One of the most sensitive parts here is when the SACCH block drops out, because then the measurement computation process is not triggered. As we receive measurement indications we need to compare the frame number of the currently received one against the frame number of the previous one in order to check if we already crossed the boundary of a SACCH interval. I have now added a patch that does exactly that. Now a dropout of the SACCH block will not suppress the measurement computation anymore.
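A rough sketch of such a frame number comparison, in Python for brevity. The function name, the `end_mod` parameter and the fixed period of 104 frames are assumptions for illustration; the actual patch in Gerrit may work differently. The sketch also assumes that frame numbers arrive in order.

```python
GSM_HYPERFRAME = 2715648  # 26 * 51 * 2048 TDMA frames

def crossed_interval_end(prev_fn, cur_fn, end_mod, period=104):
    """True if a frame with fn % period == end_mod lies in (prev_fn, cur_fn]."""
    # Frames elapsed between the two indications, hyperframe wrap included.
    elapsed = (cur_fn - prev_fn) % GSM_HYPERFRAME
    # Distance from prev_fn to the next interval end strictly after it.
    to_boundary = (end_mod - prev_fn) % period
    if to_boundary == 0:
        to_boundary = period
    return elapsed >= to_boundary
```

Because the hyperframe length is a multiple of 104, the modulo arithmetic stays consistent across the frame number wrap.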

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10492

However, we are not done yet. When we get a complete dropout with no measurements at all (battery died, tunnel etc.) then we have a problem. For this I would propose to use the time indication to implement a timeout. When, let's say, a quarter of a SACCH interval has passed without executing the computation/measurement report, we could forcefully trigger the computation to generate a report. Unfortunately we are still not good at handling intervals with no measurements, so I think it's better to wait until that is fixed. See also #2987

#9 Updated by dexter over 1 year ago

The patch mentioned above is still in review. I have fixed the review issues now.

I also found out that we are not really resetting the measurement states. Since the lchans are statically allocated (I think so, correct me if I am wrong) the states are not reset when the channel is re-opened by another subscriber. I have now added a centralized function that resets everything and that is called from rsl.c when the channel is acknowledged.

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10554/

#10 Updated by dexter over 1 year ago

Unfortunately Gerrit change 10554 causes problems with the TTCN3 tests TC_meas_res_sign_sdcch4 and TC_meas_res_sign_sdcch8. The test complains ("No MEAS RES received at all") that no measurement reports were received, but when checking the pcap files one can see that there are indeed measurement reports. Presumably there is (also) a problem with the test expectation.

While trying to fix the problems with the TTCN3 tests I still found some remaining problems that need to be fixed, see also:
https://gerrit.osmocom.org/10564

#11 Updated by dexter over 1 year ago

  • % Done changed from 50 to 90

All related patches are merged. Unfortunately there is now a problem with the following two TTCN3 tests.

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

#12 Updated by daniel over 1 year ago

dexter wrote:

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

The pcap shows plenty of measurement reports, but the ttcn3 log also shows quite a few being processed/received. After a while it seems the Measurement Report from LAPDm is not generating a new Measurement Report on RSL.

See https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/221/artifact/logs/bts-tester/BTS_Tests.TC_meas_res_sign_sdcch4.pcap (also attached)
Packet #281 is the last RSL MEAS Rep on RSL while more are coming in from the "MS".

It's easy to filter for measurement reports in wireshark like this:
(gsm_a.dtap.msg_rr_type == 0x15)

If you append && gsm_abis_rsl you can see that 16 measurement reports are being received for SDCCH/4, subchan 0 and then only one for subchan 1 (packet #281). After that, any further measurement reports are ignored by the BTS, it seems.

Looking at the MS side there are 15 MEAS reports for subchan 0 as well as 15 for subchan 1. After timing out on subchan 1 the test aborts, so neither subchan 2 nor 3 is attempted.

It's interesting that the RSL reports number 16 (Measurement result number 0 - 15) while the MS only sends 15.

#13 Updated by dexter over 1 year ago

I have found the problem now. I have confused Subslots and Timeslots for SDCCH/4 and SDCCH/8. This is now fixed and unit tests are added. The TTCN3 tests should be fine again when this is merged.

https://gerrit.osmocom.org/#/c/osmo-bts/+/10654 measurement: fix is_meas_overdue() and increase testcoverage

#14 Updated by dexter about 1 year ago

See also Ticket #3502 as the problem is closely linked to this one.

#15 Updated by dexter about 1 year ago

We have discussed the timing problem now and we came to the conclusion that one cannot really rely on the ordering between SACCH and TCH voice, since those are different channels and it may be very vendor specific through which queues the blocks are sent. So at least a slight timing deviation must be accepted here. Unfortunately this renders my approach to detect the SACCH interval end useless.

The only way to fix this seems to be the usage of two buckets. We would collect measurements. By the frame number we can see if the measurement has to go into the bucket for the current interval or if it has to go into the bucket for the next interval. We would then notice the missed interval end by a timeout. If we start getting only measurements for the next-interval bucket for some time, we can flush the current-interval bucket. This is of course a bit complex, so we first need to see if there are other ways around.
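The two-bucket idea could be sketched as follows (Python for brevity). The class name, the `flush_after` threshold and the way intervals are derived from the frame number are assumptions for illustration only; nothing like this exists in osmo-bts as far as this ticket shows.

```python
GSM_HYPERFRAME = 2715648  # 26 * 51 * 2048 TDMA frames

class TwoBucketCollector:
    def __init__(self, period=104, offset=0, flush_after=3):
        self.period = period            # interval length in frames
        self.offset = offset            # fn at which an interval starts
        self.flush_after = flush_after  # "timeout" in consecutive next-bucket hits
        self.current_idx = None
        self.current = []
        self.next = []
        self.next_streak = 0

    def _interval(self, fn):
        return ((fn - self.offset) % GSM_HYPERFRAME) // self.period

    def add(self, fn, meas):
        """Store one measurement. Returns the finished bucket once the
        current interval is considered over, otherwise None."""
        idx = self._interval(fn)
        if self.current_idx is None:
            self.current_idx = idx
        if idx == self.current_idx:
            # Late or regular measurement of the current interval.
            self.current.append(meas)
            self.next_streak = 0
            return None
        # Simplification: everything else is treated as "next interval".
        self.next.append(meas)
        self.next_streak += 1
        if self.next_streak < self.flush_after:
            return None
        flushed = self.current
        self.current, self.next = self.next, []
        self.current_idx = idx
        self.next_streak = 0
        return flushed
```

A measurement that arrives slightly late still lands in the bucket of its own interval, and the interval is only closed after several consecutive measurements belong to the following one.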

Concerning osmo-bts-sysmo, there is good news. The phy has the option to space out unreadable bursts, but we intentionally disabled this functionality, so in theory osmo-bts-sysmo should never lose a block. Even when no block is received it will still hand over a measurement and data of length zero. In order to verify that, I made an experiment. I have set up a call and took the battery out of the phone. This is a measurement period from the time frame where the battery was already out:


<0004> measurement.c:442 025072/18/08/31/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=0
<0007> l1sap.c:1130 025072/18/08/31/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025077/18/13/36/37 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=1
<0007> l1sap.c:1130 025077/18/13/36/37 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025081/18/17/40/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=2
<0007> l1sap.c:1130 025081/18/17/40/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025085/18/21/44/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=3
<0007> l1sap.c:1130 025085/18/21/44/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025090/18/00/49/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=4
<0007> l1sap.c:1130 025090/18/00/49/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025094/18/04/02/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=5
<0007> l1sap.c:1130 025094/18/04/02/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025098/18/08/06/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=6
<0007> l1sap.c:1130 025098/18/08/06/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025103/18/13/11/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=7
<0007> l1sap.c:1130 025103/18/13/11/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025107/18/17/15/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=8
<0007> l1sap.c:1130 025107/18/17/15/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025111/18/21/19/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=9
<0007> l1sap.c:1130 025111/18/21/19/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025116/18/00/24/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=10
<0007> l1sap.c:1130 025116/18/00/24/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025120/18/04/28/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=11
<0007> l1sap.c:1130 025120/18/04/28/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025124/18/08/32/36 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=12
<0007> l1sap.c:1130 025124/18/08/32/36 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025129/18/13/37/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=13
<0007> l1sap.c:1130 025129/18/13/37/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025133/18/17/41/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=14
<0007> l1sap.c:1130 025133/18/17/41/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025137/18/21/45/49 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=15
<0007> l1sap.c:1130 025137/18/21/45/49 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025142/18/00/50/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=16
<0007> l1sap.c:1130 025142/18/00/50/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025146/18/04/03/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=17
<0007> l1sap.c:1130 025146/18/04/03/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025150/18/08/07/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=18
<0007> l1sap.c:1130 025150/18/08/07/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025155/18/13/12/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=19
<0007> l1sap.c:1130 025155/18/13/12/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025159/18/17/16/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=20
<0007> l1sap.c:1130 025159/18/17/16/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025163/18/21/20/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=21
<0007> l1sap.c:1130 025163/18/21/20/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025168/18/00/25/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=22
<0007> l1sap.c:1130 025168/18/00/25/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025172/18/04/29/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=23
<0007> l1sap.c:1130 025172/18/04/29/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025102/18/12/10/14 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=24
<0004> measurement.c:319 (bts=0,trx=0,ts=2,ss=0) meas period end fn:25102, fn_mod:12, status:1, pchan:TCH/F
<0004> measurement.c:658 (bts=0,trx=0,ts=2,ss=0) Calculating measurement results for physical channel:TCH/F
<0004> measurement.c:680 (bts=0,trx=0,ts=2,ss=0) received 25 UL measurements, expected 25
<0004> measurement.c:732 (bts=0,trx=0,ts=2,ss=0) received UL measurements contain 3 SUB measurements, expected 3
<0004> measurement.c:734 (bts=0,trx=0,ts=2,ss=0) replaced 0 measurements with dummy values, from which 0 were SUB measurements
<0004> measurement.c:773 (bts=0,trx=0,ts=2,ss=0) Computed TA256( 171798681) BER-FULL(10.16%), RSSI-FULL(-113dBm), BER-SUB(14.54%), RSSI-SUB(-114dBm)
<0004> measurement.c:786 (bts=0,trx=0,ts=2,ss=0) UL MEAS RXLEV_FULL(0), RXLEV_SUB(0),RXQUAL_FULL(6), RXQUAL_SUB(7), num_meas_sub(3), num_ul_meas(25) 

From what I can see this looks very good. All measurements are there and the period end is detected properly after the 25th measurement. I cannot say too much about the computation result, but shouldn't BER-FULL be somewhere near 100%? Maybe this needs to be checked. I don't know.
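As a cross-check of the last two log lines, here is a small sketch of the BER to RXQUAL mapping, using the RXQUAL bands as I understand them from 3GPP TS 45.008. It is written in Python for brevity, and the function name is made up for this example. It does not answer the question whether BER-FULL should be higher; it only shows that the reported RXQUAL values are consistent with the computed BER values.

```python
# Upper BER bound (in percent) of RXQUAL 0..6; anything above is RXQUAL 7.
RXQUAL_UPPER_PCT = [0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]

def ber_to_rxqual(ber_pct):
    """Map a bit error rate in percent to an RXQUAL value (0..7)."""
    for rxqual, upper in enumerate(RXQUAL_UPPER_PCT):
        if ber_pct < upper:
            return rxqual
    return 7
```

BER-FULL(10.16%) falls into the 6.4% to 12.8% band (RXQUAL 6), and BER-SUB(14.54%) is above 12.8% (RXQUAL 7). Since RXQUAL 7 is the worst value, any BER above 12.8% is reported the same way on RSL.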

Note: What is valid for osmo-bts-sysmo is also valid for osmo-bts-litecell15.

For osmo-bts-trx the behavior is completely different. When I take the RX antenna of the USRP B200 off and put the phone approx. 1 m away, I can already see dropouts, also on the SACCH, with all the consequences of missed measurement intervals.

Our idea is now to realize something similar with osmo-bts-trx. We first need to pinpoint where the bursts/frames/blocks get spaced out. It could be that they are already spaced out at osmo-trx. An idea is to take a look at the mechanism that receives the UDP packets from the TRX and check for lost packets there. In case a packet is missing we could substitute it with a dummy. We think it is a good idea to make the substitution in osmo-bts-trx since there are already some variants of trx (e.g. fake-trx) around and checking and patching them all might not be such a good idea.
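A sketch of what such a substitution could look like for one timeslot (Python for brevity). The function names, the `max_gap` guard and the use of `None` as dummy payload are assumptions for illustration, and the sketch assumes bursts arrive in frame number order.

```python
GSM_HYPERFRAME = 2715648  # 26 * 51 * 2048 TDMA frames

def missing_fns(prev_fn, cur_fn, max_gap=104):
    """Frame numbers strictly between prev_fn and cur_fn (with wrap)."""
    gap = (cur_fn - prev_fn) % GSM_HYPERFRAME
    if gap > max_gap:
        # Implausibly large gap, e.g. reordering: do not substitute.
        return []
    return [(prev_fn + i) % GSM_HYPERFRAME for i in range(1, gap)]

def fill_gaps(bursts, dummy=None):
    """bursts: list of (fn, payload) received for one timeslot.
    Returns the same list with (fn, dummy) inserted for lost frames."""
    out = []
    prev = None
    for fn, payload in bursts:
        if prev is not None:
            out.extend((m, dummy) for m in missing_fns(prev, fn))
        out.append((fn, payload))
        prev = fn
    return out
```

For example, `fill_gaps([(1, "a"), (4, "b")])` yields entries for frames 1, 2, 3 and 4, where frames 2 and 3 carry the dummy payload.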

We will now take out the existing interval end detection logic and approach the problem as described above.

#16 Updated by dexter about 1 year ago

In order to have functioning measurement reporting again, I have now removed the missed interval end detection (is_meas_overdue()).

https://gerrit.osmocom.org/#/c/osmo-bts/+/10814 measurement: remove missed interval end detection
https://gerrit.osmocom.org/#/c/osmo-bts/+/10815 measurement: fix unit-test test_lchan_meas_process_measurement

During our discussions we realized that a lot of the confusion we experience here comes from the way measurement reports are handled in osmo-bts. The data and the measurement reports are handled on separate paths, but it would actually be more natural to have both in one unit, handled on the same path. There is now an issue about that. See: #3530

#17 Updated by pespin about 1 year ago

  • Related to Bug #2700: Odd RTP behavior in case of bad / missing speech frames added

#18 Updated by fixeria about 1 year ago

  • Related to Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data added

#19 Updated by pespin about 1 year ago

  • Related to Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report added

#20 Updated by pespin about 1 year ago

  • Related to Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver added

#21 Updated by dexter about 1 year ago

  • Status changed from In Progress to Stalled

#22 Updated by dexter 25 days ago

(I have re-tested this today. The problem is still present)

#23 Updated by fixeria 24 days ago

  • Related to deleted (Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver)

#24 Updated by fixeria 24 days ago

  • Blocked by Feature #3428: Implement handling of NOPE / IDLE indications from Transceiver added

#25 Updated by fixeria 24 days ago

General handling is implemented in https://gerrit.osmocom.org/c/osmo-bts/+/15989.

#26 Updated by dexter 24 days ago

fixeria Thanks for pointing me to this. I have done an experiment with the code that already exists in master for osmo-trx and osmo-bts-trx. What I did was removing the BTS antenna until the reception got bad enough so that dropouts occurred. Unfortunately I never got any TRX_BI_F_NOPE_IND in trx_data_read_cb(). I wonder if the implementation in osmo-trx even supports sending TRX_BI_F_NOPE_IND, or am I missing something here?

#27 Updated by fixeria 23 days ago

I wonder if the implementation in osmo-trx even supports sending TRX_BI_F_NOPE_IND, or am I missing something here?

I am pretty sure it does. See https://git.osmocom.org/osmo-bts/commit/?id=a1f2b6931ba0e095f571f9715601adb6a819cb63.
Let me check this again with the recent versions of osmo-trx and osmo-bts-trx.

#28 Updated by fixeria 23 days ago

Let me check this again with the recent versions of osmo-trx and osmo-bts-trx.

Checked. I see NOPE / IDLE indications being sent on each timeslot when the MS is in IDLE mode. You should see them in Wireshark (decode-as OsmoTRXD protocol). This dissector is probably not yet available in the release version, so you can also use trx_toolkit/trx_sniff.py from OsmocomBB.

#29 Updated by dexter 22 days ago

fixeria I have it running now and I now see NOPE indications as intended. In general I think your implementation should give me a good starting point to fix the measurement problems. I wonder if it is possible to set .nope_fn = .ul_fn. At least that's what I try at the moment with rx_data_fn. Of course if I do so the rx_data_fn() will fail to decode the frame but this can be caught and we can send a measurement indication up anyway. However this does not fix the missing measurement reports yet. There is still something stuck.

I also wonder if we could also risk another attempt to detect a missing SACCH frame by looking at the TCH frame numbers. The TCH frame numbers were incorrect last time. Now they are correct. However I think using nope_ind frames is the much cleaner solution but as far as I know this would not help with SC5, which still relies on V0 of the TRXD protocol.

#30 Updated by fixeria 19 days ago

Hi,

I wonder if it is possible to set .nope_fn = .ul_fn. At least that's what I try at the moment with rx_data_fn.

yes, but there is an important detail: NOPE / IDLE indications do not carry a burst, only the measurements. This means that both bi->burst[] and bi->burst_len are not initialized (ASAN may not be happy). You probably need a wrapper-function (e.g. rx_nope_fn) that would initialize (memset(bi->burst, 0x00, ...) would be enough) them and call rx_data_fn().
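The wrapper idea can be modelled roughly like this. This is a Python model for brevity, not the C code of osmo-bts-trx: the `BurstInd` class, its fields and the burst length of 148 are assumptions that only mirror the names used above.

```python
from dataclasses import dataclass, field

BURST_BITS = 148  # assumed length of one GSM normal burst

@dataclass
class BurstInd:
    fn: int
    rssi: int
    nope: bool = False
    # For NOPE / IDLE indications these two carry no valid data.
    burst: list = field(default_factory=list)
    burst_len: int = 0

def rx_nope_fn(bi, rx_data_fn):
    """Give a NOPE indication a defined, all-zero burst and pass it on
    to the regular data handler, so that the measurements it carries
    still reach the measurement processing."""
    bi.burst = [0] * BURST_BITS
    bi.burst_len = BURST_BITS
    return rx_data_fn(bi)
```

The regular handler will not decode anything useful from the zeroed burst, but it never reads uninitialized data.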

However this does not fix the missing measurement reports yet. There is still something stuck.

This is odd. As far as I can see, rx_data_fn() does call l1if_process_meas_res() even if gsm0503_xcch_decode() fails...

Regarding the testing procedure, feel free to use fake_trx.py (https://osmocom.org/projects/baseband/wiki/FakeTRX). It features 'FAKE_DROP' TRXC command, that can omit a given amount of bursts. Please note that currently FakeTRX does not support sending of NOPE / IDLE indications, but you can easily modify FakeTRX::sim_burst_drop() to do that.

#31 Updated by fixeria 18 days ago

See https://gerrit.osmocom.org/c/osmocom-bb/+/16092/ "trx_toolkit/fake_trx.py: send NOPE.ind in case of path loss simulation".

#32 Updated by dexter 16 days ago

I have now analyzed the problem again. First of all it is important to understand
how the measurement result computation is triggered. There are two different
triggers required. We need to receive a measurement for the SACCH in order to
terminate the measurement interval and to compute the results. However this only
carries out the result computation but does not trigger sending the measurement
report via RSL. In order to get the result sent via RSL we need an intact SACCH
block. If we do not have that we will never see a measurement result on RSL.

When we use the NOPE indications we can make sure that we never miss a
measurement result. So from that perspective this solves the problems we have
with too few measurements or measurements leaking from one interval into the
next. However, we still have the problem on the RSL side unless we make up a
fake SACCH block in case the SACCH block we received is bad. This is probably
not the best solution. We might be able to circumvent this by triggering the
RSL report somehow when we notice that the interval is complete, but no SACCH
was sent.

Some time ago there was an attempt to implement a detection logic that can
detect if a SACCH block was lost by looking at the TCH blocks and their frame
numbers. This idea was discarded because we noticed that the frame numbers
somehow behaved strangely. I think this confusion came from the bugs in
osmo-bts-trx that calculated the frame numbers for the blocks wrongly. I have
now compared the behavior of osmo-bts-trx and osmo-bts-sysmo and I can see that
the frame numbers arrive exactly as expected. One can even see frame 99
arriving after frame 25 that concludes the interval, which is due to the
diagonal interleaving.

========================> NOT COMPLETE =====> fn=19218, fn%104=82
========================> NOT COMPLETE =====> fn=19222, fn%104=86
========================> NOT COMPLETE =====> fn=19227, fn%104=91
========================> NOT COMPLETE =====> fn=19231, fn%104=95
========================> COMPLETE =========> fn=19161, fn%104=25
========================> NOT COMPLETE =====> fn=19235, fn%104=99 <====
========================> NOT COMPLETE =====> fn=19240, fn%104=0
========================> NOT COMPLETE =====> fn=19244, fn%104=4
========================> NOT COMPLETE =====> fn=19248, fn%104=8

In theory we should be able to detect a missing SACCH by just observing the
frame numbers on the TCH.
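One possible way to do that, sketched in Python for brevity, is to check at every wrap of fn % 104 on the TCH whether a SACCH block was seen since the previous wrap. The function name and the event representation are assumptions for illustration, and the sketch assumes that the block marked COMPLETE in the trace above is the SACCH block.

```python
def count_missed_sacch(events, period=104):
    """events: list of (kind, fn) in arrival order, kind is "TCH" or "SACCH".
    Returns how many intervals ended without a SACCH block being seen."""
    missed = 0
    sacch_seen = False
    prev_mod = None
    for kind, fn in events:
        if kind == "SACCH":
            sacch_seen = True
            continue
        mod = fn % period
        # A smaller value than before means the TCH entered a new interval.
        if prev_mod is not None and mod < prev_mod:
            if not sacch_seen:
                missed += 1
            sacch_seen = False
        prev_mod = mod
    return missed
```

Fed with the frame numbers from the trace above, this reports no missed SACCH; with the fn=19161 entry removed it reports one. A first interval that is observed only partially can cause a false positive, which a real implementation would have to handle.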

Probably we should opt for both methods, especially since the SC5 will probably not
support the new protocol with the NOPE indications and therefore we will have
to provide enough robustness to fix the problem there.

It certainly makes sense to prevent wrong measurement results when there are
reception problems with the SACCH, but I am not sure if it makes sense to
generate artificial measurement results on total signal loss.

#33 Updated by laforge 15 days ago

On Tue, Nov 19, 2019 at 03:00:28PM +0000, dexter [REDMINE] wrote:

Issue #2975 has been updated by dexter.

File TCH_F1_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_F1_fn_samples_with_osmo-bts-trx.txt added
File TCH_H1-0_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_H1-0_fn_samples_with_osmo-bts-trx.txt added
File TCH_H1-1_fn_samples_with_osmo-bts-sysmo.txt added
File TCH_H1-1_fn_samples_with_osmo-bts-trx.txt added

I have now analyzed the problem again. First of all it is important to understand
how the measurement result computation is triggered. There are two different
triggers required. We need to receive a measurement for the SACCH in order to
terminate the measurement interval and to compute the results. However this only
carries out the result computation but does not trigger sending the measurement
report via RSL. In order to get the result sent via RSL we need an intact SACCH
block. If we do not have that we will never see a measurement result on RSL.

When we use the NOPE indications we can make sure that we never miss a
measurement result. So from that perspective this solves the problems we have
with too few measurements or measurements leaking from one interval into the
next.

However, we still have the problem on the RSL side unless we make up a
fake SACCH block in case the SACCH block we received is bad. This is probably
not the best solution. We might be able to circumvent this by triggering the
RSL report somehow when we notice that the interval is complete, but no SACCH
was sent.

I don't think having a 'bad frame indication' is a bad idea, like we
have for voice/TCH data? There we also let the RTP code know if the
received frame (codec frame instead of MAC block) was bad.

Some time ago there was an attempt to implement a detection logic that can
detect if a SACCH block was lost by looking at the TCH blocks and their frame
numbers. This idea was discarded because we noticed that the frame numbers
somehow behaved strangely. I think this confusion came from the bugs in
osmo-bts-trx that calculated the frame numbers for the blocks wrongly. I have
now compared the behavior of osmo-bts-trx and osmo-bts-sysmo and I can see that
the frame numbers arrive exactly as expected. One can even see frame 99
arriving after frame 25 that concludes the interval, which is due to the
diagonal interleaving.

this is great.

In theory we should be able to detect a missing SACCH by just observing the
frame numbers on the TCH.

yes, but what if the TCH or SDCCH is in signaling mode and all of the
frames are bad? then the measurement is never sent.

I'm a big fan of event/clock driven design in a TDMA system. So at the
time the TDMA frame number reaches the point where the measurement
should be sent, we should send it - rather than covering up at a later
point in time.
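A minimal sketch of such a clock-driven trigger, in Python for brevity. The class name, the `report_mod` parameter and the per-frame `on_frame()` call are assumptions for illustration; how osmo-bts would hook this into its frame clock is not specified here.

```python
class ClockDrivenMeas:
    def __init__(self, report_mod=0, period=104):
        self.report_mod = report_mod  # fn % period at which to report
        self.period = period
        self.samples = []

    def add(self, sample):
        """Store one uplink measurement of the running interval."""
        self.samples.append(sample)

    def on_frame(self, fn):
        """Call once per TDMA frame number. Returns the samples of the
        finished interval at the reporting point, otherwise None."""
        if fn % self.period != self.report_mod:
            return None
        finished, self.samples = self.samples, []
        return finished
```

The point is that `on_frame()` returns a (possibly empty) list at every reporting point, so a report is due even when nothing at all was received during the interval.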

Probably we should opt for both methods, especially since the SC5 will probably not
support the new protocol with the NOPE indications and therefore we will have
to provide enough robustness to fix the problem there.

I'm not sure if that is the best way to spend time on that, maybe the TRXDv1 can
simply be added there.

It certainly makes sense to prevent wrong measurement results when there are
reception problems with the SACCH, but I am not sure if it makes sense to
generate artificial measurement results on total signal loss.

I'm not following here. If there is 'total signal loss' then there will
be low RSSI and high BER, and that should be computed and reported as
normal. Why treat this situation different than any other situation?

#34 Updated by dexter 14 days ago

  • Status changed from Stalled to In Progress

I have now implemented the approach that relies on NOPE / IDLE indications. In this mode we can be sure that if we lose a SACCH block, l1sap.c will be informed about this. We can also check if the SACCH was good or bad here (it's done already and seems to be required for channel timeout). In cases where the SACCH is bad we will trigger the sending of the RSL measurement report from l1sap.c. The report then of course lacks the DTAP measurement report from the MS.
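To illustrate the resulting message, here is a simplified model in Python (for brevity). The function and key names are made up for this example; the only facts taken from RSL are that the measurement result number is a single octet and that the L3 information (the measurement report from the MS) is an optional part of the message, while the uplink measurements are always present.

```python
def build_meas_res(meas_res_nr, ul_meas, l3_meas_rep=None):
    """Model of an RSL measurement result as a plain dict.

    ul_meas:     uplink measurements computed by the BTS (always present).
    l3_meas_rep: measurement report from the MS, or None if the uplink
                 SACCH block was bad or missing.
    """
    msg = {
        "meas_res_nr": meas_res_nr & 0xFF,
        "uplink_meas": dict(ul_meas),
    }
    if l3_meas_rep is not None:
        msg["l3_info"] = bytes(l3_meas_rep)
    return msg
```

When the SACCH block is bad, the message simply has no `l3_info` entry, but still carries the uplink measurements.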

(A lacking DTAP measurement report in the RSL measurement report makes much more sense to me as it clearly indicates the total signal loss. The idea of faking a DTAP measurement report on total signal loss was a bit confusing.)

https://gerrit.osmocom.org/c/osmo-bts/+/16170 rsl: ensure measurement reports are sent

#35 Updated by laforge 13 days ago

On Fri, Nov 22, 2019 at 01:31:36PM +0000, dexter [REDMINE] wrote:

(A lacking DTAP measurement report in the RSL measurement report makes much more sense to me as it clearly indicates the total signal loss. The idea of faking a DTAP measurement report on total signal loss was a bit confusing.)

this was a misunderstanding. I only suggested to have 'fake' uplink measurements in case we are
missing uplink blocks/measurements, since the BTS must always report its own uplink measurements.

The downlink measurements from the MS/UE are always optional and can simply be absent, as you wrote.

#36 Updated by dexter 8 days ago

The patch is still in review. While everything looks fine with manual tests, the TTCN3 tests are not happy at all. Currently I am having problems with BTS_Tests.TC_meas_res_sign_tchf, which expects the measurement result number counting up from 0. When running the testsuite, sometimes, in the very beginning, I get a measurement result without the RR measurement report. This type of report is not counted by the tests, so the next report that is complete with RR measurement report has a non-matching measurement result number.

I have traced down the origin of this first incomplete measurement report. It is coming from the SACCH loss detection in scheduler_trx.c. I wonder if this is really a problem. Is it even guaranteed that when a TCH channel is opened the SACCH is immediately present? What if the mobile starts transmitting just a bit later so that the first SACCH interval is bad? Should we suppress those messages or should we change the testcase?
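The mismatch can be reproduced with a small model of the test expectation (Python for brevity). This only mimics the behaviour described above; the function name and the tuple representation are assumptions, and the real TTCN3 test logic may differ in detail.

```python
def first_mismatch(reports):
    """reports: list of (meas_res_nr, has_rr) in the order seen on RSL.
    Mimics a test that ignores reports without RR measurement report
    and expects the remaining result numbers to count 0, 1, 2, ...
    Returns (expected, got) for the first mismatch, or None."""
    expected = 0
    for nr, has_rr in reports:
        if not has_rr:
            continue  # not counted by the test
        if nr != expected:
            return (expected, nr)
        expected += 1
    return None
```

With `[(0, False), (1, True), (2, True)]` the first complete report carries number 1 while the test expects 0, which is the off-by-one described above.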

Besides of that I also wonder when a TCH exactly starts. Is the beginning of the TCH somehow aligned with the SACCH interval or does it just start anywhere in the middle of the SACCH interval so that the first SACCH frame may be bad because the interval is chopped off?

#37 Updated by dexter 8 days ago

Attached is a trace of the current situation: Packet 40 is the one that is emitted by the SACCH loss detection in scheduler_trx.c. Then one complete measurement follows, but the test stops then because the measurement result number is one off. (The test ignores measurement reports without RR measurement report.)

#38 Updated by laforge 7 days ago

On Thu, Nov 28, 2019 at 02:39:28PM +0000, dexter [REDMINE] wrote:

I have traced down the origin of this first incomplete measurement report. It is coming from the SACCH loss detection in scheduler_trx.c. I wonder if this is really a problem. Is it even guaranteed that when a TCH channel is opened the SACCH is immediately present? What if the mobile starts transmitting just a bit later so that the first SACCH interval is bad? Should we suppress those messages or should we change the testcase?

the SACCH is activated on the BTS side immediately. However, the MS may need some additional time until it starts transmitting, both on the dedicated (TCH, SDCCH) as well as on the uplink SACCH.

Besides of that I also wonder when a TCH exactly starts. Is the beginning of the TCH somehow aligned with the SACCH interval or does it just start anywhere in the middle of the SACCH interval so that the first SACCH frame may be bad because the interval is chopped off?

I would suppose this can happen, I'm not aware of any alignment of TCH activation.
