Project

General

Profile

Actions

Bug #2662

closed

osmo-gsm-tester: Test voice:trx-sysmocell5000 sometimes fails since update to use osmo-mgw

Added by pespin over 6 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
11/20/2017
Due date:
% Done:

100%

Spec Reference:

Description

Still need to investigate if it's related to the modem registering or to some issue in the call.

Actions #1

Updated by pespin over 6 years ago

  • Assignee set to 55360
Actions #2

Updated by pespin over 6 years ago

It seems during the failing tests, the MS is never sending "CM Server Request" after receiving the CCCH Immediate Assignment from the BTS (seen on GSMTAP).

Still, I also see some behavior differences in the BTS between the successful test and the failing test.

Successful:

Um uplink RACH
RSL uplink CHANnel ReQuireD
RSL downlink CHANel ACTIVation
RSL uplink CHANnel ACTIvation ACKnowledge
Um downlink CCCH Immediate Assignment
Um uplink CM Servier Request
RSL uplink CM Service Request
RSL downlink CM Service Accept
Um downlnk CM Service Accept

However, on the failing test run, I first see that BSC send to BTS (downlink) a "Immediate Assignment Reject" as a response to the uplink "CHANnelReQuireD", and that Reject is sent over from BTS to the MS. After that, a few secs later the MS retries again and it receives successfully a "Immediate Assignment", but the MS doesn't continue with "Um uplink CM Servier Request", and eventually the channel is released after around half a second.

So the main difference between working and not working tests is that in the non-working one, the BSC/BTS sends to the MS one or more "Immediate Assignment Reject" at some point.

Actions #3

Updated by neels over 6 years ago

since update to use osmo-mgw

can we verify in a simple way whether it is actually related to the MGW change, or whether that was just coincidence?

Actions #4

Updated by pespin over 6 years ago

I recall seeing it failing since then, but it not be the real cause, for instance it could be a timing issue between components being setup correctly.

Actions #5

Updated by pespin over 6 years ago

By the way, looking at STP side, in both successful and failing tests I see that the first "Location Updating Request" is first answered with "Identity Request" and after 5 seconds of no receiving response, a "location Updating Reject" is sent with "Reject cause: Congestion (22)". Then after a 5 more seconds, we get a new "Location Updating Request" which works fine. Definitely something worth looking at but doesn't seem important for this failure.

Actions #6

Updated by pespin over 6 years ago

I don't see anything different/wrong in STP, so the issue so far seems to come from osmo-bsc sometimes rejecting to allocate channels for some unknown reason ("Immediate Assign Reject", gsm_a.dtap.msg_rr_type == 0x3a)

Actions #7

Updated by pespin over 6 years ago

Here we see (trial #3999, failing running job) the BTS log during some rejects:


20171129072139119 DRSL <0004> abis_rsl.c:1754 (bts=0,trx=0,ts=0,ss=0) SACCH deactivation timeout.
20171129072139119 DRSL <0004> abis_rsl.c:868 (bts=0,trx=0,ts=0,ss=0) RF Channel Release due to error: 1
20171129072139119 DRLL <0000> abis_rsl.c:1202 (bts=0,trx=0,ts=0,ss=0) RSL RLL RELEASE REQ (link_id=0x00, reason=1)
20171129072139119 DRSL <0004> abis_rsl.c:899 (bts=0,trx=0,ts=0,ss=0) state RELEASE REQUESTED -> RELEASE DUE ERROR
20171129072139121 DRLL <0000> abis_rsl.c:2138 (bts=0,trx=0,ts=0,ss=0) SAPI=0 RELEASE CONFIRMATION
20171129072139121 DRSL <0004> abis_rsl.c:940 (bts=0,trx=0,ts=0,ss=0) RF CHANNEL RELEASE ACK
20171129072139519 DRSL <0004> abis_rsl.c:1865 BTS 0 CHAN RQD: reason: Location updating (ra=0x09, neci=0x01, chreq_reason=0x03)
20171129072139519 DRLL <0000> chan_alloc.c:353 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Allocating lchan=1 as SDCCH
20171129072139519 DRSL <0004> abis_rsl.c:1935 (bts=0,trx=0,ts=0,ss=1) Activating ARFCN(868) SS(1) lctype SDCCH r=LOCATION_UPDATE ra=0x09 ta=12
20171129072139520 DRSL <0004> abis_rsl.c:596 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Tx RSL Channel Activate with act_type=INITIAL
20171129072139520 DRSL <0004> abis_rsl.c:625 (bts=0,trx=0,ts=0,ss=1) state NONE -> ACTIVATION REQUESTED
20171129072139521 DRSL <0004> abis_rsl.c:1616 (bts=0,trx=0,ts=0,ss=1) CHANNEL ACTIVATE ACK
20171129072139521 DRSL <0004> abis_rsl.c:1286 (bts=0,trx=0,ts=0,ss=1) state ACTIVATION REQUESTED -> ACTIVE
20171129072140083 DRSL <0004> abis_rsl.c:1734 (bts=0,trx=0,ts=3,ss=0) T3101 expired: no response to IMMEDIATE ASSIGN
20171129072140083 DRSL <0004> abis_rsl.c:868 (bts=0,trx=0,ts=3,ss=0) RF Channel Release due to error: 1
20171129072140083 DRSL <0004> abis_rsl.c:778 (bts=0,trx=0,ts=3,ss=0) DEACTivate SACCH CMD
20171129072140083 DRSL <0004> abis_rsl.c:899 (bts=0,trx=0,ts=3,ss=0) state ACTIVE -> RELEASE DUE ERROR
20171129072140084 DRSL <0004> abis_rsl.c:940 (bts=0,trx=0,ts=3,ss=0) RF CHANNEL RELEASE ACK
20171129072140618 DRSL <0004> abis_rsl.c:1734 (bts=0,trx=0,ts=5,ss=0) T3101 expired: no response to IMMEDIATE ASSIGN
20171129072140618 DRSL <0004> abis_rsl.c:868 (bts=0,trx=0,ts=5,ss=0) RF Channel Release due to error: 1
20171129072140618 DRSL <0004> abis_rsl.c:778 (bts=0,trx=0,ts=5,ss=0) DEACTivate SACCH CMD
20171129072140618 DRSL <0004> abis_rsl.c:899 (bts=0,trx=0,ts=5,ss=0) state ACTIVE -> RELEASE DUE ERROR
20171129072140659 DRSL <0004> abis_rsl.c:940 (bts=0,trx=0,ts=5,ss=0) RF CHANNEL RELEASE ACK
20171129072141052 DRSL <0004> abis_rsl.c:827 (bts=0,trx=0,ts=2,ss=0) is back in operation.
20171129072141052 DRSL <0004> abis_rsl.c:828 (bts=0,trx=0,ts=2,ss=0) state RELEASE DUE ERROR -> NONE
20171129072141158 DRSL <0004> abis_rsl.c:1865 BTS 0 CHAN RQD: reason: answer to paging (ra=0x99, neci=0x01, chreq_reason=0x01)
20171129072141158 DRLL <0000> chan_alloc.c:353 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Allocating lchan=2 as SDCCH
20171129072141158 DRSL <0004> abis_rsl.c:1935 (bts=0,trx=0,ts=0,ss=2) Activating ARFCN(868) SS(2) lctype SDCCH r=PAGING ra=0x99 ta=24
20171129072141158 DRSL <0004> abis_rsl.c:596 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Tx RSL Channel Activate with act_type=INITIAL
20171129072141158 DRSL <0004> abis_rsl.c:625 (bts=0,trx=0,ts=0,ss=2) state NONE -> ACTIVATION REQUESTED
20171129072141159 DRSL <0004> abis_rsl.c:1616 (bts=0,trx=0,ts=0,ss=2) CHANNEL ACTIVATE ACK
20171129072141159 DRSL <0004> abis_rsl.c:1286 (bts=0,trx=0,ts=0,ss=2) state ACTIVATION REQUESTED -> ACTIVE
20171129072141915 DRSL <0004> abis_rsl.c:1865 BTS 0 CHAN RQD: reason: emergency call (ra=0xa3, neci=0x01, chreq_reason=0x00)
20171129072141915 DRLL <0000> chan_alloc.c:353 (bts=0,trx=0,ts=2,pchan=TCH/F) Allocating lchan=0 as TCH_F
20171129072141915 DRSL <0004> abis_rsl.c:1935 (bts=0,trx=0,ts=2,ss=0) Activating ARFCN(868) SS(0) lctype TCH_F r=EMERGENCY ra=0xa3 ta=12
20171129072141915 DRSL <0004> abis_rsl.c:596 (bts=0,trx=0,ts=2,pchan=TCH/F) Tx RSL Channel Activate with act_type=INITIAL
20171129072141915 DRSL <0004> abis_rsl.c:625 (bts=0,trx=0,ts=2,ss=0) state NONE -> ACTIVATION REQUESTED
20171129072141916 DRSL <0004> abis_rsl.c:1616 (bts=0,trx=0,ts=2,ss=0) CHANNEL ACTIVATE ACK
20171129072141916 DRSL <0004> abis_rsl.c:1286 (bts=0,trx=0,ts=2,ss=0) state ACTIVATION REQUESTED -> ACTIVE
20171129072143122 DRSL <0004> abis_rsl.c:827 (bts=0,trx=0,ts=0,ss=0) is back in operation.
20171129072143122 DRSL <0004> abis_rsl.c:828 (bts=0,trx=0,ts=0,ss=0) state RELEASE DUE ERROR -> NONE
20171129072144086 DRSL <0004> abis_rsl.c:827 (bts=0,trx=0,ts=3,ss=0) is back in operation.
20171129072144086 DRSL <0004> abis_rsl.c:828 (bts=0,trx=0,ts=3,ss=0) state RELEASE DUE ERROR -> NONE
20171129072144660 DRSL <0004> abis_rsl.c:827 (bts=0,trx=0,ts=5,ss=0) is back in operation.
20171129072144660 DRSL <0004> abis_rsl.c:828 (bts=0,trx=0,ts=5,ss=0) state RELEASE DUE ERROR -> NONE
20171129072148206 DRSL <0004> abis_rsl.c:1865 BTS 0 CHAN RQD: reason: call re-establishment (ra=0x43, neci=0x01, chreq_reason=0x02)
20171129072148206 DRLL <0000> chan_alloc.c:353 (bts=0,trx=0,ts=3,pchan=TCH/F) Allocating lchan=0 as TCH_F
20171129072148206 DRSL <0004> abis_rsl.c:1935 (bts=0,trx=0,ts=3,ss=0) Activating ARFCN(868) SS(0) lctype TCH_F r=CALL ra=0x43 ta=1
20171129072148206 DRSL <0004> abis_rsl.c:596 (bts=0,trx=0,ts=3,pchan=TCH/F) Tx RSL Channel Activate with act_type=INITIAL
20171129072148206 DRSL <0004> abis_rsl.c:625 (bts=0,trx=0,ts=3,ss=0) state NONE -> ACTIVATION REQUESTED
20171129072148207 DRSL <0004> abis_rsl.c:1616 (bts=0,trx=0,ts=3,ss=0) CHANNEL ACTIVATE ACK
20171129072148208 DRSL <0004> abis_rsl.c:1286 (bts=0,trx=0,ts=3,ss=0) state ACTIVATION REQUESTED -> ACTIVE
20171129072149522 DRSL <0004> abis_rsl.c:1734 (bts=0,trx=0,ts=0,ss=1) T3101 expired: no response to IMMEDIATE ASSIGN
20171129072149522 DRSL <0004> abis_rsl.c:868 (bts=0,trx=0,ts=0,ss=1) RF Channel Release due to error: 1
20171129072149522 DRSL <0004> abis_rsl.c:778 (bts=0,trx=0,ts=0,ss=1) DEACTivate SACCH CMD
20171129072149522 DRSL <0004> abis_rsl.c:899 (bts=0,trx=0,ts=0,ss=1) state ACTIVE -> RELEASE DUE ERROR
20171129072149524 DRSL <0004> abis_rsl.c:940 (bts=0,trx=0,ts=0,ss=1) RF CHANNEL RELEASE ACK
20171129072151160 DRSL <0004> abis_rsl.c:1734 (bts=0,trx=0,ts=0,ss=2) T3101 expired: no response to IMMEDIATE ASSIGN
20171129072151160 DRSL <0004> abis_rsl.c:868 (bts=0,trx=0,ts=0,ss=2) RF Channel Release due to error: 1
20171129072151160 DRSL <0004> abis_rsl.c:778 (bts=0,trx=0,ts=0,ss=2) DEACTivate SACCH CMD
20171129072151160 DRSL <0004> abis_rsl.c:899 (bts=0,trx=0,ts=0,ss=2) state ACTIVE -> RELEASE DUE ERROR
20171129072151199 DRSL <0004> abis_rsl.c:940 (bts=0,trx=0,ts=0,ss=2) RF CHANNEL RELEASE ACK

Interestingly, at some point the message "is back in operation." is printed and afterwards TCH channels can be allocated (and of course see in wireshark at that time it answers the Immendiate Assignments correctly without rejecting).

Actions #8

Updated by pespin over 6 years ago

It seems the modem is behaving in a weird way, creating a RACH asking for a TCH channel every aprox 4 seconds with a different RACH identifier, which makes the BTS/BSC allocate a new chan each time.

There can be several possible issues:
- Modem not working properly and not answering to the "Immediate Assignment" message.
- Modem not receiving the "Immediate Assignment" message due to bad signal.
- BTS not receiving the SABM CM Service Request message due to bad receiving sensibility, or modem not using enough power to transmit it.
- bad clock sync of bts and ms?

VTY config "ms max power 33" ensure we allow the modem send all power required, and measurements show the ms is using "31", which according to TS 05.05 page 12 it translates to 32dBm in DCS1800 (we use arfcn 868).

Harald suggested looking at the power measurements in Uplink which are provided in the MEASurement REsult (gsm_abis_rsl.msg_type == 0x28) messages, and compare those in successful and failing tests, or between different BTS. The numbers in there probably need to be translated to dBm according to some spec.

Alternatively they can also be seen more clearly with "show lchan" in VTY. There also exist "feas_feed" vty option to set up a udp stream with the measurements, and there are cmdline tools to look at the results from the socket (meas_viz).

Actions #9

Updated by pespin over 6 years ago

From TS 48.058 section "9.3.25 Uplink Measurements":

This element is used to report the results of the TRX measurements on the uplink radio path of an activated basic radio
channel. It is of variable length.

The Length field indicates in binary the remaining length of the element (octets 3-N).
Octets 3-5 contain results from measurements made by TRX on the uplink.
The Supplementary Measurement Information field (octets 6-N) may carry additional operator dependent information.
The DTXd field (octet 3) indicates whether DTX was employed by TRX on the downlink transmission during the
measurement period.
The RXLEV-FULL-up and RXLEV-SUB-up fields (octets 3-4) report the average uplink signal level as measured by
TRX over the measurement period on all slots and on a subset of the slots respectively (see 3GPP TS 45.008). Each
field is coded as defined in 3GPP TS 45.008.
The RXQUAL-FULL-up and RXQUAL-SUB-up fields (octet 5) report the average uplink quality as measured by TRX
over the measurement period on all slots and on a subset of the slots respectively (see 3GPP TS 45.008). Each field is
coded as defined in 3GPP TS 45.008.

On 45.008, I think the related section is "8.4 Measurement reporting".

In sucessful voice test of sysmocell500 I see Mearurement Results with the following values:

{RXLEV-FULL, RXLEV-SUB, RXQUAL-FULL, RXQUAL-SUB}
{13,13,7,7}
{63,63,0,0}
{63,63,0,0}
{57,57,5,5}
{57,57,5,5}
{53,53,0,0}
{58,58,0,0}

Looking at results from a voice test using sysmobts, I get RXLEV values between 17 and 30, and RXQUAL is always 0.
With B200, I get usually RXLEV 63 but it drops once to 43 with QUAL=7. Qual moves from 0 to 7.
Octasic: RXLEV ranges 0-45, QUAL ranges 0-7.

Actions #10

Updated by pespin over 6 years ago

In 45.008, section 8.1.4 get can get the RXLEV table:

Range of parameter
The measured signal level shall be mapped to an RXLEV value between 0 and 63, as follows:
RXLEV 0 = less than -110 dBm + SCALE.
RXLEV 1 = -110 dBm + SCALE to -109 dBm + SCALE.
RXLEV 2 = -109 dBm + SCALE to -108 dBm + SCALE.
:
:
RXLEV 62 = -49 dBm + SCALE to -48 dBm + SCALE.
RXLEV 63 = greater than -48 dBm + SCALE.

where SCALE is an offset that is used in the ENHANCED MEASUREMENT REPORT and PACKET ENHANCED
MEASUREMENT REPORT messages, otherwise it is set to 0.

And in section 8.2.4 the RXQUAL table:

Range of parameter RXQUAL
When the quality is assessed over the full-set and sub-set of frames defined in subclause 8.4, eight levels of RXQUAL
are defined and shall be mapped to the equivalent BER before channel decoding as follows:

[Not copied here due to wrong copy-paste format]

For a failing test run, I get the following measurement report values:

{RXLEV-FULL, RXLEV-SUB, RXQUAL-FULL, RXQUAL-SUB}
{13,13,7,7} -->  signal = -97dBm, BER > 12.8%
{63,63,0,0} -->  signal > -48dBm, BER <0.2%
{52,52,5,5} -->  signal = -58 dBm, BER between 3.2% and 6.4%

Which look similar to the working scenario (first measurement being quite bad but following ones being good).

Actions #11

Updated by pespin over 6 years ago

I submitted a patch to wireshark's gerrit to add translation of the Uplink Measurements values as it is already done in Measurement Results: https://code.wireshark.org/review/#/c/24663

Actions #12

Updated by pespin over 6 years ago

Regarding the downlink measurements done by the MS, which can be found in the same MEASurement Report (field RXLEV-FULL-SERVING-CELL and RXQUAL-FULL-SERVING-CELL, following same range 0-63), usually they contain the value 43 which relates to -68dBm <= x < -67dBm, which is a good level, and quality is usually 0, meaning BER < 0.2%.

Looking at other failed cases, it can be seen that not always fails at the same state of call setup. Sometimes (eg. run #4000) it goes well up to (CC) Call Confirmed, but then when MSC attemps to assign a channel by means of "Assignment Request", BSC answers to it with "Assignment Failure" and BSSMAP cause "Radio interface message failure".

If we look at the osmo-bsc log during that transaction, we see the following:

20171129092144498 DMGCP <000b> osmo_bsc_mgcp.c:302 MGW(MGW_4)[0x1a56cd0]{ST_ASSIGN_PROC}: MGW proceeding assignment request...
20171129092144498 DRLL <0000> chan_alloc.c:366 Failed to allocate TCH_H channel
20171129092144498 DMSC <000a> bsc_api.c:221 No free channel.
20171129092144498 DMSC <000a> osmo_bsc_api.c:471 Tx MSC ASSIGN FAIL
20171129092144498 DMSC <000a> osmo_bsc_sigtran.c:340 Tx MSC ASSIGNMENT FAIL
20171129092144498 DMSC <000a> osmo_bsc_sigtran.c:361 Sending connection (id=4) oriented data to MSC: RI=SSN_PC,PC=0.0.1,SSN=BSSAP,GTI=NO_GT (00 04 03 04 01 00 )

That's most probably due to the same issues seen before in which the modem is requesting channels all the time and then doing nothing after Immediate Assignment.

Worth remark though: When the BSC sends the "Assignment Failure to the MSC", nothing is done to terminate the call or re-try again. Instead, nothing really happens during 30 seconds, time after which the MS decides to send a (CC) Disconnect message to the BTS which get up to the MSC and down again to the other MS. It seems someone should have a timer to re-try during this 30 seconds, but doesn't seem to happen.

Actions #13

Updated by pespin about 6 years ago

  • Subject changed from osmo-gsm-tester: Test voice:trx-sysmocell5000 sometimes since update to use osmo-mgw to osmo-gsm-tester: Test voice:trx-sysmocell5000 sometimes fails since update to use osmo-mgw
Actions #14

Updated by laforge about 4 years ago

is this issue still present? Do we think it's an osmo-gsm-tester bug or a bug in the osmocom stack?

Actions #15

Updated by pespin about 4 years ago

  • Status changed from New to Resolved
  • Assignee changed from 55360 to pespin
  • % Done changed from 0 to 100

Voicecalls looks stable now after I reenable ngihtly jobs for osmo-gsm-tester a few days ago. It probably got fixed with the new firmware we flashed plus any other osmocom fixes.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)