Project

General

Profile

Bug #2898

OsmoBSC can generate BSSMAP ASSIGNMENT FAIL after BSSMAP ASSIGNMENT COMPLETE

Added by laforge over 1 year ago. Updated 10 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
01/29/2018
Due date:
% Done:

0%

Spec Reference:

Description

I've just observed that there is at least one scenario in which we are sending a BSSMAP ASSIGNMENT FAIL after we've already confirmed it's successful
(this is with laforge/fsm branch):

Mon Jan 29 22:22:17 2018 DLINP <0013> input/ipa.c:263 accept()ed new link from 127.0.0.1 to port 3003
Mon Jan 29 22:22:17 2018 DRSL <0004> bsc_init.c:332 bootstrapping RSL for BTS/TRX (0/0) on ARFCN 871 using MCC=1 MNC=1 LAC=1 CID=0 BSIC=63
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1891 BTS 0 CHAN RQD: reason: answer to paging (ra=0x23, neci=0x01, chreq_reason=0x01)
Mon Jan 29 22:22:19 2018 DRLL <0000> chan_alloc.c:412 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Allocating lchan=0 as SDCCH
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1973 (bts=0,trx=0,ts=0,ss=0) Activating ARFCN(871) SS(0) lctype SDCCH r=PAGING ra=0x23 ta=0
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:597 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) Tx RSL Channel Activate with act_type=INITIAL
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:626 (bts=0,trx=0,ts=0,ss=0) state NONE -> ACTIVATION REQUESTED
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1637 (bts=0,trx=0,ts=0,ss=0) CHANNEL ACTIVATE ACK
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1289 (bts=0,trx=0,ts=0,ss=0) state ACTIVATION REQUESTED -> ACTIVE
Mon Jan 29 22:22:19 2018 DRLL <0000> abis_rsl.c:2177 (bts=0,trx=0,ts=0,ss=0) SAPI=0 ESTABLISH INDICATION
Mon Jan 29 22:22:19 2018 DMSC <0008> fsm.c:243 SUBSCR_CONN[0x5631ee52f2f0]{INIT}: Allocated
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_api.c:225 Tx MSC COMPL L3
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:251 Initializing resources for new SIGTRAN connection to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP...
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_api.c:303 SUBSCR_CONN[0x5631ee52f2f0]{INIT}: Received Event MO-CONNECT.req
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:293 Allocated new connection id: 39
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:297 Opening new SIGTRAN connection (id=39) to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP
Mon Jan 29 22:22:19 2018 DMSC <0008> bsc_subscr_conn_fsm.c:106 SUBSCR_CONN[0x5631ee52f2f0]{INIT}: state_chg to WAIT_CC
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:187 N-CONNECT.cnf(39, )
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:190 SUBSCR_CONN[0x5631ee52f2f0]{WAIT_CC}: Received Event MO-CONNET.cfm
Mon Jan 29 22:22:19 2018 DMSC <0008> bsc_subscr_conn_fsm.c:138 SUBSCR_CONN[0x5631ee52f2f0]{WAIT_CC}: state_chg to ACTIVE
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:204 N-DATA.ind(39, 00 11 01 0b 03 01 08 01 7c 06 01 02 03 04 09 26 7d 01 80 )
Mon Jan 29 22:22:19 2018 DMSC <0008> a_reset.c:187 A-RESET(msc-0)[0x5631ee52d8f0]{CONN}: Received Event 2
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_bssap.c:917 Rx MSC DT1 BSSMAP ASSIGNMENT REQ
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_bssap.c:817 Found matching audio type: full rate SPEECH_V1 for channel_type = { ch_indctr=0x1, ch_rate_type=0x8, perm_spch=[ 01 ] }
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_mgcp.c:283 SUBSCR_CONN[0x5631ee52f2f0]{ACTIVE}: Received Event ASSIGNMENT_CMD
Mon Jan 29 22:22:19 2018 DMSC <0008> bsc_subscr_conn_fsm.c:164 SUBSCR_CONN[0x5631ee52f2f0]{ACTIVE}: ASSIGNMENT_CMD(full_rate=1, chan_mode=1)
Mon Jan 29 22:22:19 2018 DRLL <0000> chan_alloc.c:412 (bts=0,trx=0,ts=1,pchan=TCH/F) Allocating lchan=0 as TCH_F
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:597 (bts=0,trx=0,ts=1,pchan=TCH/F) Tx RSL Channel Activate with act_type=INTRA_NORM_ASS
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:626 (bts=0,trx=0,ts=1,ss=0) state NONE -> ACTIVATION REQUESTED
Mon Jan 29 22:22:19 2018 DMSC <0008> bsc_subscr_conn_fsm.c:167 SUBSCR_CONN[0x5631ee52f2f0]{ACTIVE}: state_chg to WAIT_ASS_CMPL
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1637 (bts=0,trx=0,ts=1,ss=0) CHANNEL ACTIVATE ACK
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:1289 (bts=0,trx=0,ts=1,ss=0) state ACTIVATION REQUESTED -> ACTIVE
Mon Jan 29 22:22:19 2018 DMSC <0008> bsc_api.c:916 Sending assignment on chan: 0x7f882be7d328
Mon Jan 29 22:22:19 2018 DRLL <0000> abis_rsl.c:2177 (bts=0,trx=0,ts=1,ss=0) SAPI=0 DATA INDICATION
Mon Jan 29 22:22:19 2018 DRLL <0000> chan_alloc.c:538 (bts=0,trx=0,ts=0,ss=0) starting release sequence
Mon Jan 29 22:22:19 2018 DRSL <0004> chan_alloc.c:539 (bts=0,trx=0,ts=0,ss=0) state ACTIVE -> RELEASE REQUESTED
Mon Jan 29 22:22:19 2018 DRLL <0000> abis_rsl.c:1205 (bts=0,trx=0,ts=0,ss=0) RSL RLL RELEASE REQ (link_id=0x00, reason=1)
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:2400 (bts=0,trx=0,ts=1,ss=0) IPAC_BIND speech_mode=0x10 RTP_PAYLOAD=3
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_api.c:443 POSTPONE MSC ASSIGN COMPL
Mon Jan 29 22:22:19 2018 DRLL <0000> abis_rsl.c:2177 (bts=0,trx=0,ts=0,ss=0) SAPI=0 RELEASE CONFIRMATION
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:2567 (bts=0,trx=0,ts=1,ss=0) IPAC_CRCX_ACK LOCAL_IP=127.0.0.2 LOCAL_PORT=9002 CON_ID=1 RTP_PAYLOAD2=0x03 
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_audio.c:55 Connecting BTS to port: 10002 conn: 1
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:2439 (bts=0,trx=0,ts=1,ss=0) IPAC_MDCX IP=127.0.0.3 PORT=10002 RTP_PAYLOAD=3 RTP_PAYLOAD2=3 CONN_ID=1 speech_mode=0x00
Mon Jan 29 22:22:19 2018 DRSL <0004> abis_rsl.c:2578 (bts=0,trx=0,ts=1,ss=0) IPAC_MDCX_ACK CON_ID=1 RTP_PAYLOAD2=0x03 REMOTE_IP=127.0.0.3 REMOTE_PORT=10002 
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_audio.c:94 Tx MSC ASSIGN COMPL (POSTPONED)
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_bssap.c:1055 Sending assignment complete message... (id=39)
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:326 Tx MSC ASSIGNMENT COMPL
Mon Jan 29 22:22:19 2018 DMSC <0008> osmo_bsc_sigtran.c:347 Sending connection (id=39) oriented data to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP (00 14 02 15 00 21 98 2c 01 40 01 7c 06 7f 00 00 03 2a fa 7e 01 80 )
Mon Jan 29 22:22:21 2018 DRSL <0004> abis_rsl.c:1771 (bts=0,trx=0,ts=0,ss=0) T3111 expired: releasing RF Channel
Mon Jan 29 22:22:21 2018 DRSL <0004> abis_rsl.c:873 (bts=0,trx=0,ts=0,ss=0) RF Channel Release
Mon Jan 29 22:22:21 2018 DRSL <0004> abis_rsl.c:943 (bts=0,trx=0,ts=0,ss=0) RF CHANNEL RELEASE ACK
Mon Jan 29 22:22:21 2018 DRSL <0004> abis_rsl.c:86 (bts=0,trx=0,ts=0,ss=0) state RELEASE REQUESTED -> NONE
Mon Jan 29 22:22:25 2018 DMSC <0008> fsm.c:182 SUBSCR_CONN[0x5631ee52f2f0]{WAIT_ASS_CMPL}: Timeout of T10
Mon Jan 29 22:22:25 2018 DMSC <0008> osmo_bsc_sigtran.c:326 Tx MSC ASSIGNMENT FAIL
Mon Jan 29 22:22:25 2018 DMSC <0008> osmo_bsc_sigtran.c:347 Sending connection (id=39) oriented data to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP (00 04 03 04 01 01 )
Mon Jan 29 22:22:25 2018 DMSC <0008> bsc_subscr_conn_fsm.c:447 SUBSCR_CONN[0x5631ee52f2f0]{WAIT_ASS_CMPL}: state_chg to ACTIVE
Mon Jan 29 22:22:31 2018 DMSC <0008> osmo_bsc_sigtran.c:204 N-DATA.ind(39, 00 0c 53 0a 09 01 00 01 02 03 04 05 06 07 )
Mon Jan 29 22:22:31 2018 DMSC <0008> a_reset.c:187 A-RESET(msc-0)[0x5631ee52d8f0]{CONN}: Received Event 2
Mon Jan 29 22:22:31 2018 DMSC <0008> osmo_bsc_bssap.c:917 Rx MSC DT1 BSSMAP CIPHER MODE CMD
Mon Jan 29 22:22:31 2018 DRSL <0004> bsc_api.c:774 (subscr unknown) Cipher Mode: cipher=0 key=0001020304050607 include_imeisv=1
Mon Jan 29 22:22:31 2018 DRLL <0000> abis_rsl.c:2177 (bts=0,trx=0,ts=1,ss=0) SAPI=0 DATA INDICATION
Mon Jan 29 22:22:31 2018 DMSC <0008> osmo_bsc_api.c:167 CIPHER MODE COMPLETE from MS, forwarding to MSC
Mon Jan 29 22:22:31 2018 DMSC <0008> osmo_bsc_sigtran.c:326 Tx MSC CIPHER MODE COMPLETE
Mon Jan 29 22:22:31 2018 DMSC <0008> osmo_bsc_sigtran.c:347 Sending connection (id=39) oriented data to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP (00 03 55 2c 01 )

As the log shows, this is related to the fact that T10 is expiring, and also due to the fact that the MGCP FSM is doing things that are none of its concern (such as delaying the ASSIGNMENT COMPLETE). This needs to be untangled.

20180129-osmobsc-assignment-compl-fail.pcap 20180129-osmobsc-assignment-compl-fail.pcap 4.96 KB pcap file showing the problem laforge, 01/29/2018 09:43 PM

History

#1 Updated by laforge over 1 year ago

  • Priority changed from Normal to Urgent

#2 Updated by laforge over 1 year ago

  • Assignee changed from sysmocom to dexter

#3 Updated by dexter over 1 year ago

I have checked this back. The current state of laforge/fsm is still uses the old implementation in osmo_bsc_mgcp.c, its very likely that this implementation produces wired side effects. However, I did not go too much into details there since the implementation is obsolete.

Instead I checked the code pathes in pmaier/fsm.

  • There we GSM0808_T10_TIMER_NR is used only once. Thats on the transition from ST_WAIT_CRCX_BTS to ST_WAIT_ASS_CMPL. In gscon_fsm_wait_ass_cmpl() there is no assignment complete generated so far. So there it can not generate an early ASSIGNMENT COMPLETE.
  • I also checked if it is possible to generate an assignment failure after the ASSIGNMENT COMPLETE has been sent. The ASSIGNMENT COMPLETE is generated by send_ass_compl(). The call is done when the MDCX for the MSC side completes in gscon_fsm_wait_crcx_msc(). Should the MGW time out we should never get a GSCON_EV_MGW_CRCX_RESP_MSC event and the ASSIGNMENT COMPLETE should never be generated. Instead the timeout handler takes responsibility of that and immediately sends the FSM to ST_ACTIVE. In case the MGW is just slow and we should get the GSCON_EV_MGW_CRCX_RESP_MSC evet in ST_ACTIVE, the event should be tossed and nothing should happen.

For now we should be safe but we should keep this open since the MGCP stuff is not yet sufficiently untangled from the MGCP handling.

#4 Updated by dexter over 1 year ago

Note: This problem should be fixed with the GSCON FSM.

#5 Updated by laforge over 1 year ago

please investigate if this is still an issue after the fsm branch has been merged. Might be good to add a test?

#6 Updated by dexter over 1 year ago

I had another look on this. The function send_ass_compl() is responsible for sending the ASSIGNMENT COMPLETE, while gsm0808_create_assignment_failure() followed by a sigtran_send() sends the ASSSIGNMENT FAILURE.

As far as I can see here the two functions are not called one after another in the code path of any state. Both functions are called when a state change to ST_ACTIVE happens, unless we are already in ST_ACTIVE, then the state remains unmodified. Once an assignment complete is done, the only way to get an assignment failure would be to perform another assignment which fails. Also important to note is that we change to ST_ACTIVE immediately after we call send_ass_compl(), there is no timeout that can strike and possibly cause an out of order ASSIGNMENT FAILURE.

From the TTCN3 test point of view we could simulate some failure cases for the ASSIGNMENT procedure. We would try all sorts of failures (MGCP) in order to see that the FSM always returns with an ASSIGNMENT FAILURE properly.

#7 Updated by dexter about 1 year ago

  • Status changed from New to Resolved

#8 Updated by pespin 10 months ago

The error cause of the message is "RR cause value: Protocol error unspecified (111)". As per "3GPP TS 04.08" "3.4.3.3 Abnormal cases":

On the mobile station side, if a lower layer failure happens on the new channel before the ASSIGNMENT COMPLETE 
message has been sent, the mobile station deactivates the new channels, reactivates the old channels, reconnects the 
TCHs if any and triggers the establishment of the main signalling link. It then sends a ASSIGNMENT FAILURE 
message, cause "protocol error unspecified" on the main DCCH and resumes the normal operation, as if no assignment 
attempt had occurred. The operational parameters (e.g. ciphering mode) when returning on the old channel are those 
applied before the procedure. 

Not sure what the issue may be though. Maybe the clocks of the 2 nanoBTS boards are not synced correctly? I am not sure how are the clocks calibrated or if they need calibration at all in nanoBTS.

#9 Updated by laforge 10 months ago

On Thu, Oct 04, 2018 at 10:29:29AM +0000, pespin [REDMINE] wrote:

Not sure what the issue may be though. Maybe the clocks of the 2 nanoBTS boards are not synced correctly? I am not sure how are the clocks calibrated or if they need calibration at all in nanoBTS.

they need to be calibrated, it can be triggered/executed via OML, similar to ipaccess-config.
I thought we had that implemented. It's part of the "perform test" functions, IIRC.

To my surprise, the wiki doesn't seem to contain any related information, which is sad.

However, in any case, the multi-TRX setup uses only one clock anyway as it is interconnected
by the TIB (timing interface bus). So for multi-TRX it doesn't matter.

However, for hand-over between two distinct BTSs, they should all have their OCXO
calibrated to the same reference (whether sysmoBTS or nanoBTS).

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)