Bug #3740

Figure out best fix for where some MS fails to register

Added by pespin 28 days ago.

Target version:
Start date:
Due date:
% Done:


Spec Reference:


By looking at osmo-gsm-tester_run_prod history of "gprs" testsuite when used with "sysmobts" in all its variant, it can be seen that test "iperf3m4" fails sporadically with at least 1 MS being unable to register to the network:
"osmo_gsm_tester.log.Error: Failed to find Network Operator {attempts=4, mcc_mnc=('901', '70')}"

Attached all required information to showcase what's going on: pcap file, osmo-gsm-tester log, etc.

These are the MS being used and their IMSIs:

gobi_1 901700000010978
gobi_2 901700000023847
gobi_3 901700000023848
gobi_4 901700000023849

Events as seen from pcap:
1- Some MS try to register with TMSI instead of IMSI and their are Identity Requested but they don't answer and close the channel. We don't care about this right now.
1- gobi_1 LU accepted (fine)
2- gobi_3 LU accepted (but the MS doesn't send back on the uplink a "TMSI Reallocation Complete" message, so channel is left open for 5 second, .45 to .50)
3- gobi_2 LU accepted (fine)
4- gobi_3 LU rejected (cause=congestion, a reject is sent at .50, frame-nr 1353, probably due to 5 seconds timeout without "TMSI Reallocation Complete")
5- gobi_4 accepted
6- gobi_3 tries LU again (probably after congestion period), but it's too late (test is already finishing) and never gets a response.

Why "TMSI Reallocation Complete" is never sent MS->BTS for gobi_3 modem:

According to timestamps in osmo-gsm-tester and pcap file, this is what happens:
1- ogt: 03:26:43.328617 - 03:26:44.403392: during this period of time, gobi_3 modem is set "Online", which means (found experimentally) it can start scanning and/or registering to any network starting from this point
2- pcap: 03:26:45.014113: RACH request from gobi_3 is received on Um (GSMTAP), then channel is allocated
3- pcap: 03:26:45.319336: LU Request from gobi_3 is received on Um (GSMTAP)
4- ogt: 03:26:45.414694: gobi_3 is set by ofono (by osmo-gsm-tester) to start scanning, which means it will start jumping arfcns.
5- pcap: 03:26:45.319336: LU Accept is sent to gobi_3 on Um (GSMTAP, fr-nr 1216). MS probably never receives it or doesn't send a TMSI Reallocation Update because it's already in scan mode.
6- pcap: 03:26:50.336382: After 5 seconds, MSC decides to LU Reject and tears down the channel

Interestingly, after start scanning, /gobi_3 is unable to find the network but all others modems can find it at roughtly the same time (before and after):

03:27:00.935287 tst                          /gobi_3: DBG: scanned operators:  [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': '', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})]  [trial-1698↪gprs:sysmo↪/gobi_3]  []
03:27:08.901940 tst                          /gobi_3: DBG: scanned operators:  [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': '', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})]  [trial-1698↪gprs:sysmo↪/gobi_3]  []
03:27:17.157086 tst                          /gobi_3: DBG: scanned operators:  [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': '', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})]  [trial-1698↪gprs:sysmo↪/gobi_3]  []
03:27:25.285414 tst                          /gobi_3: DBG: scanned operators:  [('/gobi_3/operator/26201', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': 'TDG', 'MobileNetworkCode': '01', 'MobileCountryCode': '262'}), ('/gobi_3/operator/26202', {'Status': 'forbidden', 'Technologies': ['gsm', 'umts', 'lte'], 'Name': '', 'MobileNetworkCode': '02', 'MobileCountryCode': '262'})]  [trial-1698↪gprs:sysmo↪/gobi_3]  []

However, it finally finds the network afterwards, at 03:27:27.537501 it sends a RACH req and at 03:27:27.688234000 it sends an LU Request (IMSI can be checked here).

So, after whole explanation, some questions regarding what's the best way to improve this:
1- Is behavior of the network correct? sending a Reject after an Accept if TMSI Reallocation Complete is not received.
2- As finally the MS seems to be able to find the network ,probably increasing the amount of scan+regsiter attempts in should be enough to have the MS finally registering to the network (see NETREG_MAX_REGISTER_ATTEMPTS = 3 there).

merged.pcap merged.pcap 231 KB pespin, 12/20/2018 05:17 PM
ogt.log.txt ogt.log.txt 130 KB pespin, 12/20/2018 05:36 PM

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)