Bug #4703
closedttcn3-bsc-test-latest: 108 of 196 tests fail reproducably since build nr 673
100%
Description
I recently broke the latest tests by introducing vty handling bugs (build 671 and 672),
and then after I fixed that, I see a different massive test failure of 'latest' osmo-bsc.
This time it is reproducable (673, 674) and seems unrelated to the vty, at least at first glance.
Updated by neels over 3 years ago
It seems that sometimes osmo-bsc drops a message because it has no MSC connection:
var octetstring l3 := '00010203040506'O f_ipa_tx(0, ts_RSL_EST_IND(chan_nr, valueof(ts_RslLinkID_DCCH(0)), l3));
20200812212927495 DCHAN DEBUG lchan(0-0-0-CCCH_SDCCH4-0)[0x55f9b4f65630]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) SAPI=0 ESTABLISH INDICATION (abis_rsl.c:1647) 20200812212927495 DCHAN DEBUG lchan(0-0-0-CCCH_SDCCH4-0)[0x55f9b4f65630]{WAIT_RLL_RTP_ESTABLISH}: Received Event LCHAN_EV_RLL_ESTABLISH_IND (abis_rsl.c:1680) 20200812212927495 DCHAN DEBUG lchan(0-0-0-CCCH_SDCCH4-0)[0x55f9b4f65630]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED (lchan_fsm.c:805) 20200812212927495 DMSC DEBUG SUBSCR_CONN[0x55f9b4f64480]{INIT}: Allocated (fsm.c:461) 20200812212927495 DLCLS DEBUG LCLS[0x55f9b4f64640]{NO_LCLS}: Allocated (fsm.c:461) 20200812212927495 DLCLS DEBUG LCLS[0x55f9b4f64640]{NO_LCLS}: is child of SUBSCR_CONN[0x55f9b4f64480] (fsm.c:491) 20200812212927495 DMSC INFO Tx MSC COMPL L3 (gsm_08_08.c:369) 20200812212927495 DMSC NOTICE Initializing resources for new SIGTRAN connection to MSC: RI=NONE... (osmo_bsc_sigtran.c:306) 20200812212927495 DMSC ERROR MSC is not connected. Dropping. (osmo_bsc_sigtran.c:309)
Then the BSSAP.receive(tr_BSSAP_CONNECT_ind(?, ?, tr_BSSMAP_ComplL3(l3))); of course waits for ever for the COMPL L3 that was dropped.
Now to figure out why osmo-bsc latest sporadically considers MSCs as not connected.
Interesting is that the target MSC is usually is indicated by the SCCP address, but here it just says "RI=NONE":
20200812212927495 DMSC NOTICE Initializing resources for new SIGTRAN connection to MSC: RI=NONE... (osmo_bsc_sigtran.c:306)
The successful RESET->ACK happend some two seconds earlier:
20200812212925456 DMSC INFO Rx MSC UDT BSSMAP RESET (osmo_bsc_bssap.c:1013) 20200812212925456 DMSC NOTICE RESET from MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP (osmo_bsc_bssap.c:96) 20200812212925456 DMSC NOTICE Sending RESET ACK to MSC: RI=SSN_PC,PC=0.23.1,SSN=BSSAP (osmo_bsc_sigtran.c:121)
So I would have expected the msc to print PC=0.23.1 upon forwarding the EST IND.
Instead osmo-bsc logs an empty SCCP address and concludes that that MSC is not connected.
(this log is from a manual test, but it is also visible in e.g. https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/674/artifact/logs/bsc-tester/BSC_Tests.TC_assignment_codec_amr_f.pcap.gz looking at 'gsm_abis_rsl || bssap || gsmtap_log.ident == OsmoBSC' )
Updated by neels over 3 years ago
OK damn, I figured it out. The failure DOES originate from the VTY changes in this way:
I set 'msc N' / 'allow-attach' flags and rely on the VTY handling code to ignore the 'allow-attach' command, which is not yet in the 'latest' build.
However, to get there, I enter the 'msc 0', 'msc 1' and 'msc 2' config scopes,
which actually then creates three MSCs to be available to osmo-bsc.
Since osmo-bsc latest does not yet properly ignore MSCs that are not even connected, the tests try to contact msc 1 and msc 2 in a sort of round-robin way,
i.e. establishing a new L3 fails two out of three times.
Now figuring out the simplest way to finally get a working 'latest' build...
Updated by neels over 3 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100