Bug #4927

paging related osmo-pcu ttcn3 tests have plenty of sporadic failures

Added by laforge about 2 months ago. Updated about 1 month ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
-
Start date:
12/29/2020
Due date:
% Done:

0%

Spec Reference:

pcufail.png View pcufail.png 120 KB laforge, 12/29/2020 05:56 PM
4457

History

#1 Updated by laforge about 2 months ago

4457

#2 Updated by fixeria about 2 months ago

Jenkins is not always telling us why a given test case did not pass, so I did some analysis.

TTCN3-centos / build 218

https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-pcu-test/lastBuild/

TC_paging_ps_from_sgsn_sign_ptmsi

05:04:12.675654 263 BSSGP_Emulation.ttcnpp:1132 Dynamic test case error: Sending data on the connection of port BVC to 261:BVC failed. (Broken pipe)
05:04:12.675663 260 - Final verdict of PTC: none
05:04:12.675691 263 BSSGP_Emulation.ttcnpp:1132 setverdict(error): none -> error
05:04:12.675716 263 BSSGP_Emulation.ttcnpp:1132 Performing error recovery.

TC_paging_ps_from_sgsn_ptp

05:04:22.755280 283 - Terminating component type PCUIF_Components.RAW_PCUIF_CT.
05:04:22.755288 mtc GPRS_Components.ttcn:220 Connection of port BSSGP_GLOBAL[0] to 279:GLOBAL was closed unexpectedly by the peer.
05:04:22.755298 281 BSSGP_Emulation.ttcnpp:1132 Dynamic test case error: Sending data on the connection of port BVC to 279:BVC failed. (Broken pipe)
05:04:22.755305 283 - Removing unterminated mapping between port PCU and system:PCU.
05:04:22.755312 282 - Port NSE was stopped.
05:04:22.755314 279 - Disconnected from MC.
05:04:22.755318 mtc GPRS_Components.ttcn:220 Port BSSGP_GLOBAL[0] was disconnected from 279:GLOBAL.
05:04:22.755325 280 - Terminating component type NS_Emulation.NSVC_CT.
05:04:22.755325 282 - Removing unterminated mapping between port IPL4 and system:IPL4.
05:04:22.755327 281 BSSGP_Emulation.ttcnpp:1132 setverdict(error): none -> error
05:04:22.755332 279 - TTCN-3 Parallel Test Component finished.
05:04:22.755345 281 BSSGP_Emulation.ttcnpp:1132 Performing error recovery.

TTCN3-debian / builds 698, 699, 700

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test/698/testReport/(root)/PCU_Tests/TC_paging_cs_from_sgsn_ptp/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test/699/testReport/(root)/PCU_Tests/TC_paging_cs_from_sgsn_ptp/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test/700/testReport/(root)/PCU_Tests/TC_paging_cs_from_sgsn_sign/

Failed to match Packet Paging Request: { ctrl := { mac_hdr := { payload_type := MAC_PT_RLCMAC_NO_OPT (1), rrbp := RRBP_Nplus13_mod_2715648 (0), rrbp_valid := false, usf := 0 }, opt := omit, payload := { msg_type := PACKET_DL_DUMMY_CTRL (37), u := { dl_dummy := { page_mode := PAGE_MODE_NORMAL (0), persistence_levels_present := '0'B, persistence_levels := omit } } } } } vs { ctrl := { mac_hdr := { payload_type := MAC_PT_RLCMAC_NO_OPT (1), rrbp := ?, rrbp_valid := ?, usf := ? }, opt := *, payload := { msg_type := PACKET_PAGING_REQUEST (34), u := { paging := { page_mode := ?, persistence_levels_present := ?, persistence_levels := *, nln_present := ?, nln := *, repeated_pageinfo := *, repeated_pageinfo_term := '0'B } } } } }
      PCU_Tests.ttcn:3527 PCU_Tests control part
      PCU_Tests.ttcn:2391 TC_paging_cs_from_sgsn_sign testcase

This is caused by a race condition problem that I tried to fix in:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/20461 pcu/GPRS_Components: work around a race condition in f_rx_rlcmac_dl_block()

but never had time to finish. This change makes the situation even worse :/

#3 Updated by pespin about 2 months ago

The tests have looked more stable over the last few days; probably something was consuming resources and disrupting normal operation. Until now we were lucky that the tests mostly pass on the Jenkins slave. I agree this needs to be fixed, but it will require some dev time.

That's indeed a known problem, which I think we agreed should be solved by moving the PCU_Tests infrastructure to altsteps instead of functions, so that it can cope better with this kind of timing issue.

I started to play with some ideas regarding that in osmo-ttcn3-hacks.git branch "pespin/pcu-altstep" but nothing really usable yet. I'll probably need to discuss ideas with fixeria too since he's also more used to using altstep features right now.

#4 Updated by pespin about 1 month ago

  • Status changed from New to In Progress

This patch should hopefully fix most of the issues we see. They usually happen in tests that expect an Assignment Request on PACH, because the PCUIF RTS is sent too early, before the PCU has actually received the BSSGP message we sent to it.

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/22313
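To illustrate the race described above, here is a toy Python model. It is only a sketch, not the test suite code and not necessarily what the patch does. `ToyPcu`, `rx_block_single_rts()` and `rx_block_retry()` are hypothetical names; the only assumption is that a PCU which has not yet processed the BSSGP message answers an RTS with a dummy control block, as seen in the "Failed to match Packet Paging Request" log.

```python
from collections import deque

DUMMY = "PACKET_DL_DUMMY_CTRL"
PAGING = "PACKET_PAGING_REQUEST"


class ToyPcu:
    """Hypothetical stand-in for the PCU (not osmo-pcu code)."""

    def __init__(self):
        self._pending = deque()   # BSSGP messages sent but not yet processed
        self._dl_queue = deque()  # blocks ready to be scheduled on the next RTS

    def send_bssgp_paging(self):
        # The message is "on the wire"; the PCU has not handled it yet.
        self._pending.append(PAGING)

    def process_pending(self):
        # Models the PCU getting around to reading its BSSGP socket.
        while self._pending:
            self._dl_queue.append(self._pending.popleft())

    def rts(self):
        # With nothing queued, the PCU answers with a dummy control block.
        return self._dl_queue.popleft() if self._dl_queue else DUMMY


def rx_block_single_rts(pcu):
    """Racy variant: one RTS, whatever comes back is taken as the answer."""
    return pcu.rts()


def rx_block_retry(pcu, expected, max_rts=10):
    """Tolerant variant: keep sending RTS until the expected block shows up."""
    for _ in range(max_rts):
        block = pcu.rts()
        if block == expected:
            return block
        pcu.process_pending()  # models time passing between two RTS
    return None
```

In this model a single early RTS yields the dummy block, while the retrying variant eventually sees the paging request once the toy PCU has processed its pending message.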

The main problem with moving the current tests to altsteps is that BTS.receive() provides us with a PCUIF_Message matching tr_PCUIF_DATA_REQ() for all RLCMAC blocks, and then in a second step we need to call dec_RlcmacDlBlock(pcu_msg.u.data_req.data). We should instead be able to decode pcu_msg.u.data_req.data automatically and match the decoded blocks through altsteps that take templates.
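The idea of decoding on receive and then dispatching on the decoded block can be sketched in Python as follows. This is a rough analogy, not TTCN-3 and not the PCU_Tests code: `PcuifDataReq`, `DecodedBlock`, `decode()`, `receive_alt()` and `REPEAT` are hypothetical, and the toy decoder simply assumes the message type sits in the top six bits of the second octet. The real decoding is done by dec_RlcmacDlBlock().

```python
from collections import deque
from dataclasses import dataclass

PACKET_PAGING_REQUEST = 34  # values as shown in the log above
PACKET_DL_DUMMY_CTRL = 37
REPEAT = object()           # sentinel: keep waiting, similar in spirit to "repeat"


@dataclass
class PcuifDataReq:
    data: bytes             # raw RLC/MAC block as carried in the DATA.req


@dataclass
class DecodedBlock:
    msg_type: int
    raw: bytes


def decode(msg: PcuifDataReq) -> DecodedBlock:
    # Toy decoder (assumption): message type = top six bits of octet 2.
    return DecodedBlock(msg_type=msg.data[1] >> 2, raw=msg.data)


def receive_alt(queue, alternatives):
    """Decode each queued message once, then try the alternatives in order."""
    while queue:
        block = decode(queue.popleft())
        for matches, handler in alternatives:
            if matches(block):
                result = handler(block)
                if result is not REPEAT:
                    return result
                break  # handled, but keep waiting for the next message
    return None
```

A caller would then list what it expects and what it is willing to skip, for example a paging request as the terminating alternative and dummy control blocks as a repeating one, instead of decoding by hand after every receive.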
