Project

General

Profile

Bug #4833

ttcn3-pcu-test-sns timeout on PCU connect

Added by lynxis about 1 month ago. Updated 29 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
10/26/2020
Due date:
% Done:

100%

Spec Reference:

Description

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test-sns/603/

The pcap files of RACH and chg_weight are empty.

History

#1 Updated by lynxis about 1 month ago

  • Subject changed from ttcn3-pcu-test-sns crashs to ttcn3-pcu-test-sns timeout on PCU connect

The ttcn3 tests doesn't crash. The PCU just doesn't connect.
Both tests are on the end of the test case. Maybe the respawn.sh already exited?

#2 Updated by lynxis about 1 month ago

Somehow the PCU is stuck at 20201026052041312 -> 05:20:41
but the test is 05:21:45.409276.

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test-sns/603/

#3 Updated by lynxis about 1 month ago

  • Status changed from New to Stalled

centos is fine. only debian builds fail.

I've no idea what's the problem is. Maybe we should change the order of the tests? Maybe some other jenkins process is killing the process? We could make the pcu container persistent so we can collect the logs.

#4 Updated by pespin about 1 month ago

Using my own osmo-pcu branch pespin/alloc (so some line numbers may change a bit from master), rebuilt everything using --enable-asan.

20201027114414575 DLNS DEBUG NSEI=1234 Tx NS ALIVE (NSVCI=0) (gprs_ns2_message.c:373)
20201027114414575 DLNS DEBUG NSEI=1234 Tx NS ALIVE (NSVCI=0) (gprs_ns2_message.c:373)
20201027114414577 DLNS DEBUG GPRS-NS2-VC[0x612000007720]{UNBLOCKED}: Received Event ALIVE_ACK (gprs_ns2_vc_fsm.c:644)
20201027114414579 DBSSGP INFO Sending reset on BVCI 0 (gprs_bssgp_pcu.cpp:894)
20201027114414579 DBSSGP NOTICE BSSGP (BVCI=0) Tx BVC-RESET CAUSE=O&M intervention (gprs_bssgp_bss.c:302)
20201027114414580 DLNS DEBUG GPRS-NS2-VC[0x612000007a20]{UNBLOCKED}: Received Event ALIVE_ACK (gprs_ns2_vc_fsm.c:644)
20201027114414590 DLNS DEBUG GPRS-NS2-VC[0x612000007a20]{UNBLOCKED}: Received Event UNITDATA (gprs_ns2_vc_fsm.c:647)
20201027114414590 DBSSGP DEBUG rx BVCI_SIGNALLING gprs_bssgp_rx_sign (gprs_bssgp_pcu.cpp:473)
20201027114414590 DBSSGP NOTICE Rx BSSGP BVCI=0 (SIGN) BVC_RESET_ACK (gprs_bssgp_pcu.cpp:342)
20201027114414590 DBSSGP INFO Sending reset on BVCI 1234 (gprs_bssgp_pcu.cpp:903)
20201027114414590 DBSSGP NOTICE BSSGP (BVCI=1234) Tx BVC-RESET CAUSE=O&M intervention (gprs_bssgp_bss.c:302)
20201027114414595 DLNS DEBUG GPRS-NS2-VC[0x612000007a20]{UNBLOCKED}: Received Event UNITDATA (gprs_ns2_vc_fsm.c:647)
20201027114414595 DBSSGP DEBUG rx BVCI_SIGNALLING gprs_bssgp_rx_sign (gprs_bssgp_pcu.cpp:473)
20201027114414595 DBSSGP NOTICE Rx BSSGP BVCI=0 (SIGN) BVC_RESET_ACK (gprs_bssgp_pcu.cpp:342)
20201027114414595 DBSSGP INFO Sending unblock on BVCI 1234 (gprs_bssgp_pcu.cpp:912)
20201027114414595 DBSSGP NOTICE BSSGP (BVCI=1234) Tx BVC-UNBLOCK (gprs_bssgp_bss.c:282)
20201027114414599 DLNS DEBUG GPRS-NS2-VC[0x612000007a20]{UNBLOCKED}: Received Event UNITDATA (gprs_ns2_vc_fsm.c:647)
20201027114414599 DBSSGP DEBUG rx BVCI_SIGNALLING gprs_bssgp_rx_sign (gprs_bssgp_pcu.cpp:473)
20201027114414599 DBSSGP NOTICE Rx BSSGP BVCI=0 (SIGN) BVC_UNBLOCK_ACK (gprs_bssgp_pcu.cpp:356)
20201027114414599 DBSSGP DEBUG Sending flow control info on BVCI 1234 (gprs_bssgp_pcu.cpp:920)
20201027114414600 DBSSGP DEBUG Computed BVC leak rate = 59200, num_pdch = 8, cs = MCS-9 (gprs_bssgp_pcu.cpp:829)
20201027114414600 DBSSGP DEBUG Computed MS default leak rate = 29600, ms_num_pdch = 4, cs = MCS-9 (gprs_bssgp_pcu.cpp:853)
20201027114414600 DBSSGP DEBUG Sending FLOW CONTROL BVC, Bmax = 592000, R = 59200, Bmax_MS = 296000, R_MS = 29600, avg_dly = 0 (gprs_bssgp_pcu.cpp:882)
20201027114414604 DLNS DEBUG GPRS-NS2-VC[0x612000007720]{UNBLOCKED}: Received Event UNITDATA (gprs_ns2_vc_fsm.c:647)
20201027114414605 DBSSGP DEBUG rx BVCI_PTP=1234 gprs_bssgp_rx_ptp (gprs_bssgp_pcu.cpp:483)
20201027114414605 DBSSGP DEBUG Rx BSSGP BVCI=1234 (PTP) FLOW-CONTROL-BVC-ACK (gprs_bssgp_pcu.cpp:280)
20201027114414606 DLNS DEBUG NSEI=1234 Rx SNS PDU type SNS-DELETE (gprs_ns2_sns.c:1389)
20201027114414606 DLNS DEBUG GPRS-NS2-SNS-BSS[0x6120000078a0]{CONFIGURED}: Received Event DELETE (gprs_ns2_sns.c:1414)
20201027114414606 DLNS INFO GPRS-NS2-SNS-BSS[0x6120000078a0]{CONFIGURED}: DELETE NS-VC udp)[0.0.0.0]:23000<>[172.18.14.10]:23001 (gprs_ns2_sns.c:580)
20201027114414606 DPCU NOTICE NS-NSE 1234 became unavailable (gprs_bssgp_pcu.cpp:571)
20201027114414606 DLNS DEBUG GPRS-NS2-VC[0x612000007720]{UNBLOCKED}: Terminating (cause = OSMO_FSM_TERM_REQUEST) (gprs_ns2.c:425)
20201027114414607 DLNS DEBUG GPRS-NS2-VC[0x612000007720]{UNBLOCKED}: Freeing instance (gprs_ns2.c:425)
20201027114414607 DLNS DEBUG GPRS-NS2-VC[0x612000007720]{UNBLOCKED}: Deallocated (fsm.c:573)
=================================================================
==7==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000002a00 at pc 0x7f057a437430 bp 0x7ffe594663d0 sp 0x7ffe594663c8
READ of size 8 at 0x611000002a00 thread T0
    #0 0x7f057a43742f in rb_first /tmp/libosmocore/src/rbtree.c:296
    #1 0x7f057a3f3ebf in osmo_timers_prepare /tmp/libosmocore/src/timer.c:226
    #2 0x7f057a3f5934 in _osmo_select_main /tmp/libosmocore/src/select.c:396
    #3 0x7f057a3f8bda in osmo_select_main /tmp/libosmocore/src/select.c:417
    #4 0x55abf125d41a in main (/usr/local/bin/osmo-pcu+0x1b041a)
    #5 0x7f05787f72e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #6 0x55abf125a829 in _start (/usr/local/bin/osmo-pcu+0x1ad829)

0x611000002a00 is located 128 bytes inside of 216-byte region [0x611000002980,0x611000002a58)
freed by thread T0 here:
    #0 0x7f057be24a10 in free (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1a10)
    #1 0x7f057afa1712  (/usr/lib/x86_64-linux-gnu/libtalloc.so.2+0x2712)

previously allocated by thread T0 here:
    #0 0x7f057be24d28 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.3+0xc1d28)
    #1 0x7f057afa5acd in _talloc_zero (/usr/lib/x86_64-linux-gnu/libtalloc.so.2+0x6acd)

SUMMARY: AddressSanitizer: heap-use-after-free /tmp/libosmocore/src/rbtree.c:296 in rb_first
Shadow bytes around the buggy address:
  0x0c227fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8500: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c227fff8510: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8520: fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8530: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c227fff8540:[fd]fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c227fff8550: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c227fff8560: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8570: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8580: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8590: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==7==ABORTING
0: stopped pid 7 with status 1

#5 Updated by pespin about 1 month ago

So I think it's clear there's some timer being freed but not being removed from the osmocom loop's timer list.

Probably missing a osmo_timer_del() somewhere in this code path:

do_sns_delete()
  gprs_ns2_free_nsvc()
    ns2_prim_status_ind(nsvc->nse, 0, NS_AFF_CAUSE_VC_FAILURE); -> calls osmo-pcu gprs_ns_prim_cb() -> gprs_ns_prim_status_cb()
    ns2_nse_notify_unblocked(nsvc, false);
    ns2_sns_free_nsvc(nsvc);
    osmo_fsm_inst_term(nsvc->fi, OSMO_FSM_TERM_REQUEST, NULL);
    nsvc->bind->free_vc(nsvc);

#6 Updated by pespin about 1 month ago

  • Status changed from Stalled to In Progress
  • Assignee set to lynxis

Issue seems to be fixed by proposed patch:
https://gerrit.osmocom.org/c/libosmocore/+/20933

I also submitted a couple patches fixing up smaller issues I noticed:
https://gerrit.osmocom.org/c/libosmocore/+/20934 gb: ns2_sns: Fix missing trailing newline char in log line
https://gerrit.osmocom.org/c/libosmocore/+/20935 gb: ns2_sns: Add missing value_string entry for GPRS_SNS_EV_NO_NSVC

Also detected by ASan and not yet fixed, on every INFO_IND received in PCUIF:

pcu_l1_if.cpp:510:56: runtime error: reference binding to misaligned address 0x7fff10a2079e for type 'const struct in_addr', which requires 4 byte alignment
0x7fff10a2079e: note: pointer points here
00 00 04 00 ac 12  0e 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00
            ^
pcu_l1_if.cpp:510:29: runtime error: load of misaligned address 0x7fff10a2079e for type 'const struct in_addr', which requires 4 byte alignment
0x7fff10a2079e: note: pointer points here
00 00 04 00 ac 12  0e 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00
            ^

#7 Updated by pespin 29 days ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Later Asan runtime error fixed in 87c6dd3c265df114357dda6c3e601f4754689b9f.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)