Project

General

Profile

Actions

Bug #5205

closed

Crash: SEGV on current master 945be910

Added by iedemam over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
08/06/2021
Due date:
% Done:

100%

Spec Reference:

Description

Let me know if I can provide any additional information. Thanks!

Log Tail

<0008> tbf.cpp:629 TBF(TFI=26 TLLI=0x7f1bd331 DIR=DL STATE=FLOW EGPRS) poll timeout for FN=2006324, TS=4 (curr FN 2006329)
<0008> tbf.cpp:699 TBF(TFI=26 TLLI=0x7f1bd331 DIR=DL STATE=FLOW EGPRS) Timeout for polling PACKET DOWNLINK ACK: |Assignment was on PACCH|No downlink ACK received yet|
<0008> tbf.cpp:407 TBF(TFI=26 TLLI=0x7f1bd331 DIR=DL STATE=FLOW EGPRS) N3105 exceeded MAX (8)
<0002> bts.cpp:303 Detected FN jump! 2006329 -> 2006337
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006337 but previous FN=2006329 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=2006329): TBF(TFI=0 TLLI=0x7d36bb1f DIR=UL STATE=ASSIGN)
<0008> tbf.cpp:629 TBF(TFI=0 TLLI=0x7d36bb1f DIR=UL STATE=ASSIGN) poll timeout for FN=2006329, TS=7 (curr FN 2006337)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006337 but previous FN=2006333 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006316 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006320 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006324 is still reserved!
<0008> tbf.cpp:407 TBF(TFI=2 TLLI=0x8b201f16 DIR=UL STATE=FLOW EGPRS) N3101 exceeded MAX (10)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006329 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006333 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006337 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006342 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006350 but previous FN=2006346 is still reserved!
<0005> pdch.cpp:965 PDCH(bts=0,trx=1,ts=7) FN=2006355 Rx UL DATA from unexpected TBF(TFI=13 TLLI=0x7a7bdfa1 DIR=UL STATE=FLOW EGPRS) vs expected TBF(TFI=10 TLLI=0x78ecd326 DIR=UL STATE=FLOW EGPRS)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006355 but previous FN=2006350 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=2006350): TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS)
<0008> tbf.cpp:629 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS) poll timeout for FN=2006350, TS=7 (curr FN 2006355)
<0008> tbf.cpp:699 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS) Timeout for polling PACKET DOWNLINK ACK: |Assignment was on PACCH|Downlink ACK was received|
<0005> pdch.cpp:965 PDCH(bts=0,trx=1,ts=7) FN=2006359 Rx UL DATA from unexpected TBF(TFI=13 TLLI=0x7a7bdfa1 DIR=UL STATE=FINISHED EGPRS) vs expected TBF(TFI=10 TLLI=0x78ecd326 DIR=UL STATE=FLOW EGPRS)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006368 but previous FN=2006355 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006368 but previous FN=2006359 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006368 but previous FN=2006363 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=2006372 but previous FN=2006368 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006376 but previous FN=2006372 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=2006372): TBF(TFI=0 TLLI=0x7d36bb1f DIR=UL STATE=ASSIGN)
<0008> tbf.cpp:629 TBF(TFI=0 TLLI=0x7d36bb1f DIR=UL STATE=ASSIGN) poll timeout for FN=2006372, TS=7 (curr FN 2006376)
<0008> tbf.cpp:407 TBF(TFI=0 TLLI=0x7d36bb1f DIR=UL STATE=ASSIGN) N3105 exceeded MAX (8)
<0008> ./tbf.h:376 TBF(DL-TFI_9)[7bff80]{ASSIGN}: transition to state ASSIGN not permitted!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=2006394 but previous FN=2006376 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=4) Timeout for registered POLL (FN=2006376): TBF(TFI=8 TLLI=0x7f818670 DIR=UL STATE=FLOW EGPRS)
<0008> tbf.cpp:629 TBF(TFI=8 TLLI=0x7f818670 DIR=UL STATE=FLOW EGPRS) poll timeout for FN=2006376, TS=4 (curr FN 2006394)
<0008> tbf.cpp:673 TBF(TFI=8 TLLI=0x7f818670 DIR=UL STATE=FLOW EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT: |Assignment was on PACCH|No uplink data received yet|
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=2006394 but previous FN=2006381 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=2006394 but previous FN=2006385 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=2006394 but previous FN=2006389 is still reserved!
<0002> pdch_ul_controller.c:159 PDCH(bts=0,trx=1,ts=7) FN=2006415 Failed allocating POLL, all RRBP values are already reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006398 but previous FN=2006394 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=2006394): TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS)
<0008> tbf.cpp:629 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS) poll timeout for FN=2006394, TS=7 (curr FN 2006398)
<0008> tbf.cpp:699 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS) Timeout for polling PACKET DOWNLINK ACK: |Assignment was on PACCH|Downlink ACK was received|
<0008> tbf.cpp:407 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=FLOW EGPRS) N3105 exceeded MAX (8)
<0002> pdch_ul_controller.c:159 PDCH(bts=0,trx=1,ts=7) FN=2006420 Failed allocating POLL, all RRBP values are already reserved!
<0002> bts.cpp:610 No TFI available (suggested TRX: -1).
<0002> gprs_rlcmac_ts_alloc.cpp:894 [DL] algo B <single> (suggested TRX: -1): failed to allocate a TFI
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0xb2be855e DIR=DL STATE=NULL EGPRS) Timeslot Allocation failed: trx = -1, single_slot = 1
<0008> tbf_dl.cpp:263 MS(TLLI=0xb2be855e, IMSI=000, TA=0, 12/12,) No PDCH resource
<0002> pdch_ul_controller.c:159 PDCH(bts=0,trx=1,ts=7) FN=2006424 Failed allocating POLL, all RRBP values are already reserved!
<0008> tbf.cpp:458 TBF(TFI=5 TLLI=0xf4297700 DIR=DL STATE=WAIT_RELEASE EGPRS) T3193 timeout expired, freeing TBF
<0008> tbf.cpp:465 TBF(TFI=5 TLLI=0xf4297700 DIR=DL STATE=WAIT_RELEASE EGPRS) T3193 timeout expired, freeing TBF
<0002> pdch_ul_controller.c:159 PDCH(bts=0,trx=1,ts=7) FN=2006428 Failed allocating POLL, all RRBP values are already reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006411 but previous FN=2006407 is still reserved!
<0008> tbf.cpp:407 TBF(TFI=10 TLLI=0x78ecd326 DIR=UL STATE=FLOW EGPRS) N3101 exceeded MAX (10)
<0008> tbf.cpp:458 TBF(TFI=11 TLLI=0xb87baa1d DIR=DL STATE=WAIT_RELEASE) T3193 timeout expired, freeing TBF
<0008> tbf.cpp:465 TBF(TFI=11 TLLI=0xb87baa1d DIR=DL STATE=WAIT_RELEASE) T3193 timeout expired, freeing TBF
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=2006428 but previous FN=2006372 is still reserved!
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0x7f4888cd DIR=UL STATE=NULL EGPRS) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0x7f4888cd, IMSI=, TA=0, 12/12,) No PDCH resource
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006428 but previous FN=2006420 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=2006420): TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=RELEASING EGPRS)
<0008> tbf.cpp:629 TBF(TFI=22 TLLI=0x78d1db92 DIR=DL STATE=RELEASING EGPRS) poll timeout for FN=2006420, TS=7 (curr FN 2006428)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=2006428 but previous FN=2006424 is still reserved!
<0008> tbf.cpp:793 TBF(TFI=0 TLLI=0x7f4888cd DIR=UL STATE=ASSIGN) releasing due to PACCH assignment timeout.
<0002> pdch_ul_controller.c:336 PDCH(bts=0,trx=1,ts=7) Timeout for registered SBA (FN=2006433, TA=0)
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0x79e66022 DIR=UL STATE=NULL) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0x79e66022, IMSI=, TA=0, 12/0,) No PDCH resource
<0008> tbf.cpp:793 TBF(TFI=0 TLLI=0x79e66022 DIR=UL STATE=ASSIGN) releasing due to PACCH assignment timeout.
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0xa89db400 DIR=UL STATE=NULL EGPRS) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0xa89db400, IMSI=, TA=0, 12/12,) No PDCH resource
<0008> tbf.cpp:793 TBF(TFI=0 TLLI=0xa89db400 DIR=UL STATE=ASSIGN) releasing due to PACCH assignment timeout.
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0xa3ddab60 DIR=UL STATE=NULL EGPRS) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0xa3ddab60, IMSI=, TA=0, 12/12,) No PDCH resource
<0008> tbf.cpp:793 TBF(TFI=0 TLLI=0xa3ddab60 DIR=UL STATE=ASSIGN) releasing due to PACCH assignment timeout.
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0xbd18e600 DIR=UL STATE=NULL EGPRS) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0xbd18e600, IMSI=000, TA=0, 12/12, DL) No PDCH resource
<0002> gprs_rlcmac_ts_alloc.cpp:776 No USF available
<0008> tbf.cpp:746 TBF(TFI=0 TLLI=0x7f818670 DIR=UL STATE=NULL EGPRS) Timeslot Allocation failed: trx = 1, single_slot = 0
<0008> tbf_ul.cpp:151 MS(TLLI=0x7f818670, IMSI=618010113135356, TA=0, 12/12, DL) No PDCH resource

Backtrace

Program received signal SIGSEGV, Segmentation fault.
gprs_rlcmac_pdch::rcv_control_egprs_dl_ack_nack (this=this@entry=0x7af720, ack_nack=ack_nack@entry=0x8641b0, fn=fn@entry=2006476, meas=meas@entry=0x7fffffffe2a0) at pdch.cpp:525
525    pdch.cpp: No such file or directory.
#0  gprs_rlcmac_pdch::rcv_control_egprs_dl_ack_nack (this=this@entry=0x7af720, ack_nack=ack_nack@entry=0x8641b0, fn=fn@entry=2006476, meas=meas@entry=0x7fffffffe2a0) at pdch.cpp:525
#1  0x00000000004297c2 in gprs_rlcmac_pdch::rcv_control_block (this=0x7af720, data=0x7fffffffe326 "A!\213\376\250!\354\002", data_len=<optimized out>, fn=2006476, meas=0x7fffffffe2a0, cs=<optimized out>) at pdch.cpp:845
#2  0x0000000000429ef4 in gprs_rlcmac_pdch::rcv_block_gprs (this=this@entry=0x7af720, data=data@entry=0x7fffffffe326 "A!\213\376\250!\354\002", data_len=<optimized out>, fn=fn@entry=2006476, meas=meas@entry=0x7fffffffe2a0, cs=<optimized out>) at pdch.cpp:1001
#3  0x000000000042a114 in gprs_rlcmac_pdch::rcv_block (this=this@entry=0x7af720, data=data@entry=0x7fffffffe326 "A!\213\376\250!\354\002", len=<optimized out>, fn=fn@entry=2006476, meas=meas@entry=0x7fffffffe2a0) at pdch.cpp:897
#4  0x000000000040fefd in pcu_rx_data_ind_pdtch (bts=bts@entry=0x7ad200, pdch=pdch@entry=0x7af720, data=data@entry=0x7fffffffe326 "A!\213\376\250!\354\002", len=<optimized out>, fn=2006476, meas=meas@entry=0x7fffffffe2a0) at pcu_l1_if.cpp:284
#5  0x0000000000410834 in pcu_rx_data_ind (data_ind=0x7fffffffe324, bts=0x7ad200) at pcu_l1_if.cpp:418
#6  pcu_rx (pcu_prim=pcu_prim@entry=0x7fffffffe320, pcu_prim_length=<optimized out>) at pcu_l1_if.cpp:992
#7  0x0000000000431399 in pcu_sock_read (bfd=<optimized out>) at osmobts_sock.c:156
#8  0x0000000000431585 in pcu_sock_cb (bfd=0x6974e0 <pcu_sock_state>, flags=1) at osmobts_sock.c:211
#9  0x00007ffff6c8e78c in ?? () from /usr/lib64/libosmocore.so.17
#10 0x00007ffff6c8e836 in osmo_select_main () from /usr/lib64/libosmocore.so.17
#11 0x0000000000406bf5 in main (argc=1, argv=0x7fffffffecd8) at pcu_main.cpp:329
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {128672, 0, 160, 0, 70560, 0, 47148128, 0, -1447626532, 32767, 1214498954, 32735, 1, 0, 160, 0, 36003808, 0, 45862720, 0, 44, 0, 6622689, 0, 8149536, 0, 44527648, 0}, _kill = {si_pid = 128672, si_uid = 0}, _timer = {si_tid = 128672, si_overrun = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _rt = {si_pid = 128672, si_uid = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _sigchld = {si_pid = 128672, si_uid = 0, si_status = 160, si_utime = 303052892405760, si_stime = 202499667827621888}, _sigfault = {si_addr = 0x1f6a0, _addr_lsb = 160, _addr_bnd = {_lower = 0x113a0, _upper = 0x2cf6c60}}, _sigpoll = {si_band = 128672, si_fd = 160}}}

Possible Fix

diff --git a/src/pdch.cpp b/src/pdch.cpp
index 2ec40ce..b4fced7 100644
--- a/src/pdch.cpp
+++ b/src/pdch.cpp
@@ -452,6 +452,7 @@ void gprs_rlcmac_pdch::rcv_control_dl_ack_nack(Packet_Downlink_Ack_Nack_t *ack_n
         return;
     }
     tbf = as_dl_tbf(poll->tbf_poll.poll_tbf);
+    if (!tbf) return;
     if (tbf->tfi() != tfi) {
         LOGPTBFDL(tbf, LOGL_NOTICE,
               "PACKET DOWNLINK ACK with wrong TFI=%d, ignoring!\n", tfi);
@@ -522,6 +523,7 @@ void gprs_rlcmac_pdch::rcv_control_egprs_dl_ack_nack(EGPRS_PD_AckNack_t *ack_nac
         return;
     }
     tbf = as_dl_tbf(poll->tbf_poll.poll_tbf);
+    if (!tbf) return;
     if (tbf->tfi() != tfi) {
         LOGPDCH(this, DRLCMAC, LOGL_NOTICE, "EGPRS PACKET DOWNLINK ACK with " 
             "wrong TFI=%d, ignoring!\n", tfi);
Actions #1

Updated by laforge over 2 years ago

  • Assignee set to neels
Actions #2

Updated by laforge over 2 years ago

Actions #3

Updated by neels over 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 70

submitted a patch to fix the immediate crash cause: https://gerrit.osmocom.org/c/osmo-pcu/+/25182

I haven't identified any higher level reason why the DL TBF would be NULL even though it apparently is always expected non-NULL.
Anyway, not crashing the entire PCU is a good thing, maybe we still need to figure out a higher level error later on.

Actions #4

Updated by laforge over 2 years ago

  • Assignee changed from neels to pespin
Actions #5

Updated by pespin over 2 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from pespin to iedemam

As far as I know this should be fixed in master versions of osmo-pcu. I asked iedemam a few days ago to test it with a recent version of osmo-pcu after the FSM refactorings where merged, and afair the issue was solved.

Assigning to him for feedback.

Actions #6

Updated by iedemam over 2 years ago

I think this crash is still an issue. Here is a trace from a version which includes all submitted fixes.

# osmo-pcu --version
OsmoPCU version 0.9.0-35d51ca4
<0008> tbf.cpp:381 TBF(TFI=5 TLLI=0xb9ce6756 DIR=UL STATE=FLOW EGPRS) N3101 exceeded MAX (10)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216522 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216527 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216531 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216535 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216540 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216544 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216548 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216553 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1216557 but previous FN=1216553 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1216553): TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=1216553, TS=7 (curr FN 1216557)
<0008> tbf_dl_ass_fsm.c:196 TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT: |Assignment was on PACCH|No downlink ACK received yet|

Program received signal SIGSEGV, Segmentation fault.
gprs_rlcmac_pdch::rcv_control_egprs_dl_ack_nack (this=this@entry=0x7af430, ack_nack=ack_nack@entry=0x8e6e50, fn=fn@entry=1216561, meas=meas@entry=0x7fffffffe290) at pdch.cpp:511
511    pdch.cpp: No such file or directory.
No symbol "all" in current context.
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {128672, 0, 160, 0, 89408, 0, 40297152, 0, 1666373180, 32765, 682903690, 32695, 1, 0, 160, 0, 29012848, 0, 35819584, 0, 44, 0, 6622689, 0, 8149536, 0, 40171264, 0}, _kill = {si_pid = 128672, si_uid = 0}, _timer = {si_tid = 128672, si_overrun = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _rt = {si_pid = 128672, si_uid = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _sigchld = {si_pid = 128672, si_uid = 0, si_status = 160, si_utime = 384004436000768, si_stime = 173074949961940992}, _sigfault = {si_addr = 0x1f6a0, _addr_lsb = 160, _addr_bnd = {_lower = 0x15d40, _upper = 0x266e2c0}}, _sigpoll = {si_band = 128672, si_fd = 160}}}

It looks like another unprotected tbf read is occurring? That section of code is:

    tbf = as_dl_tbf(poll->tbf_poll.poll_tbf);
    if (tbf->tfi() != tfi) {
        LOGPDCH(this, DRLCMAC, LOGL_NOTICE, "EGPRS PACKET DOWNLINK ACK with " 
            "wrong TFI=%d, ignoring!\n", tfi);
        return;
    }
Actions #7

Updated by iedemam over 2 years ago

Same version, a new type of SEGV.

<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=5) Expiring FN=1369155 but previous FN=1369103 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=5) Expiring FN=1369155 but previous FN=1369108 is still reserved!
<0008> tbf.cpp:431 TBF(TFI=25 TLLI=0x959e1446 DIR=DL STATE=RELEASING EGPRS) T3191 timeout expired, freeing TBF: |Assignment was on PACCH|Downlink ACK was received|
<0008> tbf_fsm.c:363 TBF(TFI=3 TLLI=0xa3415a40 DIR=UL STATE=ASSIGN EGPRS) releasing due to PACCH assignment timeout.
<0002> bts.cpp:314 Detected FN jump! 1369160 -> 1369173
<0008> tbf.cpp:431 TBF(TFI=6 TLLI=0x8cbbee1e DIR=DL STATE=RELEASING EGPRS) T3191 timeout expired, freeing TBF: |Assignment was on PACCH|Downlink ACK was received|
<0002> bts.cpp:314 Detected FN jump! 1369173 -> 1369207
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=1369207 but previous FN=1369160 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=4) Timeout for registered POLL (FN=1369160): TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS
<0008> tbf.cpp:538 TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS poll timeout for FN=1369160, TS=4 (curr FN 1369207)
<0008> tbf_ul_ass_fsm.c:224 TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS Timeout for polling PACKET CONTROL ACK for PACKET UPLINK ASSIGNMENT: |Assignment was on PACCH|Downlink ACK was received|
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1369207 but previous FN=1369181 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=6) Timeout for registered POLL (FN=1369181): TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=1369181, TS=6 (curr FN 1369207)
<0008> tbf_dl_ass_fsm.c:196 TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT: |Assignment was on PACCH|No downlink ACK received yet|
<0002> bts.cpp:314 Detected FN jump! 1369207 -> 1369220
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1369220 but previous FN=1369038 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1369038): TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS) poll timeout for FN=1369038, TS=7 (curr FN 1369220)
<0008> tbf_ul_ass_fsm.c:224 TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET UPLINK ASSIGNMENT: |Assignment was on PACCH|No uplink data received yet|
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1369220 but previous FN=1369069 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1369069): TBF(TFI=3 TLLI=0xffffffff DIR=UL STATE=NULL EGPRS)
<0008> tbf.cpp:538 TBF(TFI=3 TLLI=0xffffffff DIR=UL STATE=NULL EGPRS) poll timeout for FN=1369069, TS=7 (curr FN 1369220)

Program received signal SIGSEGV, Segmentation fault.
gprs_rlcmac_tbf::poll_timeout (this=0xd3f030, pdch=0x7af430, poll_fn=1369069, reason=PDCH_ULC_POLL_UL_ASS) at tbf.cpp:551
551    tbf.cpp: No such file or directory.
No symbol "all" in current context.
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {52, 0, 160, 0, 77632, 0, 35389632, 0, -561014292, 32766, -17229686, 32701, 1, 0, 160, 0, 24068976, 0, 30876096, 0, 44, 0, 6622689, 0, 8149536, 0, 34555216, 0}, _kill = {si_pid = 52, si_uid = 0}, _timer = {si_tid = 52, si_overrun = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _rt = {si_pid = 52, si_uid = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _sigchld = {si_pid = 52, si_uid = 0, si_status = 160, si_utime = 333426901123072, si_stime = 151997312057475072}, _sigfault = {si_addr = 0x34, _addr_lsb = 160, _addr_bnd = {_lower = 0x12f40, _upper = 0x21c00c0}}, _sigpoll = {si_band = 52, si_fd = 160}}}

And the code context for this crash is:

void gprs_rlcmac_tbf::poll_timeout(struct gprs_rlcmac_pdch *pdch, uint32_t poll_fn, enum pdch_ulc_tbf_poll_reason reason)
{
    gprs_rlcmac_ul_tbf *ul_tbf = as_ul_tbf(this);

    LOGPTBF(this, LOGL_NOTICE, "poll timeout for FN=%d, TS=%d (curr FN %d)\n",
        poll_fn, pdch->ts_no, bts_current_frame_number(bts));

    if (ul_tbf && reason == PDCH_ULC_POLL_UL_ACK && tbf_ul_ack_exp_ctrl_ack(ul_tbf, poll_fn, pdch->ts_no)) {
        bts_do_rate_ctr_inc(bts, CTR_RLC_ACK_TIMEDOUT);
        bts_do_rate_ctr_inc(bts, CTR_PUAN_POLL_TIMEDOUT);
        if (state_is(TBF_ST_FINISHED)) {
            if (ul_tbf->n_inc(N3103)) {
                bts_do_rate_ctr_inc(bts, CTR_PUAN_POLL_FAILED);
                osmo_fsm_inst_dispatch(this->state_fsm.fi, TBF_EV_MAX_N3103, NULL);
                return;
            }
        }
        osmo_fsm_inst_dispatch(ul_tbf->ul_ack_fsm.fi, TBF_UL_ACK_EV_POLL_TIMEOUT, NULL);
551: } else if (ul_ass_state_is(TBF_UL_ASS_WAIT_ACK)) {
        bts_do_rate_ctr_inc(bts, CTR_RLC_ASS_TIMEDOUT);
        bts_do_rate_ctr_inc(bts, CTR_PUA_POLL_TIMEDOUT);
        if (n_inc(N3105)) {
            osmo_fsm_inst_dispatch(this->state_fsm.fi, TBF_EV_MAX_N3105, NULL);
            bts_do_rate_ctr_inc(bts, CTR_RLC_ASS_FAILED);
            bts_do_rate_ctr_inc(bts, CTR_PUA_POLL_FAILED);
            return;
        }
        /* Signal timeout to FSM to reschedule UL assignment */
        osmo_fsm_inst_dispatch(this->ul_ass_fsm.fi, TBF_UL_ASS_EV_ASS_POLL_TIMEOUT, NULL);
Actions #8

Updated by pespin over 2 years ago

Hi iedemam, I submitted a set of patches doing some general clean up and refactoring which was still pending on that side of the code. See following commits in gerrit:

tbf_fsm: Ignore event DL_ACKNACK_MISS in state RELEASING
assert if tbf pointer for POLL event is NULL
cosmetic: Fix missing space
pdch: refactor rcv_control_ack() with a switch statement
tests: TbfTest: Fix wrong behavior in test_tbf_dl_reuse()
tbf: refactor poll_timeout() with a switch statement
nacc: Introduce helper function nacc_fsm_exp_ctrl_ack()
tbf: poll_timeout(): Validate expected poll reason 

Those link to https://gerrit.osmocom.org/c/osmo-pcu/+/25625 and follow-up patches.

I'm not sure they will fix the crash but they will at least add some extra levels of security against unexpected paths (it already caught a wrong set of steps from one of the unit tests).

From looking at your traces, the exact issue is not clear. It seems to come from the PDCH ULC (uplink controller), where it reports a timeout from a tbf poll, but the tbf seems gone from memory or corrupted or whatever. I added a few asserts in some patches which should help clarify the issue. Also if you can, building osmo-pcu with --enable-sanitize would help find some possible heap-use-after-free.

BTW, I'm seeing a lot of FN clock jumps in your setup (eg: "bts.cpp:314 Detected FN jump! 1369160 -> 1369173"). This indicates something is wrong with your setup, and it may be somehow responsible to the crashes (I'm not saying osmo-pcu is fine crashing there, just that it may be triggering some code paths usually not triggered). Are you using master osmo-bts-trx too? Do you have NOPE.ind implemented in your TRX?

PS: Feel free to contact me on jabber if you want to debug this issue or discuss whatever related together.

Actions #9

Updated by pespin over 2 years ago

I believe this should be fixed in current osmo-pcu.git master b0aba591433c7c22298035453713724172d1cfbc

Please iedemam see if you can still reproduce.

Actions #10

Updated by laforge over 2 years ago

pespin wrote:

I believe this should be fixed in current osmo-pcu.git master b0aba591433c7c22298035453713724172d1cfbc

Please iedemam see if you can still reproduce.

any update here, can the issue be closed?

Actions #11

Updated by iedemam over 2 years ago

Have not been able to reproduce this. Looking good and can be closed. Thanks!

Actions #12

Updated by pespin over 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 70 to 100
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)