Project

General

Profile

Actions

Bug #6020

closed

"Discarding duplicated RSP from old CMD ..."

Added by fixeria 11 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
osmo-bts-trx
Target version:
-
Start date:
04/28/2023
Due date:
% Done:

100%

Spec Reference:

Description

I am seeing these messages quite often when running osmo-bts-trx with fake_trx.py:

...
DTRX NOTICE trx_if.c:694 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETFORMAT 1 1'
...
DTRX NOTICE trx_if.c:708 phy0.0: Discarding duplicated RSP from old CMD 'RSP POWERON -1'
...
DTRX NOTICE trx_if.c:708 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETSLOT 0 0 4'
...

First I thought it's a bug in fake_trx.py, but then I found out that it's actually osmo-bts-trx sending some messages twice:

...
DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD SETFORMAT 1'
DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD SETFORMAT 1'  <---- sending 1st time
DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD SETFORMAT 1'  <---- sending 2nd time?!?
DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP SETFORMAT 1 1'
DTRX FATAL trx_if.c:727 phy0.0: Dequeue TRXC 'CMD SETFORMAT 1'
DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP SETFORMAT 1 1'
DTRX NOTICE trx_if.c:694 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETFORMAT 1 1'
...
Actions #1

Updated by fixeria 11 months ago

gdb shows that duplicate messages are sent during the same iteration of osmo_select_main, but from different places:

DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD RFMUTE 1'
#0  trx_ctrl_send (l1h=0x5555556b98f0) at trx_if.c:163
#1  0x0000555555563c0e in trx_ctrl_cmd_cb (l1h=0x5555556b98f0, critical=0, cb=0x0, cmd=0x5555555c9682 "RFMUTE", fmt=0x5555555c967e "1") at trx_if.c:274
#2  0x000055555556445a in trx_if_cmd_rfmute (l1h=0x5555556b98f0, mute=true) at trx_if.c:415
#3  0x000055555556866a in bts_model_trx_deact_rf (trx=0x7ffff6b6e070) at l1_if.c:178
#4  0x00005555555a9a56 in st_wait_trx_closed_on_enter (fi=0x555555787ae0, prev_state=1) at bts_shutdown_fsm.c:137
#5  0x00007ffff7be4fe0 in state_chg (fi=0x555555787ae0, new_state=<optimized out>, keep_timer=keep_timer@entry=false, timeout_ms=<optimized out>, T=-2, 
    file=<optimized out>, line=126) at ../../../../src/libosmocore/src/core/fsm.c:697
#6  0x00007ffff7be5980 in _osmo_fsm_inst_state_chg (fi=<optimized out>, new_state=<optimized out>, timeout_secs=<optimized out>, T=<optimized out>, 
    file=<optimized out>, line=<optimized out>) at ../../../../src/libosmocore/src/core/fsm.c:746
#7  0x00005555555a99f7 in st_wait_ramp_down_compl (fi=0x555555787ae0, event=1, data=0x7ffff6b6e070) at bts_shutdown_fsm.c:126
#8  0x00007ffff7be5bbc in _osmo_fsm_inst_dispatch (fi=0x555555787ae0, event=1, data=0x7ffff6b6e070, file=0x5555555d4a60 "bts_shutdown_fsm.c", line=82)
    at ../../../../src/libosmocore/src/core/fsm.c:875
#9  0x00005555555a9640 in ramp_down_compl_cb (trx=0x7ffff6b6e070) at bts_shutdown_fsm.c:82
#10 0x00005555555a8de1 in power_ramp_do_step (trx=0x7ffff6b6e070, first=0) at tx_power.c:216
#11 0x00005555555a8d85 in power_trx_change_compl (trx=0x7ffff6b6e070, p_trxout_cur_mdBm=-10000) at tx_power.c:206
#12 0x0000555555568578 in l1if_setpower_att_cb (l1h=0x5555556b98f0, power_att_db=60, rc=0) at l1_if.c:150
#13 0x0000555555565131 in trx_ctrl_rx_rsp_setpower (l1h=0x5555556b98f0, rsp=0x7fffffffd7c0) at trx_if.c:619
#14 0x000055555556529e in trx_ctrl_rx_rsp (l1h=0x5555556b98f0, rsp=0x7fffffffd7c0, tcm=0x5555558bd560) at trx_if.c:647
#15 0x0000555555565c3d in trx_ctrl_read_cb (ofd=0x5555556b9910, what=1) at trx_if.c:734
#16 0x00007ffff7bf2980 in poll_disp_fds (n_fd=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:419
#17 _osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:457
#18 0x00007ffff7bf2a5e in osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:496
#19 0x00005555555b5462 in bts_main (argc=3, argv=0x7fffffffe088) at main.c:429
#20 0x00005555555627a7 in main (argc=3, argv=0x7fffffffe088) at main.c:196
DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD RFMUTE 1'  <---- sending 1st time
DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD POWEROFF'

#0  trx_ctrl_send (l1h=0x5555556b98f0) at trx_if.c:163
#1  0x0000555555565c9e in trx_ctrl_read_cb (ofd=0x5555556b9910, what=1) at trx_if.c:745
#2  0x00007ffff7bf2980 in poll_disp_fds (n_fd=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:419
#3  _osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:457
#4  0x00007ffff7bf2a5e in osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:496
#5  0x00005555555b5462 in bts_main (argc=3, argv=0x7fffffffe088) at main.c:429
#6  0x00005555555627a7 in main (argc=3, argv=0x7fffffffe088) at main.c:196
DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD RFMUTE 1'  <---- sending 2nd time
DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP RFMUTE 0 1'
DTRX FATAL trx_if.c:727 phy0.0: Dequeue TRXC 'CMD RFMUTE 1'
Actions #2

Updated by fixeria 11 months ago

I think I understand why this is happening, take a look at this code snippet from trx_ctrl_read_cb():

 724         /* Remove command from list, save it to last_acked and remove previous
 725          * last_acked. Do it before calling callback to avoid user freeing tcm
 726          * pointer if flushing/closing the iface. */
 727         llist_del(&tcm->list);
 728         talloc_free(l1h->last_acked);
 729         l1h->last_acked = tcm;
 730 
 731         /* check for response code */
 732         rc = trx_ctrl_rx_rsp(l1h, &rsp, tcm);
 733         if (rc == -EINVAL)
 734                 goto rsp_error;
 735 
 736         /* re-schedule last cmd in rc seconds time */
 737         if (rc > 0) {
 738                 if (!llist_empty(&l1h->trx_ctrl_list))
 739                         osmo_timer_schedule(&l1h->trx_ctrl_timer, rc, 0);
 740                 return 0;
 741         }
 742 
 743         trx_ctrl_send(l1h); 

One important point is that we keep a struct trx_ctrl_msg in the queue until we get a response, and then dequeue it in trx_ctrl_read_cb().
Another important point is that in trx_ctrl_cmd_cb() we send() a message immediately if the queue is empty.

  1. L724 the queue contains one struct trx_ctrl_msg, for which we have received a response;
  2. L727 the queue becomes empty, since we unlink the struct trx_ctrl_msg from there;
  3. L732 we call trx_ctrl_rx_rsp(), which may trigger code paths enqueueing another TRXC message;
    • in the backtrace above it actually does enqueue CMD RFMUTE 1,
    • since the queue is empty, CMD RFMUTE 1 gets sent immediately (see "sending 1st time"),
    • a struct trx_ctrl_msg remains in the queue because CMD RFMUTE 1 was not answered yet;
  4. L733 assuming rc == 0, the queue now contains a struct trx_ctrl_msg awaiting for response;
  5. L743 here we trigger sending of CMD RFMUTE 1 again.
Actions #3

Updated by fixeria 11 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 80

https://gerrit.osmocom.org/c/osmo-bts/+/32552 osmo-bts-trx: fix trx_ctrl_read_cb(): do not send CMDs twice [NEW]

Actions #4

Updated by fixeria 10 months ago

  • Status changed from Feedback to Stalled

The patch is still in Gerrit, pespin marked it negative "since this has to be thought more".

Actions #5

Updated by pespin 5 months ago

  • Status changed from Stalled to Feedback
  • Assignee changed from fixeria to pespin
Actions #6

Updated by pespin 5 months ago

  • Assignee changed from pespin to fixeria

I submitted a revert + new patch which I believe should properly fix the problems of the previous one:
https://gerrit.osmocom.org/c/osmo-bts/+/34894 Revert "trx_if: Allow calling trx_if_flush/close from within TRXC callback" [NEW]
https://gerrit.osmocom.org/c/osmo-bts/+/32552 trx_if: Allow calling trx_if_flush/close from within TRXC callback (v2)

fixeria it would be great if you can give it a try and confirm the issues you spotted are fixed now.

Actions #7

Updated by fixeria 5 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 80 to 100

I confirm that your patches fix the problem. All have been merged meanwhile.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)