Bug #6020
open"Discarding duplicated RSP from old CMD ..."
80%
Description
I am seeing these messages quite often when running osmo-bts-trx with fake_trx.py
:
... DTRX NOTICE trx_if.c:694 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETFORMAT 1 1' ... DTRX NOTICE trx_if.c:708 phy0.0: Discarding duplicated RSP from old CMD 'RSP POWERON -1' ... DTRX NOTICE trx_if.c:708 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETSLOT 0 0 4' ...
First I thought it's a bug in fake_trx.py
, but then I found out that it's actually osmo-bts-trx sending some messages twice:
... DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD SETFORMAT 1' DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD SETFORMAT 1' <---- sending 1st time DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD SETFORMAT 1' <---- sending 2nd time?!? DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP SETFORMAT 1 1' DTRX FATAL trx_if.c:727 phy0.0: Dequeue TRXC 'CMD SETFORMAT 1' DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP SETFORMAT 1 1' DTRX NOTICE trx_if.c:694 phy0.0: Discarding duplicated RSP from old CMD 'RSP SETFORMAT 1 1' ...
Updated by fixeria about 1 month ago
gdb shows that duplicate messages are sent during the same iteration of osmo_select_main
, but from different places:
DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD RFMUTE 1' #0 trx_ctrl_send (l1h=0x5555556b98f0) at trx_if.c:163 #1 0x0000555555563c0e in trx_ctrl_cmd_cb (l1h=0x5555556b98f0, critical=0, cb=0x0, cmd=0x5555555c9682 "RFMUTE", fmt=0x5555555c967e "1") at trx_if.c:274 #2 0x000055555556445a in trx_if_cmd_rfmute (l1h=0x5555556b98f0, mute=true) at trx_if.c:415 #3 0x000055555556866a in bts_model_trx_deact_rf (trx=0x7ffff6b6e070) at l1_if.c:178 #4 0x00005555555a9a56 in st_wait_trx_closed_on_enter (fi=0x555555787ae0, prev_state=1) at bts_shutdown_fsm.c:137 #5 0x00007ffff7be4fe0 in state_chg (fi=0x555555787ae0, new_state=<optimized out>, keep_timer=keep_timer@entry=false, timeout_ms=<optimized out>, T=-2, file=<optimized out>, line=126) at ../../../../src/libosmocore/src/core/fsm.c:697 #6 0x00007ffff7be5980 in _osmo_fsm_inst_state_chg (fi=<optimized out>, new_state=<optimized out>, timeout_secs=<optimized out>, T=<optimized out>, file=<optimized out>, line=<optimized out>) at ../../../../src/libosmocore/src/core/fsm.c:746 #7 0x00005555555a99f7 in st_wait_ramp_down_compl (fi=0x555555787ae0, event=1, data=0x7ffff6b6e070) at bts_shutdown_fsm.c:126 #8 0x00007ffff7be5bbc in _osmo_fsm_inst_dispatch (fi=0x555555787ae0, event=1, data=0x7ffff6b6e070, file=0x5555555d4a60 "bts_shutdown_fsm.c", line=82) at ../../../../src/libosmocore/src/core/fsm.c:875 #9 0x00005555555a9640 in ramp_down_compl_cb (trx=0x7ffff6b6e070) at bts_shutdown_fsm.c:82 #10 0x00005555555a8de1 in power_ramp_do_step (trx=0x7ffff6b6e070, first=0) at tx_power.c:216 #11 0x00005555555a8d85 in power_trx_change_compl (trx=0x7ffff6b6e070, p_trxout_cur_mdBm=-10000) at tx_power.c:206 #12 0x0000555555568578 in l1if_setpower_att_cb (l1h=0x5555556b98f0, power_att_db=60, rc=0) at l1_if.c:150 #13 0x0000555555565131 in trx_ctrl_rx_rsp_setpower (l1h=0x5555556b98f0, rsp=0x7fffffffd7c0) at trx_if.c:619 #14 0x000055555556529e in trx_ctrl_rx_rsp (l1h=0x5555556b98f0, rsp=0x7fffffffd7c0, tcm=0x5555558bd560) at trx_if.c:647 #15 0x0000555555565c3d in trx_ctrl_read_cb (ofd=0x5555556b9910, what=1) at trx_if.c:734 #16 0x00007ffff7bf2980 in poll_disp_fds (n_fd=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:419 #17 _osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:457 #18 0x00007ffff7bf2a5e in osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:496 #19 0x00005555555b5462 in bts_main (argc=3, argv=0x7fffffffe088) at main.c:429 #20 0x00005555555627a7 in main (argc=3, argv=0x7fffffffe088) at main.c:196 DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD RFMUTE 1' <---- sending 1st time DTRX FATAL trx_if.c:268 phy0.0: Enqueue TRXC 'CMD POWEROFF' #0 trx_ctrl_send (l1h=0x5555556b98f0) at trx_if.c:163 #1 0x0000555555565c9e in trx_ctrl_read_cb (ofd=0x5555556b9910, what=1) at trx_if.c:745 #2 0x00007ffff7bf2980 in poll_disp_fds (n_fd=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:419 #3 _osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:457 #4 0x00007ffff7bf2a5e in osmo_select_main (polling=<optimized out>) at ../../../../src/libosmocore/src/core/select.c:496 #5 0x00005555555b5462 in bts_main (argc=3, argv=0x7fffffffe088) at main.c:429 #6 0x00005555555627a7 in main (argc=3, argv=0x7fffffffe088) at main.c:196 DTRX FATAL trx_if.c:177 phy0.0: Sending control 'CMD RFMUTE 1' <---- sending 2nd time DTRX FATAL trx_if.c:685 phy0.0: Response message: 'RSP RFMUTE 0 1' DTRX FATAL trx_if.c:727 phy0.0: Dequeue TRXC 'CMD RFMUTE 1'
Updated by fixeria about 1 month ago
I think I understand why this is happening, take a look at this code snippet from trx_ctrl_read_cb()
:
724 /* Remove command from list, save it to last_acked and remove previous
725 * last_acked. Do it before calling callback to avoid user freeing tcm
726 * pointer if flushing/closing the iface. */
727 llist_del(&tcm->list);
728 talloc_free(l1h->last_acked);
729 l1h->last_acked = tcm;
730
731 /* check for response code */
732 rc = trx_ctrl_rx_rsp(l1h, &rsp, tcm);
733 if (rc == -EINVAL)
734 goto rsp_error;
735
736 /* re-schedule last cmd in rc seconds time */
737 if (rc > 0) {
738 if (!llist_empty(&l1h->trx_ctrl_list))
739 osmo_timer_schedule(&l1h->trx_ctrl_timer, rc, 0);
740 return 0;
741 }
742
743 trx_ctrl_send(l1h);
One important point is that we keep a struct trx_ctrl_msg
in the queue until we get a response, and then dequeue it in trx_ctrl_read_cb()
.
Another important point is that in trx_ctrl_cmd_cb()
we send() a message immediately if the queue is empty.
- L724 the queue contains one
struct trx_ctrl_msg
, for which we have received a response; - L727 the queue becomes empty, since we unlink the
struct trx_ctrl_msg
from there; - L732 we call
trx_ctrl_rx_rsp()
, which may trigger code paths enqueueing another TRXC message;- in the backtrace above it actually does enqueue
CMD RFMUTE 1
, - since the queue is empty,
CMD RFMUTE 1
gets sent immediately (see "sending 1st time"), - a
struct trx_ctrl_msg
remains in the queue becauseCMD RFMUTE 1
was not answered yet;
- in the backtrace above it actually does enqueue
- L733 assuming
rc == 0
, the queue now contains astruct trx_ctrl_msg
awaiting for response; - L743 here we trigger sending of
CMD RFMUTE 1
again.
Updated by fixeria about 1 month ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 80
https://gerrit.osmocom.org/c/osmo-bts/+/32552 osmo-bts-trx: fix trx_ctrl_read_cb(): do not send CMDs twice [NEW]