Bug #3692
closedosmo-trx-uhd fails to start with UHD "recv packet demuxer unexpected sid"
90%
Description
Sometimes TTCN3 tests running with real HW under osmo-gsm-tester (osmo-bts-trx + osmo+trx + B200 in this case) fails with TTCN3 error "RSL_Emulation.ttcn:381 : Lost IPA connection!"
Looking at pcap file, one can see indeed osmo-bts-trx closes the TCP conn after starting IPA handshake (TCP FIN is sent to TTCN3 BSC emulation).
osmo-bts-trx log shows why it closes the connection:
20181113143214669 DPAG <0005> paging.c:540 Paging SI update 20181113143214669 DPCU <0009> pcu_sock.c:124 Sending info 20181113143214669 DPCU <0009> pcu_sock.c:139 BTS is up 20181113143214669 DPCU <0009> pcu_sock.c:232 trx=0 ts=7: available (tsc=7 arfcn=868) 20181113143215220 DTRX <000b> trx_if.c:450 Response message: 'RSP POWERON 1' 20181113143215221 DTRX <000b> trx_if.c:491 transceiver (phy0.0) rejected TRX command with response: 'RSP POWERON 1' 20181113143215221 DOML <0001> bts.c:239 Shutting down BTS 0, Reason TRX-CTRL-MSG: CRITICAL 20181113143215221 DL1C <0006> scheduler.c:242 Exit scheduler for trx=0 20181113143215221 DL1C <0006> scheduler.c:219 Init scheduler for trx=0 20181113143215221 DTRX <000b> trx_if.c:240 Enqueuing TRX control command 'CMD POWEROFF' ... Shutdown timer expired
So basically osmo-trx-uhd fails to start running and tells the osmo-bts-trx, which decides to die.
osmo-trx-uhd shows during that time:
20181113143206437 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is TXTUNE 1876400 20181113143206437 DDEV <0001> UHDDevice.cpp:1114 [tid=140684301293312] Tune Result: Target RF Freq: 1876.400000 (MHz) Actual RF Freq: 1876.399999 (MHz) Target DSP Freq: 0.000001 (MHz) Actual DSP Freq: 0.000001 (MHz) 20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETTSC 7 20181113143206438 DMAIN <0000> Transceiver.cpp:826 [tid=140684301293312] Changing TSC from 7 to 7 20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWERON 20181113143206438 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver 20181113143206438 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device 20181113143206438 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP... 20181113143206438 DMAIN <0000> Threads.cpp:116 [tid=140683935381248] Thread 140683935381248 (task 58222) set name: UHDAsyncEvent 20181113143206543 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x80001 20181113143206544 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90001 20181113143206546 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005 20181113143206548 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xeffff 20181113143206550 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0004 20181113143206552 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0004 20181113143206554 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xafff0 20181113143206556 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x20004 20181113143206558 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50002 20181113143206560 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffef000a 20181113143206561 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffafffe 20181113143206563 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xd0009 20181113143206565 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffbfffa 20181113143206567 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fffc 20181113143206569 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff5fffe 20181113143206571 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80005 20181113143206573 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff7ffff 20181113143206575 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff8fffa 20181113143206577 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fffe 20181113143206578 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0005 20181113143206580 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffaffff 20181113143206582 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10008 20181113143206584 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80001 20181113143206586 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005 20181113143206588 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffa0015 20181113143206590 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xe0002 20181113143206592 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffcfff5 20181113143206594 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff60001 20181113143206595 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80000 20181113143206597 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffe000b 20181113143206599 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0006 20181113143206601 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff70009 20181113143206603 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0003 20181113143206605 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffff ... (lots more of this unexpected error messages coming from UHD)... 20181113143207539 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff9fffe 20181113143207541 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffb0013 20181113143207541 DDEV <0001> UHDDevice.cpp:769 [tid=140684301293312] Device timed out 20181113143207541 DMAIN <0000> Transceiver.cpp:261 [tid=140684301293312] Device failed to start 20181113143210961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143211017 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143211440 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143214961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143215018 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
So it seems it fails due to some low level UHD issues, and sends the RSP POWERON 1 (see "Device failed to start" in code, start() return false).
Next time (test) osmo-bts-trx is started, power on in osmo-trx starts exactly the same way (failing), but eventually recovers and startup fine:
20181113143215442 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver 20181113143215442 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device 20181113143215442 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP... 20181113143215442 DMAIN <0000> Threads.cpp:116 [tid=140683935115008] Thread 140683935115008 (task 58223) set name: UHDAsyncEvent 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10fffa 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffefffd 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x3ffff 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fff4 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x8ffea 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50000 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffef ... 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff4fffd 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff3fffe 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fff3 20181113143215564 DDEV <0001> UHDDevice.cpp:779 [tid=140684301293312] Initial timestamp 723990769 20181113143215564 DDEV <0001> UHDDevice.cpp:825 [tid=140684301293312] The current time is 668.302 seconds 20181113143215564 DMAIN <0000> radioInterface.cpp:190 [tid=140684301293312] Radio started 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935414016] Thread 140683935414016 (task 58224) set name: TxLower 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935446784] Thread 140683935446784 (task 58225) set name: RxLower 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935479552] Thread 140683935479552 (task 58226) set name: RxUpper0 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140684299265792] Thread 140684299265792 (task 58227) set name: TxUpper0 20181113143215565 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETRXGAIN 25 20181113143215565 DDEV <0001> UHDDevice.cpp:497 [tid=140684301293312] Set RX gain to 25dB (asked for 25dB) 20181113143215566 DDEV <0001> UHDDevice.cpp:1482 [tid=140683935446784] Skipping buffer data: timestamp=723992813 time_end=709289470 20181113143215566 DMAIN <0000> Transceiver.cpp:1038 [tid=140683935446784] ClockInterface: sending IND CLOCK 2421400 20181113143215566 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETPOWER 0 20181113143215567 DDEV <0001> UHDDevice.cpp:482 [tid=140684301293312] Set TX gain to 89.75dB (asked for 89.75dB) 20181113143215567 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 0 5 20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 1 1 20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 2 1 20181113143215582 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 3 1 20181113143215629 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 4 8 20181113143215678 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 5 3 20181113143215726 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 6 7 20181113143215775 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 7 13 20181113143216010 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is NOHANDOVER 7 0
This issue seems to (hopefully) be solved in UHD 3.12: https://github.com/EttusResearch/uhd/issues/173
In debian9 there's currently 3.9.5-2+b3, quite old. 3.12 is available in buster (testing).
So, what I propose:
- Update RnD debian9 with UHD packages from testing and see if everything works fine, then upgrade Prod (ansible?)
- Since it seems re-starting the Transceiver in osmo-trx eventually works, let's modify osmo-bts-trx to instead of exiting, just re-try to power on the device in a few seconds (and somehow state to osmo-bsc in OML that state is degraded?)
Related issues