Bug #3692
closedosmo-trx-uhd fails to start with UHD "recv packet demuxer unexpected sid"
90%
Description
Sometimes TTCN3 tests running with real HW under osmo-gsm-tester (osmo-bts-trx + osmo+trx + B200 in this case) fails with TTCN3 error "RSL_Emulation.ttcn:381 : Lost IPA connection!"
Looking at pcap file, one can see indeed osmo-bts-trx closes the TCP conn after starting IPA handshake (TCP FIN is sent to TTCN3 BSC emulation).
osmo-bts-trx log shows why it closes the connection:
20181113143214669 DPAG <0005> paging.c:540 Paging SI update 20181113143214669 DPCU <0009> pcu_sock.c:124 Sending info 20181113143214669 DPCU <0009> pcu_sock.c:139 BTS is up 20181113143214669 DPCU <0009> pcu_sock.c:232 trx=0 ts=7: available (tsc=7 arfcn=868) 20181113143215220 DTRX <000b> trx_if.c:450 Response message: 'RSP POWERON 1' 20181113143215221 DTRX <000b> trx_if.c:491 transceiver (phy0.0) rejected TRX command with response: 'RSP POWERON 1' 20181113143215221 DOML <0001> bts.c:239 Shutting down BTS 0, Reason TRX-CTRL-MSG: CRITICAL 20181113143215221 DL1C <0006> scheduler.c:242 Exit scheduler for trx=0 20181113143215221 DL1C <0006> scheduler.c:219 Init scheduler for trx=0 20181113143215221 DTRX <000b> trx_if.c:240 Enqueuing TRX control command 'CMD POWEROFF' ... Shutdown timer expired
So basically osmo-trx-uhd fails to start running and tells the osmo-bts-trx, which decides to die.
osmo-trx-uhd shows during that time:
20181113143206437 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is TXTUNE 1876400 20181113143206437 DDEV <0001> UHDDevice.cpp:1114 [tid=140684301293312] Tune Result: Target RF Freq: 1876.400000 (MHz) Actual RF Freq: 1876.399999 (MHz) Target DSP Freq: 0.000001 (MHz) Actual DSP Freq: 0.000001 (MHz) 20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETTSC 7 20181113143206438 DMAIN <0000> Transceiver.cpp:826 [tid=140684301293312] Changing TSC from 7 to 7 20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWERON 20181113143206438 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver 20181113143206438 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device 20181113143206438 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP... 20181113143206438 DMAIN <0000> Threads.cpp:116 [tid=140683935381248] Thread 140683935381248 (task 58222) set name: UHDAsyncEvent 20181113143206543 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x80001 20181113143206544 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90001 20181113143206546 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005 20181113143206548 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xeffff 20181113143206550 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0004 20181113143206552 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0004 20181113143206554 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xafff0 20181113143206556 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x20004 20181113143206558 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50002 20181113143206560 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffef000a 20181113143206561 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffafffe 20181113143206563 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xd0009 20181113143206565 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffbfffa 20181113143206567 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fffc 20181113143206569 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff5fffe 20181113143206571 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80005 20181113143206573 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff7ffff 20181113143206575 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff8fffa 20181113143206577 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fffe 20181113143206578 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0005 20181113143206580 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffaffff 20181113143206582 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10008 20181113143206584 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80001 20181113143206586 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005 20181113143206588 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffa0015 20181113143206590 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xe0002 20181113143206592 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffcfff5 20181113143206594 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff60001 20181113143206595 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80000 20181113143206597 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffe000b 20181113143206599 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0006 20181113143206601 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff70009 20181113143206603 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0003 20181113143206605 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffff ... (lots more of this unexpected error messages coming from UHD)... 20181113143207539 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff9fffe 20181113143207541 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffb0013 20181113143207541 DDEV <0001> UHDDevice.cpp:769 [tid=140684301293312] Device timed out 20181113143207541 DMAIN <0000> Transceiver.cpp:261 [tid=140684301293312] Device failed to start 20181113143210961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143211017 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143211440 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143214961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF 20181113143215018 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
So it seems it fails due to some low level UHD issues, and sends the RSP POWERON 1 (see "Device failed to start" in code, start() return false).
Next time (test) osmo-bts-trx is started, power on in osmo-trx starts exactly the same way (failing), but eventually recovers and startup fine:
20181113143215442 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver 20181113143215442 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device 20181113143215442 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP... 20181113143215442 DMAIN <0000> Threads.cpp:116 [tid=140683935115008] Thread 140683935115008 (task 58223) set name: UHDAsyncEvent 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10fffa 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffefffd 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x3ffff 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fff4 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x8ffea 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50000 20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffef ... 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff4fffd 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff3fffe 20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fff3 20181113143215564 DDEV <0001> UHDDevice.cpp:779 [tid=140684301293312] Initial timestamp 723990769 20181113143215564 DDEV <0001> UHDDevice.cpp:825 [tid=140684301293312] The current time is 668.302 seconds 20181113143215564 DMAIN <0000> radioInterface.cpp:190 [tid=140684301293312] Radio started 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935414016] Thread 140683935414016 (task 58224) set name: TxLower 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935446784] Thread 140683935446784 (task 58225) set name: RxLower 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935479552] Thread 140683935479552 (task 58226) set name: RxUpper0 20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140684299265792] Thread 140684299265792 (task 58227) set name: TxUpper0 20181113143215565 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETRXGAIN 25 20181113143215565 DDEV <0001> UHDDevice.cpp:497 [tid=140684301293312] Set RX gain to 25dB (asked for 25dB) 20181113143215566 DDEV <0001> UHDDevice.cpp:1482 [tid=140683935446784] Skipping buffer data: timestamp=723992813 time_end=709289470 20181113143215566 DMAIN <0000> Transceiver.cpp:1038 [tid=140683935446784] ClockInterface: sending IND CLOCK 2421400 20181113143215566 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETPOWER 0 20181113143215567 DDEV <0001> UHDDevice.cpp:482 [tid=140684301293312] Set TX gain to 89.75dB (asked for 89.75dB) 20181113143215567 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 0 5 20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 1 1 20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 2 1 20181113143215582 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 3 1 20181113143215629 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 4 8 20181113143215678 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 5 3 20181113143215726 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 6 7 20181113143215775 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 7 13 20181113143216010 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is NOHANDOVER 7 0
This issue seems to (hopefully) be solved in UHD 3.12: https://github.com/EttusResearch/uhd/issues/173
In debian9 there's currently 3.9.5-2+b3, quite old. 3.12 is available in buster (testing).
So, what I propose:
- Update RnD debian9 with UHD packages from testing and see if everything works fine, then upgrade Prod (ansible?)
- Since it seems re-starting the Transceiver in osmo-trx eventually works, let's modify osmo-bts-trx to instead of exiting, just re-try to power on the device in a few seconds (and somehow state to osmo-bsc in OML that state is degraded?)
Related issues
Updated by fixeria over 5 years ago
- Related to Bug #3458: apps/grgsm_trx: [ERROR] [STREAMER] recv packet demuxer unexpected sid 0xXX added
Updated by pespin over 5 years ago
- Status changed from New to In Progress
Pushed some commits to help with this kind of issue:
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11765 bts-trx: trx_if: Use struct to store CTRL msg parsed responses
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11766 bts-trx: setup timer once at creation time
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11767 common: Implement OML for trx_set_available(1)
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11768 bts-trx: Allow re-try without exit if POWERON fails in osmo-trx
Updated by pespin over 5 years ago
I used something like this to simulate osmo-trx start failing ocasionally:
diff --git a/Transceiver52M/Transceiver.cpp b/Transceiver52M/Transceiver.cpp index d6ddce8..c889d88 100644 --- a/Transceiver52M/Transceiver.cpp +++ b/Transceiver52M/Transceiver.cpp @@ -241,6 +241,8 @@ void Transceiver::setSignalHandler(osmo_signal_cbfn cbfn) * launch threads to handle sample I/O. Re-synchronize the transmit burst * counters to the central radio clock here as well. */ + #include <sys/time.h> + bool Transceiver::start() { ScopedLock lock(mLock); @@ -257,6 +259,14 @@ bool Transceiver::start() mLastClockUpdateTime = time; mLatencyUpdateTime = time; + struct timeval tv; + gettimeofday(&tv, NULL); + srandom(tv.tv_usec); + if(random() % 3 != 0) { + LOG(ALERT) << "Device failed to start"; + return false; + } + if (!mRadioInterface->start()) { LOG(ALERT) << "Device failed to start"; return false;
Updated by pespin over 5 years ago
- Status changed from In Progress to Feedback
- % Done changed from 0 to 90
I tried updating to UHD 3.12 using debian testing repos, but it pulls in too many deps:
# cat sources.list.d/testing.list deb http://ftp.de.debian.org/debian/ testing main non-free contrib deb-src http://ftp.de.debian.org/debian/ testing main non-free contrib deb http://security.debian.org/debian-security testing/updates main contrib non-free deb-src http://security.debian.org/debian-security testing/updates main contrib non-free
# cat apt.conf.d/99defaultrelease APT::Default-Release "stable";
root@osmo-gsm-tester-rnd:/etc/apt/preferences.d# apt-get -t testing install uhd-host Reading package lists... Done Building dependency tree Reading state information... Done The following packages were automatically installed and are no longer required: libpython3.5-dev python3.5 python3.5-dev python3.5-minimal Use 'sudo apt autoremove' to remove them. The following additional packages will be installed: gcc-8-base libapt-inst2.0 libapt-pkg5.0 libboost-atomic1.62-dev libboost-atomic1.62.0 libboost-chrono1.62-dev libboost-chrono1.62.0 libboost-context1.62-dev libboost-context1.62.0 libboost-coroutine1.62-dev libboost-coroutine1.62.0 libboost-date-time1.62-dev libboost-date-time1.62.0 libboost-exception1.62-dev libboost-fiber1.62-dev libboost-fiber1.62.0 libboost-filesystem1.62-dev libboost-filesystem1.62.0 libboost-graph-parallel1.62-dev libboost-graph-parallel1.62.0 libboost-graph1.62-dev libboost-graph1.62.0 libboost-iostreams1.62-dev libboost-iostreams1.62.0 libboost-locale1.62-dev libboost-locale1.62.0 libboost-log1.62-dev libboost-log1.62.0 libboost-math1.62-dev libboost-math1.62.0 libboost-mpi-python1.62-dev libboost-mpi-python1.62.0 libboost-mpi1.62-dev libboost-mpi1.62.0 libboost-program-options1.62-dev libboost-program-options1.62.0 libboost-python1.62-dev libboost-python1.62.0 libboost-random1.62-dev libboost-random1.62.0 libboost-regex1.62-dev libboost-regex1.62.0 libboost-serialization1.62-dev libboost-serialization1.62.0 libboost-signals1.62-dev libboost-signals1.62.0 libboost-system1.62-dev libboost-system1.62.0 libboost-test1.62-dev libboost-test1.62.0 libboost-thread1.62-dev libboost-thread1.62.0 libboost-timer1.62-dev libboost-timer1.62.0 libboost-type-erasure1.62-dev libboost-type-erasure1.62.0 libboost-wave1.62-dev libboost-wave1.62.0 libboost1.62-dev libc-bin libc-dev-bin libc-l10n libc6 libc6-dbg libc6-dev libevent-2.1-6 libevent-core-2.1-6 libevent-pthreads-2.1-6 libfabric1 libgfortran5 libhwloc-dev libhwloc5 libicu-le-hb0 libicu60 libncurses6 libncursesw6 libopenmpi3 libpmix2 libpsm2-2 libpython3-dev libpython3-stdlib libpython3.6 libpython3.6-dev libpython3.6-minimal libpython3.6-stdlib libssl-dev libssl1.1 libstdc++6 libtinfo6 libuhd3.12.0 libzstd1 locales python3 python3-apt python3-cffi-backend python3-crypto python3-dbus python3-dev python3-distutils python3-gi python3-lib2to3 python3-markupsafe python3-minimal python3-yaml python3.6 python3.6-dev python3.6-minimal Suggested packages: libboost1.62-doc gccxml libmpfrc++-dev libntl-dev glibc-doc libhwloc-contrib-plugins gnuradio python3-doc python3-tk python3-venv python3-apt-dbg python-apt-doc python-crypto-doc python-dbus-doc python3-dbus-dbg python3.6-venv python3.6-doc Recommended packages: apt The following NEW packages will be installed: gcc-8-base libevent-2.1-6 libevent-core-2.1-6 libevent-pthreads-2.1-6 libgfortran5 libicu-le-hb0 libicu60 libncurses6 libncursesw6 libopenmpi3 libpmix2 libpsm2-2 libpython3.6 libpython3.6-dev libpython3.6-minimal libpython3.6-stdlib libtinfo6 libuhd3.12.0 libzstd1 python3-distutils python3-lib2to3 python3.6 python3.6-dev python3.6-minimal The following packages will be upgraded: libapt-inst2.0 libapt-pkg5.0 libboost-atomic1.62-dev libboost-atomic1.62.0 libboost-chrono1.62-dev libboost-chrono1.62.0 libboost-context1.62-dev libboost-context1.62.0 libboost-coroutine1.62-dev libboost-coroutine1.62.0 libboost-date-time1.62-dev libboost-date-time1.62.0 libboost-exception1.62-dev libboost-fiber1.62-dev libboost-fiber1.62.0 libboost-filesystem1.62-dev libboost-filesystem1.62.0 libboost-graph-parallel1.62-dev libboost-graph-parallel1.62.0 libboost-graph1.62-dev libboost-graph1.62.0 libboost-iostreams1.62-dev libboost-iostreams1.62.0 libboost-locale1.62-dev libboost-locale1.62.0 libboost-log1.62-dev libboost-log1.62.0 libboost-math1.62-dev libboost-math1.62.0 libboost-mpi-python1.62-dev libboost-mpi-python1.62.0 libboost-mpi1.62-dev libboost-mpi1.62.0 libboost-program-options1.62-dev libboost-program-options1.62.0 libboost-python1.62-dev libboost-python1.62.0 libboost-random1.62-dev libboost-random1.62.0 libboost-regex1.62-dev libboost-regex1.62.0 libboost-serialization1.62-dev libboost-serialization1.62.0 libboost-signals1.62-dev libboost-signals1.62.0 libboost-system1.62-dev libboost-system1.62.0 libboost-test1.62-dev libboost-test1.62.0 libboost-thread1.62-dev libboost-thread1.62.0 libboost-timer1.62-dev libboost-timer1.62.0 libboost-type-erasure1.62-dev libboost-type-erasure1.62.0 libboost-wave1.62-dev libboost-wave1.62.0 libboost1.62-dev libc-bin libc-dev-bin libc-l10n libc6 libc6-dbg libc6-dev libfabric1 libhwloc-dev libhwloc5 libpython3-dev libpython3-stdlib libssl-dev libssl1.1 libstdc++6 locales python3 python3-apt python3-cffi-backend python3-crypto python3-dbus python3-dev python3-gi python3-markupsafe python3-minimal python3-yaml uhd-host
But anyway, hopefully patches to osmo-bts-trx should be enough to get TTCN3 tests passing when this kind of issue occurs.
Updated by tnt about 5 years ago
This looks very very much like the improper reset bug of UHD.
Use newer UHD or a fpga image that has the workaround. Nothing that can really be done in osmo-trx.
Updated by pespin almost 5 years ago
- Status changed from Feedback to Closed
I haven't seen this bug for a while, since we started using newer debian and UHD (3.14). Closing the ticket.