Project

General

Profile

Actions

Bug #3692

closed

osmo-trx-uhd fails to start with UHD "recv packet demuxer unexpected sid"

Added by pespin over 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
UHD
Target version:
-
Start date:
11/13/2018
Due date:
% Done:

90%

Spec Reference:

Description

Sometimes TTCN3 tests running with real HW under osmo-gsm-tester (osmo-bts-trx + osmo+trx + B200 in this case) fails with TTCN3 error "RSL_Emulation.ttcn:381 : Lost IPA connection!"

Looking at pcap file, one can see indeed osmo-bts-trx closes the TCP conn after starting IPA handshake (TCP FIN is sent to TTCN3 BSC emulation).

osmo-bts-trx log shows why it closes the connection:

20181113143214669 DPAG <0005> paging.c:540 Paging SI update
20181113143214669 DPCU <0009> pcu_sock.c:124 Sending info
20181113143214669 DPCU <0009> pcu_sock.c:139 BTS is up
20181113143214669 DPCU <0009> pcu_sock.c:232 trx=0 ts=7: available (tsc=7 arfcn=868)
20181113143215220 DTRX <000b> trx_if.c:450 Response message: 'RSP POWERON 1'
20181113143215221 DTRX <000b> trx_if.c:491 transceiver (phy0.0) rejected TRX command with response: 'RSP POWERON 1'
20181113143215221 DOML <0001> bts.c:239 Shutting down BTS 0, Reason TRX-CTRL-MSG: CRITICAL
20181113143215221 DL1C <0006> scheduler.c:242 Exit scheduler for trx=0
20181113143215221 DL1C <0006> scheduler.c:219 Init scheduler for trx=0
20181113143215221 DTRX <000b> trx_if.c:240 Enqueuing TRX control command 'CMD POWEROFF'
...
Shutdown timer expired

So basically osmo-trx-uhd fails to start running and tells the osmo-bts-trx, which decides to die.

osmo-trx-uhd shows during that time:

20181113143206437 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is TXTUNE 1876400
20181113143206437 DDEV <0001> UHDDevice.cpp:1114 [tid=140684301293312]
Tune Result:
    Target RF  Freq: 1876.400000 (MHz)
    Actual RF  Freq: 1876.399999 (MHz)
    Target DSP Freq: 0.000001 (MHz)
    Actual DSP Freq: 0.000001 (MHz)

20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETTSC 7
20181113143206438 DMAIN <0000> Transceiver.cpp:826 [tid=140684301293312] Changing TSC from 7 to 7
20181113143206438 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWERON
20181113143206438 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver
20181113143206438 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device
20181113143206438 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP...
20181113143206438 DMAIN <0000> Threads.cpp:116 [tid=140683935381248] Thread 140683935381248 (task 58222) set name: UHDAsyncEvent
20181113143206543 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x80001
20181113143206544 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90001
20181113143206546 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005
20181113143206548 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xeffff
20181113143206550 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0004
20181113143206552 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0004
20181113143206554 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xafff0
20181113143206556 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x20004
20181113143206558 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50002
20181113143206560 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffef000a
20181113143206561 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffafffe
20181113143206563 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xd0009
20181113143206565 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffbfffa
20181113143206567 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fffc
20181113143206569 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff5fffe
20181113143206571 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80005
20181113143206573 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff7ffff
20181113143206575 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff8fffa
20181113143206577 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fffe
20181113143206578 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xb0005
20181113143206580 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffaffff
20181113143206582 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10008
20181113143206584 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80001
20181113143206586 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x90005
20181113143206588 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffa0015
20181113143206590 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xe0002
20181113143206592 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffcfff5
20181113143206594 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff60001
20181113143206595 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff80000
20181113143206597 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffe000b
20181113143206599 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0006
20181113143206601 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff70009
20181113143206603 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xffff0003
20181113143206605 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffff
... (lots more of this unexpected error messages coming from UHD)...
20181113143207539 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff9fffe
20181113143207541 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffb0013
20181113143207541 DDEV <0001> UHDDevice.cpp:769 [tid=140684301293312] Device timed out
20181113143207541 DMAIN <0000> Transceiver.cpp:261 [tid=140684301293312] Device failed to start
20181113143210961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
20181113143211017 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
20181113143211440 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
20181113143214961 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF
20181113143215018 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is POWEROFF

So it seems it fails due to some low level UHD issues, and sends the RSP POWERON 1 (see "Device failed to start" in code, start() return false).

Next time (test) osmo-bts-trx is started, power on in osmo-trx starts exactly the same way (failing), but eventually recovers and startup fine:

20181113143215442 DMAIN <0000> Transceiver.cpp:253 [tid=140684301293312] Starting the transceiver
20181113143215442 DMAIN <0000> radioInterface.cpp:169 [tid=140684301293312] Starting radio device
20181113143215442 DDEV <0001> UHDDevice.cpp:804 [tid=140684301293312] Starting USRP...
20181113143215442 DMAIN <0000> Threads.cpp:116 [tid=140683935115008] Thread 140683935115008 (task 58223) set name: UHDAsyncEvent
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x10fffa
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfffefffd
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x3ffff
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x5fff4
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x8ffea
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x50000
20181113143215443 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x1ffef
...
20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff4fffd
20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0xfff3fffe
20181113143215446 DDEV <0001> UHDDevice.cpp:352 [tid=140684301293312] recv packet demuxer unexpected sid 0x6fff3
20181113143215564 DDEV <0001> UHDDevice.cpp:779 [tid=140684301293312] Initial timestamp 723990769
20181113143215564 DDEV <0001> UHDDevice.cpp:825 [tid=140684301293312] The current time is 668.302 seconds
20181113143215564 DMAIN <0000> radioInterface.cpp:190 [tid=140684301293312] Radio started
20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935414016] Thread 140683935414016 (task 58224) set name: TxLower
20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935446784] Thread 140683935446784 (task 58225) set name: RxLower
20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140683935479552] Thread 140683935479552 (task 58226) set name: RxUpper0
20181113143215564 DMAIN <0000> Threads.cpp:116 [tid=140684299265792] Thread 140684299265792 (task 58227) set name: TxUpper0
20181113143215565 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETRXGAIN 25
20181113143215565 DDEV <0001> UHDDevice.cpp:497 [tid=140684301293312] Set RX gain to 25dB (asked for 25dB)
20181113143215566 DDEV <0001> UHDDevice.cpp:1482 [tid=140683935446784] Skipping buffer data: timestamp=723992813 time_end=709289470
20181113143215566 DMAIN <0000> Transceiver.cpp:1038 [tid=140683935446784] ClockInterface: sending IND CLOCK 2421400
20181113143215566 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETPOWER 0
20181113143215567 DDEV <0001> UHDDevice.cpp:482 [tid=140684301293312] Set TX gain to 89.75dB (asked for 89.75dB)
20181113143215567 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 0 5
20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 1 1
20181113143215568 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 2 1
20181113143215582 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 3 1
20181113143215629 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 4 8
20181113143215678 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 5 3
20181113143215726 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 6 7
20181113143215775 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is SETSLOT 7 13
20181113143216010 DMAIN <0000> Transceiver.cpp:724 [tid=140684301293312] command is NOHANDOVER 7 0

This issue seems to (hopefully) be solved in UHD 3.12: https://github.com/EttusResearch/uhd/issues/173
In debian9 there's currently 3.9.5-2+b3, quite old. 3.12 is available in buster (testing).

So, what I propose:
- Update RnD debian9 with UHD packages from testing and see if everything works fine, then upgrade Prod (ansible?)
- Since it seems re-starting the Transceiver in osmo-trx eventually works, let's modify osmo-bts-trx to instead of exiting, just re-try to power on the device in a few seconds (and somehow state to osmo-bsc in OML that state is degraded?)


Related issues

Related to OsmocomBB SDR PHY - Bug #3458: apps/grgsm_trx: [ERROR] [STREAMER] recv packet demuxer unexpected sid 0xXXStalled08/10/2018

Actions
Actions #1

Updated by fixeria over 5 years ago

  • Related to Bug #3458: apps/grgsm_trx: [ERROR] [STREAMER] recv packet demuxer unexpected sid 0xXX added
Actions #2

Updated by pespin over 5 years ago

  • Status changed from New to In Progress

Pushed some commits to help with this kind of issue:

remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11765 bts-trx: trx_if: Use struct to store CTRL msg parsed responses
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11766 bts-trx: setup timer once at creation time
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11767 common: Implement OML for trx_set_available(1)
remote: https://gerrit.osmocom.org/#/c/osmo-bts/+/11768 bts-trx: Allow re-try without exit if POWERON fails in osmo-trx

Actions #3

Updated by pespin over 5 years ago

I used something like this to simulate osmo-trx start failing ocasionally:

diff --git a/Transceiver52M/Transceiver.cpp b/Transceiver52M/Transceiver.cpp
index d6ddce8..c889d88 100644
--- a/Transceiver52M/Transceiver.cpp
+++ b/Transceiver52M/Transceiver.cpp
@@ -241,6 +241,8 @@ void Transceiver::setSignalHandler(osmo_signal_cbfn cbfn)
  * launch threads to handle sample I/O. Re-synchronize the transmit burst
  * counters to the central radio clock here as well.
  */
+ #include <sys/time.h>
+
 bool Transceiver::start()
 {
   ScopedLock lock(mLock);
@@ -257,6 +259,14 @@ bool Transceiver::start()
   mLastClockUpdateTime = time;
   mLatencyUpdateTime = time;

+  struct timeval tv;
+  gettimeofday(&tv, NULL);
+  srandom(tv.tv_usec);
+  if(random() % 3 != 0) {
+           LOG(ALERT) << "Device failed to start";
+        return false;
+        }
+
   if (!mRadioInterface->start()) {
     LOG(ALERT) << "Device failed to start";
     return false;

Actions #4

Updated by pespin over 5 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 90

I tried updating to UHD 3.12 using debian testing repos, but it pulls in too many deps:

# cat sources.list.d/testing.list
deb http://ftp.de.debian.org/debian/ testing main non-free contrib
deb-src http://ftp.de.debian.org/debian/ testing main non-free contrib

deb http://security.debian.org/debian-security testing/updates main contrib non-free
deb-src http://security.debian.org/debian-security testing/updates main contrib non-free
# cat apt.conf.d/99defaultrelease
APT::Default-Release "stable";
root@osmo-gsm-tester-rnd:/etc/apt/preferences.d# apt-get -t testing install uhd-host
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libpython3.5-dev python3.5 python3.5-dev python3.5-minimal
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  gcc-8-base libapt-inst2.0 libapt-pkg5.0 libboost-atomic1.62-dev libboost-atomic1.62.0 libboost-chrono1.62-dev libboost-chrono1.62.0
  libboost-context1.62-dev libboost-context1.62.0 libboost-coroutine1.62-dev libboost-coroutine1.62.0 libboost-date-time1.62-dev libboost-date-time1.62.0
  libboost-exception1.62-dev libboost-fiber1.62-dev libboost-fiber1.62.0 libboost-filesystem1.62-dev libboost-filesystem1.62.0
  libboost-graph-parallel1.62-dev libboost-graph-parallel1.62.0 libboost-graph1.62-dev libboost-graph1.62.0 libboost-iostreams1.62-dev
  libboost-iostreams1.62.0 libboost-locale1.62-dev libboost-locale1.62.0 libboost-log1.62-dev libboost-log1.62.0 libboost-math1.62-dev libboost-math1.62.0
  libboost-mpi-python1.62-dev libboost-mpi-python1.62.0 libboost-mpi1.62-dev libboost-mpi1.62.0 libboost-program-options1.62-dev
  libboost-program-options1.62.0 libboost-python1.62-dev libboost-python1.62.0 libboost-random1.62-dev libboost-random1.62.0 libboost-regex1.62-dev
  libboost-regex1.62.0 libboost-serialization1.62-dev libboost-serialization1.62.0 libboost-signals1.62-dev libboost-signals1.62.0 libboost-system1.62-dev
  libboost-system1.62.0 libboost-test1.62-dev libboost-test1.62.0 libboost-thread1.62-dev libboost-thread1.62.0 libboost-timer1.62-dev libboost-timer1.62.0
  libboost-type-erasure1.62-dev libboost-type-erasure1.62.0 libboost-wave1.62-dev libboost-wave1.62.0 libboost1.62-dev libc-bin libc-dev-bin libc-l10n libc6
  libc6-dbg libc6-dev libevent-2.1-6 libevent-core-2.1-6 libevent-pthreads-2.1-6 libfabric1 libgfortran5 libhwloc-dev libhwloc5 libicu-le-hb0 libicu60
  libncurses6 libncursesw6 libopenmpi3 libpmix2 libpsm2-2 libpython3-dev libpython3-stdlib libpython3.6 libpython3.6-dev libpython3.6-minimal
  libpython3.6-stdlib libssl-dev libssl1.1 libstdc++6 libtinfo6 libuhd3.12.0 libzstd1 locales python3 python3-apt python3-cffi-backend python3-crypto
  python3-dbus python3-dev python3-distutils python3-gi python3-lib2to3 python3-markupsafe python3-minimal python3-yaml python3.6 python3.6-dev
  python3.6-minimal
Suggested packages:
  libboost1.62-doc gccxml libmpfrc++-dev libntl-dev glibc-doc libhwloc-contrib-plugins gnuradio python3-doc python3-tk python3-venv python3-apt-dbg
  python-apt-doc python-crypto-doc python-dbus-doc python3-dbus-dbg python3.6-venv python3.6-doc
Recommended packages:
  apt
The following NEW packages will be installed:
  gcc-8-base libevent-2.1-6 libevent-core-2.1-6 libevent-pthreads-2.1-6 libgfortran5 libicu-le-hb0 libicu60 libncurses6 libncursesw6 libopenmpi3 libpmix2
  libpsm2-2 libpython3.6 libpython3.6-dev libpython3.6-minimal libpython3.6-stdlib libtinfo6 libuhd3.12.0 libzstd1 python3-distutils python3-lib2to3
  python3.6 python3.6-dev python3.6-minimal
The following packages will be upgraded:
  libapt-inst2.0 libapt-pkg5.0 libboost-atomic1.62-dev libboost-atomic1.62.0 libboost-chrono1.62-dev libboost-chrono1.62.0 libboost-context1.62-dev
  libboost-context1.62.0 libboost-coroutine1.62-dev libboost-coroutine1.62.0 libboost-date-time1.62-dev libboost-date-time1.62.0 libboost-exception1.62-dev
  libboost-fiber1.62-dev libboost-fiber1.62.0 libboost-filesystem1.62-dev libboost-filesystem1.62.0 libboost-graph-parallel1.62-dev
  libboost-graph-parallel1.62.0 libboost-graph1.62-dev libboost-graph1.62.0 libboost-iostreams1.62-dev libboost-iostreams1.62.0 libboost-locale1.62-dev
  libboost-locale1.62.0 libboost-log1.62-dev libboost-log1.62.0 libboost-math1.62-dev libboost-math1.62.0 libboost-mpi-python1.62-dev
  libboost-mpi-python1.62.0 libboost-mpi1.62-dev libboost-mpi1.62.0 libboost-program-options1.62-dev libboost-program-options1.62.0 libboost-python1.62-dev
  libboost-python1.62.0 libboost-random1.62-dev libboost-random1.62.0 libboost-regex1.62-dev libboost-regex1.62.0 libboost-serialization1.62-dev
  libboost-serialization1.62.0 libboost-signals1.62-dev libboost-signals1.62.0 libboost-system1.62-dev libboost-system1.62.0 libboost-test1.62-dev
  libboost-test1.62.0 libboost-thread1.62-dev libboost-thread1.62.0 libboost-timer1.62-dev libboost-timer1.62.0 libboost-type-erasure1.62-dev
  libboost-type-erasure1.62.0 libboost-wave1.62-dev libboost-wave1.62.0 libboost1.62-dev libc-bin libc-dev-bin libc-l10n libc6 libc6-dbg libc6-dev libfabric1
  libhwloc-dev libhwloc5 libpython3-dev libpython3-stdlib libssl-dev libssl1.1 libstdc++6 locales python3 python3-apt python3-cffi-backend python3-crypto
  python3-dbus python3-dev python3-gi python3-markupsafe python3-minimal python3-yaml uhd-host

But anyway, hopefully patches to osmo-bts-trx should be enough to get TTCN3 tests passing when this kind of issue occurs.

Actions #5

Updated by tnt almost 5 years ago

This looks very very much like the improper reset bug of UHD.

Use newer UHD or a fpga image that has the workaround. Nothing that can really be done in osmo-trx.

Actions #6

Updated by pespin over 4 years ago

  • Status changed from Feedback to Closed

I haven't seen this bug for a while, since we started using newer debian and UHD (3.14). Closing the ticket.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)