Bug #3562

closed

osmo-trx-uhd doesn't exit after UHD device disappears

Added by laforge over 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: UHD
Target version: -
Start date: 09/18/2018
Due date:
% Done: 0%
Spec Reference:

Description

When operating osmo-trx-uhd and then unplugging the USB connector of the USRP B2xx, osmo-trx-uhd prints the following error message:

[ERROR] [UHD] An unexpected exception was caught in a task loop.The task loop will now exit, things may not work.EnvironmentError: IOError: usb rx8 transfer status: LIBUSB_TRANSFER_NO_DEVICE

but continues to run.

I think the expectation is that the process should exit and re-spawn until the re-plugged device is found again? Let's e.g. think of a temporary glitch leading to USB device re-enumeration.

Actions #1

Updated by laforge over 5 years ago

  • Subject changed from osmo-trx-uhd doesn't exist after UHD device disappears to osmo-trx-uhd doesn't exit after UHD device disappears

Also, let's double-check that osmo-trx-uhd doesn't recover internally after the device is re-plugged.

Actions #2

Updated by pespin over 5 years ago

  • Assignee set to pespin

Assigning to me since I recently did related work on the same issue with osmo-trx-lms.

self-reminder: I can try to test this with my LimeSDR using osmo-trx-uhd.

Actions #3

Updated by pespin over 5 years ago

I unplugged the USB cable from my PC while running a network with osmo-trx-uhd and a B200. This is what I get; the process stops:

Thu Nov  1 18:34:50 2018 DMAIN <0000> Transceiver.cpp:1038 [tid=140280087037696] ClockInterface: sending IND CLOCK 2701351
terminate called after throwing an instance of 'uhd::io_error'
  what():  EnvironmentError: IOError: usb rx6 transfer status: LIBUSB_TRANSFER_ERROR
signal 6 received
talloc report on 'OsmoTRX' (total   3336 bytes in  15 blocks)
    telnet_connection              contains      1 bytes in   1 blocks (ref 0) 0x60b0000afd30
    logging                        contains   2955 bytes in   9 blocks (ref 0) 0x60b000013810
    struct trx_ctx                 contains    380 bytes in   3 blocks (ref 0) 0x6140000006a0
    msgb                           contains      0 bytes in   1 blocks (ref 0) 0x608000004f80
full talloc report on 'OsmoTRX' (total   3336 bytes in  15 blocks)
    telnet_connection              contains      1 bytes in   1 blocks (ref 0) 0x60b0000afd30
    logging                        contains   2955 bytes in   9 blocks (ref 0) 0x60b000013810
        Configure logging
Set the log level for a specified category
Main generic category
Device/Driver specific code
Logging from within LimeSuite itself
Library-internal global log family
LAPD in libosmogsm
A-bis Intput Subsystem
A-bis B-Subchannel TRAU Frame Multiplex
A-bis Input Driver for Signalling
A-bis Input Driver for B-Channels (voice)
Layer3 Short Message Service (SMS)
Control Interface
GPRS GTP library
Statistics messages and logging
Generic Subscriber Update Protocol
Osmocom Authentication Protocol
libosmo-sigtran Signalling System 7
libosmo-sigtran SCCP Implementation
libosmo-sigtran SCCP User Adaptation
libosmo-sigtran MTP3 User Adaptation
libosmo-mgcp Media Gateway Control Protocol
libosmo-netif Jitter Buffer
Deprecated alias for 'no logging level force-all'
 contains    779 bytes in   1 blocks (ref 0) 0x6180000014e0
        logging level (main|dev|lms|lglobal|llapd|linp|lmux|lmi|lmib|lsms|lctrl|lgtp|lstats|lgsup|loap|lss7|lsccp|lsua|lm3ua|lmgcp|ljibuf) everything contains    142 bytes in   1 blocks (ref 0) 0x611000002760
        Configure logging
Set the log level for a specified category
Main generic category
Device/Driver specific code
Logging from within LimeSuite itself
Library-internal global log family
LAPD in libosmogsm
A-bis Intput Subsystem
A-bis B-Subchannel TRAU Frame Multiplex
A-bis Input Driver for Signalling
A-bis Input Driver for B-Channels (voice)
Layer3 Short Message Service (SMS)
Control Interface
GPRS GTP library
Statistics messages and logging
Generic Subscriber Update Protocol
Osmocom Authentication Protocol
libosmo-sigtran Signalling System 7
libosmo-sigtran SCCP Implementation
libosmo-sigtran SCCP User Adaptation
libosmo-sigtran MTP3 User Adaptation
libosmo-mgcp Media Gateway Control Protocol
libosmo-netif Jitter Buffer
Log debug messages and higher levels
Log informational messages and higher levels
Log noticeable messages and higher levels
Log error messages and higher levels
Log only fatal messages
 contains    914 bytes in   1 blocks (ref 0) 0x6190000230e0
        logging level (main|dev|lms|lglobal|llapd|linp|lmux|lmi|lmib|lsms|lctrl|lgtp|lstats|lgsup|loap|lss7|lsccp|lsua|lm3ua|lmgcp|ljibuf) (debug|info|notice|error|fatal) contains    163 bytes in   1 blocks (ref 0) 0x612000000ca0
        struct log_target              contains    212 bytes in   2 blocks (ref 0) 0x612000000820
            struct log_category            contains     44 bytes in   1 blocks (ref 0) 0x60d000000720
        struct log_info                contains    744 bytes in   2 blocks (ref 0) 0x60d000000650
            struct log_info_cat            contains    704 bytes in   1 blocks (ref 0) 0x6180000000e0
    struct trx_ctx                 contains    380 bytes in   3 blocks (ref 0) 0x6140000006a0
        192.168.30.1                   contains     13 bytes in   1 blocks (ref 0) 0x60b0000ad600
        192.168.30.100                 contains     15 bytes in   1 blocks (ref 0) 0x60b0000ad080
    msgb                           contains      0 bytes in   1 blocks (ref 0) 0x608000004f80
./run_out.sh: line 12: 13790 Aborted                 (core dumped) $@

So it seems SIGABRT (signal 6) is raised, and after printing the report some random strings are printed.
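For context, the "terminate called after throwing an instance of ..." line followed by signal 6 is the standard C++ behaviour for an exception that nobody catches: std::terminate() runs, and its default handler calls abort(), which raises SIGABRT. The sketch below reproduces that in isolation. It does not use UHD; `rx_loop_that_fails` and `signal_from_uncaught_exception` are hypothetical names, and std::runtime_error stands in for uhd::io_error. The failing thread runs in a forked child so that the caller survives and can inspect the terminating signal.

```cpp
#include <signal.h>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical stand-in for a driver receive loop that throws when the
// device disappears (std::runtime_error replaces uhd::io_error here).
static void rx_loop_that_fails()
{
    throw std::runtime_error("usb rx transfer status: LIBUSB_TRANSFER_NO_DEVICE");
}

// Runs the failing loop in a thread inside a forked child process.
// Returns the signal that killed the child, 0 if the child was not
// killed by a signal, or -1 on fork/waitpid failure.
int signal_from_uncaught_exception()
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        // Child: the exception escapes the thread function, so the
        // runtime calls std::terminate(), which aborts the process.
        std::thread t(rx_loop_that_fails);
        t.join();
        _exit(0); // not expected to be reached
    }

    // Parent: wait for the child and report how it died.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_from_uncaught_exception()` should report SIGABRT, which matches the "signal 6 received" line in the log above.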

Actions #4

Updated by pespin over 5 years ago

Another similar, but not identical, case (a different exception is raised) when quickly unplugging the USB cable on the B200 side. Again, the process stops (aborts):

terminate called after throwing an instance of 'uhd::io_error'
  what():  EnvironmentError: IOError: usb rx6 transfer status: LIBUSB_TRANSFER_NO_DEVICE
[ERROR] [UHDsignal 6 received
] An unexpected exception was caught in a task loop.The task loop will now exit, things may not work.EnvironmentError: IOError: usb rx8 transfer status: LIBUSB_TRANSFER_NO_DEVICE
talloc report on 'OsmoTRX' (total   3336 bytes in  15 blocks)
    telnet_connection              contains      1 bytes in   1 blocks (ref 0) 0x60b0000afd30
    logging                        contains   2955 bytes in   9 blocks (ref 0) 0x60b000013810
    struct trx_ctx                 contains    380 bytes in   3 blocks (ref 0) 0x6140000006a0
    msgb                           contains      0 bytes in   1 blocks (ref 0) 0x608000004f80
full talloc report on 'OsmoTRX' (total   3336 bytes in  15 blocks)
    telnet_connection              contains      1 bytes in   1 blocks (ref 0) 0x60b0000afd30
    logging                        contains   2955 bytes in   9 blocks (ref 0) 0x60b000013810
        Configure logging
Set the log level for a specified category
Main generic category
Device/Driver specific code
Logging from within LimeSuite itself
Library-internal global log family
LAPD in libosmogsm
A-bis Intput Subsystem
A-bis B-Subchannel TRAU Frame Multiplex
A-bis Input Driver for Signalling
A-bis Input Driver for B-Channels (voice)
Layer3 Short Message Service (SMS)
Control Interface
GPRS GTP library
Statistics messages and logging
Generic Subscriber Update Protocol
Osmocom Authentication Protocol
libosmo-sigtran Signalling System 7
libosmo-sigtran SCCP Implementation
libosmo-sigtran SCCP User Adaptation
libosmo-sigtran MTP3 User Adaptation
libosmo-mgcp Media Gateway Control Protocol
libosmo-netif Jitter Buffer
Deprecated alias for 'no logging level force-all'
 contains    779 bytes in   1 blocks (ref 0) 0x6180000014e0
        logging level (main|dev|lms|lglobal|llapd|linp|lmux|lmi|lmib|lsms|lctrl|lgtp|lstats|lgsup|loap|lss7|lsccp|lsua|lm3ua|lmgcp|ljibuf) everything contains    142 bytes in   1 blocks (ref 0) 0x611000002760
        Configure logging
Set the log level for a specified category
Main generic category
Device/Driver specific code
Logging from within LimeSuite itself
Library-internal global log family
LAPD in libosmogsm
A-bis Intput Subsystem
A-bis B-Subchannel TRAU Frame Multiplex
A-bis Input Driver for Signalling
A-bis Input Driver for B-Channels (voice)
Layer3 Short Message Service (SMS)
Control Interface
GPRS GTP library
Statistics messages and logging
Generic Subscriber Update Protocol
Osmocom Authentication Protocol
libosmo-sigtran Signalling System 7
libosmo-sigtran SCCP Implementation
libosmo-sigtran SCCP User Adaptation
libosmo-sigtran MTP3 User Adaptation
libosmo-mgcp Media Gateway Control Protocol
libosmo-netif Jitter Buffer
Log debug messages and higher levels
Log informational messages and higher levels
Log noticeable messages and higher levels
Log error messages and higher levels
Log only fatal messages
 contains    914 bytes in   1 blocks (ref 0) 0x6190000230e0
        logging level (main|dev|lms|lglobal|llapd|linp|lmux|lmi|lmib|lsms|lctrl|lgtp|lstats|lgsup|loap|lss7|lsccp|lsua|lm3ua|lmgcp|ljibuf) (debug|info|notice|error|fatal) contains    163 bytes in   1 blocks (ref 0) 0x612000000ca0
        struct log_target              contains    212 bytes in   2 blocks (ref 0) 0x612000000820
            struct log_category            contains     44 bytes in   1 blocks (ref 0) 0x60d000000720
        struct log_info                contains    744 bytes in   2 blocks (ref 0) 0x60d000000650
            struct log_info_cat            contains    704 bytes in   1 blocks (ref 0) 0x6180000000e0
    struct trx_ctx                 contains    380 bytes in   3 blocks (ref 0) 0x6140000006a0
        192.168.30.1                   contains     13 bytes in   1 blocks (ref 0) 0x60b0000ad600
        192.168.30.100                 contains     15 bytes in   1 blocks (ref 0) 0x60b0000ad080
    msgb                           contains      0 bytes in   1 blocks (ref 0) 0x608000004f80
./run_out.sh: line 12: 14376 Aborted                 (core dumped) $@

Actions #5

Updated by pespin almost 5 years ago

  • Status changed from New to Resolved

These should all be fixed after recent patches to fix exit sequence race conditions in osmo-trx:

commit 21032b75c00710ab30a0a74a4006608a58295d99
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date:   Fri Mar 29 19:20:06 2019 +0100

    osmo-trx: Use signalfd to serialize signals in main thread ctx

    This should avoid prolematic scenarios where different signal handlers
    are running on different thread in parallel. Furthermore, we make sure
    those signals are always run by main loop thread.

    Change-Id: I9b9d9793be9af11dbe433e0ce09b7ac57a3bdfb5

commit d01c7b98b63dfda1e72e289eccd7a384c658069f
Author: Pau Espin Pedrol <pespin@sysmocom.de>
Date:   Fri Mar 29 18:36:30 2019 +0100

    osmo-trx: Avoid handling signals after shutdown triggered

    Recently a blocked osmo-trx process was found after ending SIGTERM to
    it.
    Apparently one thread was handling SIGTERM and calling fprintf()
    (grabbing libc lock) while another thread was handling another signal
    and also grabbing similar lock. Both thread looked deadlocked there.
    Probably this change doesn't fix the block on its own, but at least
    simplifies scenarios inside signal ctx which can go wrong.

    Change-Id: If91621913b8b03d8a0f4c863be0b0d479f97e8a1
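
As a rough illustration of the signalfd technique named in the first commit above, the sketch below blocks a set of signals and turns them into readable events on a file descriptor, so that they are handled as ordinary code in one thread rather than in asynchronous handlers on arbitrary threads. This is not the osmo-trx code: `setup_signal_fd` and `read_one_signal` are hypothetical helper names, the choice of signals is an assumption, and a real main loop would poll the descriptor alongside its other file descriptors.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

// Blocks SIGINT, SIGTERM and SIGUSR1 in the calling thread and returns a
// signalfd for them, or -1 on error. Intended to be called in the main
// thread before any worker thread is created, because new threads
// inherit the blocked signal mask of their creator.
int setup_signal_fd()
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigaddset(&mask, SIGUSR1);

    // With the signals blocked, no asynchronous handler runs for them;
    // they stay pending until they are read from the descriptor.
    if (pthread_sigmask(SIG_BLOCK, &mask, nullptr) != 0)
        return -1;

    return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Reads one pending signal from the descriptor (blocking until one is
// available). Returns the signal number, or -1 on a failed or short read.
int read_one_signal(int sfd)
{
    struct signalfd_siginfo si;
    ssize_t n = read(sfd, &si, sizeof(si));
    if (n != static_cast<ssize_t>(sizeof(si)))
        return -1;
    return static_cast<int>(si.ssi_signo);
}
```

Because the signal is consumed with a normal read() in the main loop, the code that reacts to it may call functions such as fprintf() without the restrictions that apply inside a signal handler, which is the kind of lock contention the second commit message describes.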
