Bug #5415

open

cardem: watchdog triggers firmware reset

Added by lynxis about 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: firmware
Target version: -
Start date: 01/24/2022
Due date:
% Done: 0%
Spec Reference:

Description

While testing the cardem firmware on an owhw board with a script, the watchdog resets the board from time to time (2 to 4 times in 50 test runs).
When the watchdog triggers, the userspace application also exits, because the bulk USB transfer fails with a stall.

bootloader version: 87f8de15 (based on ea9a91f5c)
app: 87f8de15 (based on ea9a91f5c)
I've pushed this version to lynxis/wip.

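The test looks like this (pseudo C code):

/* reset_modem() and get_imsi() are helpers of the test script;
 * get_imsi() returning 0 is treated as success. */
for (int i = 0; i < 50; i++) {
  reset_modem();
  /* retry reading the IMSI up to 5 times after each modem reset */
  for (int j = 0; j < 5; j++) {
    if (get_imsi() == 0)
      break;
  }
}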

Actions #1

Updated by lynxis about 2 years ago

  • Description updated

I'll do another test with a binary of ea9a91f5c from https://downloads.osmocom.org/binaries/simtrace2/firmware/all/

Actions #2

Updated by lynxis about 2 years ago

It also happens with 0.8.1.7-ea9a.

Actions #3

Updated by laforge about 2 years ago

Might be worth disabling the watchdog and then using JTAG to determine where exactly the code is stuck at the time the USB-side stalls appear. Most likely it is in some kind of endless loop, which should then be easy to find.

If the problem disappears when disabling the watchdog, it means either the timeout is too low, or we somehow fail to restart (kick) the watchdog for too long. AFAIR the timeout is on the order of seconds, so I have a hard time imagining that we really don't return to the main loop for that long.

Actions #4

Updated by laforge about 2 years ago

I've added a commit in the laforge/202201 branch which should make the watchdog print a panic, including at least the instruction pointer at the time the exception was entered, instead of resetting the board:

commit a3c8283ce5268edbe42f9a2c485f25bad5e0575a
Author: Harald Welte <laforge@osmocom.org>
Date:   Wed Jan 26 10:47:52 2022 +0100

    HACK: carem: Don't reset on watchdog, but panic.

    This should help debug watchdog triggers.  Also reduce the timer to 1s
    to hopefully increase the chance of triggering it.

    Change-Id: Ie3f47e5612cdf501abff8cb6954600b785b3a3fa

This is untested so far, as I have not been able to trigger the problem.

Actions #5

Updated by lynxis about 2 years ago

Even with 913a86b551960fbb2ab2595a3f1c45ab806fb112, the firmware triggers the watchdog reset. I'm doing another test run with the full laforge/202201 branch.

Tue Feb 15 16:13:47 2022 user.notice serial: -I- 0: send_tpdu_header: 00 c0 00 00 23
Tue Feb 15 16:13:47 2022 user.notice serial: -I- 0: flush_rx_buffer (5)
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.310927] usb 1-1.2: USB disconnect, device number 15
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.312270] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.312586] option 1-1.2:1.0: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.314107] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.314391] option 1-1.2:1.1: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.315932] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.316219] option 1-1.2:1.2: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.317683] option1 ttyUSB3: GSM modem (1-port) converter now disconnected from ttyUSB3
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.317974] option 1-1.2:1.3: device disconnected
Tue Feb 15 16:13:51 2022 user.notice root: DST2 FATAL user_simtrace2.c:225 USB IN transfer failed, status=4
Tue Feb 15 16:13:51 2022 user.notice serial: -
Tue Feb 15 16:13:51 2022 user.notice serial:
Tue Feb 15 16:13:51 2022 user.notice serial: =============================================================================
Tue Feb 15 16:13:51 2022 user.notice serial: SIMtrace2 firmware 0.8.1.21-913a, BOARD=owhw, APP=cardem
Tue Feb 15 16:13:51 2022 user.notice serial: (C) 2010-2019 by Harald Welte, 2018-2019 by Kevin Redon
Tue Feb 15 16:13:51 2022 user.notice serial: =============================================================================
Tue Feb 15 16:13:51 2022 user.notice serial: -I- Chip ID: 0x28900960 (Ext 0x00000000)
Tue Feb 15 16:13:51 2022 user.notice serial: -I- Serial Nr. 51203120-4e473450-31303231-39303030
Tue Feb 15 16:13:51 2022 user.notice serial: -I- Reset Cause: watchdog reset (watchdog fault occurred)

Actions #6

Updated by laforge about 2 years ago

On Tue, Feb 15, 2022 at 04:48:32PM +0000, lynxis wrote:

> even with 913a86b551960fbb2ab2595a3f1c45ab806fb112 the firmware triggers.

fatal: bad object 913a86b551960fbb2ab2595a3f1c45ab806fb112

Actions #7

Updated by lynxis about 2 years ago

Right. I forgot that I had rebased/cherry-picked the commit onto the current master.

commit 913a86b551960fbb2ab2595a3f1c45ab806fb112
Author: Harald Welte <laforge@osmocom.org>
Date:   Tue Jan 25 23:24:48 2022 +0100

    cardem: set more reasonable interrupt priorities

    the ISO7816 UARTs have highest priority, while console has lowest.
    remaining sources (USB, ADC, GPIO) are in between.

    Change-Id: Ie6c97d61d8da3990b6e44144f36cb6d37d194307

Actions #8

Updated by laforge about 2 years ago

This IRQ priority patch is not expected to have any effect on watchdog failures.

IIRC I stated before that watchdog failures must be debugged by disabling the watchdog reset. The watchdog can instead generate an interrupt, in whose handler you can print a backtrace and/or spin in an endless loop, and then inspect via JTAG/GDB what the firmware was doing.
