Bug #5415
opencardem: watchdog triggers firmware reset
Description
While testing the cardem firmware on an owhw board with a script, the watchdog resets the board from time to time (2-4 times per 50 test runs).
When the watchdog triggers, the userspace application also exits because the USB bulk transfer fails with a stall.
bootloader version: 87f8de15 (based on ea9a91f5c)
app: 87f8de15 (based on ea9a91f5c)
I've pushed this version to lynxis/wip.
The test looks like this pseudo-C code:
for (i = 0; i < 50; i++) {
    reset_modem();
    for (j = 0; j < 5; j++) {
        if (get_imsi() == 0)
            break;
    }
}
Updated by lynxis over 2 years ago
- Description updated (diff)
I'll do another test with a binary of ea9a91f5c from https://downloads.osmocom.org/binaries/simtrace2/firmware/all/
Updated by laforge over 2 years ago
It might be worth disabling the watchdog and then using JTAG to determine where exactly the code is stuck at the time the USB stalls appear. Most likely it is in some kind of endless loop, which should then be easy to find.
If the problem disappears when disabling the watchdog, it means the timeout is too low, or somehow we are not restarting the watchdog for too long. AFAIR it is set in the "seconds" order of magnitude, so I have a difficult time imagining we'd really not return to the main loop for that long.
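To put a number on the "seconds" order of magnitude, here is a rough sketch of the timeout arithmetic. It assumes a SAM3S-style watchdog (a 12-bit down-counter clocked by the 32.768 kHz slow clock divided by 128, i.e. 256 ticks per second); whether that matches this board should be checked against the datasheet. The helper names are hypothetical and not taken from the firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SAM3S-style watchdog, 12-bit down-counter clocked by the
 * 32.768 kHz slow clock divided by 128, which gives 256 ticks per second. */
#define WDT_TICKS_PER_SEC 256u
#define WDT_WDV_MAX       0xFFFu  /* largest 12-bit reload value */

/* Hypothetical helper: convert a timeout in milliseconds into a counter
 * reload value, clamped to what the 12-bit counter can hold. */
static uint32_t wdt_ms_to_wdv(uint32_t ms)
{
    uint64_t ticks = ((uint64_t)ms * WDT_TICKS_PER_SEC) / 1000u;

    if (ticks > WDT_WDV_MAX)
        ticks = WDT_WDV_MAX;
    return (uint32_t)ticks;
}

/* Hypothetical helper: convert a reload value back into milliseconds
 * (rounded down). */
static uint32_t wdt_wdv_to_ms(uint32_t wdv)
{
    assert(wdv <= WDT_WDV_MAX);
    return (wdv * 1000u) / WDT_TICKS_PER_SEC;
}
```

Under these assumptions a 1 s timeout corresponds to a reload value of 256, and the longest possible timeout is just under 16 s, so the main loop would have to be blocked for a very long time before the watchdog expires.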
Updated by laforge over 2 years ago
I've added a commit in the laforge/202201 branch which should switch the watchdog from resetting to printing a panic with at least the instruction pointer that was executing when the exception was entered:
commit a3c8283ce5268edbe42f9a2c485f25bad5e0575a
Author: Harald Welte <laforge@osmocom.org>
Date:   Wed Jan 26 10:47:52 2022 +0100

    HACK: carem: Don't reset on watchdog, but panic.

    This should help debug watchdog triggers. Also reduce the timer to 1s
    to hopefully increase the chance of triggering it.

    Change-Id: Ie3f47e5612cdf501abff8cb6954600b785b3a3fa

This is untested as of yet, as I have not been able to trigger this problem.
Updated by lynxis about 2 years ago
Even with 913a86b551960fbb2ab2595a3f1c45ab806fb112 the firmware triggers the watchdog reset. I'm doing another test run with the full laforge/202201 branch.
Tue Feb 15 16:13:47 2022 user.notice serial: ^M-I- 0: send_tpdu_header: 00 c0 00 00 23
Tue Feb 15 16:13:47 2022 user.notice serial: -I- 0: flush_rx_buffer (5)
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.310927] usb 1-1.2: USB disconnect, device number 15
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.312270] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.312586] option 1-1.2:1.0: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.314107] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.314391] option 1-1.2:1.1: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.315932] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.316219] option 1-1.2:1.2: device disconnected
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.317683] option1 ttyUSB3: GSM modem (1-port) converter now disconnected from ttyUSB3
Tue Feb 15 16:13:47 2022 kern.info kernel: [402871.317974] option 1-1.2:1.3: device disconnected
Tue Feb 15 16:13:51 2022 user.notice root: ESC[1;31mDST2ESC[0;m ESC[1;31mFATALESC[0;m user_simtrace2.c:225 USB IN transfer failed, status=4
Tue Feb 15 16:13:51 2022 user.notice serial: ^M-
Tue Feb 15 16:13:51 2022 user.notice serial:
Tue Feb 15 16:13:51 2022 user.notice serial: ^M=============================================================================
Tue Feb 15 16:13:51 2022 user.notice serial: ^MSIMtrace2 firmware 0.8.1.21-913a, BOARD=owhw, APP=cardem
Tue Feb 15 16:13:51 2022 user.notice serial: ^M(C) 2010-2019 by Harald Welte, 2018-2019 by Kevin Redon
Tue Feb 15 16:13:51 2022 user.notice serial: ^M=============================================================================
Tue Feb 15 16:13:51 2022 user.notice serial: ^M-I- Chip ID: 0x28900960 (Ext 0x00000000)
Tue Feb 15 16:13:51 2022 user.notice serial: ^M-I- Serial Nr. 51203120-4e473450-31303231-39303030
Tue Feb 15 16:13:51 2022 user.notice serial: ^M-I- Reset Cause: watchdog reset (watchdog fault occurred)
Updated by laforge about 2 years ago
On Tue, Feb 15, 2022 at 04:48:32PM +0000, lynxis wrote:
even with 913a86b551960fbb2ab2595a3f1c45ab806fb112 the firmware triggers.
fatal: bad object 913a86b551960fbb2ab2595a3f1c45ab806fb112
Updated by lynxis about 2 years ago
Right. I forgot that I've rebased/cherry-picked the commit onto the current master.
commit 913a86b551960fbb2ab2595a3f1c45ab806fb112
Author: Harald Welte <laforge@osmocom.org>
Date:   Tue Jan 25 23:24:48 2022 +0100

    cardem: set more reasonable interrupt priorities

    the ISO7816 UARTs have highest priority, while console has lowest.
    remaining sources (USB, ADC, GPIO) are in between.

    Change-Id: Ie6c97d61d8da3990b6e44144f36cb6d37d194307
Updated by laforge about 2 years ago
This IRQ priority patch is not expected to do anything about watchdog failures.
IIRC I stated before that watchdog failures must be debugged by disabling the watchdog reset. The watchdog can instead generate an interrupt, in which you can print a backtrace and/or sit in an endless loop and then look via JTAG/GDB at what the code was doing.
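As a rough illustration of what such an interrupt handler could report: on a Cortex-M core, exception entry pushes eight words (r0-r3, r12, lr, pc, xPSR) onto the active stack, so the interrupted program counter sits at word offset 6 of that frame. The sketch below only covers the portable part, decoding the frame and formatting a panic line. The struct and function names are hypothetical and are not taken from the firmware; obtaining the stack pointer inside the real handler and then spinning in an endless loop for JTAG is target-specific and left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Register layout the Cortex-M hardware pushes on exception entry
 * (basic frame, no FPU state). */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Hypothetical helper: copy the eight stacked words found at the
 * exception-time stack pointer into a structure. */
static struct exc_frame exc_frame_from_stack(const uint32_t *sp)
{
    struct exc_frame f;

    memcpy(&f, sp, sizeof(f));
    return f;
}

/* Hypothetical helper: format the line a watchdog interrupt handler could
 * print before entering an endless loop for inspection via JTAG/GDB.
 * Returns what snprintf() returns. */
static int format_wdt_panic(char *buf, size_t len, const struct exc_frame *f)
{
    return snprintf(buf, len, "WDT panic: pc=0x%08lx lr=0x%08lx",
                    (unsigned long)f->pc, (unsigned long)f->lr);
}
```

The printed pc can then be looked up in the ELF file (for example with addr2line) to see which function was running when the watchdog expired.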