Bug #4251

simtrace2 firmware can get OOM / talloc unable to allocate buffer for APDU

Added by laforge 3 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 11/07/2019
Due date:
% Done: 100%


Description

In a scenario where the modem/phone is already sending APDUs to the simtrace2 firmware before any libusb client software is running on the USB host, the firmware allocates buffers for those APDUs and puts them on the buffer queue of the USB IN endpoint.

At some point, the firmware is out of memory as all memory is allocated in buffers on the queue.

As there's no explicit notification if somebody is currently submitting IN URBs on the endpoint on the host (i.e. an application handling the device) or not, we have to resort to indirect means of determining this situation.

My idea is to store the systick timer value at the time of enqueue inside the buffer descriptor, and then have a periodic timer (running e.g. every 10ms) that checks whether any entries in the queue have been sitting there for more than 10ms. If so, release them.

This checking for expired / too old buffers could also happen at other points in time, such as
  • when we enqueue a new entry into the queue (and hold the lock anyway)
  • when we want to allocate a buffer but are OOM
  • ...
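
The age-based expiry idea above could be sketched roughly as follows. This is only an illustration, not the simtrace2 code: the names in_queue, buf_desc, queue_expire() and queue_enqueue(), the queue depth, and the 1 kHz tick rate are all assumptions, and a real implementation would also have to free the underlying buffer and hold the queue lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH   8    /* hypothetical queue size */
#define MAX_AGE_TICKS 10u  /* 10 ms, assuming a 1 kHz systick */

/* Hypothetical buffer descriptor: only the timestamp matters here. */
struct buf_desc {
    uint32_t enqueued_at;  /* systick value when the buffer was queued */
};

/* Hypothetical IN endpoint queue: a ring buffer, oldest entry at head. */
struct in_queue {
    struct buf_desc slot[QUEUE_DEPTH];
    size_t head;
    size_t count;
};

/* Release entries that have waited longer than MAX_AGE_TICKS.
 * Returns the number of entries released. */
static size_t queue_expire(struct in_queue *q, uint32_t now)
{
    size_t released = 0;

    while (q->count > 0) {
        const struct buf_desc *d = &q->slot[q->head];

        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((uint32_t)(now - d->enqueued_at) <= MAX_AGE_TICKS)
            break;  /* FIFO order: all later entries are younger */

        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
        released++;
    }
    return released;
}

/* Enqueue one buffer, expiring stale entries first (the caller is
 * assumed to hold the queue lock). Returns false if the queue is full. */
static bool queue_enqueue(struct in_queue *q, uint32_t now)
{
    assert(q != NULL);

    queue_expire(q, now);
    if (q->count == QUEUE_DEPTH)
        return false;

    size_t tail = (q->head + q->count) % QUEUE_DEPTH;
    q->slot[tail].enqueued_at = now;
    q->count++;
    return true;
}
```

The same queue_expire() could be called from a periodic timer or from the allocation path when memory runs out; calling it on enqueue simply reuses a point where the lock is already held.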

I wonder how other USB device firmware projects are handling this general problem.


Related issues

Related to SIMtrace 2 - Bug #4329: IN endpoint gets stuck during USB suspend (Resolved, 12/15/2019)

History

#1 Updated by laforge about 2 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 90

Unfortunately, this issue has not been updated with the progress. A patch was developed and submitted to gerrit as https://gerrit.osmocom.org/#/c/simtrace2/+/16097/. Unfortunately, the gerrit change also fails to state Closes: OS#4251 in its commit log message.

However, we have had user reports of still encountering OOM errors, so I'm keeping this issue open meanwhile.

#2 Updated by laforge about 2 months ago

  • % Done changed from 90 to 60

In the related logs of the user, we get many instances of

-E- ep 2: usb_buf_alloc_st EOMEM (queue already empty)

This is odd, as it raises the question of where those buffers are being used if they are not in the transmit queue.

We also get many more instances of

-E- _talloc_zero() out of memory!

So I guess some deeper debugging is required in terms of what's happening here.

#3 Updated by laforge about 2 months ago

List of heap allocations in the st2 firmware as of 0.6.1:

  • only use appears to be libosmocore msgb code
  • only caller of msgb_alloc() is usb_buf_alloc()
  • usb_buf_alloc() callers
    • libcommon/source/card_emu.c
    • libcommon/source/host_communication.c
      • usb_refill_from_host() [for OUT EP]
    • libcommon/source/mode_cardemu.c
      • flush_pts()
      • add_tpdu_byte()
      • send_tpdu_header()
      • card_emu_report_status()
    • libcommon/source/sniffer.c
      • usb_send_data()
      • usb_send_fidi()
      • usb_send_change()

So all in all, it seems extremely simple, as there are not many allocation/free paths, especially if one only looks at cardem.

#4 Updated by laforge about 1 month ago

  • Assignee changed from tsaitgaist to laforge

I've just observed this bug for the first time on my desk. Still not sure how to reproduce it, but will investigate further.

It seems related to the firmware being outside its normal operational mode (i.e. no bidirectional SIM card emulation with the host program running).

#5 Updated by laforge about 1 month ago

  • % Done changed from 60 to 70

I think I found the problem here:
  • buffers are allocated both for receiving data from the host (OUT endpoint) and for data to the host (IN endpoint)
  • the new buffer recycle logic is only executed when allocating buffers in IN direction

This means that all memory can be eaten up by pending IN transfers, leaving the firmware unable to allocate any buffer for receiving data from the host on the OUT endpoint. My originally suggested solution of using timer/age information and releasing buffers that have been queued for too long would have properly solved the problem and prevented this new problem of IN buffers starving OUT buffers.
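
The starvation can be illustrated with a toy model of a shared buffer pool. It is a sketch under stated assumptions, not the firmware's allocator and not the patch that was merged: pool, buf_alloc(), recycle_in_queue() and the recycle_any_dir flag are hypothetical names, and the pool just counts buffers instead of using talloc/msgb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_BUFS 4  /* hypothetical: total buffers the heap can hold */

enum ep_dir { EP_IN, EP_OUT };

/* Hypothetical shared pool: IN and OUT buffers come from the same heap. */
struct pool {
    size_t free_bufs;  /* buffers still available */
    size_t in_queued;  /* buffers parked on the IN queue, unread by host */
};

/* Release every buffer waiting on the IN queue.
 * Returns the number of buffers reclaimed. */
static size_t recycle_in_queue(struct pool *p)
{
    size_t n = p->in_queued;

    p->free_bufs += n;
    p->in_queued = 0;
    return n;
}

/* Allocate one buffer for the given direction.
 * recycle_any_dir == false models recycling only on IN allocations;
 * recycle_any_dir == true recycles whichever direction runs out. */
static bool buf_alloc(struct pool *p, enum ep_dir dir, bool recycle_any_dir)
{
    assert(p != NULL);

    if (p->free_bufs == 0 && (dir == EP_IN || recycle_any_dir))
        recycle_in_queue(p);
    if (p->free_bufs == 0)
        return false;  /* out of memory */

    p->free_bufs--;
    if (dir == EP_IN)
        p->in_queued++;  /* stays queued until the host fetches it */
    return true;
}
```

In this model, once POOL_BUFS buffers sit on the IN queue, buf_alloc(&p, EP_OUT, false) fails even though every one of those buffers is reclaimable, while buf_alloc(&p, EP_OUT, true) succeeds.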

#7 Updated by laforge about 1 month ago

  • Related to Bug #4329: IN endpoint gets stuck during USB suspend added

#8 Updated by laforge about 1 month ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100

Patches merged. I'd be very surprised if there were any memleaks left after the detailed audit + testing.
