Bug #4995: handle ENOBUFS on write to AF_PACKET socket - libosmocore - Open Source Mobile Communications

Actions

Copy link

Bug #4995

closed

handle ENOBUFS on write to AF_PACKET socket

Added by laforge over 3 years ago. Updated about 3 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

laforge

Category:

Target version:

Start date:

01/30/2021

Due date:

% Done:

100%

Spec Reference:

Description

AF_PACKET sockets have the following "incredible" semantics:

even when marked write-able by select/poll, they still many return -1 / ENOBUFS in case the socket buffer and/or tx-queue of the driver is full
there is no way to safely/sanely wait for buffer space to become available again
the only option is to re-try until it finally succeeds, preferably after a "reasonable" amount of sleep, considering the data rate of the underlying transport medium

I'm attaching a reproducer to demonstrate the problem, both with select() and without.

See also SYS#5343

The problem is further increased due to the fact that libosmocore/ns2 doesn't even notice if this happens.

The call chain looks like this:

frame_relay.c:osmo_fr_tx_dlc() is used to transmit NS messages by the NS2 core
frame_relay.c will do whatever handling internaly
gprs_ns2fr.c:fr_tx_cb() is the call-back we register with the frame_relay.c cre
- puts msgb into osmo_wqueue
wqueue code calls gprs_ns2_fr.c:handle_netif_write() as write-callback when the FD is write-able (always!)
- we directlry retunr te result of the write() syscall (which is -1 in case of error)
- osmo_wqueue_bfd_cb() calls that write_cb(), but
  - it expects a -errno type return value, not the return value of a syscall that's likely just -1.
  - only treats "-EAGAIN" as a trigger to re-enqueue the just-dequeued message

So all in all, we are using a write_queue, but it will never really queue anything, as the socket is always writable and we don't realize if the write actually fails.

What makes this even worse: A shared write_queue for user traffic and Q.933 LMI (or even NS-ALIVE) traffic is actively dangerous

if Q.933 starts to fail ,the entire link will be marked dead at the FR level
if NS-ALIVE starts to fail, the NS-VC will be marked as DEAD

Now the question is what to do about it:

as an absolute minimum, we should have a counter and/or error messages if ENOBUFS or any other error happens
users of osmo_wqueue should always return a "-errno" style return value. We should audit all our code, not just this example
there should be some notification of the uppwer layers/application when ENOBUFS happens
have separate queues or some other prioritization that prefers Q.933 LMI traffic and NS-signaling (ALIVE/ALIVE-ACK) over all user traffic (NS-UNITDATA)

Files

packet.c

3.94 KB

reproducer program

laforge, 01/30/2021 10:33 AM

Related issues

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Osmocom Libraries » libosmocore

Custom queries

Bug #4995

handle ENOBUFS on write to AF_PACKET socket

Updated by laforge over 3 years ago

Updated by laforge over 3 years ago

Updated by laforge over 3 years ago

Updated by laforge over 3 years ago

Updated by laforge over 3 years ago

Updated by laforge about 3 years ago

Updated by laforge about 3 years ago