Feature #5751


io_uring support in libosmocore

Added by laforge 19 days ago. Updated 6 days ago.

Status: In Progress
Priority: Urgent
Assignee: daniel
Category: -
Target version: -
Start date: 11/09/2022
Due date: -
% Done: 0%
Spec Reference: -
Tags: io_uring

Description

Traditionally our I/O abstraction in libosmocore has been select(). In libosmocore 1.5.0 (2020) we migrated over to poll() to support more than 1024 FDs and to avoid the extreme amount of fd-set memcpy()ing involved in the venerable select interface.

Now of course both select and poll are ancient unix interfaces for non-blocking I/O, and both come at a high cost for systems under high load.

Specifically, we are getting reports from osmo-bsc users indicating that a busy BSC with 100 BTS (~400 TRX) spends about 40% of its CPU cycles in the kernel-side sock_poll, tcp_poll and do_sys_poll functions.

There are other interfaces such as Linux AIO, POSIX AIO and epoll, but the brightest and shiniest new I/O interface on Linux is io_uring. Contrary to all of its predecessors, io_uring can, in the "worst" case, operate without any system calls at all. io_uring recognizes that each syscall comes with a rather high context-switch cost.

io_uring consists of memory-mapped (between kernel and userspace process) queues for requests and completions, as well as lockless primitives to enqueue/dequeue from these.

The requests in the queue are things like "read N bytes from this file descriptor" or "write N bytes to that file descriptor". io_uring can do much more (many other syscalls), but read/write is the most relevant part for us.
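
As a minimal illustration of this model (a hedged sketch using liburing, the standard userspace wrapper; 'fd' and 'buf' are hypothetical placeholders):

    #include <liburing.h>

    /* Submit a single read request to the submission queue and reap its
     * completion from the completion queue. */
    int read_via_uring(int fd, void *buf, unsigned int len)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        int rc;

        rc = io_uring_queue_init(8, &ring, 0);    /* 8 submission queue entries */
        if (rc < 0)
            return rc;

        sqe = io_uring_get_sqe(&ring);            /* grab an unused SQE */
        io_uring_prep_read(sqe, fd, buf, len, 0); /* "read len bytes from fd" */
        io_uring_submit(&ring);                   /* hand prepared SQEs to the kernel */

        rc = io_uring_wait_cqe(&ring, &cqe);      /* wait for a completion event */
        if (rc == 0) {
            rc = cqe->res;                        /* bytes read, or -errno */
            io_uring_cqe_seen(&ring, cqe);        /* mark the CQE as consumed */
        }
        io_uring_queue_exit(&ring);
        return rc;
    }

In a real event loop one would of course keep the ring around and reap completions asynchronously rather than blocking in io_uring_wait_cqe().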

We already have two io_uring users in the Osmocom universe: the GTP and UDP/RTP load generators I wrote some time ago. They manage their file descriptors internally.

This ticket is now about introducing io_uring support into libosmocore itself, in a way that enables all Osmocom programs to use that shared infrastructure.

Conceptual differences

reading from a socket

Conceptually, the existing code typically works like this:

  1. register some socket file descriptor for read
  2. libosmocore includes it in the poll-set
  3. libosmocore calls poll()
  4. kernel returns from poll, indicating fd is readable
  5. libosmocore dispatches to the application call-back
  6. application allocates msgb, reads data from socket
  7. application processes data in msgb

With io_uring, this model needs to change to something like this:

  1. application tells us it wants to read from a socket
  2. libosmocore or application pre-allocate the msgb
  3. libosmocore uses liburing to add a read request to the io_uring submission queue
  4. at some later point, the kernel signals us a completion event via io_uring / liburing
  5. libosmocore dispatches pre-filled msgb to application call-back
  6. application processes data in msgb

So as we can see, the responsibility for the actual reading moves from the application (or an intermediate library like libosmo-netif / libosmo-sigtran) into the library.
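
A hypothetical shape of the resulting read path (all names here are illustrative, not an actual libosmocore API):

    /* The library owns the actual read; the application only sees a
     * pre-filled msgb in its call-back.  Ownership conventions are up
     * to the eventual API design. */
    static void my_read_cb(struct osmo_io_fd *iofd, int rc, struct msgb *msg)
    {
        if (rc <= 0)
            return;    /* read error or connection closed */
        /* process 'rc' bytes of data in 'msg' */
    }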

writing to a socket

Conceptually, the existing code typically works like this:

  1. register some socket file descriptor for write
  2. libosmocore includes it in the poll-set
  3. libosmocore calls poll()
  4. kernel returns from poll, indicating fd is writeable
  5. libosmocore dispatches to the application call-back
  6. application writes the data from the msgb to the socket and frees the msgb

With io_uring, this model needs to change to something like this:

  1. application tells us it wants to write to a socket, including the msgb
  2. libosmocore uses liburing to add a write request to the io_uring submission queue
  3. at some later point, the kernel signals us a completion event via io_uring / liburing
  4. libosmocore releases the msgb with msgb_free()

Again, the actual reading/writing moves into the library, outside the scope of the application (or intermediate library like libosmo-netif / libosmo-sigtran).
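
And a correspondingly hypothetical write path (osmo_io_write is the name used in the design notes further below; the rest is illustrative):

    /* The application hands the filled msgb to the library, which performs
     * the actual write and frees the msgb once the write has completed. */
    static void send_reply(struct osmo_io_fd *iofd)
    {
        struct msgb *msg = msgb_alloc(1024, "reply");

        msgb_put_u8(msg, 0x23);      /* fill in some payload */
        osmo_io_write(iofd, msg);    /* ownership of msg passes to the library */
    }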


Related issues

  • Related to libosmo-sccp + libosmo-sigtran - Feature #5752: io_uring support in libosmo-sigtran (New, 11/09/2022)
  • Related to libosmo-netif - Feature #5753: io_uring support in libosmo-netif (New, 11/09/2022)
  • Related to OsmoMGW - Feature #5754: io_uring support in libosmo-mgcp-client (New, 11/09/2022)
  • Related to OsmoBSC - Feature #5755: io_uring support in osmo-bsc (New, 11/09/2022)
  • Related to libosmo-abis - Bug #5756: io_uring support in libosmo-abis (New, 11/09/2022)
  • Related to libosmo-abis - Feature #5766: use Linux kernel KCM for IPA header? (New, 11/13/2022)

Actions #1

Updated by laforge 19 days ago

I like the idea of splitting this into two separate sub-tasks:

  1. introduce the conceptual API changes of having the actual read/write done inside libosmocore; then start to port applications over to that new API
  2. subsequently (and fully optionally) introduce an io_uring backend to libosmocore so it can benefit from the related performance improvements.

By splitting this up into two parts, we can more easily pinpoint any related problems, as we can test one part without the other.

Furthermore, on older systems whose kernels lack io_uring support, we can simply not use it, as the second step is independent of the first. The applications always use the same API; whether or not libosmocore uses io_uring becomes an implementation detail unknown to the applications.

Actions #2

Updated by laforge 19 days ago

  • Related to Feature #5752: io_uring support in libosmo-sigtran added
Actions #3

Updated by laforge 19 days ago

  • Related to Feature #5753: io_uring support in libosmo-netif added
Actions #4

Updated by laforge 19 days ago

  • Tags set to io_uring
Actions #5

Updated by laforge 19 days ago

  • Related to Feature #5754: io_uring support in libosmo-mgcp-client added
Actions #6

Updated by laforge 19 days ago

  • Related to Feature #5755: io_uring support in osmo-bsc added
Actions #7

Updated by laforge 19 days ago

  • Related to Bug #5756: io_uring support in libosmo-abis added
Actions #8

Updated by laforge 19 days ago

For an existing example of how to use io_uring in the Osmocom context, check out rtp-load-gen at https://gitea.osmocom.org/cellular-infrastructure/osmo-mgw/src/branch/laforge/rtp-load-gen/contrib/rtp-load-gen and grep for io_uring_ to see the various API calls. There's also https://gitea.osmocom.org/cellular-infrastructure/gtp-load-gen

  • io_uring_get_sqe returns an unused submission queue entry
  • io_uring_prep_read and io_uring_prep_write fill that submission queue entry with an fd, a pointer to data and a length
  • io_uring_submit submits whatever submission queue entries have been prepared

The libosmocore integration with the existing select/poll would likely be done via an eventfd. Applications will continue to use osmo_select_main() etc. and can use any number of their own file descriptors as before. But libosmocore will internally register an eventfd with the existing select/poll API, so that any time io_uring wants to notify us about completions, it marks that eventfd as readable, triggering our select/poll loop to handle those completion events. Why is this faster? Because there is one such eventfd-poll-trigger for a virtually unlimited number of io_uring completion events, as opposed to one poll+read/write syscall for each of them.
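
A rough sketch of that eventfd bridge, assuming liburing's io_uring_register_eventfd(); the surrounding function and variable names are illustrative:

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/eventfd.h>
    #include <liburing.h>
    #include <osmocom/core/select.h>

    static struct io_uring g_ring;
    static struct osmo_fd g_evfd_ofd;

    /* called from osmo_select_main() whenever the eventfd becomes readable */
    static int uring_evfd_cb(struct osmo_fd *ofd, unsigned int what)
    {
        struct io_uring_cqe *cqe;
        uint64_t val;

        if (read(ofd->fd, &val, sizeof(val)) < 0)    /* clear the eventfd counter */
            return 0;                                /* spurious wake-up */

        /* one eventfd wake-up may cover any number of completions */
        while (io_uring_peek_cqe(&g_ring, &cqe) == 0) {
            /* ... dispatch cqe->res / io_uring_cqe_get_data(cqe) ... */
            io_uring_cqe_seen(&g_ring, cqe);
        }
        return 0;
    }

    static int uring_init(void)
    {
        int evfd;

        io_uring_queue_init(256, &g_ring, 0);
        evfd = eventfd(0, EFD_NONBLOCK);
        io_uring_register_eventfd(&g_ring, evfd);    /* kernel signals completions here */
        osmo_fd_setup(&g_evfd_ofd, evfd, OSMO_FD_READ, uring_evfd_cb, NULL, 0);
        return osmo_fd_register(&g_evfd_ofd);
    }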

Actions #9

Updated by Hoernchen 19 days ago

Please keep in mind that IORING_REGISTER_IOWQ_AFF is a fairly recent feature, so unless that exists, "automatically" turning on uring support, if available, leads to a bunch of threads (as for the number and other details, https://blog.cloudflare.com/missing-manuals-io_uring-worker-pool/ is worth a read) that just end up somewhere, with no easy way to move them to a specific CPU.
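
For reference, a hedged sketch of pinning those workers, assuming liburing >= 2.1 and a kernel >= 5.14 providing IORING_REGISTER_IOWQ_AFF (on older kernels the call simply fails and the workers float):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <liburing.h>

    /* Pin io_uring's io-wq worker threads to CPU 0. */
    static int pin_iowq_workers(struct io_uring *ring)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        return io_uring_register_iowq_aff(ring, sizeof(mask), &mask);
    }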

Actions #10

Updated by laforge 19 days ago

On Wed, Nov 09, 2022 at 01:58:54PM +0000, Hoernchen wrote:

> Please keep in mind that IORING_REGISTER_IOWQ_AFF is a fairly recent feature, so unless that exists, "automatically" turning on uring support, if available, leads to a bunch of threads (as for the number and other details, https://blog.cloudflare.com/missing-manuals-io_uring-worker-pool/ is worth a read) that just end up somewhere, with no easy way to move them to a specific CPU.

AFAICT there are no kernel threads created for socket read/write, as sockets support non-blocking operation.

99.9% of all I/O we are doing is on sockets (UDP, TCP, SCTP, Unix) for talking to other network elements or to the user via VTY/CTRL. There is a bit of file I/O when reading config files (not worth optimizing anyway), and from osmo-hlr / osmo-msc for their respective databases, which are accessed with blocking I/O anyway.

Actions #11

Updated by laforge 15 days ago

  • Related to Feature #5766: use Linux kernel KCM for IPA header? added
Actions #12

Updated by laforge 10 days ago

  • Assignee changed from laforge to daniel

Update: I've been playing for a few days with some of the concepts, trying to bring all our requirements in line for the first step (a new API that can support a poll backend and, later, an io_uring backend).

I've handed this over to daniel now, as he has more time available right now and indicated an interest in this topic. We just had a call where I explained my thoughts and the latest results on how I think it should all be put together.

I'm of course available whenever feedback/questions arise.

Actions #13

Updated by laforge 9 days ago

Summary of some of my ideas / thoughts on the new I/O provider so far:

  • modes. The new I/O provider will need to offer the following modes:
    • read/write (e.g. tcp sockets for IPA OML/RSL/GSUP as well as CBSP, VTY, CTRL, ...)
    • recvfrom/sendto (e.g. UDP sockets used for RTP, GTP, MGCP, ...)
      • io_uring doesn't directly support those syscalls. However, it does support recvmsg/sendmsg, which is a superset of recvfrom/sendto combined with readv/writev
      • we have to convert recvfrom/sendto calls by API users (applications) to recvmsg/sendmsg
    • sctp_recvmsg/sctp_sendmsg (SCTP sockets for anything M3UA/SUA/sigtran)
      • this API from libsctp is just a 20-line wrapper around normal recvmsg/sendmsg calls
      • we have to re-implement this wrapper in our io_uring code
  • introduction of a new struct osmo_io_fd, which will be used instead of osmo_fd (see the sketch after this list), containing
    • fd
    • const char *name for application to provide a human-readable name of the FD (in case I/O provider wants to log something)
    • parameters for msgb_alloc (headroom, context, size)
    • a built-in write-queue with semantics like osmo_wqueue
    • call-back functions for the user application (read/write completion call-backs)
    • priv/priv_nr for context of application (like osmo_fd)
  • write operation
    • application does something like osmo_io_write(struct osmo_io_fd *, struct msgb *)
    • I/O provider enqueues any write into write queue and marks FD as "wants to write"
    • io_uring backend
      • would check if write is pending completion. If not, submit first entry of write_queue to io_uring
      • at some later point, the I/O provider io_uring backend is notified via the osmo_fd-wrapped-eventfd that io_uring has completed something
      • once I/O provider io_uring backend identifies a write has completed, it will call the io_fd->write_cb(struct osmo_io_fd *fd, int rc, struct msgb *msg) call-back
    • classic poll backend
      • would now check if OSMO_FD_WRITE is active. If not, set it.
      • gets notified that osmo_fd is writable
      • issues normal non-blocking write() syscall
      • call the io_fd->write_cb(struct osmo_io_fd *fd, int rc, struct msgb *msg) call-back
    • the application can now act based on rc (short write, negative error, dead socket, etc.)
    • once call-back returns, I/O provider does msgb_free(msg)
  • read operation
    • application notifies I/O provider that it wants to read from osmo_io_fd
    • io_uring backend
      • allocates a msgb (using parameters provided by the application, stored in osmo_io_fd)
      • submits a read() syscall to io_uring submission queue pointing to msgb memory
      • completion is handled just like the write completion via osmo_fd-wrapped-eventfd
      • io_fd->read_cb(struct osmo_io_fd *fd, int rc, struct msgb *msg) is called
    • classic poll backend
      • enables OSMO_FD_READ on socket
      • gets notified that osmo_fd is readable once data is available
      • allocates a msgb (using parameters provided by the application, stored in osmo_io_fd)
      • issues normal non-blocking read() syscall
      • io_fd->read_cb(struct osmo_io_fd *fd, int rc, struct msgb *msg) is called
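
A rough C sketch pulling the above together; fields and call-back signatures follow the notes where given, anything beyond that is invented for illustration and not a finalized API:

    #include <osmocom/core/msgb.h>
    #include <osmocom/core/linuxlist.h>

    struct osmo_io_fd {
        int fd;                          /* underlying file descriptor */
        const char *name;                /* human-readable name for logging */
        struct {                         /* parameters for msgb_alloc() */
            unsigned int size;
            unsigned int headroom;
            void *ctx;
        } msgb_alloc;
        struct llist_head tx_queue;      /* built-in write-queue (osmo_wqueue-like) */
        /* completion call-backs into the user application */
        void (*read_cb)(struct osmo_io_fd *iofd, int rc, struct msgb *msg);
        void (*write_cb)(struct osmo_io_fd *iofd, int rc, struct msgb *msg);
        void *data;                      /* application-private context */
        unsigned int priv_nr;
    };

    /* enqueue msg for transmission; the backend (poll or io_uring) performs
     * the actual write, calls write_cb and then frees msg */
    int osmo_io_write(struct osmo_io_fd *iofd, struct msgb *msg);
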
Actions #14

Updated by laforge 9 days ago

For the {send,recv}{to,from,msg}() family of calls, we need to extend the above slightly. In addition to the raw msgb, we have metadata like the struct sockaddr to send to.

I originally thought we could push this to the front of the msgb headroom, but sockaddr_storage alone is already 128 bytes, and the struct msghdr, struct iovec etc. quickly add up to something like 200 bytes. Since msgb size (including headroom) is limited to 16 bit (a historical mistake), I'm not sure that's the right way.

I then decided to go for a struct serialized_msghdr, which we allocate at the time the user issues e.g. an osmo_io_sendto(struct osmo_io_fd *, struct msgb *msg, int flags, const struct sockaddr *dest_addr, socklen_t addrlen) call. The function then copies the provided parameters into that heap-allocated serialized_msghdr and enqueues it (instead of the pure msg) into the in-memory transmit queue. Once the actual sendmsg call is performed (async via io_uring or directly via syscall), we dequeue that msghdr and make use of it. On completion we call the user completion call-back and then free the serialized_msghdr as well as the msgb.
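
A sketch of what that heap-allocated structure might contain (the exact layout is illustrative):

    #include <sys/socket.h>
    #include <osmocom/core/msgb.h>
    #include <osmocom/core/linuxlist.h>

    struct serialized_msghdr {
        struct llist_head list;          /* entry in the in-memory tx queue */
        struct msghdr hdr;               /* handed to sendmsg()/recvmsg() */
        struct iovec iov[1];             /* points into the msgb data */
        struct sockaddr_storage osa;     /* copy of the peer address */
        struct msgb *msg;                /* freed together with this struct */
    };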

The same approach also works for the recvmsg/recvfrom case, where we can have an application call-back like void (*recvfrom_cb)(struct osmo_io_fd *iofd, int rc, struct msgb *msg, struct sockaddr *src_addr, socklen_t *addrlen);

Equally, this approach works for sctp_sendmsg/sctp_recvmsg, as those are just wrappers with different function arguments that all get encoded into a struct msghdr.

Actions #15

Updated by laforge 9 days ago

  • Priority changed from Normal to Urgent
Actions #16

Updated by daniel 6 days ago

  • Status changed from New to In Progress