Feature #6387

osmo_io / io_uring support for RTP/RTCP

Added by laforge about 2 months ago. Updated about 1 month ago.

Status: In Progress
Priority: Normal
Assignee: laforge
Category: -
Target version: -
Start date: 03/02/2024
Due date: -
% Done: 90%
Spec Reference: -
Tags: io_uring

Description

The RTP/RTCP sockets of osmo-mgw should be prime candidates for migration to osmo_io and hence benefit from the optional io_uring backend.

Given the many small recvfrom/sendto syscalls on those sockets, performance should improve significantly.


Related issues

Related to libosmocore - Feature #5751: io_uring support in libosmocore (Resolved, jolly, 11/09/2022)

Actions #1

Updated by laforge about 2 months ago

  • Tags set to io_uring
Actions #2

Updated by laforge about 2 months ago

  • Related to Feature #5751: io_uring support in libosmocore added
Actions #3

Updated by laforge about 1 month ago

  • Status changed from New to In Progress
  • Assignee set to laforge
  • % Done changed from 0 to 80

The patch is at https://gerrit.osmocom.org/c/osmo-mgw/+/36363 - in my local testing it shows no regressions in the TTCN3 test suite. Jenkins, however, does report regressions in the unit tests; I'll investigate.

In a benchmark running 200 concurrent bi-directional voice calls (set up from mncc-python, using rtpsource as RTP generator) with GSM-EFR codec, I am observing:

  • the code before this patch uses 40..42% of a single core on a Ryzen 5950X at 200 calls (=> 200 endpoints with two connections each)
  • no increase in CPU utilization before/after this patch, i.e. the osmo_io overhead with the OSMO_FD backend is insignificant compared to the direct osmo_fd mode used before
  • an almost exactly 50% reduction in CPU utilization when running the same osmo-mgw build with LIBOSMO_IO_BACKEND=IO_URING: top shows 19..21% for the same workload instead of 40..42% with the default OSMO_FD backend
  • an increase of about 4 megabytes in both RSS and VIRT size when enabling the IO_URING backend; this is likely the memory-mapped rings
Actions #4

Updated by laforge about 1 month ago

When doing a strace on the process, we can now see that the only syscalls really are:
  • poll (including the eventfd of io_uring)
  • the read of said eventfd
  • tons of io_uring_enter() syscalls

The latter are the result of us calling io_uring_submit() after every individual read or write operation we add to the submission queue.

I've done another experiment to remove those io_uring_submit() calls and do them just before we enter poll(). This indeed removes the duplicate io_uring_enter() syscalls, and we now have an equal number of poll, read(eventfd) and io_uring_enter calls. The patch is at https://gerrit.osmocom.org/c/libosmocore/+/36364

However, this does not make a visible difference in the CPU utilization reported by top/ps - maybe 1%, but not more. So at least at this relatively low overall CPU load of ~20% it doesn't matter; this might change when we get closer to 100% CPU, where more batching could yield more benefit.

FYI, in my 200-calls on 200-endpoints with 400-connections load test, I'm seeing the eventfd signalling something like 3..5 completions each time we poll+read it.

Actions #6

Updated by laforge about 1 month ago

  • % Done changed from 80 to 90

Finally ported the failing unit test over to the new code; build verification now passes.
