osmo-trx-lms: makes kernel eat all system memory when run under realtime priority
Initially found and described in detail here: https://osmocom.org/issues/3339?#note-15
My system freezes completely for 2-5 seconds just before or while osmo-trx starts failing to read/write as described in OS#3339. This happens about 30 seconds after starting osmo-trx-lms. My X server blocks, and music playing from a YouTube video in the background either stops or plays in a 1-second loop. Once I regain control of my system, I can see the read/write failures from OS#3339 in the osmo-trx logs.
htop clearly shows that memory usage starts growing as soon as osmo-trx-lms is started and keeps growing until it fills my 16 GB. That is the point at which my system freezes and osmo-trx starts failing; during that time the kernel is working hard to free up memory.
Interestingly, if I run osmo-trx-lms under strace I don't see the issue, although CPU consumption also drops considerably in that case. strace only shows heavy use of accept(), poll() and select().
If I press Ctrl+Z to suspend osmo-trx-lms (SIGTSTP), the kernel stops acquiring memory (and releases most of it). Once I use "fg" to resume the process (SIGCONT), it goes back to acquiring memory like crazy. The same happens if I stop and continue the process from gdb.
Allocation happens in kernel memory, not process-related memory:
kernel dynamic memory      10.2G    1009.3M       9.2G   <-----!!!!!!!

    OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
26117200 26114592  99%    0.25K  816487       32   6531896K filp         <---!!!!!!!
26120640 26118794  99%    0.06K  408135       64   1632540K kmalloc-64   <---!!!!!!!
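To watch this growth without htop or slabtop, one option is to sample the slab counters in /proc/meminfo while osmo-trx-lms is running. The following is only a minimal sketch: the helper names meminfo_value_kb() and read_meminfo_kb() are hypothetical and not part of osmo-trx, and it assumes a Linux /proc/meminfo with the usual "Key:   value kB" lines (for example "Slab" and "SUnreclaim").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a /proc/meminfo-style buffer for "key:" at the
 * start of a line and return its value in kB, or -1 if the key is absent. */
long meminfo_value_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require the ':' right after the key so "Slab" does not match "SlabFoo" */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;
			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Hypothetical helper: read /proc/meminfo and return the value of `key`
 * in kB, or -1 if the file cannot be read or the key is missing. */
long read_meminfo_kb(const char *key)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return meminfo_value_kb(buf, key);
}
```

Calling read_meminfo_kb("Slab") once per second and printing the result should make it visible whether kernel slab usage keeps climbing while the process runs and drops again once it is suspended.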
- Reproducible both on LimeSDR-USB and LimeSDR-mini HW.
- Reproducible both on USB2 and USB3.
- Reproducible both on LimeSuite 18.10.* and 19.01.*.
- Reproducible both on kernel 4.19.4-arch1-1-ARCH and 5.0.9-arch1-1-ARCH.
- Reproducible on 1.0.22-1.
- Reproducible both with ASan enabled or disabled.
I found how to trigger or avoid the issue on my system:
Add "rt-prio 18" to osmo-trx-lms.cfg -> BUG
Remove it -> no memleak.
So somehow switching the process to realtime priority keeps the kernel from freeing memory in time. It does not look like a real memleak, since the memory is freed a few seconds after the process is paused. It rather looks like the kernel is not freeing memory quickly enough to keep up with the allocation pace.
"rt-prio 18" in osmo-trx-lms.cfg basically means osmo-trx-lms is going to call this during startup:
struct sched_param param;
memset(&param, 0, sizeof(param));
param.sched_priority = 18;
rc = sched_setscheduler(getpid(), SCHED_RR, &param);
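For anyone who wants to check the scheduling change in isolation, the call above can be wrapped in a small standalone helper. This is only a sketch, not the actual osmo-trx code: set_rr_priority() and is_sched_rr() are hypothetical names, and the error handling is an assumption about what is useful when debugging. Without CAP_SYS_NICE or a suitable RLIMIT_RTPRIO the call is expected to fail with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: switch the calling process to SCHED_RR at `prio`.
 * Returns 0 on success, or -1 with errno preserved from sched_setscheduler()
 * (typically EPERM without privileges, EINVAL for an out-of-range priority). */
int set_rr_priority(int prio)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = prio;
	if (sched_setscheduler(getpid(), SCHED_RR, &param) < 0) {
		int saved = errno; /* fprintf() may clobber errno */
		fprintf(stderr, "sched_setscheduler(SCHED_RR, %d): %s\n",
			prio, strerror(saved));
		errno = saved;
		return -1;
	}
	return 0;
}

/* Hypothetical helper: returns 1 if the calling process runs under SCHED_RR,
 * 0 if it uses another policy, -1 if the policy cannot be queried. */
int is_sched_rr(void)
{
	int policy = sched_getscheduler(getpid());

	if (policy < 0)
		return -1;
	return policy == SCHED_RR;
}
```

Calling set_rr_priority(18) at the start of a small test program, followed by is_sched_rr(), should confirm whether the process really ended up under SCHED_RR before comparing kernel memory usage with and without it.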