Project

General

Profile

Actions

Bug #4136

closed

osmo-trx-lms: makes kernel eat all system memory when run under realtime priority

Added by pespin over 4 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
LimeSDR
Target version:
-
Start date:
07/30/2019
Due date:
% Done:

100%

Spec Reference:

Description

Originally https://osmocom.org/issues/3971, I deleted it by error.

Initially found and described in detail here: https://osmocom.org/issues/3339?#note-15

Related LimeSuite github issue: https://github.com/myriadrf/LimeSuite/issues/263

My system totally freezes for 2-5 seconds before/during the time osmo-trx starts failing reading/writing on OS#3339. That happens about 30 second after starting osmo-trx-lms. My XServer blocks and music playing from a youtube video on the background also either stops or plays in a 1 sec loop. When I recover control of my system, I can see in the logs of osmo-trx the read/write failure from OS#3339.

Through htop one can easily see that upon starting osmo-trx-lms, memory suddenly grows until filling my 16GB, and then is when my system freezes and osmo-trx starts failing, during that time kernel is working heavily to free up memory.

Interestingly, if I strace the osmo-trx-lms I don't see this kind of issue, but it's true too that the CPU consumption drops a lot too. strace only shows heavy use of calls: accept(), poll() and select().

If I ctrl+z (SIGSTOP) the osmo-trx-lms, the kernel stops acquiring memory (and releases most of it). Once I use "fg" to SIGCONTINUE the process, it continues acquiring memory like crazy. Same if I use gdb to do the same kind of operation.

Allocation happens in kernel memory, not process-related memory:

kernel dynamic memory         10.2G    1009.3M       9.2G  <-----!!!!!!!

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
26117200 26114592  99%    0.25K 816487       32   6531896K filp            <---!!!!!!!
26120640 26118794  99%    0.06K 408135       64   1632540K kmalloc-64      <---!!!!!!!

  • Reproducible both on LimeSDR-USB and LimeSDR-mini HW.
  • reproducible both on USB2 and USB3.
  • Reproducible both on LimeSuite 18.10.* and 19.01.*
  • Reproducible both on kernel 4.19.4-arch1-1-ARCH and 5.0.9-arch1-1-ARCH
  • Reproducible on 1.0.22-1
  • Reproducible both with ASan enabled or disabled.

---------------
I found how to reproduce it or avoid reproducing it on my system:
Add "rt-prio 18" on osmo-trx-lms.cfg -> BUG
remove it -> no memleak.

So somehow changing the process to use realtime priority makes the kernel not free stuff on time. looks like it's not really a memleak, since if you pause the process the memory is freed at some point a few secs later. But still looks like the kernel is not freeing memory quick enough to keep up with the allocation pace.

"rt-prio 18" in osmo-trx-lms.cfg basically means osmo-trx-lms is going to call this during startup:

    struct sched_param param;
    memset(&param, 0, sizeof(param));
    param.sched_priority = 18;
    rc = sched_setscheduler(getpid(), SCHED_RR, &param);

----------------------------

Issue still happening on my Thinkpad x280 laptop with:

LimeSite: 19.04.0.r0.abf82140
kernel: 5.2.2-arch1-1-ARCH x86_64
libusb: 1.0.22-1

A comment was added in related github issue from LimeSuite developer:

If it libusb ralated, then it might be "libusb_handle_events_timeout_completed()" that is being run in a loop (the loop is started when opening device). The timeout is set to 250ms in LimeSuite so it blocks for up to 250ms if there is no USB events. I don't know what it does internally while waiting for events.


Attached a perf record of osmo-trx-lms running with rt-prio and seeing memory growing (around 2GB till stopped iirc).

Samples: 106K of event 'cycles', Event count (approx.): 88212958338
Overhead  Command       Shared Object            Symbol
  10.52%  osmo-trx-lms  [kernel.kallsyms]        [k] do_syscall_64                                                                                                                                                                                ◆
   5.61%  osmo-trx-lms  [kernel.kallsyms]        [k] syscall_return_via_sysret                                                                                                                                                                    ▒
   5.48%  osmo-trx-lms  [kernel.kallsyms]        [k] entry_SYSCALL_64                                                                                                                                                                             ▒
   5.20%  osmo-trx-lms  [kernel.kallsyms]        [k] preempt_count_add                                                                                                                                                                            ▒
   4.72%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_lock                                                                                                                                                                               ▒
   4.04%  osmo-trx-lms  [kernel.kallsyms]        [k] kmem_cache_alloc                                                                                                                                                                             ▒
   2.67%  osmo-trx-lms  [kernel.kallsyms]        [k] __alloc_file                                                                                                                                                                                 ▒
   2.24%  osmo-trx-lms  [kernel.kallsyms]        [k] preempt_count_sub                                                                                                                                                                            ▒
   1.89%  osmo-trx-lms  [kernel.kallsyms]        [k] get_mem_cgroup_from_mm                                                                                                                                                                       ▒
   1.86%  osmo-trx-lms  [kernel.kallsyms]        [k] kmem_cache_free                                                                                                                                                                              ▒
   1.74%  osmo-trx-lms  [kernel.kallsyms]        [k] __call_rcu                                                                                                                                                                                   ▒
   1.60%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_lock_irq                                                                                                                                                                           ▒
   1.58%  osmo-trx-lms  [kernel.kallsyms]        [k] debug_smp_processor_id                                                                                                                                                                       ▒
   1.50%  osmo-trx-lms  libpthread-2.29.so       [.] __libc_accept                                                                                                                                                                                ▒
   1.45%  osmo-trx-lms  [kernel.kallsyms]        [k] truncate_inode_pages_range                                                                                                                                                                   ▒
   1.41%  osmo-trx-lms  [kernel.kallsyms]        [k] in_lock_functions                                                                                                                                                                            ▒
   1.39%  osmo-trx-lms  [kernel.kallsyms]        [k] memcg_kmem_get_cache                                                                                                                                                                         ▒
   1.34%  osmo-trx-lms  [kernel.kallsyms]        [k] __sys_accept4                                                                                                                                                                                ▒
   1.31%  osmo-trx-lms  [kernel.kallsyms]        [k] __fput                                                                                                                                                                                       ▒
   1.18%  osmo-trx-lms  [kernel.kallsyms]        [k] task_work_run                                                                                                                                                                                ▒
   1.06%  osmo-trx-lms  [kernel.kallsyms]        [k] task_work_add                                                                                                                                                                                ▒
   1.05%  osmo-trx-lms  [kernel.kallsyms]        [k] __fget                                                                                                                                                                                       ▒
   0.98%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_lock_bh                                                                                                                                                                            ▒
   0.97%  osmo-trx-lms  [kernel.kallsyms]        [k] rcu_segcblist_enqueue                                                                                                                                                                        ▒
   0.90%  osmo-trx-lms  [kernel.kallsyms]        [k] inet_csk_accept                                                                                                                                                                              ▒
   0.88%  osmo-trx-lms  [kernel.kallsyms]        [k] fput_many                                                                                                                                                                                    ▒
   0.87%  osmo-trx-lms  [kernel.kallsyms]        [k] new_slab                                                                                                                                                                                     ▒
   0.81%  osmo-trx-lms  [kernel.kallsyms]        [k] do_softirq.part.0                                                                                                                                                                            ▒
   0.78%  osmo-trx-lms  [kernel.kallsyms]        [k] __fsnotify_parent                                                                                                                                                                            ▒
   0.76%  osmo-trx-lms  [kernel.kallsyms]        [k] exit_to_usermode_loop                                                                                                                                                                        ▒
   0.75%  osmo-trx-lms  [kernel.kallsyms]        [k] __srcu_read_lock                                                                                                                                                                             ▒
   0.72%  osmo-trx-lms  [kernel.kallsyms]        [k] syscall_trace_enter                                                                                                                                                                          ▒
   0.70%  osmo-trx-lms  [kernel.kallsyms]        [k] __rcu_read_unlock                                                                                                                                                                            ▒
   0.70%  osmo-trx-lms  [kernel.kallsyms]        [k] inet_release                                                                                                                                                                                 ▒
   0.62%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_unlock                                                                                                                                                                             ▒
   0.60%  osmo-trx-lms  [kernel.kallsyms]        [k] __fget_light                                                                                                                                                                                 ▒
   0.59%  osmo-trx-lms  [kernel.kallsyms]        [k] __audit_syscall_exit                                                                                                                                                                         ▒
   0.58%  osmo-trx-lms  [kernel.kallsyms]        [k] memset_erms                                                                                                                                                                                  ▒
   0.56%  osmo-trx-lms  [kernel.kallsyms]        [k] __local_bh_enable_ip                                                                                                                                                                         ▒
   0.56%  osmo-trx-lms  libpthread-2.29.so       [.] __pthread_enable_asynccancel                                                                                                                                                                 ▒
   0.56%  osmo-trx-lms  [kernel.kallsyms]        [k] __srcu_read_unlock                                                                                                                                                                           ▒
   0.55%  osmo-trx-lms  [kernel.kallsyms]        [k] inet_accept                                                                                                                                                                                  ▒
   0.55%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_trylock                                                                                                                                                                            ▒
   0.55%  osmo-trx-lms  [kernel.kallsyms]        [k] __rcu_read_lock                                                                                                                                                                              ▒
   0.55%  osmo-trx-lms  [kernel.kallsyms]        [k] __alloc_fd                                                                                                                                                                                   ▒
   0.54%  osmo-trx-lms  libpthread-2.29.so       [.] __pthread_disable_asynccancel                                                                                                                                                                ▒
   0.53%  osmo-trx-lms  [kernel.kallsyms]        [k] sock_alloc_inode                                                                                                                                                                             ▒
   0.53%  osmo-trx-lms  [kernel.kallsyms]        [k] kmem_cache_alloc_trace                                                                                                                                                                       ▒
   0.52%  osmo-trx-lms  [kernel.kallsyms]        [k] sock_close                                                                                                                                                                                   ▒
   0.51%  osmo-trx-lms  [kernel.kallsyms]        [k] _atomic_dec_and_lock                                                                                                                                                                         ▒
   0.50%  osmo-trx-lms  [kernel.kallsyms]        [k] fsnotify                                                                                                                                                                                     ▒
   0.49%  osmo-trx-lms  [kernel.kallsyms]        [k] __this_cpu_preempt_check                                                                                                                                                                     ▒
   0.48%  osmo-trx-lms  [kernel.kallsyms]        [k] down_write                                                                                                                                                                                   ▒
   0.48%  osmo-trx-lms  [kernel.kallsyms]        [k] entry_SYSCALL_64_after_hwframe                                                                                                                                                               ▒
   0.48%  osmo-trx-lms  [kernel.kallsyms]        [k] __d_instantiate                                                                                                                                                                              ▒
   0.48%  osmo-trx-lms  [kernel.kallsyms]        [k] up_write                                                                                                                                                                                     ▒
   0.47%  osmo-trx-lms  [kernel.kallsyms]        [k] percpu_counter_add_batch                                                                                                                                                                     ▒
   0.47%  osmo-trx-lms  [kernel.kallsyms]        [k] memcg_kmem_put_cache                                                                                                                                                                         ▒
   0.46%  osmo-trx-lms  [kernel.kallsyms]        [k] __x64_sys_accept                                                                                                                                                                             ▒
   0.45%  osmo-trx-lms  [kernel.kallsyms]        [k] inode_init_always                                                                                                                                                                            ▒
   0.44%  osmo-trx-lms  [kernel.kallsyms]        [k] __x86_indirect_thunk_rax                                                                                                                                                                     ▒
   0.44%  osmo-trx-lms  [kernel.kallsyms]        [k] _raw_spin_unlock_irq                                                                                                                                                                         ▒
   0.44%  osmo-trx-lms  [kernel.kallsyms]        [k] lockref_put_return                                                                                                                                                                           ▒
   0.43%  osmo-trx-lms  [kernel.kallsyms]        [k] truncate_inode_pages_final                                                                                                                                                                   ▒
   0.41%  osmo-trx-lms  [kernel.kallsyms]        [k] get_random_u32                                                                                                                                                                               ▒
   0.40%  osmo-trx-lms  [kernel.kallsyms]        [k] tcp_release_cb                                                                                                                                                                               ▒
   0.39%  osmo-trx-lms  [kernel.kallsyms]        [k] __audit_syscall_entry                                                                                                                                                                        ▒
   0.39%  osmo-trx-lms  [kernel.kallsyms]        [k] map_id_range_down                                                                                                                                                                            ▒
   0.37%  osmo-trx-lms  [kernel.kallsyms]        [k] __sock_release                                                                                                                                                                               ▒
   0.37%  osmo-trx-lms  [kernel.kallsyms]        [k] memcpy_erms                                                                                                                                                                                  ▒
   0.34%  osmo-trx-lms  [kernel.kallsyms]        [k] evict                                                                                                                                                                                        ▒
   0.34%  osmo-trx-lms  [kernel.kallsyms]        [k] __d_alloc                                                                                                                                                                                    ▒
   0.31%  osmo-trx-lms  [kernel.kallsyms]        [k] alloc_file                                                                                                                                                                                   ▒
   0.31%  osmo-trx-lms  [kernel.kallsyms]        [k] __dentry_kill                                                                                                                                                                                ▒
   0.29%  osmo-trx-lms  [kernel.kallsyms]        [k] alloc_file_pseudo                                                                                                                                                                            ▒
   0.27%  osmo-trx-lms  [kernel.kallsyms]        [k] ksoftirqd_running                                                                                                                                                                            ▒
   0.25%  osmo-trx-lms  [kernel.kallsyms]        [k] sock_alloc                                                                                                                                                                                   ▒
   0.24%  osmo-trx-lms  [kernel.kallsyms]        [k] locks_remove_file                                                                                                                                                                            ▒
   0.23%  osmo-trx-lms  [kernel.kallsyms]        [k] dput                                                                                                                                                                                         ▒
   0.23%  osmo-trx-lms  [kernel.kallsyms]        [k] get_next_ino                                                                                                                                                                                 ▒
   0.23%  osmo-trx-lms  [kernel.kallsyms]        [k] put_unused_fd                                                                                                                                                                                ▒
   0.22%  osmo-trx-lms  [kernel.kallsyms]        [k] release_sock                                                                                                                                                                                 ▒
   0.22%  osmo-trx-lms  [kernel.kallsyms]        [k] __kthread_should_park                                                                                                                                                                        ▒
   0.22%  osmo-trx-lms  [kernel.kallsyms]        [k] iput                                                                                                                                                                                         ▒
   0.21%  osmo-trx-lms  [kernel.kallsyms]        [k] dentry_unlink_inode                                                                                                                                                                          ▒
   0.21%  osmo-trx-lms  [kernel.kallsyms]        [k] d_set_d_op                                                                                                                                                                                   ▒
   0.20%  osmo-trx-lms  [kernel.kallsyms]        [k] ____fput                                                                                                                                                                                     ▒
   0.20%  osmo-trx-lms  [kernel.kallsyms]        [k] find_next_zero_bit                                                                                                                                                                           ▒
   0.20%  osmo-trx-lms  [kernel.kallsyms]        [k] get_page_from_freelist                                                                                                                                                                       ▒

Files

perf-kernel.svg View perf-kernel.svg 271 KB pespin, 08/02/2019 01:24 PM
hello.c hello.c 1.24 KB pespin, 08/02/2019 03:13 PM
Actions #1

Updated by pespin over 4 years ago

So recording with stack support enabled, it can easily be seen that all those allocations happen in accept() system call (trace comes from starting osmo-trx-lms for a few seconds without connecting a BTS to it, where memory bug can already be seen).:

-   98.80%     0.00%  osmo-trx-lms  [unknown]                [.] 0000000000000000                                                                                                                                                                                                         ▒
   - 98.79% 0                                                                                                                                                                                                                                                                             ▒
      - 97.29% __libc_accept                                                                                                                                                                                                                                                              ◆
         - 83.65% entry_SYSCALL_64_after_hwframe                                                                                                                                                                                                                                          ▒
            - do_syscall_64                                                                                                                                                                                                                                                               ▒
               - 40.22% __x64_sys_accept                                                                                                                                                                                                                                                  ▒
                  - __sys_accept4                                                                                                                                                                                                                                                         ▒
                     - 16.76% sock_alloc_file                                                                                                                                                                                                                                             ▒
                        - alloc_file_pseudo                                                                                                                                                                                                                                               ▒
                           - 9.45% alloc_file                                                                                                                                                                                                                                             ▒
                              - 9.10% alloc_empty_file                                                                                                                                                                                                                                    ▒
                                 - 8.51% __alloc_file                                                                                                                                                                                                                                     ▒
                                    - 5.68% kmem_cache_alloc                                                                                                                                                                                                                              ▒
                                       - 1.90% __slab_alloc.isra.0                                                                                                                                                                                                                        ▒
                                          - 1.87% ___slab_alloc                                                                                                                                                                                                                           ▒
                                               1.73% new_slab                                                                                                                                                                                                                             ▒
                                       - 1.32% memcg_kmem_get_cache                                                                                                                                                                                                                       ▒
                                            0.76% get_mem_cgroup_from_mm                                                                                                                                                                                                                  ▒
                                         0.52% memset_erms                                                                                                                                                                                                                                ▒
                           - 4.42% d_alloc_pseudo                                                                                                                                                                                                                                         ▒
                              - __d_alloc                                                                                                                                                                                                                                                 ▒
                                 - 3.49% kmem_cache_alloc                                                                                                                                                                                                                                 ▒
                                    - 1.62% memcg_kmem_get_cache                                                                                                                                                                                                                          ▒
                                         0.79% get_mem_cgroup_from_mm                                                                                                                                                                                                                     ▒
                           + 2.21% d_instantiate                                                                                                                                                                                                                                          ▒
                     - 9.15% sock_alloc                                                                                                                                                                                                                                                   ▒
                        - 8.25% new_inode_pseudo                                                                                                                                                                                                                                          ▒
                           - 7.12% alloc_inode                                                                                                                                                                                                                                            ▒
                              - 5.86% sock_alloc_inode                                                                                                                                                                                                                                    ▒
                                 - 3.91% kmem_cache_alloc                                                                                                                                                                                                                                 ▒
                                    - 2.04% memcg_kmem_get_cache                                                                                                                                                                                                                          ▒
                                         1.55% get_mem_cgroup_from_mm                                                                                                                                                                                                                     ▒
                                 + 1.36% kmem_cache_alloc_trace                                                                                                                                                                                                                           ▒
                                1.13% inode_init_always                                                                                                                                                                                                                                   ▒
                             0.91% _raw_spin_lock                                                                                                                                                                                                                                         ▒
                          0.62% get_next_ino                                                                                                                                                                                                                                              ▒
                     - 5.06% inet_accept                                                                                                                                                                                                                                                  ▒
                        - inet_csk_accept                                                                                                                                                                                                                                                 ▒
                           + 1.49% release_sock                                                                                                                                                                                                                                           ▒
                           + 1.22% lock_sock_nested                                                                                                                                                                                                                                       ▒
                             0.78% __local_bh_enable_ip                                                                                                                                                                                                                                   ▒
                     - 2.41% fput_many                                                                                                                                                                                                                                                    ▒
                          task_work_add                                                                                                                                                                                                                                                   ▒
                     + 1.90% sockfd_lookup_light                                                                                                                                                                                                                                          ▒
                     - 1.83% __alloc_fd                                                                                                                                                                                                                                                   ▒
                          0.74% _raw_spin_lock                                                                                                                                                                                                                                            ▒
                     + 0.84% put_unused_fd                                                                                                                                                                                                                                                ▒
               + 29.60% exit_to_usermode_loop                                                                                                                                                                                                                                             ▒
                 1.33% syscall_trace_enter                                                                                                                                                                                                                                                ▒
               + 1.27% syscall_slow_exit_work                                                                                                                                                                                                                                             ▒
           5.82% entry_SYSCALL_64                                                                                                                                                                                                                                                         ▒
           5.03% syscall_return_via_sysret                                                                                                                                                                                                                                                ▒
           0.76% debug_smp_processor_id                                                                                                                                                                                                                                                   ▒
        0.59% __pthread_disable_asynccancel                                                                                                                                                                                                                                               ▒
        0.57% __pthread_enable_asynccancel

I attach a related svg graph too.

Now we need to find where does that accept() call comes from (libusb probably?)

Actions #2

Updated by pespin over 4 years ago

  • Description updated (diff)
Actions #3

Updated by pespin over 4 years ago

  • Description updated (diff)
Actions #4

Updated by pespin over 4 years ago

Using gdb, accept() being called all the time comes from:

Thread 5 "osmo-trx-lms" hit Breakpoint 1, 0x00007ffff7fbdb50 in accept () from /usr/lib/libpthread.so.0
(gdb) bt full
#0  0x00007ffff7fbdb50 in accept () from /usr/lib/libpthread.so.0
No symbol table info available.
#1  0x00007ffff786f8ff in lime::LMS64CProtocol::ProcessConnections() () from /usr/lib/libLimeSuite.so.19.04-1
No symbol table info available.
#2  0x00007ffff76bacd4 in std::execute_native_thread_routine (__p=0x5555556d2fd0) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
        __t = std::unique_ptr<std::thread::_State> = {get() = 0x5555556d2fd0}
#3  0x00007ffff7fb457f in start_thread () from /usr/lib/libpthread.so.0
No symbol table info available.
#4  0x00007ffff73c30e3 in clone () from /usr/lib/libc.so.6
No symbol table info available.

So that's the offeding line src/ConnectionRemote/LMS64CProtocol_remote.cpp:119:

clientFd = accept(socketFd, (struct sockaddr*)&cli_addr, &clilen);

I set a breakpoint for close() (same function, line 185) hoping it would trigger from time to time, but it doesn't?!

Actions #5

Updated by Hoernchen over 4 years ago

There should not be any successful accept calls anyway, since we don't use any remote functionality provided by the limesuite? Unless the port clashes with the CNI or somesthing like that, so somethung connects somewhere and accept succeeds...

Actions #6

Updated by Hoernchen over 4 years ago

There should not be any successful accept calls anyway, since we don't use any remote functionality provided by the limesuite? Unless the port clashes with the CNI or somesthing like that, so somethung connects somewhere and accept succeeds...

const int port = 5000; - uh oh.
Actions #7

Updated by pespin over 4 years ago

clientFd is always -1 (accept always return -1, so it fails). So userspace cannot be leaking the memory, it's the kernel itself during the return path!

Why is LimeSuite hammering accept() all the time thought? it should only do that in case poll()/accept() specified so...
Also to check: is socketFd set as NONBLOCK? Yes it is: fcntl(socketFd, F_SETFL, O_NONBLOCK);

Actions #8

Updated by pespin over 4 years ago

Ok, so removing the fcntl(socketFd, F_SETFL, O_NONBLOCK); line fixes the issue. I see now reason for socketFd to be non-blocking, so it should be removed.

Actions #9

Updated by pespin over 4 years ago

Setting socket non-blocking, accept() fails all the time with EAGAIN:

accept() -> 11: Resource temporarily unavailable

I can reproduce the issue with this small C program attached.

Actions #10

Updated by pespin over 4 years ago

Setting as feedback while waiting for response from LimeSuite devs, to see if they are fixing it or shall we send a patch against LimeSuite implementing the proposed solution.

Actions #11

Updated by pespin over 4 years ago

  • Status changed from New to Feedback
Actions #12

Updated by laforge over 3 years ago

  • Assignee deleted (pespin)
Actions #13

Updated by laforge over 3 years ago

  • Status changed from Feedback to Stalled
Actions #14

Updated by pespin over 2 years ago

  • Status changed from Stalled to Closed
  • % Done changed from 0 to 100

The option causing the bug cannot longer be enabled starting from:
https://github.com/myriadrf/LimeSuite/commit/cfc53ed6433305e807dc161a9a9c81b868bf829e

Also described in the related upstream GH PR:
https://github.com/myriadrf/LimeSuite/issues/263

Closing the ticket.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)