Bug #4062

vty test fails on ARM (Raspberry Pi)

Added by lynxis about 1 month ago. Updated 28 days ago.

Status: Stalled
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 06/16/2019
Due date:
% Done: 30%
Spec Reference:

Description

The Debian packages for Raspbian fail to build on OBS because of a vty test failure.
The Debian source package is attached.

https://build.opensuse.org/package/show/network:osmocom:nightly/libosmocore

libosmocore_1.1.0.68.a08e.dsc (1.89 KB) lynxis, 06/16/2019 12:14 PM
libosmocore_1.1.0.68.a08e.tar.xz (753 KB) lynxis, 06/16/2019 12:14 PM
_log.txt (791 KB) build log, lynxis, 06/16/2019 12:14 PM
valgrind-vty_test.txt (21.5 KB) laforge, 06/21/2019 06:29 PM

History

#1 Updated by laforge 28 days ago

  • Assignee set to laforge

According to the logs, there's actually a segfault during vty test execution, which is quite troubling.

#2 Updated by laforge 28 days ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

I can reproduce the problem on a Raspi 3 with Raspbian: vty_test segfaults. gdb doesn't show a backtrace, but it indicates the crash is in 'memcmp'.

When compiling with "-g -O0", the test passes. Will try to debug further.

#3 Updated by laforge 28 days ago

It's not the optimization level; it's the "-g" that makes the problem vanish. Weird.

strace shows:

mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000
set_tls(0x76f3f4c0, 0x76f41e68, 0x76f4e050, 0x76f3f4c0, 0x76f4e050) = 0
mprotect(0x76e84000, 8192, PROT_READ)   = 0
mprotect(0x76d49000, 4096, PROT_READ)   = 0
mprotect(0x76ed6000, 4096, PROT_READ)   = 0
mprotect(0x76f05000, 4096, PROT_READ)   = 0
mprotect(0x76eb3000, 4096, PROT_READ)   = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_WRITE) = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_EXEC) = 0
cacheflush(0x76f07000, 0x76f0c000, 0, 0x15, 0) = 0
mprotect(0x76f1b000, 4096, PROT_READ)   = 0
mprotect(0x23000, 4096, PROT_READ)      = 0
mprotect(0x76f4d000, 4096, PROT_READ)   = 0
munmap(0x76f43000, 30037)               = 0
brk(NULL)                               = 0x1f9f000
brk(0x1fc0000)                          = 0x1fc0000
getcwd("/home/pi/osmo/libosmocore/tests", 4096) = 32
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x76f43618} ---
+++ killed by SIGSEGV +++
Segmentation fault

#4 Updated by laforge 28 days ago

Running under valgrind also makes the program work, but it reports an impressive list of problems during program startup (see the attached valgrind-vty_test.txt).

#5 Updated by laforge 28 days ago

Stepping through the program in gdb points at gen_logging_level_cmd_strs(), within that add_category_strings(), and finally osmo_str_tolower():

 osmo_str_tolower (src=0x76fa508d "LGLOBAL") at utils.c:912
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 
910     {
(gdb) 
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 

osmo_str_tolower_buf (dest=0x76de9618 "-linux-armhf.so.3", dest_len=128, src=0x76fa508d "LGLOBAL")
    at utils.c:886

For sure we don't want to write to somewhere that holds "-linux-armhf.so.3"?!

#6 Updated by laforge 28 days ago

Note: The valgrind errors don't show up on x86!

#7 Updated by laforge 28 days ago

git bisect states that 171ef826e1489031bc48745f29fa2d4657bf165f is the culprit. This is the commit that introduces thread-local storage for libosmocore's static buffers. But why would those behave differently on Raspbian?

#8 Updated by laforge 28 days ago

So the test executes a chain of functions ending with osmo_str_tolower_buf(), which is handed a thread-local static buffer for the lower-case string conversion, and writing to that buffer fails. But why?
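
For readers unfamiliar with the pattern, here is a minimal sketch of a lower-case conversion that returns a pointer into a thread-local static buffer. It is not the libosmocore source: demo_str_tolower() and demo_str_tolower_buf() are hypothetical stand-ins, and only the 128-byte buffer size is taken from the gdb output in comment #5.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for osmo_str_tolower_buf(): copy src into dest as
 * lower case, truncating if needed and always NUL-terminating.
 * Returns the number of characters written, excluding the terminator. */
size_t demo_str_tolower_buf(char *dest, size_t dest_len, const char *src)
{
	size_t i;

	if (dest_len == 0)
		return 0;
	for (i = 0; i + 1 < dest_len && src[i] != '\0'; i++)
		dest[i] = (char)tolower((unsigned char)src[i]);
	dest[i] = '\0';
	return i;
}

/* Hypothetical stand-in for osmo_str_tolower(): the result lives in a
 * per-thread static buffer, so the caller does not pass or free storage,
 * but the next call on the same thread overwrites it. */
const char *demo_str_tolower(const char *src)
{
	static __thread char buf[128]; /* size assumed from dest_len=128 in gdb */

	demo_str_tolower_buf(buf, sizeof(buf), src);
	return buf;
}
```

If `__thread` works as intended, demo_str_tolower("LGLOBAL") returns "lglobal" from an address inside the calling thread's TLS block. The crash described here would correspond to buf resolving to an address outside that block.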

#9 Updated by laforge 28 days ago

It's a mystery to me why the __thread annotation for thread-local storage would fail to work on Raspbian, but work with Debian on the same hardware, and also work with all the other distributions/versions that we're working with.

It's unlikely that we're the first program on the planet wanting to use thread-local storage, either?

Also fascinating, when trying to use gdb:

(gdb) p buf
Cannot find thread-local storage for process 25857, shared library /home/pi/osmo/libosmocore/src/.libs/libosmocore.so.12:
Cannot find thread-local variables on this target
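
One way to separate a toolchain problem from a libosmocore problem would be a standalone check of `__thread` that does not link libosmocore at all. The following is only a sketch of such a check; demo_tls_marker, demo_worker() and demo_tls_is_per_thread() are hypothetical names, and it has not been run on the affected Raspbian system.

```c
#include <pthread.h>
#include <stdint.h>

/* One instance of this variable should exist per thread. */
static __thread int demo_tls_marker;

/* Worker: write to its own copy and report where that copy lives. */
static void *demo_worker(void *arg)
{
	uintptr_t *addr_out = arg;

	demo_tls_marker = 2;
	*addr_out = (uintptr_t)&demo_tls_marker;
	return NULL;
}

/* Returns 1 if the worker saw its own copy of demo_tls_marker,
 * 0 if the copy appears to be shared with the calling thread,
 * and -1 if the worker thread could not be started or joined. */
int demo_tls_is_per_thread(void)
{
	pthread_t thr;
	uintptr_t worker_addr = 0;

	demo_tls_marker = 1;
	if (pthread_create(&thr, NULL, demo_worker, &worker_addr) != 0)
		return -1;
	if (pthread_join(thr, NULL) != 0)
		return -1;

	/* The worker's write must not be visible here, and the two copies
	 * must live at different addresses. */
	return worker_addr != (uintptr_t)&demo_tls_marker && demo_tls_marker == 1;
}
```

Built with gcc and -pthread, a return value of 1 would suggest that basic `__thread` support in an executable is intact, which would narrow the search to TLS in shared libraries such as libosmocore.so.12.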

#10 Updated by laforge 28 days ago

  • Status changed from In Progress to Stalled

I'm giving up on this. It's ridiculous that something as basic as __thread is failing on something considered a "stable" distribution.

#11 Updated by laforge 28 days ago

Final update:
  • the bug is gone when building with clang instead of gcc
    • vty_test passes
    • valgrind doesn't complain at all anymore
