Project

General

Profile

Bug #4062

vty tests fails on arm (raspberry pi)

Added by lynxis over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
06/16/2019
Due date:
% Done:

100%

Spec Reference:

Description

The debian packages for raspbian fails on OBS because of a vty test failure.
Attached the debian source package.

https://build.opensuse.org/package/show/network:osmocom:nightly/libosmocore

libosmocore_1.1.0.68.a08e.dsc libosmocore_1.1.0.68.a08e.dsc 1.89 KB lynxis, 06/16/2019 12:14 PM
libosmocore_1.1.0.68.a08e.tar.xz libosmocore_1.1.0.68.a08e.tar.xz 753 KB lynxis, 06/16/2019 12:14 PM
_log.txt _log.txt 791 KB build log lynxis, 06/16/2019 12:14 PM
valgrind-vty_test.txt valgrind-vty_test.txt 21.5 KB laforge, 06/21/2019 06:29 PM
tls-demonstrator.tar.bz2 tls-demonstrator.tar.bz2 941 Bytes laforge, 07/31/2019 03:56 PM
tls-reproducer2.tar.bz2 tls-reproducer2.tar.bz2 1.88 KB bug reproducer laforge, 07/31/2019 08:09 PM

Related issues

Related to osmo-remsim - Bug #4456: SIGABRT while opening USB device in osmo-remsim-client-st2 on OpenWRT with muslResolved03/16/2020

Related to libosmocore - Bug #5079: osmo_str_tolower() segfaults on sysmoBTS (arm-poky-linux-gnueabi)Resolved03/17/2021

Associated revisions

Revision e188b8cd (diff)
Added by pespin about 2 years ago

configure: Autodetect TLS bug on ARM with old gcc and apply workaround

Check if compiler being used contains the bug. GCC 7.3.0 is the oldest
version containing the fix, and version 6.3.0 is known to contain the
bug. Bug is only known to appear so far only on ARM32. If the bug is
present, gcc will generate a wrong binary which wil lend up segfaulting
when accessing TLS (__thread) variables under certain conditions.

Related: OS#4062
Related: SYS#4628
Change-Id: I8acc2cf41b73da0c3290f1cefd79f2bc68b0e77d

Revision afce89da (diff)
Added by pespin about 2 years ago

configure: Allow disabling workaround for TLS bug in old ARM gcc versions

Some toolchains (such as sysmobts 201705 one) containing the TLS bug on
old ARM gcc versions (<7.3.0) also crash if the initial workaround found
is aplied (CFLAGS="-mtls-dialect=gnu2"). In that scenario, let's provide
a way to disable the workaround (to avoid "ld" crashing) and warn the
user about requirement to build with -O0 to avoid runtime crashes.

Related: OS#4062
Related: SYS#4628
Change-Id: I04ff8c702eabcf4f6e7b59e11aece2744267cefe

Revision b732f857 (diff)
Added by laforge over 1 year ago

HACK: Disable the use of thread-local storage

This will disable the thread-safe use of many libosmo* featues,
but it seems even in 2020 there are still systems out there
wthat don't support thread-local-storage.

Related: OS#4456
Related: OS#4062
Change-Id: I9baa7047fc4ff1a595ee318dcdb1247f1460da1b

History

#1 Updated by laforge over 2 years ago

  • Assignee set to laforge

according to the logs, there's actually a segfault during vty test execution, which is quite troubling.

#2 Updated by laforge over 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

I can reproduce the problem on a Raspi 3 with raspbian: vty_test segfaults. gdb doesn't show a backtrace but indicates the crash is in 'memcmp'

When compiling with "-g -O0", the test passes. Will try to debug further.

#3 Updated by laforge over 2 years ago

It's not the optimization, but it's the "-g" which makes the problem vanish. weird.

strace shows:

mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000
set_tls(0x76f3f4c0, 0x76f41e68, 0x76f4e050, 0x76f3f4c0, 0x76f4e050) = 0
mprotect(0x76e84000, 8192, PROT_READ)   = 0
mprotect(0x76d49000, 4096, PROT_READ)   = 0
mprotect(0x76ed6000, 4096, PROT_READ)   = 0
mprotect(0x76f05000, 4096, PROT_READ)   = 0
mprotect(0x76eb3000, 4096, PROT_READ)   = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_WRITE) = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_EXEC) = 0
cacheflush(0x76f07000, 0x76f0c000, 0, 0x15, 0) = 0
mprotect(0x76f1b000, 4096, PROT_READ)   = 0
mprotect(0x23000, 4096, PROT_READ)      = 0
mprotect(0x76f4d000, 4096, PROT_READ)   = 0
munmap(0x76f43000, 30037)               = 0
brk(NULL)                               = 0x1f9f000
brk(0x1fc0000)                          = 0x1fc0000
getcwd("/home/pi/osmo/libosmocore/tests", 4096) = 32
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x76f43618} ---
+++ killed by SIGSEGV +++
Segmentation fault

#4 Updated by laforge over 2 years ago

valgrind also makes the program work, but shows an impressive list of problems during startup of the program (See attachment)

#5 Updated by laforge over 2 years ago

stepping through the program in gdb points at gen_logging_level_cmd_strs(), and within that add_category_strings() and

 osmo_str_tolower (src=0x76fa508d "LGLOBAL") at utils.c:912
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 
910     {
(gdb) 
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 

osmo_str_tolower_buf (dest=0x76de9618 "-linux-armhf.so.3", dest_len=128, src=0x76fa508d "LGLOBAL")
    at utils.c:886

for sure we don't want ot write to somewhere that holds "-linux-armhf.so.3" ?!?

#6 Updated by laforge over 2 years ago

Note: The valgrind errors don't show up on x86!

#7 Updated by laforge over 2 years ago

git bisect states 171ef826e1489031bc48745f29fa2d4657bf165f is the culprit. This is what introduces thread-local storage to libosmocore static buffers. But why would those be different on raspbian?

#8 Updated by laforge over 2 years ago

so the test executes a chain of functions ending with osmo_str_tolower_buf(), which wants to use a thread-local static buffer for string lower case conversion and that fails. but why?

#9 Updated by laforge over 2 years ago

It's a mystery to me, why the __thread annotation for thread-local storage would fail to work on raspbian, but work with Debian on the same hardware, and also work with any other of the distributions/versions that we're working with.

It's unlikely that we're the first program on the planet wanting to use thread-local storage, either?

Also fascinating, when trying to use gdb:

(gdb) p buf
Cannot find thread-local storage for process 25857, shared library /home/pi/osmo/libosmocore/src/.libs/libosmocore.so.12:
Cannot find thread-local variables on this target

#10 Updated by laforge over 2 years ago

  • Status changed from In Progress to Stalled

I'm giving up on this. It's ridiculous that something as basic as __thread is failing on something considered a "stable" distribution.

#11 Updated by laforge over 2 years ago

Final update:
  • the bug is gone when building with clang instead of gcc
    • vty_test passes
    • valgrind doesn't complain at all anymore

#12 Updated by laforge about 2 years ago

getting back to this with some more information:

  • manually trying to reproduce this with a small binary and a small shared library outside of libosmocore failed for some reason
  • libosmocore contains the expected .tbss ELF section with "ALLOC, THREAD_LOCAL" flags

#13 Updated by laforge about 2 years ago

for the clang-built libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ objdump -h ./libosmocore.so.12

./libosmocore.so.12:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
...
 16 .tbss         000022c8  00032b40  00032b40  00022b40  2**3
                  ALLOC, THREAD_LOCAL
<pre>

for the gcc-built one:
<pre>
15 .tbss         000022c8  0002db74  0002db74  0001db74  2**2
                  ALLOC, THREAD_LOCAL
</pre>

#14 Updated by laforge about 2 years ago

#15 Updated by laforge about 2 years ago

#16 Updated by laforge about 2 years ago

[earlier comments about which combination worked or didn't work were wrong].

The following steps result in a working binary:
  • building the main program with gcc against a gcc-built libosmocore
  • later swapping libosmocore.so.12 against a clang-built one

#17 Updated by laforge about 2 years ago

Adding the "objdump -d -S" output for the relevant function for both clang and gcc:

clang:

const char *osmo_str_tolower(const char *src)
{
    17ec:       e92d4830        push    {r4, r5, fp, lr}
    17f0:       e28db008        add     fp, sp, #8
    17f4:       e1a04000        mov     r4, r0
        static __thread char buf[128];
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    17f8:       e59f0044        ldr     r0, [pc, #68]   ; 1844 <osmo_str_tolower+0x58>
    17fc:       e08f0000        add     r0, pc, r0
    1800:       ebfffffe        bl      0 <__tls_get_addr>
    1804:       e1a05000        mov     r5, r0
    1808:       e59f0038        ldr     r0, [pc, #56]   ; 1848 <osmo_str_tolower+0x5c>
    180c:       e1a01005        mov     r1, r5
    1810:       e3a02080        mov     r2, #128        ; 0x80
    1814:       e08f0000        add     r0, pc, r0
    1818:       ebfffffe        bl      0 <printf>
        memset(buf, 0, sizeof(buf));
    181c:       e1a00005        mov     r0, r5
    1820:       e3a01000        mov     r1, #0
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1824:       e3a02080        mov     r2, #128        ; 0x80
        memset(buf, 0, sizeof(buf));
    1828:       ebfffffe        bl      0 <memset>
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    182c:       e1a00005        mov     r0, r5
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1830:       e3a01080        mov     r1, #128        ; 0x80
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    1834:       e1a02004        mov     r2, r4
    1838:       ebfffffe        bl      1698 <osmo_str_tolower_buf>
        return buf;
    183c:       e1a00005        mov     r0, r5
    1840:       e8bd8830        pop     {r4, r5, fp, pc}
    1844:       00000040        .word   0x00000040
    1848:       0000002c        .word   0x0000002c

gcc:

const char *osmo_str_tolower(const char *src)
{
        static __thread char buf[128];
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    152c:       e59f3054        ldr     r3, [pc, #84]   ; 1588 <osmo_str_tolower+0x5c>
{
    1530:       e92d4070        push    {r4, r5, r6, lr}
    1534:       e1a06000        mov     r6, r0
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1538:       e08f0003        add     r0, pc, r3
    153c:       ebfffffe        bl      0 <__tls_get_addr>
    1540:       e59f4044        ldr     r4, [pc, #68]   ; 158c <osmo_str_tolower+0x60>
    1544:       e3a02080        mov     r2, #128        ; 0x80
        memset(buf, 0, sizeof(buf));
    1548:       e1a05002        mov     r5, r2
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    154c:       e0804004        add     r4, r0, r4
    1550:       e59f0038        ldr     r0, [pc, #56]   ; 1590 <osmo_str_tolower+0x64>
    1554:       e1a01004        mov     r1, r4
    1558:       e08f0000        add     r0, pc, r0
    155c:       ebfffffe        bl      0 <printf>
        memset(buf, 0, sizeof(buf));
    1560:       e1a02005        mov     r2, r5
    1564:       e1a00004        mov     r0, r4
    1568:       e3a01000        mov     r1, #0
    156c:       ebfffffe        bl      0 <memset>
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    1570:       e1a02006        mov     r2, r6
    1574:       e1a01005        mov     r1, r5
    1578:       e1a00004        mov     r0, r4
    157c:       ebfffffe        bl      1400 <osmo_str_tolower_buf>
        return buf;
}

#18 Updated by laforge about 2 years ago

building libosmocore with gcc-5 doesn't change the behavior; it fails just like with gcc-6.

#19 Updated by laforge about 2 years ago

attaching my small "demonstrator" showing that tls works with a self-built shared library, but fails against libosocore.

#20 Updated by laforge about 2 years ago

one difference is the the 'struct tls_index' offset when comparing the gcc vs. clang version:

working clang:

trying to use TLS in own library

Breakpoint 2, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831
831     dl-tls.c: No such file or directory.
(gdb) p *ti
$1 = {ti_module = 2, ti_offset = 0}
(gdb) cont
Continuing.
tls_foo: buf=0x76e436d0
bar1='hello shlib'

trying to use TLS in libosmocore

Breakpoint 2, __GI___tls_get_addr (ti=0x23080) at dl-tls.c:831
831     in dl-tls.c
(gdb) p *ti
$2 = {ti_module = 1, ti_offset = 4360}
(gdb) cont
Continuing.
buf=0x76e435d0, sizeof(buf)=128
osmo_str_tolower_buf(0x76e435d0, 128, 0x12a4c)
i=0
i=1
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9
bar2='hello osmo'
[Inferior 1 (process 19031) exited normally]

non-working gcc:

trying to use TLS in own library

Breakpoint 1, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831
831     dl-tls.c: No such file or directory.
(gdb) p *ti
$1 = {ti_module = 2, ti_offset = 0}
(gdb) cont
Continuing.
tls_foo: buf=0x76e436d0
bar1='hello shlib'

trying to use TLS in libosmocore

Breakpoint 1, __GI___tls_get_addr (ti=0x23064) at dl-tls.c:831
831     in dl-tls.c
(gdb) p *ti
$2 = {ti_module = 1, ti_offset = 0}
(gdb) cont
Continuing.
buf=0x76e455c8, sizeof(buf)=128

Program received signal SIGSEGV, Segmentation fault.

The binaries used in bothe examples above use static linking of libosmocore/src/.libs/utils.o to exclude any influence from other parts of libosmocore.

#21 Updated by laforge about 2 years ago

further differences:

non-working gcc compiled libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-gcc/src/.libs $ readelf -r ./libosmocore.so | grep TLS
0002e480  00000011 R_ARM_TLS_DTPMOD3

working clang compiled libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./libosmocore.so | grep TLS
0003334c  00000011 R_ARM_TLS_DTPMOD3
00033354  00000011 R_ARM_TLS_DTPMOD3
0003335c  00000011 R_ARM_TLS_DTPMOD3
00033364  00000011 R_ARM_TLS_DTPMOD3
0003336c  00000011 R_ARM_TLS_DTPMOD3
00033374  00000011 R_ARM_TLS_DTPMOD3
0003337c  00000011 R_ARM_TLS_DTPMOD3
00033384  00000011 R_ARM_TLS_DTPMOD3
0003338c  00000011 R_ARM_TLS_DTPMOD3
00033394  00000011 R_ARM_TLS_DTPMOD3
0003339c  00000011 R_ARM_TLS_DTPMOD3
000333a4  00000011 R_ARM_TLS_DTPMOD3

Those entries each correspond to one 'struct tls_index', and I'm somewhat surprised that the count differs from gcc to clang ?!?

#22 Updated by laforge about 2 years ago

specifically, looking at utils.o:

non-working gcc object:

0000009c  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000000a4  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
0000054c  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
00000550  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
000005ac  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000005b4  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00000674  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
0000067c  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
000008e8  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000008ec  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000008f0  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000010f8  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000010fc  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00001304  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
00001308  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00001588  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
0000158c  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000016a0  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000016a4  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000003cf  00005c6a R_ARM_TLS_LDO32   00000000   namebuf
000003f2  00005d6a R_ARM_TLS_LDO32   00000100   hexd_buff
00000663  0000606a R_ARM_TLS_LDO32   00001188   buf.7531
00000840  00005f6a R_ARM_TLS_LDO32   00001108   buf.7509
00001597  00005e6a R_ARM_TLS_LDO32   00001100   buf.7331
00001bcc  00005e6a R_ARM_TLS_LDO32   00001100   buf.7331

working clang object:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./utils.o | grep TLS
0000008c  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
0000059c  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
000005a0  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
000005a4  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000680  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000684  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000690  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
0000081c  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000820  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000828  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ab4  0000fc68 R_ARM_TLS_GD32    00001100   osmo_encode_big_endian
00000ab8  0000fc68 R_ARM_TLS_GD32    00001100   osmo_encode_big_endian
00001374  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
0000158c  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
00001844  0000fe68 R_ARM_TLS_GD32    00001108   osmo_str_tolower.buf
00001a5c  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a60  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a64  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a68  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a6c  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a70  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a74  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
0000004a  0000fc6a R_ARM_TLS_LDO32   00001100   osmo_encode_big_endian
000000d5  0000fe6a R_ARM_TLS_LDO32   00001108   osmo_str_tolower.buf
00000122  0000ff6a R_ARM_TLS_LDO32   00001188   osmo_str_toupper.buf
000001c8  0000fb6a R_ARM_TLS_LDO32   00000000   namebuf
000001e6  0000fa6a R_ARM_TLS_LDO32   00000100   hexd_buff

and finally, from a working gcc object compiled with -O0 instead of -O2:

readelf -r ./utils.o | grep TLS
00000138  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000140  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000144  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000a4c  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ac0  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ac8  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000bdc  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000be4  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000f80  00001c68 R_ARM_TLS_GD32    00001100   buf.7143
00000f84  00001c68 R_ARM_TLS_GD32    00001100   buf.7143
00002328  00000968 R_ARM_TLS_GD32    00000000   namebuf
00002824  00000968 R_ARM_TLS_GD32    00000000   namebuf
00002c3c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c44  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c48  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002e04  00003e68 R_ARM_TLS_GD32    00001188   buf.7337
00002e08  00003e68 R_ARM_TLS_GD32    00001188   buf.7337
000003cf  0000096a R_ARM_TLS_LDO32   00000000   namebuf
000003f2  00000e6a R_ARM_TLS_LDO32   00000100   hexd_buff
000005dc  00003e6a R_ARM_TLS_LDO32   00001188   buf.7337
000006d6  0000386a R_ARM_TLS_LDO32   00001108   buf.7318
00000ff5  00001c6a R_ARM_TLS_LDO32   00001100   buf.7143

We can judge by the size that only the lines with 1108 as size are relevant for osmo_str_tolower()
which gives us the following summary:

working gcc-O0:

00002c3c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c44  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c48  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
000006d6  0000386a R_ARM_TLS_LDO32   00001108   buf.7318

non-working gcc-O2:

00000840  00005f6a R_ARM_TLS_LDO32   00001108   buf.7509

So somehow, in the non-working case there is no R_ARM_TLS_GD32 entry at all. I don't know what it means but it's for sure a noticable difference.

#23 Updated by laforge about 2 years ago

According to https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf

R_ARM_TLS_GD32 causes two adjacent entries to be added to the dynamically relocated section (the Global
Offset Table, or GOT). The first of these is dynamically relocated by R_ARM_TLS_DTPMOD32, the second by
R_ARM_TLS_DTPOFF32. The place resolves to the offset of the first of the GOT entries from the place.

this sounds a lot like what's required with tls_index (ti_module + ti_offset).

R_ARM_TLS_LDO32 resolves to the offset of the referenced data object (which must be local to the module) fromthe origin of the TLS block for the current module.

#24 Updated by laforge about 2 years ago

gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516 also doesn't generate the GD32 relocations when "-O1" is enabled. So -O0 is the only option to generate working code with gcc.

This is 2019, TLS existed since late 2002...

#25 Updated by laforge about 2 years ago

  • Status changed from Stalled to In Progress
  • % Done changed from 30 to 50

Ok, so the relocations might have been a dead end. I now have a relatively small, self-contained reproducer (see attached tls-reproducer2.tar.bz2). It has no external dependencies, not even libosmocore or talloc.

If you compile this with -O2 on raspbian9 (likely also other 32bit little-endian ARM architectures), it will segfault.

  • if you change to -O0, the resulting binary will work.
  • if you shrink hexd_buff from 4096 to 4092 bytes or lower, the resulting binary will work
  • if you build with clang, the resulting binary will work.
  • if you build without -fPIC, the resulting binary will work.

WTF is going on here?

#27 Updated by laforge about 2 years ago

  • Assignee changed from laforge to pespin

assigning to pespin, please report upstream to gcc and/or analyze further.

#28 Updated by laforge about 2 years ago

might still be relocation related:

non-working gcc-O2 binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 buf.6264
00000337 R_ARM_TLS_LDO32   hexd_buff
00000397 R_ARM_TLS_LDO32   buf.6264

working gcc-O0 binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 buf.6111
0000027c R_ARM_TLS_GD32    hexd_buff
00000284 R_ARM_TLS_GD32    hexd_buff
00000558 R_ARM_TLS_GD32    buf.6111
00000560 R_ARM_TLS_GD32    buf.6111
00000564 R_ARM_TLS_GD32    buf.6111
00000568 R_ARM_TLS_GD32    buf.6111

working clang binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 osmo_str_tolower.buf
00000200 R_ARM_TLS_GD32    hexd_buff
00000204 R_ARM_TLS_GD32    hexd_buff
00000210 R_ARM_TLS_GD32    hexd_buff
00000428 R_ARM_TLS_GD32    osmo_str_tolower.buf
00000048 R_ARM_TLS_LDO32   osmo_str_tolower.buf
00000084 R_ARM_TLS_LDO32   hexd_buff

#29 Updated by laforge about 2 years ago

in terms of potential ugly interim workarounds, we could think about shrinking the size of our large __thread buffers. There's only two: the 4096 byte hexdump buffer in utils.c, and the 4100 byte buffer that's part of msgb_hexdump() in msgb.c. The point is without understanding the details of this bug: How can we be sure we shrink them sufficiently?

#30 Updated by pespin about 2 years ago

  • Assignee changed from pespin to laforge
  • % Done changed from 50 to 30

I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.

#31 Updated by laforge about 2 years ago

Hi Pau,

On Wed, Jul 31, 2019 at 10:44:56PM +0000, pespin [REDMINE] wrote:

Assignee changed from pespin to laforge
% Done changed from 50 to 30

were those two intentional changes?

I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.

I'm not sure that 4092 will do it for msgb, this still needs to be validated. I played a bit
around and it seems there is a "global" maximum, most likely per source code file.

Regards,
Harald

#32 Updated by pespin about 2 years ago

  • Assignee changed from laforge to pespin
  • % Done changed from 30 to 50

Hi Harald, no they were no intentional, I have no idea how those ended up being changed.

#33 Updated by pespin about 2 years ago

Checking at osmo-bsc runtime crash from same issue (osmo_str_to_lower()), here's some conclusions (and tests below):

I couldn't find the exact culprit of the issue, but some hints on requirements for the issue to happen:
  • It seems the file must have more than 1 TLS variable for bug to appear (so compiler applies some optimizations in that case, which are not worth if there's less than 2 variables).
  • TLS variable being inside or outside of function doesn't seem to make any difference to me (other than maybe ordering in memory, which I don't know).
  • Behavior seems to be OK as long as all variables fit inside the first mem page (4096 bytes).
  • Problems start when sum of TLS variables starts getting bigger than 1st mem page aprox, so to me it looks like issues with mem page boundaries.
  • It seems order of variables matter. For instance having only "hexd_buff=1 && buf=4094" is fine, but having "hexd_buff=4094 && buf=1" crashes.
  • Bug doesn't appear starting from a single threshold size. For instance "hexd_buff=4094 && buf=9000" is fine, but "hexd_buff=4094 && buf=1" crashes. So to me it's either memory page bounded related or there's some range of sizes affected somehow.
Let's start with changing only value for hexd_buf:
4096 crash
4092 crash
3900 crash
3850 crash
3830 crash
3829 crash  <--- first value crashing, used next to see if it still can work by droppng size of other buffers (to verify relations).
----------
3828 works
3824 works
3820 works
3810 works
3800 works
3000 works
2000 works

Let's remove the uint64_t __thread variable (removing __thread attr), that's 8 bytes less:
no-uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3836 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3837 + buf=128 + buf=128 + namebuf=255 crash <-- so in this case it's really the 8 bytes we removed, they can be passed from one var to another (so they are related). relation 1-1 bytes.

uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=254 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=253 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=252 works  <-- so here it theoretically should be 1-3 bytes relation with hexd_buff and namebuf.
uint64_t + hexd_buff=3830 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept
uint64_t + hexd_buff=3831 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept.
uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept. So it seems the initial 1-3 relation might be related to padding between 2 buffers (to be 4-byte/word aligned?).
uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=252 crash  <-- Let's increase by 1 again, we break the 4-byte alignemnt so total space increases and we cross the file size "threshlold" line  again.
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=250 works  <-- here we continue dropping several buffers to check everything works fine. 
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=240 works
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=100 works

uint64_t + hexd_buff=3833 + buf=32 + buf=128 + namebuf=252 ?  <-- Crashing situation, but checking if 1nd buf affects it: it doesn't
uint64_t + hexd_buff=3833 + buf=128 + buf=32 + namebuf=252 crash  <-- Crashing situation, but checking if 2nd buf affects it: it doesn't
uint64_t + hexd_buff=3833 + buf=32 + buf=32 + namebuf=252 crash  <-- Crashing situation, but checking if both buf affects it: they don't
uint64_t + hexd_buff=3833 + buf=1 + buf=1 + namebuf=252 crash  <-- Crashing situation, but checking if both buf affects it: they don't
uint64_t + hexd_buff=3833 + no-buf + no-buf + namebuf=252 works  <-- Crashing situation, but cremoving __thread from both buf affects it: it does!!!

Let's try moving stuff around:
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=255 crash  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 255(+1) + 4 + 4 = 4101?)
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=252 works  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 252 + 4 + 4 = 4097?)
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=254 crash  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 254 + 4 + 4 = 4099?)
no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=254 works  <-- Crashing situation from above, modified (sum with 4-byte padding: (3836 + 254 + 4 + 4 = 4098?)

Let's try moving stuff around (to be 4-bytes without padding):
no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=256 works  <-- (sum with 4-byte padding: (3836 + 256 + 2 + 2 = 4096)
no-uint64_t + hexd_buff=3840 + buf=1 + buf=1 + namebuf=252 works  <-- Let's move 4 bytes from namebuf to hexd_buff (sum with 4-byte padding: (3840 + 252 + 2 + 2 = 4096)
no-uint64_t + hexd_buff=3844 + buf=1 + buf=1 + namebuf=252 crash  <-- Let's add 4 bytes to hexd_buff (sum with 4-byte padding: (3844 + 252 + 2 + 2 = 4100)
no-uint64_t + hexd_buff=4088 + buf=1 + buf=1 + namebuf=4 works  <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4088 +2 +2 + 4 = 4096)
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=2 crash  <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4090 + 2 +2 + 2 = 4096).
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=1 crash  <-- Let's go back to working situation, but moving most bytes to hexd_buff, seems still padding issues?.
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + no-namebuf works  <-- works, so probably namebuf had padding to 4 bytes? 
no-uint64_t + hexd_buff=4094 + buf=1 + buf=1 + no-namebuf crash  <-- Let's increase for bytes to be higher than 4096, then we see it crashes and namebug had 4-byte padding.

no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash  <-- Let's remove the buf we can get rid of to simplify. We are under 4096 but still crashes (maybe it takes 4-byte boundaries?).
no-uint64_t + no-hexd_buff + no-buff + buf=4096 + no-namebuf works  <-- Getting rid of hexd_buf.... works!
no-uint64_t + no-hexd_buff + no-buff + buf=9000 + no-namebuf works  <-- Let's go higher than 4096... works! wtf
no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works  <-- Going back to have 2 variables, but most of the size is on the second one. Moving buf out of function still makes it work.
no-uint64_t + hexd_buff=4000 + no-buf + buf=9000 + no-namebuf works  <-- Adding 4000 to hexd_buf, still works.
no-uint64_t + hexd_buff=6000 + no-buf + buf=9000 + no-namebuf works <-- Adding 2000 to hexd_buf, still works!!! (moving buf inside the variable too, so being inside or outside doesn't seem to help there).
no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Going back to known to crash values. It crashes. So it seems there's no simple threshold at which point it crashes.
no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crah
no-uint64_t + hexd_buff=6000 + no-buf + buf=2100 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable still ends up in 2nd page).
no-uint64_t + hexd_buff=6000 + no-buf + buf=6296 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable now ends up in 4th page). Crash happens in different place! exit()->__call_tls_dtors().
no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=6296).
no-uint64_t + hexd_buff=4 + no-buf + buf=4092 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=4098).
no-uint64_t + hexd_buff=1 + no-buf + buf=4094 + no-namebuf works <-- So it works, contrary to the hexd_buff=4094 + buf=1, so order matters
no-uint64_t + hexd_buff=4094 + no-buf + buf=9000 + no-namebuf works
no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crahes 

#34 Updated by pespin about 2 years ago

Submitted a couple related workaround patches to have it working for now:

remote: https://gerrit.osmocom.org/c/libosmocore/+/15024 utils: share static buffer in osmo_str_to{lower,upper}()
remote: https://gerrit.osmocom.org/c/libosmocore/+/15025 Workaround TLS gcc compiler bug in ARM

I tested they work fine (osmo-bsc doesn't crash upon startup like it used to, after applying the patch).

I will document better the 2nd one if needed. I still need to report the issue upstream (not sure if it really make sense other than for inforamtional purposes since it seems to be fixed on newer toolchains).

#36 Updated by laforge about 2 years ago

"gcc-6 (Raspbian 6.5.0-1+rpi1+b1) 6.5.0 20181026" can reproduce the problem, while "gcc (Raspbian 8.3.0-6+rpi1) 8.3.0" cannot.

#37 Updated by laforge about 2 years ago

"gcc-7 (Raspbian 7.3.0-19) 7.3.0" also cannot reproduce the problem. So it seems anything < 7.x for sure has the bug.

#38 Updated by laforge about 2 years ago

"-mtls-dialect=gnu2" seems to be a viable way to avoid the segfault, as indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142#c12

I just built libosmocore using gcc-6.5 on raspbian 10 using "-O2 -g -mtls-dialect=gnu2" and linked main programs compiled with normal "-O2 -g" and the resulting programs worked as expected.

#39 Updated by laforge about 2 years ago

So now the challenge is to find or write autoconf macros which detect the combination of 32bit arm and the compiler version < 7.1.x, and then tune the CFLAGS accordingly if that is the case.

For the gcc version, there's a suggestion in https://stackoverflow.com/questions/7067385/find-the-gcc-version.

#40 Updated by Hoernchen about 2 years ago

The gnu2 aka TLSDESC model has been marked experimental for ~10 years in https://static.docs.arm.com/ihi0044/g/aaelf32.pdf and clang/lld does not support it except for aarch64 where it's the default - so being a rather arcane non-standard option it's hard to tell how well it works, but I guess we will find that out the same way we found this bug, by just using it.

#41 Updated by pespin about 2 years ago

Patch to automatically workaround this bug (ARM32 and gcc < 7.3.0):
https://gerrit.osmocom.org/c/libosmocore/+/15037 configure: Autodetect TLS bug on ARM with old gcc and apply workaround

BTW, it seems sysmobts SDK's ld doesn't like the new CFLAG, it segfaults.

#42 Updated by pespin about 2 years ago

  • Status changed from In Progress to Feedback
  • % Done changed from 50 to 90

toolchain we use for sysmobts and oc2g in OE crashes when using the -mtls-dialect=gnu2 CFLAG. However, the toolchain seems to produce correct binaries which don't crash at runtime.
So I added a new commit to libosmocore which allows disabling the automatic workaround we added:
https://gerrit.osmocom.org/c/libosmocore/+/15061 configure: Allow disabling workaround for TLS bug in old ARM gcc versions

I already submitted a patch against meta-telephony laforge/nightly to use it, and also another patch for osmo-gsm-tester to fix the build scripts using SDKs from that toolchain:
https://gerrit.osmocom.org/c/osmo-gsm-tester/+/15067 contrib: Avoid sysmobts/oc2g toolchain ld crashes building new libosmocore

Once those are merged and we cross-check that everything's fine, we can close the ticket.

#43 Updated by pespin about 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

Everything's merged and included in release from this week, closing this ticket.

#44 Updated by laforge over 1 year ago

  • Related to Bug #4456: SIGABRT while opening USB device in osmo-remsim-client-st2 on OpenWRT with musl added

#45 Updated by fixeria 6 months ago

  • Related to Bug #5079: osmo_str_tolower() segfaults on sysmoBTS (arm-poky-linux-gnueabi) added

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)