Project

General

Profile

Bug #4062

vty tests fails on arm (raspberry pi)

Added by lynxis 6 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
06/16/2019
Due date:
% Done:

100%

Spec Reference:

Description

The debian packages for raspbian fails on OBS because of a vty test failure.
Attached the debian source package.

https://build.opensuse.org/package/show/network:osmocom:nightly/libosmocore

libosmocore_1.1.0.68.a08e.dsc libosmocore_1.1.0.68.a08e.dsc 1.89 KB lynxis, 06/16/2019 12:14 PM
libosmocore_1.1.0.68.a08e.tar.xz libosmocore_1.1.0.68.a08e.tar.xz 753 KB lynxis, 06/16/2019 12:14 PM
_log.txt _log.txt 791 KB build log lynxis, 06/16/2019 12:14 PM
valgrind-vty_test.txt valgrind-vty_test.txt 21.5 KB laforge, 06/21/2019 06:29 PM
tls-demonstrator.tar.bz2 tls-demonstrator.tar.bz2 941 Bytes laforge, 07/31/2019 03:56 PM
tls-reproducer2.tar.bz2 tls-reproducer2.tar.bz2 1.88 KB bug reproducer laforge, 07/31/2019 08:09 PM

History

#1 Updated by laforge 6 months ago

  • Assignee set to laforge

according to the logs, there's actually a segfault during vty test execution, which is quite troubling.

#2 Updated by laforge 6 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

I can reproduce the problem on a Raspi 3 with raspbian: vty_test segfaults. gdb doesn't show a backtrace but indicates the crash is in 'memcmp'

When compiling with "-g -O0", the test passes. Will try to debug further.

#3 Updated by laforge 6 months ago

It's not the optimization, but it's the "-g" which makes the problem vanish. weird.

strace shows:

mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000
set_tls(0x76f3f4c0, 0x76f41e68, 0x76f4e050, 0x76f3f4c0, 0x76f4e050) = 0
mprotect(0x76e84000, 8192, PROT_READ)   = 0
mprotect(0x76d49000, 4096, PROT_READ)   = 0
mprotect(0x76ed6000, 4096, PROT_READ)   = 0
mprotect(0x76f05000, 4096, PROT_READ)   = 0
mprotect(0x76eb3000, 4096, PROT_READ)   = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_WRITE) = 0
mprotect(0x76f07000, 20480, PROT_READ|PROT_EXEC) = 0
cacheflush(0x76f07000, 0x76f0c000, 0, 0x15, 0) = 0
mprotect(0x76f1b000, 4096, PROT_READ)   = 0
mprotect(0x23000, 4096, PROT_READ)      = 0
mprotect(0x76f4d000, 4096, PROT_READ)   = 0
munmap(0x76f43000, 30037)               = 0
brk(NULL)                               = 0x1f9f000
brk(0x1fc0000)                          = 0x1fc0000
getcwd("/home/pi/osmo/libosmocore/tests", 4096) = 32
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x76f43618} ---
+++ killed by SIGSEGV +++
Segmentation fault

#4 Updated by laforge 6 months ago

valgrind also makes the program work, but shows an impressive list of problems during startup of the program (See attachment)

#5 Updated by laforge 6 months ago

stepping through the program in gdb points at gen_logging_level_cmd_strs(), and within that add_category_strings() and

 osmo_str_tolower (src=0x76fa508d "LGLOBAL") at utils.c:912
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 
910     {
(gdb) 
912             osmo_str_tolower_buf(buf, sizeof(buf), src);
(gdb) 

osmo_str_tolower_buf (dest=0x76de9618 "-linux-armhf.so.3", dest_len=128, src=0x76fa508d "LGLOBAL")
    at utils.c:886

for sure we don't want ot write to somewhere that holds "-linux-armhf.so.3" ?!?

#6 Updated by laforge 6 months ago

Note: The valgrind errors don't show up on x86!

#7 Updated by laforge 6 months ago

git bisect states 171ef826e1489031bc48745f29fa2d4657bf165f is the culprit. This is what introduces thread-local storage to libosmocore static buffers. But why would those be different on raspbian?

#8 Updated by laforge 6 months ago

so the test executes a chain of functions ending with osmo_str_tolower_buf(), which wants to use a thread-local static buffer for string lower case conversion and that fails. but why?

#9 Updated by laforge 6 months ago

It's a mystery to me, why the __thread annotation for thread-local storage would fail to work on raspbian, but work with Debian on the same hardware, and also work with any other of the distributions/versions that we're working with.

It's unlikely that we're the first program on the planet wanting to use thread-local storage, either?

Also fascinating, when trying to use gdb:

(gdb) p buf
Cannot find thread-local storage for process 25857, shared library /home/pi/osmo/libosmocore/src/.libs/libosmocore.so.12:
Cannot find thread-local variables on this target

#10 Updated by laforge 6 months ago

  • Status changed from In Progress to Stalled

I'm giving up on this. It's ridiculous that something as basic as __thread is failing on something considered a "stable" distribution.

#11 Updated by laforge 6 months ago

Final update:
  • the bug is gone when building with clang instead of gcc
    • vty_test passes
    • valgrind doesn't complain at all anymore

#12 Updated by laforge 4 months ago

getting back to this with some more information:

  • manually trying to reproduce this with a small binary and a small shared library outside of libosmocore failed for some reason
  • libosmocore contains the expected .tbss ELF section with "ALLOC, THREAD_LOCAL" flags

#13 Updated by laforge 4 months ago

for the clang-built libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ objdump -h ./libosmocore.so.12

./libosmocore.so.12:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
...
 16 .tbss         000022c8  00032b40  00032b40  00022b40  2**3
                  ALLOC, THREAD_LOCAL
<pre>

for the gcc-built one:
<pre>
15 .tbss         000022c8  0002db74  0002db74  0001db74  2**2
                  ALLOC, THREAD_LOCAL
</pre>

#14 Updated by laforge 4 months ago

#15 Updated by laforge 4 months ago

#16 Updated by laforge 4 months ago

[earlier comments about which combination worked or didn't work were wrong].

The following steps result in a working binary:
  • building the main program with gcc against a gcc-built libosmocore
  • later swapping libosmocore.so.12 against a clang-built one

#17 Updated by laforge 4 months ago

Adding the "objdump -d -S" output for the relevant function for both clang and gcc:

clang:

const char *osmo_str_tolower(const char *src)
{
    17ec:       e92d4830        push    {r4, r5, fp, lr}
    17f0:       e28db008        add     fp, sp, #8
    17f4:       e1a04000        mov     r4, r0
        static __thread char buf[128];
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    17f8:       e59f0044        ldr     r0, [pc, #68]   ; 1844 <osmo_str_tolower+0x58>
    17fc:       e08f0000        add     r0, pc, r0
    1800:       ebfffffe        bl      0 <__tls_get_addr>
    1804:       e1a05000        mov     r5, r0
    1808:       e59f0038        ldr     r0, [pc, #56]   ; 1848 <osmo_str_tolower+0x5c>
    180c:       e1a01005        mov     r1, r5
    1810:       e3a02080        mov     r2, #128        ; 0x80
    1814:       e08f0000        add     r0, pc, r0
    1818:       ebfffffe        bl      0 <printf>
        memset(buf, 0, sizeof(buf));
    181c:       e1a00005        mov     r0, r5
    1820:       e3a01000        mov     r1, #0
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1824:       e3a02080        mov     r2, #128        ; 0x80
        memset(buf, 0, sizeof(buf));
    1828:       ebfffffe        bl      0 <memset>
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    182c:       e1a00005        mov     r0, r5
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1830:       e3a01080        mov     r1, #128        ; 0x80
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    1834:       e1a02004        mov     r2, r4
    1838:       ebfffffe        bl      1698 <osmo_str_tolower_buf>
        return buf;
    183c:       e1a00005        mov     r0, r5
    1840:       e8bd8830        pop     {r4, r5, fp, pc}
    1844:       00000040        .word   0x00000040
    1848:       0000002c        .word   0x0000002c

gcc:

const char *osmo_str_tolower(const char *src)
{
        static __thread char buf[128];
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    152c:       e59f3054        ldr     r3, [pc, #84]   ; 1588 <osmo_str_tolower+0x5c>
{
    1530:       e92d4070        push    {r4, r5, r6, lr}
    1534:       e1a06000        mov     r6, r0
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    1538:       e08f0003        add     r0, pc, r3
    153c:       ebfffffe        bl      0 <__tls_get_addr>
    1540:       e59f4044        ldr     r4, [pc, #68]   ; 158c <osmo_str_tolower+0x60>
    1544:       e3a02080        mov     r2, #128        ; 0x80
        memset(buf, 0, sizeof(buf));
    1548:       e1a05002        mov     r5, r2
        printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf));
    154c:       e0804004        add     r4, r0, r4
    1550:       e59f0038        ldr     r0, [pc, #56]   ; 1590 <osmo_str_tolower+0x64>
    1554:       e1a01004        mov     r1, r4
    1558:       e08f0000        add     r0, pc, r0
    155c:       ebfffffe        bl      0 <printf>
        memset(buf, 0, sizeof(buf));
    1560:       e1a02005        mov     r2, r5
    1564:       e1a00004        mov     r0, r4
    1568:       e3a01000        mov     r1, #0
    156c:       ebfffffe        bl      0 <memset>
        osmo_str_tolower_buf(buf, sizeof(buf), src);
    1570:       e1a02006        mov     r2, r6
    1574:       e1a01005        mov     r1, r5
    1578:       e1a00004        mov     r0, r4
    157c:       ebfffffe        bl      1400 <osmo_str_tolower_buf>
        return buf;
}

#18 Updated by laforge 4 months ago

building libosmocore with gcc-5 doesn't change the behavior; it fails just like with gcc-6.

#19 Updated by laforge 4 months ago

attaching my small "demonstrator" showing that tls works with a self-built shared library, but fails against libosocore.

#20 Updated by laforge 4 months ago

one difference is the the 'struct tls_index' offset when comparing the gcc vs. clang version:

working clang:

trying to use TLS in own library

Breakpoint 2, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831
831     dl-tls.c: No such file or directory.
(gdb) p *ti
$1 = {ti_module = 2, ti_offset = 0}
(gdb) cont
Continuing.
tls_foo: buf=0x76e436d0
bar1='hello shlib'

trying to use TLS in libosmocore

Breakpoint 2, __GI___tls_get_addr (ti=0x23080) at dl-tls.c:831
831     in dl-tls.c
(gdb) p *ti
$2 = {ti_module = 1, ti_offset = 4360}
(gdb) cont
Continuing.
buf=0x76e435d0, sizeof(buf)=128
osmo_str_tolower_buf(0x76e435d0, 128, 0x12a4c)
i=0
i=1
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9
bar2='hello osmo'
[Inferior 1 (process 19031) exited normally]

non-working gcc:

trying to use TLS in own library

Breakpoint 1, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831
831     dl-tls.c: No such file or directory.
(gdb) p *ti
$1 = {ti_module = 2, ti_offset = 0}
(gdb) cont
Continuing.
tls_foo: buf=0x76e436d0
bar1='hello shlib'

trying to use TLS in libosmocore

Breakpoint 1, __GI___tls_get_addr (ti=0x23064) at dl-tls.c:831
831     in dl-tls.c
(gdb) p *ti
$2 = {ti_module = 1, ti_offset = 0}
(gdb) cont
Continuing.
buf=0x76e455c8, sizeof(buf)=128

Program received signal SIGSEGV, Segmentation fault.

The binaries used in bothe examples above use static linking of libosmocore/src/.libs/utils.o to exclude any influence from other parts of libosmocore.

#21 Updated by laforge 4 months ago

further differences:

non-working gcc compiled libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-gcc/src/.libs $ readelf -r ./libosmocore.so | grep TLS
0002e480  00000011 R_ARM_TLS_DTPMOD3

working clang compiled libosmocore:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./libosmocore.so | grep TLS
0003334c  00000011 R_ARM_TLS_DTPMOD3
00033354  00000011 R_ARM_TLS_DTPMOD3
0003335c  00000011 R_ARM_TLS_DTPMOD3
00033364  00000011 R_ARM_TLS_DTPMOD3
0003336c  00000011 R_ARM_TLS_DTPMOD3
00033374  00000011 R_ARM_TLS_DTPMOD3
0003337c  00000011 R_ARM_TLS_DTPMOD3
00033384  00000011 R_ARM_TLS_DTPMOD3
0003338c  00000011 R_ARM_TLS_DTPMOD3
00033394  00000011 R_ARM_TLS_DTPMOD3
0003339c  00000011 R_ARM_TLS_DTPMOD3
000333a4  00000011 R_ARM_TLS_DTPMOD3

Those entries each correspond to one 'struct tls_index', and I'm somewhat surprised that the count differs from gcc to clang ?!?

#22 Updated by laforge 4 months ago

specifically, looking at utils.o:

non-working gcc object:

0000009c  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000000a4  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
0000054c  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
00000550  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
000005ac  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000005b4  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00000674  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
0000067c  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
000008e8  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000008ec  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000008f0  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000010f8  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
000010fc  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00001304  00005a69 R_ARM_TLS_LDM32   00000000   .LANCHOR0
00001308  00005a6a R_ARM_TLS_LDO32   00000000   .LANCHOR0
00001588  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
0000158c  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000016a0  00005b69 R_ARM_TLS_LDM32   00001ff8   .LANCHOR2
000016a4  00005b6a R_ARM_TLS_LDO32   00001ff8   .LANCHOR2
000003cf  00005c6a R_ARM_TLS_LDO32   00000000   namebuf
000003f2  00005d6a R_ARM_TLS_LDO32   00000100   hexd_buff
00000663  0000606a R_ARM_TLS_LDO32   00001188   buf.7531
00000840  00005f6a R_ARM_TLS_LDO32   00001108   buf.7509
00001597  00005e6a R_ARM_TLS_LDO32   00001100   buf.7331
00001bcc  00005e6a R_ARM_TLS_LDO32   00001100   buf.7331

working clang object:

pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./utils.o | grep TLS
0000008c  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
0000059c  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
000005a0  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
000005a4  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000680  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000684  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000690  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
0000081c  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000820  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000828  0000fa68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ab4  0000fc68 R_ARM_TLS_GD32    00001100   osmo_encode_big_endian
00000ab8  0000fc68 R_ARM_TLS_GD32    00001100   osmo_encode_big_endian
00001374  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
0000158c  0000fb68 R_ARM_TLS_GD32    00000000   namebuf
00001844  0000fe68 R_ARM_TLS_GD32    00001108   osmo_str_tolower.buf
00001a5c  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a60  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a64  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a68  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a6c  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a70  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
00001a74  0000ff68 R_ARM_TLS_GD32    00001188   osmo_str_toupper.buf
0000004a  0000fc6a R_ARM_TLS_LDO32   00001100   osmo_encode_big_endian
000000d5  0000fe6a R_ARM_TLS_LDO32   00001108   osmo_str_tolower.buf
00000122  0000ff6a R_ARM_TLS_LDO32   00001188   osmo_str_toupper.buf
000001c8  0000fb6a R_ARM_TLS_LDO32   00000000   namebuf
000001e6  0000fa6a R_ARM_TLS_LDO32   00000100   hexd_buff

and finally, from a working gcc object compiled with -O0 instead of -O2:

readelf -r ./utils.o | grep TLS
00000138  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000140  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000144  00000968 R_ARM_TLS_GD32    00000000   namebuf
00000a4c  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ac0  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000ac8  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000bdc  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000be4  00000e68 R_ARM_TLS_GD32    00000100   hexd_buff
00000f80  00001c68 R_ARM_TLS_GD32    00001100   buf.7143
00000f84  00001c68 R_ARM_TLS_GD32    00001100   buf.7143
00002328  00000968 R_ARM_TLS_GD32    00000000   namebuf
00002824  00000968 R_ARM_TLS_GD32    00000000   namebuf
00002c3c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c44  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c48  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002e04  00003e68 R_ARM_TLS_GD32    00001188   buf.7337
00002e08  00003e68 R_ARM_TLS_GD32    00001188   buf.7337
000003cf  0000096a R_ARM_TLS_LDO32   00000000   namebuf
000003f2  00000e6a R_ARM_TLS_LDO32   00000100   hexd_buff
000005dc  00003e6a R_ARM_TLS_LDO32   00001188   buf.7337
000006d6  0000386a R_ARM_TLS_LDO32   00001108   buf.7318
00000ff5  00001c6a R_ARM_TLS_LDO32   00001100   buf.7143

We can judge by the size that only the lines with 1108 as size are relevant for osmo_str_tolower()
which gives us the following summary:

working gcc-O0:

00002c3c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c44  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c48  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
00002c4c  00003868 R_ARM_TLS_GD32    00001108   buf.7318
000006d6  0000386a R_ARM_TLS_LDO32   00001108   buf.7318

non-working gcc-O2:

00000840  00005f6a R_ARM_TLS_LDO32   00001108   buf.7509

So somehow, in the non-working case there is no R_ARM_TLS_GD32 entry at all. I don't know what it means but it's for sure a noticable difference.

#23 Updated by laforge 4 months ago

According to https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf

R_ARM_TLS_GD32 causes two adjacent entries to be added to the dynamically relocated section (the Global
Offset Table, or GOT). The first of these is dynamically relocated by R_ARM_TLS_DTPMOD32, the second by
R_ARM_TLS_DTPOFF32. The place resolves to the offset of the first of the GOT entries from the place.

this sounds a lot like what's required with tls_index (ti_module + ti_offset).

R_ARM_TLS_LDO32 resolves to the offset of the referenced data object (which must be local to the module) fromthe origin of the TLS block for the current module.

#24 Updated by laforge 4 months ago

gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516 also doesn't generate the GD32 relocations when "-O1" is enabled. So -O0 is the only option to generate working code with gcc.

This is 2019, TLS existed since late 2002...

#25 Updated by laforge 4 months ago

  • Status changed from Stalled to In Progress
  • % Done changed from 30 to 50

Ok, so the relocations might have been a dead end. I now have a relatively small, self-contained reproducer (see attached tls-reproducer2.tar.bz2). It has no external dependencies, not even libosmocore or talloc.

If you compile this with -O2 on raspbian9 (likely also other 32bit little-endian ARM architectures), it will segfault.

  • if you change to -O0, the resulting binary will work.
  • if you shrink hexd_buff from 4096 to 4092 bytes or lower, the resulting binary will work
  • if you build with clang, the resulting binary will work.
  • if you build without -fPIC, the resulting binary will work.

WTF is going on here?

#27 Updated by laforge 4 months ago

  • Assignee changed from laforge to pespin

assigning to pespin, please report upstream to gcc and/or analyze further.

#28 Updated by laforge 4 months ago

might still be relocation related:

non-working gcc-O2 binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 buf.6264
00000337 R_ARM_TLS_LDO32   hexd_buff
00000397 R_ARM_TLS_LDO32   buf.6264

working gcc-O0 binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 buf.6111
0000027c R_ARM_TLS_GD32    hexd_buff
00000284 R_ARM_TLS_GD32    hexd_buff
00000558 R_ARM_TLS_GD32    buf.6111
00000560 R_ARM_TLS_GD32    buf.6111
00000564 R_ARM_TLS_GD32    buf.6111
00000568 R_ARM_TLS_GD32    buf.6111

working clang binary:

pi@rpi3-bplus:~/tls $ objdump -x utils.o  | grep buf
00000000 l       .tbss  00001000 hexd_buff
00001000 l       .tbss  00000080 osmo_str_tolower.buf
00000200 R_ARM_TLS_GD32    hexd_buff
00000204 R_ARM_TLS_GD32    hexd_buff
00000210 R_ARM_TLS_GD32    hexd_buff
00000428 R_ARM_TLS_GD32    osmo_str_tolower.buf
00000048 R_ARM_TLS_LDO32   osmo_str_tolower.buf
00000084 R_ARM_TLS_LDO32   hexd_buff

#29 Updated by laforge 4 months ago

in terms of potential ugly interim workarounds, we could think about shrinking the size of our large __thread buffers. There's only two: the 4096 byte hexdump buffer in utils.c, and the 4100 byte buffer that's part of msgb_hexdump() in msgb.c. The point is without understanding the details of this bug: How can we be sure we shrink them sufficiently?

#30 Updated by pespin 4 months ago

  • Assignee changed from pespin to laforge
  • % Done changed from 50 to 30

I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.

#31 Updated by laforge 4 months ago

Hi Pau,

On Wed, Jul 31, 2019 at 10:44:56PM +0000, pespin [REDMINE] wrote:

Assignee changed from pespin to laforge
% Done changed from 50 to 30

were those two intentional changes?

I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.

I'm not sure that 4092 will do it for msgb, this still needs to be validated. I played a bit
around and it seems there is a "global" maximum, most likely per source code file.

Regards,
Harald

#32 Updated by pespin 4 months ago

  • Assignee changed from laforge to pespin
  • % Done changed from 30 to 50

Hi Harald, no they were no intentional, I have no idea how those ended up being changed.

#33 Updated by pespin 4 months ago

Checking at osmo-bsc runtime crash from same issue (osmo_str_to_lower()), here's some conclusions (and tests below):

I couldn't find the exact culprit of the issue, but some hints on requirements for the issue to happen:
  • It seems the file must have more than 1 TLS variable for bug to appear (so compiler applies some optimizations in that case, which are not worth if there's less than 2 variables).
  • TLS variable being inside or outside of function doesn't seem to make any difference to me (other than maybe ordering in memory, which I don't know).
  • Behavior seems to be OK as long as all variables fit inside the first mem page (4096 bytes).
  • Problems start when sum of TLS variables starts getting bigger than 1st mem page aprox, so to me it looks like issues with mem page boundaries.
  • It seems order of variables matter. For instance having only "hexd_buff=1 && buf=4094" is fine, but having "hexd_buff=4094 && buf=1" crashes.
  • Bug doesn't appear starting from a single threshold size. For instance "hexd_buff=4094 && buf=9000" is fine, but "hexd_buff=4094 && buf=1" crashes. So to me it's either memory page bounded related or there's some range of sizes affected somehow.
Let's start with changing only value for hexd_buf:
4096 crash
4092 crash
3900 crash
3850 crash
3830 crash
3829 crash  <--- first value crashing, used next to see if it still can work by droppng size of other buffers (to verify relations).
----------
3828 works
3824 works
3820 works
3810 works
3800 works
3000 works
2000 works

Let's remove the uint64_t __thread variable (removing __thread attr), that's 8 bytes less:
no-uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3836 + buf=128 + buf=128 + namebuf=255 works
no-uint64_t + hexd_buff=3837 + buf=128 + buf=128 + namebuf=255 crash <-- so in this case it's really the 8 bytes we removed, they can be passed from one var to another (so they are related). relation 1-1 bytes.

uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=254 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=253 crash
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=252 works  <-- so here it theoretically should be 1-3 bytes relation with hexd_buff and namebuf.
uint64_t + hexd_buff=3830 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept
uint64_t + hexd_buff=3831 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept.
uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=252 works  <-- let's increase the big buffer by 1, to see if relation is kept. So it seems the initial 1-3 relation might be related to padding between 2 buffers (to be 4-byte/word aligned?).
uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=252 crash  <-- Let's increase by 1 again, we break the 4-byte alignemnt so total space increases and we cross the file size "threshlold" line  again.
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=250 works  <-- here we continue dropping several buffers to check everything works fine. 
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=240 works
uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=100 works

uint64_t + hexd_buff=3833 + buf=32 + buf=128 + namebuf=252 ?  <-- Crashing situation, but checking if 1nd buf affects it: it doesn't
uint64_t + hexd_buff=3833 + buf=128 + buf=32 + namebuf=252 crash  <-- Crashing situation, but checking if 2nd buf affects it: it doesn't
uint64_t + hexd_buff=3833 + buf=32 + buf=32 + namebuf=252 crash  <-- Crashing situation, but checking if both buf affects it: they don't
uint64_t + hexd_buff=3833 + buf=1 + buf=1 + namebuf=252 crash  <-- Crashing situation, but checking if both buf affects it: they don't
uint64_t + hexd_buff=3833 + no-buf + no-buf + namebuf=252 works  <-- Crashing situation, but cremoving __thread from both buf affects it: it does!!!

Let's try moving stuff around:
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=255 crash  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 255(+1) + 4 + 4 = 4101?)
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=252 works  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 252 + 4 + 4 = 4097?)
no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=254 crash  <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 254 + 4 + 4 = 4099?)
no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=254 works  <-- Crashing situation from above, modified (sum with 4-byte padding: (3836 + 254 + 4 + 4 = 4098?)

Let's try moving stuff around (to be 4-bytes without padding):
no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=256 works  <-- (sum with 4-byte padding: (3836 + 256 + 2 + 2 = 4096)
no-uint64_t + hexd_buff=3840 + buf=1 + buf=1 + namebuf=252 works  <-- Let's move 4 bytes from namebuf to hexd_buff (sum with 4-byte padding: (3840 + 252 + 2 + 2 = 4096)
no-uint64_t + hexd_buff=3844 + buf=1 + buf=1 + namebuf=252 crash  <-- Let's add 4 bytes to hexd_buff (sum with 4-byte padding: (3844 + 252 + 2 + 2 = 4100)
no-uint64_t + hexd_buff=4088 + buf=1 + buf=1 + namebuf=4 works  <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4088 +2 +2 + 4 = 4096)
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=2 crash  <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4090 + 2 +2 + 2 = 4096).
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=1 crash  <-- Let's go back to working situation, but moving most bytes to hexd_buff, seems still padding issues?.
no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + no-namebuf works  <-- works, so probably namebuf had padding to 4 bytes? 
no-uint64_t + hexd_buff=4094 + buf=1 + buf=1 + no-namebuf crash  <-- Let's increase for bytes to be higher than 4096, then we see it crashes and namebug had 4-byte padding.

no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash  <-- Let's remove the buf we can get rid of to simplify. We are under 4096 but still crashes (maybe it takes 4-byte boundaries?).
no-uint64_t + no-hexd_buff + no-buff + buf=4096 + no-namebuf works  <-- Getting rid of hexd_buf.... works!
no-uint64_t + no-hexd_buff + no-buff + buf=9000 + no-namebuf works  <-- Let's go higher than 4096... works! wtf
no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works  <-- Going back to have 2 variables, but most of the size is on the second one. Moving buf out of function still makes it work.
no-uint64_t + hexd_buff=4000 + no-buf + buf=9000 + no-namebuf works  <-- Adding 4000 to hexd_buf, still works.
no-uint64_t + hexd_buff=6000 + no-buf + buf=9000 + no-namebuf works <-- Adding 2000 to hexd_buf, still works!!! (moving buf inside the variable too, so being inside or outside doesn't seem to help there).
no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Going back to known to crash values. It crashes. So it seems there's no simple threshold at which point it crashes.
no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crah
no-uint64_t + hexd_buff=6000 + no-buf + buf=2100 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable still ends up in 2nd page).
no-uint64_t + hexd_buff=6000 + no-buf + buf=6296 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable now ends up in 4th page). Crash happens in different place! exit()->__call_tls_dtors().
no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=6296).
no-uint64_t + hexd_buff=4 + no-buf + buf=4092 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=4098).
no-uint64_t + hexd_buff=1 + no-buf + buf=4094 + no-namebuf works <-- So it works, contrary to the hexd_buff=4094 + buf=1, so order matters
no-uint64_t + hexd_buff=4094 + no-buf + buf=9000 + no-namebuf works
no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crahes 

#34 Updated by pespin 4 months ago

Submitted a couple related workaround patches to have it working for now:

remote: https://gerrit.osmocom.org/c/libosmocore/+/15024 utils: share static buffer in osmo_str_to{lower,upper}()
remote: https://gerrit.osmocom.org/c/libosmocore/+/15025 Workaround TLS gcc compiler bug in ARM

I tested they work fine (osmo-bsc doesn't crash upon startup like it used to, after applying the patch).

I will document better the 2nd one if needed. I still need to report the issue upstream (not sure if it really make sense other than for inforamtional purposes since it seems to be fixed on newer toolchains).

#36 Updated by laforge 4 months ago

"gcc-6 (Raspbian 6.5.0-1+rpi1+b1) 6.5.0 20181026" can reproduce the problem, while "gcc (Raspbian 8.3.0-6+rpi1) 8.3.0" cannot.

#37 Updated by laforge 4 months ago

"gcc-7 (Raspbian 7.3.0-19) 7.3.0" also cannot reproduce the problem. So it seems anything < 7.x for sure has the bug.

#38 Updated by laforge 4 months ago

"-mtls-dialect=gnu2" seems to be a viable way to avoid the segfault, as indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142#c12

I just built libosmocore using gcc-6.5 on raspbian 10 using "-O2 -g -mtls-dialect=gnu2" and linked main programs compiled with normal "-O2 -g" and the resulting programs worked as expected.

#39 Updated by laforge 4 months ago

So now the challenge is to find or write autoconf macros which detect the combination of 32bit arm and the compiler version < 7.1.x, and then tune the CFLAGS accordingly if that is the case.

For the gcc version, there's a suggestion in https://stackoverflow.com/questions/7067385/find-the-gcc-version.

#40 Updated by Hoernchen 4 months ago

The gnu2 aka TLSDESC model has been marked experimental for ~10 years in https://static.docs.arm.com/ihi0044/g/aaelf32.pdf and clang/lld does not support it except for aarch64 where it's the default - so being a rather arcane non-standard option it's hard to tell how well it works, but I guess we will find that out the same way we found this bug, by just using it.

#41 Updated by pespin 4 months ago

Patch to automatically workaround this bug (ARM32 and gcc < 7.3.0):
https://gerrit.osmocom.org/c/libosmocore/+/15037 configure: Autodetect TLS bug on ARM with old gcc and apply workaround

BTW, it seems sysmobts SDK's ld doesn't like the new CFLAG, it segfaults.

#42 Updated by pespin 4 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 50 to 90

toolchain we use for sysmobts and oc2g in OE crashes when using the -mtls-dialect=gnu2 CFLAG. However, the toolchain seems to produce correct binaries which don't crash at runtime.
So I added a new commit to libosmocore which allows disabling the automatic workaround we added:
https://gerrit.osmocom.org/c/libosmocore/+/15061 configure: Allow disabling workaround for TLS bug in old ARM gcc versions

I already submitted a patch against meta-telephony laforge/nightly to use it, and also another patch for osmo-gsm-tester to fix the build scripts using SDKs from that toolchain:
https://gerrit.osmocom.org/c/osmo-gsm-tester/+/15067 contrib: Avoid sysmobts/oc2g toolchain ld crashes building new libosmocore

Once those are merged and we cross-check that everything's fine, we can close the ticket.

#43 Updated by pespin 4 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

Everything's merged and included in release from this week, closing this ticket.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)