Bug #4062
vty tests fails on arm (raspberry pi)
100%
Description
The debian packages for raspbian fails on OBS because of a vty test failure.
Attached the debian source package.
https://build.opensuse.org/package/show/network:osmocom:nightly/libosmocore
Related issues
History
#1 Updated by laforge over 1 year ago
- Assignee set to laforge
according to the logs, there's actually a segfault during vty test execution, which is quite troubling.
#2 Updated by laforge over 1 year ago
- Status changed from New to In Progress
- % Done changed from 0 to 30
I can reproduce the problem on a Raspi 3 with raspbian: vty_test segfaults. gdb doesn't show a backtrace but indicates the crash is in 'memcmp'
When compiling with "-g -O0", the test passes. Will try to debug further.
#3 Updated by laforge over 1 year ago
It's not the optimization, but it's the "-g" which makes the problem vanish. weird.
strace shows:
mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f3f000 set_tls(0x76f3f4c0, 0x76f41e68, 0x76f4e050, 0x76f3f4c0, 0x76f4e050) = 0 mprotect(0x76e84000, 8192, PROT_READ) = 0 mprotect(0x76d49000, 4096, PROT_READ) = 0 mprotect(0x76ed6000, 4096, PROT_READ) = 0 mprotect(0x76f05000, 4096, PROT_READ) = 0 mprotect(0x76eb3000, 4096, PROT_READ) = 0 mprotect(0x76f07000, 20480, PROT_READ|PROT_WRITE) = 0 mprotect(0x76f07000, 20480, PROT_READ|PROT_EXEC) = 0 cacheflush(0x76f07000, 0x76f0c000, 0, 0x15, 0) = 0 mprotect(0x76f1b000, 4096, PROT_READ) = 0 mprotect(0x23000, 4096, PROT_READ) = 0 mprotect(0x76f4d000, 4096, PROT_READ) = 0 munmap(0x76f43000, 30037) = 0 brk(NULL) = 0x1f9f000 brk(0x1fc0000) = 0x1fc0000 getcwd("/home/pi/osmo/libosmocore/tests", 4096) = 32 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x76f43618} --- +++ killed by SIGSEGV +++ Segmentation fault
#4 Updated by laforge over 1 year ago
- File valgrind-vty_test.txt valgrind-vty_test.txt added
valgrind also makes the program work, but shows an impressive list of problems during startup of the program (See attachment)
#5 Updated by laforge over 1 year ago
stepping through the program in gdb points at gen_logging_level_cmd_strs(), and within that add_category_strings() and
osmo_str_tolower (src=0x76fa508d "LGLOBAL") at utils.c:912 912 osmo_str_tolower_buf(buf, sizeof(buf), src); (gdb) 910 { (gdb) 912 osmo_str_tolower_buf(buf, sizeof(buf), src); (gdb) osmo_str_tolower_buf (dest=0x76de9618 "-linux-armhf.so.3", dest_len=128, src=0x76fa508d "LGLOBAL") at utils.c:886
for sure we don't want ot write to somewhere that holds "-linux-armhf.so.3" ?!?
#7 Updated by laforge over 1 year ago
git bisect states 171ef826e1489031bc48745f29fa2d4657bf165f is the culprit. This is what introduces thread-local storage to libosmocore static buffers. But why would those be different on raspbian?
#8 Updated by laforge over 1 year ago
so the test executes a chain of functions ending with osmo_str_tolower_buf(), which wants to use a thread-local static buffer for string lower case conversion and that fails. but why?
#9 Updated by laforge over 1 year ago
It's a mystery to me, why the __thread
annotation for thread-local storage would fail to work on raspbian, but work with Debian on the same hardware, and also work with any other of the distributions/versions that we're working with.
It's unlikely that we're the first program on the planet wanting to use thread-local storage, either?
Also fascinating, when trying to use gdb:
(gdb) p buf Cannot find thread-local storage for process 25857, shared library /home/pi/osmo/libosmocore/src/.libs/libosmocore.so.12: Cannot find thread-local variables on this target
#10 Updated by laforge over 1 year ago
- Status changed from In Progress to Stalled
I'm giving up on this. It's ridiculous that something as basic as __thread is failing on something considered a "stable" distribution.
#11 Updated by laforge over 1 year ago
- the bug is gone when building with clang instead of gcc
- vty_test passes
- valgrind doesn't complain at all anymore
#12 Updated by laforge over 1 year ago
getting back to this with some more information:
- manually trying to reproduce this with a small binary and a small shared library outside of libosmocore failed for some reason
- libosmocore contains the expected .tbss ELF section with "ALLOC, THREAD_LOCAL" flags
#13 Updated by laforge over 1 year ago
for the clang-built libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ objdump -h ./libosmocore.so.12 ./libosmocore.so.12: file format elf32-littlearm Sections: Idx Name Size VMA LMA File off Algn ... 16 .tbss 000022c8 00032b40 00032b40 00022b40 2**3 ALLOC, THREAD_LOCAL <pre> for the gcc-built one: <pre> 15 .tbss 000022c8 0002db74 0002db74 0001db74 2**2 ALLOC, THREAD_LOCAL </pre>
#14 Updated by laforge over 1 year ago
#15 Updated by laforge over 1 year ago
#16 Updated by laforge over 1 year ago
[earlier comments about which combination worked or didn't work were wrong].
The following steps result in a working binary:- building the main program with gcc against a gcc-built libosmocore
- later swapping libosmocore.so.12 against a clang-built one
#17 Updated by laforge over 1 year ago
Adding the "objdump -d -S" output for the relevant function for both clang and gcc:
clang:
const char *osmo_str_tolower(const char *src) { 17ec: e92d4830 push {r4, r5, fp, lr} 17f0: e28db008 add fp, sp, #8 17f4: e1a04000 mov r4, r0 static __thread char buf[128]; printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 17f8: e59f0044 ldr r0, [pc, #68] ; 1844 <osmo_str_tolower+0x58> 17fc: e08f0000 add r0, pc, r0 1800: ebfffffe bl 0 <__tls_get_addr> 1804: e1a05000 mov r5, r0 1808: e59f0038 ldr r0, [pc, #56] ; 1848 <osmo_str_tolower+0x5c> 180c: e1a01005 mov r1, r5 1810: e3a02080 mov r2, #128 ; 0x80 1814: e08f0000 add r0, pc, r0 1818: ebfffffe bl 0 <printf> memset(buf, 0, sizeof(buf)); 181c: e1a00005 mov r0, r5 1820: e3a01000 mov r1, #0 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1824: e3a02080 mov r2, #128 ; 0x80 memset(buf, 0, sizeof(buf)); 1828: ebfffffe bl 0 <memset> osmo_str_tolower_buf(buf, sizeof(buf), src); 182c: e1a00005 mov r0, r5 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1830: e3a01080 mov r1, #128 ; 0x80 osmo_str_tolower_buf(buf, sizeof(buf), src); 1834: e1a02004 mov r2, r4 1838: ebfffffe bl 1698 <osmo_str_tolower_buf> return buf; 183c: e1a00005 mov r0, r5 1840: e8bd8830 pop {r4, r5, fp, pc} 1844: 00000040 .word 0x00000040 1848: 0000002c .word 0x0000002c
gcc:
const char *osmo_str_tolower(const char *src) { static __thread char buf[128]; printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 152c: e59f3054 ldr r3, [pc, #84] ; 1588 <osmo_str_tolower+0x5c> { 1530: e92d4070 push {r4, r5, r6, lr} 1534: e1a06000 mov r6, r0 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 1538: e08f0003 add r0, pc, r3 153c: ebfffffe bl 0 <__tls_get_addr> 1540: e59f4044 ldr r4, [pc, #68] ; 158c <osmo_str_tolower+0x60> 1544: e3a02080 mov r2, #128 ; 0x80 memset(buf, 0, sizeof(buf)); 1548: e1a05002 mov r5, r2 printf("buf=%p, sizeof(buf)=%u\n", buf, sizeof(buf)); 154c: e0804004 add r4, r0, r4 1550: e59f0038 ldr r0, [pc, #56] ; 1590 <osmo_str_tolower+0x64> 1554: e1a01004 mov r1, r4 1558: e08f0000 add r0, pc, r0 155c: ebfffffe bl 0 <printf> memset(buf, 0, sizeof(buf)); 1560: e1a02005 mov r2, r5 1564: e1a00004 mov r0, r4 1568: e3a01000 mov r1, #0 156c: ebfffffe bl 0 <memset> osmo_str_tolower_buf(buf, sizeof(buf), src); 1570: e1a02006 mov r2, r6 1574: e1a01005 mov r1, r5 1578: e1a00004 mov r0, r4 157c: ebfffffe bl 1400 <osmo_str_tolower_buf> return buf; }
#18 Updated by laforge over 1 year ago
building libosmocore with gcc-5 doesn't change the behavior; it fails just like with gcc-6.
#19 Updated by laforge over 1 year ago
- File tls-demonstrator.tar.bz2 tls-demonstrator.tar.bz2 added
attaching my small "demonstrator" showing that tls works with a self-built shared library, but fails against libosocore.
#20 Updated by laforge over 1 year ago
one difference is the the 'struct tls_index' offset when comparing the gcc vs. clang version:
working clang:
trying to use TLS in own library Breakpoint 2, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831 831 dl-tls.c: No such file or directory. (gdb) p *ti $1 = {ti_module = 2, ti_offset = 0} (gdb) cont Continuing. tls_foo: buf=0x76e436d0 bar1='hello shlib' trying to use TLS in libosmocore Breakpoint 2, __GI___tls_get_addr (ti=0x23080) at dl-tls.c:831 831 in dl-tls.c (gdb) p *ti $2 = {ti_module = 1, ti_offset = 4360} (gdb) cont Continuing. buf=0x76e435d0, sizeof(buf)=128 osmo_str_tolower_buf(0x76e435d0, 128, 0x12a4c) i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9 bar2='hello osmo' [Inferior 1 (process 19031) exited normally]
non-working gcc:
trying to use TLS in own library Breakpoint 1, __GI___tls_get_addr (ti=0x76fb7024) at dl-tls.c:831 831 dl-tls.c: No such file or directory. (gdb) p *ti $1 = {ti_module = 2, ti_offset = 0} (gdb) cont Continuing. tls_foo: buf=0x76e436d0 bar1='hello shlib' trying to use TLS in libosmocore Breakpoint 1, __GI___tls_get_addr (ti=0x23064) at dl-tls.c:831 831 in dl-tls.c (gdb) p *ti $2 = {ti_module = 1, ti_offset = 0} (gdb) cont Continuing. buf=0x76e455c8, sizeof(buf)=128 Program received signal SIGSEGV, Segmentation fault.
The binaries used in bothe examples above use static linking of libosmocore/src/.libs/utils.o to exclude any influence from other parts of libosmocore.
#21 Updated by laforge over 1 year ago
further differences:
non-working gcc compiled libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-gcc/src/.libs $ readelf -r ./libosmocore.so | grep TLS 0002e480 00000011 R_ARM_TLS_DTPMOD3
working clang compiled libosmocore:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./libosmocore.so | grep TLS 0003334c 00000011 R_ARM_TLS_DTPMOD3 00033354 00000011 R_ARM_TLS_DTPMOD3 0003335c 00000011 R_ARM_TLS_DTPMOD3 00033364 00000011 R_ARM_TLS_DTPMOD3 0003336c 00000011 R_ARM_TLS_DTPMOD3 00033374 00000011 R_ARM_TLS_DTPMOD3 0003337c 00000011 R_ARM_TLS_DTPMOD3 00033384 00000011 R_ARM_TLS_DTPMOD3 0003338c 00000011 R_ARM_TLS_DTPMOD3 00033394 00000011 R_ARM_TLS_DTPMOD3 0003339c 00000011 R_ARM_TLS_DTPMOD3 000333a4 00000011 R_ARM_TLS_DTPMOD3
Those entries each correspond to one 'struct tls_index', and I'm somewhat surprised that the count differs from gcc to clang ?!?
#22 Updated by laforge over 1 year ago
specifically, looking at utils.o:
non-working gcc object:
0000009c 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000000a4 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 0000054c 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 00000550 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 000005ac 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000005b4 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00000674 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 0000067c 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 000008e8 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000008ec 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000008f0 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000010f8 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 000010fc 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00001304 00005a69 R_ARM_TLS_LDM32 00000000 .LANCHOR0 00001308 00005a6a R_ARM_TLS_LDO32 00000000 .LANCHOR0 00001588 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 0000158c 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000016a0 00005b69 R_ARM_TLS_LDM32 00001ff8 .LANCHOR2 000016a4 00005b6a R_ARM_TLS_LDO32 00001ff8 .LANCHOR2 000003cf 00005c6a R_ARM_TLS_LDO32 00000000 namebuf 000003f2 00005d6a R_ARM_TLS_LDO32 00000100 hexd_buff 00000663 0000606a R_ARM_TLS_LDO32 00001188 buf.7531 00000840 00005f6a R_ARM_TLS_LDO32 00001108 buf.7509 00001597 00005e6a R_ARM_TLS_LDO32 00001100 buf.7331 00001bcc 00005e6a R_ARM_TLS_LDO32 00001100 buf.7331
working clang object:
pi@rpi3-bplus:~/osmo/libosmocore-clang/src/.libs $ readelf -r ./utils.o | grep TLS 0000008c 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 0000059c 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 000005a0 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 000005a4 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000680 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000684 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000690 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 0000081c 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000820 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000828 0000fa68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ab4 0000fc68 R_ARM_TLS_GD32 00001100 osmo_encode_big_endian 00000ab8 0000fc68 R_ARM_TLS_GD32 00001100 osmo_encode_big_endian 00001374 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 0000158c 0000fb68 R_ARM_TLS_GD32 00000000 namebuf 00001844 0000fe68 R_ARM_TLS_GD32 00001108 osmo_str_tolower.buf 00001a5c 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a60 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a64 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a68 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a6c 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a70 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 00001a74 0000ff68 R_ARM_TLS_GD32 00001188 osmo_str_toupper.buf 0000004a 0000fc6a R_ARM_TLS_LDO32 00001100 osmo_encode_big_endian 000000d5 0000fe6a R_ARM_TLS_LDO32 00001108 osmo_str_tolower.buf 00000122 0000ff6a R_ARM_TLS_LDO32 00001188 osmo_str_toupper.buf 000001c8 0000fb6a R_ARM_TLS_LDO32 00000000 namebuf 000001e6 0000fa6a R_ARM_TLS_LDO32 00000100 hexd_buff
and finally, from a working gcc object compiled with -O0
instead of -O2
:
readelf -r ./utils.o | grep TLS 00000138 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000140 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000144 00000968 R_ARM_TLS_GD32 00000000 namebuf 00000a4c 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ac0 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000ac8 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000bdc 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000be4 00000e68 R_ARM_TLS_GD32 00000100 hexd_buff 00000f80 00001c68 R_ARM_TLS_GD32 00001100 buf.7143 00000f84 00001c68 R_ARM_TLS_GD32 00001100 buf.7143 00002328 00000968 R_ARM_TLS_GD32 00000000 namebuf 00002824 00000968 R_ARM_TLS_GD32 00000000 namebuf 00002c3c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c44 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c48 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002e04 00003e68 R_ARM_TLS_GD32 00001188 buf.7337 00002e08 00003e68 R_ARM_TLS_GD32 00001188 buf.7337 000003cf 0000096a R_ARM_TLS_LDO32 00000000 namebuf 000003f2 00000e6a R_ARM_TLS_LDO32 00000100 hexd_buff 000005dc 00003e6a R_ARM_TLS_LDO32 00001188 buf.7337 000006d6 0000386a R_ARM_TLS_LDO32 00001108 buf.7318 00000ff5 00001c6a R_ARM_TLS_LDO32 00001100 buf.7143
We can judge by the size that only the lines with 1108 as size are relevant for osmo_str_tolower()
which gives us the following summary:
working gcc-O0:
00002c3c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c44 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c48 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 00002c4c 00003868 R_ARM_TLS_GD32 00001108 buf.7318 000006d6 0000386a R_ARM_TLS_LDO32 00001108 buf.7318
non-working gcc-O2:
00000840 00005f6a R_ARM_TLS_LDO32 00001108 buf.7509
So somehow, in the non-working case there is no R_ARM_TLS_GD32 entry at all. I don't know what it means but it's for sure a noticable difference.
#23 Updated by laforge over 1 year ago
According to https://static.docs.arm.com/ihi0044/e/IHI0044E_aaelf.pdf
R_ARM_TLS_GD32 causes two adjacent entries to be added to the dynamically relocated section (the Global
Offset Table, or GOT). The first of these is dynamically relocated by R_ARM_TLS_DTPMOD32, the second by
R_ARM_TLS_DTPOFF32. The place resolves to the offset of the first of the GOT entries from the place.
this sounds a lot like what's required with tls_index (ti_module + ti_offset).
R_ARM_TLS_LDO32 resolves to the offset of the referenced data object (which must be local to the module) fromthe origin of the TLS block for the current module.
#24 Updated by laforge over 1 year ago
gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516 also doesn't generate the GD32 relocations when "-O1" is enabled. So -O0 is the only option to generate working code with gcc.
This is 2019, TLS existed since late 2002...
#25 Updated by laforge over 1 year ago
- Status changed from Stalled to In Progress
- % Done changed from 30 to 50
Ok, so the relocations might have been a dead end. I now have a relatively small, self-contained reproducer (see attached tls-reproducer2.tar.bz2). It has no external dependencies, not even libosmocore or talloc.
If you compile this with -O2 on raspbian9 (likely also other 32bit little-endian ARM architectures), it will segfault.
- if you change to -O0, the resulting binary will work.
- if you shrink hexd_buff from 4096 to 4092 bytes or lower, the resulting binary will work
- if you build with clang, the resulting binary will work.
- if you build without -fPIC, the resulting binary will work.
WTF is going on here?
#26 Updated by laforge over 1 year ago
- File tls-reproducer2.tar.bz2 tls-reproducer2.tar.bz2 added
#27 Updated by laforge over 1 year ago
- Assignee changed from laforge to pespin
assigning to pespin, please report upstream to gcc and/or analyze further.
#28 Updated by laforge over 1 year ago
might still be relocation related:
non-working gcc-O2 binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 buf.6264 00000337 R_ARM_TLS_LDO32 hexd_buff 00000397 R_ARM_TLS_LDO32 buf.6264
working gcc-O0 binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 buf.6111 0000027c R_ARM_TLS_GD32 hexd_buff 00000284 R_ARM_TLS_GD32 hexd_buff 00000558 R_ARM_TLS_GD32 buf.6111 00000560 R_ARM_TLS_GD32 buf.6111 00000564 R_ARM_TLS_GD32 buf.6111 00000568 R_ARM_TLS_GD32 buf.6111
working clang binary:
pi@rpi3-bplus:~/tls $ objdump -x utils.o | grep buf 00000000 l .tbss 00001000 hexd_buff 00001000 l .tbss 00000080 osmo_str_tolower.buf 00000200 R_ARM_TLS_GD32 hexd_buff 00000204 R_ARM_TLS_GD32 hexd_buff 00000210 R_ARM_TLS_GD32 hexd_buff 00000428 R_ARM_TLS_GD32 osmo_str_tolower.buf 00000048 R_ARM_TLS_LDO32 osmo_str_tolower.buf 00000084 R_ARM_TLS_LDO32 hexd_buff
#29 Updated by laforge over 1 year ago
in terms of potential ugly interim workarounds, we could think about shrinking the size of our large __thread buffers. There's only two: the 4096 byte hexdump buffer in utils.c, and the 4100 byte buffer that's part of msgb_hexdump() in msgb.c. The point is without understanding the details of this bug: How can we be sure we shrink them sufficiently?
#30 Updated by pespin over 1 year ago
- Assignee changed from pespin to laforge
- % Done changed from 50 to 30
I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.
#31 Updated by laforge over 1 year ago
Hi Pau,
On Wed, Jul 31, 2019 at 10:44:56PM +0000, pespin [REDMINE] wrote:
Assignee changed from pespin to laforge
% Done changed from 50 to 30
were those two intentional changes?
I think it makes sense for now to change both buffers to 4092 for now as a quick workaround, and later on find a better solution once we report the bug and receive more feedback from gcc guys.
I'm not sure that 4092 will do it for msgb, this still needs to be validated. I played a bit
around and it seems there is a "global" maximum, most likely per source code file.
Regards,
Harald
#32 Updated by pespin over 1 year ago
- Assignee changed from laforge to pespin
- % Done changed from 30 to 50
Hi Harald, no they were no intentional, I have no idea how those ended up being changed.
#33 Updated by pespin over 1 year ago
Checking at osmo-bsc runtime crash from same issue (osmo_str_to_lower()), here's some conclusions (and tests below):
I couldn't find the exact culprit of the issue, but some hints on requirements for the issue to happen:- It seems the file must have more than 1 TLS variable for bug to appear (so compiler applies some optimizations in that case, which are not worth if there's less than 2 variables).
- TLS variable being inside or outside of function doesn't seem to make any difference to me (other than maybe ordering in memory, which I don't know).
- Behavior seems to be OK as long as all variables fit inside the first mem page (4096 bytes).
- Problems start when sum of TLS variables starts getting bigger than 1st mem page aprox, so to me it looks like issues with mem page boundaries.
- It seems order of variables matter. For instance having only "hexd_buff=1 && buf=4094" is fine, but having "hexd_buff=4094 && buf=1" crashes.
- Bug doesn't appear starting from a single threshold size. For instance "hexd_buff=4094 && buf=9000" is fine, but "hexd_buff=4094 && buf=1" crashes. So to me it's either memory page bounded related or there's some range of sizes affected somehow.
Let's start with changing only value for hexd_buf: 4096 crash 4092 crash 3900 crash 3850 crash 3830 crash 3829 crash <--- first value crashing, used next to see if it still can work by droppng size of other buffers (to verify relations). ---------- 3828 works 3824 works 3820 works 3810 works 3800 works 3000 works 2000 works Let's remove the uint64_t __thread variable (removing __thread attr), that's 8 bytes less: no-uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3836 + buf=128 + buf=128 + namebuf=255 works no-uint64_t + hexd_buff=3837 + buf=128 + buf=128 + namebuf=255 crash <-- so in this case it's really the 8 bytes we removed, they can be passed from one var to another (so they are related). relation 1-1 bytes. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=255 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=254 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=253 crash uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=252 works <-- so here it theoretically should be 1-3 bytes relation with hexd_buff and namebuf. uint64_t + hexd_buff=3830 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept uint64_t + hexd_buff=3831 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept. uint64_t + hexd_buff=3832 + buf=128 + buf=128 + namebuf=252 works <-- let's increase the big buffer by 1, to see if relation is kept. So it seems the initial 1-3 relation might be related to padding between 2 buffers (to be 4-byte/word aligned?). uint64_t + hexd_buff=3833 + buf=128 + buf=128 + namebuf=252 crash <-- Let's increase by 1 again, we break the 4-byte alignemnt so total space increases and we cross the file size "threshlold" line again. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=250 works <-- here we continue dropping several buffers to check everything works fine. uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=240 works uint64_t + hexd_buff=3829 + buf=128 + buf=128 + namebuf=100 works uint64_t + hexd_buff=3833 + buf=32 + buf=128 + namebuf=252 ? <-- Crashing situation, but checking if 1nd buf affects it: it doesn't uint64_t + hexd_buff=3833 + buf=128 + buf=32 + namebuf=252 crash <-- Crashing situation, but checking if 2nd buf affects it: it doesn't uint64_t + hexd_buff=3833 + buf=32 + buf=32 + namebuf=252 crash <-- Crashing situation, but checking if both buf affects it: they don't uint64_t + hexd_buff=3833 + buf=1 + buf=1 + namebuf=252 crash <-- Crashing situation, but checking if both buf affects it: they don't uint64_t + hexd_buff=3833 + no-buf + no-buf + namebuf=252 works <-- Crashing situation, but cremoving __thread from both buf affects it: it does!!! Let's try moving stuff around: no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=255 crash <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 255(+1) + 4 + 4 = 4101?) no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=252 works <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 252 + 4 + 4 = 4097?) no-uint64_t + hexd_buff=3837 + buf=1 + buf=1 + namebuf=254 crash <-- Crashing situation from above, modified (sum with 4-byte padding: (3837 + 254 + 4 + 4 = 4099?) no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=254 works <-- Crashing situation from above, modified (sum with 4-byte padding: (3836 + 254 + 4 + 4 = 4098?) Let's try moving stuff around (to be 4-bytes without padding): no-uint64_t + hexd_buff=3836 + buf=1 + buf=1 + namebuf=256 works <-- (sum with 4-byte padding: (3836 + 256 + 2 + 2 = 4096) no-uint64_t + hexd_buff=3840 + buf=1 + buf=1 + namebuf=252 works <-- Let's move 4 bytes from namebuf to hexd_buff (sum with 4-byte padding: (3840 + 252 + 2 + 2 = 4096) no-uint64_t + hexd_buff=3844 + buf=1 + buf=1 + namebuf=252 crash <-- Let's add 4 bytes to hexd_buff (sum with 4-byte padding: (3844 + 252 + 2 + 2 = 4100) no-uint64_t + hexd_buff=4088 + buf=1 + buf=1 + namebuf=4 works <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4088 +2 +2 + 4 = 4096) no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=2 crash <-- Let's go back to working situation, but moving most bytes to hexd_buff (sum with 4-byte padding: (4090 + 2 +2 + 2 = 4096). no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + namebuf=1 crash <-- Let's go back to working situation, but moving most bytes to hexd_buff, seems still padding issues?. no-uint64_t + hexd_buff=4090 + buf=1 + buf=1 + no-namebuf works <-- works, so probably namebuf had padding to 4 bytes? no-uint64_t + hexd_buff=4094 + buf=1 + buf=1 + no-namebuf crash <-- Let's increase for bytes to be higher than 4096, then we see it crashes and namebug had 4-byte padding. no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Let's remove the buf we can get rid of to simplify. We are under 4096 but still crashes (maybe it takes 4-byte boundaries?). no-uint64_t + no-hexd_buff + no-buff + buf=4096 + no-namebuf works <-- Getting rid of hexd_buf.... works! no-uint64_t + no-hexd_buff + no-buff + buf=9000 + no-namebuf works <-- Let's go higher than 4096... works! wtf no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Going back to have 2 variables, but most of the size is on the second one. Moving buf out of function still makes it work. no-uint64_t + hexd_buff=4000 + no-buf + buf=9000 + no-namebuf works <-- Adding 4000 to hexd_buf, still works. no-uint64_t + hexd_buff=6000 + no-buf + buf=9000 + no-namebuf works <-- Adding 2000 to hexd_buf, still works!!! (moving buf inside the variable too, so being inside or outside doesn't seem to help there). no-uint64_t + hexd_buff=4094 + no-buf + buf=1 + no-namebuf crash <-- Going back to known to crash values. It crashes. So it seems there's no simple threshold at which point it crashes. no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crah no-uint64_t + hexd_buff=6000 + no-buf + buf=2100 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable still ends up in 2nd page). no-uint64_t + hexd_buff=6000 + no-buf + buf=6296 + no-namebuf crash <-- Let's increase hexd_buff to previous working condition. Still crashes. (2nd variable now ends up in 4th page). Crash happens in different place! exit()->__call_tls_dtors(). no-uint64_t + hexd_buff=4 + no-buf + buf=9000 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=6296). no-uint64_t + hexd_buff=4 + no-buf + buf=4092 + no-namebuf works <-- Let's drop most of size from first var. works! (it also works with buf=4098). no-uint64_t + hexd_buff=1 + no-buf + buf=4094 + no-namebuf works <-- So it works, contrary to the hexd_buff=4094 + buf=1, so order matters no-uint64_t + hexd_buff=4094 + no-buf + buf=9000 + no-namebuf works no-uint64_t + hexd_buff=4094 + no-buf + buf=4 + no-namebuf crahes
#34 Updated by pespin over 1 year ago
Submitted a couple related workaround patches to have it working for now:
remote: https://gerrit.osmocom.org/c/libosmocore/+/15024 utils: share static buffer in osmo_str_to{lower,upper}()
remote: https://gerrit.osmocom.org/c/libosmocore/+/15025 Workaround TLS gcc compiler bug in ARM
I tested they work fine (osmo-bsc doesn't crash upon startup like it used to, after applying the patch).
I will document better the 2nd one if needed. I still need to report the issue upstream (not sure if it really make sense other than for inforamtional purposes since it seems to be fixed on newer toolchains).
#35 Updated by Hoernchen over 1 year ago
This looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142
#36 Updated by laforge over 1 year ago
"gcc-6 (Raspbian 6.5.0-1+rpi1+b1) 6.5.0 20181026" can reproduce the problem, while "gcc (Raspbian 8.3.0-6+rpi1) 8.3.0" cannot.
#37 Updated by laforge over 1 year ago
"gcc-7 (Raspbian 7.3.0-19) 7.3.0" also cannot reproduce the problem. So it seems anything < 7.x for sure has the bug.
#38 Updated by laforge over 1 year ago
"-mtls-dialect=gnu2" seems to be a viable way to avoid the segfault, as indicated in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81142#c12
I just built libosmocore using gcc-6.5 on raspbian 10 using "-O2 -g -mtls-dialect=gnu2" and linked main programs compiled with normal "-O2 -g" and the resulting programs worked as expected.
#39 Updated by laforge over 1 year ago
So now the challenge is to find or write autoconf macros which detect the combination of 32bit arm and the compiler version < 7.1.x, and then tune the CFLAGS accordingly if that is the case.
For the gcc version, there's a suggestion in https://stackoverflow.com/questions/7067385/find-the-gcc-version.
#40 Updated by Hoernchen over 1 year ago
The gnu2 aka TLSDESC model has been marked experimental for ~10 years in https://static.docs.arm.com/ihi0044/g/aaelf32.pdf and clang/lld does not support it except for aarch64 where it's the default - so being a rather arcane non-standard option it's hard to tell how well it works, but I guess we will find that out the same way we found this bug, by just using it.
#41 Updated by pespin over 1 year ago
Patch to automatically workaround this bug (ARM32 and gcc < 7.3.0):
https://gerrit.osmocom.org/c/libosmocore/+/15037 configure: Autodetect TLS bug on ARM with old gcc and apply workaround
BTW, it seems sysmobts SDK's ld doesn't like the new CFLAG, it segfaults.
#42 Updated by pespin over 1 year ago
- Status changed from In Progress to Feedback
- % Done changed from 50 to 90
toolchain we use for sysmobts and oc2g in OE crashes when using the -mtls-dialect=gnu2 CFLAG. However, the toolchain seems to produce correct binaries which don't crash at runtime.
So I added a new commit to libosmocore which allows disabling the automatic workaround we added:
https://gerrit.osmocom.org/c/libosmocore/+/15061 configure: Allow disabling workaround for TLS bug in old ARM gcc versions
I already submitted a patch against meta-telephony laforge/nightly to use it, and also another patch for osmo-gsm-tester to fix the build scripts using SDKs from that toolchain:
https://gerrit.osmocom.org/c/osmo-gsm-tester/+/15067 contrib: Avoid sysmobts/oc2g toolchain ld crashes building new libosmocore
Once those are merged and we cross-check that everything's fine, we can close the ticket.
#43 Updated by pespin over 1 year ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100
Everything's merged and included in release from this week, closing this ticket.