Bug #5523
SIGSEGV race condition destroying everything due to "reset_all_state"
Status: Closed
Done: 100%
Description
There seems to be a use-after-free when a tunnel is freed, apparently caused by a race condition between the "tun_device_thread" worker thread and the main thread.
It can be triggered by running PGW_Tests.TC_createSession_ping4_256. It doesn't always produce a backtrace, though, probably depending on the state of the memory chunk after it is freed. In general, osmo-uecups ends up in a strange hung state, which can be seen by running further tests afterwards: it never answers the "reset_all_state" request with "reset_all_state_res".
PGW_Tests.TC_createSession_ping4
20220411142037407 DTUN tun_device.c:391 ping251: Destroying
20220411142037407 DGT gtp_tunnel.c:136 ping252-Rd39cdfc4-T000003f2: Destroying
20220411142037407 DEP gtp_endpoint.c:229 172.18.18.20:2152: Release; new use_count=3
20220411142037408 DTUN tun_device.c:391 ping252: Destroying
20220411142037408 DGT gtp_tunnel.c:136 ping253-Rf59cc758-T000003f6: Destroying
20220411142037409 DEP gtp_endpoint.c:229 172.18.18.20:2152: Release; new use_count=2
20220411142037409 DTUN tun_device.c:391 ping253: Destroying
20220411142037409 DGT gtp_tunnel.c:136 ping254-R104d44b1-T000003fa: Destroying
20220411142037410 DEP gtp_endpoint.c:229 172.18.18.20:2152: Release; new use_count=1
[Thread 0x7f2e5c510700 (LWP 778) exited]
20220411142037467 DTUN tun_device.c:391 ping254: Destroying
20220411142037467 DGT gtp_tunnel.c:136 ping255-R24acc174-T000003fe: Destroying
20220411142037467 DEP gtp_endpoint.c:183 172.18.18.20:2152: Destroying
20220411142037467 DTUN tun_device.c:391 ping255: Destroying
[Thread 0x7f2e7b54e700 (LWP 592) exited]
[Thread 0x7f2e5bd0f700 (LWP 779) exited]

Thread 257 "osmo-uecups-dae" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2e5b50e700 (LWP 784)]
0x0000564ea59b2e19 in _gtp_tunnel_find_eua (tun=0x564ea5f670e0, sa=0x7f2e5b4fac20, proto=58 ':') at gtp_tunnel.c:125
125             llist_for_each_entry(t, &d->gtp_tunnels, list) {
#0  0x0000564ea59b2e19 in _gtp_tunnel_find_eua (tun=0x564ea5f670e0, sa=0x7f2e5b4fac20, proto=58 ':') at gtp_tunnel.c:125
#1  0x0000564ea59afe0e in tun_device_thread (arg=0x564ea5f670e0) at tun_device.c:166
#2  0x00007f2edb8cdea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f2edb692def in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) quit
A debugging session is active.
MAIN THREAD:
  cups_client_handle_json
  cups_client_handle_reset_all_state
    pthread_rwlock_wrlock
    foreach_tun { _gtp_tunnel_destroy }
    _gtp_endpoint_release
    _gtp_endpoint_destroy (ep->use_count is 0)
      pthread_cancel(ep->thread);
    _tun_device_release
    _tun_device_destroy (tun->use_count is 0)
      pthread_cancel(tun->thread);
      close(tun->fd);
    pthread_rwlock_unlock

Thread 257:
  tun_device_thread(data = tun)
    read(tun->fd)
    pthread_rwlock_rdlock
    _gtp_tunnel_find_eua(tun)   <-- accessing tun crashes
    pthread_rwlock_unlock
The problem here seems to be that the tun pointer passed to the worker thread is only partially protected by the rwlock: the read() happens outside the lock, so the main thread can free the tun object after read() returns but before the worker thread dereferences the pointer again.
This probably happens because pthread_cancel() does not stop the thread immediately: with the default deferred cancelability, the cancellation request only takes effect once the target thread reaches a cancellation point (typically a blocking syscall such as read()).