Project

General

Profile

Bug #3715

osmo-gsm-tester: Mainloop.wait() hangs forever in Mainloop.poll()

Added by pespin 16 days ago. Updated 12 days ago.

Status:
New
Priority:
Low
Assignee:
Target version:
-
Start date:
11/29/2018
Due date:
% Done:

0%

Spec Reference:

Description

It was spotted today that an osmo-gsm-tester run was running for more than a day. I actually checked that job the day before, and I could see that it was still exactly at same place the day after.

It was running following test: gprs:trx-umtrx+mod-bts0-dynts-ipa iperf3m4.py

osmo-gsm-tester extended (debug) log showed last lines:

05:33:54.235706 tst                    iperf3m4.py:8: waiting for modems to attach...
05:33:55.049684 tst                          /gobi_3: Already registered with network ('901', '70')
05:34:10.177197 tst                          /gobi_4: Already registered with network ('901', '70')
05:34:15.523556 tst                    iperf3m4.py:8: waiting for modems to attach to data services...

I attached to the python3 process and did a apply thread all bt:

Thread 3 (Thread 0x7f3f1dcfb700 (LWP 23377)):
#0  0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f3f1fe51d82 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f3f1f72f656 in ?? () from /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x00007f3f1fe793d5 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f3f23e1d494 in start_thread (arg=0x7f3f1dcfb700) at pthread_create.c:333
#6  0x00007f3f23010acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 2 (Thread 0x7f3f1e5cf700 (LWP 23376)):
#0  0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f3f1fe51b0c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f3f1fe51b51 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007f3f1fe793d5 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f3f23e1d494 in start_thread (arg=0x7f3f1e5cf700) at pthread_create.c:333
#6  0x00007f3f23010acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7f3f24237700 (LWP 23375)):
#0  0x00007f3f2300767d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3f1fe519f6 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f3f1fe51b0c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f3f1fc04038 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#4  0x00007f3f1fc03a9a in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#5  0x00007f3f207cf9cc in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so
#6  0x00007f3f207d1478 in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so
#7  0x00007f3f207c5209 in ?? () from /usr/lib/python3/dist-packages/gi/_gi.cpython-35m-x86_64-linux-gnu.so
#8  0x000055e932850647 in PyObject_Call ()
#9  0x000055e9327c7ee1 in PyEval_EvalFrameEx ()
#10 0x000055e9327cc7bf in ?? ()
#11 0x000055e9327c80a9 in PyEval_EvalFrameEx ()
#12 0x000055e9327cc286 in ?? ()
#13 0x000055e9327c80a9 in PyEval_EvalFrameEx ()
#14 0x000055e9327c793f in PyEval_EvalFrameEx ()
#15 0x000055e9327cc7bf in ?? ()
#16 0x000055e9327c84c9 in PyEval_EvalFrameEx ()
#17 0x000055e9327cc7bf in ?? ()
#18 0x000055e9327c84c9 in PyEval_EvalFrameEx ()
#19 0x000055e9327c793f in PyEval_EvalFrameEx ()
#20 0x000055e9327c793f in PyEval_EvalFrameEx ()
#21 0x000055e9327cc286 in ?? ()
#22 0x000055e9327c80a9 in PyEval_EvalFrameEx ()
#23 0x000055e9327cc286 in ?? ()
---Type <return> to continue, or q <return> to quit---
#24 0x000055e9327c80a9 in PyEval_EvalFrameEx ()
#25 0x000055e9327c793f in PyEval_EvalFrameEx ()
#26 0x000055e9327cc286 in ?? ()
#27 0x000055e9327ccf9f in PyEval_EvalCode ()
#28 0x000055e93289a8f2 in ?? ()
#29 0x000055e93289ce1d in PyRun_FileExFlags ()
#30 0x000055e93289d5be in PyRun_SimpleFileExFlags ()
#31 0x000055e9328cb4d7 in Py_Main ()
#32 0x000055e93275bc01 in main ()
(gdb)

After I dettached the process from gdb, it continued and log showed following error:

05:34:15.573785 tst                    iperf3m4.py:8: waiting for modems to attach to data services...  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8]  [testlib.py:92]
05:34:16.690121 tst                          /gobi_1: DBG: attached: True  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_1]  [modem.py:589]
05:34:16.695481 tst                          /gobi_1: DBG: activate_context {apn='inet46', protocol='ip', user='ogt'}  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_1]  [modem.py:609]
12:34:56.809299 tst                          /gobi_4: DBG: 'org.ofono.ConnectionManager'.PropertyChanged() -> Attached=True  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪/gobi_4]  [modem.py:759]
12:39:57.157456 tst                    iperf3m4.py:8: ERR: Error: Wait timeout {condition=<function Modem.activate_context.<locals>.<lambda> at 0x7f3f1c3a8048>, timeout=300, timestep=1}  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8]  [event_loop.py:111: raise log.Error('Wait timeout', condition=condition, timeout=timeout, timestep=timestep)]
12:39:57.168774 tst                    iperf3m4.py:8: TRACEBACK: Traceback (most recent call last):
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/test.py", line 61, in run
    self.path)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/util.py", line 351, in run_python_file
    spec.loader.exec_module( importlib.util.module_from_spec(spec) )
  File "<frozen importlib._bootstrap_external>", line 673, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/suites/gprs/iperf3m4.py", line 8, in <module>
    setup_run_iperf3_test_parallel(4)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/suites/gprs/lib/testlib.py", line 96, in setup_run_iperf3_test_parallel
    ctx_id_v4 = ms.activate_context(apn='inet46', protocol=ms.CTX_PROT_IPv4)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 622, in activate_context
    MainLoop.wait(self, lambda: ctx.GetProperties()['Active'] == True)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/event_loop.py", line 111, in wait
    raise log.Error('Wait timeout', condition=condition, timeout=timeout, timestep=timestep)
osmo_gsm_tester.log.Error: Wait timeout {condition=<function Modem.activate_context.<locals>.<lambda> at 0x7f3f1c3a8048>, timeout=300, timestep=1}
  [trial-1680↪gprs:trx-umtrx+mod-bts0-dynts-ipa↪iperf3m4.py:8]  [test.py:104]

So it seems it was somehow blocked forever in Mainloop.poll() until probably detaching the process made self.gctx.iteration(may_block=True) (and poll() behind it) return an -EAGAIN or similar stuff. I'm not sure how can that happen since we add a timeout immediatelly before polling:

        wait_req = WaitRequest(condition, condition_args, condition_kwargs, timeout, timestep)
        wait_id = GObject.timeout_add(timestep*1000, self._trigger_cb_func, wait_req.condition_check)

_trigger_cb_func always returns True so it is always rearmed, and as a consequence either condition_ack or timeout_ack will become True. Only conditions I can think of:
  • self.gctx.iteration(may_block=True) doesn't return sometimes for whataver reason after firing the timeout.
  • Time returned by time.time() stops working sanely.
  • Some bug in glib.

History

#1 Updated by pespin 12 days ago

  • Priority changed from Normal to Low

Moving to low since it was only spotted once and thus not usually happening.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)