Bug #3062
closedosmo-msc crash while running osmo-gsm-tester voice:nanobts
100%
Description
dmesg:
[3019686.470628] osmo-msc[5749]: segfault at 10 ip 00007fb7d25be03f sp 00007ffcf5f136e8 error 4 in libosmocore.so.9.0.0[7fb7d25aa000+22000]
Attached crash report with core file and binary used.
Content of osmo-msc_git_hashes.txt, containing git revision for each component in the rootfs:
b10ec0be5ffcd8759bb8b1461549a7eaf570bd9b libosmocore d3b8b69977207235056f6033e5828796e5620049 libosmo-abis 9d726d0e56665b4439bb4efe782be06be2e83ffb libosmo-netif 0cad3b4538710a91881ddbb80424c2d7c62de962 libsmpp34 58fcc5344ea0e914c7f9be69c309998d243142bd libosmo-sccp 4f7613eeb1f5cbb107af11920f9270721a4720ae osmo-mgw fa10eda6d70a4ecb37b6dfe629e665a16f8d3f59 osmo-msc
Files
Related issues
Updated by pespin about 6 years ago
- File run-success.tar.gz run-success.tar.gz added
Run same test scenario next time, everything runs successfully.
There can be seen a few difference from ofono point of view which may give hints on the difference of steps:
Crashing one:
15:02:39.919850 tst /sierra_1: DBG: Dialing: 2349 15:02:40.030812 tst /sierra_1: DBG: Adding /sierra_1/voicecall01 to call list 15:02:40.031886 tst /sierra_2: DBG: Waiting for incoming call from: 2348 15:02:40.035831 tst /sierra_1: DBG: 'org.ofono.VoiceCallManager'.CallAdded() -> /sierra_1/voicecall01="{'Name': '', 'RemoteHeld': False, 'RemoteMultiparty': False, 'Emergency': False, 'State': 'dialing', 'Multiparty': False, 'LineIdentification': '2349'}" 15:02:40.036713 tst /sierra_1: DBG: Call already exists '/sierra_1/voicecall01' 15:02:44.063539 tst /sierra_2: DBG: 'org.ofono.VoiceCallManager'.CallAdded() -> /sierra_2/voicecall01="{'Name': '', 'RemoteHeld': False, 'RemoteMultiparty': False, 'Emergency': False, 'State': 'dialing', 'Multiparty': False, 'LineIdentification': '2348'}" 15:02:45.085959 tst mo_mt_call.py:43: dial success 15:02:45.208064 tst /sierra_1: DBG: call state: alerting 15:02:45.219170 tst /sierra_2: DBG: call state: incoming 15:02:45.219887 tst /sierra_2: DBG: Answer call /sierra_2/voicecall01 15:02:45.230227 tst /sierra_2: DBG: call state: incoming 15:02:45.293917 tst /sierra_1: DBG: call state: alerting 15:02:46.307953 tst /sierra_1: DBG: call state: alerting 15:02:47.322194 tst /sierra_1: DBG: call state: alerting 15:02:48.336572 tst /sierra_1: DBG: call state: alerting 15:02:49.350422 tst /sierra_1: DBG: call state: alerting 15:02:50.364705 tst /sierra_1: DBG: call state: alerting 15:02:51.378847 tst /sierra_1: DBG: call state: alerting 15:02:52.382323 tst /sierra_2: DBG: 'org.ofono.VoiceCallManager'.CallRemoved() -> /sierra_2/voicecall01 15:02:52.394862 tst /sierra_1: DBG: call state: alerting 15:02:53.410070 tst /sierra_1: DBG: call state: alerting 15:02:54.424374 tst /sierra_1: DBG: call state: alerting 15:02:55.438432 tst /sierra_1: DBG: call state: alerting 15:02:56.452517 tst /sierra_1: DBG: call state: alerting 15:02:57.466964 tst /sierra_1: DBG: call state: alerting 15:02:58.481001 tst /sierra_1: DBG: call state: alerting 15:02:59.495322 tst /sierra_1: DBG: call state: alerting 15:03:00.497691 run osmo-msc_10.42.42.6(pid=5749): DBG: Cleanup 15:03:00.498565 run osmo-msc_10.42.42.6(pid=5749): ERR: Terminated: ERROR {rc=-11} [trial↪voice:nanobts↪osmo-msc_10.42.42.6↪osmo-msc_10.42.42.6(pid=5749)]
Working one:
16:15:34.867474 tst /sierra_1: DBG: Dialing: 2351 16:15:34.982822 tst /sierra_1: DBG: Adding /sierra_1/voicecall01 to call list 16:15:34.983774 tst /sierra_2: DBG: Waiting for incoming call from: 2350 16:15:34.987333 tst /sierra_1: DBG: 'org.ofono.VoiceCallManager'.CallAdded() -> /sierra_1/voicecall01="{'State': 'dialing', 'RemoteHeld': False, 'LineIdentification': '2351', 'Emergency': False, 'Multiparty': False, 'RemoteMultiparty': False, 'Name': ''}" 16:15:34.988151 tst /sierra_1: DBG: Call already exists '/sierra_1/voicecall01' 16:15:35.994692 tst /sierra_2: DBG: 'org.ofono.NetworkRegistration'.PropertyChanged() -> Strength=60 16:15:39.016523 tst /sierra_2: DBG: 'org.ofono.VoiceCallManager'.CallAdded() -> /sierra_2/voicecall01="{'State': 'dialing', 'RemoteHeld': False, 'LineIdentification': '2350', 'Emergency': False, 'Multiparty': False, 'RemoteMultiparty': False, 'Name': ''}" 16:15:40.038712 tst mo_mt_call.py:43: dial success 16:15:40.170234 tst /sierra_1: DBG: call state: alerting 16:15:40.181010 tst /sierra_2: DBG: call state: incoming 16:15:40.181749 tst /sierra_2: DBG: Answer call /sierra_2/voicecall01 16:15:40.192153 tst /sierra_2: DBG: call state: incoming 16:15:40.243483 tst /sierra_1: DBG: call state: alerting 16:15:41.257555 tst /sierra_1: DBG: call state: active 16:15:41.269610 tst /sierra_2: DBG: call state: active 16:15:41.270376 tst mo_mt_call.py:49: answer success, call established and ongoing 16:15:46.404230 tst /sierra_1: DBG: call state: active 16:15:46.416582 tst /sierra_2: DBG: call state: active 16:15:46.417448 tst /sierra_2: DBG: Hang up call /sierra_2/voicecall01 16:15:46.471815 tst /sierra_1: DBG: call_id_list: ['/sierra_1/voicecall01'] 16:15:47.475741 tst /sierra_2: DBG: 'org.ofono.VoiceCallManager'.CallRemoved() -> /sierra_2/voicecall01 16:15:47.476564 tst /sierra_1: DBG: 'org.ofono.VoiceCallManager'.CallRemoved() -> /sierra_1/voicecall01 16:15:47.477449 tst /sierra_1: DBG: call_id_list: [] 16:15:47.477985 tst /sierra_2: DBG: call_id_list: [] 16:15:47.478525 tst mo_mt_call.py:56: hangup success 16:15:47.593366 tst mo_mt_call.py: Test passed (212.5 sec)
Updated by neels about 6 years ago
- Status changed from New to In Progress
- Assignee set to neels
Updated by neels about 6 years ago
- % Done changed from 0 to 30
Got the failure reproduced locally by a test added on branch neels/os3062.
The cause is that when receiving a BSSMAP Clear Request from the BSC, the MSC shoots the conn to the moon right away, while the T308 would still like to complete the CC Release -> CC Release Complete dance. We thus have a T308 timer running on an already freed connection, and once it fires 10 seconds later, it ends in a segfault.
Still need to figure out how to fix the situation among the intricacies of the conn FSM and the externally fired trans.cc.timer. A proper way might be to incorporate the timer we have always had into the relatively new subscriber conn FSM... but for now it would suffice to stay around until the timers are done. I believe the main fix is to trigger a less drastic event when we receive a BSSMAP Clear Request from the BSC.
Also a safeguard should make sure that no timers are running when the transaction is freed, and loudly error-log about it.
Updated by neels about 6 years ago
- % Done changed from 30 to 40
got an effective safeguard in https://gerrit.osmocom.org/7273 which fixes the segfault, but the underlying failure to gracefully tear down is still pending.
Updated by neels about 6 years ago
- Related to Bug #3122: fix subscr_conn fsm: safely catch all compl-l3 failures, properly handle all release situations added
Updated by neels almost 6 years ago
- % Done changed from 40 to 90
tests https://gerrit.osmocom.org/7703 (merged) https://gerrit.osmocom.org/7712 (almost merged)
fixed by https://gerrit.osmocom.org/7273 and https://gerrit.osmocom.org/7704
Updated by neels almost 6 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100