Bug #5255

closed

ttcn3-bsc-test-latest: CBSP and LCLS test cases fail since build #1095

Added by fixeria over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 10/12/2021
Due date:
% Done: 100%
Spec Reference:

Description

It looks like some test case(s) cause a segmentation fault in the IUT (osmo-bsc):

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/1095/artifact/logs/bsc/core

so the remaining CBSP/LCLS test cases cannot talk to it anymore:

Stacktrace

"VTY Timeout for prompt: enable" 
      BSC_Tests_LCLS.ttcn:742 BSC_Tests_LCLS control part
      BSC_Tests_LCLS.ttcn:254 TC_lcls_gcr_only testcase

#1

Updated by fixeria over 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20

I found the culprit:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/1095/artifact/logs/bsc/osmo-bsc.log/*view*/

20210930074737198 DLGLOBAL <0015> logging_vty.c:1113 TTCN3 f_logp(): TC_lost_sdcch_during_assignment() start
Segmentation fault (core dumped)

This test case was introduced quite recently:

commit 92cfa1c45ae1cb52d5aefb774f93468fef607417
Author: Neels Hofmeyr <nhofmeyr@sysmocom.de>
Date:   Tue Sep 28 18:29:44 2021 +0200

    bsc: add TC_lost_sdcch_during_assignment()

and the aim is to reproduce a segfault described in SYS#5627.

#2

Updated by fixeria over 2 years ago

  • Status changed from In Progress to Stalled
  • % Done changed from 20 to 40

I decided to back-port a patch fixing the segfault and create a patch release (1.7.0 -> 1.7.1):

https://gerrit.osmocom.org/c/osmo-bsc/+/25753 assignment_fsm: Check for conn->lchan
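The idea behind the title "assignment_fsm: Check for conn->lchan" can be illustrated with a minimal C sketch. This is not the code of Gerrit change 25753; the types and the function `send_rr_assignment_command()` are hypothetical, simplified stand-ins. It only shows the guard: if the connection has lost its lchan, fail the assignment with an error instead of dereferencing a NULL pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for osmo-bsc's connection and
 * logical channel structures; the real types have many more fields. */
struct lchan_sketch {
	int ts_nr; /* timeslot number, for illustration only */
};

struct conn_sketch {
	/* May become NULL while an assignment is still in progress,
	 * e.g. when the SDCCH is lost during assignment. */
	struct lchan_sketch *lchan;
};

/* Hypothetical helper: try to send the RR Assignment Command on the
 * connection's current lchan. Returns 0 on success, -1 if it cannot be sent. */
static int send_rr_assignment_command(const struct conn_sketch *conn)
{
	if (conn == NULL || conn->lchan == NULL) {
		/* Fail the assignment cleanly instead of dereferencing NULL. */
		fprintf(stderr, "Unable to send RR Assignment Command: conn without lchan\n");
		return -1;
	}
	/* A real implementation would encode and transmit the message here. */
	printf("Sending RR Assignment Command via lchan on TS %d\n", conn->lchan->ts_nr);
	return 0;
}
```

Returning an error lets the caller end the assignment through its normal failure path; the 1.7.1 log from build #1109 shows exactly this message, followed by "Assignment failed" with cause EQUIPMENT FAILURE.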

osmith, pespin, may I ask one of you to help with creating the actual patch release? I used to have a Docker image with Debian and all the tools needed for osmo-release.sh, but then ran 'docker system prune --all' and lost it.

#3

Updated by osmith over 2 years ago

  • Status changed from Stalled to Resolved
  • % Done changed from 40 to 100

#4

Updated by fixeria over 2 years ago

osmith wrote:

> Sure, done: https://git.osmocom.org/osmo-bsc/commit/?h=1.7.1

Thank you very much!

#5

Updated by fixeria over 2 years ago

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 80

Unfortunately, latest osmo-bsc still crashes when TC_lost_sdcch_during_assignment is being executed:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/1109/artifact/logs/bsc/core

This time we get a bit further and see some more logging:

20211014074531510 DLGLOBAL <0015> logging_vty.c:1113 TTCN3 f_logp(): TC_lost_sdcch_during_assignment() start
20211014074531758 DAS <0011> assignment_fsm.c:618 assignment(msc0-conn198_subscr-IMSI-001019876543210_0-0-1-TCH_F-0)[0x5579ec6b3070]{WAIT_RR_ASS_COMPLETE}: (bts=0,trx=0,ts=1,ss=0) Assignment failed in state WAIT_RR_ASS_COMPLETE, cause EQUIPMENT FAILURE: Unable to send RR Assignment Command: conn without lchan
20211014074531758 DAS <0011> assignment_fsm.c:148 assignment(msc0-conn198_subscr-IMSI-001019876543210_0-0-1-TCH_F-0)[0x5579ec6b3070]{WAIT_RR_ASS_COMPLETE}: (bts=0,trx=0,ts=1,ss=0) Assignment failed
20211014074531758 DMSC <0007> assignment_fsm.c:149 SUBSCR_CONN(msc0-conn198_subscr-IMSI-001019876543210)[0x5579ec69bd60]{CLEARING}: Event ASSIGNMENT_END not permitted
20211014074531759 DCHAN <000f> lchan_fsm.c:837 lchan(0-0-1-TCH_F-0)[0x5579ec6acf70]{WAIT_RF_RELEASE_ACK}: transition to state WAIT_RLL_RTP_ESTABLISH not permitted!
20211014074531779 DLMGCP <0025> mgcp_client.c:691 Cannot find matching MGCP transaction for trans_id 420
20211014074533758 DCHAN <000f> lchan_fsm.c:81 lchan(0-0-1-TCH_F-0)[0x5579ec6acf70]{WAIT_RF_RELEASE_ACK}: (type=TCH_F) lchan allocation failed in state WAIT_RF_RELEASE_ACK: Timeout
20211014074533759 DCHAN <000f> lchan_fsm.c:116 lchan(0-0-1-TCH_F-0)[0x5579ec6acf70]{WAIT_RF_RELEASE_ACK}: (type=TCH_F) Signalling Assignment FSM of error (lchan allocation failed in state WAIT_RF_RELEASE_ACK: Timeout)
Segmentation fault (core dumped)

#6

Updated by fixeria over 2 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from fixeria to neels

> Unfortunately, latest osmo-bsc still crashes when TC_lost_sdcch_during_assignment is being executed: [...]

neels, could you please take a look? I tried to figure out why it still segfaults, but could not find anything suspicious.

#7

Updated by fixeria over 2 years ago

Interestingly enough, I cannot reproduce the segfault locally with osmo-bsc 1.7.1-0-gf20b3086a.

#8

Updated by neels over 2 years ago

fixeria wrote:

> Unfortunately, latest osmo-bsc still crashes when TC_lost_sdcch_during_assignment is being executed: [...]
>
> neels could you please take a look? I was trying to figure out why it still segfaults, but could not find anything suspicious.

osmo-bsc does not crash for me anymore during this test, using current master, where pmaier's fix is merged.
The test also passes on jenkins. Where / how did you still see a crash?

#9

Updated by fixeria over 2 years ago

neels wrote:

> osmo-bsc does not crash for me anymore during this test, using current master, where pmaier's fix is merged.
> The test also passes on jenkins. Where / how did you still see a crash?

The recent master does not crash, but the latest release (1.7.1) does; see, for instance:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/1113/artifact/logs/bsc/

1.7.1 is basically a patch release with pmaier's fix applied. And somehow it still segfaults on Jenkins.

#10

Updated by fixeria over 2 years ago

Good news: I managed to reproduce the segfault in a docker container by running it this way:

docker run -it --rm --network=host -v osmo-ttcn3-hacks:/data fixeria/osmo-bsc-latest /usr/bin/osmo-bsc -c /data/bsc/osmo-bsc.cfg

and I am even getting the same logging output. Here is a backtrace:

#0  _lchan_on_activation_failure (lchan=lchan@entry=0x7f037ea25748, activ_for=<optimized out>,
                                  for_conn=0x0, line=line@entry=1574, file=0x563e3f8d910d "lchan_fsm.c") at lchan_fsm.c:117
#1  0x0000563e3f882317 in _lchan_on_activation_failure (line=1574, file=0x563e3f8d910d "lchan_fsm.c",
                                                        for_conn=<optimized out>, activ_for=<optimized out>, 
                                                        lchan=0x7f037ea25748) at lchan_fsm.c:1574
#2  lchan_fsm_timer_cb (fi=0x563e401a3d00) at lchan_fsm.c:1574
#3  0x00007f037ddd5f16 in fsm_tmr_cb (data=0x563e401a3d00) at fsm.c:325
#4  0x00007f037ddd01a6 in osmo_timers_update () at timer.c:273
#5  0x00007f037ddd0b67 in _osmo_select_main (polling=0) at select.c:373
#6  0x00007f037ddd0ce6 in osmo_select_main_ctx (polling=<optimized out>) at select.c:434
#7  0x0000563e3f81e6bf in main (argc=<optimized out>, argv=<optimized out>) at osmo_bsc_main.c:1039
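The backtrace shows `_lchan_on_activation_failure()` being reached from the FSM timer callback `lchan_fsm_timer_cb()` with `for_conn=0x0`. Below is a minimal C sketch of the kind of NULL check the commit title "lchan_fsm: fix potential NULL-pointer dereference" suggests. It is not the actual osmo-bsc diff; every type and function in it (`conn_sketch`, `activ_for_sketch`, `on_lchan_activation_failure()`) is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; osmo-bsc's real lchan/conn types and
 * "activate for" reasons are more elaborate. */
enum activ_for_sketch {
	ACTIV_FOR_NONE,
	ACTIV_FOR_ASSIGNMENT,
};

struct conn_sketch {
	const char *name;
};

/* Hypothetical error path, called e.g. from an FSM timeout callback when an
 * lchan fails to activate. Returns 1 if the requesting conn was notified,
 * 0 if there was nobody left to notify. */
static int on_lchan_activation_failure(enum activ_for_sketch activ_for,
				       const struct conn_sketch *for_conn)
{
	switch (activ_for) {
	case ACTIV_FOR_ASSIGNMENT:
		/* The conn may already be gone by the time the timer fires,
		 * so check it before using it instead of dereferencing NULL. */
		if (for_conn == NULL) {
			fprintf(stderr, "lchan activation failed, but the requesting conn is gone\n");
			return 0;
		}
		printf("Signalling assignment failure to %s\n", for_conn->name);
		return 1;
	default:
		return 0;
	}
}
```

The design point is that an error path triggered asynchronously (here by a timeout) must not assume that the object which requested the activation still exists.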

#11

Updated by fixeria over 2 years ago

  • Status changed from Feedback to Stalled
  • Assignee changed from neels to osmith
  • % Done changed from 80 to 90

We need to back-port another change from the recent master:

commit dfd7bef6644d0c0837f7e5498bc5c86362b668dc
Author: Vadim Yanitskiy <vyanitskiy@sysmocom.de>
Date:   Sun Jul 11 13:19:22 2021 +0600

    lchan_fsm: fix potential NULL-pointer dereference

    Change-Id: I373855b95f8bde0ce8f9c2ae7bf95c9135d33484
    Related: SYS#5526

I submitted a cherry-pick to Gerrit:

https://gerrit.osmocom.org/c/osmo-bsc/+/25836 lchan_fsm: fix potential NULL-pointer dereference

Once again, I would need some help from osmith to create a patch release, this time 1.7.2.

#12

Updated by fixeria over 2 years ago

I also cherry-picked both patches to the '2021q1' branch:

https://gerrit.osmocom.org/c/osmo-bsc/+/25837 assignment_fsm: Check for conn->lchan [NEW]
https://gerrit.osmocom.org/c/osmo-bsc/+/25838 lchan_fsm: fix potential NULL-pointer dereference [NEW]

#13

Updated by fixeria over 2 years ago

  • Assignee changed from osmith to pespin

Oliver is on holiday this week; Pau agreed to help (thanks!).

#14

Updated by pespin over 2 years ago

  • Status changed from Stalled to Feedback
  • Assignee changed from pespin to fixeria

Tag 1.7.2 pushed, including the commit "lchan_fsm: fix potential NULL-pointer dereference".

Reassigning to fixeria.

#15

Updated by fixeria over 2 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

Good news: latest osmo-bsc (1.7.2) does not crash anymore:

https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-bsc-test-latest/228/ (no core file, 36 fewer failures)
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/1116/ (no core file, 36 fewer failures)
