Project

General

Profile

Actions

Bug #5366

closed

ttcn3-{bsc,msc,sgsn,smlc,sccp}-test regressions due to timeout waiting for RESET-ACK

Added by fixeria over 2 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
12/21/2021
Due date:
% Done:

100%

Spec Reference:


Related issues

Related to Cellular Network Infrastructure - Bug #4573: [centos] ttcn3-msc-test: 177 failures!Resolvedlaforge06/01/2020

Actions
Related to libosmo-netif - Bug #5368: consider using SCTP_EVENT instead of SCTP_EVENTSFeedback12/22/2021

Actions
Actions #1

Updated by fixeria over 2 years ago

I managed to reproduce the failures in a local docker environment. What brought my attention is this message:

M3UA_Emulation.ttcn:554 Warning: Setsockopt error!

which may be originated from several places:

$ grep -r "Setsockopt error" .
./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc:         TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc:        TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc:        TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc:    TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc:    TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.SCTPasp/doc/SCTPasp_Description.adoc:`*Setsockopt error!*`
./deps/titan.TestPorts.IPL4asp/src/IPL4asp_PT.cc:        TTCN_error("Setsockopt error: SO_BROADCAST");
./deps/titan.TestPorts.IPL4asp/src/IPL4asp_PT.cc:        TTCN_warning("Setsockopt error!");
./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc:      TTCN_error("Setsockopt error: SO_BROADCAST");
./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc:     TTCN_error("Setsockopt error: SO_REUSEADDR");
./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc:        TTCN_error("UDP test port: Setsockopt error: SO_BROADCAST");
./deps/titan.TestPorts.UDPasp/src/UDPasp_PT.cc:      TTCN_error("UDP test port: Setsockopt error: SO_BROADCAST");

Most likely, it comes from deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc. I'll continue the investigation...

Actions #2

Updated by fixeria over 2 years ago

  • % Done changed from 0 to 10

fixeria wrote in #note-1:

Most likely, it comes from deps/titan.TestPorts.SCTPasp/src/SCTPasp_PT.cc.

Indeed it does. I confirmed this by enabling debug logging:

MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling user_start().
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Leaving user_start().
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling user_map(sctp).
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Running in NORMAL MODE.
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Leaving user_map().
MTC@0b2d69031686: Connecting BSSAP RAN_Emulation to SCCP_SP_PORT
MTC@0b2d69031686: Starting RAN_Emulation
MSC_Test_0-M3UA(18)@0b2d69031686: *************************************************
MSC_Test_0-M3UA(18)@0b2d69031686: M3UA emulation initiated, the test can be started
MSC_Test_0-M3UA(18)@0b2d69031686: *************************************************
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Calling outgoing_send (ASP_SCTP_CONNECTFROM).
MSC_Test_0-SCCP(16)@0b2d69031686: v_sccp_pdu_maxlen:268
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Creating SCTP socket.
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Setting SCTP socket options (initmsg).
MSC_Test_0-M3UA(18)@0b2d69031686: SCTPasp Test Port (SCTP_PORT): Setting SCTP socket options (events).
MSC_Test_0-M3UA(18)@0b2d69031686: Warning: Setsockopt error!

This narrows down the scope to SCTPasp__PT_PROVIDER::create_socket():

  log("Setting SCTP socket options (initmsg).");
  if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_INITMSG, &initmsg,
    sizeof(struct sctp_initmsg)) < 0) 
  {
    TTCN_warning("Setsockopt error!");
    errno = 0; 
  }

  log("Setting SCTP socket options (events).");
  if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, sizeof (events)) < 0) 
  {
    TTCN_warning("Setsockopt error!");
    errno = 0; 
  }
Actions #3

Updated by fixeria over 2 years ago

  • Related to Bug #4573: [centos] ttcn3-msc-test: 177 failures! added
Actions #4

Updated by fixeria over 2 years ago

  • Status changed from In Progress to New
  • Assignee changed from fixeria to laforge
  • % Done changed from 10 to 40

I think I know what's going on: we're dealing with the same problem as in #4573 - kernel ABI breakage.

This patch was useful during debugging:

commit 6ed17697914e74410004ebd8588d03ee04b97426
Author: Dock Er <docker@dock.er>
Date:   Tue Dec 21 17:24:07 2021 +0000

    HACK: log() generates event TTCN_ERROR regardless of debug flag

diff --git a/src/SCTPasp_PT.cc b/src/SCTPasp_PT.cc
index 552946c..4ce343d 100644
--- a/src/SCTPasp_PT.cc
+++ b/src/SCTPasp_PT.cc
@@ -1302,17 +1302,13 @@ void SCTPasp__PT_PROVIDER::handle_event(void *buf)

 void SCTPasp__PT_PROVIDER::log(const char *fmt, ...)
 {
-  if(debug)
-  {
     va_list ap;
     va_start(ap, fmt);
-    TTCN_Logger::begin_event(TTCN_DEBUG);
+    TTCN_Logger::begin_event(TTCN_ERROR);
     TTCN_Logger::log_event("SCTPasp Test Port (%s): ", get_name());
     TTCN_Logger::log_event_va_list(fmt, ap);
     TTCN_Logger::end_event();
     va_end(ap);
-
-  }
 }

Applying this patch makes the test cases behave as usual:

commit 3046518e09fd9eef8de2ef17c2803420cdef5045 (HEAD)
Author: Dock Er <docker@dock.er>
Date:   Tue Dec 21 17:25:07 2021 +0000

    create_socket(): workaround kernel ABI breakage

diff --git a/src/SCTPasp_PT.cc b/src/SCTPasp_PT.cc
index 4ce343d..e0c1cec 100644
--- a/src/SCTPasp_PT.cc
+++ b/src/SCTPasp_PT.cc
@@ -1566,8 +1566,11 @@ int SCTPasp__PT_PROVIDER::create_socket(int addr_family)
   log("Setting SCTP socket options (events).");
   if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, sizeof (events)) < 0)
   {
-    TTCN_warning("Setsockopt error!");
-    errno = 0;
+    /* See https://gerrit.osmocom.org/c/libosmo-netif/+/18097 */
+    if (setsockopt(local_fd, IPPROTO_SCTP, SCTP_EVENTS, &events, 13) < 0)
+    {
+      TTCN_warning("Setsockopt error!");
+      errno = 0;
+    }
   }
   return local_fd;
 }

laforge I believe you have more experience with the kernel API/ABI, so I hand over this task to you.

Actions #5

Updated by laforge over 2 years ago

  • Assignee changed from laforge to daniel
  • Priority changed from High to Immediate

This needs to be fixed immediately as it breaks most of our test suites. Unfortunately my backlog today alone has grown by another week, so I cannot really look into this anywhere near enough.

The work-around of libosmo* in #4573 needs to be ported into the SCTPasp_PT.

Actions #6

Updated by daniel over 2 years ago

  • % Done changed from 40 to 70

I pushed a patch to https://github.com/osmocom/titan.TestPorts.SCTPasp (didn't look like we're doing pull requests there).

Gerrit change for TTCN3 here: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26674

I wasn't able to reproduce that locally, but I hacked up the jenkins SCCP test for one run to call
OSMO_TTCN3_BRANCH=daniel/sctp_events ./jenkins.sh

See https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-sccp-test/713/

Actions #7

Updated by laforge over 2 years ago

  • Related to Bug #5368: consider using SCTP_EVENT instead of SCTP_EVENTS added
Actions #8

Updated by laforge over 2 years ago

  • Status changed from New to In Progress
Actions #9

Updated by daniel over 2 years ago

  • Priority changed from Immediate to High
  • % Done changed from 70 to 90
Actions #10

Updated by fixeria over 2 years ago

Great, thank you Daniel!

Interestingly enough, -latest is generally not affected, except ttcn3-msc-test:

I am still wondering why? Perhaps the build machines have different kernel versions in use:

  • ttcn3-bsc-test: "Building remotely on build2-deb9build-ansible ..."
  • ttcn3-bsc-test-latest: "Building remotely on gtp0-deb9build ..."
  • ttcn3-msc-test: "Building remotely on build2-deb9build-ansible ..."
  • ttcn3-msc-test-latest: "Building remotely on build2-deb9build-ansible ..."
Actions #11

Updated by daniel about 2 years ago

  • Status changed from In Progress to Stalled

Waiting for upstream -> stalled

fixeria wrote in #note-10:

Interestingly enough, -latest is generally not affected, except ttcn3-msc-test:

I am still wondering why? Perhaps the build machines have different kernel versions in use:

  • ttcn3-bsc-test: "Building remotely on build2-deb9build-ansible ..."
  • ttcn3-bsc-test-latest: "Building remotely on gtp0-deb9build ..."
  • ttcn3-msc-test: "Building remotely on build2-deb9build-ansible ..."
  • ttcn3-msc-test-latest: "Building remotely on build2-deb9build-ansible ..."

Yeah, kernel version most likely.

Actions #12

Updated by fixeria about 2 years ago

  • Status changed from Stalled to Resolved
  • % Done changed from 90 to 100

daniel wrote in #note-9:

Looks like the tests are passing again.

I also created pull requests about this issue at the eclipse gitlab:

https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.IPL4asp/-/merge_requests/2
https://gitlab.eclipse.org/eclipse/titan/titan.TestPorts.SCTPasp/-/merge_requests/4

Both pull requests have been merged. Tests are passing, marking as resolved.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)