Bug #5230

open

hnbgw fails to connect to the sgsn

Added by lynxis over 2 years ago. Updated over 2 years ago.

Status: In Progress
Priority: Normal
Assignee:
Target version: -
Start date: 09/07/2021
Due date:
% Done: 0%
Spec Reference:

Description

On the event GSM setup I've noticed that packets sometimes don't flow between the daemons even though all of them are up.
I would guess it's a stale connection or wrong routing.

It's a full setup with osmo-bsc/msc/sgsn/hnbgw (plus the other non-STP services).
When I restart a daemon, data doesn't flow even though the daemon reconnects.

My current failure is with nano3g <> hnbgw <> stp <> sgsn, while the CS path to the msc works.

The UE sends a GPRS Attach Request, which correlates with the following osmo-hnbgw error message:

Sep 07 17:48:57 rc3-gsm osmo-hnbgw[9345]: 20210907174857511 DLSCCP ERROR Received unknown conn-id 1002 for primitive N-DATA.request (sccp_scoc.c:1758)
[...] for every Attach Request 1

I've noticed this bug from time to time while working with the event setup. I could imagine a bug in the SCCP or STP.


Files

bug_2021_09_07.tar.gz (5.26 KB): hnbgw, sgsn and stp logs (lynxis, 09/07/2021 04:07 PM)
#1

Updated by laforge over 2 years ago

On Tue, Sep 07, 2021 at 04:08:36PM +0000, lynxis [REDMINE] wrote:

> Sep 07 17:48:57 rc3-gsm osmo-hnbgw[9345]: 20210907174857511 DLSCCP ERROR Received unknown conn-id 1002 for primitive N-DATA.request (sccp_scoc.c:1758)
> [...] for every Attach Request 1
> 

> I've noticed this bug from time to time while working with the event setup. I could imagine a bug in the SCCP or STP.

I would think it means either that the hnbgw has been restarted, or that the SCCP connections have somehow timed out on the hnbgw but not on the SGSN. That's the only reasonable explanation for why we get an inbound SCCP message for a non-existent or no-longer-existing connection.

It might also be related to the SCCP IT (inactivity test) message, which acts as a keepalive.
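To illustrate how an inactivity-test keepalive can leave the two ends disagreeing about whether a connection exists, here is a minimal sketch of per-connection inactivity supervision. It assumes Q.714-style semantics (send an IT after `t_ias` seconds without transmitting anything; release the connection after `t_iar` seconds without receiving anything). The struct, the function `it_check` and any timer values are hypothetical and are not taken from libosmo-sigtran.

```c
#include <time.h>

/* Hypothetical per-connection bookkeeping for inactivity supervision. */
struct conn_timers {
	time_t last_sent;     /* last time we transmitted anything on the connection */
	time_t last_received; /* last time the peer sent us anything */
	time_t t_ias;         /* send inactivity timeout in seconds (assumed value) */
	time_t t_iar;         /* receive inactivity timeout in seconds (assumed value) */
};

enum it_action {
	IT_NONE,    /* nothing to do */
	IT_SEND,    /* send an Inactivity Test message as a keepalive */
	IT_RELEASE, /* peer has been silent too long: release locally */
};

/* Decide what the local side should do at time 'now'.
 * Release takes priority: if the peer has been silent for t_iar, the local
 * side tears the connection down, even if the peer still considers it open. */
static enum it_action it_check(const struct conn_timers *t, time_t now)
{
	if (now - t->last_received >= t->t_iar)
		return IT_RELEASE;
	if (now - t->last_sent >= t->t_ias)
		return IT_SEND;
	return IT_NONE;
}
```

If only one side's receive timer fires (for example because the peer's IT messages are lost or never sent), that side releases while the other keeps the connection, which would produce the kind of one-sided timeout described above.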

#2

Updated by laforge over 2 years ago

  • Status changed from New to Feedback
  • Assignee set to lynxis

FYI: osmo-stp is not involved in interpreting, modifying or generating those SCCP messages. SCCP is just passed through on top of M3UA, so you can exclude the STP as the cause; it is something between the two SCCP users, i.e. hnbgw and sgsn. A reasonably long pcap file [to catch any earlier connections that have since timed out] would be useful, unless the cause is the obvious one I stated before: one of the two sides [in your example, osmo-hnbgw] was restarted after SCCP connections had already been established between hnbgw and sgsn.

#3

Updated by lynxis over 2 years ago

So I've re-started the osmo-hnbgw.

The problem is that the SCCP-SCOC FSM is terminated and never re-initiated, e.g. when the remote side is unavailable.
It also drops messages while the connection is down.
After the FSM has been terminated, osmo-hnbgw still tries to pass down messages with a connection ID for which no instance can be found.

20211013205159495 DLSCCP <0011> sccp_scoc.c:933 SCCP-SCOC(1001)[0x612000002c20]{CONN_PEND_OUT}: state_chg to IDLE
20211013205159495 DLSCCP <0011> sccp_scoc.c:520 SCCP-SCOC(1001)[0x612000002c20]{IDLE}: Terminating (cause = OSMO_FSM_TERM_REQUEST)
20211013205159495 DLSCCP <0011> sccp_scoc.c:520 SCCP-SCOC(1001)[0x612000002c20]{IDLE}: Freeing instance
20211013205159495 DLSCCP <0011> fsm.c:573 SCCP-SCOC(1001)[0x612000002c20]{IDLE}: Deallocated
...

20211013205159693 DRUA <0002> hnbgw_rua.c:223 000295-0000123456@ipaccess.com rua_to_scu() IuPS to RI=2,PC=188,SSN=142, rua_ctx_id 23 scu_conn_id 1001
20211013205159693 DLSCCP <0011> sccp_scoc.c:1731 Received SCCP User Primitive (N-DATA.request)
20211013205159693 DLSCCP <0011> sccp_scoc.c:1757 Received unknown conn-id 1001 for primitive N-DATA.request
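The failure in the log above can be modeled as a provider-side connection table keyed by conn-id: once the SCOC instance for 1001 is freed, a later N-DATA.request carrying that ID finds no entry and the data is discarded. This is a minimal sketch, not the libosmo-sigtran code; the table layout and the function names (`scoc_alloc`, `scoc_free`, `scoc_n_data_req`) are hypothetical stand-ins for the real SCOC FSM instances.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CONNS 8

/* Hypothetical provider-side connection table (stand-in for SCOC FSM instances). */
struct scoc_table {
	uint32_t conn_id[MAX_CONNS];
	bool in_use[MAX_CONNS];
};

/* Register a connection instance; returns false if the table is full. */
static bool scoc_alloc(struct scoc_table *tbl, uint32_t conn_id)
{
	for (int i = 0; i < MAX_CONNS; i++) {
		if (!tbl->in_use[i]) {
			tbl->in_use[i] = true;
			tbl->conn_id[i] = conn_id;
			return true;
		}
	}
	return false;
}

/* Free the instance, e.g. after the FSM went back to IDLE and terminated. */
static void scoc_free(struct scoc_table *tbl, uint32_t conn_id)
{
	for (int i = 0; i < MAX_CONNS; i++)
		if (tbl->in_use[i] && tbl->conn_id[i] == conn_id)
			tbl->in_use[i] = false;
}

/* Handle an N-DATA.request from the SCCP user. Returns 0 if the data was
 * accepted, -1 if the conn-id is unknown and the data is dropped. */
static int scoc_n_data_req(const struct scoc_table *tbl, uint32_t conn_id)
{
	for (int i = 0; i < MAX_CONNS; i++)
		if (tbl->in_use[i] && tbl->conn_id[i] == conn_id)
			return 0;
	fprintf(stderr, "Received unknown conn-id %u for primitive N-DATA.request\n",
		(unsigned)conn_id);
	return -1;
}
```

The provider has no way to resurrect the freed instance from an N-DATA.request alone; the request carries only the conn-id, not the addressing needed to set up a new connection.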

#4

Updated by lynxis over 2 years ago

  • Status changed from Feedback to In Progress

#5

Updated by laforge over 2 years ago

On Wed, Oct 13, 2021 at 07:02:42PM +0000, lynxis [REDMINE] wrote:

> The problem is that the SCCP-SCOC FSM is terminated and never re-initiated, e.g. when the remote side is unavailable.

Given how the SCCP User SAP works, this is the specified
behavior. It is the job of the SCCP user to create new connections or to
retry, not the job of the SCCP provider. Just like in TCP: once your
connection is dead, the application needs to take care of establishing a new one.

> It also drops messages while the connection is down.
> After the FSM has been terminated, osmo-hnbgw still tries to pass down messages with a connection ID for which no instance can be found.

Sounds like the hnbgw doesn't process the disconnect/release indication
on the SCCP User SAP.
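The reply puts the responsibility on the SCCP user: on a disconnect indication it must forget the conn-id, and the next uplink message must open a new connection (N-CONNECT.request) instead of sending N-DATA.request on the dead one, much as a TCP client reconnects. A minimal sketch of that user-side logic follows; the `rua_ctx` struct, the primitive enum, the id allocator and the helper names are hypothetical illustrations, not the actual osmo-hnbgw or libosmo-sigtran API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical primitives the user can pass down to the SCCP provider. */
enum scu_prim {
	PRIM_N_CONNECT_REQ, /* open a new SCCP connection (carrying the first message) */
	PRIM_N_DATA_REQ,    /* send data on an existing connection */
};

/* Hypothetical per-UE context on the SCCP user side (e.g. a RUA context). */
struct rua_ctx {
	uint32_t scu_conn_id; /* SCCP connection id, valid only while 'connected' */
	bool connected;
};

static uint32_t next_conn_id = 1000; /* hypothetical connection id allocator */

/* Called when the provider reports a disconnect/release indication. */
static void on_disconnect_ind(struct rua_ctx *ctx, uint32_t conn_id)
{
	if (ctx->connected && ctx->scu_conn_id == conn_id)
		ctx->connected = false; /* forget the dead connection */
}

/* Choose which primitive to send for an uplink message. If the previous
 * connection was released, allocate a fresh id and reconnect. */
static enum scu_prim uplink_prim(struct rua_ctx *ctx)
{
	if (ctx->connected)
		return PRIM_N_DATA_REQ;
	ctx->scu_conn_id = next_conn_id++;
	ctx->connected = true; /* simplification: treated as connected once requested */
	return PRIM_N_CONNECT_REQ;
}
```

The sketch marks the context connected as soon as the N-CONNECT.request is issued; a real user would track the pending state until the connect confirmation (or a refusal) arrives, and decide whether to queue or drop uplink data in the meantime.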
