Bug #4629

statically configured Gb interface not recovering after SGSN restart

Added by laforge over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 06/23/2020
Due date:
% Done: 100%
Spec Reference: TS 48.018 Section 8.4

Description

In a situation where OsmoSGSN is interworking via Gb with a third-party BSS, we have a problem recovering after an SGSN restart.

The BSS continues to send uplink BSSGP PDUs like nothing happened, and OsmoSGSN responds with BSSGP STATUS (Cause = BVCI unknown). Normally, we would expect the BSS to understand that and follow up with a BVC-RESET in order to re-create the BVC for that BVCI. However, nothing of that sort happens.

In theory, the SGSN could also do a BVC-RESET. But it's a bit of a chicken-and-egg situation: If the BVC does not exist, as the SGSN has just restarted and lost all state, how would it know which BSSes exist out there, and send BVC-RESET to all of them?

So we'd have to cheat a bit and wait until any BSSGP PDU for a non-existent BVC is received, and then use the BVCI from that to send an SGSN-originated BSSGP RESET.
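
A minimal sketch of that "cheat", for illustration only: the class, method, and action names are hypothetical and not taken from OsmoSGSN or libosmogb. It models the restarted SGSN as an empty BVC table that learns a BVCI from the first uplink PDU it cannot place, answers with STATUS as before, and additionally queues one SGSN-originated reset for that BVCI.

```python
from dataclasses import dataclass, field


@dataclass
class SgsnBvcTable:
    """Hypothetical in-memory BVC state; empty right after an SGSN restart."""
    known: set = field(default_factory=set)
    reset_pending: set = field(default_factory=set)

    def on_uplink_pdu(self, bvci: int) -> list:
        """Return the actions to take for an uplink BSSGP PDU on `bvci`."""
        if bvci in self.known:
            return [("PROCESS", bvci)]
        # Unknown BVC: keep answering with STATUS (Cause = BVCI unknown).
        actions = [("STATUS", bvci)]
        if bvci not in self.reset_pending:
            # First PDU for this BVCI since the restart: learn the BVCI
            # from it and originate exactly one reset from the SGSN side.
            self.reset_pending.add(bvci)
            actions.append(("BVC-RESET", bvci))
        return actions

    def on_reset_ack(self, bvci: int) -> None:
        """Mark the BVC as existing once the peer acknowledged our reset."""
        if bvci in self.reset_pending:
            self.reset_pending.discard(bvci)
            self.known.add(bvci)
```

The `reset_pending` set is only there so that a BSS that keeps sending uplink traffic does not trigger a new reset for every single PDU.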

Associated revisions

Revision 78af6ba5 (diff)
Added by laforge over 1 year ago

gprs_ns: Set sockaddr_in.sin_family for persistent NSVCs

We cannot just set sockaddr_in.sin_addr + sin_port, we also must
initialize sin_family. The reason this has worked so far is
because we probably always first received an NS packet from the
peer, rather than being the first one to send.

Change-Id: I6cefc2cd5516c7a4c01a2cc040afca454e59dd57
Related: OS#4629

Revision e717f0b6
Added by laforge over 1 year ago

Send a BVC-RESET to all persistent Gb interfaces at start-up

3GPP TS 48.018 Section 8.4:

After any failure affecting the NSE, the party (BSS or SGSN) where
the failure resided shall reset the signalling BVC. After sending or
receiving a BVC-RESET PDU for the signalling BVC, the BSS shall stop all
traffic and initiate the BVC-RESET procedure for all BVCs corresponding
to PTP functional entities of the underlying network service entity. The
BSS must complete the BVC-RESET procedure for signalling BVC before
starting PTP BVC-RESET procedures.

TODO: We should not just trigger a single outbound BVC-RESET message,
but we should re-transmit them until we get a response. This would
likely entail adding FSMs to libosmogb, which we will leave for a later
point - it's anticipated that the NS + BSSGP code is undergoing quite
some changes in the coming months anyway, so leave it for then.

Change-Id: I0b46035b40709c38bb9ab9493c11031a577e3ee0
Closes: OS#4629
Depends: libosmocore.git I353adc1aa72377f7d4b3336d2ff47791fb73d62c
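
The TODO in revision e717f0b6 (re-transmit the BVC-RESET until a response arrives) could look roughly like the sketch below. This is not libosmogb code: `reset_until_acked`, `send_reset`, and `wait_for_ack` are hypothetical names, and the timeout and retry count are placeholders; TS 48.018 defines the guard timer and retry limit that a real implementation should use.

```python
from typing import Callable


def reset_until_acked(send_reset: Callable[[], None],
                      wait_for_ack: Callable[[float], bool],
                      timeout_s: float = 3.0,
                      max_retries: int = 3) -> bool:
    """Send a BVC-RESET and re-send it on every timeout.

    send_reset   -- transmits one BVC-RESET PDU (hypothetical callback)
    wait_for_ack -- blocks up to the given seconds, returns True if a
                    BVC-RESET-ACK arrived (hypothetical callback)
    Returns True once acknowledged, False after the initial attempt plus
    max_retries retransmissions all timed out.
    """
    for _attempt in range(1 + max_retries):
        send_reset()
        if wait_for_ack(timeout_s):
            return True
    return False
```

A blocking loop keeps the sketch short; in an event-driven daemon the same logic would more likely live in a timer callback or FSM state, as the commit message anticipates.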

History

#1 Updated by laforge over 1 year ago

  • Spec Reference set to TS 48.018 Section 8.4

From 3GPP TS 48.018 Section 8.4

A BVC-RESET procedure is performed because of recovery procedures related to:
- a system failure in the SGSN or BSS that affects GPRS BVC functionality (e.g. processor recovery);
...
The BSS may also send BVC-RESET as a means to create the initial mapping between BVCIs and cell identifications. After any of the possible events stated above, the status of the affected BVCs may be inconsistent at the SGSN and the BSS. After performing the BVC Reset procedure all affected BVCs are assumed to be unblocked at the SGSN. The reset procedure forces a consistent state upon SGSN and BSS by requiring that after the completion of the BVC-Reset procedure the BSS initiates the block procedure for all affected BVCs that are marked as blocked at the BSS.

Even more interesting, section 8.4.1 seems to hold the key:

After any failure affecting the NSE, the party (BSS or SGSN) where the failure resided shall reset the signalling BVC. After sending or receiving a BVC-RESET PDU for the signalling BVC, the BSS shall stop all traffic and initiate the BVC-RESET procedure for all BVCs corresponding to PTP functional entities of the underlying network service entity. The BSS must complete the BVC-RESET procedure for signalling BVC before starting PTP BVC-RESET procedures.

So the SGSN does not need to know the BVCI of the individual PtP-BVCs, but it should simply send a BVC-RESET for the signaling BVC (BVCI=0), which should then trigger the related recovery. Let's try to implement that and test it.
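
For reference, a minimal sketch of what such a BVC-RESET for the signalling BVC looks like on the wire. The code points (PDU type 0x22, BVCI IEI 0x04, Cause IEI 0x07, cause 0x01 "Equipment failure") are as I read them from TS 48.018 section 11 and should be verified against the spec. The choice of cause value is an assumption, and the helper names are made up for this example.

```python
import struct

# Code points per TS 48.018 section 11 (verify before relying on them).
PDU_BVC_RESET = 0x22
IEI_BVCI = 0x04
IEI_CAUSE = 0x07
CAUSE_EQUIPMENT_FAILURE = 0x01  # assumed cause, not confirmed by the patch
SIGNALLING_BVCI = 0


def tlv(iei: int, value: bytes) -> bytes:
    """Encode one BSSGP TLV using the single-octet length form."""
    if len(value) > 127:
        raise ValueError("two-octet length form is not covered by this sketch")
    # Bit 8 of the length octet set means: length fits in this one octet.
    return bytes([iei, 0x80 | len(value)]) + value


def encode_bvc_reset(bvci: int = SIGNALLING_BVCI,
                     cause: int = CAUSE_EQUIPMENT_FAILURE) -> bytes:
    """Build a BVC-RESET PDU carrying only the BVCI and Cause IEs."""
    return (bytes([PDU_BVC_RESET])
            + tlv(IEI_BVCI, struct.pack("!H", bvci))
            + tlv(IEI_CAUSE, bytes([cause])))


print(encode_bvc_reset().hex())  # prints 2204820000078101
```

Note that the PDU carries no per-cell information at all, which is exactly why the SGSN can send it without knowing any PtP BVCI.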

#2 Updated by ipse over 1 year ago

Does OsmoSGSN/OsmoPCU support actual static Gb configuration? When we tried to configure that on the OsmoPCU/OsmoGbProxy side, we had to patch the code to achieve static Gb configuration (see our branch). The code was not clean enough to submit it for the master, though.

We also saw that in our case, a commercial SGSN sends BVC-RESET to our PCU as soon as it detects the NSE down, and keeps re-sending it until our PCU responds with ACK. I can share some traces if they could help.

#3 Updated by laforge over 1 year ago

On Thu, Jun 25, 2020 at 09:42:53AM +0000, ipse [REDMINE] wrote:

> Does OsmoSGSN/OsmoPCU support actual static Gb configuration?

Yes, OsmoSGSN is working here with a static Gb configuration and a
not-to-be-named third-party PCU/BSC, except for the problem of recovery
described here.

> When we tried to configure that on the OsmoPCU/OsmoGbProxy side, we had
> to patch the code to achieve static Gb configuration (see our branch).

gbproxy should always have supported it, as especially on the FR/GRE
side, there are only static Gb configurations.

> The code was not clean enough to submit it for the master, though.

I'll have a look if I find time :)

> We also saw that in our case, a commercial SGSN sends BVC-RESET to our PCU as soon as it detects the NSE down, and keeps re-sending it until our PCU responds with ACK. I can share some traces if they could help.

Interesting behavior. I believe the spec states the exact opposite: as
long as the NSE/NSVC is down, it should not send BVC-RESET, and only
start sending them once NS is up. However, for a static NS-IP Gb, of
course, you don't know when it's up or down, as none of the
NS-BLOCK/UNBLOCK/RESET procedures are to be used.

#4 Updated by laforge over 1 year ago

https://gerrit.osmocom.org/c/osmo-sgsn/+/19027 should fix this. Adding a TTCN3 test case is not straightforward, as the BVC-RESET is only sent at start-up of the process, and we don't restart the SGSN during tests; the tests only start well after the SGSN has been started.

#5 Updated by laforge about 1 year ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

patch long merged
