Bug #4952

closed

Fix NRI routing in case SGSN is down

Added by laforge almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 01/15/2021
Due date:
% Done: 100%
Spec Reference:

Related issues

Related to osmo-gbproxy - Feature #4951: more TTCN3 tests for SGSN pooling (In Progress, daniel, 01/15/2021)

Related to osmo-gbproxy - Bug #4897: gbproxy2: Re-introduce handling of NS_AFF_CAUSE_FAILURE (New, daniel, 12/08/2020)

Actions #1

Updated by laforge almost 2 years ago

  • Subject changed from ix NRI routing in case SGSN is down to Fix NRI routing in case SGSN is down
  • Priority changed from Normal to High

I believe the NAS Node Selection Function must take into account whether or not the given SGSN is currently available.

Let's assume we have two SGSNs in the pool:
  • SGSN 0 serves NRI 3
  • SGSN 1 serves NRI 4

Now assume SGSN 0 has an outage.

Any traffic without a TLLI, or with a TLLI that maps to the NULL NRI, will now trigger the selection function. We currently choose any configured SGSN, unless it is administratively disabled with "no allow-attach". However, we do not check whether the given PTP-BVC at that SGSN is currently available, so such traffic can still be sent to the failed SGSN 0.
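The check proposed here can be sketched roughly as follows. This is a minimal illustration in Python (osmo-gbproxy itself is written in C); the `Sgsn` class, its fields and `select_sgsn` are hypothetical names, not the osmo-gbproxy API, and round-robin is only an assumed selection policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass
class Sgsn:
    """Hypothetical model of one pool member as seen by the proxy."""
    name: str
    nris: FrozenSet[int]          # NRI values served by this SGSN
    allow_attach: bool = True     # False models "no allow-attach"
    bvc_available: bool = True    # False models a PTP-BVC that is down


def select_sgsn(pool: Sequence[Sgsn], attempt: int) -> Optional[Sgsn]:
    """Pick an SGSN for traffic without a TLLI or with the NULL NRI.

    Only members that are administratively allowed AND currently
    reachable are candidates; `attempt` spreads load round-robin.
    """
    candidates = [s for s in pool if s.allow_attach and s.bvc_available]
    if not candidates:
        return None  # nothing usable; the caller has to drop or reject
    return candidates[attempt % len(candidates)]


# The scenario from this ticket: SGSN 0 (NRI 3) has an outage.
pool = [
    Sgsn("SGSN 0", frozenset({3}), bvc_available=False),
    Sgsn("SGSN 1", frozenset({4})),
]
chosen = [select_sgsn(pool, i).name for i in range(4)]
```

With the availability check in place, every selection in this scenario lands on SGSN 1; without it, roughly half of the new attaches would be sent to the SGSN that is down.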

Actions #2

Updated by laforge almost 2 years ago

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?
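To make the question concrete, here is a hedged sketch of NRI-based routing with a fallback. It assumes the TS 23.236 layout in which the most significant bit of the NRI is bit 23 of the P-TMSI (and hence of a local or foreign TLLI derived from it), and an NRI length of 6 bits chosen only for the example. `nri_from_tlli` and `route_by_nri` are illustrative names, not osmo-gbproxy functions.

```python
from typing import Dict, Optional, Set


def nri_from_tlli(tlli: int, nri_bitlen: int) -> int:
    """Extract the NRI, assuming its MSB sits at bit 23 of the TLLI."""
    if not 1 <= nri_bitlen <= 10:
        raise ValueError("NRI length must be 1..10 bits")
    return (tlli >> (24 - nri_bitlen)) & ((1 << nri_bitlen) - 1)


def route_by_nri(nri: int, nri_owner: Dict[int, str],
                 available: Set[str]) -> Optional[str]:
    """Route to the SGSN owning `nri`, else fall back to another one.

    The fallback SGSN has no context for this MS, so the expectation
    discussed here is that it answers with an error and the MS
    re-attaches.
    """
    owner = nri_owner.get(nri)
    if owner in available:
        return owner
    # Owner unknown or down: pick any available SGSN. sorted() is used
    # only to keep this sketch deterministic.
    return sorted(available)[0] if available else None


nri_owner = {3: "SGSN 0", 4: "SGSN 1"}
tlli = 0xC0000000 | (3 << 18)      # local TLLI carrying NRI 3 (6-bit NRI)
nri = nri_from_tlli(tlli, nri_bitlen=6)
normal = route_by_nri(nri, nri_owner, {"SGSN 0", "SGSN 1"})
outage = route_by_nri(nri, nri_owner, {"SGSN 1"})
```

In normal operation the message goes to SGSN 0; during the outage it is handed to SGSN 1 instead of being sent to a dead link.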

Actions #3

Updated by laforge almost 2 years ago

  • Related to Feature #4951: more TTCN3 tests for SGSN pooling added
Actions #4

Updated by laforge almost 2 years ago

laforge wrote:

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?

daniel, lynxis: any feedback on this one? Any ideas? What is the expected behavior in your understanding of the specs?

Actions #5

Updated by daniel almost 2 years ago

  • Related to Bug #4897: gbproxy2: Re-introduce handling of NS_AFF_CAUSE_FAILURE added
Actions #6

Updated by daniel almost 2 years ago

laforge wrote:

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?

Yeah, I believe it will work just like you describe. I don't really see how SGSN pooling can help with this sort of failure (other than offering a new SGSN to reconnect to).

If an outage can be planned in advance, you should use the load-redistribution function. For that you would:
  • Mark the SGSN as "no allow-attach" in gbproxy so new connections ignore this SGSN
  • Have the SGSN reallocate its NULL-NRI on (periodic) RA update, set the update timer to its minimum value and force the MS to standby

See https://projects.sysmocom.de/attachments/download/4350/SGSNs_in_Pool.pdf (pg. 8-10)

That way all MS currently on that SGSN will slowly migrate away and the SGSN can be taken offline.

But you are right that gbproxy currently doesn't correctly handle the case where an SGSN is down (e.g. because NS failed).
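The planned-outage procedure above can be modelled in a few lines. This is a toy sketch under stated assumptions: `NULL_NRI = 0` is an arbitrary example value (the real NULL-NRI is a deployment setting), and `drain_step` / `reselect_step` are hypothetical helpers that stand in for the SGSN's RA update handling and the proxy's selection function.

```python
from typing import Dict, List

NULL_NRI = 0  # assumption: example value only


def drain_step(ms_nri: Dict[str, int], draining_nri: int) -> Dict[str, int]:
    """RA update at the draining SGSN: it hands out the NULL-NRI."""
    return {ms: NULL_NRI if nri == draining_nri else nri
            for ms, nri in ms_nri.items()}


def reselect_step(ms_nri: Dict[str, int],
                  attach_nris: List[int]) -> Dict[str, int]:
    """Next uplink message: the proxy sees the NULL-NRI and selects an
    SGSN that still allows attach; that SGSN assigns one of its NRIs."""
    if not attach_nris:
        raise ValueError("no SGSN left that allows attach")
    result: Dict[str, int] = {}
    moved = 0
    for ms, nri in ms_nri.items():
        if nri == NULL_NRI:
            result[ms] = attach_nris[moved % len(attach_nris)]
            moved += 1
        else:
            result[ms] = nri
    return result


# SGSN 0 (NRI 3) is marked "no allow-attach"; only NRI 4 accepts attaches.
subscribers = {"ms-a": 3, "ms-b": 4, "ms-c": 3}
after_rau = drain_step(subscribers, draining_nri=3)
migrated = reselect_step(after_rau, attach_nris=[4])
```

After both steps no MS carries NRI 3 any more, which is the point at which the drained SGSN could be taken offline.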

Actions #7

Updated by daniel almost 2 years ago

  • Assignee set to daniel
Actions #8

Updated by daniel over 1 year ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20
Actions #9

Updated by daniel over 1 year ago

  • % Done changed from 20 to 60

TTCN3 test: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24442
It fails without the osmo-gbproxy fix.

Fix for osmo-gbproxy: https://gerrit.osmocom.org/c/osmo-gbproxy/+/24443

Actions #10

Updated by daniel over 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 60 to 100

The patches are merged, and the new test should pass on the next run.
