Bug #4952

Fix NRI routing in case SGSN is down

Added by laforge 9 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 01/15/2021
Due date:
% Done: 100%
Spec Reference:

Related issues

Related to osmo-gbproxy - Feature #4951: more TTCN3 tests for SGSN pooling (In Progress, 01/15/2021)

Related to osmo-gbproxy - Bug #4897: gbproxy2: Re-introduce handling of NS_AFF_CAUSE_FAILURE (New, 12/08/2020)

Associated revisions

Revision 8d9fcf44 (diff)
Added by daniel 5 months ago

gbproxy: Test routing if an SGSN in a pool is down

If an SGSN in a pool is down we expect the messages to instead be sent
to a different SGSN in the pool. That SGSN will not necessarily know
what to do with those messages, but it should (implicitly) detach that
UE so that it can reattach at the new SGSN. Otherwise UEs on a failed
SGSN would simply stop working as the messages would never be forwarded
anywhere.

This test also adjusts the NS timers so the failed NSVCs are detected
faster.

Change-Id: I46a6b8082441843f428a7681566228e5de375bcb
Related: OS#4952

Revision 37518b34 (diff)
Added by daniel 5 months ago

Don't route messages to an SGSN if it is down

If an SGSN in a pool is down we expect the messages to instead be sent
to a different SGSN in the pool. That SGSN will not necessarily know
what to do with those messages, but it should (implicitly) detach that
UE so that it can reattach at the new SGSN. Otherwise UEs on a failed
SGSN would simply stop working as the messages would never be forwarded
anywhere.

Fixes: OS#4952
Change-Id: I3f794659866e1f31496a39ca631b3b042a60aa27

Revision 87c03a40 (diff)
Added by daniel 5 months ago

ttcn3-gbproxy-test*: Update gbproxy NS timers

Change-Id: I7d436327bb57a3f6c6b071c28308c8a74561d93c
Related: OS#4952

History

#1 Updated by laforge 9 months ago

  • Subject changed from ix NRI routing in case SGSN is down to Fix NRI routing in case SGSN is down
  • Priority changed from Normal to High

I believe the NAS Node Selection Function must take into account whether or not the given SGSN is currently available.

Let's assume we have two SGSNs in the pool:
  • SGSN 0 serves NRI 3
  • SGSN 1 serves NRI 4

Now assume SGSN 0 has an outage.

Any traffic without TLLI or with TLLI mapping to the NULL NRI will now trigger the selection function. We currently choose any configured SGSN, unless it is administratively disabled with "no allow-attach". However, we do not check if the given PTP-BVC at that SGSN is currently available or not.
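A minimal sketch of the missing check, written in Python rather than gbproxy's C. The `Sgsn` class, its fields, and `select_sgsn` are hypothetical names for illustration and do not mirror osmo-gbproxy's data structures; the choice among candidates is simply the first match here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sgsn:
    """Hypothetical model of one pool member (not gbproxy's real struct)."""
    name: str
    nri: int
    allow_attach: bool = True  # False models "no allow-attach"
    available: bool = True     # False models the PTP-BVC towards it being down


def select_sgsn(pool: List[Sgsn], check_available: bool = True) -> Optional[Sgsn]:
    """Pick an SGSN for traffic without TLLI or with a NULL-NRI TLLI.

    With check_available=False only the administrative flag is honoured,
    which is the behaviour reported in this issue.
    """
    for sgsn in pool:
        if not sgsn.allow_attach:
            continue
        if check_available and not sgsn.available:
            continue
        return sgsn
    return None


pool = [Sgsn("SGSN 0", nri=3), Sgsn("SGSN 1", nri=4)]
pool[0].available = False  # SGSN 0 has an outage

buggy = select_sgsn(pool, check_available=False)
fixed = select_sgsn(pool)
print(buggy.name)  # SGSN 0
print(fixed.name)  # SGSN 1
```

Without the availability check, new attaches can land on the SGSN that is down; with it, they go to SGSN 1.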

#2 Updated by laforge 9 months ago

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?
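The routing option raised here can be sketched as follows. This assumes the NRI sits in the TLLI/P-TMSI with its most significant bit at bit 23, as 3GPP TS 23.236 specifies, and a hypothetical NRI length of 10 bits. `nri_from_tlli`, `route`, and the dictionary layout are illustrative names, not gbproxy code.

```python
from typing import Dict, Optional

NRI_BITLEN = 10  # assumption; the NRI length is configurable per pool


def nri_from_tlli(tlli: int, bitlen: int = NRI_BITLEN) -> int:
    """Extract the NRI, whose most significant bit is bit 23."""
    lowest_bit = 23 - (bitlen - 1)
    return (tlli >> lowest_bit) & ((1 << bitlen) - 1)


def route(tlli: int, nri_to_sgsn: Dict[int, str], up: Dict[str, bool]) -> Optional[str]:
    """Return the name of the SGSN a message with this TLLI is sent to."""
    owner = nri_to_sgsn.get(nri_from_tlli(tlli))
    if owner is not None and up.get(owner, False):
        return owner
    # Owner unknown or down: fall back to any pool member that is up.
    for name in nri_to_sgsn.values():
        if up.get(name, False):
            return name
    return None


nri_to_sgsn = {3: "SGSN 0", 4: "SGSN 1"}
# Local TLLI (top two bits set) carrying NRI 3 in bits 23..14.
tlli = 0xC0000000 | (3 << 14) | 0x0123

print(nri_from_tlli(tlli))                                          # 3
print(route(tlli, nri_to_sgsn, {"SGSN 0": True, "SGSN 1": True}))   # SGSN 0
print(route(tlli, nri_to_sgsn, {"SGSN 0": False, "SGSN 1": True}))  # SGSN 1
```

In the last call SGSN 1 receives traffic for an MS it does not know, which is the point where it would have to reject the MS so that it re-attaches.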

#3 Updated by laforge 9 months ago

  • Related to Feature #4951: more TTCN3 tests for SGSN pooling added

#4 Updated by laforge 9 months ago

laforge wrote:

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?

daniel, lynxis: any feedback on this one? Any ideas? What is the expected behavior in your understanding of the specs?

#5 Updated by daniel 9 months ago

  • Related to Bug #4897: gbproxy2: Re-introduce handling of NS_AFF_CAUSE_FAILURE added

#6 Updated by daniel 9 months ago

laforge wrote:

Another interesting question is what is supposed to happen with traffic carrying an NRI of the now-defunct SGSN pool member. If we simply route it to any other SGSN, that SGSN will not know what to do with that traffic. But at least it should then return some error to the MS, so the MS can re-attach?

Yeah, I believe it will work just like you describe. I don't really see how SGSN pooling can help with this sort of failure (other than offering a new SGSN to reconnect to).

If an outage can be planned in advance, you should use the load-redistribution function. For that you would:
  • Mark the SGSN as no allow-attach in gb-proxy so new connections ignore this SGSN
  • Have the SGSN reallocate its NULL-NRI on (periodic) RA update, set the update timer to its minimum value and force the MS to standby

See https://projects.sysmocom.de/attachments/download/4350/SGSNs_in_Pool.pdf (pg. 8-10)

That way all MS currently on that SGSN will slowly migrate away and the SGSN can be taken offline.
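The effect of the two steps on routing can be sketched like this. It is a simplified Python model with hypothetical names (`Sgsn`, `route`, `NULL_NRI`) and an assumed NULL-NRI value of 0; it only illustrates that "no allow-attach" affects selection, not traffic already bound to an NRI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sgsn:
    """Hypothetical model of one pool member."""
    name: str
    nri: int
    allow_attach: bool = True  # False models "no allow-attach"


def route(nri: Optional[int], null_nri: int, pool: List[Sgsn]) -> Optional[Sgsn]:
    """Route by NRI; select a new SGSN only for a missing or NULL NRI."""
    if nri is not None and nri != null_nri:
        for sgsn in pool:
            if sgsn.nri == nri:
                return sgsn  # existing MS stay where they are
    for sgsn in pool:
        if sgsn.allow_attach:
            return sgsn      # new or redistributed MS
    return None


NULL_NRI = 0  # hypothetical value
pool = [Sgsn("SGSN 0", nri=3, allow_attach=False), Sgsn("SGSN 1", nri=4)]

print(route(3, NULL_NRI, pool).name)         # SGSN 0: still attached there
print(route(NULL_NRI, NULL_NRI, pool).name)  # SGSN 1: after NULL-NRI reallocation
print(route(None, NULL_NRI, pool).name)      # SGSN 1: fresh attach
```

An MS keeps reaching SGSN 0 until that SGSN hands it a NULL-NRI; from then on the selection function moves it to a member that still allows attaches.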

But you are right that gbproxy currently doesn't correctly handle the case where an SGSN is down (e.g. because NS failed).

#7 Updated by daniel 8 months ago

  • Assignee set to daniel

#8 Updated by daniel 5 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 20

#9 Updated by daniel 5 months ago

  • % Done changed from 20 to 60

TTCN3 test: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24442
It fails without the osmo-gbproxy fix.

Fix for osmo-gbproxy: https://gerrit.osmocom.org/c/osmo-gbproxy/+/24443

#10 Updated by daniel 5 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 60 to 100

The patches are merged and the new test should pass on the next run.
