Feature #2623

SCCP/M3UA: detect restart of osmo-msc and osmo-sgsn

Added by neels over 3 years ago. Updated 3 months ago.

Status: New
Priority: High
Assignee:
Target version: -
Start date: 11/07/2017
Due date:
% Done: 50%
Spec Reference:

Description

Connecting osmo-bsc and osmo-hnbgw to the MSC and SGSN via an OsmoSTP instance, it is currently not possible to detect that the MSC or SGSN has restarted.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

In the old days, the SCCPlite link would go down, but since now OsmoSTP is in-between and has no concept of who depends on who, no-one is notifying BSC or HNBGW that MSC or SGSN have gone down. Find out how this is intended to be solved if at all, and devise a way how osmo-bsc will restart and/or reconnect to a new MSC instance, and so forth.


Related issues

Related to OsmoSGSN - Bug #3403: osmo-sgsn doesn not connect properly with via SCCP when restarted (New, 07/17/2018)

Related to Cellular Network Infrastructure - Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected (Resolved, 08/11/2020)

Associated revisions

Revision bfc85e43 (diff)
Added by pespin 9 months ago

ctrl: Fix CTRL TRAP for {msc.X,msc_)connection_status not sent

The tx TRAP callback is triggered through a signal which is never sent in
osmo-bsc code, and never was as far as I can tell going quite far in the
logs.
In the meanwhile, the msc_connection_status was left in favour of
multi-msc msc.X.connection_status CTRL variable, so let's prepare the cb
function to work for that one too, dropping global variables which may lead
to wrong output in multi-msc environments, and simply use msc->nr==0 for
the old variable "msc_connection_status".

The signal is now triggered in a_reset when the A conn becomes connected
or disconnected. As a result, a user waiting for the disconnect event
may notice that the status may be changed with a noticeable delay, since
the A conn may be reset only due to high layer timeouts after several
repeated failures (T4, BAD_CONNECTION_THRESOLD).

Related: OS#2623
Related: OS#4701
Related: SYS#5046
Change-Id: I645d198e8e1acd0aba09d05cb3ae90443946acf8

Revision 6cb841b9 (diff)
Added by laforge 3 months ago

xua: Implement SNM availability/unavailability messaging

M3UA and SUA have one sub-protocol called [S]SNM, through which the
SG informs the ASP about certain destinations (point codes) becoming
available (DAVA) or unavailable (DUNA) in the SS7 network.

This patch adds support for
  • generating DAVA/DUNA on a SGP when the AS FSM changes to/from AS-ACTIVE
  • receiving DAVA/DUNA on an ASP and informing other "SG role" AS/ASP
  • processing DAUD from ASP received by SG, generating related DAVA/DUNA
    responses

Related: OS#2623
Change-Id: Id92be4691b0fd77598a6edb642c028bbd8c5b623

Revision 943affdd (diff)
Added by laforge 3 months ago

sccp: Notify users of point code available/unavailable

  • add N-PCSTATE.ind and N-STATE.ind definitions to SCCP user SAP
  • add minimal SCMG (SCCP Management) and LBCS (Local Broadcast)
  • generate MTP-PAUSE.ind/MTP-RESUME.ind based on received xUA DUNA/DAVA
  • generate N-PCSTATE.ind towards the local SCCP users

Change-Id: Idb799f7d7ab329ad12f07b7cbe6336da0891ae92
Related: OS#2623, OS#3403, OS#4701

History

#1 Updated by laforge over 3 years ago

  • Assignee set to laforge

neels wrote:

Connecting osmo-bsc and osmo-hnbgw to the MSC and SGSN via an OsmoSTP instance, it is currently not possible to detect that the MSC or SGSN has restarted.

The specified way to treat this is the A interface RESET procedure (and I'm sure Iu has the same?). So the MSC should perform a RESET procedure towards the BSC after it has started new, to erase all state in the BSC.

What's problematic here is that with our "dynamically accept any BSC from any point code" approach, the re-started MSC has no clue about where BSCs might be. One possible (but ugly) approach would be to simply flood this RESET to an entire range of point codes that's configurable at the MSC.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

I presume you're hinting that the "MSC config change" included a change of the MSC's point code?

One could implement the classic SCCP messages / primitives for informing the BSC that the MSC is no longer reachable at the old point code. On the MTP-level, this is an MTP-STATUS.ind from the MTP up into the SCCP stack. The SCCP stack then would use N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is (at least temporarily) gone. However, an intermittent failure of the intermediate signaling network would look exactly the same, so there would probably need to be some kind of timeout, i.e. if the MSC is not again reachable shortly after it is gone, we behave as if we received an implicit RESET.
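The timeout idea above can be modelled in a few lines. This is only a sketch, not osmo-bsc code: `PeerAvailabilityTracker`, `grace_seconds` and the polling-style `expired()` call are hypothetical names chosen for illustration, and the point code 185 in the usage note is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerAvailabilityTracker:
    """Report a point code as 'lost' only if it stays unavailable too long."""
    grace_seconds: float
    # point code -> timestamp at which it first became unavailable
    down_since: Dict[int, float] = field(default_factory=dict)

    def on_pcstate(self, point_code: int, accessible: bool, now: float) -> None:
        if accessible:
            # Peer is reachable again before the outage was reported: forget it.
            self.down_since.pop(point_code, None)
        else:
            # Keep the timestamp of the first "unavailable" indication only.
            self.down_since.setdefault(point_code, now)

    def expired(self, now: float) -> List[int]:
        """Return (and clear) point codes whose outage exceeded the grace period."""
        gone = sorted(pc for pc, since in self.down_since.items()
                      if now - since >= self.grace_seconds)
        for pc in gone:
            del self.down_since[pc]
        return gone
```

A caller would feed every availability indication into `on_pcstate()` and call `expired()` from a periodic timer; each returned point code would then be treated as if an implicit RESET had been received. A short outage of the intermediate signaling network that ends within the grace period never shows up in `expired()`.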

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

In the old days, the SCCPlite link would go down, but since now OsmoSTP is in-between and has no concept of who depends on who, no-one is notifying BSC or HNBGW that MSC or SGSN have gone down. Find out how this is intended to be solved if at all, and devise a way how osmo-bsc will restart and/or reconnect to a new MSC instance, and so forth.

Well, with an entire (routed!) signaling network between BSC and MSC, the status of the signaling link (M3UA connection) has nothing to do anymore with whether or not the MSC is reachable. Let's imagine one or multiple STPs in between: Any of them can go down, or any of the links can temporarily fail and recover without the BSC or MSC ever being down or losing their state.

So I guess the best we can do is for the BSC to detect "MSC unavailability" by timeout on any of its SCCP connections or connectionless procedures. If and when the MSC detects any message from an (unknown) BSC, the MSC will generate a RESET procedure and remove all state. Isn't this what's already happening now?
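To make the expected MSC-side behaviour concrete, here is a small model of it. It is an assumption-laden sketch and not taken from osmo-msc: `MscPeerTable`, `send_reset` and `on_reset_ack` are hypothetical names, and the state is deliberately kept in memory only so that a restart loses it.

```python
from typing import Callable, Set


class MscPeerTable:
    """In-memory record of BSCs that completed a RESET with this MSC instance."""

    def __init__(self, send_reset: Callable[[int], None]) -> None:
        self._known: Set[int] = set()
        self._send_reset = send_reset

    def on_message_from(self, bsc_point_code: int) -> bool:
        """Return True if the message can be processed normally."""
        if bsc_point_code in self._known:
            return True
        # Unknown peer: either a new BSC, or this MSC restarted and lost state.
        # Either way, start a RESET so both sides drop stale state.
        self._send_reset(bsc_point_code)
        return False

    def on_reset_ack(self, bsc_point_code: int) -> None:
        # The peer acknowledged the RESET, so both sides are in sync now.
        self._known.add(bsc_point_code)
```

Creating a fresh `MscPeerTable` stands in for an MSC restart: the first message from a previously known BSC then triggers a new RESET, which is the behaviour the paragraph above asks about.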

#2 Updated by neels over 3 years ago

laforge wrote:

The specified way to treat this is the A interface RESET procedure (and I'm sure Iu has the same?). So the MSC should perform a RESET procedure towards the BSC after it has started new, to erase all state in the BSC.

What's problematic here is that with our "dynamically accept any BSC from any point code" approach, the re-started MSC has no clue about where BSCs might be. One possible (but ugly) approach would be to simply flood this RESET to an entire range of point codes that's configurable at the MSC.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

I presume you're hinting that the "MSC config change" included a change of the MSC's point code?

Not really. I mean if the MSC changes configuration items that affect the BSC (though nothing comes to mind); actually I think I also meant that subscribers are still considered attached though the restarted MSC does not consider them attached. The point is, usually we restart programs when their "server" restarts, like when the BSC goes down, we restart the BTS and hence are sure they are in sync. If the MSC goes down, the BSC currently doesn't ever get notified, because of the above mentioned: we only do the BSSAP Reset dance when the BSC re-attaches.

One could implement the classic SCCP messages / primitives for informing the BSC that the MSC is no longer reachable at the old point code. On the MTP-level, this is an MTP-STATUS.ind from the MTP up into the SCCP stack. The SCCP stack then would use N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is (at least temporarily) gone. However, an intermittent failure of the intermediate signaling network would look exactly the same, so there would probably need to be some kind of timeout, i.e. if the MSC is not again reachable shortly after it is gone, we behave as if we received an implicit RESET.

who would trigger that, the OsmoSTP?

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

i.e. keep persistent state. That would solve it, but we would need persistent state ;)

So I guess the best we can do is for the BSC to detect "MSC unavailability" by timeout on any of its SCCP connections or connectionless procedures. If and when the MSC detects any message from an (unknown) BSC, the MSC will generate a RESET procedure and remove all state. Isn't this what's already happening now?

hmm, need to check

#3 Updated by laforge over 3 years ago

Hi Neels,

On Mon, Dec 04, 2017 at 12:11:42PM +0000, neels [REDMINE] wrote:

actually I think I also meant that subscribers are still considered
attached though the restarted MSC does not consider them attached.

This is what the RESET procedure is for.

The point is, usually we restart programs when their "server"
restarts, like when the BSC goes down, we restart the BTS and hence
are sure they are in sync.

BTS+BSC is different: They cannot function without each other. A BTS
cannot even allocate a radio channel without a BSC. So there's very
tight integration.

Between BSC and MSC it's a bit different. At least with modern features
like MOCN, you can actually have a BSC talking to multiple MSCs (even of
different operators) and there is not such a strict inter-dependency.

If the MSC goes down, the BSC currently doesn't ever get notified,
because of the above mentioned: we only do the BSSAP Reset dance when
the BSC re-attaches.

The BSC should get notified as soon as it sends the first packet to the
MSC, which triggers a RESET procedure from the MSC as it doesn't know
any state about that BSC.

This should be rather quick. The only situation in which this takes a
long time is if there's absolutely no activity from any MS.

One could implement the classic SCCP messages / primitives for
informing the BSC that the MSC is no longer reachable at the old
point code. On the MTP-level, this is an MTP-STATUS.ind from the MTP
up into the SCCP stack. The SCCP stack then would use
N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is
(at least temporarily) gone. However, an intermittent failure of
the intermediate signaling network would look exactly the same, so
there would probably need to be some kind of timeout, i.e. if the
MSC is not again reachable shortly after it is gone, we behave as if
we received an implicit RESET.

who would trigger that, the OsmoSTP?

Yes, the STP (or also the local libosmo-sigtran on the client side)
would generate such messages whenever point codes become available or
unavailable.

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

i.e. keep persistent state. That would solve it, but we would need persistent state ;)

well, this is why normally one configures the point code of each BSC
in the MSC. At that point the MSC can send a RESET after start to each
of them :)
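A rough sketch of that startup step, assuming the BSC point codes are configured as strings in the common ITU 3-8-3 notation for 14-bit point codes. `parse_pc_383`, `reset_configured_bscs` and `send_reset` are hypothetical names for illustration, and the point codes in the usage note are example values only.

```python
from typing import Callable, Iterable, List


def parse_pc_383(text: str) -> int:
    """Parse a 14-bit ITU point code written in 3-8-3 notation, e.g. '0.23.3'."""
    zone, area, sp = (int(part) for part in text.split("."))
    if not (0 <= zone < 8 and 0 <= area < 256 and 0 <= sp < 8):
        raise ValueError(f"point code out of range: {text}")
    return (zone << 11) | (area << 3) | sp


def reset_configured_bscs(configured: Iterable[str],
                          send_reset: Callable[[int], None]) -> List[int]:
    """Send one RESET to every configured BSC point code after startup."""
    sent = []
    for text in configured:
        pc = parse_pc_383(text)
        send_reset(pc)
        sent.append(pc)
    return sent
```

For example, `reset_configured_bscs(["0.23.3", "0.23.5"], send_reset)` would call `send_reset` once for each of the two point codes. No persistent state is needed because the list comes from the configuration file.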

So I guess the best we can do is for the BSC to detect "MSC
unavailability" by timeout on any of its SCCP connections or
connectionless procedures. If and when the MSC detects any message
from an (unknown) BSC, the MSC will generate a RESET procedure and
remove all state. Isn't this what's already happening now?

hmm, need to check

yes, if that's not the case we're definitely broken.

#4 Updated by laforge over 2 years ago

  • Related to Bug #3403: osmo-sgsn doesn not connect properly with via SCCP when restarted added

#5 Updated by laforge almost 2 years ago

  • Priority changed from High to Urgent

#6 Updated by laforge over 1 year ago

  • Assignee changed from laforge to neels
  • Priority changed from Urgent to High

ticket should have been moved back to neels after my response 5 months ago

#7 Updated by neels 9 months ago

  • Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added

#8 Updated by laforge 3 months ago

  • % Done changed from 0 to 50

I've implemented notification of SCCP users via the SCCP-User SAP in https://gerrit.osmocom.org/c/libosmo-sccp/+/22778

This means that every time a [remote] point code becomes available or unavailable, the SCCP user application (MSC, BSC, SGSN, SMLC, ...) should now receive a N-PCSTATE.ind stating the "affected point code" and an sp_status of either OSMO_SCCP_SP_S_INACCESSIBLE or OSMO_SCCP_SP_S_ACCESSIBLE.

This indication should then be used by the applications to trigger whatever internal logic that needs to happen if a remote point-code disappears or re-appears. I'll leave that part to @neels. I'll just implement some tests to ensure we get those notifications as expected now.
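As an illustration of what such application logic might look like on the BSC side, here is a self-contained model. It does not use the libosmo-sccp API: `SpStatus` only mirrors the names of the two sp_status values mentioned above (the numeric values are placeholders), and `MscPool`, `on_pcstate_ind` and the two action names are hypothetical. Which actions are actually appropriate is exactly the part left open in this ticket.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class SpStatus(Enum):
    # Names mirror the sp_status values mentioned above; the numeric
    # values are placeholders, not the ones defined in libosmo-sccp.
    INACCESSIBLE = auto()
    ACCESSIBLE = auto()


class MscPool:
    """Hypothetical BSC-side view of its MSCs, keyed by remote point code."""

    def __init__(self, msc_point_codes: Dict[int, int]) -> None:
        # remote point code -> MSC number
        self._by_pc = dict(msc_point_codes)
        self.usable: Dict[int, bool] = {nr: True for nr in self._by_pc.values()}
        # Log of actions the application would take (assumed, for illustration).
        self.actions: List[Tuple[str, int]] = []

    def on_pcstate_ind(self, affected_pc: int, status: SpStatus) -> None:
        nr = self._by_pc.get(affected_pc)
        if nr is None:
            return  # the affected point code is not one of our MSCs
        if status is SpStatus.INACCESSIBLE:
            self.usable[nr] = False
            self.actions.append(("release-connections", nr))
        else:
            self.usable[nr] = True
            self.actions.append(("start-reset", nr))
```

Mapping the affected point code back to a specific MSC number is what lets a BSC with several MSCs react only for the peer that actually disappeared or re-appeared.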
