Bug #5023: race condition in NS over IP sub-network
Status: closed
% Done: 90%
Description
- for the FR transport layer, TS 48.016 specifies very clearly the ALIVE/DEAD state of an NS-VC in the context of the block/unblock/reset procedures
- for IP transport with IP-SNS, it makes no similar specification
- our current assumption is that after SNS-CONFIG, we create all NS-VCs as DEAD, start the test/alive procedure, and mark them as ALIVE once we see an NS-ALIVE-ACK.
Furthermore, TS 48.016 specifies that each side can start the alive/test procedure at any time, independent of each other. However, it does not specify if the ALIVE/DEAD state should be kept separately on each side, or if that state is presumed to be global.
This leads to problems in our TTCN-3 tests:
- SNS-CONFIG completes
- gbproxy sends NS-ALIVE to SGSN
- SGSN responds with NS-ALIVE-ACK
- NS-VC now assumed as ALIVE on gbproxy side
- gbproxy sends NS-UNITDATA to SGSN
- SGSN considers the NS-VC still as dead, as it did not yet (start or) complete the NS-ALIVE procedure
- SGSN returns NS-STATUS
In the above example "SGSN" is a simulated SGSN in our TTCN-3 test suite, but this could just as well happen with a real SGSN.
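The asymmetry above can be made concrete with a minimal Python sketch (class and method names are hypothetical, not the actual NS_Emulation code): each side keeps its own per-NS-VC alive state, so gbproxy's view and the SGSN's view diverge until both have run their own test procedure.

```python
from enum import Enum

class State(Enum):
    DEAD = "DEAD"
    ALIVE = "ALIVE"

class NsVc:
    """One side's view of an NS-VC (hypothetical model, not osmocom code)."""
    def __init__(self):
        self.state = State.DEAD  # assumed DEAD right after SNS-CONFIG

    def rx_alive_ack(self):
        # our own NS-ALIVE was acknowledged -> test procedure complete
        self.state = State.ALIVE

    def rx_unitdata(self):
        # peer sent user data; accept only if WE consider the VC alive
        if self.state == State.ALIVE:
            return "NS-UNITDATA accepted"
        return "NS-STATUS (cause: NS-VC dead)"

# Replay the sequence from the report: each side keeps separate state.
gbproxy, sgsn = NsVc(), NsVc()
gbproxy.rx_alive_ack()          # gbproxy's NS-ALIVE was ACKed -> ALIVE locally
print(gbproxy.state.value)      # ALIVE
print(sgsn.rx_unitdata())       # SGSN has not completed its own procedure yet
```

The last call returns NS-STATUS even though gbproxy legitimately considers the NS-VC ALIVE, which is exactly the race described above.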
We have the following options:
1. consider receiving an NS-ALIVE from a peer as an indication that the link must be ALIVE, irrespective of what our own alive/test procedures say
   - that's a bit of a hack, IMHO
2. consider all NS-VCs after SNS-CONFIG to be ALIVE, and only mark them DEAD after the test procedure fails NS-ALIVE-RETRIES times
   - this seems to be one possible reading of TS 48.016, as it doesn't specify whether NS-VCs are considered ALIVE or DEAD when SNS-CONFIG completes
3. always have some delay after SNS-CONFIG in the SGSN
   - doesn't solve the problem if a third-party SGSN has no such delay
4. don't wait only for a locally-originated NS-ALIVE procedure to succeed, but also wait until a first NS-ALIVE from the peer has been received, before considering an NS-VC as ALIVE
To me, option 2 seems most attractive so far.
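Option 2 can be sketched in a few lines of Python (hypothetical names; this is an illustration of the reading of TS 48.016 described above, not the merged fix): the NS-VC is usable immediately after SNS-CONFIG, and only transitions to DEAD after the test procedure has failed NS-ALIVE-RETRIES times.

```python
class NsVcOption2:
    """Sketch of option 2: ALIVE right after SNS-CONFIG, DEAD only after
    repeated test-procedure failures (hypothetical model)."""
    NS_ALIVE_RETRIES = 3

    def __init__(self):
        self.alive = True       # ALIVE as soon as SNS-CONFIG completes
        self.failed_tests = 0

    def alive_timeout(self):
        # called each time an NS-ALIVE goes unanswered
        self.failed_tests += 1
        if self.failed_tests >= self.NS_ALIVE_RETRIES:
            self.alive = False

    def rx_alive_ack(self):
        # any successful test procedure resets the failure counter
        self.failed_tests = 0
        self.alive = True

vc = NsVcOption2()
print(vc.alive)                 # True: no race window after SNS-CONFIG
for _ in range(NsVcOption2.NS_ALIVE_RETRIES):
    vc.alive_timeout()
print(vc.alive)                 # False after NS-ALIVE-RETRIES failures
```

This removes the race window entirely, at the cost of briefly accepting traffic on a VC whose peer may in fact be unreachable, until the retries exhaust.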
Related issues
Updated by laforge about 3 years ago
lol. Actually, NS_Emulation.ttcn already sets the status to ALIVE_UNBLOCKED after the SNS-CONFIG-ACK is received.
However, as the NS_Emulation has multiple components, and SNS is processed in the NS_CT while the alive/dead state is in a per-NSVC component, there is some concurrency happening:
- the NsCtrlRequest:StartAliveProcedure message from NS_CT to NSVC_CT takes longer to arrive than the NS-UNITDATA from the BSS (gbproxy in this case).
So it's a very classic race condition.
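The intra-emulation race can be reproduced deterministically with a single message queue (a Python sketch with hypothetical message names, standing in for the TTCN-3 component ports): the NS-UNITDATA from the fast UDP path lands in the NSVC component's inbox before the internal StartAliveProcedure notification does.

```python
from collections import deque

# Inbox of the per-NSVC component. The peer's NS-UNITDATA arrives before
# the internal state-change message from NS_CT (the race from the report).
inbox = deque()
inbox.append(("peer", "NS-UNITDATA"))           # fast UDP path, arrives first
inbox.append(("NS_CT", "StartAliveProcedure"))  # internal message, arrives later

alive = False
log = []
while inbox:
    sender, msg = inbox.popleft()
    if msg == "StartAliveProcedure":
        alive = True  # simplification: the alive procedure succeeds instantly
    elif msg == "NS-UNITDATA":
        log.append("forwarded" if alive else "NS-STATUS")

print(log)  # ['NS-STATUS']: the user data beat the state change
```

Swapping the two `append` calls yields `['forwarded']`, which is why the failure is timing-dependent rather than deterministic in the real test suite.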
Updated by laforge about 3 years ago
- Related to Bug #4974: gbproxy-ttcn3-test over framerelay are unstable added
Updated by laforge about 3 years ago
An ideal solution would be to:
- stop processing incoming messages from the NS/UDP port once we have received one NS message
- do whatever processing in whatever number of components
- only re-enable processing of incoming messages on the port once any potential state changes have settled
However, this is not possible in TTCN-3. One can halt() the port, but any subsequent start() will clear all pending incoming messages.
An alternative would be to use a procedure port between NSVC_CT and NS_CT. This would allow us to "call" into NS_CT, and once it is done, perform any state changes before we process the next incoming NS/UDP message. However, that doesn't work as there are many NSVC components, and all of them would have to update their state, not just the one through which the state-changing SNS PDU was received.
Updated by laforge about 3 years ago
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/22899 contains a first work-around.
Updated by laforge about 3 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 90
Patch merged.