Bug #5023: race condition in NS over IP sub-network
Status: closed
% Done: 90%
Description
- for the FR transport layer, TS 48.016 specifies very clearly the ALIVE/DEAD state of an NS-VC in the context of the block/unblock/reset procedures
- for IP transport with IP-SNS, it makes no similar specification
- our current assumption is that after SNS-CONFIG, we create all NS-VCs as DEAD, start the test/alive procedure, and mark them as ALIVE once we see an NS-ALIVE-ACK.
Furthermore, TS 48.016 specifies that each side can start the alive/test procedure at any time, independent of each other. However, it does not specify if the ALIVE/DEAD state should be kept separately on each side, or if that state is presumed to be global.
This leads to problems in our TTCN-3 tests:
- SNS-CONFIG completes
- gbproxy sends NS-ALIVE to SGSN
- SGSN responds with NS-ALIVE-ACK
- NS-VC now assumed as ALIVE on gbproxy side
- gbproxy sends NS-UNITDATA to SGSN
- SGSN considers the NS-VC still as dead, as it did not yet (start or) complete the NS-ALIVE procedure
- SGSN returns NS-STATUS
In the above example "SGSN" is a simulated SGSN in our TTCN-3 test suite, but this could just as well happen with a real SGSN.
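The asymmetry above can be made concrete with a minimal Python sketch (class and method names are hypothetical, not the actual NS_Emulation code): each side keeps its own per-NS-VC alive state, so gbproxy's view and the SGSN's view diverge until both have run their own test procedure.

```python
from enum import Enum

class State(Enum):
    DEAD = "DEAD"
    ALIVE = "ALIVE"

class NsVc:
    """One side's view of an NS-VC (hypothetical model, not osmocom code)."""
    def __init__(self):
        self.state = State.DEAD  # assumed DEAD right after SNS-CONFIG

    def rx_alive_ack(self):
        # our own NS-ALIVE was acknowledged -> test procedure complete
        self.state = State.ALIVE

    def rx_unitdata(self):
        # peer sent user data; accept only if WE consider the VC alive
        if self.state == State.ALIVE:
            return "NS-UNITDATA accepted"
        return "NS-STATUS (cause: NS-VC dead)"

# Replay the sequence from the report: each side keeps separate state.
gbproxy, sgsn = NsVc(), NsVc()
gbproxy.rx_alive_ack()          # gbproxy's NS-ALIVE was ACKed -> ALIVE locally
print(gbproxy.state.value)      # ALIVE
print(sgsn.rx_unitdata())       # SGSN has not completed its own procedure yet
```

The last call returns NS-STATUS even though gbproxy legitimately considers the NS-VC ALIVE, which is exactly the race described above.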
We have the following options:
1. consider receiving an NS-ALIVE from a peer as an indication that the link must be ALIVE, irrespective of what our own alive/test procedures say
   - that's a bit of a hack, IMHO
2. consider all NS-VCs after SNS-CONFIG to be ALIVE, and only mark them DEAD after the test procedure fails NS-ALIVE-RETRIES times
   - this seems to be one possible reading of TS 48.016, as it doesn't specify whether NS-VCs are considered ALIVE or DEAD when SNS-CONFIG completes
3. always have some delay after SNS-CONFIG in the SGSN
   - doesn't solve the problem if a third-party SGSN has no such delay
4. don't wait only for a locally-originated NS-ALIVE procedure to succeed, but also wait until a first NS-ALIVE from the peer has been received, before considering an NS-VC as ALIVE
To me, option 2 seems most attractive so far.
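Option 2 can be sketched in a few lines of Python (hypothetical names; this is an illustration of the reading of TS 48.016 described above, not the merged fix): the NS-VC is usable immediately after SNS-CONFIG, and only transitions to DEAD after the test procedure has failed NS-ALIVE-RETRIES times.

```python
class NsVcOption2:
    """Sketch of option 2: ALIVE right after SNS-CONFIG, DEAD only after
    repeated test-procedure failures (hypothetical model)."""
    NS_ALIVE_RETRIES = 3

    def __init__(self):
        self.alive = True       # ALIVE as soon as SNS-CONFIG completes
        self.failed_tests = 0

    def alive_timeout(self):
        # called each time an NS-ALIVE goes unanswered
        self.failed_tests += 1
        if self.failed_tests >= self.NS_ALIVE_RETRIES:
            self.alive = False

    def rx_alive_ack(self):
        # any successful test procedure resets the failure counter
        self.failed_tests = 0
        self.alive = True

vc = NsVcOption2()
print(vc.alive)                 # True: no race window after SNS-CONFIG
for _ in range(NsVcOption2.NS_ALIVE_RETRIES):
    vc.alive_timeout()
print(vc.alive)                 # False after NS-ALIVE-RETRIES failures
```

This removes the race window entirely, at the cost of briefly accepting traffic on a VC whose peer may in fact be unreachable, until the retries exhaust.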
Related issues
Updated by laforge about 3 years ago
lol. Actually, NS_Emulation.ttcn already sets the status to ALIVE_UNBLOCKED after the SNS-CONFIG-ACK is received.
However, as the NS_Emulation has multiple components, and SNS is processed in the NS_CT while the alive/dead state is in a per-NSVC component, there is some concurrency happening:
- the NsCtrlRequest:StartAliveProcedure message from NS_CT to NSVC_CT takes longer to arrive than the NS-UNITDATA from the BSS (gbproxy in this case).
So it's a very classic race condition.
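The intra-emulation race can be reproduced deterministically with a single message queue (a Python sketch with hypothetical message names, standing in for the TTCN-3 component ports): the NS-UNITDATA from the fast UDP path lands in the NSVC component's inbox before the internal StartAliveProcedure notification does.

```python
from collections import deque

# Inbox of the per-NSVC component. The peer's NS-UNITDATA arrives before
# the internal state-change message from NS_CT (the race from the report).
inbox = deque()
inbox.append(("peer", "NS-UNITDATA"))           # fast UDP path, arrives first
inbox.append(("NS_CT", "StartAliveProcedure"))  # internal message, arrives later

alive = False
log = []
while inbox:
    sender, msg = inbox.popleft()
    if msg == "StartAliveProcedure":
        alive = True  # simplification: the alive procedure succeeds instantly
    elif msg == "NS-UNITDATA":
        log.append("forwarded" if alive else "NS-STATUS")

print(log)  # ['NS-STATUS']: the user data beat the state change
```

Swapping the two `append` calls yields `['forwarded']`, which is why the failure is timing-dependent rather than deterministic in the real test suite.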
Updated by laforge about 3 years ago
- Related to Bug #4974: gbproxy-ttcn3-test over framerelay are unstable added
Updated by laforge about 3 years ago
An ideal solution would be to:
- stop processing incoming messages from the NS/UDP port once we have received one NS message
- do whatever processing in whatever number of components
- only re-enable processing of incoming messages on the port once any potential state changes have settled
However, this is not possible in TTCN-3. One can halt() the port, but any subsequent start() will clear all pending incoming messages.
An alternative would be to use a procedure port between NSVC_CT and NS_CT. This would allow us to "call" into NS_CT, and once it is done, perform any state changes before we process the next incoming NS/UDP message. However, that doesn't work as there are many NSVC components, and all of them would have to update their state, not just the one through which the state-changing SNS PDU was received.
Updated by laforge about 3 years ago
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/22899 contains a first work-around.
Updated by laforge about 3 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 90
Patch merged.