Bug #4312

GSUP keepalives / connection loss detection

Added by neels about 2 months ago.

Status: New
Priority: High
Assignee: -
Target version: -
Start date: 12/06/2019
Due date:
% Done: 0%


Description

With an unreliable back-haul mesh between villages, the GSUP connection
cannot be considered reliable either. We would expect to see TCP stalls
due to packet loss, etc.

Have you considered this in your implementation, and/or done any testing
on simulated lossy networks, to ensure that we properly use either TCP
keepalives or IPA application-level PING/PONG to detect lost connections
and recover from such situations (by closing the old connection and
re-establishing it)?
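
For illustration, a minimal sketch of what enabling TCP keepalives on a client socket looks like, written in Python rather than in the project's own code. The function name and the timer values are assumptions chosen for the example; the TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT options are platform-specific (available on Linux), so they are only set where the platform exposes them.

```python
import socket


def make_keepalive_socket(idle_s=10, interval_s=5, probes=3):
    """Return an unconnected TCP socket with keepalive probing enabled.

    Hypothetical helper; the timer values are illustrative, not recommended
    settings for any particular deployment.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Generic switch: without this the kernel never sends keepalive probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timers; skipped silently on platforms that lack them.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With timers like these, a silently dead peer would be noticed after roughly idle_s + interval_s * probes seconds. Note that keepalive probes only cover an idle connection; data stuck in retransmission is governed by separate TCP timeouts.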

Unreliable networks are easy to simulate with Linux's built-in 'tc netem',
which provides configurable packet loss, latency and jitter.
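
As a rough sketch of how a test script might drive netem, the helper below only builds the argument list for a 'tc qdisc add ... netem' call; it does not run anything. The function name, the interface name and the numeric values are assumptions for the example.

```python
def netem_command(dev, loss_pct=None, delay_ms=None, jitter_ms=None):
    """Build the argv for attaching a netem qdisc to the root of `dev`.

    Hypothetical helper: it only assembles the command, so it can be
    inspected or logged before being executed with root privileges.
    """
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        # netem takes jitter as an optional second value after the delay.
        if jitter_ms is not None:
            cmd.append(f"{jitter_ms}ms")
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True) as root, e.g. netem_command("eth0", loss_pct=5, delay_ms=100, jitter_ms=20). The impairment is removed again with 'tc qdisc del dev eth0 root'.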

I also saw some comments / code related to "if a second connection using
the same IPA ID arrives, we're screwed" (paraphrasing here). I would
expect this not to be uncommon even if every MSC/HLR out there is
configured correctly, precisely because e.g. the remote MSC/HLR has
already decided that the TCP/GSUP connection is dead and starts to
reconnect after performing a local-end release, while the "local" MSC/HLR
still thinks the old connection is alive. If the old connection "wins"
(i.e. is preferred), I see potential trouble here.
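
To make the concern concrete, here is a small sketch of one possible policy, "the newest connection wins", written in Python. It is not a description of how the existing code behaves; PeerRegistry, FakeConn and all method names are hypothetical.

```python
class FakeConn:
    """Stand-in for a real connection object (illustration only)."""

    def __init__(self, label):
        self.label = label
        self.closed = False

    def close(self):
        self.closed = True


class PeerRegistry:
    """Maps an IPA ID to the connection currently considered live."""

    def __init__(self):
        self._by_id = {}

    def register(self, ipa_id, conn):
        """Accept `conn` for `ipa_id`, closing any older connection."""
        old = self._by_id.get(ipa_id)
        if old is not None and old is not conn:
            # Assumption: the peer reconnected because it gave up on the
            # old connection, so the old one is stale and gets closed.
            old.close()
        self._by_id[ipa_id] = conn
        return old

    def unregister(self, ipa_id, conn):
        """Drop `conn`, but only if it is still the current entry."""
        # A late close event for the old connection must not evict the
        # replacement that has already taken over the same IPA ID.
        if self._by_id.get(ipa_id) is conn:
            del self._by_id[ipa_id]

    def lookup(self, ipa_id):
        return self._by_id.get(ipa_id)
```

The identity check in unregister() is the part that matters for the race described above: the teardown of the stale connection can arrive after the new one has registered.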

Cases like that probably warrant some carefully designed tests that
reproduce exactly those situations.

Goals:
a) ensuring that keepalive on either TCP or IPA is enabled and works, and
b) creating situations where the same peer establishes a second new connection
while the old one is still not torn down (timeout not expired yet, FIN packets
lost, ...)

(Keeping as one issue because these aspects are tightly related...)
