Feature #4259

consider mslookup behavior more like Paging instead of "wait for the slowest node"

Added by neels 3 months ago. Updated 2 months ago.




The current mslookup design has one weak point:
To request the remote locations of a subscriber, we have to set a fixed timeout:
we wait N seconds, which must be long enough to receive all responses, and only after that timeout do we evaluate all replies and pick the youngest one (the one with the smallest age).

This has drawbacks:

a) Scalability: we always have to pick a timeout that is longer than the response time of the slowest responder.
Say there is one remote village with very large network latency: then we have to increase the mslookup timeout for everyone to leave enough time for that remote village.

b) Delay: we always have to wait the entire timeout before we can carry on connecting the call, even if the subscriber is indeed very close by.

Instead, a village that is fairly sure the subscriber is still in its local network could launch a Paging right away,
and as soon as a Paging Response has come back, it could send an mslookup response with age=0.
An age of 0 cannot be improved upon, so at that point the mslookup client can stop waiting for the timeout and carry on right away.
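
The difference between the two behaviors can be sketched with a small simulation. This is only an illustration, not the mslookup client code: the `Response` type, the `lookup()` function and the `early_exit` flag are names made up for this example, and responses are modelled as precomputed arrival times rather than real network traffic.

```python
from typing import Iterable, NamedTuple, Optional, Tuple

class Response(NamedTuple):
    arrival_s: float  # seconds after the request was sent (simulated)
    site: str         # hypothetical site name
    age_s: int        # seconds since that site last saw the subscriber

def lookup(responses: Iterable[Response], timeout_s: float,
           early_exit: bool = True) -> Tuple[Optional[Response], float]:
    """Return (youngest response, time at which the client stopped waiting)."""
    best: Optional[Response] = None
    for r in sorted(responses, key=lambda r: r.arrival_s):
        if r.arrival_s > timeout_s:
            break  # arrived after the timeout, the client has already decided
        if best is None or r.age_s < best.age_s:
            best = r
        if early_exit and r.age_s == 0:
            return best, r.arrival_s  # age 0 cannot be beaten, stop waiting
    # No age=0 reply (or early exit disabled): the full timeout is spent.
    return best, timeout_s

responses = [
    Response(0.05, "village-a", 0),
    Response(0.40, "village-b", 3600),
    Response(4.00, "remote-village", 86400),
]
fixed = lookup(responses, timeout_s=5.0, early_exit=False)
early = lookup(responses, timeout_s=5.0, early_exit=True)
```

In both cases the same site is chosen, but with `early_exit=False` the client only decides once the full 5 second timeout has elapsed, whereas with `early_exit=True` it decides as soon as the age=0 reply arrives.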

To do such a Paging, there could be a custom GSUP message that osmo-hlr sends to osmo-msc.
osmo-msc would Page without having a voice/SMS transaction ready yet, and would reply to osmo-hlr as soon as the Paging has succeeded.
osmo-msc could then keep the channel open for a given timeout until the actual voice call / SMS /... arrives.


#1 Updated by neels 3 months ago

Extrapolating that behavior, we could actually extend the mslookup client's timeout to be very long, and rely on a Paging Response (age=0) reply to end the wait early.

#2 Updated by neels 2 months ago

  • Status changed from New to Rejected

An interesting aspect of SIP routing:

What SIP can do instead is start routing the call to every responder as soon as its mslookup response comes in.
The first one to actually pick up wins. If an mslookup result is not the youngest one, we will end up paging at that site,
the subscriber will not respond there, and that call leg simply goes unanswered.
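
The "first one to pick up wins" idea can be sketched with asyncio. This is a toy model under stated assumptions, not SIP code: `ring()` and `fork_call()` are hypothetical helpers, a site where the subscriber is absent is modelled as a call leg that never answers, and all legs are started at once rather than staggered as responses arrive.

```python
import asyncio
from typing import Dict, Optional

async def ring(site: str, answer_after_s: Optional[float]) -> str:
    """Simulate one call leg; None means paging at that site never succeeds."""
    if answer_after_s is None:
        await asyncio.Event().wait()  # never set: blocks until cancelled
    else:
        await asyncio.sleep(answer_after_s)
    return site

async def fork_call(sites: Dict[str, Optional[float]],
                    give_up_s: float) -> Optional[str]:
    """Ring all sites in parallel and return the first one that answers."""
    legs = [asyncio.create_task(ring(s, d)) for s, d in sites.items()]
    done, pending = await asyncio.wait(
        legs, timeout=give_up_s, return_when=asyncio.FIRST_COMPLETED)
    for leg in pending:
        leg.cancel()  # drop the legs that did not pick up
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result() if done else None

winner = asyncio.run(fork_call({"stale-site": None, "current-site": 0.01}, 1.0))
```

Here the stale site costs only an unanswered leg; it does not delay the call to the site where the subscriber actually is.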

For that to work, we've tweaked the mslookup client so that it can either return only a single result at the end,
or use a timeout after which every incoming result is returned immediately (that timeout can also be 0, meaning results are returned as soon as they arrive).

So now we can choose a very long final failure timeout while servicing calls immediately as responses come in.
We still don't lose those far outlier sites that are taking long to respond, and we tolerate freak latency surges.
For SMS, it is not harmful to wait as much as 10 seconds.
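
The combination of an initial hold period and a long final timeout can be sketched as a simulation. The names `stream_results()`, `min_wait_s` and `final_timeout_s` are assumptions made for this example and are not taken from the mslookup client API; responses are again modelled as precomputed arrival times.

```python
from typing import Iterator, List, NamedTuple, Optional, Tuple

class Response(NamedTuple):
    arrival_s: float  # seconds after the request was sent (simulated)
    site: str         # hypothetical site name
    age_s: int        # seconds since that site last saw the subscriber

def stream_results(responses: List[Response], min_wait_s: float,
                   final_timeout_s: float) -> Iterator[Tuple[float, Response]]:
    """Yield (delivery time, response) pairs in the order the caller sees them."""
    held: Optional[Response] = None  # youngest response during the hold period
    flushed = False
    for r in sorted(responses, key=lambda r: r.arrival_s):
        if r.arrival_s > final_timeout_s:
            break  # beyond the final failure timeout, ignored
        if r.arrival_s <= min_wait_s:
            # Still holding: only remember the youngest response so far.
            if held is None or r.age_s < held.age_s:
                held = r
            continue
        if not flushed:
            flushed = True
            if held is not None:
                yield (min_wait_s, held)  # hold period over, deliver the best
        yield (r.arrival_s, r)  # after the hold period: deliver immediately
    if not flushed and held is not None:
        yield (min_wait_s, held)

responses = [
    Response(0.05, "village-a", 3600),
    Response(0.10, "village-b", 0),
    Response(4.00, "remote-village", 86400),
    Response(40.0, "too-late", 10),
]
immediate = list(stream_results(responses, min_wait_s=0.0, final_timeout_s=30.0))
held_1s = list(stream_results(responses, min_wait_s=1.0, final_timeout_s=30.0))
```

With `min_wait_s=0.0` every response is handed to the caller at its arrival time; with `min_wait_s=1.0` the caller first gets the youngest response of the first second and then each later response as it arrives. In both cases the slow remote site is still serviced, and only the response arriving after the final timeout is dropped.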

So it seems that this Paging behavior is already implemented by dispatching SIP calls immediately.
For SMS it won't be necessary.
If future services come up where it might be necessary, then we can reconsider.

This is an idea that we are unlikely to ever implement.
