Feature #1969: add IND for proper UMTS auth resync - OsmoHLR - Open Source Mobile Communications

Feature #1969

closed

add IND for proper UMTS auth resync

Added by neels over 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: High
Assignee: neels
Target version:
Start date: 03/07/2017
Due date:
% Done: 100%
Spec Reference:

Description

Each USIM has an IND value: the number of least-significant SQN bits that lie "below" the synchronized part of the SQN,
i.e. we need to flip at least the (IND+1)th bit when doing an AUTS resync. See 3GPP TS 33.102.
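
To illustrate the bit layout, here is a minimal sketch that splits an SQN into its SEQ part (upper bits) and IND part (lowest IND bits), and computes a resync value of MS.SQN + 2^IND as named in the related bug #1968. The helper names are hypothetical and not libosmocore API; `ind_bitlen` stands for the USIM's IND value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not libosmocore API.
 * ind_bitlen is the USIM's IND value, i.e. how many low SQN bits form
 * the index part. The SQN is 48 bits wide, so IND must be smaller. */

/* SEQ part: the SQN bits above the IND bits. */
static inline uint64_t sqn_seq(uint64_t sqn, unsigned int ind_bitlen)
{
	assert(ind_bitlen < 48);
	return sqn >> ind_bitlen;
}

/* IND part: the lowest ind_bitlen bits of the SQN. */
static inline uint64_t sqn_ind(uint64_t sqn, unsigned int ind_bitlen)
{
	assert(ind_bitlen < 48);
	return sqn & ((UINT64_C(1) << ind_bitlen) - 1);
}

/* SQN to continue from after an AUTS resync: MS.SQN + 2^IND.
 * This changes SEQ and leaves the IND bits untouched. */
static inline uint64_t sqn_after_resync(uint64_t sqn_ms, unsigned int ind_bitlen)
{
	assert(ind_bitlen < 48);
	return sqn_ms + (UINT64_C(1) << ind_bitlen);
}
```

With IND == 5, an SQN of 67 splits into SEQ 2 and IND 3; the resync value is 99, i.e. SEQ 3 with the same IND 3.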

To be able to do that, OsmoHLR needs to tell osmo_auth_gen_vec_auts() the USIM's IND value.
Each USIM can technically be configured with a different IND, so add an IND column in the auc_3g table.

A default for IND seems to be 5.

Related issues

Updated by neels over 7 years ago

Related to Bug #1968: upon auth resync with osmo_auth_gen_vec_auts(), use MS.SQN + (2 ^ IND) added

Updated by neels over 7 years ago

Basically all we need to do to get sysmoUSIM-SJS1 working with OsmoHLR is to have a default SQN of 32 in the database.
See #1965-13.

However, this does not mean that all other USIMs behave in the same way.

Updated by neels over 7 years ago

neels wrote:

Basically all we need to do to get sysmoUSIM-SJS1 working with OsmoHLR is to have a default SQN of 32 in the database.
See #1965-13.

Rather, see #1965-17 through #1965-20.
We need to increment SEQ (== increment SQN by (1<<IND)) for each tuple dealt out.
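
A minimal sketch of that increment, assuming a hypothetical helper that hands out SQNs for a batch of auth tuples (not the libosmocore implementation). Whether the stored SQN is used before or after the increment is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not libosmocore API: produce n SQNs for n auth
 * tuples. *sqn is the stored SQN state. Each tuple advances SEQ by one,
 * which equals adding (1 << ind_bitlen) to the SQN; the IND bits are
 * left as they are. This sketch increments first, then uses the value. */
static inline void sqn_deal_tuples(uint64_t *sqn, unsigned int ind_bitlen,
				   uint64_t *out, size_t n)
{
	size_t i;

	assert(ind_bitlen < 48);
	for (i = 0; i < n; i++) {
		*sqn += UINT64_C(1) << ind_bitlen;
		out[i] = *sqn;
	}
}
```

Starting from a stored SQN of 32 with IND == 5 (SEQ 1), three tuples would get the SQNs 64, 96 and 128.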

Updated by laforge over 7 years ago

Related to Support #1965: use sysmoUSIM-SJS1 with 3G OsmoMSC added

Updated by neels over 7 years ago

Status changed from New to In Progress
Priority changed from Normal to High

Updated by neels over 7 years ago

Has duplicate Feature #1970: separate SQN for CS and PS domain added

Updated by neels over 7 years ago

We also need to set the IND index for each generated auth tuple according to which HLR client we're generating it for,
i.e. use separate IND indexes for the CS and PS domains. This is suggested in 3GPP TS 33.102 Annex C.
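
As a rough sketch of what "separate IND indexes" means for the SQN value: the SEQ part goes into the upper bits and the client's index into the lowest IND bits. The enum values and the function name are assumptions made for illustration; which index goes to which domain is arbitrary here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index assignment for illustration: one IND slot per domain. */
enum ind_idx_example {
	IND_IDX_CS = 0,
	IND_IDX_PS = 1,
};

/* Hypothetical helper, not libosmocore API: build an SQN from a SEQ value
 * and a client's IND index. The index must fit in ind_bitlen bits. */
static inline uint64_t sqn_with_ind(uint64_t seq, unsigned int ind_bitlen,
				    unsigned int client_idx)
{
	uint64_t ind_mask;

	assert(ind_bitlen < 48);
	ind_mask = (UINT64_C(1) << ind_bitlen) - 1;
	assert(client_idx <= ind_mask);
	return (seq << ind_bitlen) | client_idx;
}
```

With IND == 5 and SEQ 2, the CS client would see SQN 64 and the PS client SQN 65, so tuples handed to one domain do not collide with those handed to the other.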

The following is a discussion of the various implementations for indexing the clients in the SQN's IND range. I swerve back and forth at least twice, coming to a conclusion at the end. It may be too much to read, but now that it's there I'm not going to spend time making it shorter. Feel free to skip to the end and refer back to the rest if questions remain.

The initial idea was to use a hash on the HLR client ID to derive the index.
There is some difficulty with that: even though practically all USIMs will have the same IND == 5,
technically each USIM can have a different IND value and would need a different hash range.

- We could decide to set the IND globally and disregard individual USIMs' IND values.
  It seems at this point that 5 will be the only IND value we are ever likely to see.
  OTOH if we're implementing this now, we might as well do this properly from the start.

- We could re-hash the HLR client's ID for each USIM's individual IND setting.
  This puts additional load on each auth tuple generation, and hash collisions are hard to avoid (see below).

- We could hash the HLR client's ID to e.g. a uint32_t and modulo that to the IND range per USIM.
  Hash collisions are hard to avoid.

- We could query the database for the smallest used IND value and then use that length as the global hash range.
  But the smallest used IND value could change at any time, as subscribers are added to the db.

- We could count the number of clients connected to the HLR and internally use that as the range for the hash.
  As long as all USIMs' INDs have a range at least that large, all is well.
  But the number of clients can change at any time.

- A global config item defaulting to 5 could define an IND value that is less than or equal to all
  the USIMs' INDs that are going to be used, and at the same time sufficiently large to accommodate the number
  of HLR clients that are expected to connect to the HLR. This seems to be the optimal compromise right now.

Hash collisions: I do also have the idea to add a check that tries to avoid two clients using the same IND slot,
in case they happen to end up with the same hash -- if we have 32 IND slots and two clients, it would be
clumsy to place both on the same IND slot: each time the one client uses an auth tuple, the other
client's unused auth tuples are invalidated and discarded by an AUTS; this would repeat every time the
one or the other client comes back to use an auth tuple. We would see a lot more auth tuples being generated
than necessary, most of which would be discarded unused.

The idea behind the hash is to get the same IND index even after a restart of the HLR and/or its clients.
But if we are restarting once in a while and clients end up in a different IND slot, that would cause
AUTS resync once. If we're using the same slot across clients, that would cause AUTS all the time.

Under this aspect it seems more appropriate to use the HLR's client array index as IND index instead of a hash.
If we want clients to remain on the same IND slot, we could sort the client array list alphabetically.
This would only be effective if all clients are always present in the same constellation. If at some point
a new client arrives far up in the list, all other clients' IND indexes change, causing an AUTS resync
"burst". So the list index also isn't really that elegant of a solution.

After this discussion, my plan is to have a config item defining a global IND so that optimally
nr-of-clients-range <= global-IND <= all used USIMs' IND
where the default of 5 will most likely accommodate all use cases everywhere.

I would still add an IND value per USIM in the auc_3g table, which may be larger than the
global-IND setting. This way each USIM's SQN would still increment with sufficiently large
steps, and each USIM's IND range would be guaranteed to accommodate the IND-indexes for the
clients. Should any used USIM have an IND smaller than required to accommodate a client's index,
this should produce a warning message in the log, but the HLR would still continue to operate
using a modulo-index. There could be more AUTS than necessary, but things would still work.

Why do we need the global-IND in the first place? Because then we can both hash the
HLR clients' IDs to obtain an IND-index, so that after HLR and/or client restarts each
client tends to get the same IND index right away; and we can also easily avoid index
collisions, because we know what global index range to check for. Store the hashed index
for each client upon connecting, and if an index collision occurs, just set a different
value for one of the clients (e.g. the one connecting later, or the one later in the alphabet).
This scheme thus would guarantee optimally minimal AUTS resyncing.

(conclusion:)

On the other hand, if we accept that after HLR or clients restart, the clients' indexes may
change, causing a short period of heightened AUTS activity:

- we don't need a hash
- we don't need a global-IND setting
- we don't need to check for collisions

because we can simply use the list index in the client list. The clients will get an index according
to their sequence of connecting to the HLR, probably causing a different index number after each
reconnect, in turn probably causing some "unnecessary" AUTS, but only once per subscriber, client and
reconnect. This seems to me a small price to pay for the benefits of:

- dramatically less complex code, i.e.
  - less reviewer confusion,
  - less code maintenance,
  - less bug surface,
- one less config item, i.e.
  - less user confusion,
  - again less code,
- guaranteed avoidance of index collisions (from hash collision), i.e.
  - avoiding re-AUTS-ing all subscribers constantly

With minor code we could avoid that one client disconnecting causes all other clients to
change their indexes (on each connect, look for the smallest unused index in the list,
then store index in the client struct).
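
The "smallest unused index" lookup could be sketched as follows. The client struct and function names are hypothetical and only loosely modelled on what an HLR client list might contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client record for illustration only. */
struct hlr_client {
	const char *name;
	bool connected;
	unsigned int ind_idx; /* only meaningful while connected */
};

/* Return the smallest index not used by any connected client. With n
 * clients at most n indexes are taken, so the search always terminates. */
static inline unsigned int smallest_unused_idx(const struct hlr_client *clients,
					       size_t n)
{
	unsigned int idx;
	size_t i;

	for (idx = 0; ; idx++) {
		bool taken = false;
		for (i = 0; i < n; i++) {
			if (clients[i].connected && clients[i].ind_idx == idx) {
				taken = true;
				break;
			}
		}
		if (!taken)
			return idx;
	}
}

/* On connect, pick the index once and store it in the client struct, so
 * that other clients coming and going does not change it. */
static inline void client_connect(struct hlr_client *clients, size_t n, size_t which)
{
	assert(which < n);
	clients[which].ind_idx = smallest_unused_idx(clients, n);
	clients[which].connected = true;
}
```

If three clients hold indexes 0, 1 and 2 and the middle one disconnects, the other two keep 0 and 2, and the next client to connect gets index 1.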

If we then encounter a client index too large for a given USIM's IND range, we modulo the index
to fit in the IND and print warnings in the log.
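
A minimal sketch of that fallback, assuming a hypothetical helper that takes the client's list index and the USIM's IND value. Plain `fprintf()` stands in for whatever logging the HLR would actually use.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: map a client's list index into the IND range of a
 * given USIM. If the index does not fit into 2^ind_bitlen slots, wrap it
 * with a modulo and emit a warning; more AUTS resyncs than necessary may
 * follow, but authentication keeps working. */
static inline unsigned int ind_idx_for_usim(unsigned int client_idx,
					    unsigned int ind_bitlen)
{
	unsigned int n_slots;

	assert(ind_bitlen < 16); /* sanity bound chosen for this sketch */
	n_slots = 1u << ind_bitlen;
	if (client_idx >= n_slots) {
		unsigned int wrapped = client_idx % n_slots;
		fprintf(stderr,
			"warning: client index %u exceeds IND range of %u slots, using %u\n",
			client_idx, n_slots, wrapped);
		return wrapped;
	}
	return client_idx;
}
```

With IND == 5, client index 3 is used as-is, while client index 33 wraps around to 1 and shares that slot with the client at index 1.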

So an occasional re-AUTS from a hiccup in the core network is the only drawback,
and such hiccups should be rare in a well-maintained setup. Even if not rare, it would likely
cause less re-AUTS than an index collision (arising from using a hash and some bad luck).
