Bug #4721

open

osmo-msc creates evil-twin entries in the VLR when an already attached IMSI does a LU by an unknown TMSI (was: MSC_Tests.TC_lu_by_tmsi_noauth_unknown fails sporadically locally)

Added by laforge over 3 years ago. Updated 24 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/19/2020
Due date:
% Done: 90%
Resolution:
Spec Reference:

Description

When running this test locally against a (sanitize-enabled) osmo-msc, I get sporadic failures.

The difference in the log files seems to start here:

successful case:

  Wed Aug 19 08:10:24 2020 DVLR <000e> gsm_04_08.c:1394 SUBSCR(IMSI-262420000000013:MSISDN-491230000013:TMSI-0x01020304) VLR: update for IMSI=262420000000013 (MSISDN=491230000013)

failing case:

  Wed Aug 19 08:10:29 2020 DVLR <000e> gsm_04_08.c:1394 SUBSCR(IMSI-262420000000013:MSISDN-491230000013:TMSI-0x3BEDCD7C) VLR: update for IMSI=262420000000013 (MSISDN=491230000013) (NO CONN!)

The pcap file containing both cases is attached. The two cases look exactly identical, except that in the first (successful) case there is a LU ACCEPT immediately, while in the second (failing) case there is a LU REJECT after a timeout of 5 s.

I've seen the failure both when running a single test case, as well as when running the entire MSC_Tests.control in one batch.


Files

lu_success_and_fail.pcap (6.55 KB) laforge, 08/19/2020 06:23 AM
lu_success.log (20.8 KB) laforge, 08/19/2020 06:23 AM
lu_fail.log (9.51 KB) laforge, 08/19/2020 06:23 AM

Related issues

Related to OsmoMSC - Bug #4191: vlr.c:762 Trying to dispatch event 1 to non-existent FSM instance! (Resolved, neels, 09/06/2019)

Actions #1

Updated by laforge over 3 years ago

actually it seems that it always fails when executed the second (and any subsequent) time after an osmo-msc start. It only passes once after MSC start. So some state appears to be leaking?

Actions #2

Updated by neels over 3 years ago

I added some pointer value logging, and apparently the vlr_subscr from the first test run sticks around in the VLR.
The second test run creates a duplicate vlr_subscr for the same IMSI.
When the second test run sends the GSUP subscriber update, it gets directed to the first vlr_subscr, while the active connection is associated with the second vlr_subscr.
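The misdirected update described above can be illustrated with a toy model. This is only a sketch under assumptions: `struct toy_subscr`, `toy_find_by_imsi()` and `toy_rx_gsup_update()` are hypothetical stand-ins, not the real osmo-msc structures or API, and the "first match wins" lookup is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a VLR entry; NOT the real struct vlr_subscr. */
struct toy_subscr {
	char imsi[16];
	int has_conn;        /* 1 if an active connection is attached */
	int got_gsup_update; /* 1 once a GSUP update was delivered here */
};

/* Return the first entry matching the IMSI (assumed lookup behaviour). */
static struct toy_subscr *toy_find_by_imsi(struct toy_subscr *vlr, size_t n,
					   const char *imsi)
{
	assert(imsi != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(vlr[i].imsi, imsi) == 0)
			return &vlr[i];
	}
	return NULL;
}

/* Deliver a GSUP update addressed by IMSI.
 * Returns 1 if the receiving entry has a conn, 0 if not, -1 if no entry. */
static int toy_rx_gsup_update(struct toy_subscr *vlr, size_t n, const char *imsi)
{
	struct toy_subscr *vsub = toy_find_by_imsi(vlr, n, imsi);
	if (!vsub)
		return -1;
	vsub->got_gsup_update = 1;
	return vsub->has_conn;
}

/* Stale entry from the first test run plus the duplicate from the second. */
static int toy_misdirected_update_demo(void)
{
	struct toy_subscr vlr[2] = {
		{ "262420000000013", 0, 0 }, /* left over from first run, no conn */
		{ "262420000000013", 1, 0 }, /* duplicate holding the active conn */
	};
	return toy_rx_gsup_update(vlr, 2, "262420000000013");
}
```

In this model the update reaches the stale entry, which has no connection. That matches the "(NO CONN!)" marker in the failing log line.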

There should be all sorts of provisions to avoid duplicate vlr_subscr.
I am now trying to figure out how this evil twin vlr_subscr is possible at all.

Test suite wise the question is whether we should clear the state of osmo-msc, i.e. issue some vty command to clear the entire VLR at the start of each test.

osmo-msc stability wise we should still figure out how this can happen and fix it.

Actions #3

Updated by neels over 3 years ago

Looking at the code there apparently is a gaping hole in osmo-msc's implementation, and the case that an already attached IMSI does a LU with an unknown TMSI is not covered.

Upon LU by TMSI, a new vlr_subscr gets created with that (so far unknown) TMSI.
The VLR asks for the IMSI identity.
When the response comes back, osmo-msc should in fact look up whether this IMSI is already attached in the VLR, which it fails to do.
Instead the new vlr_subscr also gets assigned that IMSI, and hence we have an evil twin in the VLR.

This occurs because the TC_lu_by_tmsi_noauth_unknown does not do an IMSI-Detach in the end,
but it still acks the TMSI-Reallocation, after which the initial TMSI is no longer kept in the VLR.
The second test run starts with the initial TMSI again, which is then regarded as unknown...
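The sequence described in this comment can be replayed in a small toy VLR. Everything here is a hypothetical sketch: the `toy_*` names, the fixed-size table and the TMSI values are illustrative assumptions and do not reflect the actual osmo-msc code paths.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_VLR_MAX 8

/* Toy stand-in for a VLR entry; NOT the real struct vlr_subscr. */
struct toy_subscr {
	int used;
	char imsi[16]; /* empty until the Identity Response arrives */
	uint32_t tmsi;
};

struct toy_vlr {
	struct toy_subscr subscr[TOY_VLR_MAX];
};

/* LU by a TMSI the VLR does not know: create a fresh entry with only the TMSI. */
static struct toy_subscr *toy_lu_by_unknown_tmsi(struct toy_vlr *vlr, uint32_t tmsi)
{
	for (int i = 0; i < TOY_VLR_MAX; i++) {
		struct toy_subscr *vsub = &vlr->subscr[i];
		if (vsub->used)
			continue;
		memset(vsub, 0, sizeof(*vsub));
		vsub->used = 1;
		vsub->tmsi = tmsi;
		return vsub;
	}
	return NULL;
}

/* Identity Response: the IMSI is copied into the new entry WITHOUT checking
 * whether another entry already holds it. This is the gap described above. */
static void toy_rx_id_response(struct toy_subscr *vsub, const char *imsi)
{
	snprintf(vsub->imsi, sizeof(vsub->imsi), "%s", imsi);
}

static int toy_count_imsi(const struct toy_vlr *vlr, const char *imsi)
{
	int count = 0;
	for (int i = 0; i < TOY_VLR_MAX; i++) {
		if (vlr->subscr[i].used && strcmp(vlr->subscr[i].imsi, imsi) == 0)
			count++;
	}
	return count;
}

/* Two test runs without IMSI Detach in between. */
static int toy_evil_twin_demo(void)
{
	const char *imsi = "262420000000013";
	struct toy_vlr vlr;
	memset(&vlr, 0, sizeof(vlr));

	/* First run: LU by the initial TMSI, then TMSI reallocation is acked. */
	struct toy_subscr *first = toy_lu_by_unknown_tmsi(&vlr, 0x01020304);
	assert(first);
	toy_rx_id_response(first, imsi);
	first->tmsi = 0x3BEDCD7C; /* initial TMSI is no longer kept */

	/* Second run: same initial TMSI, now unknown to the VLR. */
	struct toy_subscr *second = toy_lu_by_unknown_tmsi(&vlr, 0x01020304);
	assert(second);
	toy_rx_id_response(second, imsi);

	return toy_count_imsi(&vlr, imsi);
}
```

After the second run the toy VLR holds two entries for the same IMSI, i.e. the evil twin.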

We need a separate test case playing through this scenario: attached subscr does LU with an unknown TMSI.

(btw, the failure is not at all related to the invalid TMSI that is also sent in this test case.)

Actions #4

Updated by neels over 3 years ago

  • Assignee changed from neels to laforge

It's not that trivial though: at the time of the ID Response, the subscriber may not yet be authenticated.
So anyone could come along, send an arbitrary IMSI in the ID Response, and essentially mount a DoS on the VLR state of an already attached authentic subscriber.
The pre-existing VLR state must not be affected by an unauthenticated request.

It seems that osmo-msc must keep an unvalidated duplicate vlr_subscr entry, and only ensure a single validated vlr_subscr entry at the time of successful auth.

This potentially goes pretty deep into the VLR design: so far the assumption is that at most one vlr_subscr per IMSI exists.
The GSUP response is identified by IMSI, hence it potentially has to update multiple vlr_subscr entries.
Are there other code paths that need to deal with multiple vlr_subscr for the same IMSI?

Idea: make the current vlr_subscr_find_by_imsi() return only the one entry that has passed authentication.
Code paths that deal with unauthenticated vlr_subscr could use a separate vlr_subscr_find_by_imsi2() (or so) API.
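A rough sketch of what such a split lookup could look like, on a toy data model. The names `toy_find_authed_by_imsi()` and `toy_find_all_by_imsi()` and the `authenticated` flag are hypothetical; this is not the signature of the real vlr_subscr_find_by_imsi().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a VLR entry; NOT the real struct vlr_subscr. */
struct toy_subscr {
	char imsi[16];
	int authenticated; /* assumed flag: 1 once authentication succeeded */
};

/* Default lookup: only the single entry that has passed authentication. */
static struct toy_subscr *toy_find_authed_by_imsi(struct toy_subscr *vlr,
						  size_t n, const char *imsi)
{
	assert(imsi != NULL);
	for (size_t i = 0; i < n; i++) {
		if (vlr[i].authenticated && strcmp(vlr[i].imsi, imsi) == 0)
			return &vlr[i];
	}
	return NULL;
}

/* Separate lookup for code paths that must see unauthenticated entries too.
 * Stores up to max matches in out[] and returns the total number found. */
static size_t toy_find_all_by_imsi(struct toy_subscr *vlr, size_t n,
				   const char *imsi,
				   struct toy_subscr **out, size_t max)
{
	size_t found = 0;
	assert(imsi != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(vlr[i].imsi, imsi) != 0)
			continue;
		if (found < max)
			out[found] = &vlr[i];
		found++;
	}
	return found;
}

/* One validated entry plus one unvalidated duplicate for the same IMSI. */
static int toy_lookup_demo(void)
{
	struct toy_subscr vlr[2] = {
		{ "262420000000013", 1 }, /* attached and authenticated */
		{ "262420000000013", 0 }, /* new, not yet authenticated */
	};
	struct toy_subscr *all[2];
	struct toy_subscr *authed = toy_find_authed_by_imsi(vlr, 2, "262420000000013");
	size_t total = toy_find_all_by_imsi(vlr, 2, "262420000000013", all, 2);
	return authed == &vlr[0] && total == 2;
}
```

With this split, a GSUP response identified by IMSI could iterate over all matches, while other callers keep seeing at most one entry.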

A quick solution for now could remove the previous vlr_subscr from the VLR and add the new one, in the hope that it will also authenticate later...
(considering that the new vlr_subscr already may have lu_fsm, auth_fsm etc associated on it)

So, how important is this aspect at this point in time?

Actions #5

Updated by neels over 3 years ago

  • Subject changed from MSC_Tests.TC_lu_by_tmsi_noauth_unknown fails sporadically locally to osmo-msc creates evil-twin entries in the VLR when an already attached IMSI does a LU by an unknown TMSI (was: MSC_Tests.TC_lu_by_tmsi_noauth_unknown fails sporadically locally)

all of this seemed vaguely familiar, and now I found this patch from about a year ago:
http://git.osmocom.org/osmo-msc/commit/?h=neels/vlr_evil_twin3&id=be707bf7c7e30e4b1943fe7487d84e7ed70eb1cb
That patch does not cover the DoS aspect, I guess that is why I did not submit it for review.

Actions #7

Updated by neels over 3 years ago

In the above-mentioned patch from a year ago, I made provision to also handle ID Responses during CM Service Request and Paging Response.
Adding ttcn3 tests I realize that a CM Service Request for an unknown TMSI gets rejected immediately (pending a LU from the MS).

Tests for those were still missing:

CM-Service: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/19719

Paging Response: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/19721
(this one actually uncovers a crash in osmo-msc, see #4724)

Actions #8

Updated by neels over 3 years ago

  • Related to Bug #4191: vlr.c:762 Trying to dispatch event 1 to non-existent FSM instance! added

Actions #9

Updated by laforge about 3 years ago

  • Status changed from Feedback to Stalled
  • Assignee changed from laforge to neels

neels wrote:

So, how important is this aspect at this point in time?

I think we care more about consistency of our internal state (and passing the existing test suite) than about a DoS possibility at this point. OsmoMSC is primarily used in lab / test environments, for small/private networks etc.

Probably best to separate that second part out as a separate issue?

Actions #10

Updated by neels 24 days ago

  • Status changed from Stalled to In Progress
  • % Done changed from 0 to 90

I have hit this same problem again, when testing LU by TMSI identity when 'no assign-tmsi' is configured, for a customer.

This time I have implemented a solution that discards a previous evil twin VLR entry, adopting any pending paging from it.

This fix also makes the long-failing MSC_Tests.TC_attached_imsi_lu_unknown_tmsi finally pass.
It's been three and a half years! Nice to finally catch up with this one.

https://gerrit.osmocom.org/c/osmo-msc/+/36452
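The approach described in this comment (discard the previous twin, adopt its pending paging) might look roughly like the sketch below. It is not the code from the Gerrit change: the `toy_*` names, the paging counter and the point at which the cleanup runs are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a VLR entry; NOT the real struct vlr_subscr. */
struct toy_subscr {
	int used;
	char imsi[16];
	int pending_paging; /* assumed: number of queued paging requests */
};

/* Drop every other entry with the same IMSI as new_vsub and move its
 * pending paging over to new_vsub. Returns the number of discarded twins. */
static int toy_discard_twins(struct toy_subscr *vlr, size_t n,
			     struct toy_subscr *new_vsub)
{
	int discarded = 0;
	assert(new_vsub != NULL);
	for (size_t i = 0; i < n; i++) {
		struct toy_subscr *old = &vlr[i];
		if (old == new_vsub || !old->used)
			continue;
		if (strcmp(old->imsi, new_vsub->imsi) != 0)
			continue;
		new_vsub->pending_paging += old->pending_paging; /* adopt paging */
		memset(old, 0, sizeof(*old));                    /* discard twin */
		discarded++;
	}
	return discarded;
}

/* Old entry with two pending pagings, an unrelated subscriber, and the twin. */
static int toy_discard_demo(void)
{
	struct toy_subscr vlr[3] = {
		{ 1, "262420000000013", 2 }, /* previous entry for the IMSI */
		{ 1, "262420000000014", 1 }, /* unrelated subscriber */
		{ 1, "262420000000013", 0 }, /* new entry from the current LU */
	};
	int discarded = toy_discard_twins(vlr, 3, &vlr[2]);
	return discarded == 1 && vlr[2].pending_paging == 2 &&
	       !vlr[0].used && vlr[1].used && vlr[1].pending_paging == 1;
}
```

The unrelated subscriber is untouched, and the surviving entry ends up with the paging requests that were queued on the discarded one.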
