Bug #4960

VTY doesn't show BVCs getting blocked on transport network failure

Added by laforge 9 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
Start date: 01/19/2021
Due date:
% Done: 100%
Spec Reference:
Description

If I start osmo-gbproxy, run a single TTCN3 test against it (so all BVC get up once), the output looks as follows:

OsmoGbProxy> show gbproxy bvc bss
NSEI  2003, SIG-BVCI     0 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20031, RAI 262-42-13135-1 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20032, RAI 262-42-13300-0 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20033, RAI 262-42-13300-0 [UNBLOCKED]
NSEI  2001, SIG-BVCI     0 [UNBLOCKED]
NSEI  2001, PTP-BVCI 20011, RAI 262-42-13135-0 [UNBLOCKED]
NSEI  2002, SIG-BVCI     0 [UNBLOCKED]
NSEI  2002, PTP-BVCI 20021, RAI 262-42-13135-1 [UNBLOCKED]
NSEI  2002, PTP-BVCI 20022, RAI 262-42-13135-2 [UNBLOCKED]
OsmoGbProxy> show gbproxy bvc bss
NSEI  2003, SIG-BVCI     0 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20031, RAI 262-42-13135-1 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20032, RAI 262-42-13300-0 [UNBLOCKED]
NSEI  2003, PTP-BVCI 20033, RAI 262-42-13300-0 [UNBLOCKED]
NSEI  2001, SIG-BVCI     0 [UNBLOCKED]
NSEI  2001, PTP-BVCI 20011, RAI 262-42-13135-0 [UNBLOCKED]
NSEI  2002, SIG-BVCI     0 [UNBLOCKED]
NSEI  2002, PTP-BVCI 20021, RAI 262-42-13135-1 [UNBLOCKED]
NSEI  2002, PTP-BVCI 20022, RAI 262-42-13135-2 [UNBLOCKED]

However, even 10 minutes after the TTCN3 tester terminates (and hence all BSS and SGSN peers are gone), the output is still unchanged.

I guess a normal user would have expected that the BVCs would go into BLOCKED or some kind of recovery state if the underlying NSE disappears / becomes unavailable.

The same applies to

OsmoGbProxy> show gbproxy cell     
BVCI 20031 RAI 262-42-13135-1: BSS NSEI  2003, SGSN NSEI   101   102 
BVCI 20021 RAI 262-42-13135-1: BSS NSEI  2002, SGSN NSEI   101   102 
BVCI 20011 RAI 262-42-13135-0: BSS NSEI  2001, SGSN NSEI   101   102 
BVCI 20032 RAI 262-42-13300-0: BSS NSEI  2003, SGSN NSEI   101   102 
BVCI 20022 RAI 262-42-13135-2: BSS NSEI  2002, SGSN NSEI   101   102 
BVCI 20033 RAI 262-42-13300-0: BSS NSEI  2003, SGSN NSEI   101   102 

where the NSEI are shown even a long time after those NSEI are gone. Interestingly, when you start another test, they temporarily become

OsmoGbProxy> show gbproxy cell 
BVCI 20031 RAI 262-42-13135-1: BSS NSEI <none>, SGSN NSEI   101   102 
BVCI 20021 RAI 262-42-13135-1: BSS NSEI <none>, SGSN NSEI   101   102 
BVCI 20011 RAI 262-42-13135-0: BSS NSEI <none>, SGSN NSEI   101   102 
BVCI 20032 RAI 262-42-13300-0: BSS NSEI <none>, SGSN NSEI   101   102 
BVCI 20022 RAI 262-42-13135-2: BSS NSEI <none>, SGSN NSEI   101   102 
BVCI 20033 RAI 262-42-13300-0: BSS NSEI <none>, SGSN NSEI   101   102 

only to go back to 2001/2002/2003 a few seconds later. So the state is lost (maybe on BVC RESET?). In that case, maybe if the BVC went to BLOCKED or some other kind of state, this would solve itself?

This may not seem super critical, but from an operational point of view, we will be wondering about this as soon as we go into deployment/testing, as will our users, AFAICT.

Associated revisions

Revision a631a3a2 (diff)
Added by daniel 3 months ago

gbproxy_peer: Free a cell as soon as no BSS BVC uses it

This patch adds gbproxy_cell_cleanup_bvc() which removes the bvc pointer
to the cell. If the BSS BVC of this cell is removed it frees the whole
cell (removing all the SGSN BVC pointers to the cell). The SGSN-side
BVCs are blocked at this point and will only be reestablished if this
BVC is reset again from the BSS.

Before this patch cells were never freed and might accumulate over time.
They would only be reused if the bvci matched that of a previous cell.

Related: OS#4960
Change-Id: Ib874cbebcea58fa4bf15e1ff40fe11601573e531
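
The cleanup described in this commit message can be sketched roughly as follows. This is a simplified model, not the code from the patch: the struct layouts, the sketch_ names, the MAX_SGSN_BVC limit and the boolean "blocked" flag are all assumptions made for illustration, and the real gbproxy_cell_cleanup_bvc() may differ in signature and behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SGSN_BVC 4	/* hypothetical limit, for illustration only */

struct sketch_cell;

/* Simplified stand-in for a BVC object; not the real gbproxy struct. */
struct sketch_bvc {
	unsigned int nsei;
	unsigned int bvci;
	bool blocked;
	struct sketch_cell *cell;
};

/* A cell ties one BSS-side BVC to the SGSN-side BVCs serving it. */
struct sketch_cell {
	struct sketch_bvc *bss_bvc;
	struct sketch_bvc *sgsn_bvc[MAX_SGSN_BVC];
};

/* Remove 'bvc' from 'cell'. Returns true if the whole cell was freed. */
static bool sketch_cell_cleanup_bvc(struct sketch_cell *cell, struct sketch_bvc *bvc)
{
	size_t i;

	if (cell->bss_bvc == bvc) {
		/* BSS side is gone: detach all SGSN-side BVCs. The flag models
		 * the "blocked at this point" state from the commit message. */
		for (i = 0; i < MAX_SGSN_BVC; i++) {
			if (cell->sgsn_bvc[i]) {
				cell->sgsn_bvc[i]->blocked = true;
				cell->sgsn_bvc[i]->cell = NULL;
			}
		}
		bvc->cell = NULL;
		free(cell);
		return true;
	}

	/* Only an SGSN-side BVC goes away: drop its pointer, keep the cell. */
	for (i = 0; i < MAX_SGSN_BVC; i++) {
		if (cell->sgsn_bvc[i] == bvc) {
			cell->sgsn_bvc[i] = NULL;
			bvc->cell = NULL;
		}
	}
	return false;
}
```

In this model the cell lives exactly as long as its BSS-side BVC, so cells cannot accumulate once their BSS has disappeared.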

History

#1 Updated by laforge 9 months ago

And yes, I'm aware I wrote that code, so I'm not saying it's daniel's fault when assigning this to him. I'm just trying to focus on testing at the moment.

#2 Updated by laforge 9 months ago

"BLOCKED" is spec-wise the wrong state for the signaling BVCs, as by definition it can never be blocked. At gbproxy start-up the SGSN side BVC are in WAIT_RESET_ACK state:

NSEI   101, SIG-BVCI     0 [WAIT_RESET_ACK]
NSEI   102, SIG-BVCI     0 [WAIT_RESET_ACK]

and the BSS side BVCs simply don't exist.

I would argue that the PTP BVCs could actually be deleted when a BSS disappears. This would mean
  • BLOCK each SGSN side PTP BVC for this BVCI
  • destroy the BSS side PTP BVC object
  • possibly also destroy the cell object?

On the other hand, that would also destroy any related counters etc. - and from the operational point of view it might be interesting to keep them around even if there is an outage. After all, the number of BSS/BVC is not something that changes frequently in a production network.

So as an alternative, we could simply mark the PTP BVC on the BSS side as blocked (we don't even need to start a BLOCKING procedure, as that will try to send packets and wait for ACKs). Plus start the BLOCK procedure on the SGSN side as described above.
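
A rough sketch of that alternative, assuming a flat table of BVCs: the alt_ names, the struct fields and the "block_proc_pending" flag are hypothetical and only model the idea, they are not taken from the gbproxy code. The BSS-side PTP BVC is flipped to blocked locally without sending anything, the matching SGSN-side BVCs are flagged so that a BLOCK procedure can be started for them, and the signalling BVC (BVCI 0) is left alone since it can never be blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum alt_bvc_state { ALT_UNBLOCKED, ALT_BLOCKED };

/* Hypothetical, flattened view of one BVC on either side of the proxy. */
struct alt_bvc {
	unsigned int nsei;
	unsigned int bvci;
	bool sgsn_side;
	enum alt_bvc_state state;
	bool block_proc_pending;	/* SGSN side: BLOCK procedure should be started */
};

/* Handle loss of the BSS-side NSE 'bss_nsei'.
 * Returns the number of BSS-side PTP BVCs that were marked blocked. */
static size_t alt_bss_nse_down(struct alt_bvc *bvcs, size_t n, unsigned int bss_nsei)
{
	size_t i, j, marked = 0;

	for (i = 0; i < n; i++) {
		/* Skip SGSN-side entries, other NSEs and the signalling BVC */
		if (bvcs[i].sgsn_side || bvcs[i].nsei != bss_nsei || bvcs[i].bvci == 0)
			continue;

		/* Local state change only: no BLOCK is sent, no ACK awaited */
		bvcs[i].state = ALT_BLOCKED;
		marked++;

		/* Request the BLOCK procedure on every SGSN-side BVC with this BVCI */
		for (j = 0; j < n; j++) {
			if (bvcs[j].sgsn_side && bvcs[j].bvci == bvcs[i].bvci)
				bvcs[j].block_proc_pending = true;
		}
	}
	return marked;
}
```

Because the objects are only marked and never destroyed, any counters attached to them would survive the outage.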

Maybe all of the above is a "Holzweg" and we should simply show the NSE ALIVE/DEAD state next to each BVC?
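
To make that last idea concrete, here is a minimal sketch of what such a line could look like. The function name, its parameters and the trailing "NSE ALIVE" / "NSE DEAD" suffix are assumptions for illustration; only the part up to the closing bracket mirrors the existing output quoted in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format one BVC line with the NSE state appended.
 * 'rai' may be NULL (as for the signalling BVC, which has no RAI).
 * Returns what snprintf() returns. */
static int fmt_bvc_line(char *buf, size_t len, unsigned int nsei,
			unsigned int bvci, const char *rai,
			const char *bvc_state, bool nse_alive)
{
	const char *kind = (bvci == 0) ? "SIG" : "PTP";
	const char *nse = nse_alive ? "ALIVE" : "DEAD";

	if (rai)
		return snprintf(buf, len, "NSEI %5u, %s-BVCI %5u, RAI %s [%s] NSE %s",
				nsei, kind, bvci, rai, bvc_state, nse);

	return snprintf(buf, len, "NSEI %5u, %s-BVCI %5u [%s] NSE %s",
			nsei, kind, bvci, bvc_state, nse);
}
```

With this, a BVC that still reads UNBLOCKED would at least be shown next to a DEAD NSE, which tells the operator that the BVC state is stale.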

Any comments/ideas?

#3 Updated by daniel 5 months ago

I believe this was caused by an issue that has since been resolved where IP-SNS NSEs would never be considered dead.

At least show gbproxy bvc bss will show blocked cells if the BSS-NSE disappears.
The SGSN-side BVCs will also appear blocked (if the SGSN NSE is still connected).

I have also observed show gbproxy cell with BSS NSEI <none> for a cell whose corresponding BSS NSE went down and didn't reconnect. It was still listed as connected (blocked) to the SGSN because we can't "delete" BVCs without resetting BVC 0 (which deletes all BVCs and is not what we want).

  • BLOCK each SGSN side PTP BVC for this BVCI
  • destroy the BSS side PTP BVC object

We already do those two.

  • possibly also destroy the cell object?

This we don't do yet, but I'm not sure if we should. As soon as the BSS is gone and we block the BVC towards the SGSN we could also get rid of the cell. When the cell comes back (maybe on a different NSE or BVCI) we go through the RESET procedure anyway so we don't really need the cell.

#4 Updated by daniel 3 months ago

  • Status changed from New to In Progress

#5 Updated by daniel 3 months ago

  • % Done changed from 0 to 40

Patch in Gerrit to clean up cells: https://gerrit.osmocom.org/c/osmo-gbproxy/+/24956

#6 Updated by daniel 3 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 40 to 100

Merged
