Bug #4188

BSC sends COMPLETE L3 before RESET

Added by laforge 3 months ago. Updated about 2 months ago.

Status: Rejected
Priority: High
Assignee:
Category: A interface
Target version: -
Start date: 09/03/2019
Due date:
% Done: 70%
Spec Reference:

Description

At least in SCCPlite, we've received a protocol trace from a customer that looks like this:

  • IPA CCM handshake
  • SCCP CR with BSSMAP COMPLETE L3 INFO
  • another SCCP CR with BSSMAP COMPLETE L3 INFO
  • only then a SCCP UDT with BSSMAP RESET

The Reset procedure should happen as the first thing after the A link comes up, before any user data is communicated. The SCCP CR messages of the example above should ideally be queued (or else discarded) until the RESET procedure completes. Discarding is probably the easy option, as queueing would have to involve timeouts (what if the RESET takes 5 minutes to complete), ...
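The "discard until RESET completes" option described above could look roughly like the following. This is only a minimal sketch: `struct a_link`, `a_link_rx()`, `a_link_down()` and the enum names are hypothetical and are not taken from libosmo-sccp or osmo-bsc.

```c
#include <stdbool.h>

/* Hypothetical message classes, for illustration only. */
enum rx_msg_kind {
	RX_BSSMAP_RESET,     /* connectionless (SCCP UDT) BSSMAP RESET */
	RX_BSSMAP_RESET_ACK, /* connectionless BSSMAP RESET ACKNOWLEDGE */
	RX_SCCP_CR,          /* SCCP CR, e.g. carrying BSSMAP COMPLETE L3 INFO */
};

enum rx_action { RX_PROCESS, RX_DISCARD };

/* Hypothetical per-link state: has the RESET procedure completed? */
struct a_link {
	bool reset_done;
};

/* Decide what to do with a message received on the A link. */
static enum rx_action a_link_rx(struct a_link *link, enum rx_msg_kind kind)
{
	switch (kind) {
	case RX_BSSMAP_RESET:
	case RX_BSSMAP_RESET_ACK:
		/* Either direction of the RESET procedure marks the link usable. */
		link->reset_done = true;
		return RX_PROCESS;
	case RX_SCCP_CR:
		/* Connection-oriented traffic is dropped until RESET is done. */
		return link->reset_done ? RX_PROCESS : RX_DISCARD;
	}
	return RX_DISCARD;
}

/* The A link went down: the next link-up must start with a RESET again. */
static void a_link_down(struct a_link *link)
{
	link->reset_done = false;
}
```

With this policy, the two SCCP CR messages in the trace above would be dropped and only the later RESET would be processed. Queueing instead of discarding would additionally need a bounded queue and a timeout, which this sketch deliberately leaves out.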


Related issues

Related to OsmoBSC - Bug #3041: osmo-bsc initiates BSSMAP Reset after every fourth BSSMAP Clear (Rejected, 03/07/2018)

Related to OsmoBSC - Feature #3102: clean up and use a_reset.c in osmo-bsc (Resolved, 03/23/2018)

History

#1 Updated by pespin 3 months ago

#2 Updated by pespin 3 months ago

  • Status changed from New to In Progress

#3 Updated by pespin 3 months ago

  • Related to Bug #3041: osmo-bsc initiates BSSMAP Reset after every fourth BSSMAP Clear added

#4 Updated by pespin 3 months ago

  • Related to Feature #3102: clean up and use a_reset.c in osmo-bsc added

#5 Updated by pespin 3 months ago

Related work:
https://gerrit.osmocom.org/c/libosmo-netif/+/15403 stream: Introduce API osmo_stream_cli_is_connected
https://gerrit.osmocom.org/c/libosmo-netif/+/15404 stream: Fix scheduling of queued messages during connecting state

#6 Updated by pespin 3 months ago

  • % Done changed from 0 to 60

https://gerrit.osmocom.org/c/libosmo-sccp/+/15405 ss7: Do not queue messages if stream is not connected

Helpful call stack:

sccp_sclc_user_sap_down_nofree
    xua_gen_encode_and_send    
        xua_gen_msg_cl
        sccp_scrc_rx_sclc_msg
            sua_addr_parse
            scrc_local_out_common
                scrc_node_12
                    gen_mtp_transfer_req_xua
                        sua2sccp_tx_m3ua
                            osmo_ss7_user_mtp_xfer_req
                                m3ua_hmdc_rx_from_l2
                                hmrt_message_for_routing
                                    ipa_tx_xua_as
                                        xua_as_transmit_msg
                                            osmo_ss7_asp_send
                                                osmo_stream_cli_send/osmo_stream_srv_send

#7 Updated by pespin 3 months ago

  • Category set to A interface
  • Status changed from In Progress to Feedback
  • % Done changed from 60 to 70

More related commits:
remote: https://gerrit.osmocom.org/c/osmo-bsc/+/15406 a_reset.c: Don't wait 2 seconds to send first BSSMAP RESET
remote: https://gerrit.osmocom.org/c/osmo-bsc/+/15407 bsc: gsm_08_08.c: Remove repeated conn not null check

I could not find the exact culprit of the issue; according to what I understand from the code, it should not happen at all. I think it may happen if the BSC<->MSC conn was already established at some previous point, and then the MSC side got restarted without the BSC knowing about it yet, so the upper layers still think the conn is active and those CL3 Info messages can be sent. And since those are not answered, at some point this condition from a_reset.c triggers, sending the BSSMAP RESET:

if (reset_ctx->conn_loss_counter >= BAD_CONNECTION_THRESOLD)

But I'm just speculating. It's difficult to say, because the bsc logs related to the pcap file don't match (e.g. the src port of the connection and the timestamps differ), so it's almost impossible to know exactly what's going on, since I also lack previous context in the pcap file.
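To make the suspected mechanism concrete, here is a minimal sketch of a failure counter that triggers a new RESET once a threshold is reached. Only the condition itself and the identifier `BAD_CONNECTION_THRESOLD` (spelled as quoted above) come from this ticket; the threshold value, the helper names and the counter reset after triggering are assumptions for illustration, not the actual a_reset.c code.

```c
#include <stdbool.h>

/* Value assumed for illustration; the spelling follows the quoted identifier. */
#define BAD_CONNECTION_THRESOLD 3

/* Reduced, hypothetical version of the reset context. */
struct reset_ctx {
	int conn_loss_counter;
};

/* Record one failed or unanswered connection.
 * Returns true when the caller should send a new BSSMAP RESET. */
static bool reset_ctx_conn_fail(struct reset_ctx *ctx)
{
	ctx->conn_loss_counter++;
	if (ctx->conn_loss_counter >= BAD_CONNECTION_THRESOLD) {
		/* Assumption: start counting from zero again after triggering. */
		ctx->conn_loss_counter = 0;
		return true;
	}
	return false;
}

/* Record one successful connection: the link is evidently alive. */
static void reset_ctx_conn_success(struct reset_ctx *ctx)
{
	ctx->conn_loss_counter = 0;
}
```

Under these assumptions, a few unanswered COMPLETE L3 INFO messages in a row would be enough to produce a RESET that appears after user data in the trace, which matches the ordering reported by the customer.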

I think the best option is to stall this ticket and, once the fixes submitted above are merged, try again and gather more data to better understand the issue.

#8 Updated by pespin about 2 months ago

I checked again whether osmo-bsc could forward a COMPL L3 message before having done the reset, and again I was unable to find a way for it to happen.

a_reset.c keeps the SCCP link state in an FSM, and it can be checked with a_reset_conn_ready(), which can only return true with "reset_fsm->state == ST_CONN".

Then, here's the code path when a COMPL L3 message is received in BSC through RSL from BTS:

bsc_compl_l3
    bsc_find_msc (doesn't check a_reset_conn_ready(), which is expected, since a more fine-grained USSD is sent to the subscriber later in complete_layer3())
    complete_layer3
        osmo_bsc_sigtran_new_conn
            a_reset_conn_ready
                return false
        (Upon the "return false" above, complete_layer3() calls bsc_send_ussd_no_srv() and returns without forwarding the message.)
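The gate in the code path above can be sketched as follows. This is a simplified illustration, not the osmo-bsc code: `ST_CONN` and the readiness check mirror what is described in this comment, while `ST_DISC`, `struct reset_fsm`, `reset_conn_ready()`, `handle_compl_l3()` and the result enum are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* ST_CONN is named in the comment above; ST_DISC is a hypothetical name
 * for "RESET not yet completed". */
enum reset_state { ST_DISC, ST_CONN };

struct reset_fsm {
	enum reset_state state;
};

/* Stand-in for a_reset_conn_ready(): true only in state ST_CONN. */
static bool reset_conn_ready(const struct reset_fsm *fsm)
{
	return fsm != NULL && fsm->state == ST_CONN;
}

enum l3_result { L3_FORWARDED, L3_REJECTED_NO_SERVICE };

/* Stand-in for the decision taken in complete_layer3(). */
static enum l3_result handle_compl_l3(const struct reset_fsm *fsm)
{
	if (!reset_conn_ready(fsm)) {
		/* Here the real code notifies the subscriber through
		 * bsc_send_ussd_no_srv() and does not forward the message. */
		return L3_REJECTED_NO_SERVICE;
	}
	return L3_FORWARDED;
}
```

If the real code behaves like this sketch, a COMPL L3 can only reach the MSC while the FSM is in ST_CONN, i.e. after a RESET has completed at least once on that link.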

So I go back to what I said in my last comment. I think the resets seen afterwards were triggered by conn_loss_counter being incremented through calls to a_reset.c:a_reset_conn_fail() until it reached the threshold reset_ctx->conn_loss_counter >= BAD_CONNECTION_THRESOLD.

I don't think it's worth spending more time on this topic until we find a setup where we can clearly see this issue again and get proper traces with pcaps, since afaiu those resets could be expected.

laforge, what do you think?

#9 Updated by pespin about 2 months ago

  • Assignee changed from pespin to laforge

#10 Updated by laforge about 2 months ago

  • Status changed from Feedback to Rejected
