Project

General

Profile

Actions

Bug #3498

closed

osmo-bts-trx (and others?) crash when omitting an IPA ID GET upon first connection on RSL link

Added by neels over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
08/24/2018
Due date:
% Done:

100%

Spec Reference:

Description

When https://gerrit.osmocom.org/10599 is applied to osmo-ttcn3-hacks, the "BSC" sends an IPA ID ACK to osmo-bts-trx when that first connects, on both OML and RSL.
The (apparently) RSL read cb then crashes in a segfault.

<0006> ../../../../src/osmo-bts/src/common/scheduler.c:591 Configuring multiframe with PDCH trx=0 ts=7
<0001> ../../../../src/osmo-bts/src/common/oml.c:313 OC=CHANNEL INST=(00,00,07) Tx STATE CHG REP
<0001> ../../../../src/osmo-bts/src/common/oml.c:348 OC=CHANNEL INST=(00,00,07) AVAIL STATE Dependency -> OK
<0001> ../../../../src/osmo-bts/src/common/oml.c:355 OC=CHANNEL INST=(00,00,07) OPER STATE Disabled -> Enabled
<0001> ../../../../src/osmo-bts/src/common/oml.c:313 OC=CHANNEL INST=(00,00,07) Tx STATE CHG REP
<0006> ../../../../src/osmo-bts/src/osmo-bts-trx/scheduler_trx.c:1640 TRX Clock Ind: elapsed_us= 462107, elapsed_fn=102, error_us=-8623
<0006> ../../../../src/osmo-bts/src/osmo-bts-trx/scheduler_trx.c:1653 GSM clock skew: old fn=0, new fn=102
<0009> ../../../../src/osmo-bts/src/common/pcu_sock.c:899 PCU socket connected to external PCU
../../../src/libosmo-abis/src/e1_input.c:511:2: runtime error: member access within null pointer of type 'struct e1inp_sign_link'
../../../src/libosmo-abis/src/e1_input.c:512:11: runtime error: member access within null pointer of type 'struct e1inp_sign_link'
ASAN:SIGSEGV
=================================================================
==5702==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000038 (pc 0x7f28a379ea34 sp 0x7ffd7d8933f0 bp 0x62e000000a98 T0)
    #0 0x7f28a379ea33 in e1inp_lookup_sign_link ../../../src/libosmo-abis/src/e1_input.c:512
    #1 0x7f28a37b97d4 in ipaccess_bts_read_cb ../../../src/libosmo-abis/src/input/ipaccess.c:778
    #2 0x7f28a37b0277 in ipa_client_read ../../../src/libosmo-abis/src/input/ipa.c:76
    #3 0x7f28a37b0277 in ipa_client_fd_cb ../../../src/libosmo-abis/src/input/ipa.c:139
    #4 0x7f28a2ac1a34 in osmo_fd_disp_fds ../../../src/libosmocore/src/select.c:217
    #5 0x7f28a2ac1a34 in osmo_select_main ../../../src/libosmocore/src/select.c:257
    #6 0x444ba3 in bts_main ../../../../src/osmo-bts/src/common/main.c:364
    #7 0x7f28a17ddb44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #8 0x405e54 (/usr/local/bin/osmo-bts-trx+0x405e54)

Related issues

Related to libosmo-sccp + libosmo-sigtran - Bug #3500: changing ipa_asp_fsm to expect IPA ID ACK instead of IPA ID GET has broken the ttcn3-bsc-test,SCCPlite test suiteResolvedneels08/24/2018

Actions
Actions #1

Updated by neels over 5 years ago

  • Status changed from New to In Progress
Actions #2

Updated by neels over 5 years ago

Turns out, libosmo-abis' RSL link strongly depends on the IPA ID GET message to be sent, or it will readily dereference an uninitialized link.
To illustrate, this libosmo-abis patch fixes the failure:

--- a/src/input/ipaccess.c
+++ b/src/input/ipaccess.c
@@ -753,7 +753,8 @@ static int ipaccess_bts_read_cb(struct ipa_client_conn *link, struct msgb *msg)
        if (ret == 1 && hh->proto == IPAC_PROTO_IPACCESS) {
                uint8_t msg_type = *(msg->l2h);

-               if (msg_type == IPAC_MSGT_ID_GET) {
+               if (msg_type == IPAC_MSGT_ID_GET
+                   || msg_type == IPAC_MSGT_ID_ACK) {
                        sign_link = link->line->ops->sign_link_up(msg,
                                        link->line,
                                        link->ofd->priv_nr);

This works because I know we're sending an IPA ID ACK now, and triggering on the ID ACK to initialize the link solves the problem in this instance.
Of course libosmo-abis should never crash, whether an initial message was received or not, so we still need a fix for the crash situation regardless.

The crash happens here:

        else if (link->port == IPA_TCP_PORT_RSL)
                e1i_ts = &link->line->ts[link->ofd->priv_nr-1];  <-- got the RSL link

        OSMO_ASSERT(e1i_ts != NULL);

        /* look up for some existing signaling link. */
        sign_link = e1inp_lookup_sign_link(e1i_ts, hh->proto, 0);  <-- entering
---
struct e1inp_sign_link *e1inp_lookup_sign_link(struct e1inp_ts *e1i,
                                                uint8_t tei, uint8_t sapi)
{
        struct e1inp_sign_link *link;

        llist_for_each_entry(link, &e1i->sign.sign_links, list) {
                if (link->sapi == sapi && link->tei == tei)       <--- crash
                        return link;
        }

        return NULL;
}

I assume the sign_links list is not initialized?? Checking now.

Actions #3

Updated by neels over 5 years ago

  • Subject changed from osmo-bts-trx (and others?) crash when sending an IPA ID ACK upon first connection on RSL link to osmo-bts-trx (and others?) crash when omitting an IPA ID GET upon first connection on RSL link

about the detailed failure cause:

in osmo-bts, sign_link_up() calls e1inp_ts_config_sign(), which sets INIT_LLIST_HEAD(&ts->sign.sign_links);
Before that, the ts->sign.sign_links llist is indeed still uninitialized.
(It would be nice to call INIT_LLIST_HEAD() sooner, i.e. in e1inp_line_create(), but some callers, e.g. bs11_config, call only e1inp_ts_config_sign() and never call e1inp_line_create().)

There should definitely be a safeguard to not use the e1i_ts if it hasn't been initialized: https://gerrit.osmocom.org/#/c/libosmo-abis/+/10600

There are other nightmares in there:

  • directly using unvalidated link->ofd->priv_nr as array index.
  • sometimes link->ofd->priv_nr is used, sometimes link->ofd->priv_nr-1 is used, sometimes trx_nr = link->ofd->priv_nr - E1INP_SIGN_RSL (==2), and there is no explanation of this magic.

It is very tempting to eradicate these horrors from the code, but I am not going to touch those now.

A remaining open question: which IPA clients should work with which init behaviors.
TLDR: make ttcn3 IPA server configurable.

We know of an SCCPlite MSC that, when IPA connects to it, directly sends an IPA ID ACK, never an IPA ID GET.
Thus it seems too fragile to depend on exactly an IPA ID GET to initialize RSL; but actually all our own current IPA servers do that.
The only reason this whole issue shows up is that I adjusted the ttcn3 IPA server to match the behavior of that SCCPlite MSC.

To work around the current stalemate, I could make the ttcn3 IPA server's behavior configurable, so that it sends IPA ID ACK when it acts as SCCPlite-MSC, and sends IPA ID GET when it acts as OML/RSL server. That seems the simplest solution.

What I don't like though is that we then have two different IPA standards, one on the OML/RSL side, and another on the MSC A-interface side.
The warmest fuzziest would be to make both implementations (OML/RSL client in libosmo-abis and the SCCPlite ipa_asp_fsm in libosmo-sccp) work with both IPA ID GET and IPA ID ACK behaviors, and then we can leave the current ttcn3 IPA behavior unchanged to always send only IPA ID GET.

But, we anyway still want to test exactly this SCCPlite MSC's behavior in ttcn3!

In conclusion, either way, step 1 is to make the ttcn3 IPA server configurable, to pick between sending IPA ID GET or IPA ID ACK.
After that follows:

  • make libosmo-abis client work when it doesn't get IPA ID GET (optional).
  • make ipa_asp_fsm client work when it doesn't get IPA ID ACK (optional, pending other MSCs' behavior).
  • after that, also test in ttcn3 that ipa_asp_fsm works with the IPA ID GET behavior
Actions #4

Updated by neels over 5 years ago

  • % Done changed from 0 to 90

The above paragraph is a bit off topic from this issue, which is frankly just about crashing in the absence of IPA ID GET.
Above mentioned https://gerrit.osmocom.org/#/c/libosmo-abis/+/10600 is the fix for this issue.

Actions #5

Updated by neels over 5 years ago

  • Related to Bug #3500: changing ipa_asp_fsm to expect IPA ID ACK instead of IPA ID GET has broken the ttcn3-bsc-test,SCCPlite test suite added
Actions #6

Updated by neels over 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100

merged

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)