Bug #1798

closed

dynpdch and repairing of broken channels

Added by zecke over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/18/2016
Due date:
% Done: 100%
Resolution:
Spec Reference:
Description

Going through the lchan issue and looking at rsl_rx_rf_chan_rel_ack(), I see that there is a path where we don't do the PDCH switching. What is the reason for it?

static int rsl_rx_rf_chan_rel_ack(struct gsm_lchan *lchan)
{

        DEBUGP(DRSL, "%s RF CHANNEL RELEASE ACK\n", gsm_lchan_name(lchan));

        /* Stop all pending timers */
        osmo_timer_del(&lchan->act_timer);
        osmo_timer_del(&lchan->T3111);

        /*
         * The BTS didn't respond within the timeout to our channel
         * release request and we have marked the channel as broken.
         * Now we do receive an ACK and let's be conservative. If it
         * is a sysmoBTS we know that only one RF Channel Release ACK
         * will be sent. So let's "repair" the channel.
         */
        if (lchan->state == LCHAN_S_BROKEN) {
                int do_free = is_sysmobts_v2(lchan->ts->trx->bts);
                LOGP(DRSL, LOGL_NOTICE,
                        "%s CHAN REL ACK for broken channel. %s.\n",
                        gsm_lchan_name(lchan),
                        do_free ? "Releasing it" : "Keeping it broken");
                if (do_free)
                        do_lchan_free(lchan);

                /*
                 * !!!! No switch of the PDCH here! !!!!
                 */

                return 0;
        }

        if (lchan->state != LCHAN_S_REL_REQ && lchan->state != LCHAN_S_REL_ERR)
                LOGP(DRSL, LOGL_NOTICE, "%s CHAN REL ACK but state %s\n",
                        gsm_lchan_name(lchan),
                        gsm_lchans_name(lchan->state));

        do_lchan_free(lchan);

        /*
         * Put a dynamic TCH/F_PDCH channel back to PDCH mode iff it was
         * released successfully. If in error, the PDCH ACT will follow after
         * T3111 in error_timeout_cb().
         *
         * Any state other than LCHAN_S_REL_ERR became LCHAN_S_NONE after above
         * do_lchan_free(). Assert this, because that's what ensures a PDCH ACT
         * on a dynamic channel in all cases.
         */
        OSMO_ASSERT(lchan->state == LCHAN_S_NONE
                    || lchan->state == LCHAN_S_REL_ERR);
        if (lchan->ts->pchan == GSM_PCHAN_TCH_F_PDCH
            && lchan->state == LCHAN_S_NONE)
                return rsl_ipacc_pdch_activate(lchan->ts, 1);

        return 0;
}
Actions #1

Updated by neels over 7 years ago

The reason is probably that I have not yet covered that case.

To discuss, let's describe the various facets of this:

dyn type:

  • ip.access style (TCH/F_PDCH)
  • Osmocom style (TCH/F_TCH/H_PDCH)

broken channel:

  • marked broken in BTS
    • ?

I'm trying to figure out how these things relate to each other.
Any hints/facts would be welcome.

Actions #2

Updated by neels over 7 years ago

broken channel state can come from

  • chan act timeout
  • chan deact timeout
  • rx chan act nack

(see rsl_lchan_mark_broken() in abis_rsl.c)
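
The three sources listed above can be pictured as three entry points that funnel into one marking helper. The following is a minimal, self-contained sketch and not the code from abis_rsl.c: the model_* enum values, struct model_lchan and model_mark_broken() are hypothetical stand-ins, the "activation nack" string is invented, and the real rsl_lchan_mark_broken() may take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the OpenBSC definitions. */
enum model_state {
	MODEL_S_NONE,
	MODEL_S_ACT_REQ,
	MODEL_S_ACTIVE,
	MODEL_S_REL_REQ,
	MODEL_S_BROKEN,
};

enum model_broken_cause {
	MODEL_CAUSE_ACT_TIMEOUT,   /* no CHAN ACT ACK before the act timer expired */
	MODEL_CAUSE_DEACT_TIMEOUT, /* no RF CHAN REL ACK in time */
	MODEL_CAUSE_ACT_NACK,      /* BTS answered with a CHAN ACT NACK */
};

struct model_lchan {
	enum model_state state;
	const char *broken_reason; /* NULL while the lchan is not broken */
};

/* Map a cause to a log-style reason string (the nack text is made up). */
static const char *model_cause_name(enum model_broken_cause cause)
{
	switch (cause) {
	case MODEL_CAUSE_ACT_TIMEOUT:   return "activation timeout";
	case MODEL_CAUSE_DEACT_TIMEOUT: return "de-activation timeout";
	case MODEL_CAUSE_ACT_NACK:      return "activation nack";
	}
	return "unknown";
}

/* Single place where an lchan becomes broken, whatever the cause. */
static void model_mark_broken(struct model_lchan *lchan,
			      enum model_broken_cause cause)
{
	assert(lchan != NULL);
	lchan->state = MODEL_S_BROKEN;
	lchan->broken_reason = model_cause_name(cause);
}
```

Keeping the reason next to the state would let a later ACK handler tell an activation timeout from a de-activation timeout, which matters for the cases tested further down.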

Actions #3

Updated by neels over 7 years ago

I've hacked fake act delays into osmo-bts and tested various situations
with my SysmoBTS. (the prompt says "root@sysmobts-v2:~#" so I assume it's v2,
which is interesting because of the do_free condition above.)

Some may not be strictly related to this issue as reported, but I'd like to
discuss here and split into new issues later.

(1)
With a 10 second delay hacked into the TCH/H channel activation ack from
osmo-bts, the act ack comes after the lchan->act_timer expired and the
channel is marked BROKEN_UNUSABLE.

I do this only the first time, so the BSC would recover if it tried the
same lchan a second time.

(1a)
For a plain, non-dynamic TCH/H pchan, I observe that the lchan is
never recovered. It remains marked broken forever:

20160824153815711 DRLL <0000> chan_alloc.c:367 (bts=0,trx=0,ts=5,pchan=TCH/H) Allocating lchan=0 as TCH_H
20160824153815711 DRSL <0004> abis_rsl.c:1727 (bts=0,trx=0,ts=5,ss=0) Activating ARFCN(868) SS(0) lctype TCH_H r=CALL ra=0x47 ta=0
20160824153815711 DRSL <0004> abis_rsl.c:533 (bts=0,trx=0,ts=5,pchan=TCH/H) Tx RSL Channel Activate with act_type=INITIAL
20160824153815711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state NONE -> ACTIVATION REQUESTED
[osmo-bts doesn't respond with an act ack]
[4 seconds later, act_timer fires]
20160824153819711 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=0) TCH_H lchan broken: activation timeout
20160824153819711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state ACTIVATION REQUESTED -> BROKEN UNUSABLE
[another 6 seconds and the act ack comes in late]
20160824153825725 DRSL <0004> abis_rsl.c:1456 (bts=0,trx=0,ts=5,ss=0) CHANNEL ACTIVATE ACK
20160824153825725 DRSL <0004> abis_rsl.c:1146 (bts=0,trx=0,ts=5,ss=0) CHAN ACT ACK for broken channel.
[another 6 seconds pass and the BTS signals conn failure on RSL:]
20160824153831420 DRSL <0004> abis_rsl.c:1222 (bts=0,trx=0,ts=5,ss=0) CONNECTION FAIL: RELEASING state BROKEN UNUSABLE CAUSE=0x01(Radio Link Failure) 

In abis_rsl.c:1222 rsl_rx_conn_fail(), the BSC could free the lchan, but does not because
the lchan state is not LCHAN_S_ACTIVE. rsl_rx_conn_fail() calls rsl_rf_chan_release_err():

/*
 * Special handling for channel releases in the error case.
 */
static int rsl_rf_chan_release_err(struct gsm_lchan *lchan)
{
        if (lchan->state != LCHAN_S_ACTIVE)
                return 0;
        return rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
}

After this, the lchan remains marked broken:

OpenBSC> show lchan summary 
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
BTS 0, TRX 0, Timeslot 5 TCH/H, Lchan 0, Type NONE, State BROKEN UNUSABLE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm

If the nitb config here has only one TCH/H TS (the rest as SDCCH8, a.k.a.
disabled), the call does not succeed: one TCH/H lchan remains broken, which
leaves only one working TCH/H lchan for two phones that each want one.

If there are two TCH/H TS, the first TCH/H lchan goes broken, but the call
succeeds because the phones get assigned different, working TCH/H lchans,
which are still available.

Nevertheless, it looks too harsh to keep this lchan broken forever without
even a second try.
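
To illustrate why case (1a) never recovers, and what one "second try" could look like, here is a minimal sketch under stated assumptions. It models the guard from rsl_rf_chan_release_err() and adds a hypothetical handler for a late CHAN ACT ACK that requests a release, so that a later RF CHANNEL RELEASE ACK can free the lchan. All sketch_* names are invented for illustration; this is not an actual OpenBSC patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the OpenBSC definitions. */
enum sketch_state {
	SKETCH_S_NONE,
	SKETCH_S_ACT_REQ,
	SKETCH_S_ACTIVE,
	SKETCH_S_REL_REQ,
	SKETCH_S_BROKEN,
};

struct sketch_lchan {
	enum sketch_state state;
	int rel_requests_sent; /* counts modelled RF Channel Release messages */
};

/* Mirrors the guard quoted above: only ACTIVE lchans are released on error. */
static bool sketch_release_on_conn_fail(struct sketch_lchan *lchan)
{
	assert(lchan != NULL);
	if (lchan->state != SKETCH_S_ACTIVE)
		return false; /* a BROKEN lchan is left untouched */
	lchan->state = SKETCH_S_REL_REQ;
	lchan->rel_requests_sent++;
	return true;
}

/*
 * Hypothetical second try: a late ACT ACK shows that the BTS did activate
 * the channel, so ask the BTS to release it. The lchan stays BROKEN until
 * the release is acknowledged.
 */
static bool sketch_late_act_ack(struct sketch_lchan *lchan)
{
	assert(lchan != NULL);
	if (lchan->state != SKETCH_S_BROKEN)
		return false;
	lchan->rel_requests_sent++;
	return true;
}

/* Modelled RF CHANNEL RELEASE ACK: the lchan becomes usable again. */
static void sketch_rel_ack(struct sketch_lchan *lchan)
{
	assert(lchan != NULL);
	lchan->state = SKETCH_S_NONE;
}
```

In this model, the CONNECTION FAIL alone changes nothing for a broken lchan, while the late ACT ACK followed by a release ack brings it back to NONE.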

(1b)
For dyn TS (TCH/F_TCH/H_PDCH), the situation is the same as for plain TCH/H.

Before being able to fix dyn TS, we should probably resolve the plain
TCH/* recovery.


(2)
With a 10 second delay hacked into the TCH/H channel deactivation ack
(activation ack back to normal), things look better. The lchan hits the
"Releasing it" condition above and gets freed back to NONE state.

(2a)
For plain TCH/H, all is well.

20160824161705795 DRLL <0000> abis_rsl.c:1917 (bts=0,trx=0,ts=5,ss=1) SAPI=0 RELEASE INDICATION
20160824161705795 DRSL <0004> abis_rsl.c:807 (bts=0,trx=0,ts=5,ss=1) RF Channel Release
20160824161705798 DRSL <0004> abis_rsl.c:2334 (bts=0,trx=0,ts=5,ss=1) IPAC_DLCX_IND 
[...]
20160824161709798 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=1) TCH_H lchan broken: de-activation timeout
20160824161709798 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state RELEASE REQUESTED -> BROKEN UNUSABLE
[...]
20160824161715837 DRSL <0004> abis_rsl.c:864 (bts=0,trx=0,ts=5,ss=1) RF CHANNEL RELEASE ACK
20160824161715837 DRSL <0004> abis_rsl.c:882 (bts=0,trx=0,ts=5,ss=1) CHAN REL ACK for broken channel. Releasing it.
20160824161715837 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state BROKEN UNUSABLE -> NONE
OpenBSC> show lchan summary 
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm

(2b)
For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
back to PDCH operation is indeed missing. Testing and fixing now...
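
A minimal sketch of the missing decision, assuming a simplified model: after a release ack frees the lchan, a dynamic timeslot should go back to PDCH whether the ack arrived in time or arrived late on a broken lchan. The demo_* types and functions are hypothetical; the real code would trigger something like rsl_ipacc_pdch_activate() for TCH/F_PDCH, and the Osmocom style TCH/F_TCH/H_PDCH switchover is not modelled in any detail here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the OpenBSC definitions. */
enum demo_state { DEMO_S_NONE, DEMO_S_REL_REQ, DEMO_S_REL_ERR, DEMO_S_BROKEN };

enum demo_pchan {
	DEMO_PCHAN_TCH_H,
	DEMO_PCHAN_TCH_F_PDCH,       /* ip.access style dynamic TS */
	DEMO_PCHAN_TCH_F_TCH_H_PDCH, /* Osmocom style dynamic TS */
};

struct demo_lchan {
	enum demo_state state;
	enum demo_pchan pchan;
	bool bts_acks_once; /* stands in for the is_sysmobts_v2() check */
};

static bool demo_is_dynamic(enum demo_pchan pchan)
{
	return pchan == DEMO_PCHAN_TCH_F_PDCH
	    || pchan == DEMO_PCHAN_TCH_F_TCH_H_PDCH;
}

/* Returns true if a PDCH activation should follow this release ack. */
static bool demo_rx_rel_ack(struct demo_lchan *lchan)
{
	assert(lchan != NULL);

	if (lchan->state == DEMO_S_BROKEN) {
		if (!lchan->bts_acks_once)
			return false; /* keep it broken, no PDCH switch */
		lchan->state = DEMO_S_NONE; /* "repair" the channel */
	} else if (lchan->state != DEMO_S_REL_ERR) {
		lchan->state = DEMO_S_NONE; /* normal free */
	}

	/* Same check for the normal and the repaired path. */
	return demo_is_dynamic(lchan->pchan) && lchan->state == DEMO_S_NONE;
}
```

The point of the sketch is that the broken branch no longer returns early after freeing, so the dynamic TS check applies to it as well.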

Actions #4

Updated by zecke over 7 years ago

On 24 Aug 2016, at 16:24, neels [REDMINE] wrote:

> Issue #1798 has been updated by neels.
>
> (1)
> With a 10 second delay hacked into the TCH/H channel activation ack from
> osmo-bts, the act ack comes after the lchan->act_timer expired and the
> channel is marked BROKEN_UNUSABLE.

Yes, that is known (see my other gerrit change to release the channel in that case).

> (2b)
> For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
> back to PDCH operation is indeed missing. Testing and fixing now...

That was the point. Good that you have an understanding of the issue now.

Actions #5

Updated by neels over 7 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 90
Actions #6

Updated by neels over 7 years ago

  • % Done changed from 90 to 100

760 was merged.

Actions #7

Updated by laforge over 7 years ago

  • Status changed from Resolved to Closed