Bug #1798

dynpdch and repairing of broken channels

Added by zecke over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/18/2016
Due date:
% Done: 100%
Resolution:
Spec Reference:

Description

Going through the lchan issue and looking at rsl_rx_rf_chan_rel_ack(), I see that there is a path where we don't do the PDCH switching. What is the reason for it?

static int rsl_rx_rf_chan_rel_ack(struct gsm_lchan *lchan)
{

        DEBUGP(DRSL, "%s RF CHANNEL RELEASE ACK\n", gsm_lchan_name(lchan));

        /* Stop all pending timers */
        osmo_timer_del(&lchan->act_timer);
        osmo_timer_del(&lchan->T3111);

        /*
         * The BTS didn't respond within the timeout to our channel
         * release request and we have marked the channel as broken.
         * Now we do receive an ACK and let's be conservative. If it
         * is a sysmoBTS we know that only one RF Channel Release ACK
         * will be sent. So let's "repair" the channel.
         */
        if (lchan->state == LCHAN_S_BROKEN) {
                int do_free = is_sysmobts_v2(lchan->ts->trx->bts);
                LOGP(DRSL, LOGL_NOTICE,
                        "%s CHAN REL ACK for broken channel. %s.\n",
                        gsm_lchan_name(lchan),
                        do_free ? "Releasing it" : "Keeping it broken");
                if (do_free)
                        do_lchan_free(lchan);

                /*
                 * !!!! No switch of the PDCH here! !!!!
                 */

                return 0;
        }

        if (lchan->state != LCHAN_S_REL_REQ && lchan->state != LCHAN_S_REL_ERR)
                LOGP(DRSL, LOGL_NOTICE, "%s CHAN REL ACK but state %s\n",
                        gsm_lchan_name(lchan),
                        gsm_lchans_name(lchan->state));

        do_lchan_free(lchan);

        /*
         * Put a dynamic TCH/F_PDCH channel back to PDCH mode iff it was
         * released successfully. If in error, the PDCH ACT will follow after
         * T3111 in error_timeout_cb().
         *
         * Any state other than LCHAN_S_REL_ERR became LCHAN_S_NONE after above
         * do_lchan_free(). Assert this, because that's what ensures a PDCH ACT
         * on a dynamic channel in all cases.
         */
        OSMO_ASSERT(lchan->state == LCHAN_S_NONE
                    || lchan->state == LCHAN_S_REL_ERR);
        if (lchan->ts->pchan == GSM_PCHAN_TCH_F_PDCH
            && lchan->state == LCHAN_S_NONE)
                return rsl_ipacc_pdch_activate(lchan->ts, 1);

        return 0;
}

Associated revisions

Revision d35fc440 (diff)
Added by neels over 4 years ago

dyn TS: fix OS#1798: on late RF CHAN REL ACK, activate PDCH

Tested by hacking a REL ACK delay of a couple of seconds into osmo-bts' rsl.c
for the first TCH_H lchan:

[[[
diff --git a/include/osmo-bts/rsl.h b/include/osmo-bts/rsl.h
index 093e9cb..b35c3bb 100644
--- a/include/osmo-bts/rsl.h
+++ b/include/osmo-bts/rsl.h
@@ -22,6 +22,7 @@ int rsl_tx_est_ind(struct gsm_lchan *lchan, uint8_t link_id, uint8_t *data, int
 int rsl_tx_chan_act_acknack(struct gsm_lchan *lchan, uint8_t cause);
 int rsl_tx_conn_fail(struct gsm_lchan *lchan, uint8_t cause);
 int rsl_tx_rf_rel_ack(struct gsm_lchan *lchan);
+int rsl_tx_rf_rel_ack_later(struct gsm_lchan *lchan);
 int rsl_tx_hando_det(struct gsm_lchan *lchan, uint8_t *ho_delay);
 
 /* call-back for LAPDm code, called when it wants to send msgs UP */
diff --git a/src/common/l1sap.c b/src/common/l1sap.c
index 3802e25..1f92b0d 100644
--- a/src/common/l1sap.c
+++ b/src/common/l1sap.c
@@ -491,7 +491,16 @@ static int l1sap_info_rel_cnf(struct gsm_bts_trx *trx,
 
 	lchan = get_lchan_by_chan_nr(trx, info_act_cnf->chan_nr);
 
-	rsl_tx_rf_rel_ack(lchan);
+	static int yyy = 0;
+
+	DEBUGP(DRSL, "%s YYYYYYYYYYYYYYYYYYYYY %d %s\n",
+	       gsm_lchan_name(lchan), yyy, gsm_lchant_name(lchan->type));
+
+	if (lchan->type == GSM_LCHAN_TCH_H && !yyy) {
+		yyy ++;
+		rsl_tx_rf_rel_ack_later(lchan);
+	} else
+		rsl_tx_rf_rel_ack(lchan);
 
 	/* During PDCH DEACT, this marks the deactivation of the PDTCH as
 	 * requested by the PCU. Next up, we disconnect the TS completely and
diff --git a/src/common/rsl.c b/src/common/rsl.c
index 3c97af9..7926f21 100644
--- a/src/common/rsl.c
+++ b/src/common/rsl.c
@@ -534,6 +534,22 @@ int rsl_tx_rf_rel_ack(struct gsm_lchan *lchan)
 	return abis_bts_rsl_sendmsg(msg);
 }
 
+struct osmo_timer_list yyy_timer;
+
+static void yyy_timer_cb(void *data)
+{
+	rsl_tx_rf_rel_ack(data);
+}
+
+int rsl_tx_rf_rel_ack_later(struct gsm_lchan *lchan)
+{
+	yyy_timer.cb = yyy_timer_cb;
+	yyy_timer.data = lchan;
+	osmo_timer_schedule(&yyy_timer, 10, 0);
+	return 0;
+}
+
+
 /*
  * 8.4.2 sending CHANnel ACTIVation ACKnowledge */
 static int rsl_tx_chan_act_ack(struct gsm_lchan *lchan) {
]]]

Change-Id: I87e07e1d54882f8f3d667fa300c6e3679f5c920d
Fixes: OS#1798

Revision a8430762 (diff)
Added by neels over 4 years ago

dyn TS: fix OS#1798: on late RF CHAN REL ACK, activate PDCH

Change-Id: I87e07e1d54882f8f3d667fa300c6e3679f5c920d
Fixes: OS#1798

History

#1 Updated by neels over 4 years ago

The reason is probably that I have not yet covered that case.

To discuss, let's describe the various facets of this:

dyn type:

  • ip.access style (TCH/F_PDCH)
  • Osmocom style (TCH/F_TCH/H_PDCH)

broken channel:

  • marked broken in BTS
    • ?

I'm trying to figure out how these things relate to each other.
Any hints/facts would be welcome.

#2 Updated by neels over 4 years ago

broken channel state can come from

  • chan act timeout
  • chan deact timeout
  • rx chan act nack

(see rsl_lchan_mark_broken() in abis_rsl.c)
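
To make the three routes into the broken state easier to discuss, here is a minimal sketch of them as a transition function. It is a simplified model, not the abis_rsl.c code: the enum names and lchan_next() are hypothetical, and the real code may apply extra conditions (for example on the NACK cause) that are left out here.

```c
#include <assert.h>

/* Hypothetical, simplified lchan states (not the openbsc enum). */
enum model_state { MS_NONE, MS_ACT_REQ, MS_ACTIVE, MS_REL_REQ, MS_BROKEN };

/* Hypothetical events, named after the cases listed above. */
enum model_event {
        ME_ACT_REQ, ME_ACT_ACK, ME_ACT_NACK, ME_ACT_TIMEOUT,
        ME_REL_REQ, ME_REL_ACK, ME_REL_TIMEOUT
};

/* Return the state that follows 'st' when event 'ev' arrives. */
static enum model_state lchan_next(enum model_state st, enum model_event ev)
{
        switch (st) {
        case MS_NONE:
                return ev == ME_ACT_REQ ? MS_ACT_REQ : st;
        case MS_ACT_REQ:
                if (ev == ME_ACT_ACK)
                        return MS_ACTIVE;
                /* chan act timeout and rx chan act nack both end up broken */
                if (ev == ME_ACT_TIMEOUT || ev == ME_ACT_NACK)
                        return MS_BROKEN;
                return st;
        case MS_ACTIVE:
                return ev == ME_REL_REQ ? MS_REL_REQ : st;
        case MS_REL_REQ:
                if (ev == ME_REL_ACK)
                        return MS_NONE;
                /* chan deact timeout */
                if (ev == ME_REL_TIMEOUT)
                        return MS_BROKEN;
                return st;
        case MS_BROKEN:
                /* Recovery is deliberately not modelled in this sketch. */
                return st;
        }
        /* Unknown state values are returned unchanged. */
        assert(st >= MS_NONE && st <= MS_BROKEN);
        return st;
}
```

In this model a late ACK that arrives in MS_BROKEN changes nothing, which is the starting point for the recovery question in this ticket.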

#3 Updated by neels over 4 years ago

I've hacked fake act delays into osmo-bts and tested various situations
with my SysmoBTS. (The prompt says "root@sysmobts-v2:~#", so I assume it's a v2,
which is interesting because of the do_free condition above.)

Some may not be strictly related to this issue as reported, but I'd like to
discuss here and split into new issues later.

(1)
With a 10 second delay hacked into the TCH/H channel activation ack from
osmo-bts, the act ack comes after the lchan->act_timer expired and the
channel is marked BROKEN_UNUSABLE.

I do this only the first time, so the BSC would recover if it tried the
same lchan a second time.

(1a)
For a plain, non-dynamic TCH/H pchan, I observe that the lchan is
never recovered. It remains marked broken forever:

20160824153815711 DRLL <0000> chan_alloc.c:367 (bts=0,trx=0,ts=5,pchan=TCH/H) Allocating lchan=0 as TCH_H
20160824153815711 DRSL <0004> abis_rsl.c:1727 (bts=0,trx=0,ts=5,ss=0) Activating ARFCN(868) SS(0) lctype TCH_H r=CALL ra=0x47 ta=0
20160824153815711 DRSL <0004> abis_rsl.c:533 (bts=0,trx=0,ts=5,pchan=TCH/H) Tx RSL Channel Activate with act_type=INITIAL
20160824153815711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state NONE -> ACTIVATION REQUESTED
[osmo-bts doesn't respond with an act ack]
[4 seconds later, act_timer fires]
20160824153819711 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=0) TCH_H lchan broken: activation timeout
20160824153819711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state ACTIVATION REQUESTED -> BROKEN UNUSABLE
[another 6 seconds and the act ack comes in late]
20160824153825725 DRSL <0004> abis_rsl.c:1456 (bts=0,trx=0,ts=5,ss=0) CHANNEL ACTIVATE ACK
20160824153825725 DRSL <0004> abis_rsl.c:1146 (bts=0,trx=0,ts=5,ss=0) CHAN ACT ACK for broken channel.
[another 6 seconds pass and the BTS signals conn failure on RSL:]
20160824153831420 DRSL <0004> abis_rsl.c:1222 (bts=0,trx=0,ts=5,ss=0) CONNECTION FAIL: RELEASING state BROKEN UNUSABLE CAUSE=0x01(Radio Link Failure) 

In abis_rsl.c:1222 rsl_rx_conn_fail(), the BSC could free the lchan, but does not because
the lchan state is not LCHAN_S_ACTIVE. rsl_rx_conn_fail() calls rsl_rf_chan_release_err():

/*
 * Special handling for channel releases in the error case.
 */
static int rsl_rf_chan_release_err(struct gsm_lchan *lchan)
{
        if (lchan->state != LCHAN_S_ACTIVE)
                return 0;
        return rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
}

After this, the lchan remains marked broken:

OpenBSC> show lchan summary 
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
BTS 0, TRX 0, Timeslot 5 TCH/H, Lchan 0, Type NONE, State BROKEN UNUSABLE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm

If the nitb config here has only one TCH/H TS (the rest configured as SDCCH8,
a.k.a. disabled), the call does not succeed -- one TCH/H lchan remains broken,
so there is only one working TCH/H lchan but two phones that each want one.

If there are two TCH/H TS, the first TCH/H lchan goes broken, but the call
succeeds because the phones get assigned different, working TCH/H lchans,
which are still available.
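
The capacity argument can be checked with a little arithmetic. The sketch below assumes that each TCH/H timeslot carries two half-rate lchans (ss=0 and ss=1 in the logs) and that a call between two phones needs one lchan per phone; the function names are hypothetical and only illustrate the counting.

```c
#include <assert.h>
#include <stdbool.h>

/* A TCH/H timeslot carries two half-rate sub-slots. */
#define LCHANS_PER_TCH_H_TS 2

/* Number of TCH/H lchans that can still be allocated. */
static int usable_tch_h_lchans(int num_ts, int num_broken)
{
        int total = num_ts * LCHANS_PER_TCH_H_TS;

        return num_broken >= total ? 0 : total - num_broken;
}

/* A call between two phones needs one lchan per phone. */
static bool call_can_succeed(int num_ts, int num_broken)
{
        return usable_tch_h_lchans(num_ts, num_broken) >= 2;
}
```

With one timeslot and one broken lchan, call_can_succeed(1, 1) is false; with two timeslots, call_can_succeed(2, 1) is true, matching the two observations above.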

Nevertheless, it looks too harsh to keep this lchan broken forever without
even a second try.

(1b)
For dyn TS (TCH/F_TCH/H_PDCH), the situation is the same as for plain TCH/H.

Before being able to fix dyn TS, we should probably resolve the plain
TCH/* recovery.


(2)
With a 10 second delay hacked into the TCH/H channel deactivation ack
(activation ack back to normal), things look better. The lchan hits the
"Releasing it" condition above and gets freed back to NONE state.

(2a)
For plain TCH/H, all is well.

20160824161705795 DRLL <0000> abis_rsl.c:1917 (bts=0,trx=0,ts=5,ss=1) SAPI=0 RELEASE INDICATION
20160824161705795 DRSL <0004> abis_rsl.c:807 (bts=0,trx=0,ts=5,ss=1) RF Channel Release
20160824161705798 DRSL <0004> abis_rsl.c:2334 (bts=0,trx=0,ts=5,ss=1) IPAC_DLCX_IND 
[...]
20160824161709798 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=1) TCH_H lchan broken: de-activation timeout
20160824161709798 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state RELEASE REQUESTED -> BROKEN UNUSABLE
[...]
20160824161715837 DRSL <0004> abis_rsl.c:864 (bts=0,trx=0,ts=5,ss=1) RF CHANNEL RELEASE ACK
20160824161715837 DRSL <0004> abis_rsl.c:882 (bts=0,trx=0,ts=5,ss=1) CHAN REL ACK for broken channel. Releasing it.
20160824161715837 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state BROKEN UNUSABLE -> NONE
OpenBSC> show lchan summary 
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm

(2b)
For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
back to PDCH operation is indeed missing. Testing and fixing now...
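
As a rough illustration of what "switch back to PDCH after a late release ack" could look like, here is a self-contained model. It is a sketch under assumptions, not the change that was eventually merged: struct model_lchan, its fields and model_rx_rf_chan_rel_ack() are hypothetical stand-ins, the bts_acks_release_once flag stands in for the is_sysmobts_v2() check, and only the ip.access style TCH/F_PDCH case from the quoted function is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real types. */
enum model_lstate { ML_NONE, ML_REL_REQ, ML_REL_ERR, ML_BROKEN };
enum model_pchan { MP_TCH_H, MP_TCH_F_PDCH };

struct model_lchan {
        enum model_lstate state;
        enum model_pchan pchan;
        bool bts_acks_release_once;     /* stands in for is_sysmobts_v2() */
        int pdch_act_requests;          /* counts simulated PDCH ACT messages */
};

/* Like the description above: any state except REL_ERR becomes NONE. */
static void model_lchan_free(struct model_lchan *l)
{
        if (l->state != ML_REL_ERR)
                l->state = ML_NONE;
}

static int model_rx_rf_chan_rel_ack(struct model_lchan *l)
{
        /* Late ack on a broken channel: only repair if the BTS is known
         * to send exactly one release ack, otherwise keep it broken. */
        if (l->state == ML_BROKEN && !l->bts_acks_release_once)
                return 0;

        model_lchan_free(l);

        /* Shared tail: a repaired broken channel takes the same PDCH
         * switch-over as a normally released one. */
        assert(l->state == ML_NONE || l->state == ML_REL_ERR);
        if (l->pchan == MP_TCH_F_PDCH && l->state == ML_NONE)
                l->pdch_act_requests++;

        return 0;
}
```

The design point is that the broken-channel branch no longer returns before the PDCH check, so a repaired dynamic timeslot takes the same path back to PDCH mode as a normally released one.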

#4 Updated by zecke over 4 years ago

On 24 Aug 2016, at 16:24, neels [REDMINE] wrote:

> Issue #1798 has been updated by neels.
>
> (1)
> With a 10 second delay hacked into the TCH/H channel activation ack from
> osmo-bts, the act ack comes after the lchan->act_timer expired and the
> channel is marked BROKEN_UNUSABLE.

yes, that is known (see my other gerrit change to release the channel in that case)

> (2b)
> For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
> back to PDCH operation is indeed missing. Testing and fixing now...

that was the point. Good you have an understanding of the issue now.

#5 Updated by neels over 4 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 90

#6 Updated by neels over 4 years ago

  • % Done changed from 90 to 100

760 was merged.

#7 Updated by laforge over 4 years ago

  • Status changed from Resolved to Closed
