Project

General

Profile

Actions

Bug #6002

closed

retention or rapid re-creation of ms entries.

Added by keith 11 months ago. Updated 10 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
04/14/2023
Due date:
% Done:

90%

Spec Reference:

Description

I mailed (privately) pespin about this a couple of weeks ago.. because I saw a case where there were over 100,000 ms entries. It was in spanish so I won't repost here.

Can't really add more now anyway, other than just point that the issue exists. Those stats are not latest release, but the 100,000 was with latest.

This is just the latest to be added:

MS TLLI=92f209e0, IMSI=
  Timing advance (TA):    7
  Coding scheme uplink:   MCS-1
  Coding scheme downlink: MCS-7
  Mode:                   EGPRS
  MS class:               12
  EGPRS MS class:         12
  PACCH:                  5 
  LLC queue length:       4
  LLC queue octets:       36
  RSSI:                   -110 dBm
  Bit error rate:         0 %
  Link quality:           7 dB
  Burst timing offset:    3/4 bit
  MS C value:             7 dB
  Downlink TBF:           TFI=1, state=ASSIGN
  Current DL Throughput:  0 Kbps 
  MS Statistics:
   Amount of DL CTRL messages scheduled:      113 (0/s 0/m 0/h 113/d)


Files

Actions #1

Updated by keith 11 months ago

Here is some abbreviated output, recently restarted, only ~45 MS entries


root@NTQ0132425:~# echo show ms all | nc 127.0.0.2 4240 | egrep IMSI\|Downlink 
MS TLLI=7c11d60f, IMSI=
MS TLLI=7863f6ff, IMSI=
MS TLLI=7b891ff7, IMSI=334030313691195
MS TLLI=b9f06937, IMSI=
  Downlink TBF:           TFI=0, state=FLOW
MS TLLI=792dfbd5, IMSI=334020541836127
  Downlink TBF:           TFI=4, state=RELEASING
MS TLLI=7d601e19, IMSI=334020226590995
  Downlink TBF:           TFI=3, state=FLOW
MS TLLI=7e335d71, IMSI=
MS TLLI=7834550c, IMSI=334020158026989
  Downlink TBF:           TFI=5, state=ASSIGN
MS TLLI=7f8ceffb, IMSI=310170863677361
  Downlink TBF:           TFI=6, state=WAIT_RELEASE
MS TLLI=7baabbbf, IMSI=334020413558539
MS TLLI=b38c9842, IMSI=
  Downlink TBF:           TFI=0, state=FLOW
MS TLLI=79d7ff7c, IMSI=334020340652796
MS TLLI=7f62e26b, IMSI=334020537559239
MS TLLI=da3d4629, IMSI=
MS TLLI=bef5ff2c, IMSI=
  Downlink TBF:           TFI=5, state=ASSIGN
MS TLLI=7bda972f, IMSI=310280084665422
MS TLLI=7f506fa5, IMSI=334020530614734
MS TLLI=7c1f5a49, IMSI=334030213374099
  Downlink TBF:           TFI=2, state=FLOW
MS TLLI=7bb5f4f7, IMSI=334020164165147
MS TLLI=86f325ff, IMSI=
  Downlink TBF:           TFI=5, state=RELEASING
MS TLLI=7ffcf337, IMSI=334020167966709
MS TLLI=7e616554, IMSI=334020345235695
  Downlink TBF:           TFI=6, state=FLOW
MS TLLI=7c277f8f, IMSI=334020531404400
MS TLLI=7a28eb43, IMSI=
  Downlink TBF:           TFI=5, state=WAIT_RELEASE
MS TLLI=85ed392f, IMSI=
  Downlink TBF:           TFI=6, state=ASSIGN
MS TLLI=b59d8003, IMSI=
  Downlink TBF:           TFI=1, state=ASSIGN
MS TLLI=7b74ee54, IMSI=334020168476463
MS TLLI=9af058bc, IMSI=
  Downlink TBF:           TFI=0, state=ASSIGN
MS TLLI=7d9b4f0f, IMSI=334020549625048
MS TLLI=787381ed, IMSI=334020434401884
MS TLLI=7eba65db, IMSI=334030238827684
  Downlink TBF:           TFI=2, state=WAIT_RELEASE
MS TLLI=a49dda18, IMSI=
  Downlink TBF:           TFI=1, state=ASSIGN
MS TLLI=be97c025, IMSI=
  Downlink TBF:           TFI=3, state=ASSIGN
MS TLLI=82a70839, IMSI=
  Downlink TBF:           TFI=0, state=ASSIGN
MS TLLI=7e7e63e6, IMSI=334050159266203
MS TLLI=99ea440f, IMSI=
MS TLLI=8c9b1884, IMSI=334020540171967
MS TLLI=797d1e75, IMSI=334020544865524
MS TLLI=bce8f542, IMSI=
  Downlink TBF:           TFI=1, state=ASSIGN
MS TLLI=9a8c000d, IMSI=
MS TLLI=b5e84c28, IMSI=
MS TLLI=8250d00d, IMSI=
  Downlink TBF:           TFI=1, state=ASSIGN
MS TLLI=7b9cbeaf, IMSI=334020227017338
  Downlink TBF:           TFI=7, state=ASSIGN
MS TLLI=c7eb333c, IMSI=334020158257675
  Downlink TBF:           TFI=4, state=FLOW
MS TLLI=a1e9cfdc, IMSI=
  Downlink TBF:           TFI=2, state=ASSIGN
MS TLLI=9e88604f, IMSI=
  Downlink TBF:           TFI=2, state=ASSIGN
MS TLLI=8c8970e0, IMSI=
MS TLLI=7c26ade0, IMSI=334030306866713
MS TLLI=86f37d89, IMSI=
MS TLLI=7bb9f3ea, IMSI=334030316416837
  Downlink TBF:           TFI=3, state=ASSIGN

root@NTQ0132425:~# echo show tbf all | nc 127.0.0.2 4240 | egrep IMSI\|state
TBF: TFI=0 TLLI=0xb9f06937 (valid) TA=24 DIR=UL IMSI=
 created=1681570073 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10
TBF: TFI=1 TLLI=0x7cdccea7 (valid) TA=5 DIR=UL IMSI=
 created=1681570071 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12
TBF: TFI=0 TLLI=0x7a7cf77e (valid) TA=26 DIR=UL IMSI=334020160799321
 created=1681570074 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=0 TLLI=0xc7eb333c (valid) TA=3 DIR=UL IMSI=334020158257675
 created=1681570076 state=FLOW flags=00000002 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12
TBF: TFI=4 TLLI=0xc7eb333c (valid) TA=3 DIR=DL IMSI=334020158257675
 created=1681570029 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12
TBF: TFI=6 TLLI=0x99ea440f (valid) TA=2 DIR=DL IMSI=
 created=1681570075 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=4 TLLI=0x7863f6ff (valid) TA=11 DIR=DL IMSI=334020164165147
 created=1681570074 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10
TBF: TFI=3 TLLI=0x7a7cf77e (valid) TA=26 DIR=DL IMSI=334020160799321
 created=1681570073 state=RELEASING flags=0000000e [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=4 TLLI=0x7e335d71 (valid) TA=7 DIR=DL IMSI=334020537559239
 created=1681570074 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=4 TLLI=0x7c26ade0 (valid) TA=37 DIR=DL IMSI=334030306866713
 created=1681570074 state=FLOW flags=00000005 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=5 TLLI=0xb9f06937 (valid) TA=24 DIR=DL IMSI=
 created=1681570070 state=FINISHED flags=00000006 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10
TBF: TFI=2 TLLI=0x7c1f5a49 (valid) TA=7 DIR=DL IMSI=334030213374099
 created=1681570073 state=FLOW flags=00000005 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=5 TLLI=0x7c11d60f (valid) TA=23 DIR=DL IMSI=334020531404400
 created=1681570071 state=WAIT_RELEASE flags=00000006 [CCCH:0, PACCH:2] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=0 TLLI=0x792dfbd5 (valid) TA=8 DIR=DL IMSI=334020541836127
 created=1681570071 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12
TBF: TFI=2 TLLI=0x86f325ff (valid) TA=34 DIR=DL IMSI=
 created=1681570068 state=RELEASING flags=0000000d [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10
TBF: TFI=0 TLLI=0x9af058bc (valid) TA=3 DIR=DL IMSI=
 created=1681570054 state=FLOW flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=5 TLLI=0xbef5ff2c (valid) TA=23 DIR=DL IMSI=
 created=1681570047 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=1 TLLI=0xb59d8003 (valid) TA=26 DIR=DL IMSI=
 created=1681570043 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12
TBF: TFI=6 TLLI=0x85ed392f (valid) TA=0 DIR=DL IMSI=
 created=1681569996 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12
TBF: TFI=1 TLLI=0xa49dda18 (valid) TA=5 DIR=DL IMSI=
 created=1681569996 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=1 TLLI=0x8250d00d (valid) TA=5 DIR=DL IMSI=
 created=1681569961 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12
TBF: TFI=1 TLLI=0xbce8f542 (valid) TA=6 DIR=DL IMSI=
 created=1681569952 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=2 TLLI=0xa1e9cfdc (valid) TA=5 DIR=DL IMSI=
 created=1681569949 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=3 TLLI=0xbe97c025 (valid) TA=9 DIR=DL IMSI=
 created=1681569936 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=2 TLLI=0x9e88604f (valid) TA=3 DIR=DL IMSI=
 created=1681569934 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12
TBF: TFI=0 TLLI=0x82a70839 (valid) TA=5 DIR=DL IMSI=
 created=1681569927 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12
TBF: TFI=7 TLLI=0x7b9cbeaf (valid) TA=5 DIR=DL IMSI=334020227017338
 created=1681567259 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12
TBF: TFI=3 TLLI=0x7bb9f3ea (valid) TA=26 DIR=DL IMSI=334030316416837
 created=1681551956 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12
root@NTQ0132425:~# 
Actions #2

Updated by pespin 11 months ago

This is a related fix, though I think it's not fixing the real problem I saw in there:
https://gerrit.osmocom.org/c/osmo-pcu/+/32348 ms: Fix MS without PTMSI not freed immediatelly

What I observed:
  • All the leaked MS have no TBFs attached and TLLI=GSM_RESERVED_TMSI (0xFFFFFFFF)
Actions #3

Updated by pespin 11 months ago

  • Status changed from New to In Progress
  • Assignee changed from keith to pespin
  • % Done changed from 0 to 30
Actions #4

Updated by pespin 11 months ago

I believe the mentioned leaking MS are somehow going through this path:

void ms_merge_and_clear_ms(struct GprsMs *ms, struct GprsMs *old_ms)
     ms_reset(old_ms);

void ms_reset(struct GprsMs *ms)
{
    LOGPMS(ms, DRLCMAC, LOGL_INFO, "Clearing MS object\n");

    ms_release_timer_stop(ms);

    ms->tlli = GSM_RESERVED_TMSI;
    ms->new_dl_tlli = ms->tlli;
    ms->new_ul_tlli = ms->tlli;
    ms->imsi[0] = '\0';
}

Another option is that the MS is allocated, then its first TBF is attempted to be allocated but fails, and then maybe the MS is not released properly.

Actions #5

Updated by pespin 10 months ago

  • % Done changed from 30 to 70

This (and all previous patches) may help in getting rid of this issue:
https://gerrit.osmocom.org/c/osmo-pcu/+/32360 ms: Rewrite MS release lifecycle

Something I still need to look at / improve:
  • Check paths during tbf_alloc() which calls ms_alloc() but it may fail before in tbf->setup() it calls ms_attach_tbf(), hence when tbf is freed it won't go through the usual ms_detach_tbf() -> use_count becoming 0, and release procedure for MS is not started. Solution may be to add an extra ms_ref(func) in tbf_alloc, or even a ms_ref() inside ms_alloc which then the caller fo ms_alloc has to ms_unref().
  • Add idle timeout information in VTY "show ms all".
Actions #6

Updated by pespin 10 months ago

pespin wrote in #note-5:

Something I still need to look at / improve:
  • Check paths during tbf_alloc() which calls ms_alloc() but it may fail before in tbf->setup() it calls ms_attach_tbf(), hence when tbf is freed it won't go through the usual ms_detach_tbf() -> use_count becoming 0, and release procedure for MS is not started. Solution may be to add an extra ms_ref(func) in tbf_alloc, or even a ms_ref() inside ms_alloc which then the caller fo ms_alloc has to ms_unref().

Should be fixed by: https://gerrit.osmocom.org/c/osmo-pcu/+/32375

  • Add idle timeout information in VTY "show ms all".

TODO

Actions #7

Updated by pespin 10 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 70 to 90

pespin wrote in #note-6:

  • Add idle timeout information in VTY "show ms all".

TODO

Done here:
https://gerrit.osmocom.org/c/osmo-pcu/+/32394 ms: Log MS active/idle state in 'show ms' VTY commands

I think once merged we can attempt at giving it a try at some of the affected deployments and see how everything works. I expect the issue should be gone now.

Actions #8

Updated by keith 10 months ago

I'll post this in case it helps, in the meantime I'll have to build a binary for debugging.. not today though.

Program received signal SIGSEGV, Segmentation fault.                                                                                                                                  
0xb6dd8e14 in osmo_timer_pending () from /usr/lib/libosmocore.so.20                                                                                                                   
(gdb) bt                                                                                                                                                                              
#0  0xb6dd8e14 in osmo_timer_pending () from /usr/lib/libosmocore.so.20                                                                                                               
#1  0x00032430 in gprs_rlcmac_tbf::timers_pending (this=0x3e6c6576, t=<optimized out>)                                                                                                
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp:383                                                                                                  
#2  0x00021264 in ms_reset (ms=ms@entry=0x33fec0)                                                                                                                                     
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/gprs_ms.c:414
#3  0x0002206c in ms_merge_and_clear_ms (ms=<optimized out>, old_ms=0x33fec0)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/gprs_ms.c:477
#4  0x00037568 in gprs_rlcmac_ul_tbf::rcv_data_block_acknowledged (this=0xbeb11b28,
    this@entry=0x22e220, rlc=0x70520, rlc@entry=0xbeb11b28,
    data=0x6fefc "../../git/src/tbf_ul.cpp",
    data@entry=0x24edca "}\002\300\202\277\314m\354\027\200I\312",
    meas=meas@entry=0xb6df69fc)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf_ul.cpp:351
#5  0x00049800 in gprs_rlcmac_pdch::rcv_data_block (this=this@entry=0x1560dc,
    data=data@entry=0x24edca "}\002\300\202\277\314m\354\027\200I\312",
    data_len=data_len@entry=27 '\033', fn=fn@entry=1184347, meas=0xbeb11c50,
    meas@entry=0xb6df69fc, cs=cs@entry=MCS1)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pdch.cpp:1102
#6  0x0004bc6c in gprs_rlcmac_pdch::rcv_block (this=this@entry=0x1560dc,
    data=0x24edca "}\002\300\202\277\314m\354\027\200I\312", len=<optimized out>,
    fn=1184347, fn@entry=670320, meas=0xbeb11c50,
    meas@entry=0x15f00 <l1if_handle_l1prim+944>)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pdch.cpp:1013
#7  0x00026aa0 in pcu_rx_data_ind_pdtch (bts=bts@entry=0x155740, pdch=0x1560dc,
    data=<optimized out>, len=<optimized out>, fn=1184347, meas=0xbeb11c50,
    meas@entry=0xbeb11c48)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pcu_l1_if.cpp:349
#8  0x00015f00 in handle_ph_data_ind (fl1h=0x153288, fl1h=0x153288, l1p_msg=0x24ece0,
    data_ind=0x24eda8)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/osmo-bts-sysmo/sysmo_l1_if.c:220                                                                            
#9  l1if_handle_l1prim (wq=<optimized out>, fl1h=0x153288, msg=0x24ece0)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/osmo-bts-sysmo/sysmo_l1_if.c:296                                                                            
#10 0xb6dd010c in ?? () from /usr/lib/libosmocore.so.20
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) info locals
No symbol table info available.
(gdb) info args
No symbol table info available.
(gdb) up
#1  0x00032430 in gprs_rlcmac_tbf::timers_pending (this=0x3e6c6576, t=<optimized out>)
    at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp:383
383     /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp: No such file or directory.
(gdb) info locals
i = 0 '\000'
(gdb) info args
this = 0x3e6c6576
t = <optimized out>

Actions #9

Updated by pespin 10 months ago

I think it should be fixed by:
https://gerrit.osmocom.org/c/osmo-pcu/+/32547 ms: Fix unsafe iterating freeing items in list

Actions #10

Updated by pespin 10 months ago

  • Assignee changed from pespin to keith

Assigning to keith for feedback.
Once the issue is considered solved by him we can close this ticket.

Actions #11

Updated by keith 10 months ago

I'm not observing the issue any more since applying the patch, as can be seen in this stats screenshot. thanks!

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)