Bug #6002
closedretention or rapid re-creation of ms entries.
90%
Description
I mailed (privately) pespin about this a couple of weeks ago.. because I saw a case where there were over 100,000 ms entries. It was in spanish so I won't repost here.
Can't really add more now anyway, other than just point that the issue exists. Those stats are not latest release, but the 100,000 was with latest.
This is just the latest to be added:
MS TLLI=92f209e0, IMSI= Timing advance (TA): 7 Coding scheme uplink: MCS-1 Coding scheme downlink: MCS-7 Mode: EGPRS MS class: 12 EGPRS MS class: 12 PACCH: 5 LLC queue length: 4 LLC queue octets: 36 RSSI: -110 dBm Bit error rate: 0 % Link quality: 7 dB Burst timing offset: 3/4 bit MS C value: 7 dB Downlink TBF: TFI=1, state=ASSIGN Current DL Throughput: 0 Kbps MS Statistics: Amount of DL CTRL messages scheduled: 113 (0/s 0/m 0/h 113/d)
Files
Updated by keith about 2 months ago
Here is some abbreviated output, recently restarted, only ~45 MS entries
root@NTQ0132425:~# echo show ms all | nc 127.0.0.2 4240 | egrep IMSI\|Downlink MS TLLI=7c11d60f, IMSI= MS TLLI=7863f6ff, IMSI= MS TLLI=7b891ff7, IMSI=334030313691195 MS TLLI=b9f06937, IMSI= Downlink TBF: TFI=0, state=FLOW MS TLLI=792dfbd5, IMSI=334020541836127 Downlink TBF: TFI=4, state=RELEASING MS TLLI=7d601e19, IMSI=334020226590995 Downlink TBF: TFI=3, state=FLOW MS TLLI=7e335d71, IMSI= MS TLLI=7834550c, IMSI=334020158026989 Downlink TBF: TFI=5, state=ASSIGN MS TLLI=7f8ceffb, IMSI=310170863677361 Downlink TBF: TFI=6, state=WAIT_RELEASE MS TLLI=7baabbbf, IMSI=334020413558539 MS TLLI=b38c9842, IMSI= Downlink TBF: TFI=0, state=FLOW MS TLLI=79d7ff7c, IMSI=334020340652796 MS TLLI=7f62e26b, IMSI=334020537559239 MS TLLI=da3d4629, IMSI= MS TLLI=bef5ff2c, IMSI= Downlink TBF: TFI=5, state=ASSIGN MS TLLI=7bda972f, IMSI=310280084665422 MS TLLI=7f506fa5, IMSI=334020530614734 MS TLLI=7c1f5a49, IMSI=334030213374099 Downlink TBF: TFI=2, state=FLOW MS TLLI=7bb5f4f7, IMSI=334020164165147 MS TLLI=86f325ff, IMSI= Downlink TBF: TFI=5, state=RELEASING MS TLLI=7ffcf337, IMSI=334020167966709 MS TLLI=7e616554, IMSI=334020345235695 Downlink TBF: TFI=6, state=FLOW MS TLLI=7c277f8f, IMSI=334020531404400 MS TLLI=7a28eb43, IMSI= Downlink TBF: TFI=5, state=WAIT_RELEASE MS TLLI=85ed392f, IMSI= Downlink TBF: TFI=6, state=ASSIGN MS TLLI=b59d8003, IMSI= Downlink TBF: TFI=1, state=ASSIGN MS TLLI=7b74ee54, IMSI=334020168476463 MS TLLI=9af058bc, IMSI= Downlink TBF: TFI=0, state=ASSIGN MS TLLI=7d9b4f0f, IMSI=334020549625048 MS TLLI=787381ed, IMSI=334020434401884 MS TLLI=7eba65db, IMSI=334030238827684 Downlink TBF: TFI=2, state=WAIT_RELEASE MS TLLI=a49dda18, IMSI= Downlink TBF: TFI=1, state=ASSIGN MS TLLI=be97c025, IMSI= Downlink TBF: TFI=3, state=ASSIGN MS TLLI=82a70839, IMSI= Downlink TBF: TFI=0, state=ASSIGN MS TLLI=7e7e63e6, IMSI=334050159266203 MS TLLI=99ea440f, IMSI= MS TLLI=8c9b1884, IMSI=334020540171967 MS TLLI=797d1e75, IMSI=334020544865524 MS TLLI=bce8f542, IMSI= Downlink TBF: TFI=1, state=ASSIGN MS TLLI=9a8c000d, IMSI= MS TLLI=b5e84c28, IMSI= MS TLLI=8250d00d, IMSI= Downlink TBF: TFI=1, state=ASSIGN MS TLLI=7b9cbeaf, IMSI=334020227017338 Downlink TBF: TFI=7, state=ASSIGN MS TLLI=c7eb333c, IMSI=334020158257675 Downlink TBF: TFI=4, state=FLOW MS TLLI=a1e9cfdc, IMSI= Downlink TBF: TFI=2, state=ASSIGN MS TLLI=9e88604f, IMSI= Downlink TBF: TFI=2, state=ASSIGN MS TLLI=8c8970e0, IMSI= MS TLLI=7c26ade0, IMSI=334030306866713 MS TLLI=86f37d89, IMSI= MS TLLI=7bb9f3ea, IMSI=334030316416837 Downlink TBF: TFI=3, state=ASSIGN root@NTQ0132425:~# echo show tbf all | nc 127.0.0.2 4240 | egrep IMSI\|state TBF: TFI=0 TLLI=0xb9f06937 (valid) TA=24 DIR=UL IMSI= created=1681570073 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10 TBF: TFI=1 TLLI=0x7cdccea7 (valid) TA=5 DIR=UL IMSI= created=1681570071 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12 TBF: TFI=0 TLLI=0x7a7cf77e (valid) TA=26 DIR=UL IMSI=334020160799321 created=1681570074 state=RELEASING flags=00000002 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=0 TLLI=0xc7eb333c (valid) TA=3 DIR=UL IMSI=334020158257675 created=1681570076 state=FLOW flags=00000002 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12 TBF: TFI=4 TLLI=0xc7eb333c (valid) TA=3 DIR=DL IMSI=334020158257675 created=1681570029 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12 TBF: TFI=6 TLLI=0x99ea440f (valid) TA=2 DIR=DL IMSI= created=1681570075 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=4 TLLI=0x7863f6ff (valid) TA=11 DIR=DL IMSI=334020164165147 created=1681570074 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10 TBF: TFI=3 TLLI=0x7a7cf77e (valid) TA=26 DIR=DL IMSI=334020160799321 created=1681570073 state=RELEASING flags=0000000e [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=4 TLLI=0x7e335d71 (valid) TA=7 DIR=DL IMSI=334020537559239 created=1681570074 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=4 TLLI=0x7c26ade0 (valid) TA=37 DIR=DL IMSI=334030306866713 created=1681570074 state=FLOW flags=00000005 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=5 TLLI=0xb9f06937 (valid) TA=24 DIR=DL IMSI= created=1681570070 state=FINISHED flags=00000006 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10 TBF: TFI=2 TLLI=0x7c1f5a49 (valid) TA=7 DIR=DL IMSI=334030213374099 created=1681570073 state=FLOW flags=00000005 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=5 TLLI=0x7c11d60f (valid) TA=23 DIR=DL IMSI=334020531404400 created=1681570071 state=WAIT_RELEASE flags=00000006 [CCCH:0, PACCH:2] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=0 TLLI=0x792dfbd5 (valid) TA=8 DIR=DL IMSI=334020541836127 created=1681570071 state=FLOW flags=00000006 [CCCH:0, PACCH:2] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12 TBF: TFI=2 TLLI=0x86f325ff (valid) TA=34 DIR=DL IMSI= created=1681570068 state=RELEASING flags=0000000d [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=10/10 TBF: TFI=0 TLLI=0x9af058bc (valid) TA=3 DIR=DL IMSI= created=1681570054 state=FLOW flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=5 TLLI=0xbef5ff2c (valid) TA=23 DIR=DL IMSI= created=1681570047 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=1 TLLI=0xb59d8003 (valid) TA=26 DIR=DL IMSI= created=1681570043 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12 TBF: TFI=6 TLLI=0x85ed392f (valid) TA=0 DIR=DL IMSI= created=1681569996 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12 TBF: TFI=1 TLLI=0xa49dda18 (valid) TA=5 DIR=DL IMSI= created=1681569996 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=1 TLLI=0x8250d00d (valid) TA=5 DIR=DL IMSI= created=1681569961 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12 TBF: TFI=1 TLLI=0xbce8f542 (valid) TA=6 DIR=DL IMSI= created=1681569952 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=2 TLLI=0xa1e9cfdc (valid) TA=5 DIR=DL IMSI= created=1681569949 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=3 TLLI=0xbe97c025 (valid) TA=9 DIR=DL IMSI= created=1681569936 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=2 TLLI=0x9e88604f (valid) TA=3 DIR=DL IMSI= created=1681569934 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=4 1st_cTS=4 ctrl_TS=4 MS_CLASS=12/12 TBF: TFI=0 TLLI=0x82a70839 (valid) TA=5 DIR=DL IMSI= created=1681569927 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=12/12 TBF: TFI=7 TLLI=0x7b9cbeaf (valid) TA=5 DIR=DL IMSI=334020227017338 created=1681567259 state=ASSIGN flags=00000001 [CCCH:1, PACCH:0] 1st_TS=5 1st_cTS=5 ctrl_TS=5 MS_CLASS=12/12 TBF: TFI=3 TLLI=0x7bb9f3ea (valid) TA=26 DIR=DL IMSI=334030316416837 created=1681551956 state=ASSIGN flags=00000009 [CCCH:1, PACCH:0] 1st_TS=7 1st_cTS=7 ctrl_TS=7 MS_CLASS=12/12 root@NTQ0132425:~#
Updated by pespin about 1 month ago
This is a related fix, though I think it's not fixing the real problem I saw in there:
https://gerrit.osmocom.org/c/osmo-pcu/+/32348 ms: Fix MS without PTMSI not freed immediatelly
- All the leaked MS have no TBFs attached and TLLI=GSM_RESERVED_TMSI (0xFFFFFFFF)
Updated by pespin about 1 month ago
- Status changed from New to In Progress
- Assignee changed from keith to pespin
- % Done changed from 0 to 30
Updated by pespin about 1 month ago
I believe the mentioned leaking MS are somehow going through this path:
void ms_merge_and_clear_ms(struct GprsMs *ms, struct GprsMs *old_ms) ms_reset(old_ms); void ms_reset(struct GprsMs *ms) { LOGPMS(ms, DRLCMAC, LOGL_INFO, "Clearing MS object\n"); ms_release_timer_stop(ms); ms->tlli = GSM_RESERVED_TMSI; ms->new_dl_tlli = ms->tlli; ms->new_ul_tlli = ms->tlli; ms->imsi[0] = '\0'; }
Another option is that the MS is allocated, then its first TBF is attempted to be allocated but fails, and then maybe the MS is not released properly.
Updated by pespin about 1 month ago
- % Done changed from 30 to 70
This (and all previous patches) may help in getting rid of this issue:
https://gerrit.osmocom.org/c/osmo-pcu/+/32360 ms: Rewrite MS release lifecycle
- Check paths during tbf_alloc() which calls ms_alloc() but it may fail before in tbf->setup() it calls ms_attach_tbf(), hence when tbf is freed it won't go through the usual ms_detach_tbf() -> use_count becoming 0, and release procedure for MS is not started. Solution may be to add an extra ms_ref(func) in tbf_alloc, or even a ms_ref() inside ms_alloc which then the caller fo ms_alloc has to ms_unref().
- Add idle timeout information in VTY "show ms all".
Updated by pespin about 1 month ago
pespin wrote in #note-5:
Something I still need to look at / improve:
- Check paths during tbf_alloc() which calls ms_alloc() but it may fail before in tbf->setup() it calls ms_attach_tbf(), hence when tbf is freed it won't go through the usual ms_detach_tbf() -> use_count becoming 0, and release procedure for MS is not started. Solution may be to add an extra ms_ref(func) in tbf_alloc, or even a ms_ref() inside ms_alloc which then the caller fo ms_alloc has to ms_unref().
Should be fixed by: https://gerrit.osmocom.org/c/osmo-pcu/+/32375
- Add idle timeout information in VTY "show ms all".
TODO
Updated by pespin about 1 month ago
- Status changed from In Progress to Feedback
- % Done changed from 70 to 90
pespin wrote in #note-6:
- Add idle timeout information in VTY "show ms all".
TODO
Done here:
https://gerrit.osmocom.org/c/osmo-pcu/+/32394 ms: Log MS active/idle state in 'show ms' VTY commands
I think once merged we can attempt at giving it a try at some of the affected deployments and see how everything works. I expect the issue should be gone now.
Updated by keith about 1 month ago
I'll post this in case it helps, in the meantime I'll have to build a binary for debugging.. not today though.
Program received signal SIGSEGV, Segmentation fault. 0xb6dd8e14 in osmo_timer_pending () from /usr/lib/libosmocore.so.20 (gdb) bt #0 0xb6dd8e14 in osmo_timer_pending () from /usr/lib/libosmocore.so.20 #1 0x00032430 in gprs_rlcmac_tbf::timers_pending (this=0x3e6c6576, t=<optimized out>) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp:383 #2 0x00021264 in ms_reset (ms=ms@entry=0x33fec0) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/gprs_ms.c:414 #3 0x0002206c in ms_merge_and_clear_ms (ms=<optimized out>, old_ms=0x33fec0) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/gprs_ms.c:477 #4 0x00037568 in gprs_rlcmac_ul_tbf::rcv_data_block_acknowledged (this=0xbeb11b28, this@entry=0x22e220, rlc=0x70520, rlc@entry=0xbeb11b28, data=0x6fefc "../../git/src/tbf_ul.cpp", data@entry=0x24edca "}\002\300\202\277\314m\354\027\200I\312", meas=meas@entry=0xb6df69fc) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf_ul.cpp:351 #5 0x00049800 in gprs_rlcmac_pdch::rcv_data_block (this=this@entry=0x1560dc, data=data@entry=0x24edca "}\002\300\202\277\314m\354\027\200I\312", data_len=data_len@entry=27 '\033', fn=fn@entry=1184347, meas=0xbeb11c50, meas@entry=0xb6df69fc, cs=cs@entry=MCS1) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pdch.cpp:1102 #6 0x0004bc6c in gprs_rlcmac_pdch::rcv_block (this=this@entry=0x1560dc, data=0x24edca "}\002\300\202\277\314m\354\027\200I\312", len=<optimized out>, fn=1184347, fn@entry=670320, meas=0xbeb11c50, meas@entry=0x15f00 <l1if_handle_l1prim+944>) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pdch.cpp:1013 #7 0x00026aa0 in pcu_rx_data_ind_pdtch (bts=bts@entry=0x155740, pdch=0x1560dc, data=<optimized out>, len=<optimized out>, fn=1184347, meas=0xbeb11c50, meas@entry=0xbeb11c48) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/pcu_l1_if.cpp:349 #8 0x00015f00 in handle_ph_data_ind (fl1h=0x153288, fl1h=0x153288, l1p_msg=0x24ece0, data_ind=0x24eda8) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/osmo-bts-sysmo/sysmo_l1_if.c:220 #9 l1if_handle_l1prim (wq=<optimized out>, fl1h=0x153288, msg=0x24ece0) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/osmo-bts-sysmo/sysmo_l1_if.c:296 #10 0xb6dd010c in ?? () from /usr/lib/libosmocore.so.20 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) info locals No symbol table info available. (gdb) info args No symbol table info available. (gdb) up #1 0x00032430 in gprs_rlcmac_tbf::timers_pending (this=0x3e6c6576, t=<optimized out>) at /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp:383 383 /usr/src/debug/osmo-pcu/1.2.0+gitAUTOINC+c564ce8965-r1.18/git/src/tbf.cpp: No such file or directory. (gdb) info locals i = 0 '\000' (gdb) info args this = 0x3e6c6576 t = <optimized out>
Updated by pespin about 1 month ago
I think it should be fixed by:
https://gerrit.osmocom.org/c/osmo-pcu/+/32547 ms: Fix unsafe iterating freeing items in list
Updated by keith 27 days ago
- File clipboard-202305051418-6ocvc.png clipboard-202305051418-6ocvc.png added
- Status changed from Feedback to Resolved
I'm not observing the issue any more since applying the patch, as can be seen in this stats screenshot. thanks!