fix subscr_conn fsm: safely catch all compl-l3 failures, properly handle all release situations
Various people keep coming up with reports and patches about the MSC's subscr_conn FSM not handling specific corner cases properly.
- If anything goes wrong during compl-l3, the FSM might think that it is busy with auth+ciph. Need a separate state for auth+ciph; then at the end of msc_compl_l3() discard any conn that is still in state 'NEW'.
- For failure situations causing premature conn release, properly handle release messages and receive responses in a separate 'RELEASING' state.
- In the course of that, it may make sense to refactor:
  - closely tie the FSM with the struct gsm_subscriber_connection. Historically, the ownership was shared between libbsc and libmsc, complicating the ref-count in that the FSM was a separate entity. It should be possible to refactor the conn struct and the FSM as "a single entity", triggering a release event by the ref-count reaching zero, instead of needing explicit "release if unused" events.
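A rough sketch of the three ideas above in plain C, to make the intended behaviour concrete. This is not the actual osmo-msc code: the state names, `enum l3_result`, `struct conn` and all functions below are hypothetical stand-ins, and the real FSM would be built on the project's own FSM infrastructure. It only shows a separate auth+ciph state, discarding a conn that is still 'NEW' at the end of compl-l3 handling, and a release triggered by the use count reaching zero.

```c
#include <assert.h>

/* Hypothetical state names; not the actual osmo-msc definitions. */
enum conn_state {
	ST_NEW,        /* Complete Layer 3 not yet evaluated */
	ST_AUTH_CIPH,  /* authentication + ciphering in progress */
	ST_ACCEPTED,   /* conn is in use */
	ST_RELEASING,  /* release sent, waiting for the response */
	ST_RELEASED,   /* release confirmed */
};

/* Hypothetical outcome of evaluating the Complete Layer 3 message. */
enum l3_result { L3_FAILED, L3_NEEDS_AUTH, L3_ACCEPTED };

struct conn {
	enum conn_state state;
	unsigned int use_count;
};

/* Enter RELEASING exactly once; repeated release triggers are ignored. */
static void conn_release(struct conn *c)
{
	if (c->state == ST_RELEASING || c->state == ST_RELEASED)
		return;
	c->state = ST_RELEASING;
}

static void conn_get(struct conn *c)
{
	c->use_count++;
}

/* Dropping the last user triggers the release; no explicit
 * "release if unused" event is needed. */
static void conn_put(struct conn *c)
{
	assert(c->use_count > 0);
	if (--c->use_count == 0)
		conn_release(c);
}

/* The release response arrived (e.g. a Clear Complete). */
static void conn_release_done(struct conn *c)
{
	if (c->state == ST_RELEASING)
		c->state = ST_RELEASED;
}

/* Stand-in for the Complete Layer 3 handling. */
static void compl_l3(struct conn *c, enum l3_result res)
{
	switch (res) {
	case L3_NEEDS_AUTH:
		c->state = ST_AUTH_CIPH;
		conn_get(c); /* held by the pending auth+ciph procedure */
		break;
	case L3_ACCEPTED:
		c->state = ST_ACCEPTED;
		conn_get(c); /* held by the accepted request */
		break;
	case L3_FAILED:
		break; /* any failure path leaves the state untouched */
	}
	/* Catch-all: whatever went wrong, a conn still in NEW is discarded. */
	if (c->state == ST_NEW)
		conn_release(c);
}
```

With this shape, a failure anywhere in compl-l3 can no longer be mistaken for "busy with auth+ciph", because only the success paths ever leave 'NEW'.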
CM Service Requests may actually overlap. The conn->received_cm_service_request however is a boolean, which means that we possibly lose the pending-ness of a second CM Service Request if a first one concludes at just the wrong time, or if two come in "consecutively". -> #3156
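To illustrate the problem with the boolean, here is a minimal sketch in plain C. Only the field name `received_cm_service_request` is taken from the text above; the two structs, the counter field and the helper functions are hypothetical, and a counter is just one possible fix, not necessarily what #3156 ends up doing.

```c
#include <assert.h>
#include <stdbool.h>

/* Current approach as described above: one flag for all requests. */
struct conn_flag {
	bool received_cm_service_request;
};

/* Hypothetical alternative: count the pending requests. */
struct conn_count {
	unsigned int pending_cm_service_requests;
};

static void flag_rx(struct conn_flag *c)
{
	c->received_cm_service_request = true;
}

/* Concluding one request clears the flag for every request. */
static void flag_conclude(struct conn_flag *c)
{
	c->received_cm_service_request = false;
}

static bool flag_pending(const struct conn_flag *c)
{
	return c->received_cm_service_request;
}

static void count_rx(struct conn_count *c)
{
	c->pending_cm_service_requests++;
}

/* Concluding one request leaves the others pending. */
static void count_conclude(struct conn_count *c)
{
	assert(c->pending_cm_service_requests > 0);
	c->pending_cm_service_requests--;
}

static bool count_pending(const struct conn_count *c)
{
	return c->pending_cm_service_requests > 0;
}
```

If two requests come in and the first one concludes, `flag_pending()` reports nothing pending although the second request is still open, while `count_pending()` keeps reporting it until the second one concludes as well.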
That's a lot to ask for in a single issue, but it makes sense to tie all of these items into a refactoring of the subscr_conn FSM.
#4 Updated by neels almost 2 years ago
Testing against the current ttcn3 test suite yields 6 tests being fixed (proper Clear Request / Clear Complete messages now).
But the changes re-raise a corner case (#3062), which I still need to address (shouldn't be too hard). That case should also be covered by the ttcn3 (or at least some) test suite.
I would like to get this merged sooner rather than later and get back to inter-bsc HO, but #3062 shows that I need to be patient enough to not break things that had workarounds before.
Also still trying to reproduce #3125 in ttcn3, see there. Took me a lot of time to get a simple MNCC REL REQ case going (mostly from misreading logs: this time I mixed up MNCC Alerting with DTAP CC Alerting, and then it took forever to figure out that I need to expect an IPACC DLCX + ACK to not run into T_guard...) -- now it still needs to actually trigger the bug instead of succeeding, so that I can see whether the new code fixes the bug.
#5 Updated by neels almost 2 years ago
- Subject changed from fix subscr_conn fsm: safely catch all compl-l3 failures, properly handle all release situations, handle overlapping CM Service Requests to fix subscr_conn fsm: safely catch all compl-l3 failures, properly handle all release situations
- Description updated (diff)
- % Done changed from 60 to 90
I think handling overlapping CM Service Requests should be a separate issue -> #3156