PDP contexts not cleared/released if GGSN is restarted
When re-starting OpenGGSN while OsmoSGSN remains running and has PDP contexts established, the PDP contexts are not cleared.
All MS-originated GTP-U messages are rejected by the GGSN with an "Error Indication". This Error Indication is insufficient to close that specific PDP context.
The GSN restart counter does not appear to help here either. Let's find out why.
#2 Updated by laforge about 1 year ago
- Status changed from New to In Progress
- % Done changed from 0 to 20
I think the problem is threefold:
- the SGSN doesn't seem to use the ECHO req/resp procedure to poll the GSN restart counter of the GGSN and thus doesn't know about a restart
- the GGSN isn't including mandatory information elements in the GTP Error Indication, which would be required to resolve the GTP context that's causing the error (#2434)
- the SGSN isn't resolving the PDP context on an incoming GTP Error Indication
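To make the second and third points concrete, here is a minimal sketch of the lookup the SGSN would need. It assumes the Error Indication carries the TEID and the peer address taken from the G-PDU that triggered it, so the SGSN can key its contexts on that pair. PdpContext, PdpTable and handle_error_indication() are hypothetical names for illustration, not libgtp or osmo-sgsn API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PdpContext:
    """Hypothetical, simplified view of one PDP context on the SGSN."""
    imsi: str
    nsapi: int
    ggsn_addr: str        # user-plane address of the GGSN
    remote_teid_u: int    # user-plane TEID the GGSN assigned to this context


class PdpTable:
    """Index PDP contexts by (GGSN address, remote TEID)."""

    def __init__(self) -> None:
        self._by_peer: Dict[Tuple[str, int], PdpContext] = {}

    def add(self, ctx: PdpContext) -> None:
        self._by_peer[(ctx.ggsn_addr, ctx.remote_teid_u)] = ctx

    def handle_error_indication(self, peer_addr: str,
                                teid: int) -> Optional[PdpContext]:
        """Remove and return the context named by the Error Indication.

        Returns None when no context matches, e.g. when it is already gone.
        """
        return self._by_peer.pop((peer_addr, teid), None)
```

Without both values in the Error Indication there is nothing to key the lookup on, which is why the missing IEs and the missing resolution step are listed as separate problems.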
#3 Updated by laforge about 1 year ago
- % Done changed from 20 to 50
Change-Id: I3e843f9ef1d6fd7868cc992e083c0891d16b6da9 addresses the libgtp part of matching the PDP context and deleting it: https://gerrit.osmocom.org/3503
However, now we have the problem that the SGSN is not properly informing the MS about this, i.e. the MS continues to send packets for that PDP context.
cb_delete_context() is called for PDP context deletion, but this is called for both those PDP contexts that the SGSN has previously requested to be deleted, as well as for those that were "unilaterally" deleted by libgtp. We need to work out something here...
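One possible way to tell the two cases apart, purely as a sketch: remember which deletions the SGSN itself asked for, and treat any other callback as unilateral, so the MS still has to be told. DeleteTracker and its methods are hypothetical and do not reflect how osmo-sgsn or libgtp actually handle this.

```python
class DeleteTracker:
    """Hypothetical bookkeeping around a delete-context callback."""

    def __init__(self) -> None:
        self._pending = set()   # context ids the SGSN asked to delete
        self.notify_ms = []     # context ids the MS must still be told about

    def request_delete(self, ctx_id: int) -> None:
        """Record that the SGSN itself requested deletion of ctx_id."""
        self._pending.add(ctx_id)

    def on_delete_context(self, ctx_id: int) -> str:
        """Classify a deletion reported by the GTP library."""
        if ctx_id in self._pending:
            # Expected: the SGSN started this, the MS side is already handled.
            self._pending.discard(ctx_id)
            return "sgsn-requested"
        # Unexpected: the library dropped the context on its own.
        self.notify_ms.append(ctx_id)
        return "unilateral"
```

In the unilateral case the SGSN would then have to deactivate the PDP context towards the MS, so that it stops sending packets for it.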
I did some work related to this topic:
remote: New Changes:
remote: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9932 sgsn_libgtp.c: Log pointer of deleted context
remote: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9933 Maintain per ggsn pdp ctx list
remote: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9934 osmo-sgsn: ping GGSN periodically and check for restart counter
remote: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9935 Disarm T3395 when dettaching mmctx from pdpctx
remote: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9936 examples: Enable by default echo req towards active GGSNs
So the echo loop now seems to be working in osmo-sgsn. Support for it still needs to be added to osmo-ggsn. I also fixed some bugs I found while triggering new scenarios.
About the issue mentioned by laforge (#2434), I still need to check whether it is actually solved, because I recall seeing some Error Indications without the restart counter.
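The idea behind the per-GGSN context list and the periodic echo can be sketched roughly as follows. This is a simplified model under my own assumptions, not the osmo-sgsn code: GgsnState and on_echo_response() are hypothetical names, and the first counter seen is treated as a baseline rather than as a restart.

```python
class GgsnState:
    """Hypothetical per-GGSN state kept by the SGSN."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.restart_counter = None   # unknown until the first Echo Response
        self.pdp_contexts = []        # contexts established via this GGSN

    def on_echo_response(self, counter: int) -> list:
        """Record the counter and return the contexts to tear down, if any."""
        counter &= 0xFF               # the restart counter is a single octet
        previous = self.restart_counter
        self.restart_counter = counter
        if previous is None or previous == counter:
            return []
        # Any change, including a wrap from 255 to 0, means the GGSN restarted
        # and has lost every context we still hold for it.
        dropped, self.pdp_contexts = self.pdp_contexts, []
        return dropped
```

Keeping the list per GGSN means a restart of one GGSN only touches the contexts that went through it.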
I did some TTCN3 work, which is almost working:
remote: https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/9949 lib: GTP_Emulation: Allow receiving packets with TEID 0
remote: https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/9950 sgsn: Add test to verify restart_ctr during echo req/reply.
The problem is that the code I took from other tests to match the PDP CTX DEL REQ at the end doesn't match what I see in Wireshark (which I think is the expected result). That TTCN3 code is also used in TC_attach_pdp_act, which I see is actually also failing in current master for the same reason.
- ECHO REQUEST/REPLY seems to be working fine, as well as detection of restartCTR increase and cleanup of pdp ctx.
- The TTCN3 test works when run standalone, and my changes don't break any other tests. However, the two tests that were already broken and run before mine leave the SGSN in a weird state, which makes my test fail when run together with them. I'm trying to fix those tests so that everything passes.
osmo-sgsn patches can be found in: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9993/
libgtp (osmo-ggsn) patches can be found in: https://gerrit.osmocom.org/#/c/osmo-ggsn/+/9989/
TTCN3 patches can be found in: https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/9950/
- Fix previously failing tests: GTP CTX DEL REQ initiated by the GGSN was not implemented. Right now I'm at the point where the REQ reaches the osmo-sgsn cb_delete_context() function. It needs to forward the message to the PCU now.
- Error Indications probably still lack the Restart Counter; need to check.
- Status changed from In Progress to Feedback
- Priority changed from High to Normal
- % Done changed from 50 to 90
It turns out Error Indications don't include the Restart Counter (not even optionally), so the GGSN is sending all the correct information. I added a test to verify that the SGSN deactivates the ctx on receipt of an Error Indication in https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/10037/
All related bits from this task are covered in the whole bunch of commits I submitted to osmo-ggsn (libgtp) and osmo-sgsn.
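For reference, a small sketch of where the two pieces of information live on the wire, assuming the GTPv1 numbering from 3GPP TS 29.060 (Echo Response = 2, Error Indication = 26, Recovery IE = 14, TEID Data I IE = 16, GSN Address IE = 133). parse_gtp1() and build() are hypothetical helpers that only understand these three IEs; they are not libgtp code.

```python
import struct

ECHO_RESPONSE = 2
ERROR_INDICATION = 26
IE_RECOVERY = 14        # TV, 1 octet: the restart counter
IE_TEID_DATA_I = 16     # TV, 4 octets
IE_GSN_ADDRESS = 133    # TLV

# Fixed value lengths of the TV-encoded IEs this sketch knows about.
TV_VALUE_LEN = {IE_RECOVERY: 1, IE_TEID_DATA_I: 4}


def parse_gtp1(msg: bytes):
    """Return (message type, {IE type: value bytes}) for a GTPv1 message."""
    flags, msg_type, length, _teid = struct.unpack_from("!BBHI", msg, 0)
    offset = 8
    if flags & 0x07:            # E, S or PN set: 4 optional header octets
        offset += 4
    end = 8 + length            # length counts everything after octet 8
    ies = {}
    while offset < end:
        ie_type = msg[offset]
        if ie_type & 0x80:      # TLV: 2-octet length follows the type
            (ie_len,) = struct.unpack_from("!H", msg, offset + 1)
            start = offset + 3
        else:                   # TV: length is implied by the type
            if ie_type not in TV_VALUE_LEN:
                raise ValueError("IE type %d not handled in this sketch" % ie_type)
            ie_len = TV_VALUE_LEN[ie_type]
            start = offset + 1
        ies[ie_type] = msg[start:start + ie_len]
        offset = start + ie_len
    return msg_type, ies


def build(msg_type: int, ies: bytes) -> bytes:
    """Build a GTPv1 message with flags 0x32 (version 1, PT=1, S=1)."""
    return struct.pack("!BBHIHBB", 0x32, msg_type, 4 + len(ies), 0, 1, 0, 0) + ies


echo_resp = build(ECHO_RESPONSE, bytes([IE_RECOVERY, 5]))
error_ind = build(
    ERROR_INDICATION,
    bytes([IE_TEID_DATA_I]) + struct.pack("!I", 0xDEADBEEF)
    + bytes([IE_GSN_ADDRESS]) + struct.pack("!H", 4) + bytes([192, 0, 2, 1]),
)
```

Parsing echo_resp yields the Recovery IE, while error_ind only yields the TEID and the address. That matches the finding above: restart detection has to come from the echo procedure, and the Error Indication can only clear one context at a time.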