Project

General

Profile

Actions

Bug #1952

open

IPv6 support for inner (user) IP layer missing

Added by laforge about 7 years ago. Updated 13 days ago.

Status:
In Progress
Priority:
High
Assignee:
Target version:
-
Start date:
02/18/2017
Due date:
% Done:

70%

Spec Reference:

Description

The 3GPP specs permit for both IPv4 and IPv6, but we only implement IPv4. This is embarrassing and should be changed. Contribution welcome!


Files

test2-netns-gtp-ipv4-over-ipv6.sh test2-netns-gtp-ipv4-over-ipv6.sh 1.94 KB pablo, 10/27/2023 02:30 PM
test2-netns-gtp-ipv6.sh test2-netns-gtp-ipv6.sh 1.64 KB pablo, 10/27/2023 02:30 PM
test2-netns-gtp-ipv6-over-ipv4.sh test2-netns-gtp-ipv6-over-ipv4.sh 2.04 KB pablo, 10/27/2023 02:30 PM
0002-link-gtp-add-support-for-local-listener-address.patch 0002-link-gtp-add-support-for-local-listener-address.patch 3.16 KB pablo, 10/27/2023 02:34 PM
0001-uapi-cache-linux-gtp.h-header.patch 0001-uapi-cache-linux-gtp.h-header.patch 1.45 KB pablo, 10/27/2023 02:34 PM
gtp-link-ipv4-ipv6.patch gtp-link-ipv4-ipv6.patch 4.1 KB pablo, 10/31/2023 11:06 AM
0001-gtp-link-add-IPv6-support.patch 0001-gtp-link-add-IPv6-support.patch 4.7 KB pablo, 11/15/2023 11:32 PM
test2-netns-gtp.sh test2-netns-gtp.sh 1.95 KB pablo, 11/15/2023 11:44 PM
test2-netns-gtp-ipv4-over-ipv6.sh test2-netns-gtp-ipv4-over-ipv6.sh 2 KB pablo, 11/15/2023 11:44 PM
test2-netns-gtp-ipv6.sh test2-netns-gtp-ipv6.sh 1.73 KB pablo, 11/15/2023 11:44 PM
test2-netns-gtp-ipv6-over-ipv4.sh test2-netns-gtp-ipv6-over-ipv4.sh 1.84 KB pablo, 11/15/2023 11:44 PM
GGSN_Tests.TC_pdp6_act_deact_gtpu_access.pcap.gz GGSN_Tests.TC_pdp6_act_deact_gtpu_access.pcap.gz 4.37 KB laforge, 11/16/2023 11:22 AM
0001-gtp-pass-up-link-local-traffic-to-userspace-socket.patch 0001-gtp-pass-up-link-local-traffic-to-userspace-socket.patch 988 Bytes pablo, 01/30/2024 09:33 AM
0001-gtp-use-IPv6-address-64-prefix-for-UE-MS.patch 0001-gtp-use-IPv6-address-64-prefix-for-UE-MS.patch 2.61 KB pablo, 01/30/2024 11:23 AM
0001-gtp-identify-tunnel-via-GTP-version-TEID-family.patch 0001-gtp-identify-tunnel-via-GTP-version-TEID-family.patch 7.01 KB pablo, 01/30/2024 12:18 PM
0001-gtp-allow-IPv6-only-listener-socket.patch 0001-gtp-allow-IPv6-only-listener-socket.patch 1020 Bytes pablo, 01/31/2024 06:20 PM
0001-gtp-link-add-IPv6-support.patch 0001-gtp-link-add-IPv6-support.patch 4.69 KB pablo, 01/31/2024 06:44 PM
0002-gtp-link-set-IPv6-socket-only.patch 0002-gtp-link-set-IPv6-socket-only.patch 1.27 KB pablo, 01/31/2024 06:44 PM
0003-gtp-provide-interface-to-set-family.patch 0003-gtp-provide-interface-to-set-family.patch 3.44 KB pablo, 01/31/2024 06:44 PM
0004-gtp-add-flags-to-gtp_tunnel-object.patch 0004-gtp-add-flags-to-gtp_tunnel-object.patch 7.2 KB pablo, 01/31/2024 06:44 PM
0005-gtp-genl-display-gtp-device-in-listing.patch 0005-gtp-genl-display-gtp-device-in-listing.patch 1.54 KB pablo, 01/31/2024 06:44 PM
0006-gtp-tunnel-display-i_tei-in-help.patch 0006-gtp-tunnel-display-i_tei-in-help.patch 800 Bytes pablo, 01/31/2024 06:44 PM
0001-fix-for-gtp-link-add-IPv6-support.patch 0001-fix-for-gtp-link-add-IPv6-support.patch 1.27 KB pablo, 02/14/2024 10:34 PM
0001-gtp-link-add-IPv6-support.patch 0001-gtp-link-add-IPv6-support.patch 4.69 KB pablo, 02/14/2024 10:59 PM
0002-gtp-link-set-IPv6-socket-only.patch 0002-gtp-link-set-IPv6-socket-only.patch 1.27 KB pablo, 02/14/2024 10:59 PM
0003-gtp-provide-interface-to-set-family.patch 0003-gtp-provide-interface-to-set-family.patch 3.44 KB pablo, 02/14/2024 10:59 PM
0004-gtp-add-flags-to-gtp_tunnel-object.patch 0004-gtp-add-flags-to-gtp_tunnel-object.patch 7.2 KB pablo, 02/14/2024 10:59 PM
0005-gtp-genl-display-gtp-device-in-listing.patch 0005-gtp-genl-display-gtp-device-in-listing.patch 1.54 KB pablo, 02/14/2024 10:59 PM
0006-gtp-tunnel-display-i_tei-in-help.patch 0006-gtp-tunnel-display-i_tei-in-help.patch 800 Bytes pablo, 02/14/2024 10:59 PM
0007-fix-for-gtp-link-add-IPv6-support.patch 0007-fix-for-gtp-link-add-IPv6-support.patch 1.27 KB pablo, 02/14/2024 10:59 PM
0001-gtp-link-allow-to-specify-listener-address.patch 0001-gtp-link-allow-to-specify-listener-address.patch 4.44 KB pablo, 02/15/2024 08:59 PM

Related issues

Related to OsmoGGSN (former OpenGGSN) - Feature #6096: add support for kernel-GTP IPv6In Progressosmith07/12/2023

Actions
Related to Linux Kernel GTP-U - Bug #6123: IPv4/IPv6 address family has to be the same for inner layer and outer layerResolvedpablo07/31/2023

Actions
Actions #1

Updated by laforge 8 months ago

  • Status changed from New to In Progress
  • Assignee set to pablo
  • Priority changed from Normal to High
  • % Done changed from 0 to 90

This has actually been implemented by pablo some time ago, the kernel code is in https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/ while the libgtpnl extension is in https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/src/branch/pablo/ipv6

We need to adjust osmo-ggsn support this in order to run the existing IPv6 TTCN3 test suite against it.

Actions #2

Updated by laforge 8 months ago

  • Related to Feature #6096: add support for kernel-GTP IPv6 added
Actions #4

Updated by laforge 7 months ago

  • Related to Bug #6123: IPv4/IPv6 address family has to be the same for inner layer and outer layer added
Actions #5

Updated by pablo 5 months ago

kbuild robot reported via sparse tool:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git main
head:   230b7c1e9db7fcb2b796845017ce64f35e7430f1
commit: cb1bf181ddcc4a7f28deb5cde00e14ed37b7c3d9 [8/11] gtp: remove IPv4 and IPv6 header from context object
config: i386-randconfig-062-20231014 (https://download.01.org/0day-ci/archive/20231014/202310142009.vKJll0Uf-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231014/202310142009.vKJll0Uf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310142009.vKJll0Uf-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/net/gtp.c:1197:45: sparse: sparse: incorrect type in argument 9 (different base types) @@     expected restricted __be32
 [usertype] label @@     got unsigned char [addressable] [usertype] tos @@
   drivers/net/gtp.c:1197:45: sparse:     expected restricted __be32 [usertype] label
   drivers/net/gtp.c:1197:45: sparse:     got unsigned char [addressable] [usertype] tos

Problem is here:

  1192  #if IS_ENABLED(CONFIG_IPV6)
  1193                  udp_tunnel6_xmit_skb(&pktinfo.rt6->dst, pktinfo.sk, skb, dev,
  1194                                       &pktinfo.fl6.saddr, &pktinfo.fl6.daddr,
  1195                                       0,
  1196                                       ip6_dst_hoplimit(&pktinfo.rt->dst),
> 1197                                       pktinfo.tos,
  1198                                       pktinfo.gtph_port, pktinfo.gtph_port,
  1199                                       false);
  1200  #else
  1201                  goto tx_err;
  1202  #endif
  1203                  break;
  1204          }
  1205
  1206          return NETDEV_TX_OK;
  1207  tx_err:
  1208          dev->stats.tx_errors++;
  1209          dev_kfree_skb(skb);
  1210          return NETDEV_TX_OK;
  1211  }

This is a real bug in ("gtp: add IPv6 support"), I accidentally swapped ToS and flowlabel. This is the fix I have collapsed:

  1193                  udp_tunnel6_xmit_skb(&pktinfo.rt6->dst, pktinfo.sk, skb, dev,
  1194                                       &pktinfo.fl6.saddr, &pktinfo.fl6.daddr,
> 1195                                       pktinfo.tos,
  1196                                       ip6_dst_hoplimit(&pktinfo.rt->dst),
~ 1197                                       0,
  1198                                       pktinfo.gtph_port, pktinfo.gtph_port,
  1199                                       false);

This is now fixed in gtp.git, I have rebased and forced a push, please make sure you work with a fresh gtp.git clone when testing, thanks.

Actions #6

Updated by osmith 4 months ago

Hi pablo,

an initial version of IPv6 support has been merged into libgtpnl (the patches had to be reworked to address points brought up in code review): https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/commit/a295615229747e938f14514da14aabd595d8d1b5

To better understand how libgtpnl APIs should be used from osmo-ggsn for type-support v4v6, and to make sure that it works as expected without using osmo-ggsn, I've started adding test cases to libgtpnl. The tests build a small initrd with gtp-tunnel, gtp-link, busybox and test shell scripts, and run it in QEMU (so it can run against a kernel built from source).

libgtpnl.git branch: osmith/qemu-tests at https://gerrit.osmocom.org/libgtpnl (browse, also on gitea, but it takes a bit to sync)

$ autoreconf -fi
$ ./configure --enable-qemu-tests
$ make check

Here are the current test results (see tests/qemu/initrd-init.sh):

run_test 01_ms_ip4_sgsn_ip4.sh  # OK
# run_test 02_ms_ip4_sgsn_ip6.sh  # NOK: kernel panic
# run_test 03_ms_ip6_sgsn_ip4.sh  # NOK: ping doesn't work
# run_test 04_ms_ip6_sgsn_ip6.sh  # NOK: ping doesn't work
# run_test 05_ms_ip46_sgsn_ip4.sh  # TODO

Could you take a look at the kernel panic (see below)? I'm running HEAD of https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/log/ (da3924c0, "gtp: support for IPv4-in-IPv6-GTP and IPv6-in-IPv4-GTP").

For tests 03 and 04, could you check if I'm using it wrong, or if there might be a bug in the kernel?

Thanks!

PS: I found that I probably need to add gtp_tunnel_set_family() again (which was in your patch) to be able to properly delete a tunnel (otherwise, it is not clear if the IPv4 or IPv6 variant of a tunnel with the same i_tei and o_tei and MS + SGSN addr should be deleted, if MS has both IPv4 and IPv6 as it would be the case with type-support v4v6 in osmo-ggsn as I understand). For adding the tunnel it seemed not necessary, because the family would always be the same one as for MS, as I understood from your patches. So I will add it back in a follow-up patch.

02_ms_ip4_sgsn_ip6.sh kernel panic

Actions #7

Updated by pablo 4 months ago

osmith wrote in #note-6:

Hi pablo,

an initial version of IPv6 support has been merged into libgtpnl (the patches had to be reworked to address points brought up in code review): https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/commit/a295615229747e938f14514da14aabd595d8d1b5

Thank you.

It feels a bit strange that you are taking my patches you mangled original author? Looking at the repository, they look as they were done by you, why didn't you use 'git am' to apply patches?

Could you also explain what you mangled from the original patches? Otherwise I have to review all this. Why not simply apply incremental patches on top instead of better traceability?

Sorry, this is not nice.

To better understand how libgtpnl APIs should be used from osmo-ggsn for type-support v4v6, and to make sure that it works as expected without using osmo-ggsn, I've started adding test cases to libgtpnl. The tests build a small initrd with gtp-tunnel, gtp-link, busybox and test shell scripts, and run it in QEMU (so it can run against a kernel built from source).

libgtpnl.git branch: osmith/qemu-tests at https://gerrit.osmocom.org/libgtpnl (browse, also on gitea, but it takes a bit to sync)

[...]

Here are the current test results (see tests/qemu/initrd-init.sh):

[...]

Could you take a look at the kernel panic (see below)? I'm running HEAD of https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/log/ (da3924c0, "gtp: support for IPv4-in-IPv6-GTP and IPv6-in-IPv4-GTP").

For tests 03 and 04, could you check if I'm using it wrong, or if there might be a bug in the kernel?

Thanks!

PS: I found that I probably need to add gtp_tunnel_set_family() again (which was in your patch) to be able to properly delete a tunnel (otherwise, it is not clear if the IPv4 or IPv6 variant of a tunnel with the same i_tei and o_tei and MS + SGSN addr should be deleted, if MS has both IPv4 and IPv6 as it would be the case with type-support v4v6 in osmo-ggsn as I understand). For adding the tunnel it seemed not necessary, because the family would always be the same one as for MS, as I understood from your patches. So I will add it back in a follow-up patch.

I will have to review what you apply to master and compare it to my local branch to review what is going on.

I have to admit I did not test delete tunnel path with my scripts.

{{collapse(02_ms_ip4_sgsn_ip6.sh kernel panic)
[...]
}}

I will take a look at the kernel again, I really tested datapath, this very much means nothing is working which is strange. Maybe I have break anything with my little fixes to make kbuild robot happy. I will come back to you with a more detailed assesment on what is going on.

I will start reviewing with my scripts with containers, it is a lot more lightweight than this qemu environment you have created and they do not require osmo-ggsn, I would have suggested to pick up to those I sent, they are also much easier to integrate into kernel/selftests infrastructure upstream. Well, we can also end up with two test infrastructures, one with qemu and another with kernel/selftests using containers, but I really think we really need kernel/selftests for this submission.

Apologies if my feedback is not pleasant to read, but I have to say a few things I don't like.

Thanks for your work.

Actions #8

Updated by pablo 4 months ago

I suggest you revert what you applied and we start from scratch, apply my patches and then you incrementally make patches on top, writing a description on your intention and rationale, otherwise it is really very hard to track down things. Then I have a chance to discuss with you if what you apply makes also sense to me.

I can see you have also squashed my preparation patches, I really like to split initial preparation patches that make no functional changes, before adding new features. All that was really intentional on my side, it is really the way I have been working for two decades in upstream projects.

I am now looking at the kernel issue you reported, I will keep using my local libgtnl library by now for this kernel testing.

Apologies, previous comment was not very constructive, let's find a way of working that allows us to scrutinize each other work.

Actions #9

Updated by osmith 4 months ago

pablo wrote in #note-7:

It feels a bit strange that you are taking my patches you mangled original author? Looking at the repository, they look as they were done by you, why didn't you use 'git am' to apply patches?

Hello Pablo,

I'm sorry, it was not my intention to upset you with this. I started modifying the patches as they were, adding descriptions of changes I made to each of them, but then it became apparent that I had to rewrite a lot to make them fit what was suggested in code review. I started modifying the patches as they were, using "struct gtp_addr" as requested in code review, and iterating through all the patches, resolving all the conflicts. But it became apparent that the original patches implement two versions, first one where v4-in-v6 and vice versa does not work, and later the implementation gets changed to make it work. After spending a lot of time changing the patches as they were, it seemed to me to make more sense to just squash them together since in the end we only need one implementation. I spent many hours working on the patches, and in the end the resulting patch is very different from the original patches. I changed the primary author to indicate that this is not just a small fixup but rather a rewrite of the patches, but mentioned in the commit message that this is based on your patches, and added you as Co-developed-by.

I thought I was supposed to take care of getting the patches into libgtpnl and change them as needed, and that this would be a valid way to do it. In any case, I'm sure we can figure this out, and if needed, force push patches to master that changes the author back or change the patches as needed.

Could you also explain what you mangled from the original patches? Otherwise I have to review all this. Why not simply apply incremental patches on top instead of better traceability?

I've tried to explain above why I did it that way. What I changed is mostly:
  • using struct gtp_addr
  • no extra ip4/ip6 arguments in gtp-tunnel, instead it is implict from the format of the address (if it can be parsed as ipv4, it is an ipv4)
  • removed gtp_tunnel_set_family as mentioned (but this needs to be brought back, see the "PS" above)

Please do read the patch in its current form, it is not long.

I have to admit I did not test delete tunnel path with my scripts.

Where are your test scripts? (I was not aware that they exist.)

I will start reviewing with my scripts with containers, it is a lot more lightweight than this qemu environment you have created and they do not require osmo-ggsn, I would have suggested to pick up to those I sent, they are also much easier to integrate into kernel/selftests infrastructure upstream. Well, we can also end up with two test infrastructures, one with qemu and another with kernel/selftests using containers, but I really think we really need kernel/selftests for this submission.

The test scripts I wrote don't use osmo-ggsn either, it just tests libgtpnl with gtp-tunnel.

In addition to that there are ttcn3 tests that do use osmo-ggsn, but that is not what I mentioned above.

Apologies if my feedback is not pleasant to read, but I have to say a few things I don't like.

Thanks for your work.

Thank you too!

Actions #10

Updated by pablo 4 months ago

osmith wrote in #note-9:

Hello Pablo,

I'm sorry, it was not my intention to upset you with this. I started modifying the patches as they were, adding descriptions of changes I made to each of them, but then it became apparent that I had to rewrite a lot to make them fit what was suggested in code review. I started modifying the patches as they were, using "struct gtp_addr" as requested in code review, and iterating through all the patches, resolving all the conflicts. But it became apparent that the original patches implement two versions, first one where v4-in-v6 and vice versa does not work and later the implementation gets changed to make it work. After spending a lot of time changing the patches as they were, it seemed to me to make more sense to just squash them together since in the end we only need one implementation. I spent many hours working on the patches, and in the end the resulting patch is very different from the original patches. I changed the primary author to indicate that this is not just a small fixup but rather a rewrite of the patches, but mentioned in the commit message that this is based on your patches, and added you as Co-developed-by.

I understand, thanks for explaining. I missed all that process. I made scripts to test both v4-in-v6 and v6-in-v4, they are all attached. We have used different tickets for this task, so it seems they got lost.

I thought I was supposed to take care of getting the patches into libgtpnl and change them as needed, and that this would be a valid way to do it. In any case, I'm sure we can figure this out, and if needed, force push patches to master that changes the author back or change the patches as needed.

Could you also explain what you mangled from the original patches? Otherwise I have to review all this. Why not simply apply incremental patches on top instead of better traceability?

I've tried to explain above why I did it that way. What I changed is mostly:
  • using struct gtp_addr
  • no extra ip4/ip6 arguments in gtp-tunnel, instead it is implict from the format of the address (if it can be parsed as ipv4, it is an ipv4)
  • removed gtp_tunnel_set_family as mentioned (but this needs to be brought back, see the "PS" above)

thanks for explaining.

Please do read the patch in its current form, it is not long.

yes, patch is small, I will take a look.

I have to admit I did not test delete tunnel path with my scripts.

Where are your test scripts? (I was not aware that they exist.)

Attached. They require two patches for iproute2, which allows for standalone GTP tunnels (with no osmo-ggsn).

Thanks

Actions #11

Updated by laforge 4 months ago

Hi Oliver,

On Fri, Oct 27, 2023 at 02:24:34PM +0000, osmith wrote:

I thought I was supposed to take care of getting the patches into libgtpnl and change them as needed, and that this would be a valid way to do it. In any case, I'm sure we can figure this out, and if needed, force push patches to master that changes the author back or change the patches as needed.

The primary task from my point of view was to perform the testing. The
fact that pablo send diff files by e-mail and didn't follow the (to us
normal) process for code review in gerrit directly could have been
understood in a way to transfer responsibility to you, basically "take
these patches and bring them through code review and upstream, I don't
want to have to deal with gerrit".

In the end, what I as Osmocom project lead and sysmocom evil dictator
care about is that things get done. It is generally my impressin that
Pablo is often busy with other and probably more important tasks (in
terms of netfilter, etc.) and is generally rather remote from what we're
doing in Osmocom (in terms of tools used, code review, etc). That's
fine, and I think that the sysmocom team should do as much as we can to
allow Pablo to focus on those parts that really are in his direct line
of expertise which is the Linux kernel networking code.

I also agree with osmith's rationale to squash the "v6 / v4 separately"
and later change that to support mixed v4/v6. That's an arbitrary
distinction which resulted from the development history. However, that
"v4/v6 separate" is not the way GTP is ever used, so it's an arbitrary
intermediate step, and it's cleaner to introduce it properly at once.

Actions #12

Updated by osmith 4 months ago

  • % Done changed from 90 to 70

Thanks, Harald.

Back to technical discussions:

pablo wrote in #note-10:

Where are your test scripts? (I was not aware that they exist.)

Attached. They require two patches for iproute2, which allows for standalone GTP tunnels (with no osmo-ggsn).

Thanks for attaching the tests, I've tried them out. Setting up the GTP tunnels works with them, but attempting to run iperf3 as in the commented lines does not work for me ("unable to send control message: Bad file descriptor", looks like the IPs there need to be adjusted and routes need to be added? I've tried it for a bit but left it for now).

Regarding the test that caused a kernel panic, it can also be executed without qemu:
Actions #13

Updated by pablo 4 months ago

osmith wrote in #note-12:

Thanks, Harald.

Back to technical discussions:

pablo wrote in #note-10:

Where are your test scripts? (I was not aware that they exist.)

Attached. They require two patches for iproute2, which allows for standalone GTP tunnels (with no osmo-ggsn).

Thanks for attaching the tests, I've tried them out. Setting up the GTP tunnels works with them, but attempting to run iperf3 as in the commented lines does not work for me ("unable to send control message: Bad file descriptor", looks like the IPs there need to be adjusted and routes need to be added? I've tried it for a bit but left it for now).

Regarding the test that caused a kernel panic, it can also be executed without qemu:

gtp-link also needs an update to set up a IPv6 socket for the gtpX device.

This patch would avoid a crash from control plane.

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 2d2f24eb05ca..8698a00708ec 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1780,6 +1780,9 @@ static struct pdp_ctx *gtp_pdp_add(struct gtp_dev *gtp, struct sock *sk,
                    info->attrs[GTPA_MS_ADDR6])
                        return ERR_PTR(-EINVAL);

+               if (sk->sk_family != AF_INET)
+                       return ERR_PTR(-EAFNOSUPPORT);
+
                ms_addr = nla_get_be32(info->attrs[GTPA_MS_ADDRESS]);
                hash_ms = ipv4_hashfn(ms_addr) % gtp->hash_size;
                pctx = ipv4_pdp_find(gtp, ms_addr);
@@ -1789,6 +1792,9 @@ static struct pdp_ctx *gtp_pdp_add(struct gtp_dev *gtp, struct sock *sk,
                    info->attrs[GTPA_MS_ADDRESS])
                        return ERR_PTR(-EINVAL);

+               if (sk->sk_family != AF_INET6)
+                       return ERR_PTR(-EAFNOSUPPORT);
+
                ms_addr6 = nla_get_in6_addr(info->attrs[GTPA_MS_ADDR6]);
                hash_ms = ipv6_hashfn(&ms_addr6) % gtp->hash_size;
                pctx = ipv6_pdp_find(gtp, &ms_addr6);

At this stage, the gtp device can either have an IPv4 address or IPv6 address, ie. two GTP devices would be needed, one for each outer protocol family. In other words, in an environment with both GTP over IPv4 and GTP over IPv6 (regardless the inner protocol family network header), two GTP devices are required.

Is there a need for a single device both GTP IPv4 and IPv6 sockets?

After this patch I am proposing, if gtp0 has an IPv6 socket, then this fails:

gtp-tunnel add gtp0 v1 1 1 ip 10.141.10.2 ip 192.168.10.11

because peer address needs to be IPv6.

Actions #14

Updated by pablo 4 months ago

Actually, fix is this to reject adding a peer address which has different socket family:

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 2d2f24eb05ca..0147939dbd6e 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1826,6 +1826,8 @@ static struct pdp_ctx *gtp_pdp_add(struct gtp_dev *gtp, struct sock *sk,
                        ipv6_pdp_fill(pctx, info);
                        break;
                }
+               if (pctx->peer_af != sk->sk_family)
+                       return ERR_PTR(-EAFNOSUPPORT);

                if (pctx->gtp_version == GTP_V0)
                        netdev_dbg(dev, "GTPv0-U: update tunnel id = %llx (pdp %p)\n",
Actions #15

Updated by osmith 4 months ago

pablo wrote in #note-13:

gtp-link also needs an update to set up a IPv6 socket for the gtpX device.

Thanks for explaining.

This patch would avoid a crash from control plane.

[...]

At this stage, the gtp device can either have an IPv4 address or IPv6 address, ie. two GTP devices would be needed, one for each outer protocol family. In other words, in an environment with both GTP over IPv4 and GTP over IPv6 (regardless the inner protocol family network header), two GTP devices are required.

Is there a need for a single device both GTP IPv4 and IPv6 sockets?

I looked quickly over the related osmo-ggsn code, and it looks like it should be fine to have a different device for IPv4 and IPv6.

But laforge is the expert, maybe he has an important reason why it should use the same device.

After this patch I am proposing, if gtp0 has an IPv6 socket, then this fails:

gtp-tunnel add gtp0 v1 1 1 ip 10.141.10.2 ip 192.168.10.11

because peer address needs to be IPv6.

Sounds good to me.

pablo: as 03_ms_ip6_sgsn_ip4.sh is very similar to 02_ms_ip4_sgsn_ip6.sh, do you happen to see why I wasn't able to ping in 03_ms_ip6_sgsn_ip4.sh? https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/src/branch/osmith/qemu-tests/tests/qemu/03_ms_ip6_sgsn_ip4.sh

Actions #16

Updated by pablo 4 months ago

osmith wrote in #note-15:

pablo wrote in #note-13:

gtp-link also needs an update to set up a IPv6 socket for the gtpX device.

Thanks for explaining.

I am attaching a patch only for gtp-link to make it IPv4/IPv6 aware.

Compile-tested only.

Actions #17

Updated by laforge 4 months ago

On Tue, Oct 31, 2023 at 10:48:15AM +0000, osmith wrote:

I looked quickly over the related osmo-ggsn code, and it looks like it should be fine to have a different tun device for IPv4 and IPv6.

well, both my and your response are technically correct. The 3GPP specs say nothing about "gtp net-devices".

The configuration we must support is a single tunnel (TEID) that carries both v4 and v6 inner IP packets.
Normally, one would expect those to be emerging from the same net-device upon decapsulation. However, if
it simplified things on the implementation side, one could of course let those packets appear on separate
net-devices. The key requirement is that a single tunnel must be able to carry both v4 and v6.

Regards,
Harald

Actions #18

Updated by laforge 4 months ago

Hi Pablo,

On Tue, Oct 31, 2023 at 10:24:36AM +0000, pablo wrote:

At this stage, the gtp device can either have an IPv4 address or IPv6 address, ie. two GTP devices would be needed, one for each outer protocol family. In other words, in an environment with both GTP over IPv4 and GTP over IPv6 (regardless the inner protocol family network header), two GTP devices are required.

Is there a need for a single device both GTP IPv4 and IPv6 sockets?

From the 3GPP usage point of view, there are the following three options, all of which are deployed in
production for many years:

  • IPv4-only APN
    • The tunnel gets allocated a single IPv4 address only
    • All user-plane IP packets are IPv4
  • IPv6-only APN
    • The tunnel gets allocated and IPv6 prefix (/64) only
    • All user-plane IP packets are IPv6
  • IPv4v6 APN
    • The tunnel gets allocated both an IPv4 address and and IPv6 prefix (/64)
    • user-plane IP packets are alternating v4 and v6. Basically just like any normal dual-stacked host on the
      internet today
    • this is actually the default bearer type since the introduction of LTE in 3GPP Release 8 in 2010

So I think the answer is yes, as single GTP device / socket must support a combination of both AF as inner
protocol.

See also https://www.ietf.org/proceedings/77/slides/v6ops-10.ppt for a high-level intro that was apparently
presented at an IETF meeting long ago

Actions #19

Updated by pablo 4 months ago

osmith wrote in #note-15:

pablo: as 03_ms_ip6_sgsn_ip4.sh is very similar to 02_ms_ip4_sgsn_ip6.sh, do you happen to see why I wasn't able to ping in 03_ms_ip6_sgsn_ip4.sh? https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/src/branch/osmith/qemu-tests/tests/qemu/03_ms_ip6_sgsn_ip4.sh

I suspect it is the same issue, showing up in a different way.

static int __gtp_build_skb_ip6(struct net *net, struct sk_buff *skb,             
                               struct net_device *dev,                           
                               struct gtp_pktinfo *pktinfo,                      
                               struct pdp_ctx *pctx, __u8 tos)                   
{                                                                                
        struct dst_entry *dst;                                                   
        struct rt6_info *rt;                                                     
        struct flowi6 fl6;                                                       
        int mtu;                                                                 

        rt = ip6_route_output_gtp(net, &fl6, pctx->sk, &pctx->ip6.peer_addr,     
                                  &inet6_sk(pctx->sk)->saddr);   

Kernel drivers uses the socket address to look up for the route when building the IPv6 GTP packet, see inet6_sk(), so kernel assume an IPv6 socket is in place.

But gtp-link is setting an IPv4 socket from userspace.

I can try to reproduce it here.

Actions #20

Updated by pablo 4 months ago

laforge wrote in #note-18:

So I think the answer is yes, as single GTP device / socket must support a combination of both AF as inner
protocol.

OK, I am going to explore a way to attach two sockets from netlink interface POV.

Actions #21

Updated by laforge 4 months ago

On Tue, Oct 31, 2023 at 11:18:54AM +0000, pablo wrote:

OK, I am going to explore a way to attach two sockets from netlink interface POV.

sorry, it's been ages since I looked at this code in detail. But why would we need two different
sockets? Maybe we should schedule a quick call with osmith, pespin, yourself and me? At least
I'm flexible today.

Actions #22

Updated by osmith 4 months ago

Notes from the call we did:

  • overloading one device with more sockets is not needed
  • next step is getting ipv4 and ipv6 traffic working at all
  • then the kernel code needs to be adjusted to match on the /64 netmask prefix for ipv6, as each user gets a specific prefix

IIRC Pablo said he would look into the kernel code some more and then notify us.

Actions #23

Updated by pablo 4 months ago

Thanks for the summary. I'll pick up asap, on Friday I could hit EINVAL when adding tunnels, I need to follow up on this and debug the issue.

Actions #24

Updated by pablo 4 months ago

Hi!

I have just refreshed the kernel GTP driver for IPv4-over-IPv6, IPv6-over-IPv6 and IPv4-over-IPv6.

Tree is available here:

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/

This round:

- Fixes a crash due to missing sanity check on the socket type, before it was possible to attach an IPv4 socket (which determines the outer header) to a mismatching IPv6 GTP tunnel.
- Fixes a silly union I made at the very beginning of the previous series when preparing for the IPv6 support, resulting in proble, when outer and inner header families were different.
- Added a few default: cases to switches with WARN_ON_ONCE(1) and properly set error, just in case.
- Remove pctx->peer_af, use the socket family which already determines the outer header family. Outer IP header and socket families need to be the same as we discussed, that is, a GTP device supports a single outer header family, but inner headers can be either IPv4 or IPv6.

This also requires a patch for libgtnl/gtp-link to support for IPv6 sockets that is attached to this report!

This time, I tested this with gtp-link (not iproute standalone more). For datapath, I have generated basic icmp echo/request packets and TCP traffic with iperf3.

I use this firewall policy to clamp TCP MSS to path MTU on the routers:

table inet filter {
        chain forward {
                type filter hook forward priority 0; policy accept;
                tcp flags syn tcp option maxseg size set rt mtu
        }
}
Actions #25

Updated by pablo 4 months ago

Here are my test scripts. They live under libgtpnl/tools/ in my test box, so they can invoke ./gtp-link and ./gtp-tunnel.

Actions #26

Updated by laforge 3 months ago

pablo wrote in #note-24:

Hi!

I have just refreshed the kernel GTP driver for IPv4-over-IPv6, IPv6-over-IPv6 and IPv4-over-IPv6.

excellent, thanks.

I had a very brief review of the patches. I think it still does a /128 compare for IPv6 addresses, right? the two memcmp in gtp_check_ms_ipv6 and ipv6_pdp_find still use sizeof(struct in6_addr). As discussed, we need a /64 prefix match.

3GPP TS 29.060:

PDN Connection: the association between a MS represented by one IPv4 address and/or one IPv6 prefix and a PDN represented by an APN.

so this clearly states that IPv4 is a single address while IPv6 is a single prefix.

3GPP TS 29.061, Section 11.2.1.3:

For APNs that are configured for IPv6 address allocation, the GGSN/P-GW shall only use the Prefix part of the IPv6 address for forwarding of mobile terminated IP packets. The size of the prefix shall be according to the maximum prefix length for a global IPv6 address as specified in the IPv6 Addressing Architecture, see RFC 4291 [82].

RFC 4291 section 2.5.4 states

All Global Unicast addresses other than those that start with binary 000 have a 64-bit interface ID field (i.e., n + m = 64) ...

3GPP TS 29.61 Section 11.2.1.3.2a:

In the procedure in the cases of using GTP-based S5/S8, P-GW acts as an access router, and allocates to a UE a globally unique /64 IPv6 prefix if the PLMN allocates the prefix. ...

The situation in IPv6 beyond that is unfortunately is rather complex. My memory is blurred as I implemented it many years ago (2017) in osmo-ggsn.

The point is that
  • on the signaling plane, the UE (== MS) gets allocated a /64 prefix
  • the MS/UE may send a router-solicitation to the GGSN/PGW
  • the GGSN/PGW answers with a router advertisement containing that /64 prefix

The router-solicitation/advertisement is using link-local addresses. In order to avoid address conflicts, the interface identifier used by the MS/UE is set to the /64 prefix(!).

So initially, the UE uses a link-local address with the lower 64 bits (interface identifier) set to the /64 prefix, and later on (assigned via router advertisement) the UE uses the /64 prefix as upper 64 bits and uses a random lower 64 bits (interface identifier).

In theory, of course, also at any later point during the PDP context, the UE and/or GGSN may need to send packets using the link-local addresses.

See 3GPP TS 29.061 11.2.1.3.2

The attached pcap file GGSN_Tests.TC_pdp6_act_deact_gtpu_access.pcap.gz (generated by our test suite, last artefact can always be found at https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-ggsn-test/lastSuccessfulBuild/artifact/logs/ggsn-tester/GGSN_Tests.TC_pdp6_act_deact_gtpu_access.pcap.gz) you can see

Packet Protocol Interpretation
52 GTP-C control plane allocates 2001:780:44:2004/64 to the UE
54 GTP-U UE sends router solicitation from fe80:2001:780:44:2004/128 to ff02::2
56 GTP-U GGSN sends router advertisement from its own link-local addr to fe80:2001:780:44:2004/128
57 GTP-U UE sends neighbor solicitation from fe80:2001:780:44:2004
63+66 GPT-U UE exchanges normal IPv6 traffic from any random source within 2001:780:44:2004/64
You can find the related code in osmo-ggsn handling this in
Actions #27

Updated by laforge 3 months ago

just thinking more about this in terms of expectations to the kernel GTP IPv6 support:

  • all-routers-multicast destination should not be handled in the kernel at all (neither inside the normal kernel ICMPv6 code, nor in the GTP module. It should be passed to the UDP socket of the control plan, so it can handle the 3GPP-specific interpretation of those packets
  • link-local source address with lower-64bits matching a PDP context should also pass into the control plane via the UDP socket
  • only the source address upper 64 bit prefix matching a PDP context should be handled inside the kernel path for decapsulation (like we do in the IPv4 path
Passing those other packets up to the control plane via the UDP socket makes sense, as * this is what we do with any packets not handled inside the kernel.
  • we cannot let the normal kernel ICMPv6 code handle them (as it would send wrong responses as it doesn't know the per-UE/PDP specific prefixes)
  • we don't want to put all this logic into the kernel either; different control planes might want to handle this differently. Technically it's possible to do it in the kernel, if somebody thinks this has some advantage. I can't really think of any right now.
Actions #28

Updated by laforge 3 months ago

One more requirement for the combined IPv4/v6 support.

  • it is correct that one Socket (and hence gtpX net-device) can remain bound to one specific address family
  • however, a PDP context can not just be v4 or v6, but also 'v4v6'. The latter is (as stated earlier) the default in modern cellular networks.

This means that within one TEID we might have IPv4 and IPv6 in alternating fashion.

There are two possible ways to represent this to the netlink user (control plane):
  1. allow a single tunnel / TEID to have both a (inner) v4 address and a (inner) v6 prefix, not just one of the two, or
  2. allow the creation of two tunnels with identical TEID but different address families

From the user/application point of view the first approach would be more convenient, but I undertand that it might be less invasive to the kernel code to do the second variant. Either way is fine in terms of achieving the desired functionality.

Actions #29

Updated by pablo 3 months ago

laforge wrote in #note-26:

pablo wrote in #note-24:

Hi!

I have just refreshed the kernel GTP driver for IPv4-over-IPv6, IPv6-over-IPv6 and IPv4-over-IPv6.

excellent, thanks.

I had a very brief review of the patches. I think it still does a /128 compare for IPv6 addresses, right? the two memcmp in gtp_check_ms_ipv6 and ipv6_pdp_find still use sizeof(struct in6_addr). As discussed, we need a /64 prefix match.

Yes, this is pending. I am going to digest this and get back to you with a patch (or questions if any before such patch), thanks!

Actions #30

Updated by pablo 3 months ago

laforge wrote in #note-28:

One more requirement for the combined IPv4/v6 support.

  • it is correct that one Socket (and hence gtpX net-device) can remain bound to one specific address family
  • however, a PDP context can not just be v4 or v6, but also 'v4v6'. The latter is (as stated earlier) the default in modern cellular networks.

This means that within one TEID we might have IPv4 and IPv6 in alternating fashion.

There are two possible ways to represent this to the netlink user (control plane):
  1. allow a single tunnel / TEID to have both a (inner) v4 address and a (inner) v6 prefix, not just one of the two, or
  2. allow the creation of two tunnels with identical TEID but different address families

From the user/application point of view the first approach would be more convenient, but I undertand that it might be less invasive to the kernel code to do the second variant. Either way is fine in terms of achieving the desired functionality.

Thanks! I will pick up on this task too.

Actions #31

Updated by pespin 3 months ago

Hi, just adding my five cents here to clarify something:

on the signaling plane, the UE (== MS) gets allocated a /64 prefix
the MS/UE may send a router-solicitation to the GGSN/PGW
the GGSN/PGW answers with a router advertisement containing that /64 prefix

According to Harald's writing above, I have the impression he expresses the fact that the 64 prefix initially allocated to do the IPv6 SLAAC procedure is the same one being passed to the UE as the final prefix to be used after SLAAC. This is afaiu incorrect: The two prefixes don't have any requirement/restriction to be the same; The prefix initially allocated to the SLAAC procedure can be different that the one assigned later one during SLAAC.

We actually had this sort of problem in our implementations in the past and got eventually fixed: https://gitea.osmocom.org/cellular-infrastructure/osmo-ggsn/commit/04715d284f5abc6bce6cd6401c5700e3031662de
The fact that osmo-ggsn uses the same prefix for both is just an implementation detail.
See the commit link shared above how actually talks about the fact of 3 IP address entries required to match the 1 IPv4 and 2 IPv6 addresses, as opposted to the 2 entries described by Harald previously:

There are two possible ways to represent this to the netlink user (control plane):

    allow a single tunnel / TEID to have both a (inner) v4 address and a (inner) v6 prefix, not just one of the two, or

Actions #32

Updated by pablo 3 months ago

Hi Pau,

pespin wrote in #note-31:

See the commit link shared above how actually talks about the fact of 3 IP address entries required to match the 1 IPv4 and 2 IPv6 addresses, as opposted to the 2 entries described by Harald previously:
[...]

Do you mean that this needs to represent the following in the netlink control plane?

IPv4 + TEID
IPv6 /prefix + TEID
IPv6 link-local /prefix + TEID

I was expecting that SGSN allocates the IP address as described above, so no SLAAC is seen in the tunnel.

Commit above says GGSN is responsible to reply to SLAAC request?

* link-local: "fe80::IfId", where IfId is the Interface-Identifier
  received during Pdp Context Resp and which can be used to communicate
  with the nearest router (the GGSN).
Actions #33

Updated by laforge 3 months ago

SLAAC happens between UE and GGSN. SGSN is just relaying messages from left to right and vice versa.

Actions #34

Updated by pablo 3 months ago

OK, then link-local traffic is seen in the tunnel. This confirms up to three entries (one for IPv4 and at least two for IPv6) are needed.

Actions #35

Updated by pablo 3 months ago

There are two possible ways to represent this to the netlink user (control plane):

    1. allow a single tunnel / TEID to have both a (inner) v4 address and a (inner) v6 prefix, not just one of the two, or
    2. allow the creation of two tunnels with identical TEID but different address families

I will go for option number 2, it is more flexible since it also covered for the SLAAC case.

Actions #36

Updated by pespin 3 months ago

Hi Pablo,

pablo wrote in #note-32:

Do you mean that this needs to represent the following in the netlink control plane?

IPv4 + TEID
IPv6 /prefix + TEID
IPv6 link-local /prefix + TEID

I never really used nor looked into the gtp tun module (I'm doing it soon these days since I need to set it up in osmo-epdg), so I can't tell for sure right now, I may be saying dummy things starting from here. My understanding is that the gtp kernel module offloads regular user data traffic and only forwards a handful of selected packets to the fd which the controlling userspace app (SGSN/GGSN) can read and process?
If that's the case, then yes I guess you need to forward the "IPv6 link-local /prefix + TEID" to the GGSN userspace process. But my point is, bear in mind this /prefix is not necessarily the same as the one in "IPv6 /prefix + TEID" case. Because I guess you don't want to forward ICMPv6 for link local address outside of the tunnel? Or we want to inject it into the regular routing stack and then handle it using some sort of RAW ICMP socket?

So it goes more or less this way:
  • UE/SGSN -> GGSN sends CreatePDPContextReq with type IPv6
  • UE/SGSN <- GGSN sends CreatePDPContextResp with an 128bit IPv6 address consisting of first 64 bits of trash and 64 bits containing the Interface-Identifier (need to check in sgsnemu/osmo-ggsn whether the first or last 64 bits are the trash).
  • UE/SGSN -> GGSN sends over the GTPU tunnel an ICMPv6 Router Soliciation with src link-local address fe80:: + Interface-Identifier.
  • UE/SGSN <- GGSN sends over the GTPU tunnel an ICMPv6 Router Advertisement back to fe80:: + Interface-Identifier, containing "ICMPv6 Option" "Prefix information" of 64 bits. This "Prefix information" may be different that the Interface-Identifier used before.
  • UE/SGSN <-> GGSN transmit data over GTPU tunnel for that pdp context using IPv6 address consisting of first 64 bits "Prefix information" and then 64 bits can be any value.

As Harald mentioned, you can see it in actions here: https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-ggsn-test/lastSuccessfulBuild/artifact/logs/ggsn-tester/GGSN_Tests.TC_pdp6_act_deact_gtpu_access.pcap.gz
(wireshark filter: "!tcp && !gsmtap_log"). Our GGSN implementation there is returning a 128bit IPv6 address consisting of the 64bit Interface-Identifier copied twice, but one of them is actually trash and not really used.

Actions #37

Updated by laforge 3 months ago

Hi Pablo,

On Tue, Nov 28, 2023 at 11:48:09AM +0000, pespin wrote:

My understanding is that the gtp kernel module offloads regular user data traffic and only forwards a handful of selected packets to the fd which the controlling userspace app (SGSN/GGSN) can read and process?

It's technically the other way around: It treats those packets it can treat in the kernel and lets
everything else pass normally to userspace via the udp socket. So basically the UDP socket doesn't see
all packets anymore.

If that's the case, then yes I guess you need to forward the "IPv6 link-local /prefix + TEID" to the GGSN userspace process.

That is exactly what I proposed in #1952#note-27 12 days ago. Happy to hear we agree.

Actions #38

Updated by pablo 3 months ago

laforge wrote in #note-37:

On Tue, Nov 28, 2023 at 11:48:09AM +0000, pespin wrote:

If that's the case, then yes I guess you need to forward the "IPv6 link-local /prefix + TEID" to the GGSN userspace process.

That is exactly what I proposed in #1952#note-27 12 days ago. Happy to hear we agree.

OK, then I will update the GTP driver to pass up this IPv6 link-local packets to userspace :-)

Actions #39

Updated by pablo about 1 month ago

pablo wrote in #note-38:

OK, then I will update the GTP driver to pass up this IPv6 link-local packets to userspace :-)

Attaching patch to pass up IPv6 link-local packets to userspace.

Actions #40

Updated by pablo about 1 month ago

pablo wrote in #note-39:

Attaching patch to pass up IPv6 link-local packets to userspace.

I am checking this from gtp_check_ms_ipv6(), which already passes up packets to the userspace daemon in case that a MS does not have a PDP context in place.

Actions #41

Updated by pablo 30 days ago

laforge wrote in #note-26:

the two memcmp in gtp_check_ms_ipv6 and ipv6_pdp_find still use sizeof(struct in6_addr). As discussed, we need a /64 prefix match.

Patch to compare /64 prefix in IPv6 is attached.

This rejects PDP context if lower 64 bits of the IPv6 address are set with EOPNOTSUPP, just in case there is a need to support different prefix length in the future, to detect older kernels.

Actions #42

Updated by laforge 30 days ago

Hi pablo, both the v6-prefix and the link-local patches look fine to me.

To avoid having to collect individual patches from an issue tracker:
Do you have a branch somehwere with the entire series of GTPv6 patches? https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/ doesn't have any updates since November 15, AFAICT.

Actions #43

Updated by pablo 30 days ago

laforge wrote in #note-28:

One more requirement for the combined IPv4/v6 support.

[...]

This means that within one TEID we might have IPv4 and IPv6 in alternating fashion.

There are two possible ways to represent this to the netlink user (control plane):
  1. allow a single tunnel / TEID to have both a (inner) v4 address and a (inner) v6 prefix, not just one of the two, or
  2. allow the creation of two tunnels with identical TEID but different address families

From the user/application point of view the first approach would be more convenient, but I undertand that it might be less invasive to the kernel code to do the second variant. Either way is fine in terms of achieving the desired functionality.

I went for your second choice.

This requires from userspace to define two PDP context objects, one for each family. I think it is better if userspace deals with this complexity rather than extending the kernel representation.

Actions #44

Updated by pablo 30 days ago

laforge wrote in #note-42:

Hi pablo, both the v6-prefix and the link-local patches look fine to me.

To avoid having to collect individual patches from an issue tracker:
Do you have a branch somehwere with the entire series of GTPv6 patches? https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/ doesn't have any updates since November 15, AFAICT.

All of the three patches (also the one I just posted to allow to use same TEID for IPv4 and IPv6) that I posted today are compile-tested only.

I am going to give them a test, then push them out to:

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/

I will report once this done.

Actions #45

Updated by pablo 30 days ago

pablo wrote in #note-44:

laforge wrote in #note-42:

Hi pablo, both the v6-prefix and the link-local patches look fine to me.

To avoid having to collect individual patches from an issue tracker:
Do you have a branch somehwere with the entire series of GTPv6 patches? https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/ doesn't have any updates since November 15, AFAICT.

Last update happened at that time Nov 15 (see Note 25 in this ticket). Then, from Note 26 on it includes the three new requirements that I just made patches for.

I believe all pending requirements are addressed by my patches today.

Thanks.

Actions #46

Updated by pablo 29 days ago

Patches have been pushed out here as requested:

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/

I had to fix a few glitches in the patches I posted here I detected with testing:

- Check for s6_addr32\[3\] and s6_addr32\[4\] to be zero, should be instead s6_addr32\[2\] and s6_addr32\[3\] respectively.
- update ipv6_hash() function to only consider the IPv6 prefix, otherwise PDP context lookup from packet path fails.

I could also test control plane to add two gtp devices (hence sockets, one IPv4 and another IPv6 respectively). Then, add tunnels with the same TEID. I only tested control plane path, I would need to extend my tested to have a dual stack scenario (I have simple container tests for IPv4, IPv6, IPv4 over IPv6 and IPv6 over IPv4 but not for both IPv4 AND IPv6).

I found two issues:
- gtp-link for IPv6 listens to both IPv4 and IPv6 because IPV6_V6ONLY is off by default these days. So far we discussed that two sockets are required for a setup with both IPv4 and IPv6 traffic (dual stack), but probably it is possible to support for this with a single gtp device and socket by checking from sk->sk_ipv6only from the kernel. I still have to check if this will work from GTP tx and rx packet path too. Currently control plane rejects this since the IPv6 support patch from Q4 2023 I made, I can revisit this.
- gtp-tunnel del command, which reports ENOENT, I suspect it is a userspace issue. I remember Oliver mentioned this already.

Actions #47

Updated by pablo 29 days ago

pablo wrote in #note-46:

- gtp-link for IPv6 listens to both IPv4 and IPv6 because IPV6_V6ONLY is off by default these days. So far we discussed that two sockets are required for a setup with both IPv4 and IPv6 traffic (dual stack), but probably it is possible to support for this with a single gtp device and socket by checking from sk->sk_ipv6only from the kernel. I still have to check if this will work from GTP tx and rx packet path too. Currently control plane rejects this since the IPv6 support patch from Q4 2023 I made, I can revisit this.

This is complicated. Because there is a setsockopt ADDRFORM that allows to downgrade a IPv4-mapped-listener socket to IPv4 only after the socket is already bound, this even breaks the assumption that a IPv6 tunnel is always attached to a IPv6 listener socket.

I made a kernel patch to ensure that IPv6 socket that is received from userspace is always IPv6-only, see attachment.

Actions #48

Updated by pablo 29 days ago

pablo wrote in #note-46:

- gtp-tunnel del command, which reports ENOENT, I suspect it is a userspace issue. I remember Oliver mentioned this already.

This is the list of patches for libgtpnl I have here on top of git HEAD:

- Tunnel deletion now works fine. This requires gtp_tunnel_set_family() to set the family, since tunnel is identified by: gtp device + tunnel identifier + family (now it is possible to define tunnels with same TID/TEID for better integration with dualstack MS/UE). This also requires a flag field in gtp_tunnel object in userspace to reuse the existing build helper.
- gtp tunnel listing shows the gtp device.
- toggle IPv6 socket only, otherwise kernel rejects this IPv6 socket (in most distros AF_INET6 enabled IPv4-mapped-IPv6 socket allows for both IPv4 and IPv6 connections), see /proc/sys/net/ipv6/bindv6only is set to 0 in Debian.

Actions #49

Updated by pablo 29 days ago

pablo wrote in #note-47:

I made a kernel patch to ensure that IPv6 socket that is received from userspace is always IPv6-only, see attachment.

Two more patches have been pushed out to:

https://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp.git/

Actions #50

Updated by osmith 15 days ago

Hi Pablo,

thank you very much for the updated patches, and sorry that it took me some time to get back to this.

I have tried the updated kernel tree and libgtpnl patches.

01_ms_ip4_sgsn_ip4.sh is passing again.

02_ms_ip4_sgsn_ip6.sh almost passes, traffic flows through the GTP tunnel, only in the teardown it fails here (find full test output below):

+ /bin/ip addr del fd00::1/7 dev veth_sgsn
Error: ipv6: address not found.
These two tests don't pass:
  • 03_ms_ip6_sgsn_ip4.sh
  • 04_ms_ip6_sgsn_ip6.sh

For both I get the following error, find the full log below.

+ gtp-tunnel add gtp_sgsn v1 200 100 fc00::1 fd00::2
genl_socket_talk: Cannot assign requested address

I'm not sure if it is a problem with my tests, or if I'm triggering an unexpected code path again.

The tests are here:
https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/src/branch/osmith/wip/tests/qemu

Your workflow seems to be to just boot the kernel on a test box and then run test code. You could directly run these tests if you put gtp-tunnel and gtp-link in PATH.

I run the tests in qemu by selecting the test to run in tests/qemu/initrd-init.sh, and then:

$ cp bzImage tests/qemu/_linux
$ ./configure --enable-qemu-tests
$ make
$ make check

I've built a kernel from your tree with make defconfig and then enabled via menuconfig:

CONFIG_GTP=y
CONFIG_NET_NS=y
CONFIG_VETH=y

Regarding the libgtpnl patches, there seems to be a bug in gtp-link.c: "gtp-link del" shouldn't require 4 arguments (argc < 4).

Regarding your tests, I could try to run them as well (and use them to narrow down why my tests don't work), but I would need some more information:
  • are the tests up-to-date? they seem to have e.g. old syntax for gtp-tunnel
  • are the iputils2 patches still required (https://osmocom.org/issues/1952#note-10)? for the purpose of testing libgtpnl.git, I think it would be preferably to just use gtp-link and gtp-tunnel instead, so we know these tools and the related libgtpnl code they use, works as expected
  • do you remove the comments infront of some of the commented out lines when running the tests?

Full test outputs:

output of 02_ms_ip4_sgsn_ip6.sh

output of 03_ms_ip6_sgsn_ip4.sh

output of 04_ms_ip6_sgsn_ip6.sh

Actions #51

Updated by pablo 15 days ago

osmith wrote in #note-50:

Hi Pablo,

thank you very much for the updated patches, and sorry that it took me some time to get back to this.

no issue!

I have tried the updated kernel tree and libgtpnl patches.

01_ms_ip4_sgsn_ip4.sh is passing again.

02_ms_ip4_sgsn_ip6.sh almost passes, traffic flows through the GTP tunnel, only in the teardown it fails here (find full test output below):

+ /bin/ip addr del fd00::1/7 dev veth_sgsn
Error: ipv6: address not found.

It looks like an issue in your tests.

tunnel_stop() invokes: ip netns del ggsn_side

and tunnel_stop is called before: ip addr del fd00::1/7 dev veth_sgsn

+ tunnel_stop
+ killall gtp-link
+ /bin/ip addr del 172.99.0.1/32 dev lo
+ /bin/ip link set veth_sgsn down
+ /bin/ip addr del fd00::1/7 dev veth_sgsn
Error: ipv6: address not found.

And note that:

ip link add veth_sgsn type veth peer name veth_ggsn

and veth_sgns depends on the peer veth_ggsn, which is now gone.

Could you try to remove this address before the veth_ggsn is destroyed?

IIRC, the host size veth goes away as soon as the container is removed, since the peer veth on the container is gone, the kernel releases the veth as soon because the host veth now points nowhere.

And again, IIRC, the kernel uses a wait queue (see netdev_unregistering_wq), and it wakes up this kernel thread when there is garbage netdev interfaces, which might explain why you don't see this error earlier. So this looks like a race in the test scripts?

These two tests don't pass:
  • 03_ms_ip6_sgsn_ip4.sh
  • 04_ms_ip6_sgsn_ip6.sh

For both I get the following error, find the full log below.

+ gtp-tunnel add gtp_sgsn v1 200 100 fc00::1 fd00::2
genl_socket_talk: Cannot assign requested address

I'm not sure if it is a problem with my tests, or if I'm triggering an unexpected code path again.

I forgot to mention: ms address now needs to be IPv6 /64 prefix, ie:

+ gtp-tunnel add gtp_sgsn v1 200 100 fc00:: fd00::2

otherwise this hits EADDRNOTAVAIL, see:

commit 7bdd525d85db58862fb6bf52191af3b59490697d
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Tue Jan 30 11:24:24 2024 +0100

    gtp: use IPv6 address /64 prefix for UE/MS

and

commit dac0c642ed0f96d0b150bb5f6b1ea2e556c37f31 (HEAD -> main, orig
in/main)
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Wed Jan 31 19:35:51 2024 +0100

    gtp: report EADDRNOTAVAIL if userspace provides a non-IPv6 /64 
prefix for MS/UE

    Provide a different error reporting, in case userspace provides
 a
    non-IPv6 /64 prefix.

    Squash this patch to ("gtp: use IPv6 address /64 prefix for UE/
MS")

The tests are here:
https://gitea.osmocom.org/cellular-infrastructure/libgtpnl/src/branch/osmith/wip/tests/qemu

Your workflow seems to be to just boot the kernel on a test box and then run test code. You could directly run these tests if you put gtp-tunnel and gtp-link in PATH.

I run the tests in qemu by selecting the test to run in tests/qemu/initrd-init.sh, and then:
[...]

I've built a kernel from your tree with make defconfig and then enabled via menuconfig:
[...]

Regarding the libgtpnl patches, there seems to be a bug in gtp-link.c: "gtp-link del" shouldn't require 4 arguments (argc < 4).

Looking at this, let me get back to you.

I will following up in a different thread with a bit more details on my test scripts that could be using for kselftests.

Actions #52

Updated by pablo 15 days ago

osmith wrote in #note-50:

Regarding the libgtpnl patches, there seems to be a bug in gtp-link.c: "gtp-link del" shouldn't require 4 arguments (argc < 4).

Fix for git-link del, this needs to be collapsed to: ("gtp-link: add IPv6 support"), attached.

I can post a full series with this patch collapsed where it belongs, on top of current libgtpnl git HEAD.

Actions #53

Updated by pablo 15 days ago

osmith wrote in #note-50:

Regarding your tests, I could try to run them as well (and use them to narrow down why my tests don't work), but I would need some more information:
  • are the tests up-to-date? they seem to have e.g. old syntax for gtp-tunnel

Tests are up to date and running fine here.

  • are the iputils2 patches still required (https://osmocom.org/issues/1952#note-10)? for the purpose of testing libgtpnl.git, I think it would be preferably to just use gtp-link and gtp-tunnel instead, so we know these tools and the related libgtpnl code they use, works as expected

Not needed.

  • do you remove the comments infront of some of the commented out lines when running the tests?

No, they can be removed. They are distracting leftovers. Although, last two lines give you a hint on to use iperf from ns1 and ns2 (the client and server that send traffic that uses nsr1 and nsr2 as routers (sgsn and ggsn respectively).

These scripts work for me almost out-of-the-box. I copied them to libgtnl/tools/, then run the script and from ''ns1'':

I have to do a few manual things to generate traffic:

ip netns exec ns1 ping X.X.X.X

where X.X.X.X is the IP address in ns2, to check ICMP traffic. Type 'exit' to leave ns1 container.

Then, run iperf server in ns2:

term2% ip netns exec ns2 iperf3 -s

and in a different terminal, run iperf client traffic generator:

term1% ip netns exec ns1 iperf3 -c 192.168.10.2 -n 100G

to test TCP traffic, where -n is the amount of traffic to generate. I have to press ctrl-c to stop it.

To automate this, it should be possible to check for the ping output results. With iperf, using -t (to limit maximum time to send data) in the client (ns1), and -D in the server (ns2).

It would be great to use this to include kselftests in the submission. Maybe extend the nftables ruleset to match for GTP traffic (simple rule with UDP and port), then check counters in the ruleset.

See linux/tools/testing/selftests/ for kselftest, there is one folder for netfilter.

Actions #54

Updated by pablo 15 days ago

pablo wrote in #note-53:

osmith wrote in #note-50:

  • are the tests up-to-date? they seem to have e.g. old syntax for gtp-tunnel

Tests are up to date and running fine here.

They seem to work fine, this is what I have here:

938935a97c0e gtp-tunnel: display i_tei in help
5c9229cf820d gtp-genl: display gtp device in listing
4491a56efd8f gtp: add flags to gtp_tunnel object
3198f94d674e gtp: provide interface to set family
bd439562c7f4 gtp-link: set IPv6 socket only
ad947bd1e598 gtp-link: add IPv6 support
f3253354e36c (origin/master) tools/gtp-tunnel: pass rc of gtp_add_tunnel

Attaching the entire series that applies on top of: f3253354e36c (origin/master) tools/gtp-tunnel: pass rc of gtp_add_tunnel, I have just rebased, it seems I was behind git HEAD.

Actions #55

Updated by pablo 14 days ago

pablo wrote in #note-53:

See linux/tools/testing/selftests/ for kselftest, there is one folder for netfilter.

Hm.

One issue when integrating this into kselftests is the availability of a fresh libgtnl version and the gtp-link and gtp-tunnel tools...

Another possibility can be to make ad-hoc libmnl programs to add gtp links and tunnels that can compile in the kernel tree, so this integrates into kselftest.

There is some recent progress to have NetDev tree CI:

https://netdev.bots.linux.dev/status.html

but to be honest, I have no idea how that works and how to request for GTP to be covered. I don't seem to find the netfilter tests on that page.

Actions #56

Updated by osmith 14 days ago

Thank you for the detailed feedback!

pablo wrote in #note-51:

02_ms_ip4_sgsn_ip6.sh almost passes, traffic flows through the GTP tunnel, only in the teardown it fails here (find full test output below):

+ /bin/ip addr del fd00::1/7 dev veth_sgsn
Error: ipv6: address not found.

It looks like an issue in your tests.

tunnel_stop() invokes: ip netns del ggsn_side

and tunnel_stop is called before: ip addr del fd00::1/7 dev veth_sgsn

+ tunnel_stop
+ killall gtp-link
+ /bin/ip addr del 172.99.0.1/32 dev lo
+ /bin/ip link set veth_sgsn down
+ /bin/ip addr del fd00::1/7 dev veth_sgsn
Error: ipv6: address not found.

The "ip addr del" line is part of "tunnel_stop", the shell (sh -x) prints "+ tunnel_stop" to indicate that it is now executing this function, and then each line it executes from this function. So this is not resolved yet, if you have another pointer it would be great.

And note that:

[...]

and veth_sgns depends on the peer veth_ggsn, which is now gone.

Could you try to remove this address before the veth_ggsn is destroyed?

IIRC, the host size veth goes away as soon as the container is removed, since the peer veth on the container is gone, the kernel releases the veth as soon because the host veth now points nowhere.

And again, IIRC, the kernel uses a wait queue (see netdev_unregistering_wq), and it wakes up this kernel thread when there is garbage netdev interfaces, which might explain why you don't see this error earlier. So this looks like a race in the test scripts?

I see it consistently failing with ip6, and never with ip.

These two tests don't pass:
  • 03_ms_ip6_sgsn_ip4.sh
  • 04_ms_ip6_sgsn_ip6.sh

For both I get the following error, find the full log below.

+ gtp-tunnel add gtp_sgsn v1 200 100 fc00::1 fd00::2
genl_socket_talk: Cannot assign requested address

I'm not sure if it is a problem with my tests, or if I'm triggering an unexpected code path again.

I forgot to mention: ms address now needs to be IPv6 /64 prefix, ie:

+ gtp-tunnel add gtp_sgsn v1 200 100 fc00:: fd00::2

Thanks a lot, I've adjusted the tests 03 and 04, and they pass now!

pablo wrote in #note-52:

osmith wrote in #note-50:

Regarding the libgtpnl patches, there seems to be a bug in gtp-link.c: "gtp-link del" shouldn't require 4 arguments (argc < 4).

Fix for git-link del, this needs to be collapsed to: ("gtp-link: add IPv6 support"), attached.

Thanks, the fix works. I have rebased my patch on top of your latest attached patches, squashed the gtp-link del fix as instructed, and submitted them to gerrit:

Also thanks for the instructions on how to use your tests. It would be great to have them in kselftests. Not sure what the most elegant way to do it is, but maybe it is feasible once a new libgtpnl version is released that has the new gtp-link and gtp-tunnel tools? (But FWIW, I think this is out-of-scope for the issue at hand... at least we have the tests for libgtpnl + kernel, and the ttcn-3 tests for osmo-ggsn + libgtpnl + kernel, so we should notice regressions.)

I have also done some regression testing with the TTCN-3 tests for osmo-ggsn and found that using your kernel tree with the old libgtpnl fails. I assume the kernel now requires the family to be set, and does not default to ipv4 as it was the case before. Does that make sense, and if so, could you fix it to avoid the regression?

What I have not tested yet is using the new libgtpnl, which sets the ip family, and using an old kernel which does not support setting the family. For best compatibility, using ipv4 should still work in that combination. Do you think there is anything that needs to be adjusted, or should this be working?

Actions #57

Updated by osmith 14 days ago

Pau has left some review comments on your patches on gerrit. Pablo, do you want to reply there, and possibly update the patches?

Actions #58

Updated by laforge 14 days ago

On Thu, Feb 15, 2024 at 02:25:15PM +0000, osmith wrote:

I have also done some regression testing with the TTCN-3 tests for osmo-ggsn and found that using your kernel tree with the old libgtpnl fails. I assume the kernel now requires the family to be set, and does not default to ipv4 as it was the case before. Does that make sense, and if so, could you fix it to avoid the regression?

if this is true, it would be an ABI breakage and must be resolved.
Old apps/libs must work against new kernels, of course.

Actions #59

Updated by pablo 14 days ago

osmith wrote in #note-57:

Pau has left some review comments on your patches on gerrit. Pablo, do you want to reply there, and possibly update the patches?

Replied to https://gerrit.osmocom.org/c/libgtpnl/+/35985/1 and related comments in the same patch series.

Actions #60

Updated by pablo 14 days ago

pablo wrote in #note-59:

osmith wrote in #note-57:

Pau has left some review comments on your patches on gerrit. Pablo, do you want to reply there, and possibly update the patches?

Replied to https://gerrit.osmocom.org/c/libgtpnl/+/35985/1 and related comments in the same patch series.

pespin said: "What If I want to, lets say bind to IPv4 address A.B.C.D and IPV6 address ::Z, and then announce those in CreateSessionReq/Resp in GTPv{1,2}C? AFAIU I need to create 2 gtp tun devices, binding one to A.B.C.D and one to ::Z."

I am attaching a patch to extend the gtp-link to allow to specify the listener address, I extended one of my container tests, I checked via ss -tu that the specified listener address is shown. It falls back to ANY_ADDRESS in case no address is specified.

Actions #61

Updated by pablo 14 days ago

laforge wrote in #note-58:

On Thu, Feb 15, 2024 at 02:25:15PM +0000, osmith wrote:

I have also done some regression testing with the TTCN-3 tests for osmo-ggsn and found that using your kernel tree with the old libgtpnl fails. I assume the kernel now requires the family to be set, and does not default to ipv4 as it was the case before. Does that make sense, and if so, could you fix it to avoid the regression?

Hm. There is a fallback to AF_INET if the attribute is not specified.

if (info->attrs[GTPA_FAMILY])
      family = nla_get_u8(info->attrs[GTPA_FAMILY]);
else
      family = AF_INET;

if this is true, it would be an ABI breakage and must be resolved.
Old apps/libs must work against new kernels, of course.

I have just tested:

- new libgtnl with old stable 6.1 kernel, using my test2-netns-gtp.sh and it works fine.
- old libgtnl (1.2.5, not including any recent patch for the IPv6 support), I had modified test2-netns-gtp.sh so it uses old gtp-link syntax, with new kernel with GTP IPv6 patches, it also works fine.

I cannot reproduce this issue here.

Actions #62

Updated by osmith 13 days ago

pablo wrote in #note-60:

pablo wrote in #note-59:

osmith wrote in #note-57:

Pau has left some review comments on your patches on gerrit. Pablo, do you want to reply there, and possibly update the patches?

Replied to https://gerrit.osmocom.org/c/libgtpnl/+/35985/1 and related comments in the same patch series.

pespin said: "What If I want to, lets say bind to IPv4 address A.B.C.D and IPV6 address ::Z, and then announce those in CreateSessionReq/Resp in GTPv{1,2}C? AFAIU I need to create 2 gtp tun devices, binding one to A.B.C.D and one to ::Z."

I am attaching a patch to extend the gtp-link to allow to specify the listener address, I extended one of my container tests, I checked via ss -tu that the specified listener address is shown. It falls back to ANY_ADDRESS in case no address is specified.

Thanks! Submitted the patch to gerrit here:
https://gerrit.osmocom.org/c/libgtpnl/+/35997

Adjusted it slightly to make it pass checkpatch:

tools/gtp-link.c:146: WARNING:BRACES_NOT_NECESSARY: braces {} are not necessary for any arm of this statement
total: 0 errors, 1 warnings, 67 lines checked

EDIT: also done trivial fixes to have "[<address>]" in the usage help string

pablo wrote in #note-61:

laforge wrote in #note-58:

On Thu, Feb 15, 2024 at 02:25:15PM +0000, osmith wrote:

I have also done some regression testing with the TTCN-3 tests for osmo-ggsn and found that using your kernel tree with the old libgtpnl fails. I assume the kernel now requires the family to be set, and does not default to ipv4 as it was the case before. Does that make sense, and if so, could you fix it to avoid the regression?

Hm. There is a fallback to AF_INET if the attribute is not specified.

[...]

if this is true, it would be an ABI breakage and must be resolved.
Old apps/libs must work against new kernels, of course.

I have just tested:

- new libgtnl with old stable 6.1 kernel, using my test2-netns-gtp.sh and it works fine.
- old libgtnl (1.2.5, not including any recent patch for the IPv6 support), I had modified test2-netns-gtp.sh so it uses old gtp-link syntax, with new kernel with GTP IPv6 patches, it also works fine.

I cannot reproduce this issue here.

My bad, I ran the ttcn-3 tests with libgptnl nightly (current master) against your kernel tree. I can confirm that if using libgtpnl 1.2.5 against your kernel tree, the same tests pass as when running the testsuite against a standard debian kernel. So there is no regression.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)