nat64-bpf/README.org

* NAT64 BPF implementation

This directory contains a BPF implementation of a stateless NAT64
implementation, like that performed by Tayga, but entirely in BPF. It works by
attaching to the TC hooks of an interface and translating incoming IPv6
addresses with a destination in the configured NAT64 prefix, and routing v4
packets back out through that interface based on the (v4) prefix used for
translation.

** Running

To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24= and using
the default well-known v6 prefix (=64:ff9b::/96=), simply issue

#+begin_src sh
sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8
#+end_src

Run again with a =-u= parameter to unload (but make sure to also specify the
rest of the parameters as they are needed to properly clean up). To specify
another v6 prefix, use =-6=.

The userspace utility will install the necessary routing rules, and setup the
BPF programs, then exit. The translator will then keep running entirely in the
kernel until unloaded (with =-u=).

** Assumptions

The operation of this NAT64 translator makes a few assumptions:

- A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,
  the v4 addresses live in the last four bytes). By default the well-known
  prefix =64:ff9b::/96= is used.

- IPv6 source addresses are mapped into a configured IPv4 prefix one-to-one.
  Regular NAT4 can be applied afterwards to map to a single public IP. A
  separate v4 prefix should be used for every interface that the translator runs
  on. Source address v6-to-v4 mappings are dynamically created as new sources
  appear, and time out after two hours.

- An allowlist of IPv6 source prefixes that should be subject to translation is
  maintained.

** How it works

Two BPF programs are attached to the ingress and egress hooks of the interface
being configured. The ingress program will process IPv6 packets, and any packet
with a destination address in the configured NAT64 prefix will be either
translated (if the source is allowed), or dropped. The egress program processes
IPv4 packets and any packet with a destination in the configured v4 prefix will
be either translated (if a v6 address is found in the state map) or dropped.

To make sure the v4 traffic makes it to the right interface, a v4-via-v6 route
is installed on that interface with a gateway address of the network address of
the v6 prefix, and a fake neighbour entry is installed to avoid the kernel doing
neighbour lookups of the gateway. This gets the packets to where the BPF program
can process them, and after translation a new neighbour lookup with be performed
with the new v6 destination.

Note that because of the place of the BPF hook in ingress processing, the
ingress BPF program will need to redirect the packet to the same interface after
translation for re-processing as an IPv4 packet. This means that things like
tcpdump will see first the original IPv6 packet, and then the translated IPv4
packet. On egress the translation happens earlier, so only the translated packet
will be seen.

** Limitations / known issues
At least the first two of these should probably be fixed before deploying this:

- The IP headers in ICMP error message payloads are not translated, which
  probably breaks ICMP errors.

- The BPF programs assume the interface is an Ethernet interface, so translation
  won't work on layer 3 devices (like Wireguard tunnels).

- IP options are not handled at all. In particular this means that fragmented
  IPv6 packets won't pass the translator.

- The BPF programs support specifying multiple allowed source IPv6 prefixes, as
  well as doing ahead-of-time static mappings, but the userspace component
  doesn't support these yet.

- The userspace program also has no way to print its status, or dump the state
  of the translation table. The BPF maps can be inspected with bpftool as a
  stopgap measure, though.
nat64-bpf: Initial version This adds an initial version of a NAT64 translator in BPF. It compiles and loads, but doesn't actually appear to work yet. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-09-29 01:46:09 +02:00			`* NAT64 BPF implementation`

			`This directory contains a BPF implementation of a stateless NAT64`
nat64: Update README Actually explain how to use and how the translator works. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-10-05 00:44:43 +02:00			`implementation, like that performed by Tayga, but entirely in BPF. It works by`
			`attaching to the TC hooks of an interface and translating incoming IPv6`
			`addresses with a destination in the configured NAT64 prefix, and routing v4`
			`packets back out through that interface based on the (v4) prefix used for`
			`translation.`
nat64-bpf: Initial version This adds an initial version of a NAT64 translator in BPF. It compiles and loads, but doesn't actually appear to work yet. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-09-29 01:46:09 +02:00
nat64: Update README Actually explain how to use and how the translator works. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-10-05 00:44:43 +02:00			`** Running`
nat64-bpf: Initial version This adds an initial version of a NAT64 translator in BPF. It compiles and loads, but doesn't actually appear to work yet. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-09-29 01:46:09 +02:00
nat64: Update README Actually explain how to use and how the translator works. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-10-05 00:44:43 +02:00			`To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24= and using`
			`the default well-known v6 prefix (=64:ff9b::/96=), simply issue`
nat64-bpf: Initial version This adds an initial version of a NAT64 translator in BPF. It compiles and loads, but doesn't actually appear to work yet. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-09-29 01:46:09 +02:00
nat64: Update README Actually explain how to use and how the translator works. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-10-05 00:44:43 +02:00			`#+begin_src sh`
			`sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8`
			`#+end_src`

			`Run again with a =-u= parameter to unload (but make sure to also specify the`
			`rest of the parameters as they are needed to properly clean up). To specify`
			`another v6 prefix, use =-6=.`

			`The userspace utility will install the necessary routing rules, and setup the`
			`BPF programs, then exit. The translator will then keep running entirely in the`
			`kernel until unloaded (with =-u=).`

			`** Assumptions`

			`The operation of this NAT64 translator makes a few assumptions:`

			`- A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,`
			`the v4 addresses live in the last four bytes). By default the well-known`
			`prefix =64:ff9b::/96= is used.`

			`- IPv6 source addresses are mapped into a configured IPv4 prefix one-to-one.`
			`Regular NAT4 can be applied afterwards to map to a single public IP. A`
			`separate v4 prefix should be used for every interface that the translator runs`
			`on. Source address v6-to-v4 mappings are dynamically created as new sources`
			`appear, and time out after two hours.`

			`- An allowlist of IPv6 source prefixes that should be subject to translation is`
			`maintained.`

			`** How it works`

			`Two BPF programs are attached to the ingress and egress hooks of the interface`
			`being configured. The ingress program will process IPv6 packets, and any packet`
			`with a destination address in the configured NAT64 prefix will be either`
			`translated (if the source is allowed), or dropped. The egress program processes`
			`IPv4 packets and any packet with a destination in the configured v4 prefix will`
			`be either translated (if a v6 address is found in the state map) or dropped.`

			`To make sure the v4 traffic makes it to the right interface, a v4-via-v6 route`
			`is installed on that interface with a gateway address of the network address of`
			`the v6 prefix, and a fake neighbour entry is installed to avoid the kernel doing`
			`neighbour lookups of the gateway. This gets the packets to where the BPF program`
			`can process them, and after translation a new neighbour lookup with be performed`
			`with the new v6 destination.`

			`Note that because of the place of the BPF hook in ingress processing, the`
			`ingress BPF program will need to redirect the packet to the same interface after`
			`translation for re-processing as an IPv4 packet. This means that things like`
			`tcpdump will see first the original IPv6 packet, and then the translated IPv4`
			`packet. On egress the translation happens earlier, so only the translated packet`
			`will be seen.`

			`** Limitations / known issues`
			`At least the first two of these should probably be fixed before deploying this:`

			`- The IP headers in ICMP error message payloads are not translated, which`
			`probably breaks ICMP errors.`

			`- The BPF programs assume the interface is an Ethernet interface, so translation`
			`won't work on layer 3 devices (like Wireguard tunnels).`

			`- IP options are not handled at all. In particular this means that fragmented`
			`IPv6 packets won't pass the translator.`

			`- The BPF programs support specifying multiple allowed source IPv6 prefixes, as`
			`well as doing ahead-of-time static mappings, but the userspace component`
			`doesn't support these yet.`

			`- The userspace program also has no way to print its status, or dump the state`
			`of the translation table. The BPF maps can be inspected with bpftool as a`
			`stopgap measure, though.`
nat64-bpf: Initial version This adds an initial version of a NAT64 translator in BPF. It compiles and loads, but doesn't actually appear to work yet. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> 2021-09-29 01:46:09 +02:00