nat64: Update README

Actually explain how to use it and how the translator works.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
2024-05-06 15:54:53 +00:00 · 2021-10-05 00:44:43 +02:00
parent a5313d2f1b
commit e41e570869
1 changed files with 79 additions and 9 deletions
--- a/nat64-bpf/README.org
+++ b/nat64-bpf/README.org
@@ -1,16 +1,86 @@
 * NAT64 BPF implementation

 This directory contains a BPF implementation of a stateless NAT64
-implementation, like that performed by Tayga, but entirely in BPF.
+translator, like Tayga, but running entirely in BPF. It works by attaching to
+the TC hooks of an interface and translating incoming IPv6 packets whose
+destination address is in the configured NAT64 prefix to IPv4. Returning v4
+packets are routed back out through the same interface based on the (v4)
+prefix used for translation, and are translated back to IPv6 on egress.

-Design:
+** Running

- Global v6 /96 prefix defined as NAT64 prefix
- Each interface is assigned a v4 prefix for mapping v6 addresses
-  - Install onlink v4 route for that prefix to make sure traffic goes out the interface
+To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24=, the
+default well-known v6 prefix (=64:ff9b::/96=), and translation enabled for
+IPv6 sources in =fc00::/8= (=-a=), run:

- Attach ingress and egress BPF programs to each interface
-  - On ingress: match v6 packets with a NAT64 prefix destination; remap to v4
-  - On egress: lookup v4 destination address; if it's in the configured NAT64 prefix, remap back to v6
+#+begin_src sh
+sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8
+#+end_src
+
+Run the same command again with =-u= added to unload the translator; the other
+parameters must be specified as well, since they are needed to clean up
+properly. To use a different v6 prefix, pass it with =-6=.
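+
+As an illustration, unloading the translator started above and then starting
+it with a different v6 prefix could look like this. The prefix
+=2001:db8:64::/96= is only a documentation example, and passing it to =-6= in
+CIDR notation is an assumption about the expected format:
+
+#+begin_src sh
+# Unload: repeat the original parameters and add -u
+sudo ./nat64 -u -i eth0 -4 10.0.1.0/24 -a fc00::/8
+
+# Reload with a non-default NAT64 prefix (example prefix; /96 CIDR format assumed)
+sudo ./nat64 -i eth0 -4 10.0.1.0/24 -6 2001:db8:64::/96 -a fc00::/8
+#+end_src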
+
+The userspace utility installs the necessary routing rules, sets up the BPF
+programs and then exits. The translator then keeps running entirely in the
+kernel until it is unloaded (with =-u=).
+
+** Assumptions
+
+The operation of this NAT64 translator makes a few assumptions:
+
+- A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,
+  the v4 addresses live in the last four bytes). By default the well-known
+  prefix =64:ff9b::/96= is used.
+
+- IPv6 source addresses are mapped one-to-one into a configured IPv4 prefix.
+  Regular NAT44 can be applied afterwards to map them to a single public IP. A
+  separate v4 prefix should be used for every interface that the translator runs
+  on. Source address v6-to-v4 mappings are created dynamically as new sources
+  appear, and time out after two hours.
+
+- Only traffic from IPv6 source prefixes on a configured allowlist is
+  translated.
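+
+For example, assuming the IPv4 traffic leaves the host through an uplink
+called =wan0= (a placeholder name), plain iptables masquerading is one way to
+apply the regular NAT step described above to the translated prefix:
+
+#+begin_src sh
+# Masquerade translated sources (10.0.1.0/24) behind wan0's public address
+sudo iptables -t nat -A POSTROUTING -s 10.0.1.0/24 -o wan0 -j MASQUERADE
+#+end_src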
+
+** How it works
+
+Two BPF programs are attached to the ingress and egress TC hooks of the
+interface being configured. The ingress program processes IPv6 packets: any
+packet with a destination address in the configured NAT64 prefix is either
+translated (if the source is allowed) or dropped. The egress program processes
+IPv4 packets: any packet with a destination in the configured v4 prefix is
+either translated (if a v6 address is found in the state map) or dropped.
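+
+As a worked example with the default NAT64 prefix (addresses shown as
+source -> destination; =10.0.1.x= stands for whichever address from the v4
+prefix the translator dynamically assigned to =fc00::1=), the IPv4 destination
+is read from the last four bytes of the IPv6 destination, and on the way back
+the IPv4 source is embedded into the NAT64 prefix again:
+
+#+begin_example
+Incoming IPv6:    fc00::1           -> 64:ff9b::c000:221
+Translated IPv4:  10.0.1.x          -> 192.0.2.33   (c0.00.02.21 in hex)
+Reply IPv4:       192.0.2.33        -> 10.0.1.x
+Translated IPv6:  64:ff9b::c000:221 -> fc00::1
+#+end_example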
+
+To make sure the v4 traffic reaches the right interface, a v4-via-v6 route for
+the configured v4 prefix is installed on that interface, with the network
+address of the v6 prefix as the gateway. A fake neighbour entry is also
+installed for that gateway, so the kernel does not try to perform neighbour
+lookups for it. This gets the packets to where the BPF program can process
+them; after translation, a new neighbour lookup will be performed for the new
+v6 destination.
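+
+The following iproute2 commands are only a rough manual equivalent of that
+setup, shown to illustrate the idea; they are not necessarily what the
+userspace utility does internally, and the link-layer address used for the
+fake neighbour is a placeholder:
+
+#+begin_src sh
+# Route the v4 prefix via the v6 prefix's network address on eth0
+sudo ip route add 10.0.1.0/24 via inet6 64:ff9b:: dev eth0 onlink
+# Permanent fake neighbour entry, so no neighbour lookup is done for the gateway
+sudo ip -6 neigh add 64:ff9b:: lladdr 02:00:00:00:00:01 dev eth0 nud permanent
+#+end_src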
+
+Note that because of where the TC hook sits in the ingress path, the ingress
+BPF program needs to redirect the packet back to the same interface after
+translation, so that it is re-processed as an IPv4 packet. This means that tools
+like tcpdump will first see the original IPv6 packet and then the translated
+IPv4 packet. On egress, the translation happens before tcpdump sees the packet,
+so only the translated packet will be seen.
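+
+To observe this, a capture on the interface that matches both address families
+could be started with, for example:
+
+#+begin_src sh
+# -n: no name resolution; the filter matches both IPv6 and IPv4 packets
+sudo tcpdump -ni eth0 'ip6 or ip'
+#+end_src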
+
+** Limitations / known issues
+
+At least the first two of these should probably be fixed before deploying this:
+
+- The IP headers embedded in ICMP error message payloads are not translated,
+  so ICMP errors will most likely not work through the translator.
+
+- The BPF programs assume the interface is an Ethernet interface, so
+  translation won't work on layer 3 devices (like WireGuard tunnels).
+
+- IP options are not handled at all, and neither are IPv6 extension headers. In
+  particular, this means that fragmented IPv6 packets (which carry a Fragment
+  extension header) won't pass the translator.
+
+- The BPF programs support specifying multiple allowed source IPv6 prefixes, as
+  well as doing ahead-of-time static mappings, but the userspace component
+  doesn't support these yet.
+
+- The userspace program also has no way to print its status or dump the state
+  of the translation table. As a stopgap, the BPF maps can be inspected with
+  bpftool.
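+
+A minimal example of such an inspection with bpftool; the map ID below is a
+placeholder to be taken from the =map show= output, since the names of the
+translator's maps are not listed here:
+
+#+begin_src sh
+# List loaded BPF maps and note the ID of the translator's state map
+sudo bpftool map show
+# Dump the contents of one map by ID (replace 42 with the real ID)
+sudo bpftool map dump id 42
+# Show BPF programs attached to eth0, including the TC ingress/egress hooks
+sudo bpftool net show dev eth0
+#+end_src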

- Some logic to dynamically assign v4 addresses each time a new v6 source is seen