nat64: Update README

Actually explain how to use and how the translator works. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
2024-05-06 15:54:53 +00:00 · 2021-10-05 00:44:43 +02:00
parent a5313d2f1b
commit e41e570869
1 changed files with 79 additions and 9 deletions
--- a/nat64-bpf/README.org
+++ b/nat64-bpf/README.org
@@ -1,16 +1,86 @@
 * NAT64 BPF implementation
 This directory contains a BPF implementation of a stateless NAT64
-implementation, like that performed by Tayga, but entirely in BPF.
+implementation, like that performed by Tayga, but entirely in BPF. It works by
 attaching to the TC hooks of an interface and translating incoming IPv6
 addresses with a destination in the configured NAT64 prefix, and routing v4
 packets back out through that interface based on the (v4) prefix used for
 translation.
-Design:
+** Running
- Global v6 /96 prefix defined as NAT64 prefix
+To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24= and using
- Each interface is assigned a v4 prefix for mapping v6 addresses
+the default well-known v6 prefix (=64:ff9b::/96=), simply issue
  - Install onlink v4 route for that prefix to make sure traffic goes out the interface
- Attach ingress and egress BPF programs to each interface
+#+begin_src sh
-  - On ingress: match v6 packets with a NAT64 prefix destination; remap to v4
+sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8
-  - On egress: lookup v4 destination address; if it's in the configured NAT64 prefix, remap back to v6
+#+end_src
 Run again with a =-u= parameter to unload (but make sure to also specify the
 rest of the parameters as they are needed to properly clean up). To specify
 another v6 prefix, use =-6=.
 The userspace utility will install the necessary routing rules, and setup the
 BPF programs, then exit. The translator will then keep running entirely in the
 kernel until unloaded (with =-u=).
 ** Assumptions
 The operation of this NAT64 translator makes a few assumptions:
 - A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,
  the v4 addresses live in the last four bytes). By default the well-known
  prefix =64:ff9b::/96= is used.
 - IPv6 source addresses are mapped into a configured IPv4 prefix one-to-one.
  Regular NAT4 can be applied afterwards to map to a single public IP. A
  separate v4 prefix should be used for every interface that the translator runs
  on. Source address v6-to-v4 mappings are dynamically created as new sources
  appear, and time out after two hours.
 - An allowlist of IPv6 source prefixes that should be subject to translation is
  maintained.
 ** How it works
 Two BPF programs are attached to the ingress and egress hooks of the interface
 being configured. The ingress program will process IPv6 packets, and any packet
 with a destination address in the configured NAT64 prefix will be either
 translated (if the source is allowed), or dropped. The egress program processes
 IPv4 packets and any packet with a destination in the configured v4 prefix will
 be either translated (if a v6 address is found in the state map) or dropped.
 To make sure the v4 traffic makes it to the right interface, a v4-via-v6 route
 is installed on that interface with a gateway address of the network address of
 the v6 prefix, and a fake neighbour entry is installed to avoid the kernel doing
 neighbour lookups of the gateway. This gets the packets to where the BPF program
 can process them, and after translation a new neighbour lookup with be performed
 with the new v6 destination.
 Note that because of the place of the BPF hook in ingress processing, the
 ingress BPF program will need to redirect the packet to the same interface after
 translation for re-processing as an IPv4 packet. This means that things like
 tcpdump will see first the original IPv6 packet, and then the translated IPv4
 packet. On egress the translation happens earlier, so only the translated packet
 will be seen.
 ** Limitations / known issues
 At least the first two of these should probably be fixed before deploying this:
 - The IP headers in ICMP error message payloads are not translated, which
  probably breaks ICMP errors.
 - The BPF programs assume the interface is an Ethernet interface, so translation
  won't work on layer 3 devices (like Wireguard tunnels).
 - IP options are not handled at all. In particular this means that fragmented
  IPv6 packets won't pass the translator.
 - The BPF programs support specifying multiple allowed source IPv6 prefixes, as
  well as doing ahead-of-time static mappings, but the userspace component
  doesn't support these yet.
 - The userspace program also has no way to print its status, or dump the state
  of the translation table. The BPF maps can be inspected with bpftool as a
  stopgap measure, though.
 - Some logic to dynamically assign v4 addresses each time a new v6 source is seen