Mirror of https://github.com/xdp-project/bpf-examples.git, synced 2024-05-06 15:54:53 +00:00
nat64: Update README

Actually explain how to use and how the translator works.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
* NAT64 BPF implementation

This directory contains a stateless NAT64 translator, similar to Tayga, but
implemented entirely in BPF. It works by attaching to the TC hooks of an
interface and translating incoming IPv6 packets whose destination is in the
configured NAT64 prefix into IPv4. Return traffic is routed back out through
the same interface based on the (v4) prefix used for translation, and is
translated back into IPv6 on the way out.

** Running

To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24=, using
the default well-known v6 prefix (=64:ff9b::/96=) and allowing translation for
IPv6 sources in =fc00::/8= (=-a=), issue:

#+begin_src sh
sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8
#+end_src

To unload, run the same command again with the =-u= parameter added. Make sure
to also specify the rest of the parameters, as they are needed to properly clean
up. To use a different v6 prefix, specify it with =-6=.

The userspace utility installs the necessary routing rules and sets up the BPF
programs, then exits. The translator then keeps running entirely in the kernel
until it is unloaded (with =-u=).
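With the default prefix, an IPv6-only client reaches an IPv4 host by putting the
host's address into the last four bytes of =64:ff9b::/96=. The POSIX shell
sketch below shows that mapping in both directions. The function names
=v4_to_nat64= and =nat64_to_v4= are made up for illustration and are not part of
the nat64 utility, and the sketch assumes a /96 prefix written so that appending
two hex groups yields a valid address, as with =64:ff9b::=.

#+begin_src sh
#!/bin/sh
# Illustration only: hypothetical helpers, not part of the nat64 utility.

# v4_to_nat64 ADDR [PREFIX]: place the IPv4 address in the last 32 bits of the
# /96 prefix. Assumes PREFIX ends in "::" with room for two more groups.
v4_to_nat64() {
    prefix=${2:-64:ff9b::}
    old_ifs=$IFS
    IFS=.
    set -- $1              # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    printf '%s%x:%x\n' "$prefix" $(( ($1 << 8) | $2 )) $(( ($3 << 8) | $4 ))
}

# nat64_to_v4 ADDR: recover the IPv4 address from the last two 16-bit groups.
# Assumes both groups are written out, as v4_to_nat64 does.
nat64_to_v4() {
    lo=${1##*:}            # last group
    rest=${1%:*}
    hi=${rest##*:}         # second-to-last group
    hi=$(( 0x$hi ))
    lo=$(( 0x$lo ))
    printf '%d.%d.%d.%d\n' $(( hi >> 8 )) $(( hi & 255 )) $(( lo >> 8 )) $(( lo & 255 ))
}
#+end_src

For example, =v4_to_nat64 192.0.2.1= prints =64:ff9b::c000:201=, and
=nat64_to_v4 64:ff9b::c000:201= prints =192.0.2.1= again.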
** Assumptions

The operation of this NAT64 translator makes a few assumptions:

- A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,
  the v4 addresses live in the last four bytes). By default the well-known
  prefix =64:ff9b::/96= is used.

- IPv6 source addresses are mapped one-to-one into a configured IPv4 prefix, so
  the size of that prefix limits how many IPv6 sources can be translated at the
  same time. Regular NAT44 can be applied afterwards to map them to a single
  public IP. A separate v4 prefix should be used for every interface that the
  translator runs on. Source address v6-to-v4 mappings are created dynamically
  as new sources appear, and time out after two hours.

- An allowlist is maintained of the IPv6 source prefixes that are subject to
  translation.
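The dynamic source mapping can be modelled in a few lines of shell. This sketch
is purely illustrative: the real state lives in BPF maps, and the table format,
the allocation policy (lowest free address, skipping the network address) and
all names (=map_source=, =lookup_v6=, =TABLE=, =MAPPED=) are hypothetical. It
shows an IPv6 source keeping its IPv4 address while it stays active, the
egress-side lookup, and entries expiring after two hours without traffic.

#+begin_src sh
#!/bin/sh
# Illustrative model of the v6-to-v4 source mapping; the real state lives in
# BPF maps. Table format, allocation policy and all names are hypothetical.

TIMEOUT=7200   # mappings time out after two hours
TABLE=''       # one "v4 v6 last_seen" entry per line
MAPPED=''      # result of the last map_source call

ip4_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int_to_ip4() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# expire NOW: drop entries that have been idle for TIMEOUT seconds or more.
expire() {
    TABLE=$(printf '%s\n' "$TABLE" |
        awk -v now="$1" -v t="$TIMEOUT" 'NF == 3 && now - $3 < t')
}

# has_v4 ADDR: succeed if ADDR is already assigned in TABLE.
has_v4() {
    printf '%s\n' "$TABLE" | awk -v a="$1" '$1 == a { f = 1 } END { exit !f }'
}

# map_source V6 NOW V4_PREFIX (ingress side): reuse the source's current v4
# address or take the lowest free one, then refresh the entry's timestamp.
# Sets MAPPED; returns 1 if the prefix is exhausted (packet would be dropped).
map_source() {
    expire "$2"
    base=$(ip4_to_int "${3%/*}")
    size=$(( 1 << (32 - ${3#*/}) ))
    MAPPED=$(printf '%s\n' "$TABLE" | awk -v s="$1" '$2 == s { print $1; exit }')
    i=1
    while [ -z "$MAPPED" ] && [ "$i" -lt "$size" ]; do
        cand=$(int_to_ip4 $(( base + i )))
        has_v4 "$cand" || MAPPED=$cand
        i=$(( i + 1 ))
    done
    [ -n "$MAPPED" ] || return 1
    TABLE=$(printf '%s\n' "$TABLE" | awk -v s="$1" 'NF == 3 && $2 != s'
            echo "$MAPPED $1 $2")
}

# lookup_v6 V4 NOW (egress side): print the mapped v6 address, or nothing if
# there is no live entry (the packet would then be dropped).
lookup_v6() {
    printf '%s\n' "$TABLE" | awk -v a="$1" -v now="$2" -v t="$TIMEOUT" \
        '$1 == a && now - $3 < t { print $2; exit }'
}
#+end_src

For example, after =map_source fc00::1 1000 10.0.1.0/24= the variable =MAPPED=
holds =10.0.1.1=, and a second source seen at time 1001 gets =10.0.1.2=. Passing
timestamps explicitly keeps the model deterministic; the BPF code would use the
kernel's clock instead.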
** How it works

Two BPF programs are attached to the ingress and egress hooks of the interface
being configured. The ingress program processes IPv6 packets: any packet with a
destination address in the configured NAT64 prefix is either translated (if the
source is allowed) or dropped. The egress program processes IPv4 packets: any
packet with a destination in the configured v4 prefix is either translated (if
a v6 address is found in the state map) or dropped.

To make sure the v4 traffic makes it to the right interface, a v4-via-v6 route
is installed on that interface with a gateway address of the network address of
the v6 prefix, and a fake neighbour entry is installed to keep the kernel from
doing neighbour lookups of the gateway. This gets the packets to where the BPF
program can process them, and after translation a new neighbour lookup will be
performed with the new v6 destination.

Because of where the BPF hook sits in ingress processing, the ingress BPF
program needs to redirect the packet back to the same interface after
translation so that it is re-processed as an IPv4 packet. This means that tools
like tcpdump will first see the original IPv6 packet, and then the translated
IPv4 packet. On egress the translation happens earlier, so only the translated
packet will be seen.
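For experimenting by hand, the routing setup described in this section
corresponds roughly to the two iproute2 commands printed by the sketch below.
It is an approximation, not a copy of what the nat64 utility does: it assumes
"the v6 prefix" means the NAT64 prefix, the helper name =route_setup_cmds= is
hypothetical, and the link-layer address is an arbitrary locally administered
placeholder. The commands are printed rather than executed so they can be
reviewed first.

#+begin_src sh
#!/bin/sh
# Hypothetical helper: print iproute2 commands approximating the translator's
# routing setup. The MAC address below is an arbitrary placeholder.

# route_setup_cmds IFACE V4_PREFIX V6_PREFIX
route_setup_cmds() {
    gw=${3%/*}   # network address of the v6 prefix, e.g. 64:ff9b::
    # v4-via-v6 route: send the v4 prefix out IFACE with an IPv6 gateway
    echo "ip route add $2 via inet6 $gw dev $1"
    # permanent fake neighbour entry so the kernel never resolves the gateway
    echo "ip -6 neigh add $gw lladdr 02:00:00:00:00:01 dev $1 nud permanent"
}
#+end_src

Running =route_setup_cmds eth0 10.0.1.0/24 64:ff9b::/96= prints both commands,
and piping the output into =sudo sh= would apply them. Since the nat64 utility
already installs its own routing rules, this is mainly useful for understanding
or debugging the setup.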
** Limitations / known issues
At least the first two of these should probably be fixed before deploying this:

- The IP headers in ICMP error message payloads are not translated, which
  probably breaks ICMP errors (including those used for path MTU discovery).

- The BPF programs assume the interface is an Ethernet interface, so translation
  won't work on layer 3 devices (like WireGuard tunnels).

- IP options and IPv6 extension headers are not handled at all. In particular
  this means that fragmented IPv6 packets, which carry a Fragment extension
  header, won't pass the translator.

- The BPF programs support specifying multiple allowed source IPv6 prefixes, as
  well as doing ahead-of-time static mappings, but the userspace component
  doesn't support these yet.

- The userspace program also has no way to print its status or dump the state
  of the translation table. As a stopgap measure, the BPF maps can be inspected
  with bpftool.