mirror of
https://github.com/xdp-project/bpf-examples.git
synced 2024-05-06 15:54:53 +00:00
nat64: Update README
Actually explain how to use and how the translator works.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
@@ -1,16 +1,86 @@
 * NAT64 BPF implementation

 This directory contains a BPF implementation of a stateless NAT64
-implementation, like that performed by Tayga, but entirely in BPF.
+implementation, like that performed by Tayga, but entirely in BPF. It works by
+attaching to the TC hooks of an interface and translating incoming IPv6
+packets with a destination in the configured NAT64 prefix, and routing v4
+packets back out through that interface based on the (v4) prefix used for
+translation.

-Design:
+** Running

-- Global v6 /96 prefix defined as NAT64 prefix
-- Each interface is assigned a v4 prefix for mapping v6 addresses
-- Install onlink v4 route for that prefix to make sure traffic goes out the interface
+To run the translator on =eth0= with an IPv4 prefix of =10.0.1.0/24= and using
+the default well-known v6 prefix (=64:ff9b::/96=), simply issue:

-- Attach ingress and egress BPF programs to each interface
-- On ingress: match v6 packets with a NAT64 prefix destination; remap to v4
-- On egress: lookup v4 destination address; if it's in the configured NAT64 prefix, remap back to v6
+#+begin_src sh
+sudo ./nat64 -i eth0 -4 10.0.1.0/24 -a fc00::/8
+#+end_src
+
+Run again with a =-u= parameter to unload (but make sure to also specify the
+rest of the parameters as they are needed to properly clean up). To specify
+another v6 prefix, use =-6=.
+
+The userspace utility will install the necessary routing rules, and set up the
+BPF programs, then exit. The translator will then keep running entirely in the
+kernel until unloaded (with =-u=).
+
+** Assumptions
+
+The operation of this NAT64 translator makes a few assumptions:
+
+- A single v6 NAT64 prefix is used, and the prefix length is always 96 (i.e.,
+  the v4 addresses live in the last four bytes). By default the well-known
+  prefix =64:ff9b::/96= is used.
+
+- IPv6 source addresses are mapped into a configured IPv4 prefix one-to-one.
+  Regular NAT4 can be applied afterwards to map to a single public IP. A
+  separate v4 prefix should be used for every interface that the translator runs
+  on. Source address v6-to-v4 mappings are dynamically created as new sources
+  appear, and time out after two hours.
+
+- An allowlist of IPv6 source prefixes that should be subject to translation is
+  maintained.
+
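
Because the prefix length is fixed at 96 bits, converting between a v4 address and its NAT64 form is plain bit arithmetic on the last four bytes. A minimal Python sketch of that mapping follows; the function names are illustrative and are not taken from the nat64 sources.

```python
import ipaddress

# Default prefix named in the README text.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_v4(prefix, v4):
    """Place the 32-bit v4 address in the low four bytes of a /96 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 NAT64 prefixes")
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


def extract_v4(prefix, v6):
    """Recover the v4 address from a v6 address inside the NAT64 prefix."""
    if v6 not in prefix:
        raise ValueError("address is outside the NAT64 prefix")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


if __name__ == "__main__":
    mapped = embed_v4(WELL_KNOWN_PREFIX, ipaddress.IPv4Address("192.0.2.33"))
    print(mapped)                                 # 64:ff9b::c000:221
    print(extract_v4(WELL_KNOWN_PREFIX, mapped))  # 192.0.2.33
```

The two functions are inverses of each other for any address inside the prefix, which is what makes the destination side of the translation stateless.
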
+** How it works
+
+Two BPF programs are attached to the ingress and egress hooks of the interface
+being configured. The ingress program will process IPv6 packets, and any packet
+with a destination address in the configured NAT64 prefix will be either
+translated (if the source is allowed), or dropped. The egress program processes
+IPv4 packets and any packet with a destination in the configured v4 prefix will
+be either translated (if a v6 address is found in the state map) or dropped.
+
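
The translate-or-drop decisions described above can be modelled in a few lines of userspace Python. This is only a sketch of the described behaviour, not the BPF code: the class name, the first-free allocation policy, and the choice to refresh the timeout on each ingress packet are assumptions made for illustration.

```python
import ipaddress

TWO_HOURS = 2 * 60 * 60  # mapping timeout stated in the README


class Nat64Sketch:
    def __init__(self, v6_prefix, v4_prefix, allowed_src, timeout=TWO_HOURS):
        self.v6_prefix = ipaddress.IPv6Network(v6_prefix)
        self.v4_prefix = ipaddress.IPv4Network(v4_prefix)
        self.allowed_src = [ipaddress.IPv6Network(p) for p in allowed_src]
        self.timeout = timeout
        # Assumption: hand out host addresses of the v4 prefix in order.
        self.free_v4 = list(self.v4_prefix.hosts())
        self.by_v6 = {}  # v6 source -> [v4 address, last-seen timestamp]
        self.by_v4 = {}  # v4 address -> v6 source

    def _expire(self, now):
        """Drop mappings that have been idle for longer than the timeout."""
        for v6, (v4, seen) in list(self.by_v6.items()):
            if now - seen > self.timeout:
                del self.by_v6[v6]
                del self.by_v4[v4]
                self.free_v4.append(v4)

    def ingress(self, src6, dst6, now):
        """IPv6 in: pass if not for the NAT64 prefix, else translate or drop."""
        src6 = ipaddress.IPv6Address(src6)
        dst6 = ipaddress.IPv6Address(dst6)
        if dst6 not in self.v6_prefix:
            return ("pass", None)
        if not any(src6 in net for net in self.allowed_src):
            return ("drop", None)
        self._expire(now)
        entry = self.by_v6.get(src6)
        if entry is None:
            if not self.free_v4:
                return ("drop", None)  # v4 prefix exhausted
            entry = self.by_v6[src6] = [self.free_v4.pop(0), now]
            self.by_v4[entry[0]] = src6
        entry[1] = now  # assumption: traffic refreshes the timeout
        dst4 = ipaddress.IPv4Address(int(dst6) & 0xFFFFFFFF)
        return ("translate", (entry[0], dst4))

    def egress(self, src4, dst4, now):
        """IPv4 out: pass if not for the v4 prefix, else translate or drop."""
        src4 = ipaddress.IPv4Address(src4)
        dst4 = ipaddress.IPv4Address(dst4)
        if dst4 not in self.v4_prefix:
            return ("pass", None)
        self._expire(now)
        dst6 = self.by_v4.get(dst4)
        if dst6 is None:
            return ("drop", None)  # no v6 address in the state map
        src6 = ipaddress.IPv6Address(
            int(self.v6_prefix.network_address) | int(src4))
        return ("translate", (src6, dst6))
```

With the parameters from the Running example (`fc00::/8` allowed, `10.0.1.0/24` as the v4 prefix), a first packet from `fc00::1` is assigned `10.0.1.1` in this model, and return traffic to `10.0.1.1` is mapped back to `fc00::1` until the entry times out.
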
+To make sure the v4 traffic makes it to the right interface, a v4-via-v6 route
+is installed on that interface with a gateway address of the network address of
+the v6 prefix, and a fake neighbour entry is installed to avoid the kernel doing
+neighbour lookups of the gateway. This gets the packets to where the BPF program
+can process them, and after translation a new neighbour lookup will be performed
+with the new v6 destination.
+
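
The README does not say how the utility installs this route and neighbour entry (it may well talk to the kernel over netlink directly). As a rough illustration of an equivalent manual setup, the Python sketch below only builds the iproute2 command lines and prints them; the helper name and the link-layer address are made-up placeholders, not values from the nat64 sources.

```python
import ipaddress
import shlex


def routing_commands(ifname, v4_prefix, v6_prefix,
                     fake_lladdr="02:00:00:00:00:01"):
    """Build iproute2 commands for the v4-via-v6 route and fake neighbour.

    fake_lladdr is a hypothetical placeholder; the address the real
    utility uses is not given in the README.
    """
    v4 = ipaddress.IPv4Network(v4_prefix)
    # The gateway is the network address of the NAT64 prefix.
    gateway = ipaddress.IPv6Network(v6_prefix).network_address
    return [
        ["ip", "route", "add", str(v4),
         "via", "inet6", str(gateway), "dev", ifname],
        ["ip", "neigh", "add", str(gateway),
         "lladdr", fake_lladdr, "dev", ifname, "nud", "permanent"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; they would need root.
    for cmd in routing_commands("eth0", "10.0.1.0/24", "64:ff9b::/96"):
        print(shlex.join(cmd))
```

The permanent neighbour entry is what keeps the kernel from trying to resolve the gateway address, which does not belong to any real host on the link.
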
+Note that because of the place of the BPF hook in ingress processing, the
+ingress BPF program will need to redirect the packet to the same interface after
+translation for re-processing as an IPv4 packet. This means that things like
+tcpdump will see first the original IPv6 packet, and then the translated IPv4
+packet. On egress the translation happens earlier, so only the translated packet
+will be seen.
+
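
To give a feel for what the "translated IPv4 packet" looks like at the header level, here is a rough Python sketch of an IPv6-to-IPv4 header rewrite in the style of RFC 7915. It is a simplified illustration, not the logic of the BPF program: the function names are made up, the flags and Identification handling is reduced to "DF set, ID zero", and payload translation and transport checksum updates are left out entirely.

```python
import ipaddress
import struct

IPPROTO_ICMP = 1
IPPROTO_ICMPV6 = 58


def ipv4_checksum(header):
    """Ones'-complement checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def v6_header_to_v4(v6_header, src4, dst4):
    """Build a 20-byte IPv4 header from the fixed 40-byte IPv6 header."""
    ver_tc_flow, payload_len, next_header, hop_limit = struct.unpack(
        "!IHBB", v6_header[:8])
    if ver_tc_flow >> 28 != 6:
        raise ValueError("not an IPv6 header")
    tos = (ver_tc_flow >> 20) & 0xFF  # traffic class becomes TOS
    proto = IPPROTO_ICMP if next_header == IPPROTO_ICMPV6 else next_header
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,              # version 4, header length 5 words (no options)
        tos,
        payload_len + 20,  # total length includes the IPv4 header
        0,                 # Identification (simplified)
        0x4000,            # flags: DF set, fragment offset 0 (simplified)
        hop_limit,         # copied as TTL; a forwarder would also decrement it
        proto,
        0,                 # checksum placeholder
        src4.packed,
        dst4.packed,
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

Running a verifier such as `ipv4_checksum()` over the finished header returns zero when the checksum field is correct, which is a handy sanity check when comparing against what tcpdump shows.
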
+** Limitations / known issues
+At least the first two of these should probably be fixed before deploying this:
+
+- The IP headers in ICMP error message payloads are not translated, which
+  probably breaks ICMP errors.
+
+- The BPF programs assume the interface is an Ethernet interface, so translation
+  won't work on layer 3 devices (like Wireguard tunnels).
+
+- IP options and IPv6 extension headers are not handled at all. In particular,
+  this means that fragmented IPv6 packets won't pass the translator.
+
+- The BPF programs support specifying multiple allowed source IPv6 prefixes, as
+  well as doing ahead-of-time static mappings, but the userspace component
+  doesn't support these yet.
+
+- The userspace program also has no way to print its status, or dump the state
+  of the translation table. The BPF maps can be inspected with bpftool as a
+  stopgap measure, though.

-- Some logic to dynamically assign v4 addresses each time a new v6 source is seen