Example programs seem to get out of sync (bit rot) more
easily when nobody sees the compile issues.
Thus, add more of them to the top-level Makefile.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
‘bpf_program__next’ is deprecated: libbpf v0.7+:
use bpf_object__next_program() instead
Also use bpf_xdp_attach() and bpf_xdp_detach() APIs.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The distro kernel UAPI headers evolve too slowly.
Thus, maintain a mirror in headers/linux/ in this project.
Libbpf has been overly eager to get features into its releases
and depends on kernel commit 6089fb325cf7 ("bpf: Add btf enum64 support"),
which has not been released in an official kernel release yet.
Thus, this headers/linux/btf.h update comes from bpf-next git.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
We have not found a way to get the BTF object ID via the
BTF files in the sysfs filesystem.
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
Skip BTF IDs that don't originate from the kernel, as this
program is looking for kernel module BTF.
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
Manually opening the /sys/kernel/btf/ file and trying to get
info via bpf_obj_get_info_by_fd() doesn't give us anything.
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
This contains a fix to the xdp-tools configure script so it works with the
Dash shell used on Debian and derivatives.
Fixes #50.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The trick with printing debug output as a u64 got it in the wrong byte
order; fix that by swapping everything appropriately before printing. Also
add some more information to the drop debug print.
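
The byte-order issue can be illustrated in plain C. This is only a sketch,
not the code from the program: the helper name pack_mac_vlan() and the
layout (MAC in the upper 48 bits, VLAN ID in the lower 16) are assumptions
made for illustration, and it builds the value with shifts rather than the
byte swaps mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper: pack a MAC address and a VLAN ID into one u64 so
 * that printing the value as hex shows the MAC octets in wire order,
 * followed by the VLAN ID. Building the value with shifts gives the same
 * result on little- and big-endian hosts, unlike copying the raw bytes
 * into a u64, which reverses the printed order on little-endian. */
static uint64_t pack_mac_vlan(const uint8_t mac[6], uint16_t vlan)
{
	uint64_t v = 0;
	int i;

	/* First octet ends up in the most significant position. */
	for (i = 0; i < 6; i++)
		v = (v << 8) | mac[i];

	/* Make room for the 16-bit VLAN ID in the low bits. */
	return (v << 16) | vlan;
}
```

Printing the result with a 64-bit hex format then reads left to right as
the MAC address followed by the VLAN ID.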
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
There were a couple of issues with the IGMP and multicast handling: the
packet parsing checked the MAC address for whether it was a multicast
address before it looked at the IP header, which meant it never got to the
IGMP packets (because they are also sent as multicast). Also, we need to
redirect IGMP packets to the bond master on egress to make sure
subscriptions work as they're supposed to.
Fix the parsing, add the redirect, and also remove the explicit check for
IGMP packets on ingress, as that will already be matched by the multicast
check.
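
The ordering problem can be sketched in plain C as follows. This is an
illustration under simplifying assumptions (untagged Ethernet, IPv4 only,
no MLD), and the names classify() and enum pkt_class are hypothetical, not
taken from the program.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN       14
#define IPV4_MIN_HDR_LEN  20
#define IPV4_PROTO_OFF    9   /* offset of the protocol field in the IPv4 header */
#define IP_PROTO_IGMP     2

/* Hypothetical classification result. */
enum pkt_class { PKT_UNICAST, PKT_MULTICAST, PKT_IGMP };

/* Look at the IP header before the destination MAC. IGMP packets are sent
 * to multicast MAC addresses, so checking the MAC first would classify
 * them as plain multicast and never reach the IGMP branch.
 * Frames too short to hold an Ethernet header are treated as unicast. */
static enum pkt_class classify(const uint8_t *frame, size_t len)
{
	if (len < ETH_HDR_LEN)
		return PKT_UNICAST;

	/* EtherType 0x0800 (IPv4) with IP protocol 2 (IGMP). */
	if (len >= ETH_HDR_LEN + IPV4_MIN_HDR_LEN &&
	    frame[12] == 0x08 && frame[13] == 0x00 &&
	    frame[ETH_HDR_LEN + IPV4_PROTO_OFF] == IP_PROTO_IGMP)
		return PKT_IGMP;

	/* Group bit: LSB of the first octet of the destination MAC. */
	if (frame[0] & 0x01)
		return PKT_MULTICAST;

	return PKT_UNICAST;
}
```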
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
This reverts commit d3aaec4bdd ("pkt-loop-filter: Check ifindex against
state before dropping packets") - we should not accept packets that are
looped back to the same port either.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The exception for gratuitous ARPs is only supposed to be for entries that
would otherwise be dropped due to the loop filtering logic. In addition, we
should record egress gratuitous ARPs and make sure they don't trigger the
exception when looping back (this is 'rule 4' of the openvswitch SLB
bonding logic).
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
We were indiscriminately dropping packets when the map lookup succeeded;
let's actually check the ifindex first.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
We shouldn't be filtering incoming gratuitous ARPs based on the ifindex
learning. So parse ARP packets and allow them through if they have
identical source and destination IPs.
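
A minimal sketch of such a check in plain C, assuming an ARP payload for
IPv4 over Ethernet (28 bytes, as laid out in RFC 826). The function name
is_gratuitous_arp() and the offset macros are hypothetical and only serve
the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_IPV4_LEN 28  /* ARP payload size for Ethernet + IPv4 */
#define ARP_SPA_OFF  14  /* sender protocol (IP) address */
#define ARP_TPA_OFF  24  /* target protocol (IP) address */

/* Hypothetical helper: return true if the ARP payload starting at 'arp'
 * is an Ethernet/IPv4 ARP packet whose sender and target IP addresses
 * are identical, which is what marks it as gratuitous. */
static bool is_gratuitous_arp(const uint8_t *arp, size_t len)
{
	if (len < ARP_IPV4_LEN)
		return false;

	/* htype 1 (Ethernet), ptype 0x0800 (IPv4), hlen 6, plen 4 */
	if (arp[0] != 0x00 || arp[1] != 0x01 ||
	    arp[2] != 0x08 || arp[3] != 0x00 ||
	    arp[4] != 6 || arp[5] != 4)
		return false;

	return memcmp(arp + ARP_SPA_OFF, arp + ARP_TPA_OFF, 4) == 0;
}
```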
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
When pinning of the bpf_link fails, we keep running to keep the PID alive.
However, staying in the foreground causes problems with scripts that
expect the setup to finish running; so fork into the background instead
and write a PID file so we can kill the running instance on unload.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
When running in the fallback mode where we keep running in the foreground
to keep the kprobe alive, we should unload the cls_bpf programs after being
interrupted instead of just exiting.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Support for bpf_link-based attaching of kprobes was added to kernel 5.15
with commit: b89fbfbb854c ("bpf: Implement minimal BPF perf link"). Prior
to this, it is not possible to pin kprobe attachments in bpffs, which
causes the pkt-loop-filter to fail. Add a fallback where we just keep
running in the foreground to keep the probe alive if bpf_link pinning
fails.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The type of the net->net_cookie field member was changed in kernel 5.12
with commit 3d368ab87cf6 ("net: initialize net->net_cookie at netns setup").
Older versions of the kernel define net->net_cookie as an atomic64_t
instead of a u64. This causes CO-RE reading of the field to fail due to the
type mismatch. Handle this by adding CO-RE checks for the old type as well
and using the CO-RE facility to check for the right type at load time.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The ktime_get_coarse_ns() helper function was not backported to RHEL8, so
just switch to using ktime_get_boot_ns() instead.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The previous commit working around missing SO_NETNS_COOKIE failed to reset
the err variable, which means things still failed.
Reported-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The SO_NETNS_COOKIE sockopt is fairly new; make sure we can compile the
program without it being defined, and fall back (with a warning) to just
always returning 1 as the netns cookie if the option doesn't work, which
should keep things working in the init namespace at least.
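
The fallback could look roughly like the sketch below. The helper name
get_netns_cookie() is hypothetical, and the value 71 for SO_NETNS_COOKIE
is the one from asm-generic/socket.h; a few architectures use a different
number, so treat the define as an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Older libc/kernel headers do not define this option. 71 is the value
 * from asm-generic/socket.h (assumption: not valid on every arch). */
#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

/* Hypothetical helper: return the cookie of the current network
 * namespace, or 1 (with a warning) if it cannot be retrieved, e.g.
 * because the running kernel does not support the sockopt. */
static uint64_t get_netns_cookie(void)
{
	uint64_t cookie = 0;
	socklen_t len = sizeof(cookie);
	int fd, err;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd >= 0) {
		err = getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE,
				 &cookie, &len);
		close(fd);
		if (!err && cookie)
			return cookie;
	}

	/* The error is handled here and not propagated to the caller. */
	fprintf(stderr, "Warning: no netns cookie available, using 1\n");
	return 1;
}
```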
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Instead of having to pass the component interfaces to the userspace
program, we can just pass the bond ifname, and have the loader detect which
bond component interfaces are in the bond, and automatically load the BPF
program on each one. Reusing the active bond detection code from the
previous commit also allows us to automatically detect the right initial
active interface, and keep this up-to-date by hooking into the bonding code
that changes it when an iface goes down, instead of naively rotating
between active interfaces.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Add a small utility that uses a kprobe to extract the currently active
slave ifindex from a bond interface. This value is normally only exported
to userspace for bond types where it can be explicitly set, but the bond
driver has an internal notion of an active interface regardless of the bond
type. We can extract this value with a kprobe by attaching to a function in
the bond driver and triggering an operation that causes this function to be
called.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Filter not only the multicast packets themselves, but also any IGMP (and
ICMPv6 MLD) packets coming in on multiple interfaces.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Add a debug mode to pkt-loop-filter that outputs debug messages for every
dropped packet (with the reason it was dropped). Also add a small script to
read the kernel trace pipe, after making sure tracing is active (otherwise
there will be no output in the pipe).
The source MAC address+VLAN is squeezed into a single u64 when printing as
a quick workaround to the lack of MAC address printing in BPF printk.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Multicast frames (which include broadcast frames) can be identified by
looking at the least significant bit of the first octet of the
destination MAC address.
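
As a minimal sketch in plain C (the helper name is_multicast_mac() is
hypothetical and chosen for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the group bit is the least significant bit of the
 * first octet of the destination MAC. It is set for all multicast
 * addresses, including the broadcast address ff:ff:ff:ff:ff:ff. */
static bool is_multicast_mac(const uint8_t dst[6])
{
	return dst[0] & 0x01;
}
```

With this check, 01:00:5e:00:00:01 and ff:ff:ff:ff:ff:ff both match, while
a unicast address such as 02:00:00:00:00:01 does not.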
Original-patch-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The debug counter for timed out (deleted by periodic cleanup) flow
states was never incremented, so fix that.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Use global functions to make use of function-by-function verification.
This allows the verifier to analyze parts of the program individually
from each other, which should help reduce the verification
complexity (the number of instructions the verifier must go through to
verify the program) and help prevent it from growing exponentially with
every loop or branch added to the code.
In this case, break out the packet parsing (parse_packet_identifier)
as a global function, so that it can be handled separately from the
logic after it (updating flow state, saving timestamps, matching
replies to timestamps, calculating and pushing RTTs, etc.). To do this,
create small separate wrapper functions (parse_packet_identifier_tc()
and parse_packet_identifier_xdp()) for tc/xdp, so that the verifier
can correctly identify the arguments as pointers to
context (PTR_TO_CTX) when evaluating the global functions. Also create
small wrapper functions pping_tc() and pping_xdp() which call the
corresponding parse_packet_identifier_tc/xdp function.
For this to work in XDP mode (which is the default), the kernel must
have been patched with a fix that addresses an issue with how global
functions are verified for XDP programs, see:
https://lore.kernel.org/all/20220606075253.28422-1-toke@redhat.com/
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Do not provide pointers into the original packet from packet_info
anymore (which the verifier has to ensure are valid), and instead
directly parse all necessary data in parse_packet_identifier and then
only use the parsed data in later functions.
This allows a cleaner separation of concerns, where the parsing
functions parse all necessary data from the packets, and other
functions that need information about the packet only rely on the data
provided in packet_info (and do not attempt to parse any data on their
own).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
This adds filtering of multicast traffic to the set of interfaces. The
filtering works by marking one of the interfaces as "primary" (which is
just the first interface name that is supplied on the command line) and
filtering everything with an all-ones destination MAC address if it's
coming in on any interface that's not the primary one.
To handle interfaces going down, we actually supply all the ifindexes to
the BPF program, and also install a tracing hook that listens to ifdown
events and switches the logic to the next ifindex in the sequence if the
primary one goes down. This is a bit rudimentary but should at least
provide basic filtering.
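
The selection and failover logic can be sketched in plain C as below. This
is a userspace illustration under stated assumptions, not the BPF code
itself: struct mc_filter_state, should_drop() and handle_ifdown() are
hypothetical names, and the sketch does not track whether the next
interface in the sequence is actually up.

```c
#include <stddef.h>

#define MAX_IFINDEXES 8

/* Hypothetical state: all ifindexes in command-line order, plus the
 * position of the one currently treated as primary. */
struct mc_filter_state {
	int ifindexes[MAX_IFINDEXES];
	size_t num;
	size_t active;
};

static int active_ifindex(const struct mc_filter_state *s)
{
	return s->ifindexes[s->active];
}

/* Drop broadcast frames arriving on any interface but the primary one. */
static int should_drop(const struct mc_filter_state *s,
		       int ingress_ifindex, int is_broadcast)
{
	return is_broadcast && ingress_ifindex != active_ifindex(s);
}

/* On an ifdown event for the primary, move on to the next ifindex in
 * the sequence; events for other interfaces are ignored. */
static void handle_ifdown(struct mc_filter_state *s, int ifindex)
{
	if (s->num && ifindex == active_ifindex(s))
		s->active = (s->active + 1) % s->num;
}
```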
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
This makes it easier to populate the global variables we'll need for
handling multicast, and also means we don't have to worry about keeping the
BPF object file around (since it'll be statically linked).
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
This is needed to be able to react to interfaces going down so we can
allow multicast on a secondary interface if the primary goes down. We don't
actually react to the event yet, just print it; handling this will be added
in a subsequent commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Move the is_egress and ingress_ifindex members from the parsing_context
struct to the packet_info struct. Also change the member is_egress to
is_ingress to better fit with the ingress_ifindex member.
These members were only in parsing_context because they were
convenient to fill in right from the start. However, it semantically
makes little sense for the parsing_context to contain these because
they are not used for any parsing, and they fit better with the
packet_info. This also allows later functions (is_local_address(),
pping_timestamp_packet() and pping_match_packet()) to get rid of their
dependency on parsing_context, as it was only used for the is_egress
and ingress_ifindex members (they do not do any parsing). After this
change, parsing_context is only used for the initial parsing, and
packet_info contains all the necessary data for all the functions
related to the pping logic that runs after the packet has been parsed.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>