xdp-project-bpf-examples

mirror of https://github.com/xdp-project/bpf-examples.git synced 2024-05-06 15:54:53 +00:00

Author	SHA1	Message	Date
Simon Sundberg	8732c4f813	pping: Change default ingress program from XDP to tc Using the XDP ingress hook requires a newer kernel (needs Toke's patch fixing the verification of global function for BPF_PROG_TYPE_EXT programs) than tc mode, and will likely perform worse than tc if running in generic mode (due to no driver support for XDP). Furthermore, even when XDP works and has driver support, its performance benefit over tc is likely small as the packets are always passed on to the network stack regardless (not creating a fast-path that bypasses the network stack). Therefore, use the tc ingress hook as default instead, and only use XDP if explicitly required by the user (-I/--ingress hook xdp). This partly addresses issue #49, as ePPing should no longer by default get the confusing error message from failing verification if the kernel lacks Toke's verifier patch. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-08 09:30:31 +01:00
Simon Sundberg	832bdea23f	pping: Define BPF program names Define the BPF program names in the user space component. The strings corresponding to the BPF program names were previously inserted in several places, including in multiple string comparisons, which is error prone and could lead to subtle errors if the program names are changed and not updated correctly in all places. With the program name strings being defined, they only have to be changed in a single place. Currently only the names of the ingress programs occur in multiple places, but also define the name for the egress program to be consistent. Note that even after this change one has to sync the defined values with the actual program names declared in the pping_kern.c file. Ideally, these would all be defined in a single place, but I am not aware of a convenient way to make that happen (cannot use the defined strings as function names as they are not identifiers, and if defined as identifiers instead it would not be possible to use them as strings). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-08 09:30:31 +01:00
Simon Sundberg	ddf25abfcc	pping: Check if creating clsact on ingress The userspace loader would only check if the tc clsact was created when the egress program was loaded. Thus, if the ingress program created the clsact, the egress program would not have to create it, and ePPing would falsely believe it did not create a clsact and fail to remove it on shutdown even if --force was used. Fix this by checking if either ingress or egress created clsact. This bug was introduced as a sneaky side effect of commit `78b45bde56` (pping: Use libxdp to load and attach XDP program). Before this commit the egress program (for which there is only a tc alternative) would be loaded first, and thus it was sufficient to check if it created the clsact. When switching to libxdp however, the ingress program (specifically the XDP program) had to be loaded first, and thus the order of loading the ingress and egress programs was swapped. Therefore, it was no longer sufficient to only check the egress program as the tc ingress program may have created the clsact before the egress program is attached (and only checking the ingress program would also not be enough as the tc ingress program may never be loaded if XDP mode is used instead). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-08 09:30:31 +01:00
Toke Høiland-Jørgensen	af5db036ab	Merge pull request #55 from simosund/pping-skip-syn PPing: Add option to ignore SYN-packets	2022-11-06 14:18:10 +01:00
Simon Sundberg	e932174882	pping: Fix XDP ingress ifindex Set the ingress_ifindex to the ctx->ingress_ifindex rather than ctx->rx_queue_index. This fixes a bug that was accidentally introduced in commit #add8885, and which broke the localfilt functionality if the XDP hook was used on ingress (the FIB lookup would fail). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-06 14:14:54 +01:00
Simon Sundberg	251c9b7ad3	pping: Wait for id shift before timestamping packet in new flow Make ePPing wait until the first shift of identifier (the "edge") before starting to timestamp packets for new flows (for TCP flows we do not see the start of). The reason this is necessary is that if ePPing starts monitoring a flow in the middle of it (ePPing did not see the start of the flow), then we cannot know if the first TSval we see is actually the first instance of the TSval in that flow, so we have to wait until the next TSval to ensure we get the first instance of a TSval (otherwise we may underestimate the RTT by up to the TCP timestamp update period). To avoid the first RTT sample potentially being underestimated this fix essentially ignores the first RTT sample instead. However, it is not always necessary to wait until the first shift. For TCP traffic where we see the initial handshake we know that we've seen the start of the flow. Furthermore, for ICMP traffic it's generally unlikely that there are duplicate identifiers to begin with, so also allow that to start timestamping right away. It should be noted that after the previous commit (which changed ePPing to ignore TCP SYN-packets by default), ePPing will never see the handshake and thus has to assume that it started to monitor all flows in the middle. Therefore, ePPing will (by default) now miss both the RTT during the handshake, as well as RTT for the first few packets sent after the handshake (until the TSval is updated). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-04 15:37:51 +01:00
Simon Sundberg	70f255cbf8	pping: Ignore SYN packets by default Make ePPing ignore TCP SYN packets by default, so that the initial handshake phase of the connection is ignored. Add an option (--include-syn/-s) to explicitly include SYN packets. The main reason it can be a good idea to avoid SYN-packets is to avoid being affected by SYN-flood attacks. When ePPing also includes SYN-packets it becomes quite vulnerable to SYN-flood attacks, which will quickly fill up its flow_state table, blocking actual useful flows from being tracked. As ePPing will consider the connection opened as soon as it sees the SYN-ACK (it will not wait for final ACK), flow-state created from SYN-flood attacks will also stay around in the flow-state table for a long time (5 minutes currently) as no RST/FIN will be sent that can be used to close it. The drawback from ignoring SYN-packets is that no RTTs will be collected during the handshake phase, and all connections will be considered opened due to "first observed packet". A more refined approach could be to properly track the full TCP handshake (SYN + SYN-ACK + ACK) instead of the more generic "open once we see reply in reverse direction" used now. However, this adds a fair bit of additional protocol-specific logic. Furthermore, to track the full handshake we will still need to store some flow-state before the handshake is completed, and thus such a solution would still be vulnerable to SYN-flood attacks (although the incomplete flow states could potentially be cleaned up faster). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-11-04 13:47:08 +01:00
Toke Høiland-Jørgensen	7a9db7b08c	Merge pull request #54 from simosund/pping-fix-reorder-issue PPing fix reorder issue	2022-10-11 00:56:08 +02:00
Toke Høiland-Jørgensen	619adfb6b5	Merge pull request #52 from xdp-project/add_examples Add samples from Linux	2022-09-23 11:36:39 +02:00
Magnus Karlsson	c425a168a1	AF_XDP-example: move xdpsock example to bpf-examples repo Move the xdpsock sample application from the Linux repo to the bpf-examples repo. This example demonstrates a number of capabilities of AF_XDP sockets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>	2022-09-23 06:06:57 +00:00
Magnus Karlsson	dbf4feb043	AF_XDP-forwarding: move xsk_fwd to bpf-examples Move the xsk_fwd example application from the Linux repo to bpf-examples. This sample demonstrates the ability to share a umem between multiple sockets by implementing a simple packet forwarding application. It also has a buffer pool manager for allocating and freeing packet buffers. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>	2022-09-21 11:58:41 +00:00
Jesper Dangaard Brouer	c0565f3995	Merge pull request #56 from xdp-project/cleanup01_libbpf_changes Prepare for newer libbpf in bpf-examples.	2022-09-09 18:41:38 +02:00
Jesper Dangaard Brouer	35883baaac	bpf-link-hang: Adjust for newer libbpf API ‘bpf_object__find_program_by_title’ is deprecated: libbpf v0.7+: use bpf_object__find_program_by_name() instead See: https://github.com/libbpf/libbpf/issues/297 libbpf#297 Deprecate bpf_program__title() in favor of bpf_program__section_name(). “Title” term is confusing and unconventional, it’s SEC() in code and “section name” everywhere else. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 16:11:09 +02:00
Jesper Dangaard Brouer	de39ecd258	Makefile: Add more SUBDIRS without API issues Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 15:57:09 +02:00
Jesper Dangaard Brouer	04db7bd740	preserve-dscp: Adjust for newer libbpf API ‘bpf_map__next’ is deprecated: libbpf v0.7+: use bpf_object__next_map() instead Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 15:49:40 +02:00
Jesper Dangaard Brouer	08161febd1	Makefile: SUBDIR programs depend on lib being finished first This makes it possible to use make -j for simultaneous make processes to run. This does make the pretty output unordered. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:57:28 +02:00
Jesper Dangaard Brouer	e70136a68e	ktrace-CO-RE: Adjust for newer libbpf API ‘bpf_program__next’ is deprecated: libbpf v0.7+: use bpf_object__next_program() instead See: https://github.com/libbpf/libbpf/issues/296 Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:45:42 +02:00
Jesper Dangaard Brouer	def0169f41	Makefile: Add more top-level directories Example programs seem to get out-of-sync (bit rot) more easily when nobody sees the compile issues. Thus, add more to the top-level Makefile. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:34:11 +02:00
Jesper Dangaard Brouer	7fe6d862e8	traffic-pacing-edt: Adjust for newer libbpf API ‘bpf_program__next’ is deprecated: libbpf v0.7+: use bpf_object__next_program() instead Also use bpf_xdp_attach() and bpf_xdp_detach() APIs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:34:11 +02:00
Jesper Dangaard Brouer	88b05144a2	nat64-bpf: rename bpf_map__resize() to bpf_map__set_max_entries() Libbpf API change: Discourage bpf_map__resize(), which is an alias to more clearly named bpf_map__set_max_entries() See: https://github.com/libbpf/libbpf/issues/304 And API migration guide: https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#libbpfh-high-level-apis Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:34:11 +02:00
Jesper Dangaard Brouer	5d111a29ee	Update kernel-mirrored UAPI header file btf.h The distro kernel UAPI headers evolve too slowly. Thus, maintain a mirror in headers/linux/ in this project. Libbpf has been overly eager to get features into its releases and depends on kernel commit 6089fb325cf7 ("bpf: Add btf enum64 support"), which has not been released in an official kernel release yet. Thus, this headers/linux/btf.h update comes from bpf-next git. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>	2022-09-02 14:34:11 +02:00
Jesper Dangaard Brouer	9fe28d5577	BTF-playground: Drop printing BTF object ID in btf_module_read As we have not found a way to get the BTF object ID via the sysfs filesystem BTF files. Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-09-02 14:03:27 +02:00
Jesper Dangaard Brouer	d88d8ffe89	BTF-playground: btf_module_ids is only interested in kernel BTF Skip BTF IDs that don't originate from the kernel as this program is looking for kernel module BTF. Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-09-02 14:00:55 +02:00
Jesper Dangaard Brouer	203343c5ac	BTF-playground: Mark functions that are privileged Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-09-01 17:16:45 +02:00
Simon Sundberg	867b659534	pping: Remove outdated issue from TODO The previous commit fixes the issue of reordered packets being able to bypass the unique TSval check, so remove the corresponding section from the issues in the TODO. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-09-01 15:47:55 +02:00
Simon Sundberg	03e9245ae9	pping: ensure TSval is monotonically increasing The mechanism to ensure that only the first instance of each TSval is timestamped is a simple equals check. This check may fail if there are reordered packets. Consider a sequence of packets A, B, C and D, where A and B have TSval=1 and C and D have TSval=2. If all packets arrive in order (ABCD), then A and C will correctly be the only packets that are timestamped (as B and D will have the same TSval as the previously observed one). However, consider if B is reordered so instead the packets arrive as ACBD. In this scenario ePPing will attempt to timestamp all packets (instead of only A and C), as each packet now has a different (but not always higher) TSval than the last seen packet. Note that it will only successfully create the timestamps for the later duplicated TSvals if the previous timestamp for the same TSval has already been cleared out, so this is mainly an issue when RTT < 1ms. Fix this by only allowing a packet to be timestamped if its TSval is strictly higher (accounting for wrap-around) than the last seen TSval, and likewise only update last seen TSval if it is strictly higher than the previous one. To allow this calculation, also convert TSval and TSecr from network byte order to host byte order when parsing the packet. While delaying the transform from network to host byte order until the comparison between the packet's TSval and last seen TSval could potentially save the overhead of bpf_ntohs for some packets that do not need to go through this check, most TCP packets will end up performing this check, so performance difference should be minimal. Therefore, opt for the simpler approach of converting TSval and TSecr directly, which also makes them easier to interpret, e.g. when dumping the maps. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>	2022-09-01 15:40:12 +02:00
Jesper Dangaard Brouer	fec451bb34	BTF-playground: Drop normal vmlinux open in btf_module_ids This btf_module_ids.c example is about getting BTF info via the object IDs. Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-09-01 15:28:58 +02:00
Jesper Dangaard Brouer	639cd96f42	Merge pull request #53 from xdp-project/BTF-playground01 BTF playground for extracting Kernel BTF object ID	2022-08-31 19:08:24 +02:00
Jesper Dangaard Brouer	d77b378c28	BTF-playground: Add params --module and --symbol For easier playing around with poking at different modules on cmdline. Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 18:37:41 +02:00
Jesper Dangaard Brouer	d33d264494	BTF-playground: Stop opening vmlinux as it was a dead end Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 16:29:13 +02:00
Jesper Dangaard Brouer	e4616a809f	BTF-playground: Extract the BTF data size from info call Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 16:23:26 +02:00
Jesper Dangaard Brouer	a998181376	BTF-playground: Find BTF id by name compare walk Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 14:37:02 +02:00
Jesper Dangaard Brouer	e00b6a66e6	BTF-playground: Extract BTF name while walking all IDs Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 12:23:43 +02:00
Jesper Dangaard Brouer	55fcfcce87	BTF-playground: btf_module_ids.c try walking BTF IDs Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 11:22:22 +02:00
Jesper Dangaard Brouer	9eeaf90eaf	BTF-playground: Small steps in btf_module_ids.c Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 10:58:02 +02:00
Jesper Dangaard Brouer	2d73471c2c	BTF-playground: Boilerplate for btf_module_ids.c Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 10:45:02 +02:00
Jesper Dangaard Brouer	f3de9c2e47	BTF-playground: New failed attempt at getting BTF obj ID Manually opening the /sys/kernel/btf/ file and trying to get info via bpf_obj_get_info_by_fd() doesn't give us anything. Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 10:37:54 +02:00
Jesper Dangaard Brouer	bd3e07587c	BTF-playground: refactor code Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 10:23:01 +02:00
Jesper Dangaard Brouer	f0755442ff	BTF-playground: Cannot get kernel BTF obj ID this way Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>	2022-08-31 09:59:16 +02:00
Toke Høiland-Jørgensen	ee0ed78ce8	lib/xdp-tools: Update to latest master This contains a fix to the xdp-tools configure script so it works with the Dash shell used on Debian and derivatives. Fixes #50. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-08-18 20:40:44 +02:00
Toke Høiland-Jørgensen	070c26233e	pkt-loop-filter: Fix byte order of debug-printed MAC addresses The trick with printing debug output as a u64 got it in the wrong byte order; fix that by swapping everything appropriately before printing. Also add some more information to the drop debug print. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-15 01:41:13 +02:00
Toke Høiland-Jørgensen	c24075e732	pkt-loop-filter: Rework IGMP and multicast handling There were a couple of issues with the IGMP and multicast handling: the packet parsing checked the MAC address for whether it was a multicast address before it looked at the IP header, which meant it never got to the IGMP packets (because they are also sent as multicast). Also, we need to redirect IGMP packets to the bond master on egress to make sure subscriptions work as they're supposed to. Fix the parsing, add the redirect, and also remove the explicit check for IGMP packets on ingress, as that will already be matched by the multicast check. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-15 01:28:11 +02:00
Toke Høiland-Jørgensen	cbc0bd6f40	pkt-loop-filter: Drop all packets found in lookup table This reverts commit `d3aaec4bdd` ("pkt-loop-filter: Check ifindex against state before dropping packets") - we should not accept packets that are looped back to the same port either. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-15 00:31:59 +02:00
Toke Høiland-Jørgensen	4089679c07	pkt-loop-filter: Fix gratuitous ARP handling The exception for gratuitous ARPs is only supposed to be for entries that would otherwise be dropped due to the loop filtering logic. In addition, we should record egress gratuitous ARPs and make sure they don't trigger the exception when looping back (this is 'rule 4' of the openvswitch SLB bonding logic). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-14 23:45:01 +02:00
Toke Høiland-Jørgensen	d3aaec4bdd	pkt-loop-filter: Check ifindex against state before dropping packets We were indiscriminately dropping packets when the map lookup succeeded, let's actually check the ifindex first. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-13 17:34:01 +02:00
Toke Høiland-Jørgensen	6745916e91	pkt-loop-filter: Allow incoming gratuitous ARPs We shouldn't be filtering incoming gratuitous ARPs based on the ifindex learning. So parse ARP packets and allow them through if they have identical source and destination IPs. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-13 17:27:12 +02:00
Toke Høiland-Jørgensen	846acc75e7	pkt-loop-filter: Keep running in background instead of foreground When pinning of the bpf_link fails, we keep running to keep the PID alive. However, staying in the foreground causes problems with scripts that expect the setup to finish running; so fork into the background instead and write a PID file so we can kill the running instance on unload. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-11 12:21:40 +02:00
Toke Høiland-Jørgensen	c833c5ad32	pkt-loop-filter: Unload after interruption in keep running fallback mode When running in the fallback mode where we keep running in the foreground to keep the kprobe alive, we should unload the cls_bpf programs after being interrupted instead of just exiting. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-08 14:49:31 +02:00
Toke Høiland-Jørgensen	53a9bbe4c4	pkt-loop-filter: Add fallback if we can't pin bpf_link for kprobe Support for bpf_link-based attaching of kprobes was added to kernel 5.15 with commit: b89fbfbb854c ("bpf: Implement minimal BPF perf link"). Prior to this, it is not possible to pin kprobe attachments in bpffs, which causes the pkt-loop-filter to fail. Add a fallback where we just keep running in the foreground to keep the probe alive if bpf_link pinning fails. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-08 14:46:53 +02:00
Toke Høiland-Jørgensen	0306ff7cca	pkt-loop-filter: Handle old type of net->net_cookie The type of the net->net_cookie field member was changed in kernel 5.12 with commit 3d368ab87cf6 ("net: initialize net->net_cookie at netns setup"). Older versions of the kernel define net->net_cookie as an atomic64_t instead of a u64. This causes CO-RE reading of the field to fail due to the type mismatch. Handle this by adding CO-RE checks for the old type as well and using the CO-RE facility to check for the right type at load time. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>	2022-07-08 14:46:14 +02:00
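
The TSval check described in commit 03e9245ae9 ("pping: ensure TSval is monotonically increasing") can be illustrated with a small user-space sketch. This is not the ePPing code: `struct flow_sketch`, `tsval_after()` and `should_timestamp()` are hypothetical names, and the sketch assumes the TSval has already been converted to host byte order. It only shows a "strictly higher, accounting for wrap-around" comparison on 32-bit values and how it treats the reordered ACBD sequence from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow state for illustration; not the real ePPing struct. */
struct flow_sketch {
	uint32_t last_tsval; /* highest TSval seen so far (host byte order) */
	bool has_tsval;      /* false until a first TSval has been recorded */
};

/*
 * True if a is strictly after b in 32-bit serial-number arithmetic.
 * The unsigned subtraction wraps modulo 2^32, so a value just past the
 * wrap-around point still counts as "after" a value just before it.
 */
static bool tsval_after(uint32_t a, uint32_t b)
{
	uint32_t diff = a - b;

	return diff != 0 && diff < UINT32_C(0x80000000);
}

/*
 * Decide whether a packet carrying tsval should be timestamped.
 * The last seen TSval is only updated when the new one is strictly higher,
 * so duplicated or reordered (older) TSvals are rejected.
 * Assumption: the very first TSval of a flow is accepted in this sketch.
 */
static bool should_timestamp(struct flow_sketch *f, uint32_t tsval)
{
	assert(f != NULL);

	if (f->has_tsval && !tsval_after(tsval, f->last_tsval))
		return false;

	f->last_tsval = tsval;
	f->has_tsval = true;
	return true;
}
```

With packets A and B carrying TSval=1 and C and D carrying TSval=2, the arrival order ACBD yields accept, accept, reject, reject in this sketch, whereas a plain inequality check against the last seen value would accept all four.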


566 Commits
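
Commit 6745916e91 ("pkt-loop-filter: Allow incoming gratuitous ARPs") describes recognising gratuitous ARPs by their identical source and destination IPs. A minimal user-space sketch of that test follows, assuming an Ethernet/IPv4 ARP payload laid out as in RFC 826. The function name `is_gratuitous_arp()` and the offset constants are hypothetical and do not come from the pkt-loop-filter source, which runs as a BPF program and parses packets differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within an Ethernet/IPv4 ARP payload (RFC 826 layout). */
enum {
	ARP_OFF_SPA = 14,  /* sender protocol (IPv4) address */
	ARP_OFF_TPA = 24,  /* target protocol (IPv4) address */
	ARP_IPV4_LEN = 28, /* total payload length for Ethernet/IPv4 */
};

/*
 * Returns true if the buffer holds an Ethernet/IPv4 ARP payload whose
 * sender and target IPv4 addresses are identical.
 * arp points at the ARP header, i.e. just past the Ethernet header.
 */
static bool is_gratuitous_arp(const uint8_t *arp, size_t len)
{
	assert(arp != NULL);

	/* Bounds check before touching any field. */
	if (len < ARP_IPV4_LEN)
		return false;

	/* Hardware type 1 (Ethernet), protocol type 0x0800 (IPv4). */
	if (arp[0] != 0x00 || arp[1] != 0x01 ||
	    arp[2] != 0x08 || arp[3] != 0x00)
		return false;

	/* Hardware address length 6, protocol address length 4. */
	if (arp[4] != 6 || arp[5] != 4)
		return false;

	return memcmp(arp + ARP_OFF_SPA, arp + ARP_OFF_TPA, 4) == 0;
}
```

The sketch deliberately ignores the ARP opcode, since a gratuitous ARP may be sent as either a request or a reply; whether the real filter distinguishes the two is not stated in the commit message.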