How to transfer info from XDP-prog to AF_XDP
- XDP-hints via local BTF info
- Why is XDP RX-timestamp essential for AF_XDP
- AF_XDP documentation
- Example bind to all queues
This BPF-example shows how to use BTF to create a communication channel between an XDP BPF-prog (running kernel-side) and an AF_XDP user-space process, via XDP-hints in the metadata area.
XDP-hints via local BTF info
XDP-hints have been discussed as a facility where NIC drivers provide information in the metadata area (located just before the packet header starts). There has been little progress on kernel drivers adding XDP-hints, as end-users are unsure how to decode and consume the information (a chicken-and-egg problem).
In this example, we let the BPF-object file define and contain the BTF-info about the XDP-hints data-structures. Thus, no kernel or driver changes are needed, as the BTF type-definitions are locally defined. The XDP-hints are used as the communication channel between the XDP BPF-prog and the AF_XDP userspace program.
The API for decoding the BTF data-structures has been added to separate files (lib_xsk_extend.c and lib_xsk_extend.h) to make this reusable for other projects, with the goal of getting this included in libbpf or libxdp. The API takes a `struct btf` pointer as input argument when searching for a struct name. This BTF pointer is obtained from opening the BPF-object ELF file, but it could also come from the kernel (e.g. via `btf__load_vmlinux_btf()`) and even from a kernel module (e.g. via `btf__load_module_btf()`). See the btf_module_read example for how to do this.
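For reference, a minimal sketch (standard libbpf calls only; the lib_xsk_extend API itself is not shown) of the three ways to obtain such a `struct btf` pointer. The function name and the "igc" module name are illustrative, not from this example's code:

```c
#include <bpf/btf.h>
#include <bpf/libbpf.h>

/* Sketch: three sources for the 'struct btf' pointer the decoding
 * API takes as input. All functions below are standard libbpf. */
static struct btf *get_local_btf(struct bpf_object *obj)
{
	/* 1. From the opened BPF-object ELF file (local BTF) */
	struct btf *btf_local = bpf_object__btf(obj);

	/* 2. From the running kernel's vmlinux BTF */
	struct btf *btf_vmlinux = btf__load_vmlinux_btf();

	/* 3. From a kernel module (module name "igc" used as example) */
	struct btf *btf_module = btf__load_module_btf("igc", btf_vmlinux);

	btf__free(btf_module);
	btf__free(btf_vmlinux);
	return btf_local;
}
```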
The requirement for being a valid XDP-hints data-struct is that the last member in the struct is named `btf_id` and has a size of 4 bytes (32-bit). See the C code example below. This `btf_id` member is used for identifying which struct has been put into the metadata area. The kernel-side BPF-prog stores the `btf_id` by using the API `bpf_core_type_id_local()` to obtain the ID. The userspace API reads the `btf_id` by reading -4 bytes from the packet-header start, and can check the ID against the IDs that were available via the `struct btf` pointer.
```c
struct xdp_hints_rx_time {
	__u64 rx_ktime;
	__u32 btf_id;
} __attribute__((aligned(4))) __attribute__((packed));
```
The location as the last member is chosen because the metadata area, located just before the packet header starts, can only grow "backwards" (via the BPF-helper `bpf_xdp_adjust_meta()`). To store a larger struct, the metadata is grown by a larger negative offset. The BTF type-information knows the size (and member offsets) of all the data-structures. Thus, we can deduce the size of the metadata area once we know the `btf_id`, which, by being placed as the last member, is at a known location.
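To make the mechanism concrete, here is a minimal kernel-side sketch (hypothetical program name; error handling kept short) of how an XDP-prog could grow the metadata area, store the struct above, and tag it with its local BTF ID:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct xdp_hints_rx_time {
	__u64 rx_ktime;
	__u32 btf_id;
} __attribute__((aligned(4))) __attribute__((packed));

SEC("xdp")
int xdp_store_hints(struct xdp_md *ctx)
{
	struct xdp_hints_rx_time *meta;
	void *data;

	/* Grow metadata area "backwards" to make room for the struct */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_ABORTED;

	data = (void *)(unsigned long)ctx->data;
	meta = (void *)(unsigned long)ctx->data_meta;
	if ((void *)(meta + 1) > data) /* Bounds check for the verifier */
		return XDP_ABORTED;

	meta->rx_ktime = bpf_ktime_get_ns();
	/* Local BTF type ID identifies which struct sits in metadata */
	meta->btf_id = bpf_core_type_id_local(struct xdp_hints_rx_time);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```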
Why is XDP RX-timestamp essential for AF_XDP
In this example, the kernel-side XDP BPF-prog (file: af_xdp_kern.c) takes a timestamp (`bpf_ktime_get_ns()`) and stores it in the metadata as an XDP-hint. This makes it possible to measure the time-delay from XDP softirq execution to when AF_XDP gets the packet out of its RX-ring. (See Interesting data-points below, and the userspace sketch that follows.)
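A userspace sketch of that measurement, assuming a hypothetical helper that gets handed `pkt`, a pointer to the packet-header start inside the UMEM frame (`bpf_ktime_get_ns()` uses CLOCK_MONOTONIC, so we compare against that clock):

```c
#include <time.h>
#include <linux/types.h>

struct xdp_hints_rx_time { /* Must match the kernel-side BTF definition */
	__u64 rx_ktime;
	__u32 btf_id;
} __attribute__((aligned(4))) __attribute__((packed));

/* 'pkt' points at the packet-header start inside the UMEM frame;
 * the metadata area with the XDP-hints struct sits just before it. */
static __u64 time_delay_ns(void *pkt, __u32 expected_btf_id)
{
	struct xdp_hints_rx_time *meta =
		(void *)((char *)pkt - sizeof(*meta));
	struct timespec ts;

	if (meta->btf_id != expected_btf_id)
		return 0; /* Unknown XDP-hints layout, skip */

	/* bpf_ktime_get_ns() is CLOCK_MONOTONIC time */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec * 1000000000ULL + ts.tv_nsec) - meta->rx_ktime;
}
```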
The real value for the application use-case (in question) is that it doesn't need to waste CPU time spinning to get accurate timestamps for packet arrival. The application only needs timestamps on the synchronization traffic (PCF frames). The other Time-triggered traffic arrives at a deterministic time (according to the established time schedule based on PCF). The application prefers to bulk-receive the Time-triggered traffic, which can be achieved by waking up at the right time (according to the time-schedule). Thus, it would be wasteful to busy-poll with the only purpose of getting better timing accuracy for the PCF frames.
Interesting data-points
The time-delay from XDP softirq execution to when AF_XDP gets the packet gives us some interesting data-points, and tells us about system latency and sleep behavior.
The data-points are interesting, as there is (obviously) a big difference between waiting for a wakeup (via `poll` or `select`) and using the spin-mode, and between userspace running on the same or a different CPU core; the CPU sleep-state modes and RT-patched kernels also matter.
| Driver/HW | Test | core | time-delay avg | min | max | System |
|---|---|---|---|---|---|---|
| igc/i225 | spin | same | 1575 ns | 849 ns | 2123 ns | A |
| igc/i225 | spin | remote | 2639 ns | 2337 ns | 4019 ns | A |
| igc/i225 | wakeup | same | 22881 ns | 21190 ns | 30619 ns | A |
| igc/i225 | wakeup | remote | 50353 ns | 47420 ns | 56156 ns | A |
| conf upd | no C-states | | | | | B |
| igc/i225 | spin | same | 1402 ns | 805 ns | 2867 ns | B |
| igc/i225 | spin | remote | 1056 ns | 419 ns | 2798 ns | B |
| igc/i225 | wakeup | same | 3177 ns | 2210 ns | 9136 ns | B |
| igc/i225 | wakeup | remote | 4095 ns | 3029 ns | 10595 ns | B |
The latency is affected a lot by the CPU's power-saving states, which can be limited globally by changing `/dev/cpu_dma_latency` (see section below). The main difference between system A and B is that 'cpu_dma_latency' has been changed to such a low value that the CPU doesn't use C-states. (Side-note: the tool `tuned-adm profile latency-performance` was used, thus other tunings might also have happened.)
System RT1 has a Real-Time patched kernel, and `cpu_dma_latency` has no effect (likely due to kernel config).
| Driver/HW | Test | core | time-delay avg | min | max | System |
|---|---|---|---|---|---|---|
| igb/i210 | spin | same | 2577 ns | 2129 ns | 4155 ns | RT1 |
| igb/i210 | spin | remote | 788 ns | 551 ns | 1473 ns | RT1 |
| igb/i210 | wakeup | same | 6209 ns | 5644 ns | 8178 ns | RT1 |
| igb/i210 | wakeup | remote | 5239 ns | 4463 ns | 7390 ns | RT1 |
Systems table:
| Name | CPU @ GHz | Kernel | Kernel options | cpu_dma_latency |
|---|---|---|---|---|
| A | E5-1650 v4 @ 3.60GHz | 5.15.0-net-next | PREEMPT | 2000000000 (kernel default) |
| B | E5-1650 v4 @ 3.60GHz | 5.15.0-net-next | PREEMPT | 2 |
| RT1 | i5-6500TE @ 2.30GHz | 5.13.0-rt1+ | PREEMPT_RT | default, but no effect |
C-states wakeup time
It is possible to view the system's wakeup time (in usec) from each C-state, via the `grep` command below:
```
# grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
/sys/devices/system/cpu/cpu0/cpuidle/state3/latency:40
/sys/devices/system/cpu/cpu0/cpuidle/state4/latency:133
```
Meaning of cpu_dma_latency
The global CPU latency limit is controlled via the file `/dev/cpu_dma_latency`, which contains a binary value (interpreted as a signed 32-bit integer). Reading the contents can be awkward from the command line, so let's provide a practical example:
Reading `/dev/cpu_dma_latency`:

```
$ sudo hexdump --format '"%d\n"' /dev/cpu_dma_latency
2000000000
```
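For completeness, a sketch of how a program could set this limit itself (a hypothetical stand-alone example; per the kernel PM QoS interface the value is in microseconds, and the limit is honored only while the file descriptor stays open):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t latency_us = 0; /* 0 requests no C-state transitions */
	int fd = open("/dev/cpu_dma_latency", O_WRONLY);

	if (fd < 0) {
		perror("open /dev/cpu_dma_latency");
		return 1;
	}
	if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us)) {
		perror("write");
		return 1;
	}
	pause(); /* The limit reverts as soon as the fd is closed */
	return 0;
}
```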
AF_XDP documentation
When developing your AF_XDP application, we recommend familiarising yourself with the core AF_XDP concepts by reading the kernel documentation for AF_XDP. XDP-tools also contains libxdp documentation for AF_XDP, explaining how to use the API and the difference between the control-path and data-path APIs.
It is particularly important to understand the four different ring-queues, which are all Single-Producer Single-Consumer (SPSC) ring-queues. A set of these four queues is needed for each queue on the network device (netdev), as the sketch below illustrates.
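A sketch of how those four rings hang together per socket, using the `xsk.h` control-path API (from libxdp, formerly in libbpf); names like `struct xsk_queue` and `setup_xsk()` are illustrative, not from this example's code:

```c
#include <stdlib.h>
#include <xdp/xsk.h> /* libxdp; older setups use <bpf/xsk.h> */

#define NUM_FRAMES 4096
#define FRAME_SIZE 4096 /* One full memory-page per frame */

/* The four SPSC rings needed per netdev queue */
struct xsk_queue {
	struct xsk_ring_prod fill_q; /* userspace -> kernel: empty frames for RX */
	struct xsk_ring_cons comp_q; /* kernel -> userspace: completed TX frames */
	struct xsk_ring_cons rx_q;   /* kernel -> userspace: received packets */
	struct xsk_ring_prod tx_q;   /* userspace -> kernel: packets to transmit */
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
};

static int setup_xsk(struct xsk_queue *q, const char *ifname, __u32 queue_id)
{
	void *buffer;

	/* UMEM: preallocated, page-aligned packet memory shared with kernel */
	if (posix_memalign(&buffer, 4096, NUM_FRAMES * FRAME_SIZE))
		return -1;
	if (xsk_umem__create(&q->umem, buffer, NUM_FRAMES * FRAME_SIZE,
			     &q->fill_q, &q->comp_q, NULL))
		return -1;
	/* One AF_XDP socket binds to exactly one queue ID on the netdev */
	return xsk_socket__create(&q->xsk, ifname, queue_id, q->umem,
				  &q->rx_q, &q->tx_q, NULL);
}
```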
Example bind to all queues
Usually, AF_XDP examples make a point of forcing the end-user to select a specific queue or channel ID, to show that an AF_XDP socket operates on a single queue ID.
In this example, the default behavior is to set up AF_XDP sockets for ALL configured queues/channels available, and "listen" for packets on all of the queues. This way we can avoid setting up hardware filters or reducing the number of channels to 1 (a popular workaround).
This also means memory consumption increases as the NIC has more queues available. For AF_XDP, all the "UMEM" memory is preallocated by userspace and registered with the kernel; AF_XDP trades memory for speedup. Each frame is a full memory-page of 4K (4096 bytes). For each channel/queue ID the program allocates 4096 frames, which takes up 16 MiB (4096 × 4096 bytes) of memory per channel, as the sketch below illustrates.
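Continuing the hypothetical sketch above, binding to all queues is then just a loop (the queue count would come from the netdev's channel configuration, e.g. via `ethtool -l`):

```c
/* Bind one AF_XDP socket, and its four rings, per configured channel.
 * Each iteration preallocates a 16 MiB UMEM (4096 frames x 4096 bytes). */
static int setup_all_queues(struct xsk_queue *queues, const char *ifname,
			    unsigned int n_queues)
{
	for (__u32 q = 0; q < n_queues; q++) {
		if (setup_xsk(&queues[q], ifname, q))
			return -1;
	}
	return 0;
}
```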