Controlling TC qdisc TXQ selection via BPF
Use-case
As a policy, we don't want any traffic generated by the Linux networking stack
to use transmit queue zero.
This use-case is connected with AF_XDP. The example ../AF_XDP-interaction/
sends important Real-Time traffic on XDP-socket queue zero. Some HW and NIC
drivers (e.g. igb and igc) don't have enough hardware TX-queues to allocate
separate queues for XDP. Thus, these queues are shared between XDP and the
network stack, which can cause both lock contention and HW queue usage
contention.
Example
The BPF code in this example is rather simple:
- See: tc_txq_policy_kern.c
This BPF program is meant to be loaded in the TC egress hook.
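As a rough illustration of what such a TC egress program can look like, here is
a minimal sketch. It is not the contents of tc_txq_policy_kern.c: the function
name not_txq_zero_sketch and the value FALLBACK_QUEUE_MAPPING are hypothetical,
and SEC is defined locally (it normally comes from libbpf's bpf_helpers.h). The
sketch assumes a kernel that lets TC programs write skb->queue_mapping and a
libbpf that recognizes the "tc" section name.

```c
#include <linux/bpf.h>      /* struct __sk_buff */
#include <linux/pkt_cls.h>  /* TC_ACT_OK */

/* Normally provided by <bpf/bpf_helpers.h>; defined here only to keep the
 * sketch self-contained. */
#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

/* Hypothetical choice: queue_mapping follows the RX-recorded convention
 * (queue index + 1), so the value 2 corresponds to TXQ 1. */
#define FALLBACK_QUEUE_MAPPING 2

SEC("tc")
int not_txq_zero_sketch(struct __sk_buff *skb)
{
	/* 0 means "no queue recorded" (the stack picks one itself, possibly
	 * TXQ 0) and 1 means TXQ 0. Steer both cases away from TXQ 0. */
	if (skb->queue_mapping <= 1)
		skb->queue_mapping = FALLBACK_QUEUE_MAPPING;

	return TC_ACT_OK; /* let the packet continue down the TX path */
}

char _license[] SEC("license") = "GPL";
```

Any other queue_mapping value is left untouched, so traffic that already
targets a non-zero TXQ is not moved.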
TC-BPF loader
The tc cmdline tool is notoriously difficult to use, and has issues (mounting
the BPF file-system) on Yocto builds.
Thus, tc_txq_policy.c contains a C-code loader that attaches the BPF-prog to
the TC-hook without depending on the tc command util. Furthermore, the loader
uses the bpftool skeleton feature (to generate a header file), which allows
creating a binary that contains the BPF-object itself, making it self-contained.
Gotchas: XPS
For the TXQ (queue_mapping) overwrite to work, you need to disable XPS
(Transmit Packet Steering), as XPS will have higher precedence than our BPF
change to queue_mapping. This is done by writing 0 into each tx-queue file
/sys/class/net/DEV/queues/tx-*/xps_cpus.
A script for configuring and disabling XPS is provided here: xps_setup_ash.sh.
Script command line to disable XPS:
sudo ./xps_setup_ash.sh --dev DEVICE --default --disable
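For illustration only, the "write 0 into each xps_cpus file" step can be
sketched in C as below. This is not how xps_setup_ash.sh is implemented; the
helper name xps_write_zero is hypothetical, and the caller is assumed to pass a
glob pattern such as /sys/class/net/eth1/queues/tx-*/xps_cpus and to run as
root.

```c
#include <glob.h>
#include <stdio.h>

/*
 * Write "0" into every file matching `pattern`.
 * Returns the number of files written, 0 if nothing matched,
 * or -1 if the glob or any write failed.
 */
int xps_write_zero(const char *pattern)
{
	glob_t g;
	int written = 0;
	int rc = glob(pattern, 0, NULL, &g);

	if (rc == GLOB_NOMATCH)
		return 0;
	if (rc != 0)
		return -1;

	for (size_t i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");

		if (!f) {
			written = -1;
			break;
		}
		/* An all-zero CPU mask disables XPS for this TX queue. */
		if (fputs("0\n", f) == EOF) {
			fclose(f);
			written = -1;
			break;
		}
		/* The write may only be reported as failed at close time. */
		if (fclose(f) != 0) {
			written = -1;
			break;
		}
		written++;
	}

	globfree(&g);
	return written;
}
```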
Different ways to view queue_mapping
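Notice that the queue_mapping set in the BPF-prog is treated like an
RX-recorded queue number (skb_rx_queue_recorded). When reaching the TX-layer it
will have been decremented by one (by skb_get_rx_queue()) at the TX netstack
processing stage (in __dev_queue_xmit()). In other words, a queue_mapping of N
set by the BPF-prog shows up as N-1 when inspected at the TX-layer.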
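The off-by-one can be modeled in plain user-space C. The sketch below mirrors
the kernel convention where a recorded queue is stored as index plus one and
zero means "nothing recorded". The struct and the model_-prefixed functions are
hypothetical stand-ins for illustration, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the one sk_buff field of interest. */
struct skb_model {
	uint16_t queue_mapping;
};

/* Models skb_record_rx_queue(): store the queue index plus one. */
static inline void model_record_rx_queue(struct skb_model *skb, uint16_t rxq)
{
	skb->queue_mapping = (uint16_t)(rxq + 1);
}

/* Models skb_rx_queue_recorded(): zero means nothing was recorded. */
static inline bool model_rx_queue_recorded(const struct skb_model *skb)
{
	return skb->queue_mapping != 0;
}

/* Models skb_get_rx_queue(): undo the plus-one. */
static inline uint16_t model_get_rx_queue(const struct skb_model *skb)
{
	return (uint16_t)(skb->queue_mapping - 1);
}

/*
 * Queue index implied by a queue_mapping value written from BPF,
 * or -1 when the value is 0 and the stack has to pick a queue itself.
 */
static inline int model_txq_from_queue_mapping(uint16_t queue_mapping)
{
	struct skb_model skb = { .queue_mapping = queue_mapping };

	if (!model_rx_queue_recorded(&skb))
		return -1;
	return model_get_rx_queue(&skb);
}
```

With this model, a BPF-written value of 2 maps to queue index 1, and a value of
1 maps to queue index 0, which is the queue this policy wants to avoid.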
perf probe
The perf tool can be used for recording and inspecting skb->queue_mapping.
Remember: the BPF-prog queue_mapping setting has been decremented by one at
this TX netstack processing stage.
perf probe -a 'dev_hard_start_xmit skb->dev->name:string skb->queue_mapping skb->hash'
Added new event:
probe:dev_hard_start_xmit (on dev_hard_start_xmit with name=skb->dev->name:string queue_mapping=skb->queue_mapping hash=skb->hash)
You can now use it in all perf tools, such as:
perf record -e probe:dev_hard_start_xmit -aR sleep 1
Afterwards run perf script and see results.
bpftrace
It is also possible to monitor TXQ usage via a bpftrace script.
- See: monitor_txq_usage.bt
The main part of the script is:
tracepoint:net:net_dev_start_xmit {
    $qm = args->queue_mapping;
    $dev = str(args->name, 15);
    @stat_txq_usage[$dev] = lhist($qm, 0,32,1);
}
Or as a one-liner:
bpftrace -e 't:net:net_dev_start_xmit {@txq[str(args->name, 15)]=lhist(args->queue_mapping, 0,32,1)}'
Inspecting loaded BPF
How do you see if these BPF TC-hook programs are loaded?
bpftool
The cmdline bpftool net can list any network related BPF program:
root@main-ctrl2:~ # bpftool net
xdp:
eth1(5) driver id 59

tc:
eth1(5) clsact/egress not_txq_zero:[17] id 17
There we see both the XDP BPF-program used by AF_XDP to redirect frames, and the TC hook BPF-prog loaded and attached.
tc egress
The tc command needs to be longer and more explicit:
root@main-ctrl2:~ # tc filter show dev eth1 egress
filter protocol all pref 49199 bpf chain 0
filter protocol all pref 49199 bpf chain 0 handle 0x1 not_txq_zero:[17] direct-action not_in_hw id 17 tag a761e11074b78959 jited