#+Title: Controlling TC qdisc TXQ selection via BPF

* Use-case

As a policy, we don't want any traffic generated by the Linux networking stack
to use transmit queue *zero*.

This use-case is connected with =AF_XDP=. The example
[[file:../AF_XDP-interaction/]] sends important Real-Time traffic on XDP-socket
queue zero. Some HW and NIC drivers (e.g. igb and igc) don't have enough
hardware TX-queues to allocate separate queues for XDP. Thus, these queues are
shared between XDP and the network stack, which creates potential lock
contention and also contention for the HW queue itself.

* Example

The BPF code in this example is rather simple:
- See: [[file:tc_txq_policy_kern.c]]

This BPF program is meant to be loaded in the TC *egress* hook.

** TC-BPF loader

The =tc= cmdline tool is notoriously difficult to use, and has issues (mounting
the BPF file-system) on Yocto builds.

Thus, [[file:tc_txq_policy.c]] contains a C-code loader that attaches the BPF-prog to
the TC-hook without depending on the =tc= command util. Furthermore, the loader
uses the =bpftool= skeleton feature (to generate a header file), which allows
creating a binary that contains the BPF-object itself, making it self-contained.

* Gotchas: XPS

For the TXQ (=queue_mapping=) overwrite to work, you need to *disable* XPS (Transmit
Packet Steering), as XPS takes precedence over our BPF change to
=queue_mapping=. This is done by writing 0 into the =xps_cpus= file of each
tx-queue, i.e. =/sys/class/net/DEV/queues/tx-*/xps_cpus=.

A script for configuring and disabling XPS is provided here: [[file:xps_setup_ash.sh]].

Script command line to disable XPS:
#+begin_src sh
sudo ./xps_setup_ash.sh --dev DEVICE --default --disable
#+end_src
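
For reference, the core of what disabling XPS amounts to can be sketched as a
small shell function. This is only a minimal sketch and not the contents of
=xps_setup_ash.sh=. The function name =disable_xps= and the =SYSFS_NET=
override variable are made up for this illustration; the override only exists
so the loop can be tried against a scratch directory instead of the real sysfs.

#+begin_src sh
#!/bin/sh
# Sketch: write 0 into the xps_cpus file of every TX queue of a device.
# SYSFS_NET is a hypothetical override (not a kernel or tool setting);
# it defaults to the real sysfs location.
disable_xps() {
    dev="$1"
    sysfs_net="${SYSFS_NET:-/sys/class/net}"

    for f in "$sysfs_net/$dev"/queues/tx-*/xps_cpus; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        echo 0 > "$f" || return 1
    done
}
#+end_src

After sourcing the function in a root shell, =disable_xps eth1= clears the XPS
CPU mask on all TX queues of =eth1=.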

* Different ways to view queue_mapping

Notice that the =queue_mapping= set in the BPF-prog is like the RX-recorded number
(=skb_rx_queue_recorded=). When reaching the TX-layer it will have been decremented
by one (by =skb_get_rx_queue()=) at the TX netstack processing stage (in
=__dev_queue_xmit()=). For example, setting =queue_mapping= to 3 in the BPF-prog
results in TXQ 2 being used.

** perf probe

The perf tool can be used for recording and inspecting =skb->queue_mapping=.

Remember: the BPF-prog =queue_mapping= setting has been decremented by one at this
TX netstack processing stage.

#+begin_src sh
perf probe -a 'dev_hard_start_xmit skb->dev->name:string skb->queue_mapping skb->hash'
Added new event:
probe:dev_hard_start_xmit (on dev_hard_start_xmit with name=skb->dev->name:string queue_mapping=skb->queue_mapping hash=skb->hash)

You can now use it in all perf tools, such as:

perf record -e probe:dev_hard_start_xmit -aR sleep 1
#+end_src

Afterwards, run =perf script= to see the results.
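
To get a quick overview instead of reading every recorded event, the
=perf script= output can be condensed into per-device, per-queue counts. The
following is a rough sketch: the helper name =summarize_txq= is invented here,
and it assumes that each probe line carries the fields as =name="..."= and
=queue_mapping=...= tokens, which may differ between perf versions.

#+begin_src sh
#!/bin/sh
# Sketch: count events per (device, queue_mapping) from 'perf script'
# output read on stdin. Assumes whitespace-separated name=... and
# queue_mapping=... tokens on each probe line (an assumption about the
# perf output format). Lines without both tokens are ignored.
summarize_txq() {
    awk '
    {
        name = ""; qm = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^name=/) {
                name = $i
                sub(/^name=/, "", name)
                gsub(/"/, "", name)
            }
            if ($i ~ /^queue_mapping=/) {
                qm = $i
                sub(/^queue_mapping=/, "", qm)
            }
        }
        if (name != "" && qm != "")
            count[name " " qm]++
    }
    END { for (k in count) print k, count[k] }' | sort
}
#+end_src

Usage: =perf script | summarize_txq=. Each output line holds the device name,
the =queue_mapping= value as printed by perf, and the number of events seen.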

** bpftrace

It is also possible to monitor TXQ usage via a =bpftrace= script.
- See: [[file:monitor_txq_usage.bt]].

The main part of the script is:
#+begin_src sh
tracepoint:net:net_dev_start_xmit {
    $qm = args->queue_mapping;
    $dev = str(args->name, 15);

    @stat_txq_usage[$dev] = lhist($qm, 0,32,1);
}
#+end_src

Or as a oneliner:
#+begin_src sh
bpftrace -e 't:net:net_dev_start_xmit {@txq[str(args->name, 15)]=lhist(args->queue_mapping, 0,32,1)}'
#+end_src

* Inspecting loaded BPF

How do you see if these BPF TC-hook programs are loaded?

** bpftool

The cmdline =bpftool net= can list any network related BPF program:

#+begin_example
root@main-ctrl2:~ # bpftool net
xdp:
eth1(5) driver id 59

tc:
eth1(5) clsact/egress not_txq_zero:[17] id 17
#+end_example

There we see both the *XDP* BPF-program used by AF_XDP to redirect frames, and
the *TC* hook BPF-prog loaded and attached.
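
For scripted checks (e.g. at boot), the same listing can be tested
automatically. The sketch below assumes the =bpftool net= output layout shown
above, which may vary between bpftool versions; the function name
=tc_egress_prog_listed= is made up for this illustration.

#+begin_src sh
#!/bin/sh
# Sketch: succeed if 'bpftool net' output on stdin lists program PROG on
# the clsact/egress hook of device DEV. Assumes the layout shown in the
# example above: a "tc:" section header followed by lines of the form
# "DEV(ifindex) clsact/egress PROG:[id] id N".
tc_egress_prog_listed() {
    dev="$1"
    prog="$2"
    awk -v dev="$dev" -v prog="$prog" '
        /^tc:/             { in_tc = 1; next }   # entering the tc section
        /^[a-z_]+:[ \t]*$/ { in_tc = 0 }         # any other section header
        in_tc && index($1, dev "(") == 1 && $2 == "clsact/egress" &&
            index($3, prog ":") == 1 { found = 1 }
        END { exit(found ? 0 : 1) }'
}
#+end_src

Usage: =bpftool net show dev eth1 | tc_egress_prog_listed eth1 not_txq_zero && echo attached=.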

** tc egress

The tc command needs to be longer and more explicit:
#+begin_example
root@main-ctrl2:~ # tc filter show dev eth1 egress
filter protocol all pref 49199 bpf chain 0
filter protocol all pref 49199 bpf chain 0 handle 0x1 not_txq_zero:[17] direct-action not_in_hw id 17 tag a761e11074b78959 jited
#+end_example