# PPing using XDP and TC-BPF

A re-implementation of [Kathie Nichols' passive ping
(pping)](https://github.com/pollere/pping) utility using XDP (on ingress) and
TC-BPF (on egress) for the packet capture logic.

## Simple description

Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
can be used on endhosts as well as on any (BPF-capable Linux) device which can
see both directions of the traffic (e.g. a router or middlebox). Currently it
only works for TCP traffic which uses the TCP timestamp option, but it could be
extended to also work with, for example, TCP seq/ACK numbers, the QUIC spin bit
and ICMP echo-reply messages. See the [TODO-list](./TODO.md) for more potential
features (which may or may not ever get implemented).

The fundamental logic of pping is to timestamp a pseudo-unique identifier for
outgoing packets, and then look for matches in the incoming packets. If a match
is found, the RTT is simply calculated as the time difference between the
current time and the stored timestamp.

This tool, just like Kathie's original pping implementation, uses TCP timestamps
as identifiers. For outgoing packets, the TSval (which is a timestamp in and of
itself) is timestamped. Incoming packets are then parsed for the TSecr, which
is the TSval echoed back by the receiver. The TCP timestamps are not
necessarily unique for every packet (they have a limited update frequency, which
appears to be 1000 Hz on modern Linux systems), so only the first instance of
an identifier is timestamped, and it is matched against the first incoming
packet with that identifier. The mechanism to ensure that only the first packet
is timestamped and matched differs from the one in Kathie's pping, and is
further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).

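
As a rough illustration of the matching logic described above, the following
Python sketch models a single flow in user space. It is not the BPF
implementation: the class and method names are hypothetical, and skipping a
TSval that equals the last one seen is only one simple way to timestamp just
the first instance (the actual mechanism is described in SAMPLING_DESIGN).

```python
class TimestampMatcher:
    """Toy single-flow model of TSval/TSecr matching (hypothetical, not the BPF code)."""

    def __init__(self):
        self.sent = {}          # TSval -> local time (ns) when it was first seen on egress
        self.last_tsval = None  # most recent TSval seen on egress

    def on_egress(self, tsval, now_ns):
        """Timestamp an outgoing TSval; return True if a new entry was created."""
        if tsval == self.last_tsval:
            return False  # same identifier as the previous packet: skip it
        self.last_tsval = tsval
        self.sent[tsval] = now_ns
        return True

    def on_ingress(self, tsecr, now_ns):
        """Return the RTT in ns if tsecr matches a stored TSval, else None."""
        # pop() deletes the entry, so only the first echo produces an RTT.
        sent_ns = self.sent.pop(tsecr, None)
        if sent_ns is None:
            return None
        return now_ns - sent_ns


matcher = TimestampMatcher()
matcher.on_egress(tsval=100, now_ns=1_000_000)   # first instance: timestamped
matcher.on_egress(tsval=100, now_ns=1_200_000)   # same TSval: skipped
print(matcher.on_ingress(tsecr=100, now_ns=6_000_000))  # 5000000 (ns, i.e. 5 ms)
```

A second incoming packet echoing the same TSval finds no entry and yields no
RTT, which mirrors the "first match only" behaviour described above.
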
## Design and technical description
### Files:

- **pping.c:** Userspace program that loads and attaches the BPF programs, polls
  the perf buffer `rtt_events` to print out RTT messages, and periodically
  cleans old entries out of the hash maps. It also passes user options to the
  BPF programs by setting a "global variable" (stored in the programs' `.rodata`
  section).
- **pping_kern.c:** Contains the BPF programs that are loaded on tc (egress) and
  XDP (ingress), as well as several common functions, a global constant `config`
  (set from userspace) and map definitions. The tc program `pping_egress()`
  parses outgoing packets for identifiers. If an identifier is found and the
  sampling strategy allows it, a timestamp for the packet is created in
  `packet_ts`. The XDP program `pping_ingress()` parses incoming packets for an
  identifier. If one is found, it looks up the `packet_ts` map for a match on
  the reverse flow (to match source/dest on egress). If there is a match, it
  calculates the RTT from the stored timestamp and deletes the entry. The
  calculated RTT (together with the flow-tuple) is pushed to the perf buffer
  `rtt_events`.
- **bpf_egress_loader.sh:** A shell script that is used by `pping.c` to set up a
  clsact qdisc and attach the `pping_egress()` program to egress using
  tc. **Note**: Unless your iproute2 comes with libbpf support, tc will use
  iproute's own loading mechanism when loading and attaching object files
  directly through the tc command line. To ensure that libbpf is always used to
  load `pping_egress()`, `pping.c` actually loads the program and pins it to
  `/sys/fs/bpf/pping/classifier`, and tc only attaches the pinned program.
- **functions.sh and parameters.sh:** Imported by `bpf_egress_loader.sh`.
- **pping.h:** Common header file included by `pping.c` and
  `pping_kern.c`. Contains some common structs used by both (these are part of
  the maps).

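
The interplay between the egress and ingress programs described for
`pping_kern.c` can be modelled in a few lines of user-space Python. This is
only an illustration, not the BPF code: `FlowTuple`, `reverse_flow`,
`model_egress` and `model_ingress` are hypothetical names, a plain dict and
list stand in for the `packet_ts` map and the `rtt_events` perf buffer, and the
sampling check is left out. Reporting the ingress flow-tuple with the RTT is
also an assumption.

```python
from collections import namedtuple

# Hypothetical flow key; the real struct layout lives in pping.h.
FlowTuple = namedtuple("FlowTuple", ["saddr", "sport", "daddr", "dport"])

packet_ts = {}   # stands in for the packet_ts hash map: (flow, identifier) -> ns
rtt_events = []  # stands in for the rtt_events perf buffer


def reverse_flow(flow):
    """Swap source and destination so an ingress packet maps onto the egress key."""
    return FlowTuple(flow.daddr, flow.dport, flow.saddr, flow.sport)


def model_egress(flow, identifier, now_ns):
    """Record when an identifier first left on a flow (sampling check omitted)."""
    packet_ts.setdefault((flow, identifier), now_ns)


def model_ingress(flow, identifier, now_ns):
    """Match an incoming identifier against the reverse flow and emit the RTT."""
    # pop() performs the lookup and the delete in one step.
    sent_ns = packet_ts.pop((reverse_flow(flow), identifier), None)
    if sent_ns is not None:
        rtt_events.append((flow, now_ns - sent_ns))
```

For example, calling `model_egress(FlowTuple("10.0.0.1", 5000, "10.0.0.2", 80), 42, 1_000)`
followed by `model_ingress(FlowTuple("10.0.0.2", 80, "10.0.0.1", 5000), 42, 4_000)`
appends an RTT of 3000 ns to `rtt_events` and leaves `packet_ts` empty.
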
### BPF Maps:

- **flow_state:** A hash map storing some basic state for each flow, such as the
  last seen identifier for the flow and when the last timestamp entry for the
  flow was created. Entries are created by `pping_egress()`, and can be updated
  or deleted by both `pping_egress()` and `pping_ingress()`. Leftover entries
  are eventually removed by `pping.c`. Pinned at `/sys/fs/bpf/pping`.
- **packet_ts:** A hash map storing a timestamp for a specific packet
  identifier. Entries are created by `pping_egress()` and removed by
  `pping_ingress()` if a match is found. Leftover entries are eventually
  removed by `pping.c`. Pinned at `/sys/fs/bpf/pping`.
- **rtt_events:** A perf buffer used by `pping_ingress()` to push calculated RTTs
  to `pping.c`, which continuously polls the map to print out the RTTs.

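
The periodic removal of leftover entries could look roughly like the sketch
below. It is a user-space Python model rather than the code in `pping.c`:
`clean_stale` is a hypothetical helper, a dict mapping keys to creation times
stands in for a map such as `packet_ts`, and the age threshold is an assumed
parameter (the real timeout is not stated here).

```python
def clean_stale(entries, now_ns, max_age_ns):
    """Delete entries older than max_age_ns and return how many were removed.

    entries maps an arbitrary key to the time (ns) the entry was created.
    """
    # Collect the keys first: a dict must not change size while being iterated.
    stale = [key for key, created_ns in entries.items()
             if now_ns - created_ns > max_age_ns]
    for key in stale:
        del entries[key]
    return len(stale)
```

Running this on a timer keeps the map from filling up with timestamps whose
replies never arrived, for example because the packet or its ACK was lost.
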
## Similar projects

Passively measuring the RTT for TCP traffic is not a novel concept, and there
exist a number of other tools that can do so. A good overview of how passive
RTT calculation using TCP timestamps (as in this project) works is provided in
[this paper](https://doi.org/10.1145/2523426.2539132) from 2013.

- [pping](https://github.com/pollere/pping): This project is largely a
  re-implementation of Kathie's pping, but by using BPF and XDP as well as
  implementing some filtering logic, the hope is to create an always-on
  tool that scales well even to a large number of massive flows.
- [ppviz](https://github.com/pollere/ppviz): Web-based visualization tool for
  the "machine-friendly" output from Kathie's pping tool. If/when we implement a
  similar machine-readable output option, it should hopefully work with this
  implementation as well.
- [tcptrace](https://github.com/blitz/tcptrace): A post-processing tool which
  can analyze a tcpdump file and, among other things, calculate RTTs based on
  seq/ACK numbers (`-r` or `-R` flag).
- **Dapper**: A passive TCP data plane monitoring tool implemented in P4 which
  can, among other things, calculate the RTT based on matching seq/ACK
  numbers. [Paper](https://doi.org/10.1145/3050220.3050228). [Unofficial
  source](https://github.com/muhe1991/p4-programs-survey/tree/master/dapper).
- [P4 Tofino TCP RTT measurement](https://github.com/Princeton-Cabernet/p4-projects/tree/master/RTT-tofino):
  A passive TCP RTT monitor based on seq/ACK numbers implemented in P4 for
  Tofino programmable switches. [Paper](https://doi.org/10.1145/3405669.3405823).