# PPing using XDP and TC-BPF

A re-implementation of [Kathie Nichols' passive ping
(pping)](https://github.com/pollere/pping) utility using XDP (on ingress) and
TC-BPF (on egress) for the packet capture logic.

## Simple description

Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
can be used on endhosts as well as any (BPF-capable Linux) device which can see
both directions of the traffic (e.g. a router or middlebox). Currently it works
for TCP traffic which uses the TCP timestamp option and for ICMP echo messages,
but could be extended to also work with, for example, TCP seq/ACK numbers, the
QUIC spin bit and DNS queries. See the [TODO-list](./TODO.md) for more potential
features (which may or may not ever get implemented).

The fundamental logic of pping is to timestamp a pseudo-unique identifier for
packets, and then look for matches in the reply packets. If a match is found,
the RTT is simply calculated as the time difference between the current time and
the stored timestamp.

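As a rough illustration of this logic, the following Python sketch models the
timestamp store as a plain dictionary. It is not the BPF implementation: the
`RttTracker` class, its method names and the way a flow is represented are
hypothetical and chosen only for this example.

```python
class RttTracker:
    """Toy model of the timestamp-and-match logic (not the BPF code)."""

    def __init__(self):
        # (flow, identifier) -> time in ns when the identifier was first seen
        self.packet_ts = {}

    def timestamp(self, flow, identifier, now_ns):
        """Store a timestamp for the first packet carrying this identifier."""
        self.packet_ts.setdefault((flow, identifier), now_ns)

    def match(self, reply_flow, reply_identifier, now_ns):
        """Return the RTT in ns if the reply matches a stored entry, else None."""
        # The reply travels in the reverse direction, so look up the reversed flow.
        src, dst = reply_flow
        stored = self.packet_ts.pop(((dst, src), reply_identifier), None)
        if stored is None:
            return None
        return now_ns - stored


tracker = RttTracker()
flow = (("10.11.1.2", 59528), ("10.11.1.1", 5201))
tracker.timestamp(flow, identifier=1000, now_ns=1_000_000)

reply_flow = (flow[1], flow[0])
rtt = tracker.match(reply_flow, reply_identifier=1000, now_ns=6_425_439)
print(rtt)  # 5425439
```

Because the entry is removed when it is matched, a second reply carrying the
same identifier yields no RTT in this sketch.
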
This tool, just like Kathie's original pping implementation, uses TCP timestamps
as identifiers for TCP traffic. The TSval (which is a timestamp in and of
itself) is used as an identifier and timestamped. Reply packets in the reverse
flow are then parsed for the TSecr, which is the TSval echoed back by the
receiver. The TCP timestamps are not necessarily unique for every packet (they
have a limited update frequency, which appears to be 1000 Hz for modern Linux
systems), so only the first instance of an identifier is timestamped, and
matched against the first incoming packet with a matching reply identifier. The
mechanism to ensure only the first packet is timestamped and matched differs
from the one in Kathie's pping, and is further described in
[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).

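To make the TSval/TSecr terminology concrete, here is a minimal Python sketch
that extracts both values from the raw option bytes of a TCP header. It assumes
the standard option layout (kind 8, length 10, followed by two 32-bit values in
network byte order); the function name `parse_tcp_timestamps` is hypothetical
and does not mirror the parsing code in `pping_kern.c`.

```python
import struct

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_TIMESTAMP = 0, 1, 8


def parse_tcp_timestamps(options: bytes):
    """Return (TSval, TSecr) from raw TCP option bytes, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            return None  # end of option list
        if kind == TCPOPT_NOP:
            i += 1  # single-byte padding option
            continue
        if i + 1 >= len(options):
            return None  # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            return None  # malformed option
        if kind == TCPOPT_TIMESTAMP and length == 10:
            return struct.unpack_from("!II", options, i + 2)
        i += length  # skip any other option
    return None


# Two NOPs followed by a timestamp option, a common layout on Linux.
opts = bytes([1, 1, 8, 10]) + struct.pack("!II", 123456, 654321)
print(parse_tcp_timestamps(opts))  # (123456, 654321)
```

The TSval of an outgoing packet would be used as the identifier to timestamp,
while the TSecr of a packet in the reverse flow is the reply identifier to match
against.
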
For ICMP echo, it uses the echo identifier as port numbers, and the echo
sequence number as the identifier to match against. Linux systems will typically
use different echo identifiers for different instances of ping, and thus each
ping instance will be recognized as a separate flow. Windows systems typically
use a static echo identifier, and thus all instances of ping originating from a
particular Windows host and targeting the same host will be considered a single
flow.

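The mapping from an ICMP echo message to a flow and an identifier could look
roughly like the Python sketch below. It assumes ICMPv4 (type 8 for echo
request, type 0 for echo reply) and that the echo identifier stands in for both
the source and the destination port; the function name `icmp_echo_key` and the
tuple layout are hypothetical.

```python
import struct

ICMP_ECHO_REPLY, ICMP_ECHO_REQUEST = 0, 8


def icmp_echo_key(src_ip, dst_ip, icmp_header: bytes):
    """Map an ICMPv4 echo header to (flow, identifier, is_reply), or None."""
    if len(icmp_header) < 8:
        return None
    icmp_type, _code, _checksum, echo_id, echo_seq = struct.unpack_from(
        "!BBHHH", icmp_header
    )
    if icmp_type not in (ICMP_ECHO_REQUEST, ICMP_ECHO_REPLY):
        return None
    # Assumption: the echo identifier is used in place of both port numbers.
    flow = ((src_ip, echo_id), (dst_ip, echo_id))
    return flow, echo_seq, icmp_type == ICMP_ECHO_REPLY


request = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, 7)
key = icmp_echo_key("10.11.1.2", "10.11.1.1", request)
print(key)  # ((('10.11.1.2', 4660), ('10.11.1.1', 4660)), 7, False)
```

Two ping instances with different echo identifiers end up with different flow
tuples here, whereas a static echo identifier collapses them into one flow.
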
## Output formats

pping currently supports three different formats: *standard*, *ppviz* and
*json*. In general, the output consists of two different types of events:
flow-events, which report that a flow has started or ended, and RTT-events,
which provide information on a computed RTT within a flow.

### Standard format

The standard format is quite similar to the default output of Kathie's pping,
and is intended to be an easily understood, human-readable format that writes a
single line per event.

An example of the format is provided below:
```shell
16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from dest
16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
```

### ppviz format

The ppviz format is primarily intended to be used to generate data that can be
visualized by Kathie's [ppviz](https://github.com/pollere/ppviz) tool. The
format is essentially a CSV format, using a single space as the separator, and
is further described [here](http://www.pollere.net/ppviz.html).

Note that the optional *FBytes*, *DBytes* and *PBytes* fields from the format
specification are not included here, as they do not appear to be used by
ppviz. Furthermore, flow events are not included in the output, as those are
not used by ppviz.

An example of the format is provided below:
```shell
1623420121.483727575 0.005298909 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420122.484530934 0.006016639 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420123.485899736 0.005590783 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420124.490584753 0.006123511 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420125.492190751 0.005624835 0.005298909 10.11.1.1:5201+10.11.1.2:59532
```

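For anyone who wants to post-process this output without ppviz, a line can be
split on whitespace. The Python sketch below assumes the four columns are, in
order, the epoch timestamp, the RTT and the minimum RTT (all in seconds) and the
flow; that reading is inferred from the example above rather than taken from the
format specification, and the names `PpvizSample` and `parse_ppviz_line` are
hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple


class PpvizSample(NamedTuple):
    timestamp: Decimal  # seconds since the epoch, kept exact to the nanosecond
    rtt: float          # seconds
    min_rtt: float      # seconds
    flow: str


def parse_ppviz_line(line: str) -> PpvizSample:
    """Split one space-separated ppviz line into its four fields."""
    ts, rtt, min_rtt, flow = line.split()
    return PpvizSample(Decimal(ts), float(rtt), float(min_rtt), flow)


sample = parse_ppviz_line(
    "1623420121.483727575 0.005298909 0.005298909 10.11.1.1:5201+10.11.1.2:59532"
)
print(f"{sample.rtt * 1e3:.3f} ms")  # 5.299 ms
```

The timestamp is parsed as a `Decimal` because a 64-bit float cannot hold an
epoch time with nanosecond precision.
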
### JSON format

The JSON format is primarily intended to be machine-readable, and thus uses no
spacing or newlines between entries to reduce the overhead. External tools such
as [jq](https://stedolan.github.io/jq/) can be used to pretty-print the format.

The format consists of an array at the root level, and each flow or RTT event is
added as an object to the root array. The events contain some additional fields
in the JSON format which are not displayed by the other formats. All times
(*timestamp*, *rtt* and *min_rtt*) are provided as integers in nanoseconds.

An example of a (pretty-printed) flow-event is provided below:
```json
{
  "timestamp": 1623420837244545000,
  "src_ip": "10.11.1.1",
  "src_port": 5201,
  "dest_ip": "10.11.1.2",
  "dest_port": 59572,
  "protocol": "TCP",
  "flow_event": "opening",
  "reason": "SYN-ACK",
  "triggered_by": "dest"
}
```

An example of a (pretty-printed) RTT-event is provided below:
```json
{
  "timestamp": 1623420838254558500,
  "src_ip": "10.11.1.1",
  "src_port": 5201,
  "dest_ip": "10.11.1.2",
  "dest_port": 59572,
  "protocol": "TCP",
  "rtt": 5977708,
  "min_rtt": 5441848,
  "sent_packets": 9393,
  "sent_bytes": 492457296,
  "rec_packets": 5922,
  "rec_bytes": 37,
  "match_on_egress": false
}
```

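As a small consumer-side illustration, the Python sketch below loads a complete
JSON array, skips the flow-events and converts the nanosecond RTTs to
milliseconds. The helper `rtts_in_ms` and its `include_egress_matches` parameter
are hypothetical, and the sample input is an abbreviated version of the two
events shown above (several fields are left out for brevity).

```python
import json


def rtts_in_ms(raw: str, include_egress_matches: bool = True):
    """Return the RTTs (in ms) of all RTT-events in a JSON array of events."""
    result = []
    for event in json.loads(raw):
        if "rtt" not in event:
            continue  # flow-events carry no RTT
        if event.get("match_on_egress") and not include_egress_matches:
            continue  # optionally drop RTTs that include local processing delay
        result.append(event["rtt"] / 1e6)  # nanoseconds -> milliseconds
    return result


raw = (
    '[{"timestamp":1623420837244545000,"protocol":"TCP",'
    '"flow_event":"opening","reason":"SYN-ACK","triggered_by":"dest"},'
    '{"timestamp":1623420838254558500,"protocol":"TCP",'
    '"rtt":5977708,"min_rtt":5441848,"match_on_egress":false}]'
)
print(rtts_in_ms(raw))  # [5.977708]
```

Filtering on `match_on_egress` separates RTTs that include the local machine's
processing delay from those that do not.
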
## Design and technical description



### Files:
- **pping.c:** Userspace program that loads and attaches the BPF programs, polls
  the perf-buffer `events` to print out RTT messages and periodically cleans
  up old entries from the hash-maps. Also passes user options to the BPF
  programs by setting a "global variable" (stored in the program's .rodata
  section).
- **pping_kern.c:** Contains the BPF programs that are loaded on egress (tc) and
  ingress (XDP or tc), as well as several common functions, a global constant
  `config` (set from userspace) and map definitions. Essentially the same pping
  program is loaded on both ingress and egress. All packets are parsed for both
  an identifier that can be used to create a timestamp entry in `packet_ts`, and
  a reply identifier that can be used to match the packet with a previously
  timestamped one in the reverse flow. If a match is found, an RTT is calculated
  and an RTT-event is pushed to userspace through the perf-buffer `events`. For
  each packet with a valid identifier, the program also keeps track of and
  updates the state of the flow and the reverse flow, stored in the `flow_state`
  map.
- **pping.h:** Common header file included by `pping.c` and
  `pping_kern.c`. Contains some common structs used by both (these are part of
  the maps).

### BPF Maps:
- **flow_state:** A hash-map storing some basic state for each flow, such as the
  last seen identifier for the flow and when the last timestamp entry for the
  flow was created. Entries are created, updated and deleted by the BPF pping
  programs. Leftover entries are eventually removed by userspace (`pping.c`).
- **packet_ts:** A hash-map storing a timestamp for a specific packet
  identifier. Entries are created by the BPF pping program if a valid identifier
  is found, and removed if a match is found. Leftover entries are eventually
  removed by userspace (`pping.c`).
- **events:** A perf-buffer used by the BPF programs to push flow or RTT events
  to `pping.c`, which continuously polls the map and prints them out.

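The userspace cleanup of leftover entries can be pictured with the Python sketch
below, which uses a dictionary in place of a BPF hash-map. The function
`clean_old_entries` and the 5 second maximum age are made up for this
illustration; the actual timeouts and iteration logic in `pping.c` are not
described here and may differ.

```python
def clean_old_entries(table: dict, now_ns: int, max_age_ns: int) -> int:
    """Delete entries older than max_age_ns and return how many were removed."""
    # Collect the keys first, so the dictionary is not modified while iterating.
    stale = [key for key, ts in table.items() if now_ns - ts > max_age_ns]
    for key in stale:
        del table[key]
    return len(stale)


# Stand-in for packet_ts: (flow, identifier) -> creation time in ns
packet_ts = {("flowA", 1): 0, ("flowA", 2): 9_000_000_000}
removed = clean_old_entries(
    packet_ts, now_ns=10_000_000_000, max_age_ns=5_000_000_000
)
print(removed, list(packet_ts))  # 1 [('flowA', 2)]
```
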
### A note on concurrency
The program uses "global" (not `PERCPU`) hash maps to keep state. As the BPF
programs need to see the global view to function properly, using `PERCPU` maps
is not an option. The program must be able to match against stored packet
timestamps regardless of the CPU the packets are processed on, and must also
have a global view of the flow state in order for the sampling to work
correctly.

As the BPF programs may run concurrently on different CPU cores accessing these
global hash maps, this may result in some concurrency issues. In practice, I do
not believe these will occur particularly often, as I'm under the impression
that packets from the same flow will typically be processed by the same
CPU. Furthermore, most of the concurrency issues will not be that problematic
even if they do occur. For now, I've therefore left these concurrency issues
unattended, even if some of them could be avoided with atomic operations and/or
spinlocks, in order to keep things simple and not hurt performance.

The (known) potential concurrency issues are:

#### Tracking last seen identifier
The tc/egress program keeps track of the last seen outgoing identifier for each
flow, by storing it in the `flow_state` map. This is done to detect the first
packet with a new identifier. If multiple packets are processed concurrently,
several of them could potentially detect themselves as being first with the same
identifier (which only matters if they also pass the rate-limit check).
Alternatively, if the concurrent packets have different identifiers there may be
a lost update (but for TCP timestamps, concurrent packets would typically be
expected to have the same timestamp).

A possibly more severe issue is out-of-order packets. If a packet with an old
identifier arrives out of order, that identifier could be detected as a new
identifier. If for example the following flow of four packets with just two
different identifiers (id1 and id2) were to occur:

id1 -> id2 -> id1 -> id2

Then the tc/egress program would consider each of these packets to have new
identifiers and try to create a new timestamp for each of them if the sampling
strategy allows it. However even if the sampling strategy allows it, the
(incorrect) creation of timestamps for id1 and id2 the second time would only be
successful in case the first timestamps for id1 and id2 have already been
matched against (and thus deleted). Even if that is the case, they would only
result in reporting an incorrect RTT in case there are also new matches against
these identifiers.

This issue could be avoided entirely by requiring that new-id > old-id instead
of simply checking that new-id != old-id, as TCP timestamps should monotonically
increase. That may however not be a suitable solution for other types of
identifiers.

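One detail a new-id > old-id check would have to deal with is that the TSval is
a 32-bit value and therefore eventually wraps around. The Python sketch below
shows a wraparound-tolerant comparison in the style of serial-number arithmetic.
It is only an illustration of the idea, not something the tool is stated to
implement, and the helper name `is_newer` is hypothetical.

```python
def is_newer(new_id: int, old_id: int) -> bool:
    """True if new_id is ahead of old_id in 32-bit wraparound order."""
    diff = (new_id - old_id) & 0xFFFFFFFF
    # Ahead by at least 1 and by less than half the 32-bit number space.
    return 0 < diff < 0x80000000


# The id1 -> id2 -> id1 -> id2 sequence from above, with id1=100 and id2=101.
last_seen = None
accepted = []
for tsval in [100, 101, 100, 101]:
    if last_seen is None or is_newer(tsval, last_seen):
        accepted.append(tsval)
        last_seen = tsval
print(accepted)  # [100, 101]
```

With this check only the first id1 and the first id2 are treated as new, while a
TSval that has just wrapped (for example 5 following 0xFFFFFFF0) is still
recognized as newer.
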
#### Rate-limiting new timestamps
In the tc/egress program, packets to timestamp are sampled by using a per-flow
rate-limit, which is enforced by storing when the last timestamp was created in
the `flow_state` map. If multiple packets perform this check concurrently, it's
possible that multiple packets think they are allowed to create timestamps
before any of them are able to update the `last_timestamp`. When they update
`last_timestamp` it might also be slightly incorrect; however, if they are
processed concurrently then they should also generate very similar timestamps.

If the packets have different identifiers (which would typically not be
expected for concurrent TCP timestamps), then this would allow some packets to
bypass the rate-limit. By bypassing the rate-limit, the flow would use up some
additional map space and report a few more RTTs than expected
(however, the reported RTTs should still be correct).

If the packets have the same identifier, they must first have managed to bypass
the previous check for unique identifiers (see [previous
point](#tracking-last-seen-identifier)), and only one of them will be able to
successfully store a timestamp entry.

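The two checks discussed so far (new identifier, then rate-limit) can be
sketched in single-threaded Python as below. This is a simplified model under
several assumptions: the `FlowState` class, the `should_timestamp` function, the
100 ms limit and the choice to record the identifier even when the rate-limit
rejects the packet are all hypothetical; the actual behavior is described in
SAMPLING_DESIGN.

```python
class FlowState:
    """Per-flow state, standing in for an entry in the flow_state map."""

    def __init__(self):
        self.last_id = None
        self.last_timestamp = None  # ns


def should_timestamp(state, identifier, now_ns, rate_limit_ns):
    """Decide whether a packet may create a new timestamp entry."""
    # Check 1: only the first packet with a new identifier is a candidate.
    if identifier == state.last_id:
        return False
    state.last_id = identifier
    # Check 2: per-flow rate-limit on how often timestamps are created.
    if (state.last_timestamp is not None
            and now_ns - state.last_timestamp < rate_limit_ns):
        return False
    state.last_timestamp = now_ns
    return True


MS = 1_000_000  # nanoseconds per millisecond
state = FlowState()
packets = [(1, 0), (1, 1 * MS), (2, 2 * MS), (3, 150 * MS)]
decisions = [should_timestamp(state, i, t, 100 * MS) for i, t in packets]
print(decisions)  # [True, False, False, True]
```

Each check reads the state and only afterwards writes it. Without atomic
operations, two packets processed at the same time can therefore both pass a
check before either update becomes visible, which is the source of the races
described in this section.
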
#### Matching against stored timestamps
The XDP/ingress program could potentially match multiple concurrent packets with
the same identifier against a single timestamp entry in `packet_ts`, before any
of them manage to delete the timestamp entry. This would result in multiple RTTs
being reported for the same identifier, but if they are processed concurrently
these RTTs should be very similar, so this would mainly result in over-reporting
rather than reporting incorrect RTTs.

#### Updating flow statistics
Both the tc/egress and XDP/ingress programs will try to update some flow
statistics each time they successfully parse a packet with an
identifier. Specifically, they'll update the number of packets and bytes
sent/received. This is not done in an atomic fashion, so there could potentially
be some lost updates resulting in an underestimate.

Furthermore, whenever the XDP/ingress program calculates an RTT, it will check
if this is the lowest RTT seen so far for the flow. If multiple RTTs are
calculated concurrently, then several could pass this check concurrently and
there may be a lost update. It should only be possible for multiple RTTs to be
calculated concurrently in case either the [timestamp rate-limit was
bypassed](#rate-limiting-new-timestamps) or [multiple packets managed to match
against the same timestamp](#matching-against-stored-timestamps).

It's worth noting that with sampling the reported minimum-RTT is only an
estimate anyway (an RTT may never be calculated for the packet with the true
minimum RTT). And even without sampling there is some inherent sampling due to
TCP timestamps only being updated at a limited rate (1000 Hz).

## Similar projects
Passively measuring the RTT for TCP traffic is not a novel concept, and there
exist a number of other tools that can do so. A good overview of how passive
RTT calculation using TCP timestamps (as in this project) works is provided in
[this paper](https://doi.org/10.1145/2523426.2539132) from 2013.

- [pping](https://github.com/pollere/pping): This project is largely a
  re-implementation of Kathie's pping, but by using BPF and XDP as well as
  implementing some filtering logic the hope is to be able to create an always-on
  tool that can scale well even to large numbers of massive flows.
- [ppviz](https://github.com/pollere/ppviz): Web-based visualization tool for
  the "machine-friendly" (-m) output from Kathie's pping tool. Running this
  implementation of pping with --format="ppviz" will generate output that can be
  used by ppviz.
- [tcptrace](https://github.com/blitz/tcptrace): A post-processing tool which
  can analyze a tcpdump file and among other things calculate RTTs based on
  seq/ACK numbers (`-r` or `-R` flag).
- **Dapper**: A passive TCP data plane monitoring tool implemented in P4 which
  can among other things calculate the RTT based on the matching seq/ACK
  numbers. [Paper](https://doi.org/10.1145/3050220.3050228). [Unofficial
  source](https://github.com/muhe1991/p4-programs-survey/tree/master/dapper).
- [P4 Tofino TCP RTT measurement](https://github.com/Princeton-Cabernet/p4-projects/tree/master/RTT-tofino):
  A passive TCP RTT monitor based on seq/ACK numbers implemented in P4 for
  Tofino programmable switches. [Paper](https://doi.org/10.1145/3405669.3405823).