mirror of
https://github.com/xdp-project/bpf-examples.git
synced 2024-05-06 15:54:53 +00:00
pping: Add support for ICMP echo messages
Allow pping to passively monitor RTT for ICMP echo request/reply flows. Use the echo identifier as ports, and echo sequence as packet identifier.

Additionally, add protocol to standard output format in order to be able to distinguish between TCP and ICMP flows. The ppviz format does not include protocol, making it impossible to distinguish between TCP and ICMP traffic. Will add warning if ppviz format is used together with ICMP traffic in the future.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
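The identifier mapping described in the commit message can be sketched in isolation. This is a minimal, hypothetical illustration and not the code from this commit: `struct echo_hdr`, `struct echo_key` and `echo_to_key` are names invented for the example, and it only covers the ICMPv4 type values (8 for echo request, 0 for echo reply).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ICMPv4 echo header (fields in network byte order). */
struct echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;       /* echo identifier */
	uint16_t sequence; /* echo sequence number */
};

/* Hypothetical key: echo identifier doubles as both ports. */
struct echo_key {
	uint16_t sport;
	uint16_t dport;
	uint32_t identifier; /* 16-bit sequence stored in a 32-bit field */
};

#define ECHO_REQUEST 8 /* ICMPv4 echo request */
#define ECHO_REPLY 0   /* ICMPv4 echo reply */

/*
 * Only echo requests are accepted on egress and only echo replies on
 * ingress. Returns 0 and fills in key on success, -1 otherwise.
 */
static int echo_to_key(const struct echo_hdr *h, bool is_egress,
		       struct echo_key *key)
{
	uint8_t wanted = is_egress ? ECHO_REQUEST : ECHO_REPLY;

	if (h->type != wanted || h->code != 0)
		return -1;

	key->sport = h->id;
	key->dport = h->id;
	key->identifier = h->sequence;
	return 0;
}
```

A request and its reply carry the same echo identifier and sequence number, so the key stored for an outgoing request equals the key derived from the matching incoming reply.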
@@ -6,11 +6,11 @@ TC-BPF (on egress) for the packet capture logic.
 ## Simple description
 Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
 can be used on endhosts as well as any (BPF-capable Linux) device which can see
-both directions of the traffic (ex router or middlebox). Currently it only works
-for TCP traffic which uses the TCP timestamp option, but could be extended to
-also work with for example TCP seq/ACK numbers, the QUIC spinbit and ICMP
-echo-reply messages. See the [TODO-list](./TODO.md) for more potential features
-(which may or may not ever get implemented).
+both directions of the traffic (ex router or middlebox). Currently it works for
+TCP traffic which uses the TCP timestamp option and ICMP echo messages, but
+could be extended to also work with for example TCP seq/ACK numbers, the QUIC
+spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
+features (which may or may not ever get implemented).
 
 The fundamental logic of pping is to timestamp a pseudo-unique identifier for
 outgoing packets, and then look for matches in the incoming packets. If a match
@@ -18,15 +18,22 @@ is found, the RTT is simply calculated as the time difference between the
 current time and the stored timestamp.
 
 This tool, just as Kathie's original pping implementation, uses TCP timestamps
-as identifiers. For outgoing packets, the TSval (which is a timestamp in and off
-itself) is timestamped. Incoming packets are then parsed for the TSecr, which
-are the echoed TSval values from the receiver. The TCP timestamps are not
-necessarily unique for every packet (they have a limited update frequency,
-appears to be 1000 Hz for modern Linux systems), so only the first instance of
-an identifier is timestamped, and matched against the first incoming packet with
-the identifier. The mechanism to ensure only the first packet is timestamped and
-matched differs from the one in Kathie's pping, and is further described in
-[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
+as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
+timestamp in and off itself) is timestamped. Incoming packets are then parsed
+for the TSecr, which are the echoed TSval values from the receiver. The TCP
+timestamps are not necessarily unique for every packet (they have a limited
+update frequency, appears to be 1000 Hz for modern Linux systems), so only the
+first instance of an identifier is timestamped, and matched against the first
+incoming packet with the identifier. The mechanism to ensure only the first
+packet is timestamped and matched differs from the one in Kathie's pping, and is
+further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
 
+For ICMP echo, it uses the echo identifier as port numbers, and echo sequence
+number as identifer to match against. Linux systems will typically use different
+echo identifers for different instances of ping, and thus each ping instance
+will be recongnized as a separate flow. Windows systems typically use a static
+echo identifer, and thus all instaces of ping originating from a particular
+Windows host and the same target host will be considered a single flow.
+
 ## Output formats
 pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In
@@ -41,12 +48,12 @@ single line per event.
 
 An example of the format is provided below:
 ```shell
-16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
-16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
+16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
+16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
 ```
 
 ### ppviz format
@@ -196,8 +203,8 @@ these identifiers.
 
 This issue could be avoided entirely by requiring that new-id > old-id instead
 of simply checking that new-id != old-id, as TCP timestamps should monotonically
-increase. That may however not be a suitable solution if/when we add support for
-other types of identifiers.
+increase. That may however not be a suitable solution for other types of
+identifiers.
 
 #### Rate-limiting new timestamps
 In the tc/egress program packets to timestamp are sampled by using a per-flow
@@ -14,14 +14,15 @@
 - If one only considers SEQ/ACK (and don't check for SACK
 options), could result in ex. delay from retransmission being
 included in RTT
-- [ ] ICMP (ex Echo/Reply)
+- [x] ICMP (ex Echo/Reply)
 - [ ] QUIC (based on spinbit)
+- [ ] DNS queries
 
 ## General pping
 - [x] Add sampling so that RTT is not calculated for every packet
 (with unique value) for large flows
 - [ ] Allow short bursts to bypass sampling in order to handle
-delayed ACKs
+delayed ACKs, reordered or lost packets etc.
 - [x] Keep some per-flow state
 - Will likely be needed for the sampling
 - [ ] Could potentially include keeping track of average RTT, which
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 static const char *__doc__ =
-	"Passive Ping - monitor flow RTT based on TCP timestamps";
+	"Passive Ping - monitor flow RTT based on header inspection";
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
@@ -51,16 +51,16 @@ enum PPING_OUTPUT_FORMAT {
 };
 
 /*
- * BPF implementation of pping using libbpf
- * Uses TC-BPF for egress and XDP for ingress
- * - On egrees, packets are parsed for TCP TSval,
- * if found added to hashmap using flow+TSval as key,
- * and current time as value
- * - On ingress, packets are parsed for TCP TSecr,
- * if found looksup hashmap using reverse-flow+TSecr as key,
- * and calculates RTT as different between now map value
+ * BPF implementation of pping using libbpf.
+ * Uses TC-BPF for egress and XDP for ingress.
+ * - On egrees, packets are parsed for an identifer,
+ * if found added to hashmap using flow+identifier as key,
+ * and current time as value.
+ * - On ingress, packets are parsed for reply identifer,
+ * if found looksup hashmap using reverse-flow+identifier as key,
+ * and calculates RTT as different between now and stored timestamp.
  * - Calculated RTTs are pushed to userspace
- * (together with the related flow) and printed out
+ * (together with the related flow) and printed out.
  */
 
 // Structure to contain arguments for clean_map (for passing to pthread_create)
@@ -678,16 +678,17 @@ static void print_event_standard(void *ctx, int cpu, void *data,
 
 	if (e->event_type == EVENT_TYPE_RTT) {
 		print_ns_datetime(stdout, e->rtt_event.timestamp);
-		printf(" %llu.%06llu ms %llu.%06llu ms ",
+		printf(" %llu.%06llu ms %llu.%06llu ms %s ",
 		       e->rtt_event.rtt / NS_PER_MS,
 		       e->rtt_event.rtt % NS_PER_MS,
 		       e->rtt_event.min_rtt / NS_PER_MS,
-		       e->rtt_event.min_rtt % NS_PER_MS);
+		       e->rtt_event.min_rtt % NS_PER_MS,
+		       proto_to_str(e->rtt_event.flow.proto));
 		print_flow_ppvizformat(stdout, &e->rtt_event.flow);
 		printf("\n");
 	} else if (e->event_type == EVENT_TYPE_FLOW) {
 		print_ns_datetime(stdout, e->flow_event.timestamp);
-		printf(" ");
+		printf(" %s ", proto_to_str(e->rtt_event.flow.proto));
 		print_flow_ppvizformat(stdout, &e->flow_event.flow);
 		printf(" %s due to %s from %s\n",
 		       flowevent_to_str(e->flow_event.event_info.event),
@@ -701,6 +702,7 @@ static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
 	const struct rtt_event *e = data;
 	__u64 time = convert_monotonic_to_realtime(e->timestamp);
 
+	// ppviz format does not support flow events
 	if (e->event_type != EVENT_TYPE_RTT)
 		return;
 
@@ -8,6 +8,8 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
 #include <stdbool.h>
 
 // overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit
@@ -182,6 +184,64 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
 	return 0;
 }
 
+/*
+ * Attemps to fetch an identifier for an ICMPv6 header, based on the echo
+ * request/reply sequence number.
+ * If successful, identifer will be set to the echo sequence number, both
+ * sport and dport will be set to the echo identifier, and 0 will be returned.
+ * On failure, -1 will be returned.
+ * Note: Will store the 16-bit echo sequence number in network byte order in
+ * the 32-bit identifier.
+ */
+static int parse_icmp6_identifier(struct parsing_context *ctx, __u16 *sport,
+				  __u16 *dport, struct flow_event_info *fei,
+				  __u32 *identifier)
+{
+	struct icmp6hdr *icmp6h;
+
+	if (parse_icmp6hdr(&ctx->nh, ctx->data_end, &icmp6h) < 0)
+		return -1;
+
+	if (ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REQUEST)
+		return -1;
+	if (!ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REPLY)
+		return -1;
+	if (icmp6h->icmp6_code != 0)
+		return -1;
+
+	fei->event = FLOW_EVENT_NONE;
+	*sport = icmp6h->icmp6_identifier;
+	*dport = *sport;
+	*identifier = icmp6h->icmp6_sequence;
+	return 0;
+}
+
+/*
+ * Same as parse_icmp6_identifier, but for an ICMP(v4) header instead.
+ */
+static int parse_icmp_identifier(struct parsing_context *ctx, __u16 *sport,
+				 __u16 *dport, struct flow_event_info *fei,
+				 __u32 *identifier)
+{
+	struct icmphdr *icmph;
+
+	if (parse_icmphdr(&ctx->nh, ctx->data_end, &icmph) < 0)
+		return -1;
+
+	if (ctx->is_egress && icmph->type != ICMP_ECHO)
+		return -1;
+	if (!ctx->is_egress && icmph->type != ICMP_ECHOREPLY)
+		return -1;
+	if (icmph->code != 0)
+		return -1;
+
+	fei->event = FLOW_EVENT_NONE;
+	*sport = icmph->un.echo.id;
+	*dport = *sport;
+	*identifier = icmph->un.echo.sequence;
+	return 0;
+}
+
 /*
  * Attempts to parse the packet limited by the data and data_end pointers,
  * to retrieve a protocol dependent packet identifier. If sucessful, the
@@ -225,15 +285,21 @@ static int parse_packet_identifier(struct parsing_context *ctx,
 		return -1;
 	}
 
-	// Add new protocols here
-	if (p_id->flow.proto == IPPROTO_TCP) {
-		err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port,
+	// Parse identifer from suitable protocol
+	if (p_id->flow.proto == IPPROTO_TCP)
+		err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei,
 					   &p_id->identifier);
+	else if (p_id->flow.proto == IPPROTO_ICMPV6 &&
+		 p_id->flow.ipv == AF_INET6)
+		err = parse_icmp6_identifier(ctx, &saddr->port, &daddr->port,
+					     fei, &p_id->identifier);
+	else if (p_id->flow.proto == IPPROTO_ICMP && p_id->flow.ipv == AF_INET)
+		err = parse_icmp_identifier(ctx, &saddr->port, &daddr->port,
+					    fei, &p_id->identifier);
+	else
+		return -1; // No matching protocol
 	if (err)
-			return -1;
-	} else {
-		return -1;
-	}
+		return -1; // Failed parsing protocol
 
 	// Sucessfully parsed packet identifier - fill in IP-addresses and return
 	if (p_id->flow.ipv == AF_INET) {
@@ -267,7 +333,7 @@ static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
 {
 	fe->event_type = EVENT_TYPE_FLOW;
 	fe->timestamp = timestamp;
-	__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple));
+	fe->flow = *flow;
 	fe->source = source;
 	fe->reserved = 0; // Make sure it's initilized
 }