pping: Add support for ICMP echo messages

Allow pping to passively monitor RTT for ICMP echo request/reply
flows. Use the echo identifier as ports, and echo sequence as packet
identifier.

Additionally, add protocol to standard output format in order to be
able to distinguish between TCP and ICMP flows.

The ppviz format does not include protocol, making it impossible to
distinguish between TCP and ICMP traffic. A warning will be added in the
future for when the ppviz format is used together with ICMP traffic.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Simon Sundberg
2021-12-08 10:13:50 +01:00
parent af5e660d8e
commit bd6ded5c21
4 changed files with 124 additions and 48 deletions


@@ -6,11 +6,11 @@ TC-BPF (on egress) for the packet capture logic.
## Simple description ## Simple description
Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
can be used on endhosts as well as any (BPF-capable Linux) device which can see can be used on endhosts as well as any (BPF-capable Linux) device which can see
both directions of the traffic (ex router or middlebox). Currently it only works both directions of the traffic (ex router or middlebox). Currently it works for
for TCP traffic which uses the TCP timestamp option, but could be extended to TCP traffic which uses the TCP timestamp option and ICMP echo messages, but
also work with for example TCP seq/ACK numbers, the QUIC spinbit and ICMP could be extended to also work with for example TCP seq/ACK numbers, the QUIC
echo-reply messages. See the [TODO-list](./TODO.md) for more potential features spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
(which may or may not ever get implemented). features (which may or may not ever get implemented).
The fundamental logic of pping is to timestamp a pseudo-unique identifier for The fundamental logic of pping is to timestamp a pseudo-unique identifier for
outgoing packets, and then look for matches in the incoming packets. If a match outgoing packets, and then look for matches in the incoming packets. If a match
@@ -18,15 +18,22 @@ is found, the RTT is simply calculated as the time difference between the
current time and the stored timestamp. current time and the stored timestamp.
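The timestamp-and-match logic described above can be illustrated with a minimal userspace sketch. This is not pping's BPF code: the fixed-size table, the single-integer `flow` key, and all `sketch_*` names are assumptions made for illustration only, and the caller is assumed to pass the already reversed flow key on ingress.

```c
#include <stdint.h>

#define SKETCH_SLOTS 64 /* arbitrary table size for this sketch */

/* Hypothetical entry; the real key is a full flow tuple plus identifier. */
struct sketch_entry {
	uint32_t flow;  /* stand-in for the flow tuple */
	uint32_t id;    /* pseudo-unique packet identifier */
	uint64_t ts_ns; /* time at which the outgoing packet was seen */
	int used;
};

static struct sketch_entry sketch_table[SKETCH_SLOTS];

/* Outgoing packet: returns 1 if stored, 0 if already present, -1 if full. */
static int sketch_on_egress(uint32_t flow, uint32_t id, uint64_t now_ns)
{
	int free_slot = -1;

	for (int i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_entry *e = &sketch_table[i];

		if (e->used && e->flow == flow && e->id == id)
			return 0; /* keep the earliest timestamp */
		if (!e->used && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;

	sketch_table[free_slot] = (struct sketch_entry){ flow, id, now_ns, 1 };
	return 1;
}

/* Incoming packet: on a match, writes the RTT, drops the entry, returns 0. */
static int sketch_on_ingress(uint32_t flow, uint32_t id, uint64_t now_ns,
			     uint64_t *rtt_ns)
{
	for (int i = 0; i < SKETCH_SLOTS; i++) {
		struct sketch_entry *e = &sketch_table[i];

		if (e->used && e->flow == flow && e->id == id) {
			*rtt_ns = now_ns - e->ts_ns; /* now minus stored time */
			e->used = 0;
			return 0;
		}
	}
	return -1; /* no stored timestamp for this identifier */
}
```

A linear scan keeps the sketch short. The actual tool stores the timestamps in a BPF hash map instead.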
This tool, just as Kathie's original pping implementation, uses TCP timestamps This tool, just as Kathie's original pping implementation, uses TCP timestamps
as identifiers. For outgoing packets, the TSval (which is a timestamp in and of as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
itself) is timestamped. Incoming packets are then parsed for the TSecr, which timestamp in and of itself) is timestamped. Incoming packets are then parsed
are the echoed TSval values from the receiver. The TCP timestamps are not for the TSecr, which are the echoed TSval values from the receiver. The TCP
necessarily unique for every packet (they have a limited update frequency, timestamps are not necessarily unique for every packet (they have a limited
appears to be 1000 Hz for modern Linux systems), so only the first instance of update frequency, appears to be 1000 Hz for modern Linux systems), so only the
an identifier is timestamped, and matched against the first incoming packet with first instance of an identifier is timestamped, and matched against the first
the identifier. The mechanism to ensure only the first packet is timestamped and incoming packet with the identifier. The mechanism to ensure only the first
matched differs from the one in Kathie's pping, and is further described in packet is timestamped and matched differs from the one in Kathie's pping, and is
[SAMPLING_DESIGN](./SAMPLING_DESIGN.md). further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
For ICMP echo, it uses the echo identifier as port numbers, and the echo sequence
number as the identifier to match against. Linux systems will typically use different
echo identifiers for different instances of ping, and thus each ping instance
will be recognized as a separate flow. Windows systems typically use a static
echo identifier, and thus all instances of ping originating from a particular
Windows host and directed at the same target host will be considered a single flow.
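As a rough illustration of the mapping described above, the following sketch fills both "ports" from the echo identifier and uses the echo sequence number as the packet identifier. The struct layouts and `sketch_*` names are simplified, hypothetical stand-ins rather than pping's own types. The type values 8 and 0 are the ICMPv4 echo request and echo reply types.

```c
#include <stdint.h>

#define SKETCH_ICMP_ECHO      8 /* ICMPv4 echo request */
#define SKETCH_ICMP_ECHOREPLY 0 /* ICMPv4 echo reply */

/* Simplified view of an ICMP echo header (hypothetical layout). */
struct sketch_echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;       /* echo identifier */
	uint16_t sequence; /* echo sequence number */
};

/* Simplified result: a port pair for the flow key plus a packet identifier. */
struct sketch_packet_id {
	uint16_t sport;
	uint16_t dport;
	uint32_t identifier;
};

/*
 * Only echo requests count on egress and only echo replies on ingress.
 * Returns 0 on success and -1 if the header is not a matching echo message.
 */
static int sketch_parse_echo(const struct sketch_echo_hdr *h, int is_egress,
			     struct sketch_packet_id *out)
{
	uint8_t wanted = is_egress ? SKETCH_ICMP_ECHO : SKETCH_ICMP_ECHOREPLY;

	if (h->type != wanted || h->code != 0)
		return -1;

	out->sport = h->id; /* echo identifier doubles as both ports */
	out->dport = h->id;
	out->identifier = h->sequence; /* 16-bit value held in 32 bits */
	return 0;
}
```

Because the request and its reply carry the same identifier and sequence number, both directions produce the same port pair and packet identifier, which is what allows the reply to be matched against the stored request.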
## Output formats ## Output formats
pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In
@@ -41,12 +48,12 @@ single line per event.
An example of the format is provided below: An example of the format is provided below:
```shell ```shell
16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src 16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528 16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528 16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528 16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528 16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest 16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
``` ```
### ppviz format ### ppviz format
@@ -196,8 +203,8 @@ these identifiers.
This issue could be avoided entirely by requiring that new-id > old-id instead This issue could be avoided entirely by requiring that new-id > old-id instead
of simply checking that new-id != old-id, as TCP timestamps should monotonically of simply checking that new-id != old-id, as TCP timestamps should monotonically
increase. That may however not be a suitable solution if/when we add support for increase. That may however not be a suitable solution for other types of
other types of identifiers. identifiers.
#### Rate-limiting new timestamps #### Rate-limiting new timestamps
In the tc/egress program packets to timestamp are sampled by using a per-flow In the tc/egress program packets to timestamp are sampled by using a per-flow


@@ -14,14 +14,15 @@
- If one only considers SEQ/ACK (and don't check for SACK - If one only considers SEQ/ACK (and don't check for SACK
options), could result in ex. delay from retransmission being options), could result in ex. delay from retransmission being
included in RTT included in RTT
- [ ] ICMP (ex Echo/Reply) - [x] ICMP (ex Echo/Reply)
- [ ] QUIC (based on spinbit) - [ ] QUIC (based on spinbit)
- [ ] DNS queries
## General pping ## General pping
- [x] Add sampling so that RTT is not calculated for every packet - [x] Add sampling so that RTT is not calculated for every packet
(with unique value) for large flows (with unique value) for large flows
- [ ] Allow short bursts to bypass sampling in order to handle - [ ] Allow short bursts to bypass sampling in order to handle
delayed ACKs delayed ACKs, reordered or lost packets etc.
- [x] Keep some per-flow state - [x] Keep some per-flow state
- Will likely be needed for the sampling - Will likely be needed for the sampling
- [ ] Could potentially include keeping track of average RTT, which - [ ] Could potentially include keeping track of average RTT, which


@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */ /* SPDX-License-Identifier: GPL-2.0-or-later */
static const char *__doc__ = static const char *__doc__ =
"Passive Ping - monitor flow RTT based on TCP timestamps"; "Passive Ping - monitor flow RTT based on header inspection";
#include <bpf/bpf.h> #include <bpf/bpf.h>
#include <bpf/libbpf.h> #include <bpf/libbpf.h>
@@ -51,16 +51,16 @@ enum PPING_OUTPUT_FORMAT {
}; };
/* /*
* BPF implementation of pping using libbpf * BPF implementation of pping using libbpf.
* Uses TC-BPF for egress and XDP for ingress * Uses TC-BPF for egress and XDP for ingress.
* - On egress, packets are parsed for TCP TSval, * - On egress, packets are parsed for an identifier,
* if found added to hashmap using flow+TSval as key, * if found added to hashmap using flow+identifier as key,
* and current time as value * and current time as value.
* - On ingress, packets are parsed for TCP TSecr, * - On ingress, packets are parsed for reply identifier,
* if found looks up hashmap using reverse-flow+TSecr as key, * if found looks up hashmap using reverse-flow+identifier as key,
* and calculates RTT as difference between now and map value * and calculates RTT as difference between now and stored timestamp.
* - Calculated RTTs are pushed to userspace * - Calculated RTTs are pushed to userspace
* (together with the related flow) and printed out * (together with the related flow) and printed out.
*/ */
// Structure to contain arguments for clean_map (for passing to pthread_create) // Structure to contain arguments for clean_map (for passing to pthread_create)
@@ -678,16 +678,17 @@ static void print_event_standard(void *ctx, int cpu, void *data,
if (e->event_type == EVENT_TYPE_RTT) { if (e->event_type == EVENT_TYPE_RTT) {
print_ns_datetime(stdout, e->rtt_event.timestamp); print_ns_datetime(stdout, e->rtt_event.timestamp);
printf(" %llu.%06llu ms %llu.%06llu ms ", printf(" %llu.%06llu ms %llu.%06llu ms %s ",
e->rtt_event.rtt / NS_PER_MS, e->rtt_event.rtt / NS_PER_MS,
e->rtt_event.rtt % NS_PER_MS, e->rtt_event.rtt % NS_PER_MS,
e->rtt_event.min_rtt / NS_PER_MS, e->rtt_event.min_rtt / NS_PER_MS,
e->rtt_event.min_rtt % NS_PER_MS); e->rtt_event.min_rtt % NS_PER_MS,
proto_to_str(e->rtt_event.flow.proto));
print_flow_ppvizformat(stdout, &e->rtt_event.flow); print_flow_ppvizformat(stdout, &e->rtt_event.flow);
printf("\n"); printf("\n");
} else if (e->event_type == EVENT_TYPE_FLOW) { } else if (e->event_type == EVENT_TYPE_FLOW) {
print_ns_datetime(stdout, e->flow_event.timestamp); print_ns_datetime(stdout, e->flow_event.timestamp);
printf(" "); printf(" %s ", proto_to_str(e->flow_event.flow.proto));
print_flow_ppvizformat(stdout, &e->flow_event.flow); print_flow_ppvizformat(stdout, &e->flow_event.flow);
printf(" %s due to %s from %s\n", printf(" %s due to %s from %s\n",
flowevent_to_str(e->flow_event.event_info.event), flowevent_to_str(e->flow_event.event_info.event),
@@ -701,6 +702,7 @@ static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
const struct rtt_event *e = data; const struct rtt_event *e = data;
__u64 time = convert_monotonic_to_realtime(e->timestamp); __u64 time = convert_monotonic_to_realtime(e->timestamp);
// ppviz format does not support flow events
if (e->event_type != EVENT_TYPE_RTT) if (e->event_type != EVENT_TYPE_RTT)
return; return;


@@ -8,6 +8,8 @@
#include <linux/ip.h> #include <linux/ip.h>
#include <linux/ipv6.h> #include <linux/ipv6.h>
#include <linux/tcp.h> #include <linux/tcp.h>
#include <linux/icmp.h>
#include <linux/icmpv6.h>
#include <stdbool.h> #include <stdbool.h>
// overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit // overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit
@@ -182,6 +184,64 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
return 0; return 0;
} }
/*
* Attempts to fetch an identifier for an ICMPv6 header, based on the echo
* request/reply sequence number.
* If successful, identifier will be set to the echo sequence number, both
* sport and dport will be set to the echo identifier, and 0 will be returned.
* On failure, -1 will be returned.
* Note: Will store the 16-bit echo sequence number in network byte order in
* the 32-bit identifier.
*/
static int parse_icmp6_identifier(struct parsing_context *ctx, __u16 *sport,
__u16 *dport, struct flow_event_info *fei,
__u32 *identifier)
{
struct icmp6hdr *icmp6h;
if (parse_icmp6hdr(&ctx->nh, ctx->data_end, &icmp6h) < 0)
return -1;
if (ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REQUEST)
return -1;
if (!ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REPLY)
return -1;
if (icmp6h->icmp6_code != 0)
return -1;
fei->event = FLOW_EVENT_NONE;
*sport = icmp6h->icmp6_identifier;
*dport = *sport;
*identifier = icmp6h->icmp6_sequence;
return 0;
}
/*
* Same as parse_icmp6_identifier, but for an ICMP(v4) header instead.
*/
static int parse_icmp_identifier(struct parsing_context *ctx, __u16 *sport,
__u16 *dport, struct flow_event_info *fei,
__u32 *identifier)
{
struct icmphdr *icmph;
if (parse_icmphdr(&ctx->nh, ctx->data_end, &icmph) < 0)
return -1;
if (ctx->is_egress && icmph->type != ICMP_ECHO)
return -1;
if (!ctx->is_egress && icmph->type != ICMP_ECHOREPLY)
return -1;
if (icmph->code != 0)
return -1;
fei->event = FLOW_EVENT_NONE;
*sport = icmph->un.echo.id;
*dport = *sport;
*identifier = icmph->un.echo.sequence;
return 0;
}
/* /*
* Attempts to parse the packet limited by the data and data_end pointers, * Attempts to parse the packet limited by the data and data_end pointers,
* to retrieve a protocol dependent packet identifier. If successful, the * to retrieve a protocol dependent packet identifier. If successful, the
@@ -225,15 +285,21 @@ static int parse_packet_identifier(struct parsing_context *ctx,
return -1; return -1;
} }
// Add new protocols here // Parse identifier from suitable protocol
if (p_id->flow.proto == IPPROTO_TCP) { if (p_id->flow.proto == IPPROTO_TCP)
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei,
fei, &p_id->identifier); &p_id->identifier);
if (err) else if (p_id->flow.proto == IPPROTO_ICMPV6 &&
return -1; p_id->flow.ipv == AF_INET6)
} else { err = parse_icmp6_identifier(ctx, &saddr->port, &daddr->port,
return -1; fei, &p_id->identifier);
} else if (p_id->flow.proto == IPPROTO_ICMP && p_id->flow.ipv == AF_INET)
err = parse_icmp_identifier(ctx, &saddr->port, &daddr->port,
fei, &p_id->identifier);
else
return -1; // No matching protocol
if (err)
return -1; // Failed parsing protocol
// Successfully parsed packet identifier - fill in IP-addresses and return // Successfully parsed packet identifier - fill in IP-addresses and return
if (p_id->flow.ipv == AF_INET) { if (p_id->flow.ipv == AF_INET) {
@@ -267,7 +333,7 @@ static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
{ {
fe->event_type = EVENT_TYPE_FLOW; fe->event_type = EVENT_TYPE_FLOW;
fe->timestamp = timestamp; fe->timestamp = timestamp;
__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple)); fe->flow = *flow;
fe->source = source; fe->source = source;
fe->reserved = 0; // Make sure it's initialized fe->reserved = 0; // Make sure it's initialized
} }