pping: Add support for ICMP echo messages

Allow pping to passively monitor RTT for ICMP echo request/reply
flows. Use the echo identifier as ports, and echo sequence as packet
identifier.

Additionally, add protocol to standard output format in order to be
able to distinguish between TCP and ICMP flows.

The ppviz format does not include protocol, making it impossible to
distinguish between TCP and ICMP traffic. A warning for when the ppviz
format is used together with ICMP traffic will be added in the future.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Simon Sundberg
2021-12-08 10:13:50 +01:00
parent af5e660d8e
commit bd6ded5c21
4 changed files with 124 additions and 48 deletions

View File

@@ -6,11 +6,11 @@ TC-BPF (on egress) for the packet capture logic.
## Simple description
Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
can be used on endhosts as well as any (BPF-capable Linux) device which can see
both directions of the traffic (e.g. a router or middlebox). Currently it only works
for TCP traffic which uses the TCP timestamp option, but could be extended to
also work with for example TCP seq/ACK numbers, the QUIC spinbit and ICMP
echo-reply messages. See the [TODO-list](./TODO.md) for more potential features
(which may or may not ever get implemented).
both directions of the traffic (e.g. a router or middlebox). Currently it works for
TCP traffic which uses the TCP timestamp option and ICMP echo messages, but
could be extended to also work with for example TCP seq/ACK numbers, the QUIC
spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
features (which may or may not ever get implemented).
The fundamental logic of pping is to timestamp a pseudo-unique identifier for
outgoing packets, and then look for matches in the incoming packets. If a match
@@ -18,15 +18,22 @@ is found, the RTT is simply calculated as the time difference between the
current time and the stored timestamp.
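The timestamp-and-match logic described above can be sketched in plain userspace C. This is only an illustration under simplifying assumptions, not pping's BPF code: the names `sketch_record_outgoing` and `sketch_match_incoming` and the fixed-size table are hypothetical, and the table is keyed on the identifier alone, whereas pping keys on the flow together with the identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 64

/* Hypothetical table entry: one stored timestamp per identifier */
struct sketch_ts_entry {
	uint32_t identifier;
	uint64_t timestamp_ns;
	int used;
};

static struct sketch_ts_entry sketch_table[SKETCH_TABLE_SIZE];

/*
 * Egress side: store the current time for an identifier.
 * Returns 0 if stored, -1 if the slot is already taken, so only the
 * first instance of an identifier gets timestamped.
 */
static int sketch_record_outgoing(uint32_t identifier, uint64_t now_ns)
{
	struct sketch_ts_entry *e = &sketch_table[identifier % SKETCH_TABLE_SIZE];

	if (e->used)
		return -1;
	e->identifier = identifier;
	e->timestamp_ns = now_ns;
	e->used = 1;
	return 0;
}

/*
 * Ingress side: look up the echoed identifier. On a match, the RTT is the
 * difference between the current time and the stored timestamp, and the
 * entry is released so that later packets do not match it again.
 */
static int sketch_match_incoming(uint32_t identifier, uint64_t now_ns,
				 uint64_t *rtt_ns)
{
	struct sketch_ts_entry *e = &sketch_table[identifier % SKETCH_TABLE_SIZE];

	assert(rtt_ns != NULL);
	if (!e->used || e->identifier != identifier)
		return -1;
	*rtt_ns = now_ns - e->timestamp_ns;
	e->used = 0;
	return 0;
}
```

Recording identifier 42 at time 1000 ns and matching it at 6000 ns would report an RTT of 5000 ns; a second match attempt for the same identifier fails because the entry has been released.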
This tool, just as Kathie's original pping implementation, uses TCP timestamps
as identifiers. For outgoing packets, the TSval (which is a timestamp in and of
itself) is timestamped. Incoming packets are then parsed for the TSecr, which
is the TSval echoed back by the receiver. The TCP timestamps are not
necessarily unique for every packet (they have a limited update frequency, which
appears to be 1000 Hz for modern Linux systems), so only the first instance of
an identifier is timestamped, and matched against the first incoming packet with
the identifier. The mechanism to ensure only the first packet is timestamped and
matched differs from the one in Kathie's pping, and is further described in
[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
timestamp in and of itself) is timestamped. Incoming packets are then parsed
for the TSecr, which is the TSval echoed back by the receiver. The TCP
timestamps are not necessarily unique for every packet (they have a limited
update frequency, which appears to be 1000 Hz for modern Linux systems), so only the
first instance of an identifier is timestamped, and matched against the first
incoming packet with the identifier. The mechanism to ensure only the first
packet is timestamped and matched differs from the one in Kathie's pping, and is
further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
For ICMP echo, it uses the echo identifier as port numbers, and the echo sequence
number as the identifier to match against. Linux systems will typically use different
echo identifiers for different instances of ping, and thus each ping instance
will be recognized as a separate flow. Windows systems typically use a static
echo identifier, and thus all instances of ping originating from a particular
Windows host and going to the same target host will be considered a single flow.
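As a rough illustration of this mapping, the following userspace C sketch fills in the "ports" and the packet identifier from an ICMPv4 echo header. The struct `sketch_echo_hdr` and the function `sketch_echo_to_id` are hypothetical names used only for this example; the values are copied as found on the wire, without byte order conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ICMPv4 echo message types (RFC 792) */
#define SKETCH_ICMP_ECHOREPLY 0
#define SKETCH_ICMP_ECHO 8

/* Hypothetical, simplified view of an ICMP echo header */
struct sketch_echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;       /* echo identifier, as found on the wire */
	uint16_t sequence; /* echo sequence number, as found on the wire */
};

/*
 * Uses the echo identifier as both source and destination "port" and the
 * echo sequence number as the packet identifier.
 * Returns 0 on success, -1 if the message is not the expected echo type.
 */
static int sketch_echo_to_id(const struct sketch_echo_hdr *h, int is_egress,
			     uint16_t *sport, uint16_t *dport,
			     uint32_t *identifier)
{
	assert(h != NULL && sport != NULL && dport != NULL && identifier != NULL);

	/* Requests are timestamped on egress, replies are matched on ingress */
	if (is_egress && h->type != SKETCH_ICMP_ECHO)
		return -1;
	if (!is_egress && h->type != SKETCH_ICMP_ECHOREPLY)
		return -1;
	if (h->code != 0)
		return -1;

	*sport = h->id;
	*dport = h->id;
	*identifier = h->sequence;
	return 0;
}
```

Since both "ports" are set to the same echo identifier, two ping instances between the same pair of hosts only end up in separate flows if their echo identifiers differ, which is the Linux versus Windows difference described above.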
## Output formats
pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In
@@ -41,12 +48,12 @@ single line per event.
An example of the format is provided below:
```shell
16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
```
### ppviz format
@@ -196,8 +203,8 @@ these identifiers.
This issue could be avoided entirely by requiring that new-id > old-id instead
of simply checking that new-id != old-id, as TCP timestamps should monotonically
increase. That may however not be a suitable solution if/when we add support for
other types of identifiers.
increase. That may however not be a suitable solution for other types of
identifiers.
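For reference, a new-id > old-id check on 32-bit identifiers is often written so that it survives counter wraparound. The sketch below shows one such comparison; it is not what pping does. It assumes both identifiers are in host byte order and less than 2^31 apart, and the name `sketch_id_is_newer` is hypothetical. Identifiers stored in network byte order would need converting first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns non-zero if new_id is "after" old_id, treating the 32-bit
 * identifier space as circular so that a wrapped counter still compares
 * as newer.
 */
static int sketch_id_is_newer(uint32_t new_id, uint32_t old_id)
{
	return (int32_t)(new_id - old_id) > 0;
}

/* A few illustrative cases, including one across the wraparound point */
void sketch_id_examples(void)
{
	assert(sketch_id_is_newer(5, 3));
	assert(!sketch_id_is_newer(3, 5));
	assert(!sketch_id_is_newer(7, 7));
	assert(sketch_id_is_newer(2, 0xFFFFFFF0u));
}
```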
#### Rate-limiting new timestamps
In the tc/egress program packets to timestamp are sampled by using a per-flow

View File

@@ -14,14 +14,15 @@
- If one only considers SEQ/ACK (and doesn't check for SACK
options), this could result in e.g. delay from retransmissions being
included in the RTT
- [ ] ICMP (ex Echo/Reply)
- [x] ICMP (ex Echo/Reply)
- [ ] QUIC (based on spinbit)
- [ ] DNS queries
## General pping
- [x] Add sampling so that RTT is not calculated for every packet
(with unique value) for large flows
- [ ] Allow short bursts to bypass sampling in order to handle
delayed ACKs
delayed ACKs, reordered or lost packets etc.
- [x] Keep some per-flow state
- Will likely be needed for the sampling
- [ ] Could potentially include keeping track of average RTT, which

View File

@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
static const char *__doc__ =
"Passive Ping - monitor flow RTT based on TCP timestamps";
"Passive Ping - monitor flow RTT based on header inspection";
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
@@ -51,16 +51,16 @@ enum PPING_OUTPUT_FORMAT {
};
/*
* BPF implementation of pping using libbpf
* Uses TC-BPF for egress and XDP for ingress
* - On egress, packets are parsed for TCP TSval,
* if found added to hashmap using flow+TSval as key,
* and current time as value
* - On ingress, packets are parsed for TCP TSecr,
* if found looks up hashmap using reverse-flow+TSecr as key,
* and calculates RTT as difference between now and map value
* - Calculated RTTs are pushed to userspace
* (together with the related flow) and printed out
* BPF implementation of pping using libbpf.
* Uses TC-BPF for egress and XDP for ingress.
* - On egress, packets are parsed for an identifier,
* if found added to hashmap using flow+identifier as key,
* and current time as value.
* - On ingress, packets are parsed for reply identifier,
* if found looks up hashmap using reverse-flow+identifier as key,
* and calculates RTT as difference between now and stored timestamp.
* - Calculated RTTs are pushed to userspace
* (together with the related flow) and printed out.
*/
// Structure to contain arguments for clean_map (for passing to pthread_create)
@@ -678,16 +678,17 @@ static void print_event_standard(void *ctx, int cpu, void *data,
if (e->event_type == EVENT_TYPE_RTT) {
print_ns_datetime(stdout, e->rtt_event.timestamp);
printf(" %llu.%06llu ms %llu.%06llu ms ",
printf(" %llu.%06llu ms %llu.%06llu ms %s ",
e->rtt_event.rtt / NS_PER_MS,
e->rtt_event.rtt % NS_PER_MS,
e->rtt_event.min_rtt / NS_PER_MS,
e->rtt_event.min_rtt % NS_PER_MS);
e->rtt_event.min_rtt % NS_PER_MS,
proto_to_str(e->rtt_event.flow.proto));
print_flow_ppvizformat(stdout, &e->rtt_event.flow);
printf("\n");
} else if (e->event_type == EVENT_TYPE_FLOW) {
print_ns_datetime(stdout, e->flow_event.timestamp);
printf(" ");
printf(" %s ", proto_to_str(e->flow_event.flow.proto));
print_flow_ppvizformat(stdout, &e->flow_event.flow);
printf(" %s due to %s from %s\n",
flowevent_to_str(e->flow_event.event_info.event),
@@ -701,6 +702,7 @@ static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
const struct rtt_event *e = data;
__u64 time = convert_monotonic_to_realtime(e->timestamp);
// ppviz format does not support flow events
if (e->event_type != EVENT_TYPE_RTT)
return;

View File

@@ -8,6 +8,8 @@
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/tcp.h>
#include <linux/icmp.h>
#include <linux/icmpv6.h>
#include <stdbool.h>
// overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit
@@ -182,6 +184,64 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
return 0;
}
/*
* Attempts to fetch an identifier for an ICMPv6 header, based on the echo
* request/reply sequence number.
* If successful, identifier will be set to the echo sequence number, both
* sport and dport will be set to the echo identifier, and 0 will be returned.
* On failure, -1 will be returned.
* Note: Will store the 16-bit echo sequence number in network byte order in
* the 32-bit identifier.
*/
static int parse_icmp6_identifier(struct parsing_context *ctx, __u16 *sport,
__u16 *dport, struct flow_event_info *fei,
__u32 *identifier)
{
struct icmp6hdr *icmp6h;
if (parse_icmp6hdr(&ctx->nh, ctx->data_end, &icmp6h) < 0)
return -1;
if (ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REQUEST)
return -1;
if (!ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REPLY)
return -1;
if (icmp6h->icmp6_code != 0)
return -1;
fei->event = FLOW_EVENT_NONE;
*sport = icmp6h->icmp6_identifier;
*dport = *sport;
*identifier = icmp6h->icmp6_sequence;
return 0;
}
/*
* Same as parse_icmp6_identifier, but for an ICMP(v4) header instead.
*/
static int parse_icmp_identifier(struct parsing_context *ctx, __u16 *sport,
__u16 *dport, struct flow_event_info *fei,
__u32 *identifier)
{
struct icmphdr *icmph;
if (parse_icmphdr(&ctx->nh, ctx->data_end, &icmph) < 0)
return -1;
if (ctx->is_egress && icmph->type != ICMP_ECHO)
return -1;
if (!ctx->is_egress && icmph->type != ICMP_ECHOREPLY)
return -1;
if (icmph->code != 0)
return -1;
fei->event = FLOW_EVENT_NONE;
*sport = icmph->un.echo.id;
*dport = *sport;
*identifier = icmph->un.echo.sequence;
return 0;
}
/*
* Attempts to parse the packet limited by the data and data_end pointers,
* to retrieve a protocol dependent packet identifier. If successful, the
@@ -225,15 +285,21 @@ static int parse_packet_identifier(struct parsing_context *ctx,
return -1;
}
// Add new protocols here
if (p_id->flow.proto == IPPROTO_TCP) {
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port,
fei, &p_id->identifier);
if (err)
return -1;
} else {
return -1;
}
// Parse identifier from suitable protocol
if (p_id->flow.proto == IPPROTO_TCP)
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei,
&p_id->identifier);
else if (p_id->flow.proto == IPPROTO_ICMPV6 &&
p_id->flow.ipv == AF_INET6)
err = parse_icmp6_identifier(ctx, &saddr->port, &daddr->port,
fei, &p_id->identifier);
else if (p_id->flow.proto == IPPROTO_ICMP && p_id->flow.ipv == AF_INET)
err = parse_icmp_identifier(ctx, &saddr->port, &daddr->port,
fei, &p_id->identifier);
else
return -1; // No matching protocol
if (err)
return -1; // Failed parsing protocol
// Successfully parsed packet identifier - fill in IP-addresses and return
if (p_id->flow.ipv == AF_INET) {
@@ -267,7 +333,7 @@ static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
{
fe->event_type = EVENT_TYPE_FLOW;
fe->timestamp = timestamp;
__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple));
fe->flow = *flow;
fe->source = source;
fe->reserved = 0; // Make sure it's initialized
}