645 Commits

Author SHA1 Message Date
5343ed3377 update bpf-examples libbpf to v1.3.0
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2024-01-31 16:14:08 +01:00
d1cc8a27e7 port track tcp payload offset as scalar in xdp_synproxy
commit 977bc146d4eb7070118d8a974919b33bb52732b4
Author: Eduard Zingerman <eddyz87@gmail.com>
Date:   Tue Nov 21 04:06:51 2023 +0200

    selftests/bpf: track tcp payload offset as scalar in xdp_synproxy

    This change prepares syncookie_{tc,xdp} for an update in the callbacks
    verification logic. To allow bpf_loop() verification to converge when
    multiple callback iterations are considered:
    - track offset inside TCP payload explicitly, not as a part of the
      pointer;
    - make sure that offset does not exceed MAX_PACKET_OFF enforced by
      the verifier;
    - make sure that offset is tracked as an unbound scalar between
      iterations, otherwise the verifier won't be able to infer that the
      bpf_loop callback reaches identical states.

    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/r/20231121020701.26440-2-eddyz87@gmail.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Without the above commit, the syncookie_xdp program is rejected on
kernel 6.7 with the verifier error:
"BPF program is too large. Processed 1000001 insn"

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2024-01-17 13:41:13 +01:00
86ec1b7f15 Merge pull request #111 from vincentmli/xdp-synproxy
Sync in kernel bpf selftest xdp synproxy fix: erroneous bitmask operation
2023-12-12 15:11:07 +01:00
3d6baf8905 fix: erroneous bitmask operation
The kernel selftests/bpf xdp synproxy has the fix
[0] b6a3451e084 (selftests/bpf: Fix erroneous bitmask operation);
sync it here.

It addresses an issue when xdp synproxy needs to handle a SYNACK
from the backend server, see [1].

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=b6a3451e0847
[1]: https://lore.kernel.org/xdp-newbies/CAK3+h2z1r69Z5g+qTwCaJzgnD5sv93x67TLJ3gVQ70_nFE0AqQ@mail.gmail.com/T/#t

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2023-12-11 23:44:46 +00:00
b9d1f89572 Add xdp-synproxy doc in Firewall/Router scenario
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2023-12-05 19:23:05 +00:00
63cd4007b1 Merge pull request #103 from vincentmli/vli-dev
Add xdp-synproxy to bpf-examples
2023-10-26 21:33:45 +02:00
d4450991a2 Add xdp-synproxy Dockerfile and Kubernetes DaemonSet manifest
Users can build the xdp-synproxy container and run it in Kubernetes
as a DaemonSet to protect Kubernetes nodes from SYN flood attacks.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2023-10-26 19:01:56 +00:00
fed8da5072 Add xdp-synproxy to bpf-examples
This code is from the kernel bpf selftests xdp synproxy, with the
tc part removed for simplicity. It shows an example of using libxdp
to attach the xdp synproxy program to a network interface.

If a port is not in the allowed ports, xdp synproxy drops the packet
by default. This would break TCP connections to ports that the user
does not want to synproxy, so change the default to let such
connections pass through.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
2023-10-26 19:01:49 +00:00
c726367fb4 Merge pull request #100 from simosund/pping-add-additional-counters
Add additional counters to ePPing
2023-10-25 16:48:09 +02:00
35012a2804 pping: Add errors to global counters
Add counters for runtime errors in the BPF programs to the global
counters. Specifically, add counters for failing to create entries in
the packet-timestamp, flow-state and aggregation-subnet maps. The
counters can easily be extended to include other errors in the
future. Output any non-zero counters in an errors section at the
end of the global-counters report.

Example standard entry (linebreaks not part of actual output):

13:53:40.450555237: TCP=(pkts=110983, bytes=899455326), ICMP=(pkts=16,
bytes=1568), ECN=(Not-ECT=110999), errors=(store-packet-ts=210,
create-flow-state=8, create-agg-subnet-state=110999)

Example JSON entry:
{
  "timestamp": 1698235250698609700,
  "protocol_counters": {
    "TCP": {
      "packets": 111736,
      "bytes": 898999024
    },
    "ICMP": {
      "packets": 20,
      "bytes": 1960
    }
  },
  "ecn_counters": {
    "no_ECT": 111756
  },
  "errors": {
    "store_packet_ts": 165,
    "create_flow_state": 10,
    "create_agg_subnet_state": 111756
  }
}

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-25 16:12:41 +02:00
0707ac084d pping: Add ECN counters to the global counters
Add counters for the 4 ECN code points (00=Not-ECT, 01=ECT1, 10=ECT0
and 11=CE) to the global counters. These are reported together with
the global protocol counters when running in aggregated mode.
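
For illustration only, a minimal user-space sketch of the per-code-point counting, assuming the TOS (IPv4) or Traffic Class (IPv6) byte has already been extracted from the header; per RFC 3168 the ECN field is its two least-significant bits. The enum, struct and function names are hypothetical, not pping's.

```c
#include <stdint.h>

/* ECN code points: low two bits of the IPv4 TOS / IPv6 Traffic Class byte
 * (RFC 3168). Names are hypothetical, not taken from pping. */
enum ecn_codepoint {
	ECN_NOT_ECT = 0, /* 00 */
	ECN_ECT1 = 1,    /* 01 */
	ECN_ECT0 = 2,    /* 10 */
	ECN_CE = 3,      /* 11 */
};

struct ecn_counters {
	uint64_t count[4]; /* indexed by enum ecn_codepoint */
};

static enum ecn_codepoint ecn_from_tos(uint8_t tos)
{
	return (enum ecn_codepoint)(tos & 0x3);
}

/* Count one IP packet, given its TOS / Traffic Class byte. */
static void count_ecn(struct ecn_counters *c, uint8_t tos)
{
	c->count[ecn_from_tos(tos)]++;
}
```

The DSCP bits (the upper six) are simply masked away, so for example a packet marked with DSCP EF (TOS 0xb8) is counted as Not-ECT.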

Example standard entry (linebreaks not part of actual output):

19:32:40.224309565: non-IP=(pkts=6, bytes=252), UDP=(pkts=9,
bytes=495), ECN=(Not-ECT=4, ECT1=3, CE=2)

Example JSON entry:
{
  "timestamp": 1698082435757528300,
  "protocol_counters": {
    "non_IP": {
      "packets": 6,
      "bytes": 252
    },
    "UDP": {
      "packets": 9,
      "bytes": 495
    }
  },
  "ecn_counters": {
    "no_ECT": 4,
    "ECT1": 3,
    "CE": 2
  }
}

Originally planned to also include a counter for ECN-echo in the TCP
header. However, adding parsing of a TCP field for ALL TCP packets is
currently challenging due to the parsing of the TCP-header being
conditional. First off, the TCP-header will only be parsed if the
program is configured to capture TCP RTTs (can be disabled by passing
the -C/--icmp flag without the -T/--tcp flag). Second, parsing the
TCP-header is tied to parsing TCP timestamps, and the function for
parsing the TCP timestamps will signal failure regardless of whether it failed
to parse the TCP-header itself or just the TCP timestamps. Parsing of
a TCP field (like ECE) is thus only safe for the subset of packets
where TCP timestamps could successfully be parsed, which would create
misleading stats as the other ECN counters cover all IP-traffic.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-24 16:06:53 +02:00
b086b40567 pping: Document concurrency issue with global counters
Document a minor concurrency issue with the implementation of the
global counters reporting, which may result in the counters being
reported in an inconsistent state.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-24 15:32:55 +02:00
7ebf7d6125 pping: Add global per-protocol counters for aggregated output
Add global per-protocol counters for the aggregated output. These
counters include all the packets the eBPF program processes (even if
it cannot parse an IP-address, and thereby add it to the per-subnet
packet counts). Output the global counts at the end of every
aggregated report.

Example with standard output (linebreaks not part of output):

15:47:28.544011000: non-IP(pkts=6, bytes=252), TCP(pkts=88316,
bytes=3094356024), ICMP(pkts=3983, bytes=390110), 47(pkts=80)

Example with JSON output:
{
  "timestamp": 1697635992487286800,
  "protocol_counters": {
    "non_IP": {
      "packets": 4,
      "bytes": 168
    },
    "TCP": {
      "packets": 344633,
      "bytes": 16609641822
    },
    "ICMP": {
      "packets": 3960,
      "bytes": 388016
    },
    "47": {
      "packets": 60
    }
  }
}

Some implementation details:
Internally keep packet and byte counters for non-IP, TCP, UDP, ICMP
and ICMPv6, i.e. the "common protocols". To catch any other non-common
IP-protocol, keep an array of packet counters for every possible
IP-protocol [0, 255]. In the output, provide names for the common
protocols (e.g. "TCP"), while only outputting the protocol number of
non-common protocols. To avoid excessive output, only output counters
that are non-zero. This way, output is minimized while still allowing
for detecting unexpected (or even illegal) protocol numbers.
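
A rough user-space model of this layout, assuming hypothetical names (global_counters, count_packet, print_nonzero) that are not taken from pping. In this sketch only packets of non-common protocols go into the per-number array, and the printer emits non-zero entries in the style of the standard-output example above.

```c
#include <stdint.h>
#include <stdio.h>
#include <netinet/in.h> /* IPPROTO_TCP, IPPROTO_UDP, IPPROTO_ICMP, IPPROTO_ICMPV6 */

struct traffic_counters {
	uint64_t pkts;
	uint64_t bytes;
};

/* Hypothetical layout: packet and byte counters for the "common protocols",
 * plus a packet-only counter for every possible IP protocol number. */
struct global_counters {
	struct traffic_counters nonip, tcp, udp, icmp, icmp6;
	uint64_t other_ipproto[256];
};

static void count_packet(struct global_counters *g, int is_ip, uint8_t proto,
			 uint64_t len)
{
	struct traffic_counters *tc;

	if (!is_ip) {
		tc = &g->nonip;
	} else {
		switch (proto) {
		case IPPROTO_TCP: tc = &g->tcp; break;
		case IPPROTO_UDP: tc = &g->udp; break;
		case IPPROTO_ICMP: tc = &g->icmp; break;
		case IPPROTO_ICMPV6: tc = &g->icmp6; break;
		default:
			g->other_ipproto[proto]++; /* uncommon protocol: packets only */
			return;
		}
	}
	tc->pkts++;
	tc->bytes += len;
}

/* Print only non-zero counters: names for the common protocols, numbers for
 * the rest. Returns the number of entries printed. */
static int print_nonzero(FILE *out, const struct global_counters *g)
{
	const struct {
		const char *name;
		const struct traffic_counters *c;
	} common[] = {
		{ "non-IP", &g->nonip }, { "TCP", &g->tcp }, { "UDP", &g->udp },
		{ "ICMP", &g->icmp },    { "ICMPv6", &g->icmp6 },
	};
	int n = 0;

	for (size_t i = 0; i < sizeof(common) / sizeof(common[0]); i++) {
		if (common[i].c->pkts == 0)
			continue;
		fprintf(out, "%s%s(pkts=%llu, bytes=%llu)", n++ ? ", " : "",
			common[i].name, (unsigned long long)common[i].c->pkts,
			(unsigned long long)common[i].c->bytes);
	}
	for (int p = 0; p < 256; p++) {
		if (g->other_ipproto[p] == 0)
			continue;
		fprintf(out, "%s%d(pkts=%llu)", n++ ? ", " : "", p,
			(unsigned long long)g->other_ipproto[p]);
	}
	fputc('\n', out);
	return n;
}
```

Keeping bytes only for the handful of common protocols keeps the array of 256 counters small, while any unexpected protocol number still shows up as soon as its packet count is non-zero.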

Unlike the per-prefix stats, do not reset the global counters. Instead
keep a copy of the previous counts and calculate the difference in
user space to report the difference since the previous report. This
unsynchronized approach is simpler than the synchronized approach of
swapping between two instances of the map used by the per-prefix
stats, but may result in small inconsistencies (ex. the packet-count
and byte-count may mismatch in case the counters are fetched when an
eBPF program has updated the packet-counter but not the byte-counter).
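
The difference-based reporting can be sketched as follows; the names are hypothetical, and a plain struct stands in for the values read from the BPF map.

```c
#include <stdint.h>

struct traffic_counters {
	uint64_t pkts;
	uint64_t bytes;
};

/* Report what happened since the last report without resetting the
 * BPF-side counters: subtract the previous snapshot, then remember the new
 * one. Unsigned subtraction stays correct even if a counter wraps. */
static struct traffic_counters
counters_since_last(const struct traffic_counters *now,
		    struct traffic_counters *prev)
{
	struct traffic_counters delta = {
		.pkts = now->pkts - prev->pkts,
		.bytes = now->bytes - prev->bytes,
	};

	*prev = *now;
	return delta;
}
```

Because the two fields are read without synchronization, one delta can show a packet whose bytes only appear in the next delta, which is the small inconsistency described above; summed over consecutive reports, the totals still match.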

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-24 15:32:55 +02:00
9ebcc2a2f9 pping: Move packet_info to per-CPU map to save stack space
Additions in the coming commits increase the maximum stack space used
by the eBPF programs past the 512 byte limit (causing verifier
rejection). To avoid this, move the relatively large packet_info
struct to a single-entry per-CPU array map.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-20 12:17:46 +02:00
2315d792fa pping: Add additional per-prefix packet counters
When running in aggregated mode (-a/--aggregate), the previous
per-prefix packet and byte counters only included traffic that the RTT
was tracked for, i.e. by default only TCP traffic with TCP
timestamps (which a flowstate could be created for) was counted. This
makes it hard to correlate RTTs with traffic load, as the total
traffic load to/from a given prefix is not known.

Therefore, split up the per-prefix counters into 3 sets of counters:
- One for TCP traffic with timestamps, i.e. the TCP traffic we can
  track RTTs of
- One for TCP traffic without timestamps, i.e. TCP traffic we cannot
  track since RTT tracking relies on TCP timestamps
- One for non-TCP traffic, which when combined with the other counters
  gives the total amount of (IP) traffic going to/from a prefix

Do NOT create NEW prefix entries for traffic which the RTT cannot be
tracked for. This means that if some prefix only sees traffic of a
type that RTTs cannot be captured for, its traffic will be counted in
the global /0 backup entries.

To keep the standard output somewhat manageable (it is already quite
wide), only output the total packet and byte counts for the traffic
to/from the prefix. For the JSON format, output the counters for each
individual set (TCP_TS, TCP_noTS, and other) which are non-empty.

Example standard entry after update (same as before update, linebreaks
not part of actual output):

14:42:10.451929078: 10.11.1.10/32 -> rxpkts=4303, rxbytes=347742,
txpkts=37658, txbytes=1888184076, rtt-count=1202, min=0.006963 ms,
mean=2 ms, median=2 ms, p95=2 ms, max=3.10063 ms

Example JSON entry before update:

{
  "timestamp": 1697638197074346000,
  "ip_prefix": "10.11.1.10/32",
  "rx_packets": 2495,
  "tx_packets": 12121,
  "rx_bytes": 164670,
  "tx_bytes": 601306338,
  "count_rtt": 743,
  "min_rtt": 7717,
  "mean_rtt": 2021530,
  "median_rtt": 2000000,
  "p95_rtt": 2000000,
  "max_rtt": 4985117,
  "histogram": [
    739,
    4
  ]
}

Example JSON entry after update:

{
  "timestamp": 1697635990442789000,
  "ip_prefix": "10.11.1.10/32",
  "rx_stats": {
    "TCP_TS": {
      "packets": 1458,
      "bytes": 96232
    },
    "TCP_noTS": {
      "packets": 1,
      "bytes": 74
    },
    "other": {
      "packets": 1874,
      "bytes": 183460
    }
  },
  "tx_stats": {
    "TCP_TS": {
      "packets": 17270,
      "bytes": 905414662
    },
    "TCP_noTS": {
      "packets": 1,
      "bytes": 74
    },
    "other": {
      "packets": 1898,
      "bytes": 184204
    }
  },
  "count_rtt": 629,
  "min_rtt": 7775,
  "mean_rtt": 2038160,
  "median_rtt": 2000000,
  "p95_rtt": 2000000,
  "max_rtt": 13431771,
  "histogram": [
    627,
    0,
    0,
    2
  ]
}

This commit will considerably increase the overhead for traffic types
that RTT isn't tracked for compared to before. Previously, the eBPF
programs would abort early as soon as they discovered that the packet
was of a type which they couldn't track RTT for. Now, all IP packets
will have their IP-address processed and later used to look up the
relevant prefixes to update the packet counters for. However, the
overhead for packets that the RTT can't be tracked for should still be
considerably lower than for packets the RTT can be tracked for, and the
overhead for packets the RTT can be tracked for should not increase
much from previously.

Potential bug/issue: if the program is configured to NOT track RTTs
for TCP traffic (by using the -C/--icmp flag without the -T/--tcp
flag), the program will not parse the TCP header and thus be unable to
detect if it contains TCP timestamps. Therefore, all TCP packets will
then be counted as TCP packets without timestamps, regardless of
whether they have timestamps or not.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-20 12:17:46 +02:00
d13f429907 pping: Reverse the interpretation of rx/tx for aggregated stats
For the aggregated stats, report RX and TX from the perspective of the
capture point, instead of the perspective of the subnet.

Consider the following setup, consisting of subnet A, the capture
point (CP) where we're running ePPing, and subnet B.

A <-----> CP <-----> B

Now consider that we have a TCP stream uploading data from A to B, so
that we can capture RTTs between when the data packet from A reaches
CP to when the ACK from B gets back to the CP, i.e. CP -> B -> CP.

Previously, the RX stats for a subnet referred to packets received by
the subnet, i.e. packets with dst address in the subnet. Likewise, TX
packets were packets transmitted by the subnet, i.e. packets with src
address in the subnet. So the data packet from A -> B would be
reported as TX for subnet A and RX for subnet B.

However, the RTTs are by default (can be changed by the
--aggregate-reverse flag) aggregated from the perspective of the
capture point, so that the RTT CP -> B -> CP would be reported as an
RTT observed for subnet B.

Make the TX and RX stats consistent with the RTT, so that all subnet
stats are from the perspective of the CP. Make RX refer to packets the
CP has received from the subnet, i.e. packets with src in the subnet, and TX
refer to packets the CP has transmitted to the subnet, i.e. packets
with dst in the subnet. So report a data packet from A -> B as RX for
subnet A and TX for subnet B.
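
A minimal sketch of the new accounting rule for IPv4, using host-byte-order addresses and hypothetical helper names: a packet counts as RX for the subnet containing its source and as TX for the subnet containing its destination.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does addr (host byte order) fall within prefix/len? len is 0..32. */
static bool in_prefix_v4(uint32_t addr, uint32_t prefix, int len)
{
	uint32_t mask = len == 0 ? 0 : ~0u << (32 - len);

	return (addr & mask) == (prefix & mask);
}

struct subnet_stats {
	uint64_t rx_pkts; /* received by the CP from the subnet (src in subnet) */
	uint64_t tx_pkts; /* transmitted by the CP to the subnet (dst in subnet) */
};

/* Capture-point perspective; a packet with both src and dst inside the
 * subnet is counted as both RX and TX. */
static void account_packet(struct subnet_stats *s, uint32_t prefix, int len,
			   uint32_t saddr, uint32_t daddr)
{
	if (in_prefix_v4(saddr, prefix, len))
		s->rx_pkts++;
	if (in_prefix_v4(daddr, prefix, len))
		s->tx_pkts++;
}
```

In the A -> B example, with A = 10.0.0.0/24 and B = 10.0.1.0/24, a data packet 10.0.0.5 -> 10.0.1.7 increments rx_pkts for A and tx_pkts for B.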

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-20 12:17:46 +02:00
a7972a6f0f Refactor proto_to_str() function to be thread-safe
Refactor the proto_to_str() function to write the protocol string to a
provided buffer instead of providing a pointer to a static
buffer. This makes it possible to safely use the function in
multi-threaded contexts or call it multiple times before printing the
returned value.

Also rename it to ipproto_to_str(), to clarify that it's for IP
protocols.
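
A sketch of the caller-provided-buffer pattern. The exact signature used in pping may differ, so treat the one below as an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <netinet/in.h> /* IPPROTO_* */

/* Write a printable name for an IP protocol number into a caller-provided
 * buffer and return that buffer. No static storage is shared, so two
 * threads (or two calls within one printf) cannot overwrite each other. */
static const char *ipproto_to_str(int proto, char *buf, size_t size)
{
	switch (proto) {
	case IPPROTO_TCP:
		snprintf(buf, size, "TCP");
		break;
	case IPPROTO_UDP:
		snprintf(buf, size, "UDP");
		break;
	case IPPROTO_ICMP:
		snprintf(buf, size, "ICMP");
		break;
	case IPPROTO_ICMPV6:
		snprintf(buf, size, "ICMPv6");
		break;
	default: /* fall back to the numeric protocol */
		snprintf(buf, size, "%d", proto);
		break;
	}
	return buf;
}
```

With separate buffers, `printf("%s -> %s\n", ipproto_to_str(6, a, sizeof(a)), ipproto_to_str(17, b, sizeof(b)))` prints both names correctly, whereas a single static buffer would make both arguments point at whichever string was written last.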

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-20 12:16:23 +02:00
0200196244 pping: Rename aggregated rtts to aggregated stats
The (per-subnet) aggregated stats already include packet byte counts,
so it is not strictly only RTTs. Future commits will further extend
the non-RTT related statistics that are aggregated. Therefore, rename
structs, functions and parameters from "aggregated_rtts" to
"aggregated_stats".

To clarify which members of the aggregated_rtt_stats struct (now
renamed to aggregated_stats) are related to the RTT, prefix
their names with "rtt_", e.g. "min" -> "rtt_min".

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-10-17 16:23:51 +02:00
8e7495f553 Merge pull request #99 from tirthendu-intel/xdpsock_mb_upstream
AF_XDP-example: add multi-buffer support to xdpsock
2023-09-26 10:17:43 +02:00
a85ef7c2b5 fix compilation error for arm64
Signed-off-by: Sachin Tiptur <coolsachints@gmail.com>
2023-09-21 23:27:28 +02:00
66c0394d7c xdpsock: add rx/tx counters for frags
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
2023-09-21 10:46:31 +02:00
f63c7633cc headers/linux: update if_xdp.h from kernel v6.5.0+
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
2023-09-21 08:32:53 +02:00
e5a12a2a72 AF_XDP-example: add multi-buffer support to xdpsock
* Add support for handling multi-buffer packets.
* Add a new CLI option to enable frag support.
* xdpsock_kern.c is modified to use num_socks as updated by userspace
  application.
* MAX_PKT_SIZE is set as 9728 as supported by many NICs.
* xdpsock_kern.o is loaded for both frags and shared umem cases.

Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
2023-09-11 18:38:17 +02:00
bc9df640cb Merge pull request #91 from simosund/pping-systemd
Add systemd unit files for ePPing setup
2023-08-07 15:48:29 +02:00
f423e39d6b pping: Change single pping-service into generic template
Replace the systemd unit files that needed to be modified for a
specific interface with template files. The template files allow one
to instantiate a service for any interface (by running systemctl
start pping@<interface>.service), and multiple interfaces can be
monitored at once.

Each instance maintains a separate "log" of data at
/sys/var/log/pping/<interface>/pping.<interface>.json which is rotated
once per minute (see the rotate-pping@.timer file) and placed in daily
subfolders.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-08 18:05:31 +02:00
c6751effb0 pping: Add script for cleaning up leftover tc programs
In case ePPing is not shut down cleanly (ex. when killed with SIGKILL
or by the OOM killer), it will not be able to detach its eBPF programs,
which may then remain attached and waste resources. Add a script which
can be used to clean up any remaining tc-eBPF programs.

Note: this script should not be run while any instance of pping is
still running, as it will remove that instance's tc programs and thus
its ability to function properly.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-08 16:17:24 +02:00
6582f6713c pping: Add systemd unit files for running pping
Add some example files for setting up ePPing with systemd.
The setup creates "log" files in /var/log/pping and rotates
them every minute (appending a date at the time of rotation).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-08 16:17:24 +02:00
4b677fd715 Merge pull request #59 from simosund/pping-agg-prototype
Add aggregation option to ePPing
2023-07-07 16:26:06 +02:00
59310e8ead pping: Preallocate memory for aggregation maps
When maps are not preallocated, the creation of map entries may
sometimes unpredictably fail with ENOMEM, despite plenty of free
memory being available. Solving this memory allocation issue may take
some time, so in the meantime let's just preallocate the memory for
the aggregation maps as well.

Preallocating the maps means the memory usage will be the same
regardless of the amount of traffic actually observed (i.e. regardless
of the number of aggregation entries that need to be created). To
compensate for this higher out-of-the-box memory usage, decrease the
histogram resolution from 1000 1ms bins to 250 4ms bins.

The memory usage (for the aggregation maps) should be approximately:
(56 + NR_BINS * 4) * CPUS * MAP_AGGREGATION_SIZE * 4

With the current values, that translates to roughly 66 MiB per CPU
core (down from ~254 MiB/core with 1000 bins).
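
The estimate above as a small C function. MAP_AGGREGATION_SIZE is not given in this message; 16384 is assumed here only because it reproduces both quoted figures, and the interpretation of the trailing factor of 4 is a guess.

```c
#include <stdint.h>

/* Approximate memory for the preallocated aggregation maps, following
 *   (56 + NR_BINS * 4) * CPUS * MAP_AGGREGATION_SIZE * 4
 * i.e. 56 bytes of per-entry state plus one 4-byte counter per histogram
 * bin, one copy per CPU, and a final factor of 4 (presumably IPv4 + IPv6,
 * each with an active and an inactive map). */
static uint64_t agg_maps_bytes(uint64_t nr_bins, uint64_t cpus,
			       uint64_t map_size)
{
	return (56 + nr_bins * 4) * cpus * map_size * 4;
}
```

Under that assumption, agg_maps_bytes(250, 1, 16384) gives 69206016 bytes (exactly 66 MiB) and agg_maps_bytes(1000, 1, 16384) gives 265814016 bytes (about 253.5 MiB), matching the per-core numbers above.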

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
aadc7535c1 pping: Reopen output file on SIGHUP
Make the user space process reopen the output file (if used with the
-w/--write option) when it receives a SIGHUP signal. This makes it
possible to, for example, rotate the output files with logrotate.

If the program receives the SIGHUP signal BEFORE the output
file has been moved, it will emit a warning and then
continue writing to its current file handle until another SIGHUP
signal is received.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
965d34ffae pping: Add option to write to file
Add an option -w/--write <filename> which writes the output to the
provided file instead of to stdout. Fail if the provided file
already exists to avoid data loss (if truncating the file) or corrupting
data (if appending to the file, as JSON is not concatenable).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
1e704be790 pping: Refactor output handling
Refactor code for various output functions to avoid hard-coding stdout
and relying on global variables. Collect output-related parameters
into a new output_context struct which can easily be passed as an
explicit argument to output related functions. Get rid of the global
json_ctx and print_event_func variables, as corresponding information
is now included in the output_context which is directly passed to the
required functions.

Overall, these changes aim to make it more flexible how and where
output is written. This will make it easier to for example add support
for writing directly to files in the future.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
1319871358 pping: Add aggregation configuration to output
Add an initial entry containing information about the settings used
for the aggregation. For the standard output format it reuses the
message that was previously written to stderr (but is now written to
stdout), while for the json format it adds a more detailed json entry.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
46f5913e6f pping: Truncate aggregation histograms
In many scenarios, the upper range of an aggregation histogram may be
empty (ex. max histogram bins is for 1000ms but highest observed RTT
was 100 ms, leaving 900 trailing empty bins). As trailing empty bins
contain no useful information, simply truncate the histograms to the
highest non-empty bin.
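
Finding the truncation point is a single backward scan; a minimal sketch with a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of leading bins to keep: everything up to and including the
 * highest non-empty bin. An all-zero histogram truncates to length 0. */
static size_t histogram_truncated_len(const uint32_t *bins, size_t nr_bins)
{
	size_t n = nr_bins;

	while (n > 0 && bins[n - 1] == 0)
		n--;
	return n;
}
```

For the bins [627, 0, 0, 2, 0, 0] it returns 4, so the histogram would be emitted as [627, 0, 0, 2]; the empty bins in the middle are kept, which is why no extra index information is needed.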

The truncation of histograms has two benefits.

1. It avoids unnecessary processing of empty bins when internally
calculating statistics from the histograms. This should not have any
impact on the output.

2. It reduces the size of the histogram in the JSON output
format. This can potentially save a lot of space in instances where
the maximum observed RTT for a prefix during an aggregation interval
is significantly lower than the highest histogram bin. Removing
trailing empty bins (unlike non-trailing ones) does not require
encoding any additional information (like the number of removed bins
or the index of the remaining ones). It can also never make the
histogram take up more space. Thus there are no obvious drawbacks with
"compressing" the histograms in this manner.

In the future it may be relevant to implement other ways to compress
the histograms, which may be more efficient for certain
distributions (ex. very sparse histograms). However, as this method of
removing trailing empty bins is both simple and without drawbacks,
it makes sense to make it the default behavior for now.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
e9db312ad5 pping: Add fallback entry for aggregation maps
The aggregation maps may become full, in which case the BPF programs
will fail to create new entries for any IP-prefixes not currently in
the map. Previously, this would result in stats from traffic that
cannot fit into any aggregation entry being missed.

Add a fallback entry for each map, so that in case the aggregation map
is full, stats from any new IP-prefix will be added to this fallback
entry instead. The fallback entry is reported as 0.0.0.0/0 (for IPv4)
or ::/0 (for IPv6) in the output.

Note that this is implemented by adding specific "magic values" as
special keys in the aggregation maps (see IPV{4,6}_BACKUP_KEY in
pping.h). These special keys have been selected so that no real
traffic should collide with them by using prefixes from blocks
reserved for documentation. Furthermore, these entries are added by
the user space program AFTER the BPF programs are attached (as it's
not possible to do it in-between loading and attaching when using
libxdp). In case the BPF programs manage to fill the maps before the
user space component can create the backup entries, it will fail and
abort the program.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
2224edf85e pping: Add packet and byte counts to aggregated output
In addition to RTTs, also aggregate the number of packets and bytes
transmitted and received for each IP-prefix. If both the src and dst
IP addresses of a packet are within the same IP-prefix, it will be
counted as both sent to and received by that prefix.

The packet stats are added for all successfully parsed
packets (i.e. packets that contain a valid identifier of some sort),
regardless of whether the packet actually produces a valid RTT sample. This
means some IP-prefixes may only have packet stats, and no RTT stats,
so only output the packet stats in those instances. From a performance
perspective, it also means each BPF program needs to perform two
lookups of the aggregation map (one for src IP and one for dst IP) for
every packet that is successfully parsed. This is a substantial
increase from only having to perform a single lookup on the subset of
packets that produce an RTT sample.

Packets that are not successfully parsed (i.e. they don't contain a
valid identifier, e.g. UDP traffic) are still ignored to minimize
overhead, and will therefore not be included in the aggregated packet
stats. This means the aggregated packet stats may not include all
traffic for an IP-prefix. Future commits may add some counters to also
account for traffic that is not fully parsed.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
e5b6c55f42 pping: Add pkt_len to packet_info
Add a field with the total packet length (including all headers) to
the packet_info struct. This information will be needed in later
commits which add byte counts to the aggregated information.

Note that this information is already part of the parsing_context
struct, but this won't be available after the packet has been
parsed (once the parse_packet_identifier_{tc,xdp}() functions have
finished). It is unfortunately not trivial to replace current instances
which use pkt_len from the parsing_context to instead take it from
packet_info, as ex. the parse_tcp_identifier() already takes 5
arguments, and packet_info is not one of them. Therefore, keep both
the pkt_len in parsing_context and packet_info for now.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
0f6042bf0c pping: Expire old aggregation prefixes
Keep track of when the last update was made to each IP-prefix in the
aggregation map, and delete entries which are older than
--aggregate-timeout (30 seconds by default). If the user specifies
zero (0), that is interpreted as never expire an entry (which is
consistent with how the --cleanup-interval operates).
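
The expiry rule, including the special meaning of 0, fits in a small predicate. Nanosecond timestamps and the function name are assumptions for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* An aggregation entry is stale when it has not been updated for longer
 * than the timeout. A timeout of 0 means "never expire". A last-update time
 * that lies after `now` is treated as fresh. All values in nanoseconds. */
static bool agg_entry_expired(uint64_t last_update_ns, uint64_t now_ns,
			      uint64_t timeout_ns)
{
	if (timeout_ns == 0)
		return false;
	return now_ns > last_update_ns && now_ns - last_update_ns > timeout_ns;
}
```

With the default 30 s timeout, an entry last touched 31 s ago is expired and one touched 29 s ago is kept.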

Note that as the BPF programs rotate between two maps (an active one
for BPF progs to use, and an inactive one the user space can operate
on), it may expire an aggregation prefix from one of the maps even if
it has seen recent action in the other map.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
ec92f5a91f pping: Add JSON format for aggregation
Add support for outputting the aggregated reports in JSON format. This
format includes the raw histogram bin counts, making it possible to
post-process the aggregated rtt statistics.

The user specifies the format for the aggregated output in the same
way as for the per-RTT output, by using the -F/--format argument. If
the user attempts to use the ppviz format for the aggregated
output (which is not supported) the program will error out.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
5a8eb8748a pping: Always initialize JSON array
Start the JSON array when the program starts (if configured
to use JSON format) instead of at the first report. This ensures that
ePPing provides valid JSON output (an empty array, []) even if the
program is stopped before any report is generated. Before this change,
ePPing could generate empty output (""), which is not valid JSON
output, if it was stopped before the first report.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
989905e870 pping: Improve aggregated output format
Provide some statistics (min, mean, median, p95, max) instead of
dumping the raw bin counts.
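
Since the minimum and maximum RTT are tracked exactly, only the percentiles need to be estimated from the bins. One common approach, assumed here rather than taken from pping, is to return the midpoint of the bin that contains the requested rank; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Estimate the p-th percentile (0 < p <= 100) from equal-width bins by
 * locating the bin that holds the sample of rank ceil(p/100 * total) and
 * returning that bin's midpoint. Returns 0 for an empty histogram. */
static double histogram_percentile(const uint32_t *bins, size_t nr_bins,
				   double bin_width, double p)
{
	uint64_t total = 0, seen = 0;

	for (size_t i = 0; i < nr_bins; i++)
		total += bins[i];
	if (total == 0)
		return 0.0;

	double rank = p / 100.0 * (double)total; /* sample rank we look for */
	for (size_t i = 0; i < nr_bins; i++) {
		seen += bins[i];
		if ((double)seen >= rank)
			return ((double)i + 0.5) * bin_width;
	}
	return ((double)nr_bins - 0.5) * bin_width; /* not reached for valid p */
}
```

With 4 ms bins and the histogram [627, 0, 0, 2], the median and p95 both come out as 2 ms and p100 as 14 ms, so the estimate is only as precise as the bin width.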

While the raw bin counts provide more information and can be used for
further post processing, they are hard for a human to parse and make
sense of. Therefore, they are more suitable for a data-oriented
format, such as the JSON output.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
a301900fbd pping: Add switch for which IP stats are aggregated by
By default ePPing will aggregate RTTs based on the src IP of the reply
packet. I.e. the RTT A->B->A will be aggregated based on IP of B. In
some scenarios it may be more interesting to aggregate based on the
dst IP of the reply packet (IP of A in above example). Therefore, add
a switch (--aggregate-reverse) which makes ePPing aggregate RTTs
based on the dst IP of the reply packet instead of the src IP. In
other words, by default ePPing will aggregate traffic based on where
it's going to, but with this switch you can make ePPing aggregate
traffic based on where it's coming from instead.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
5ef4ffdd1b pping: Reset aggregated RTTs after each report
Instead of keeping all RTTs since ePPing started, reset the aggregated
stats after each time they're reported so the report only shows the
RTTs since the last report.

To avoid concurrency issues due to user space reading and resetting
the map while the BPF programs are updating it, use two aggregation
maps, one active and one inactive. Each time user space wishes to
report the aggregated RTTs it first switches which map is actively
used by the BPF progs, and then reads and resets the now inactive map.
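
The active/inactive scheme in miniature. This is a single-threaded toy with hypothetical names, where plain arrays stand in for the two BPF maps; it shows only the order of operations (switch, then read, then reset) and does not model concurrent updates or memory ordering.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_BINS 4 /* hypothetical, small for illustration */

struct agg_stats {
	uint32_t bins[TOY_NR_BINS];
};

/* Writers always update maps[active]; the reporter flips `active`, then
 * reads and clears the map that writers no longer touch. */
struct agg_double_buffer {
	struct agg_stats maps[2];
	int active;
};

static void record_rtt_bin(struct agg_double_buffer *db, unsigned int bin)
{
	if (bin >= TOY_NR_BINS)
		bin = TOY_NR_BINS - 1; /* clamp overly large RTTs to the last bin */
	db->maps[db->active].bins[bin]++;
}

static struct agg_stats report_and_reset(struct agg_double_buffer *db)
{
	int inactive = db->active;

	db->active = !db->active;                    /* 1. switch writers */
	struct agg_stats snap = db->maps[inactive];  /* 2. read old map */
	memset(&db->maps[inactive], 0, sizeof(db->maps[inactive])); /* 3. reset */
	return snap;
}
```

Each report therefore covers exactly the updates made since the previous switch, and the map being cleared is never the one new samples are written to.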

As the RTT stats are now periodically reset, change the
histogram (aggregated_rtt_stats.bins) to use __u32 instead of __u64
counters as the risk of overflowing is low (even if 1 million RTTs/s is
added to the same bin, it would take over an hour to overflow, and
report frequency is likely higher than that).
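
The overflow estimate as plain arithmetic (hypothetical helper name):

```c
#include <stdint.h>

/* Seconds until a 32-bit counter wraps at a constant increment rate. */
static double u32_overflow_seconds(double increments_per_sec)
{
	return (double)UINT32_MAX / increments_per_sec;
}
```

At 10^6 increments per second this gives about 4295 seconds, roughly 72 minutes.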

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
3a7b15ab3e pping: Add option to aggregate RTTs
Add an option -a or --aggregate to provide an aggregate report of RTT
samples every X seconds. This is currently mutually exclusive with the
normal per-RTT sample reports.

The aggregated stats are never reset, and thus contain all RTTs since
the start of tracing. The next commit will change this to reset the
stats after every report, so that each report only contains the RTTs
since the last report.

The RTTs are aggregated and reported per IP-prefix, where the user can
modify the size of the prefixes used for IPv4 and IPv6 using the
--aggregate-subnet-v4/v6 flags.

In this initial implementation for aggregating RTTs, the minimum and
maximum RTT are tracked and all RTTs are added to a histogram. It uses
a predetermined number of bins of equal width (set to 1000 bins, each
1 ms wide), see RTT_AGG_NR_BINS and RTT_AGG_BIN_WIDTH in pping.h. In
the future this could be changed to use more sophisticated histograms
that better capture a wide variety of RTTs.

Implement the periodic reporting of RTTs by using a
timerfd (configured to the user-provided interval) and add it to the
main epoll-loop.

To minimize overhead from the hash lookups, use separate maps for IPv4
and IPv6, so that for IPv4 traffic the hashmap key is only 4
bytes (instead of 16). Furthermore, limit the maximum IPv6 prefix size
to 64 so that the IPv6 map can use an 8-byte key. This limits the
maximum prefix size for IPv6 to /64.
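
A sketch of how such compact keys could be derived, assuming the prefix length has already been validated (0 to 32 for IPv4, 0 to 64 for IPv6). The function names are hypothetical, and the keys are kept in network byte order.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* IPv4: keep the top prefix_len bits of the 4-byte address. */
static uint32_t ipv4_prefix_key(struct in_addr addr, int prefix_len)
{
	uint32_t mask = prefix_len == 0 ? 0 : htonl(~0u << (32 - prefix_len));

	return addr.s_addr & mask;
}

/* IPv6: only the upper 64 bits can ever be part of the prefix, so the key
 * is the first 8 bytes of the address with the host bits cleared. */
static uint64_t ipv6_prefix_key(const struct in6_addr *addr, int prefix_len)
{
	uint8_t bytes[8] = {0};
	uint64_t key;

	for (int i = 0; i < 8; i++) {
		int bits = prefix_len - 8 * i; /* prefix bits left for byte i */

		if (bits >= 8)
			bytes[i] = addr->s6_addr[i];
		else if (bits > 0)
			bytes[i] = addr->s6_addr[i] & (uint8_t)(0xff << (8 - bits));
	}
	memcpy(&key, bytes, sizeof(key)); /* raw bytes, wire order */
	return key;
}
```

Because only the first 8 bytes of the IPv6 address are ever stored, a prefix longer than /64 could not be represented in this key, which is where the /64 limit comes from.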

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
149e2c6d90 pping: Define map sizes
Instead of specifying the map size directly in the map definitions,
add them as defines at the top of the file to make them easier to
change (don't have to find the correct map among the map
definitions). This pattern will also simplify future additions of
maps, where multiple maps may share the same size.

While at it, increase the default map size to 131072 (2^17) entries,
as the previous value of 16384 (2^14) was fairly underdimensioned,
especially for the packet_ts map. If only half of the
timestamps are echoed back (due to ex. delayed ACK), it would in
theory be enough with just 16k / (500 * 1) = 32 concurrent flows to
fill it up with stale entries (assuming default cleanout interval of
1s). Increasing the size of these maps will increase the corresponding
memory cost from 2^14 * (48 + 4) = 832 KiB and 2^14 * (44 + 144) =
2.94 MiB to 2^17 * (48 + 4) = 6.5 MiB and 2^17 * (44 + 144) = 23.5
MiB, respectively, which should generally not be too problematic.
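
The sizing arithmetic above, reproduced as code. The figure of 500 stale timestamps per flow per second and the per-entry sizes come from the message; the helper names are hypothetical.

```c
#include <stdint.h>

/* Flows needed to fill a map with stale entries, given how many stale
 * entries each flow leaves per second and the cleanup interval (seconds). */
static uint64_t flows_to_fill(uint64_t map_entries, uint64_t stale_per_sec,
			      uint64_t cleanup_interval_s)
{
	return map_entries / (stale_per_sec * cleanup_interval_s);
}

/* Memory of a preallocated map: entries times per-entry bytes. Like the
 * estimate above, this ignores per-entry kernel bookkeeping overhead. */
static uint64_t map_mem_bytes(uint64_t map_entries, uint64_t entry_bytes)
{
	return map_entries * entry_bytes;
}
```

For example, flows_to_fill(1u << 14, 500, 1) is 32, and map_mem_bytes(1u << 17, 44 + 144) is 24641536 bytes, i.e. 23.5 MiB.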

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
699e8e839e pping: Improve cross-thread abort handling
Remove the global keep_running variable and instead write to a pipe to
tell the main thread to abort in case periodical map cleanup
fails. Add the reading-side of this pipe to the epoll loop in the main
thread and abort if anything is written to the pipe. To abort the main
thread, update the main loop so it silently stops if it receives the
special value PPING_ABORT.

As the map cleaning thread can now immediately tell the main loop to
abort, it is no longer necessary to have a short
timeout (EPOLL_TIMEOUT_MS) on the main loop to quickly detect changes in
the keep_running flag. So change the epoll loop to wait indefinitely
for one of the fds to update instead of timing out frequently.
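
A stripped-down sketch of the pipe-based abort path, using hypothetical names; LOOP_ABORT merely stands in for PPING_ABORT, whose actual value is not shown here.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define LOOP_ABORT 1 /* hypothetical stand-in for pping's PPING_ABORT */

/* Create the abort pipe and watch its read end from the main epoll loop. */
static int setup_abort_pipe(int epfd, int fds[2])
{
	if (pipe(fds) < 0)
		return -1;

	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev);
}

/* Called from another thread (e.g. the map-cleanup thread) on failure. */
static int request_abort(int pipe_wr)
{
	char c = 1;

	return write(pipe_wr, &c, 1) == 1 ? 0 : -1;
}

/* One main-loop iteration: block indefinitely (timeout -1) until an fd is
 * ready. Returns LOOP_ABORT if the abort pipe fired, 0 for other events,
 * -1 on error. */
static int main_loop_step(int epfd, int abort_rd)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, -1);

	if (n < 0)
		return -1;
	if (ev.data.fd == abort_rd)
		return LOOP_ABORT;
	/* ... handle signalfd, perf buffer, etc. here ... */
	return 0;
}
```

Because epoll_wait() is called with a timeout of -1, the main thread sleeps until there is real work, yet still wakes immediately when the cleanup thread writes to the pipe.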

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:30 +02:00
12dc63b965 pping: Use signalfd instead of signalhandler
Use the signalfd API to handle graceful shutdown on SIGINT/SIGTERM. To
watch the signalfd, create an epoll instance and add both the signalfd
and the perf-buffer to the epoll instance so that both can be
monitored in the main loop with epoll_wait().

This prevents the signal handler from interrupting the perf-buffer
polling and avoids the other issues with asynchronous signal
handling. Furthermore, the restructuring of the main loop to support
watching multiple file descriptors makes it possible to add additional
events to the main loop in the future (such as a periodical task
triggered by a timerfd).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-04 16:10:49 +02:00
1cb4e93b04 pping: Fix edge cases of parse_bounded_double()
Fix two edge cases with the parse_bounded_double() function.

1. It accepted an empty string without raising an error. This should not
   have been an issue in practice as getopt_long() should have detected
   it as a missing argument. This is addressed by adding a check for
   whether it has parsed anything at all.

2. It could overflow/underflow without raising an error. This is
   addressed by adding a check of errno (which is set in case of
   overflow/underflow, but not in case of conversion error).
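
A sketch covering both fixes, plus checks for trailing garbage and bounds. The signature is hypothetical, and pping's own function may differ.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a double and require lo <= value <= hi.
 * Returns 0 and stores the value on success, -1 on any error. */
static int parse_bounded_double(const char *str, double lo, double hi,
				double *out)
{
	char *end;

	errno = 0;
	double val = strtod(str, &end);

	if (end == str)         /* nothing parsed, e.g. "" or "abc" */
		return -1;
	if (*end != '\0')       /* trailing garbage, e.g. "0.5x" */
		return -1;
	if (errno == ERANGE)    /* overflow or underflow */
		return -1;
	if (!(val >= lo && val <= hi)) /* written this way to also reject NaN */
		return -1;

	*out = val;
	return 0;
}
```

strtod() sets errno only for range errors; an unparsable string is detected solely through the end pointer, which is why both checks are needed.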

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 16:30:56 +02:00
83a85adb96 pping: Minor cleanup of argument parsing
The parse_arguments() function used to have a separate variable for
each float (or rather a double) value it would parse from the user. As
only one argument is parsed at a time, this is redundant and would
require more and more variables as new options are added. Replace all
these variables with a single "user_float", which is used for all
options that parse a float from the user.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 16:30:02 +02:00
22ac4d9192 pping: Factor out sending of RTT event
Extract the logic for filling in and sending an RTT event to a
function. This makes it consistent with other send_*_event() functions
and will make it easier/cleaner to add an option to aggregate the RTT
instead of sending it.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 13:17:41 +02:00