Commit Graph

32 Commits

Author SHA1 Message Date
Simon Sundberg
2e7595b3ca pping: Replace boolean connection state flags with enum
The connection state had 3 boolean flags related to what state it was
in (is_empty, has_opened and has_closed). Only specific combinations
of these flags really made sense (has_opened/has_closed didn't really
mean anything if is_empty, and if has_closed one would expect is_empty
to be false and has_opened to be true etc.). Therefore, replace these
combinations of boolean values with a singular enum which is used to
check if the flow is empty, waiting to open (seen outgoing packet but
no response), is open or has closed.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-30 18:12:46 +02:00
Simon Sundberg
4fb1f6de64 pping: Combine flow state in each direction to a dualflow state
Combine the flow state entries for both the "forward" and "reverse"
direction of the flow into a single dualflow state. Change the
flowstate map to use the dualflow state so that state for both
directions can be retrieved using a single map lookup.

As flow states are now kept in pairs, cannot directly create/delete
states from the BPF map each time a flow opens/closes in one
direction. Therefore, update all logic related to creating/deleting
flows. For example, use "empty" slot in dualflow state instead of
creating a new map entry, and only delete the dual flow state entry
once both directions of the flow have closed/timed out.

Some implementation details:

Have implemented a simple memcmp function as I could not get the
__builtin_memcmp function to work (got error "libbpf: failed to
find BTF for extern 'memcmp': -2").

To ensure that both directions of the flow always look up the same
entry, use the "sorted" flow tuple (the (ip, port) pair that is
smaller is always first) as key. This is what the memcmp is used for.

To avoid storing two copies of the flow tuple (src -> dst and dst ->
src) and doing additional memcmps, always store the flow state for the
"sorted" direction as the first direction and the reverse as the
second direction. Then simply check if a flow is sorted or not to
determine which direction in the dual flow state that matches. Have
attempted to at least partially abstract this detail away from most of
the code by adding some get_flowstate_from* helpers.

The dual flow state simply stores the two (single direction) flow
states as the struct members dir1 and dir2. Use these two (admittedly
poorly named) members instead of a single array of size 2 in order to
avoid some issues with the verifier being worried that the array index
might be out of bounds.

Have added some new boolean members to the flow state to keep track of
"connection state". In addition the the previous has_opened, I now
also have a member for if the flow is "empty" or if it has been
closed. These are needed to cope with having to keep individual flow
states for both directions of the flow around as long as one direction
of the flow is used. I plan to replace these boolean "connection
state" members with a single enum in a future commit.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-30 18:12:46 +02:00
Simon Sundberg
75a979fc31 pping: Refactor parsing of packet identifiers
Refactor functions for parsing protocol-specific packet
identifiers (parse_tcp_identifier, parse_icmp6_identifer and
parse_icmp_identifer) so they no longer directly fill in the
packet_info struct. Instead make the functions take additional
pointers as arguments and fill in a protocol_info struct.

The reason for this change is to decouple the
parse_<protocol>_identifier functions from the logic of how the
packet_info struct should be filled. The parse_packet_indentifier is
now solely responsible for filling in the members of packet_info
struct correctly instead of working in tandem with the
parse_<protocol>_identifier, filling in some members each.

This might result in a minimal performance degradation as some values
are now first filled in the protocol_info struct and later copied to
packet_info instead of being filled in directly in packet_info.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-22 13:24:09 +01:00
Simon Sundberg
2935fb05cc pping: Minor formating fixes
Format code using clang-format from the kernel tree. However, leave
code in orginal format in some instances where clang-format clearly
reduces readability of code (ex. do not remove alginment of comments
for struct members and long options).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-22 09:11:53 +01:00
Simon Sundberg
11f1d88742 pping: Keep track of outstanding timestamps
Add a counter of outstanding (unmatched) timestamped entires in the
flow state. Before a timestamp lookup is attempted, check that there
are any outstanding timestamps, otherwise avoid the unecessary hash
map lookup.

Use 32 bit counter for outstanding timestamps to allow atomic
increments/decrements using __synch_fetch_and_add. This operation is
not supported on smaller integers, which is why such a large counter
is used. The atomicity is needed because the counter may be
concurrently accessed by both the ingress/egress hook as well as the
periodical map cleanup.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-15 15:39:53 +01:00
Simon Sundberg
72404b6767 pping: Add map cleanup debug info
Add some debug info to the periodical map cleanup process. Push debug
information through the events perf buffer by using newly added
map_clean_event.

The old user space map cleanup process had some simple debug
information that was lost when transitioning to using bpf_iter
instead. Therefore, add back similar (but more extensive) debug
information but now collected from the BPF-side. In addition to stats
on entries deleted by the cleanup process, also include stats on
entries deleted by ePPing itself due to matching (for timestamp
entries) or detecting FIN/RST (for flow entries)

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:45:22 +01:00
Simon Sundberg
be0921d116 pping: Use BPF iterators to do map gc
To improve the performance of the map cleanup, switch from the
user-spaced loop to using BPF iterators. With BPF iterators, a BPF
program can be run on each element in the map, and can thus be done in
kernel-space. This should hopefully also avoid the issue the previous
userspace loop had with resetting in case an element was removed by
the BPF programs during the cleanup.

Due to removal of userspace logic for map cleanup, no longer provide
any debug information about how many entires there are in each map and
how many of them were removed by the garbage collection. This will be
added back in the next commit.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:45:15 +01:00
Simon Sundberg
2647429081 pping: Add warnings for failing to create map entry
Send a warning notifying the user that PPing failed to create a
flow/timestamp entry due to the corresponding map being full. To avoid
sending a warning for every packet, only emit warnings every
WARN_MAP_FULL_INTERVAL (which is currently hard-coded to 1s).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
54886c92d9 pping: Refactor event handling code
Refactor code for how events are handled in the user space
application. Preparation for adding an additional event type which
should not be handled by the normal functions for printing RTT and
flow events.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
32bdf11a96 pping: Only consider flow opened on reply
Wait with sending a flow open message until a reply has been seen for
the flow. Likewise, only emit a flow closing event if the flow has
first been opened (that is, a reply has been seen).

This introduces potential (but unlikely) concurrency issues for flow
opening/closing messages which are further described in the README.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
8a8f538759 pping: Do both timestamping and matching on ingress and egress
Perform both timestamping and matching on both ingress and egress
hooks. This makes it more similar to Kathie's pping, allowing the tool
to capture RTTs in both directions when deployed on just a single
interface.

Like Kathie's pping, by default filter out RTTs for packets going to
the local machine (will only include local processing delays). This
behavior can be disabled by passing the -l/--include-local option.

As packets that are timestamped on ingress and matched on egress will
include the local machines processing delay, add the "match_on_egress"
member to the JSON output that can be used to differentiate between
RTTs that include the local processing delay, and those which don't.

Finally, report the source and destination addresses from the perspective
of the reply packet, rather than the timestamped packet, to be
consistent with Kathie's pping.

Overall, refactor large parts of pping_kern to allow both timestamping
and matching, as well as updating both the flow and reverse flow and
handle flow-events related to them, in one go. Also update README to
reflect changes.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:16:24 +01:00
Simon Sundberg
928a4144a9 pping: Add RTT-based sampling
Add an option (-R, --rtt-rate) to adapt the rate sampling based on the
RTT of the flow. The sampling rate will be C * RTT, where C is a
configurable constant (ex 1.0 to get one sample every RTT), and RTT
is either the current minimum (default) or smoothed RTT of the
flow (chosen via the -t or --rtt-type option).

The smoothed RTT (sRTT) is updated for each calculated RTT, and is
calculated in a similar manner to srtt in the kernel's TCP stack. The
sRTT is a moving average of all RTTs, and is calculated according to
the formula:

  srtt = 7/8 * prev_srtt + 1/8 * rtt

To allow the user to pass a non-integer C (ex 0.1 to get 10 RTT
samples for every RTT-period), fixed-point arithmetic has been used
in the eBPF programs (due to lack of support for floats). The maximum
value for C has been limited to 10000 in order for it to be unlikely
that the C * RTT calculation will overflow (with C = 10000, overflow
will only occur if RTT > 28 seconds).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:11:21 +01:00
Simon Sundberg
1cadbe0ae7 pping: Make parsed protocols configurable
Add command-line flags for each protocol that pping should attempt to
parse and report RTTs for (currently -T/--tcp and -C/--icmp). If no
protocol is specified assume TCP. To clarify this, output a message
before start on how ePPing has been configured (stating output format,
tracked protocols and which interface to run on).

Additionally, as the ppviz format was only designed for TCP it does
not have any field for which protocol an entry belongs to. Therefore,
emit a warning in case the user selects the ppviz format with anything
other than TCP.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:02:24 +01:00
Simon Sundberg
cfdf224d93 pping: Remove usage of deprecated libbpf API
The libbpf API has deprecated a number of functions used by the pping
loader. While a couple of functions have simply been renamed,
bpf_object__find_program_by_title has been completely deprecated in
favor of bpf_object__find_program_by_name. Therefore, change so that
BPF programs are found based on the C function names rather than
section names.

Also remove defines of section names as they are no longer used, and
change the section names in pping_kern.c to use "tc" instead of
"classifier/ingress" and "classifier/egress".

Finally replace the flags json_format and json_ppviz in pping_config
with a single enum for the different output formats. This makes the
logic for which output format to use clearer compared to relying on
multiple (supposedly) mutually exclusive flags (and implicitly
assuming standard format if neither flag was set).

One potential concern with this commit is that it introduces some
"magical strings". In case the function names in pping_kern.c are
changed it will require multiple changes in pping.c.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-01-17 18:43:04 +01:00
Simon Sundberg
2f5c3fc5b0 pping: Add tc ingress hook as alternative to XDP
For some machines, XDP may not be suitable due to ex. lack of XDP
support in NIC drivers or another program already being attached to
the XDP hook on the desired interface. Therefore, add an option to use
the tc-ingress hook instead of XDP to attach the pping ingress BPF
program on.

In practice, this adds an additional BPF program to the object file (a
TC ingress program). To avoid loading an unnecessary BPF program, also
explicitly disable autoloading for the ingress program not selected.

Also, change the tc programs to return TC_ACT_OK instead of
BPF_OK. While both should be compatible, the TC_ACT_* return codes
seem to be more commonly used for TC-BPF programs.

Concerns with this commit:
- The error messages for XDP attach failure has gotten slightly less
  descriptive. I plan to improve the code for attaching and detaching
  XDP programs in a separate commit, and will then address that.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-01-17 14:02:19 +01:00
Simon Sundberg
d85329f728 pping: Refactor output code
Simplify the three output functions by breaking them up into smaller
helper functions. Also introduce the pping_event union, which can hold
either an rtt_event or flow_event.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-06-23 15:02:26 +02:00
Simon Sundberg
1975367a3a pping: Add end-of-flow message from userspace map cleanup
Make the flow_timeout function call the current output function to
simulate a flow-closing event. Also some other minor cleanup/fixes.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-06-23 15:02:26 +02:00
Simon Sundberg
543f75c9d8 pping: Add support for "flow events"
Add "flow events" (flow opening or closing so far) which will trigger
a printout of message.

Note: The ppviz format will only print out the traditional rtt events
as the format does not include opening/closing messages.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-06-23 15:02:26 +02:00
Simon Sundberg
f96cfb7d7c pping: Track nr sent/received packets and bytes
Add per-flow tracking of number of packets and bytes
sent/received. Add these to the JSON output format.

Also update README regarding concurrency issue when updating these
statistics.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-06-23 15:02:25 +02:00
Simon Sundberg
b4a810b09b pping: Add timestamp and min-RTT to output
To add timestamp to output, push the timestamp when packet was
processed from kernel as part of the rtt-event. Also keep track of
minimum encountered RTT for each flow in kernel, and also push that as
part of the RTT-event.

Additionally, avoid pushing RTT messages at all if no flow-state
information can be found (due to ex. being deleted from egress side),
as no valid min-RTT can then be given. Furthermore, no longer delete
flow-information once seeing the FIN-flag on egress in order to keep
useful flow-state around for RTT-messages longer. Due to the
FIN-handshake process, it is sufficient if the ingress program deletes
the flow-state upon seeing FIN. However, still delete flow-state from
either ingress or egress upon seeing RST flag, as RST does not have a
handshake process allowing for delayed deletion.

While minimum RTT could also be tracked from the userspace process,
userspace is not aware of when the flow is closed so would have to add
additional logic to keep track of minimum RTT for each flow and
periodically clean them up. Furthermore, keeping RTT statistics in the
flow-state map is useful for implementing future features, such as an
RTT-based sampling interval. It would also be useful in case pping is
changed to no longer have a long-running userspace process printing
out all the calculated RTTs, but instead simply occasionally looks up
the RTT from the flow-state map.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-06-23 15:02:25 +02:00
Simon Sundberg
93b6c0eafa pping: Major refactor and add -f and -c options
Merge the pping_kern_tc.c, pping_kern_xdp.c and pping_helpers.h into
the single file pping_kern.c. Do not change any of the BPF code,
except renaming the map ts_start to packet_ts.

To handle both BPF programs kept in single ELF-file, change loading
mechanism to extract and attach both tc and XDP programs from it. Also
refactor main-method into several smaller functions to reduce its
size.

Finally, added the --force (-f) and --cleanup-interval (-c) options to
the argument parsing, and improved the parsing of the
--rate-limit (-r) option.

NOTE: The verifier rejects program in it's current state as too
large (over 1 million instructions). Setting the TCP_MAX_OPTIONS in
pping_kern.c to 5 (or less) solves this. Unsure at the moment what
causes the verifier to think the program is so large, as the code in
pping_kern.c is identical to the one from the three files it was
merged from.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-04-15 14:13:54 +02:00
Simon Sundberg
cc8a16650b pping: Add userspace configuration
Implement basic mechanic for parsing arguments from userspace and
passing them to a global config variable in the BPF programs.

This also changes the basic use of the program from:
    $./pping interface
to:
    $./pping -i interface

Also, revert to using the memset solution for the map_ipv4_to_ipv6
function to avoid the ipv4_prefix constant being stored in the .rodata
section. This makes it easier to set the value for the global config
variable from userspace, as the only thing left in the .rodata section
is the config struct.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-03-22 12:23:27 +01:00
Simon Sundberg
1446e6edec pping: Load tc-bpf program with libbpf
Load and pin the tc-bpf program in pping.c using libbpf, and only
attach the pinned program using iproute. That way, can use features
that are not supported by the old iproute loader, even if iproute does
not have libbpf support.

To support this change, extend bpf_egress_loader with option to load
pinned program. Additionally, remove configure script and parts of
Makefile that are no longer needed. Furthermore, remove multiple
definitions of ts_start map, and place singular definition in
pping_helpers.h which is included by both BPF programs.

Also, some minor fixes based on Toke's review.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-03-02 17:40:51 +01:00
Simon Sundberg
cecd6b54f2 pping: Inital rate limit implementation
Add a per-flow rate limit, limiting how often new timestamp entries
can be created. As part of this, add per-flow state keeping track
of when last timestamp was created and last seen identifier for each
flow.

Additionally, remove timestamp entry as soon as RTT is
calculated, as last seen identifier is used to find first unique value
instead. Furthermore, remove packet_timestamp struct and only use
__u64 as timestamp, as used memeber is no longer needed.

This initial commit lacks cleanup of flow-state, user-configuration of
rate limit, mechanism to handle bursts etc.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-03-01 18:16:48 +01:00
Simon Sundberg
3268ba87bb pping: Refactor TC and XDP programs
Refactor TC and XDP programs to reuse common logic for parsing
packets. Add functions for parsing packets for an identifier to
pping_helpers.h which both TC and XDP parts use. Also make it easier
to extend pping with support for new protocols, as only new parsing
functions have to be added and inserted into a single place.

Also add reserved members to end of structs in pping.h to indicate
padding.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-09 18:09:30 +01:00
Simon Sundberg
eafdf87d80 pping: Fix struct alginment issues
Move some members in network_tuple and rtt_event around to avoid holes.

Also remove some uncecessary parentheses before & operator, and add
local definitions of AF_INET and AF_INET6.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-09 13:00:28 +01:00
Simon Sundberg
670df84bd9 pping: Add IPv6 support
Several changes to add IPv6 support:
- Change structs in pping.h
  - replace ipv4_flow with network_tuple
  - rename ts_key to packet_id
  - rename ts_timestamp to packet_timestamp
- Add map_ipv4_to_ipv4 in pping_helpers.h
  - Also remove obsolete fill_ipv4_flow
- Rewrite pping_kern*
  - parse either IPv4 or IPv6 header (depending on proto)
  - Use map_ipv4_to_ipv6 to store IPv4 address in network_tuple
Support printout of IPv6 addresses in pping.c
  - Add function format_ip_address as wrapper over inet_ntop
  - Change handle_rtt_event to first format IP-address strings in
    local buffers, then perform single printout

While some steps have been taken to be more general towards different
types of packet identifiers (not just the currently supported TCP
timestamps), significant refactorization of pping_kern* will still be
required. Also, pping_kern_xdp and pping_kern_tc also have large
sections of very similar code that can be refactored into functions.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-08 20:28:46 +01:00
Simon Sundberg
7410d5cc2c pping: Various minor fixes
Perform various fixes and tweaks:
- Rename several defines to make them more informative
- Remove unrolling of loop in BPF programs
- Reuse defines for program sections between userspace and kernel
  space programs
- Perform fork+exec to run bpf_egress_loader script instead of
  system()
- Add comment to copied scripts indicating I've modified them
- Add pping.h and pping_helpers.h as dependencies in Makefile

Also, add a brief description of what PPing is and how it works to
README

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-04 19:48:01 +01:00
Simon Sundberg
b1ce4ee69b pping: Format headers and fix TODO.md
Format the header files in the Linux kernel style (missed in previous
commit). Also fix a formating error in TODO.md that cause empty
checkboxes to not display correctly.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-04 19:43:39 +01:00
Simon Sundberg
d3a5de62c4 pping: Format code and add SPDX lincense tags
Format the code using the .clang-format from the kernel source tree,
with a few manual tweaks here and there. Also, remove the TODO list
from comment of pping.c and instead put it in TODO.md.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-04 19:43:03 +01:00
Simon Sundberg
1010e53065 pping: Share ts_start map, pping at working stage
Make tc pin the ts_start map when loading the TC-BPF program, and
rewrite XDP loader to reuse map pinned by tc.

Also add comment with TODO list in pping.c.

Testing pping by adding a delay through a netem qdisc in the test
environment shows that the reported RTT will approach 100ms for any
delay lower than 100ms, but the correct RTT for any delay over
100ms. Root cause is unknown, but Pollere's original pping
implementation (as well as a bpftrace based pping implementation)
shows the same issue. This issue has not been observed when running on
real interfaces without a netem qdisc.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-04 19:41:40 +01:00
Simon Sundberg
954c66b0e8 pping: Add TC-BPF program for egress
Split and rename files so there is one userspace program (pping) and
two kernel-space ones (one for XDP and one for TC-BPF).

Copy the shell script for loading the TC-BPF program from
traffic-pacing-edt folder, but add support for loading a specific
section.

The XDP and TC-BPF programs do not share the ts_start map, so program
does not work.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2021-02-04 19:39:16 +01:00