Commit Graph

119 Commits

Author SHA1 Message Date
Simon Sundberg
a301900fbd pping: Add switch for which IP stats are aggregated by
By default ePPing will aggregate RTTs based on the src IP of the reply
packet. I.e. the RTT A->B->A will be aggregated based on IP of B. In
some scenarios it may be more interesting to aggregate based on the
dst IP of the reply packet (IP of A in above example). Therefore, add
a switch (--aggregate-reverse) which makes ePPing aggregate RTTs
based on the dst IP of the reply packet instead of the src IP. In
other words, by default ePPing will aggregate traffic based on where
it's going to, but with this switch you can make ePPing aggregate
traffic based on where it's comming from instead.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
Simon Sundberg
5ef4ffdd1b pping: Reset aggregated RTTs after each report
Instead of keeping all RTTs since ePPing started, reset the aggregated
stats after each time they're reported so the report only shows the
RTTs since the last report.

To avoid concurrency issues due to user space reading and resetting
the map while the BPF programs are updating it, use two aggregation
maps, one active and one inactive. Each time user space wishes to
report the aggregated RTTs it first switches which map is actively
used by the BPF progs, and then reads and resets the now inactive map.

As the RTT stats are now periodically reset, change the
histogram (aggregated_rtt_stats.bins) to use __u32 instead of __u64
counters as risk of overflowing is low (even if 1 million RTTs/s is
added to the same bin, it would take over an hour to overflow, and
report frequency is likely higher than that).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
Simon Sundberg
3a7b15ab3e pping: Add option to aggregate RTTs
Add an option -a or --aggregate to provide an aggregate report of RTT
samples every X seconds. This is currently mutually exclusive with the
normal per-RTT sample reports.

The aggregated stats are never reset, and thus contain all RTTs since
the start of tracing. The next commit will change this to reset the
stats after every report, so that each report only contain the RTTs
since the last report.

The RTTs are aggregated and reported per IP-prefix, where the user can
modify the size of the prefixes used for IPv4 and IPv6 using the
--aggregate-subnet-v4/v6 flags.

In this intital implementation for aggregating RTTs, the minimum and
maximum RTT are tracked and all RTTs are added to a histogram. It uses
a predetermined number of bins of equal width (set to 1000 bins, each
1 ms wide), see RTT_AGG_NR_BINS and RTT_AGG_BIN_WIDTH in pping.h. In
the future this could be changed to use more sophisticated histograms
that better capture a wide variety of RTTs.

Implement the periodic reporting of RTTs by using a
timerfd (configured to the user-provided interval) and add it to the
main epoll-loop.

To minimize overhead from the hash lookups, use separate maps for IPv4
and IPv6, so that for IPv4 traffic the hashmap key is only 4
bytes (instead of 16). Furthermore, limit the maximum IPv6 prefix size
to 64 so that the IPv6 map can use a 8 byte key. This limits the
maximum prefix size for IPv6 to /64.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
Simon Sundberg
149e2c6d90 pping: Define map sizes
Instead of specifying the map size directly in the map definitions,
add them as defines at the top of the file to make them easier to
change (don't have to find the correct map among the map
definitions). This pattern will also simplify future additions of
maps, where multiple maps may share the same size.

While at it, increase the default packet_ts to 131072 (2^17) entries,
as the previous value of 16384 (2^14) which, especially for the
packet_ts map, was fairly underdimensioned. If only half of the
timestamps are echoed back (due to ex. delayed ACK), it would in
theory be enough with just 16k / (500 * 1) = 32 concurrent flows to
fill it up with stale entries (assuming default cleanout interval of
1s). Increasing the size of these maps will increase the corresponding
memory cost from 2^14 * (48 + 4) = 832 KiB and 2^14 * (44 + 144) =
2.94 MiB to 2^17 * (48 + 4) = 6.5 MiB and 2^17 * (44 + 144) = 23.5
MiB, respectively, which should generally not be too problematic.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:57 +02:00
Simon Sundberg
699e8e839e pping: Improve cross-thread abort handling
Remove the global keep_running variable and instead write to a pipe to
tell the main thread to abort in case periodical map cleanup
fails. Add the reading-side of this pipe to the epoll loop in the main
thread and abort if anything is written to the pipe. To abort the main
thread, update the main loop so it silently stops if it receives the
special value PPING_ABORT.

As the map cleaning thread can now immediately tell the main loop to
abort, it is no longer necessary to have a short
timeout (EPOLL_TIMEOUT_MS) on the main loop quickly detect changes in
the keep_running flag. So change the epoll loop to wait indefinitely
for one of the fds to update instead of timing out frequently.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-06 18:01:30 +02:00
Simon Sundberg
12dc63b965 pping: Use signalfd instead of signalhandler
Use the signalfd API to handle graceful shutdown on SIGINT/SIGTERM. To
watch the signalfd, create an epoll instance and add both the signalfd
and the perf-buffer to the epoll instance so that both can be
monitored in the main loop with epoll_wait().

This avoids the signal handler from interrupting the perf-buffer
polling and the other issues with the asynchronous signal
handling. Furthermore, the restructuring of the main loop to support
watching multiple file descriptors makes it possible to add additional
events to the main loop in the future (such as a periodical task
triggered by a timerfd).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-07-04 16:10:49 +02:00
Simon Sundberg
1cb4e93b04 pping: Fix edge cases of parse_bounded_double()
Fix two edge cases with the parse_bounded_double() function.

1. It accept an empty string without raising an error. This should not
   have been an issue in practice as getopt_long() should have detected
   it as an lack of argument. This is addressed by adding a check for
   if it has parsed anything at all.

2. It could overflow/underflow without raising an error. This is
   addressed by adding a check of errno (which is set in case of
   overflow/underflow, but not in case of conversion error).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 16:30:56 +02:00
Simon Sundberg
83a85adb96 pping: Minor cleanup of argument parsing
The parse_arguments() function used to have a separate variable for
each float (or rather a double) value it would parse from the user. As
only one argument will be parsed at the time this is redudant and will
require more and more variables as new options are added. Replace all
these variables with a single "user_float", which is used for all
options that parse a float from the user.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 16:30:02 +02:00
Simon Sundberg
22ac4d9192 pping: Factor out sending of RTT event
Extract the logic for filling in and sending an RTT event to a
function. This makes it consistent with other send_*_event() functions
and will make it easier/cleaner to add an option to aggregate the RTT
instead of sending it.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 13:17:41 +02:00
Simon Sundberg
2578b819d8 pping: Lowercase pping_output_format enum
The enum pping_output_format was uppercased, which is unconventional
for a type. Change it to lowercase.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2023-05-25 13:16:01 +02:00
Simon Sundberg
9be0a6a8e4 pping: Move NULL-check to compile with LLVM-15
When compiled with LLVM-15 (clang-15 and llc-15), the verifier would
reject the tsmap_cleanup program as reported in #63. To prevent this
add a NULL-check for df_state after the map lookup, to convince the
verifier that we're not trying to dereference a pointer to a map value
before checking for NULL. This fix ensures that the generated bytecode
by LLVM-12 to LLVM-15 passes the verifier (tested on kernel 5.19.3).

There was already an NULL-check for df_state in the (inlined by the
compiler) function fstate_from_dfkey() which checked df_state before
accessing its fields (the specific access that angered the verifier
was df_state->dir2). However, with LLVM-15 the compiler reorders the
operations so that df_state->dir2 is accessed before the NULL-check is
performed, thus upsetting the verifier. This commit removes the
internal NULL-check in fstate_from_dfkey() and instead performs the
relevant NULL-check directly in the tsmap_cleanup prog instead. In all
other places that fstate_from_dfkey() ends up being called there are
already NULL-checks for df_state to enable early returns.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-09 11:52:15 +01:00
Simon Sundberg
91d7242a44 pping: Abort program if map cleanup fails
Previously the program would only print out an error message if the
cleanup of a map failed, and then keep running. Each time the
periodical cleanup failed the error message would be repeated, but no
further action taken. Change this behavior so that the program instead
terminates the cleanup thread and aborts the rest of the program.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:36:20 +01:00
Simon Sundberg
afc66aa570 pping: Close bpf_object on shutdown/error
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:36:20 +01:00
Simon Sundberg
0b0b1e9552 pping: Add hint on XDP failure
Due to a kernel bug for XDP programs loaded via libxdp that use global
functions (see https://lore.kernel.org/bpf/8735gkwy8h.fsf@toke.dk/t/),
XDP mode only works on relatively recent kernels where the bug is
patched (or kernels where the patch has been backported). As many
users may not have such a recent kernel they will only see a confusing
verifier error like the following:

Starting ePPing in standard mode tracking TCP on test123
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: prog 'pping_xdp_ingress': BPF program load failed: Invalid argument
libbpf: prog 'pping_xdp_ingress': -- BEGIN PROG LOAD LOG --
Func#1 is safe for any args that match its prototype
Validating pping_xdp_ingress() func#0...
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
; int pping_xdp_ingress(struct xdp_md *ctx)
0: (bf) r6 = r1                       ; R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
1: (b7) r7 = 0                        ; R7_w=invP0
; struct packet_info p_info = { 0 };
2: (7b) *(u64 *)(r10 -8) = r7         ; R7_w=invP0 R10=fp0 fp-8_w=00000000
3: (7b) *(u64 *)(r10 -16) = r7        ; R7_w=invP0 R10=fp0 fp-16_w=00000000
4: (7b) *(u64 *)(r10 -24) = r7        ; R7_w=invP0 R10=fp0 fp-24_w=00000000
5: (7b) *(u64 *)(r10 -32) = r7        ; R7_w=invP0 R10=fp0 fp-32_w=00000000
6: (7b) *(u64 *)(r10 -40) = r7        ; R7_w=invP0 R10=fp0 fp-40_w=00000000
7: (7b) *(u64 *)(r10 -48) = r7        ; R7_w=invP0 R10=fp0 fp-48_w=00000000
8: (7b) *(u64 *)(r10 -56) = r7        ; R7_w=invP0 R10=fp0 fp-56_w=00000000
9: (7b) *(u64 *)(r10 -64) = r7        ; R7_w=invP0 R10=fp0 fp-64_w=00000000
10: (7b) *(u64 *)(r10 -72) = r7       ; R7_w=invP0 R10=fp0 fp-72_w=00000000
11: (7b) *(u64 *)(r10 -80) = r7       ; R7_w=invP0 R10=fp0 fp-80_w=00000000
12: (7b) *(u64 *)(r10 -88) = r7       ; R7_w=invP0 R10=fp0 fp-88_w=00000000
13: (7b) *(u64 *)(r10 -96) = r7       ; R7_w=invP0 R10=fp0 fp-96_w=00000000
14: (7b) *(u64 *)(r10 -104) = r7      ; R7_w=invP0 R10=fp0 fp-104_w=00000000
15: (7b) *(u64 *)(r10 -112) = r7      ; R7_w=invP0 R10=fp0 fp-112_w=00000000
16: (7b) *(u64 *)(r10 -120) = r7      ; R7_w=invP0 R10=fp0 fp-120_w=00000000
17: (7b) *(u64 *)(r10 -128) = r7      ; R7_w=invP0 R10=fp0 fp-128_w=00000000
18: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
;
19: (07) r2 += -128                   ; R2=fp-128
; if (parse_packet_identifer_xdp(ctx, &p_info) < 0)
20: (85) call pc+13
R1 type=ctx expected=fp
Caller passes invalid args into func#1
processed 206542 insns (limit 1000000) max_states_per_insn 32 total_states 13238 peak_states 792 mark_read 40
-- END PROG LOAD LOG --
libbpf: failed to load program 'pping_xdp_ingress'
libbpf: failed to load object 'pping_kern.o'
Failed attaching ingress BPF program on interface test123: Invalid argument
Failed loading and attaching BPF programs in pping_kern.o

To help users that run into this issue when loading the program in
generic or unspecified mode, add a small hint suggesting to
upgrade the kernel or use the tc ingress mode instead in case
attaching the XDP program fails.

However, if loaded in native mode, instead give the suggestion to try
loading in generic mode instead. While libbpf and libxdp already add
some messages hinting at this, this hint clarifies how to do this with
ePPing (using the --xdp-mode argument).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:36:20 +01:00
Simon Sundberg
ff0c5a9b4a pping: Add option to configure XDP mode
Add an option to let the user configure which mode to load the XDP
program in (unspecified, native or generic).

Set the default mode to native (was unspecified previously) as that is
what the user most likely wants to use (generic or unpsecified falling
back on generic will likely have worse performance).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:35:46 +01:00
Simon Sundberg
8732c4f813 pping: Change default ingress program from XDP to tc
Using the XDP ingress hook requires a newer kernel (needs Toke's patch
fixing the verification of global function for BPF_PROG_TYPE_EXT
programs) than tc mode, is will likely perform worse than tc if
running in generic mode (due to no driver support for
XDP). Furthermore, even when XDP works and has driver support, its
performance benefit over tc is likely small as the packets are always
passed on to the network stack regardless (not creating a fast-path
that bypasses the network stack). Therefore, use the tc ingress hook
as default instead, and only use XDP if explicitly required by the
user (-I/--ingress hook xdp).

This partly addresses issue #49, as ePPing should no longer by default
get the confusing error message from failing verification if the
kernel lacks Toke's verifier patch.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:30:31 +01:00
Simon Sundberg
832bdea23f pping: Define BPF program names
Define the BPF program names in the user space component. The strings
corresponding to the BPF program names were before inserted in several
places, including in multiple string comparison, which is error prone
and could leave to subtle errors if the program names are changed and
not updated correctly in all places. With the program name string
being defined, they only have to be changed in a single place.

Currently only the names of the ingress programs occur in multiple
places, but also define the name for the egress program to be
consistent.

Note that even after this change one has the sync the defined values
with the actual program names declared in the pping_kern.c
file. Ideally, these would all be defined in a single place, but not
aware of a convenient way to make that happen (cannot use the defined
strings as function names as they are not identifiers, and if defined
as identifiers instead it would not be possible to use them as
strings).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:30:31 +01:00
Simon Sundberg
ddf25abfcc pping: Check if creating clsact on ingress
The userspace loader would only check if the tc clsact was created
when the egress program was loaded. Thus, if the ingress program
created the clsact the egress program would not have to create the
clsact, the ePPing would thus falsely believe it did not create a
clsact and fail to remove it on shutdown even if --force was used. Fix
this by checking if either ingress or egress created clsact.

This bug was introduced as a sneaky side effect of commit
78b45bde56 (pping: Use libxdp to load
and attach XDP program). Before this commit the egress program (for
which there is only a tc alternative) would be loaded first, and thus
it was sufficient to check if it created the clsact. When switching to
libxdp however, the ingress program (specifically the XDP program) had
to be loaded first, and thus the order of loading ingress and egress
program were swapped. Therefore, it was no longer sufficient to only
check the egress program as the tc ingress program may have created
the clsact before the the egress program is attached (and only
checking the ingress program would also not be enough as the tc
ingress program may never be loaded if XDP mode is used instead).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-08 09:30:31 +01:00
Toke Høiland-Jørgensen
af5db036ab Merge pull request #55 from simosund/pping-skip-syn
PPing: Add option to ignore SYN-packets
2022-11-06 14:18:10 +01:00
Simon Sundberg
e932174882 pping: Fix XDP ingress ifindex
Set the ingress_ifindex to the ctx->ingress_ifindex rather than
ctx->rx_queue_index. This fixes a bug that was accidently introduced
in commit #add8885, and which broke the localfilt functionality if the
XDP hook was used on ingress (the FIB lookup would fail).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-06 14:14:54 +01:00
Simon Sundberg
251c9b7ad3 pping: Wait for id shift before timestamping packet in new flow
Make ePPing wait until the first shift of identifier (the "edge")
before starting to timestamp packets for new flows (for TCP flows we
do not see the start of).

The reason this is necessary is that if ePPing start monitoring a flow
in the middle of it (ePPing did not see the start of the flow), then
we cannot know if the first TSval we see is actually the first
instance of the TSval in that flow, so we have to wait until the next
TSval to ensure we get the first instance of a TSval (otherwise we may
underestimate the RTT by up to the TCP timestamp update period). To
avoid the first RTT sample potentially being underestimated this fix
essentially ignores the first RTT sample instead.

However, it is not always necessary to wait until the first shift. For
TCP traffic where we see the initial handshake we know that we've seen
the start of the flow. Furthermore, for ICMP traffic it's generally
unlikely that there are duplicate identifiers to begin with, so also
allow that to start timestamping right away.

It should be noted that after the previous commit (which changed
ePPing to ignore TCP SYN-packets by default), ePPing will never see
the handshake and thus has to assume that it started to monitor all
flows in the middle. Therefore, ePPing will (by default) now miss both
the RTT during the handshake, as well as RTT for the first few packets
sent after the handshake (until the TSval is updated).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-04 15:37:51 +01:00
Simon Sundberg
70f255cbf8 pping: Ignore SYN packets by default
Make ePPing ignore TCP SYN packets by default, so that the
initial handshake phase of the connection is ignored. Add an
option (--include-syn/-s) to explicitly include SYN packets.

The main reason it can be a good idea to avoid SYN-packets is to avoid
being affected by SYN-flood attacks. When ePPing also includes
SYN-packets it becomes quite vulnerable to SYN-flood attacks, which
will quickly fill up its flow_state table, blocking actual useful
flows from being tracked. As ePPing will consider the connection
opened as soon as it sees the SYN-ACK (it will not wait for final
ACK), flow-state created from SYN-flood attacks will also stay around
in the flow-state table for a long time (5 minutes currently) as no
RST/FIN will be sent that can be used to close it.

The drawback from ignoring SYN-packets is that no RTTs will be
collected during the handshake phase, and all connections will be
considered opened due to "first observed packet".

A more refined approach could be to properly track the full TCP
handshake (SYN + SYN-ACK + ACK) instead of the more generic "open once
we see reply in reverse direction" used now. However, this adds a fair
bit of additional protocol-specific logic. Furthermore, to track the
full handshake we will still need to store some flow-state before the
handshake is completed, and thus such a solution would still be
vulnerable to SYN-flood attacks (although the incomplete flow states
could potentially be cleaned up faster).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-11-04 13:47:08 +01:00
Simon Sundberg
867b659534 pping: Remove outdated issue from TODO
The previous commit fixes the issue of reordered packets being able to
bypass the unique TSval check, so remove the corresponding section
from the issues in the TODO.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-09-01 15:47:55 +02:00
Simon Sundberg
03e9245ae9 pping: ensure TSval is monotonically increasing
The mechanism to ensure that only the first instance of each TSval is
timestamped is a simple equals check. This is check may fail if there
are reordered packets.

Consider a sequence of packets A, B, C and D, where A and B have
TSval=1 and C and D have TSval=2. If all packets arrive in
order (ABCD), then A and C will correctly be the only packets that are
timestamped (as B and D will have the same TSval as the previously
observed one). However, consider if B is reorderd so instead the
packets arrive as ACBD. In this scenario all ePPing will attempt to
timestamp all (instead of only A and C), as each packet now has a
different (but not always higher) TSval than the last seen
packet. Note that it will only sucessfully create the timestamps for
the later duplicated TSvals if the previous timestamp for the same
TSval has already been cleared out, so this is mainly an issue when
RTT < 1ms.

Fix this by only allowing a packet to be timestamped if its TSval is
stricly higher (accounting for wrap-around) than the last seen TSval,
and likewise only update last seen TSval if it is strictly higher than
the previous one.

To allow this calculation, also convert TSval and TSecr from network
byte order to host byte order when parsing the packet. While delaying
the transform from network to host byte order until the comparison
between the packet's TSval and last seen TSval could potentially save
the overhead of bpf_ntohs for some packets that do not need to go
through this check, most TCP packets will end up performing this
check, so performance difference should be minimal. Therefore, opt for
the simplier approach of converting TSval and TSecr directly, which
also makes them easier to interpret if ex. dumping the maps.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-09-01 15:40:12 +02:00
Toke Høiland-Jørgensen
1b38fda8fb Merge pull request #48 from simosund/pping-debugcount-fix
pping: Add missing debug timeout count
2022-06-21 21:17:36 +02:00
Simon Sundberg
3c698076ee pping: Add missing debug timeout count
The debug counter for timed out (deleted by periodical cleanup) flow
states was never incremented, so fix that.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-06-21 19:00:34 +02:00
Simon Sundberg
add888566d pping: Make packet parsing a global function
Use global functions to make use of function-by-function verification.
This allows the verifier to analyze parts of the program individually
from each other, which should help reduce the verification
complexity (the number of instructions the verifier must go through to
verify the program) and help prevent exponentially growing with every
loop or branch added to the code.

In this case, break out the packet parsing (parse_packet_identifier)
as a global function, so that it can be handled separately from the
logic after it (updating flow state, saving timestamps, matching
replies to timestamps, calculating and pushing RTTs) etc. To do this,
create small separate wrapper functions (parse_packet_identifier_tc()
and parse_packet_identifier_xdp()) for tc/xdp, so that the verifier
can correctly identify the arguments as pointers to
context (PTR_TO_CTX) when evaluating the global functions. Also create
small wrapper functions pping_tc() and pping_xdp() which call the
corresponding parse_packet_identifier_tc/xdp function.

For this to work in XDP mode (which is the default), the kernel must
have been patched with a fix that addresses an issue with how global
functions are verified for XDP programs, see:
https://lore.kernel.org/all/20220606075253.28422-1-toke@redhat.com/

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-06-21 17:58:59 +02:00
Simon Sundberg
404a70c524 pping: Remove packet pointers from packet_info
Do not provide pointers into the original packet from packet_info
anymore (which the verifier has to ensure are valid), and instead
directly parse all necessary data in parse_packet_identifier and then
only use the parsed data in later functions.

This allows a cleaner separation of concerns, where the parsing
functions parse all necessary data from the packets, and other
functions that need information about the packet only rely on the data
provided in packet_info (and do not attempt to parse any data on their
own).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-06-21 17:33:58 +02:00
Simon Sundberg
aaf6036655 pping: Move some info from parsing_context to packet_info
Remove the is_egress and ingress_ifindex from the parsing_context
struct to the packet_info struct. Also change the member is_egress to
is_ingress to better fit with the ingress_ifindex member.

These members were only in parsing_context because they were
convenient to fill in right from the start. However, it semantically
makes little sense for the parsing_context to contain these because
they are not used for any parsing, and they fit better with the
packet_info. This also allows later functions (is_local_address(),
pping_timestamp_packet() and pping_match_packet()) to get rid of their
dependency on parsing_context, as it was only used for the is_egress
and ingress_ifindex members (they do not do any parsing). After this
change, parsing_context is only used for the initial parsing, and
packet_info contains all the necessary data for all the functions
related to the pping logic that runs after the packet has been parsed.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-06-10 17:01:14 +02:00
Toke Høiland-Jørgensen
78b45bde56 pping: Use libxdp to load and attach XDP program
Since we now have libxdp as a submodule, we can switch pping over to using
libxdp for loading its XDP program. This requires switching the order of
attachment, because libxdp needs to be fed the BPF object in an unloaded
state (and will load it as part of attaching).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
2022-04-20 17:08:36 +02:00
Toke Høiland-Jørgensen
58fcc521b7 pping: Guard local definitions of AF_* constants in pping_kern.c
The header files included from pping_kern.c include definitions of AF_INET
and AF_INET6, leading to warnings like:

pping_kern.c:25:9: warning: 'AF_INET' macro redefined [-Wmacro-redefined]
        ^
/usr/include/bits/socket.h:97:9: note: previous definition is here
        ^
pping_kern.c:26:9: warning: 'AF_INET6' macro redefined [-Wmacro-redefined]
        ^
/usr/include/bits/socket.h:105:9: note: previous definition is here
        ^
2 warnings generated.

Fix this by guarding the definitions behind suitable ifdefs.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
2022-04-14 21:29:32 +02:00
Simon Sundberg
2e7595b3ca pping: Replace boolean connection state flags with enum
The connection state had 3 boolean flags related to what state it was
in (is_empty, has_opened and has_closed). Only specific combinations
of these flags really made sense (has_opened/has_closed didn't really
mean anything if is_empty, and if has_closed one would expect is_empty
to be false and has_opened to be true etc.). Therefore, replace these
combinations of boolean values with a singular enum which is used to
check if the flow is empty, waiting to open (seen outgoing packet but
no response), is open or has closed.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-30 18:12:46 +02:00
Simon Sundberg
4fb1f6de64 pping: Combine flow state in each direction to a dualflow state
Combine the flow state entries for both the "forward" and "reverse"
direction of the flow into a single dualflow state. Change the
flowstate map to use the dualflow state so that state for both
directions can be retrieved using a single map lookup.

As flow states are now kept in pairs, cannot directly create/delete
states from the BPF map each time a flow opens/closes in one
direction. Therefore, update all logic related to creating/deleting
flows. For example, use "empty" slot in dualflow state instead of
creating a new map entry, and only delete the dual flow state entry
once both directions of the flow have closed/timed out.

Some implementation details:

Have implemented a simple memcmp function as I could not get the
__builtin_memcmp function to work (got error "libbpf: failed to
find BTF for extern 'memcmp': -2").

To ensure that both directions of the flow always look up the same
entry, use the "sorted" flow tuple (the (ip, port) pair that is
smaller is always first) as key. This is what the memcmp is used for.

To avoid storing two copies of the flow tuple (src -> dst and dst ->
src) and doing additional memcmps, always store the flow state for the
"sorted" direction as the first direction and the reverse as the
second direction. Then simply check if a flow is sorted or not to
determine which direction in the dual flow state that matches. Have
attempted to at least partially abstract this detail away from most of
the code by adding some get_flowstate_from* helpers.

The dual flow state simply stores the two (single direction) flow
states as the struct members dir1 and dir2. Use these two (admittedly
poorly named) members instead of a single array of size 2 in order to
avoid some issues with the verifier being worried that the array index
might be out of bounds.

Have added some new boolean members to the flow state to keep track of
"connection state". In addition the the previous has_opened, I now
also have a member for if the flow is "empty" or if it has been
closed. These are needed to cope with having to keep individual flow
states for both directions of the flow around as long as one direction
of the flow is used. I plan to replace these boolean "connection
state" members with a single enum in a future commit.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-30 18:12:46 +02:00
Simon Sundberg
75a979fc31 pping: Refactor parsing of packet identifiers
Refactor functions for parsing protocol-specific packet
identifiers (parse_tcp_identifier, parse_icmp6_identifer and
parse_icmp_identifer) so they no longer directly fill in the
packet_info struct. Instead make the functions take additional
pointers as arguments and fill in a protocol_info struct.

The reason for this change is to decouple the
parse_<protocol>_identifier functions from the logic of how the
packet_info struct should be filled. The parse_packet_indentifier is
now solely responsible for filling in the members of packet_info
struct correctly instead of working in tandem with the
parse_<protocol>_identifier, filling in some members each.

This might result in a minimal performance degradation as some values
are now first filled in the protocol_info struct and later copied to
packet_info instead of being filled in directly in packet_info.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-22 13:24:09 +01:00
Simon Sundberg
2935fb05cc pping: Minor formating fixes
Format code using clang-format from the kernel tree. However, leave
code in orginal format in some instances where clang-format clearly
reduces readability of code (ex. do not remove alginment of comments
for struct members and long options).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-22 09:11:53 +01:00
Simon Sundberg
8c0084dd0e pping: Update TODO about timestamp ref count
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-15 15:39:53 +01:00
Simon Sundberg
11f1d88742 pping: Keep track of outstanding timestamps
Add a counter of outstanding (unmatched) timestamped entires in the
flow state. Before a timestamp lookup is attempted, check that there
are any outstanding timestamps, otherwise avoid the unecessary hash
map lookup.

Use 32 bit counter for outstanding timestamps to allow atomic
increments/decrements using __synch_fetch_and_add. This operation is
not supported on smaller integers, which is why such a large counter
is used. The atomicity is needed because the counter may be
concurrently accessed by both the ingress/egress hook as well as the
periodical map cleanup.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-15 15:39:53 +01:00
Simon Sundberg
e7201c9eab pping: Update TODO based on map cleaning improvements
Also add description of three potential issues introduced by the
changes to cleanup process.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:59:21 +01:00
Simon Sundberg
f22025f716 pping: More aggressive map cleanup
Add conditions that allows removing old flow and timestamp entries
sooner.

For flow map, have added conditions that allow unopened flows and ICMP
flows to be removed earlier than open TCP flows (currently both set to
30 sec instead of 300 sec).

For timestamp entries, allow them to be removed if they're more than
TIMESTAMP_RTT_LIFETIME (currently 8) times higher than the flow's
sRTT.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:54:52 +01:00
Simon Sundberg
72404b6767 pping: Add map cleanup debug info
Add some debug info to the periodical map cleanup process. Push debug
information through the events perf buffer by using newly added
map_clean_event.

The old user space map cleanup process had some simple debug
information that was lost when transitioning to using bpf_iter
instead. Therefore, add back similar (but more extensive) debug
information but now collected from the BPF-side. In addition to stats
on entries deleted by the cleanup process, also include stats on
entries deleted by ePPing itself due to matching (for timestamp
entries) or detecting FIN/RST (for flow entries)

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:45:22 +01:00
Simon Sundberg
be0921d116 pping: Use BPF iterators to do map gc
To improve the performance of the map cleanup, switch from the
user-spaced loop to using BPF iterators. With BPF iterators, a BPF
program can be run on each element in the map, and can thus be done in
kernel-space. This should hopefully also avoid the issue the previous
userspace loop had with resetting in case an element was removed by
the BPF programs during the cleanup.

Due to removal of userspace logic for map cleanup, no longer provide
any debug information about how many entires there are in each map and
how many of them were removed by the garbage collection. This will be
added back in the next commit.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-10 09:45:15 +01:00
Simon Sundberg
d4c33259c7 pping: Improve program cleanup on shutdown
Explicitly stop the thread which performs periodical map cleanup on
shutdown, before attempting to free up any resources it might use.

For the cleanup order to make more sense, also setup the perf-buffer
before setting up the periodical map cleaning, so that the thread can
be stopped as the first part of the cleanup. This also matches better
with the next couple of commits where map cleaning debug information
will be pushed through the perf-buffer.

Finally, move the addition of the signalhandler earlier in the code to
eliminate a window where it was possible to terminate the program
without relevant cleanup code having a chance to run.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-09 19:20:51 +01:00
Simon Sundberg
6885771b14 pping: Allocate cleanup-args on main's stack
Allocate the clean_args on the stack of the main function rather than
the stack of setup_periodical_map_cleaning. The clean_args is used by
periodical_map_cleanup from a different thread, so allocating them on
stack for setup_periodical_map_cleaning which goes out of scope
directly after the thread is created opens up for errors where later
function calls may overwrite the arguments, causing unpredictable
behavior.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-03-09 14:54:20 +01:00
Simon Sundberg
b8215e70b2 pping: Add section on potential issues to TODO-list
Collect potential issues under a new section in the TODO list. These
are issues I generally don't think are that severe, but may still be
useful to note down and keep in mind.

Move the section on potential concurrency issues from README to the
new section in the TODO-list.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
2647429081 pping: Add warnings for failing to create map entry
Send a warning notifying the user that PPing failed to create a
flow/timestamp entry due to the corresponding map being full. To avoid
sending a warning for every packet, only emit warnings every
WARN_MAP_FULL_INTERVAL (which is currently hard-coded to 1s).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
54886c92d9 pping: Refactor event handling code
Refactor code for how events are handled in the user space
application. Preparation for adding an additional event type which
should not be handled by the normal functions for printing RTT and
flow events.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
32bdf11a96 pping: Only consider flow opened on reply
Wait with sending a flow open message until a reply has been seen for
the flow. Likewise, only emit a flow closing event if the flow has
first been opened (that is, a reply has been seen).

This introduces potential (but unlikely) concurrency issues for flow
opening/closing messages which are further described in the README.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:17:10 +01:00
Simon Sundberg
8a8f538759 pping: Do both timestamping and matching on ingress and egress
Perform both timestamping and matching on both ingress and egress
hooks. This makes it more similar to Kathie's pping, allowing the tool
to capture RTTs in both directions when deployed on just a single
interface.

Like Kathie's pping, by default filter out RTTs for packets going to
the local machine (will only include local processing delays). This
behavior can be disabled by passing the -l/--include-local option.

As packets that are timestamped on ingress and matched on egress will
include the local machines processing delay, add the "match_on_egress"
member to the JSON output that can be used to differentiate between
RTTs that include the local processing delay, and those which don't.

Finally, report the source and destination addresses from the perspective
of the reply packet, rather than the timestamped packet, to be
consistent with Kathie's pping.

Overall, refactor large parts of pping_kern to allow both timestamping
and matching, as well as updating both the flow and reverse flow and
handle flow-events related to them, in one go. Also update README to
reflect changes.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:16:24 +01:00
Simon Sundberg
928a4144a9 pping: Add RTT-based sampling
Add an option (-R, --rtt-rate) to adapt the rate sampling based on the
RTT of the flow. The sampling rate will be C * RTT, where C is a
configurable constant (ex 1.0 to get one sample every RTT), and RTT
is either the current minimum (default) or smoothed RTT of the
flow (chosen via the -t or --rtt-type option).

The smoothed RTT (sRTT) is updated for each calculated RTT, and is
calculated in a similar manner to srtt in the kernel's TCP stack. The
sRTT is a moving average of all RTTs, and is calculated according to
the formula:

  srtt = 7/8 * prev_srtt + 1/8 * rtt

To allow the user to pass a non-integer C (ex 0.1 to get 10 RTT
samples for every RTT-period), fixed-point arithmetic has been used
in the eBPF programs (due to lack of support for floats). The maximum
value for C has been limited to 10000 in order for it to be unlikely
that the C * RTT calculation will overflow (with C = 10000, overflow
will only occur if RTT > 28 seconds).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:11:21 +01:00
Simon Sundberg
c79c4e8571 pping: Eliminate flow creation/deletion concurrency issue
Only push flow events for opening/closing flows if the
creation/deletion of the flow-state was successful (as indicated by
the bpf_map_*_elem() return value). This should avoid outputting
several flow creation/deletion messages in case multiple instances are
trying to create/delete a flow concurrently, as could theoretically
occur previously.

Also set the last_timestamp value before creating a new flow, to avoid
a race condition where the userspace cleanup might incorrectly
determine that a flow is old before the last_timestamp value can be
set. Explicitly skip the rate-limit for the first packet of a new flow
to avoid it failing the rate-limit. This also fixes an issue where the
first packet of a new flow would previously fail the rate-limit if the
rate-limit was higher than current time uptime (CLOCK_MONOTONIC).

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
2022-02-10 16:08:23 +01:00