This adds libxdp as a submodule and link target alongside libbpf. This
should make it just as easy for examples to use libxdp as it currently is
for libbpf. Some hoops need to be jumped through to make libxdp link
against the same version of libbpf as the one we use in this repository.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The header files included from pping_kern.c include definitions of AF_INET
and AF_INET6, leading to warnings like:
pping_kern.c:25:9: warning: 'AF_INET' macro redefined [-Wmacro-redefined]
^
/usr/include/bits/socket.h:97:9: note: previous definition is here
^
pping_kern.c:26:9: warning: 'AF_INET6' macro redefined [-Wmacro-redefined]
^
/usr/include/bits/socket.h:105:9: note: previous definition is here
^
2 warnings generated.
Fix this by guarding the definitions behind suitable ifdefs.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The connection state had 3 boolean flags related to what state it was
in (is_empty, has_opened and has_closed). Only specific combinations
of these flags really made sense (has_opened/has_closed didn't really
mean anything if is_empty, and if has_closed one would expect is_empty
to be false and has_opened to be true etc.). Therefore, replace these
combinations of boolean values with a singular enum which is used to
check if the flow is empty, waiting to open (seen outgoing packet but
no response), is open or has closed.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Combine the flow state entries for both the "forward" and "reverse"
direction of the flow into a single dualflow state. Change the
flowstate map to use the dualflow state so that state for both
directions can be retrieved using a single map lookup.
As flow states are now kept in pairs, cannot directly create/delete
states from the BPF map each time a flow opens/closes in one
direction. Therefore, update all logic related to creating/deleting
flows. For example, use "empty" slot in dualflow state instead of
creating a new map entry, and only delete the dual flow state entry
once both directions of the flow have closed/timed out.
Some implementation details:
Have implemented a simple memcmp function as I could not get the
__builtin_memcmp function to work (got error "libbpf: failed to
find BTF for extern 'memcmp': -2").
To ensure that both directions of the flow always look up the same
entry, use the "sorted" flow tuple (the (ip, port) pair that is
smaller is always first) as key. This is what the memcmp is used for.
To avoid storing two copies of the flow tuple (src -> dst and dst ->
src) and doing additional memcmps, always store the flow state for the
"sorted" direction as the first direction and the reverse as the
second direction. Then simply check if a flow is sorted or not to
determine which direction in the dual flow state that matches. Have
attempted to at least partially abstract this detail away from most of
the code by adding some get_flowstate_from* helpers.
The dual flow state simply stores the two (single direction) flow
states as the struct members dir1 and dir2. Use these two (admittedly
poorly named) members instead of a single array of size 2 in order to
avoid some issues with the verifier being worried that the array index
might be out of bounds.
Have added some new boolean members to the flow state to keep track of
"connection state". In addition the the previous has_opened, I now
also have a member for if the flow is "empty" or if it has been
closed. These are needed to cope with having to keep individual flow
states for both directions of the flow around as long as one direction
of the flow is used. I plan to replace these boolean "connection
state" members with a single enum in a future commit.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Refactor functions for parsing protocol-specific packet
identifiers (parse_tcp_identifier, parse_icmp6_identifer and
parse_icmp_identifer) so they no longer directly fill in the
packet_info struct. Instead make the functions take additional
pointers as arguments and fill in a protocol_info struct.
The reason for this change is to decouple the
parse_<protocol>_identifier functions from the logic of how the
packet_info struct should be filled. The parse_packet_indentifier is
now solely responsible for filling in the members of packet_info
struct correctly instead of working in tandem with the
parse_<protocol>_identifier, filling in some members each.
This might result in a minimal performance degradation as some values
are now first filled in the protocol_info struct and later copied to
packet_info instead of being filled in directly in packet_info.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Format code using clang-format from the kernel tree. However, leave
code in orginal format in some instances where clang-format clearly
reduces readability of code (ex. do not remove alginment of comments
for struct members and long options).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add a counter of outstanding (unmatched) timestamped entires in the
flow state. Before a timestamp lookup is attempted, check that there
are any outstanding timestamps, otherwise avoid the unecessary hash
map lookup.
Use 32 bit counter for outstanding timestamps to allow atomic
increments/decrements using __synch_fetch_and_add. This operation is
not supported on smaller integers, which is why such a large counter
is used. The atomicity is needed because the counter may be
concurrently accessed by both the ingress/egress hook as well as the
periodical map cleanup.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add conditions that allows removing old flow and timestamp entries
sooner.
For flow map, have added conditions that allow unopened flows and ICMP
flows to be removed earlier than open TCP flows (currently both set to
30 sec instead of 300 sec).
For timestamp entries, allow them to be removed if they're more than
TIMESTAMP_RTT_LIFETIME (currently 8) times higher than the flow's
sRTT.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add some debug info to the periodical map cleanup process. Push debug
information through the events perf buffer by using newly added
map_clean_event.
The old user space map cleanup process had some simple debug
information that was lost when transitioning to using bpf_iter
instead. Therefore, add back similar (but more extensive) debug
information but now collected from the BPF-side. In addition to stats
on entries deleted by the cleanup process, also include stats on
entries deleted by ePPing itself due to matching (for timestamp
entries) or detecting FIN/RST (for flow entries)
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
To improve the performance of the map cleanup, switch from the
user-spaced loop to using BPF iterators. With BPF iterators, a BPF
program can be run on each element in the map, and can thus be done in
kernel-space. This should hopefully also avoid the issue the previous
userspace loop had with resetting in case an element was removed by
the BPF programs during the cleanup.
Due to removal of userspace logic for map cleanup, no longer provide
any debug information about how many entires there are in each map and
how many of them were removed by the garbage collection. This will be
added back in the next commit.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Explicitly stop the thread which performs periodical map cleanup on
shutdown, before attempting to free up any resources it might use.
For the cleanup order to make more sense, also setup the perf-buffer
before setting up the periodical map cleaning, so that the thread can
be stopped as the first part of the cleanup. This also matches better
with the next couple of commits where map cleaning debug information
will be pushed through the perf-buffer.
Finally, move the addition of the signalhandler earlier in the code to
eliminate a window where it was possible to terminate the program
without relevant cleanup code having a chance to run.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Allocate the clean_args on the stack of the main function rather than
the stack of setup_periodical_map_cleaning. The clean_args is used by
periodical_map_cleanup from a different thread, so allocating them on
stack for setup_periodical_map_cleaning which goes out of scope
directly after the thread is created opens up for errors where later
function calls may overwrite the arguments, causing unpredictable
behavior.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
If there were more transmit slots, then we umem free the
packet, but we continued sending it anyhow.
The places tx_pkt() is currently used this never happened.
Still fix the bug.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Collect potential issues under a new section in the TODO list. These
are issues I generally don't think are that severe, but may still be
useful to note down and keep in mind.
Move the section on potential concurrency issues from README to the
new section in the TODO-list.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Send a warning notifying the user that PPing failed to create a
flow/timestamp entry due to the corresponding map being full. To avoid
sending a warning for every packet, only emit warnings every
WARN_MAP_FULL_INTERVAL (which is currently hard-coded to 1s).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Refactor code for how events are handled in the user space
application. Preparation for adding an additional event type which
should not be handled by the normal functions for printing RTT and
flow events.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Wait with sending a flow open message until a reply has been seen for
the flow. Likewise, only emit a flow closing event if the flow has
first been opened (that is, a reply has been seen).
This introduces potential (but unlikely) concurrency issues for flow
opening/closing messages which are further described in the README.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Perform both timestamping and matching on both ingress and egress
hooks. This makes it more similar to Kathie's pping, allowing the tool
to capture RTTs in both directions when deployed on just a single
interface.
Like Kathie's pping, by default filter out RTTs for packets going to
the local machine (will only include local processing delays). This
behavior can be disabled by passing the -l/--include-local option.
As packets that are timestamped on ingress and matched on egress will
include the local machines processing delay, add the "match_on_egress"
member to the JSON output that can be used to differentiate between
RTTs that include the local processing delay, and those which don't.
Finally, report the source and destination addresses from the perspective
of the reply packet, rather than the timestamped packet, to be
consistent with Kathie's pping.
Overall, refactor large parts of pping_kern to allow both timestamping
and matching, as well as updating both the flow and reverse flow and
handle flow-events related to them, in one go. Also update README to
reflect changes.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add an option (-R, --rtt-rate) to adapt the rate sampling based on the
RTT of the flow. The sampling rate will be C * RTT, where C is a
configurable constant (ex 1.0 to get one sample every RTT), and RTT
is either the current minimum (default) or smoothed RTT of the
flow (chosen via the -t or --rtt-type option).
The smoothed RTT (sRTT) is updated for each calculated RTT, and is
calculated in a similar manner to srtt in the kernel's TCP stack. The
sRTT is a moving average of all RTTs, and is calculated according to
the formula:
srtt = 7/8 * prev_srtt + 1/8 * rtt
To allow the user to pass a non-integer C (ex 0.1 to get 10 RTT
samples for every RTT-period), fixed-point arithmetic has been used
in the eBPF programs (due to lack of support for floats). The maximum
value for C has been limited to 10000 in order for it to be unlikely
that the C * RTT calculation will overflow (with C = 10000, overflow
will only occur if RTT > 28 seconds).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Only push flow events for opening/closing flows if the
creation/deletion of the flow-state was successful (as indicated by
the bpf_map_*_elem() return value). This should avoid outputting
several flow creation/deletion messages in case multiple instances are
trying to create/delete a flow concurrently, as could theoretically
occur previously.
Also set the last_timestamp value before creating a new flow, to avoid
a race condition where the userspace cleanup might incorrectly
determine that a flow is old before the last_timestamp value can be
set. Explicitly skip the rate-limit for the first packet of a new flow
to avoid it failing the rate-limit. This also fixes an issue where the
first packet of a new flow would previously fail the rate-limit if the
rate-limit was higher than current time uptime (CLOCK_MONOTONIC).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add command-line flags for each protocol that pping should attempt to
parse and report RTTs for (currently -T/--tcp and -C/--icmp). If no
protocol is specified assume TCP. To clarify this, output a message
before start on how ePPing has been configured (stating output format,
tracked protocols and which interface to run on).
Additionally, as the ppviz format was only designed for TCP it does
not have any field for which protocol an entry belongs to. Therefore,
emit a warning in case the user selects the ppviz format with anything
other than TCP.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Allow pping to passivly monitor RTT for ICMP echo request/reply
flows. Use the echo identifier as ports, and echo sequence as packet
identifier.
Additionally, add protocol to standard output format in order to be
able to distinguish between TCP and ICMP flows.
The ppviz format does not include protocol, making it impossible to
distinguish between TCP and ICMP traffic. Will add warning if ppviz
format is used together with ICMP traffic in the future.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The echoed TCP timestamp (TSecr) is only valid if the ACK flag is
set. So make sure to only attempt to match on ACK packets.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Digging into the return value of netdev_pick_tx().
Want to be able to debug the case where a socket
selects another queue_id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The BPF-prog "not_txq_zero" also needed to take into account
that skb->queue_mapping usually isn't set for locally
generated traffic.
I worry that sockets can set another queue id that could
override our (BPF choice) in netdev_pick_tx().
See sk_tx_queue_set() and sk_tx_queue_get().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
This version of the XPS script have been modified to
work with the shell ash. As bash was not avail on
the Yocto target host.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Also intercept SIGTERM (in addition the the previously intercepted
SIGINT) and perform graceful shutdown.
Perhaps it also makes sense to perform graceful shutdown on some
additional signals, like SIGHUP and SIGQUIT?
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>