Add a counter of outstanding (unmatched) timestamp entries in the
flow state. Before a timestamp lookup is attempted, check that there
are any outstanding timestamps, otherwise avoid the unnecessary hash
map lookup.
Use a 32-bit counter for outstanding timestamps to allow atomic
increments/decrements using __sync_fetch_and_add. This operation is
not supported on smaller integers, which is why such a large counter
is used. The atomicity is needed because the counter may be
concurrently accessed by the ingress/egress hooks as well as the
periodic map cleanup.
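As a rough illustration of the counter logic described above, the
following sketch uses a trimmed-down, hypothetical flow_state struct
and hypothetical helper names; only the use of a 32-bit counter and
__sync_fetch_and_add is taken from this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down flow state; only the counter matters here. */
struct flow_state {
	uint32_t outstanding_timestamps;
};

/* Called after a timestamp entry has been created for the flow. */
static inline void ts_entry_added(struct flow_state *f)
{
	__sync_fetch_and_add(&f->outstanding_timestamps, 1);
}

/* Called after a timestamp entry was matched or removed by the cleanup.
 * Adding -1 wraps around in unsigned arithmetic, i.e. it decrements. */
static inline void ts_entry_removed(struct flow_state *f)
{
	__sync_fetch_and_add(&f->outstanding_timestamps, -1);
}

/* Cheap pre-check that lets the caller skip the hash map lookup. */
static inline bool should_lookup_timestamp(const struct flow_state *f)
{
	return f->outstanding_timestamps > 0;
}
```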
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add conditions that allow old flow and timestamp entries to be
removed sooner.
For the flow map, add conditions that allow unopened flows and ICMP
flows to be removed earlier than open TCP flows (currently both set to
30 sec instead of 300 sec).
For timestamp entries, allow them to be removed if they're older than
TIMESTAMP_RTT_LIFETIME (currently 8) times the flow's sRTT.
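A minimal sketch of the timestamp-entry condition, assuming all times
are in nanoseconds. The function name and the handling of a missing
sRTT (srtt == 0) are assumptions made for illustration, not
necessarily what the cleanup program does.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMESTAMP_RTT_LIFETIME 8 /* value stated in this message */

/* Returns true if a timestamp entry created at entry_ts may be removed.
 * Assumption: srtt == 0 means "no estimate yet", in which case this
 * check alone never expires the entry. */
static inline bool ts_entry_expired(uint64_t now, uint64_t entry_ts,
				    uint64_t srtt)
{
	if (srtt == 0)
		return false;
	return now - entry_ts > TIMESTAMP_RTT_LIFETIME * srtt;
}
```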
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add some debug info to the periodic map cleanup process. Push debug
information through the events perf buffer by using the newly added
map_clean_event.
The old user space map cleanup process had some simple debug
information that was lost when transitioning to using bpf_iter
instead. Therefore, add back similar (but more extensive) debug
information but now collected from the BPF-side. In addition to stats
on entries deleted by the cleanup process, also include stats on
entries deleted by ePPing itself due to matching (for timestamp
entries) or detecting FIN/RST (for flow entries).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
To improve the performance of the map cleanup, switch from the
userspace loop to using BPF iterators. With BPF iterators, a BPF
program can be run on each element in the map, and can thus be done in
kernel-space. This should hopefully also avoid the issue the previous
userspace loop had with resetting in case an element was removed by
the BPF programs during the cleanup.
Due to the removal of the userspace logic for map cleanup, no longer
provide any debug information about how many entries there are in each
map and how many of them were removed by the garbage collection. This
will be added back in the next commit.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Send a warning notifying the user that PPing failed to create a
flow/timestamp entry due to the corresponding map being full. To avoid
sending a warning for every packet, only emit warnings every
WARN_MAP_FULL_INTERVAL (which is currently hard-coded to 1s).
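The rate limiting of warnings could look roughly like the sketch
below. The function name and the convention that a last-warning time
of 0 means "never warned" are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL
#define WARN_MAP_FULL_INTERVAL NS_PER_SECOND /* 1 s, as stated above */

/* Returns true if a warning should be emitted now and, if so, records
 * the current time in *last_warn. Assumes a monotonic clock in ns. */
static inline bool should_warn_map_full(uint64_t now, uint64_t *last_warn)
{
	if (*last_warn != 0 && now - *last_warn < WARN_MAP_FULL_INTERVAL)
		return false;
	*last_warn = now;
	return true;
}
```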
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Wait with sending a flow open message until a reply has been seen for
the flow. Likewise, only emit a flow closing event if the flow has
first been opened (that is, a reply has been seen).
This introduces potential (but unlikely) concurrency issues for flow
opening/closing messages which are further described in the README.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Perform both timestamping and matching on both ingress and egress
hooks. This makes it more similar to Kathie's pping, allowing the tool
to capture RTTs in both directions when deployed on just a single
interface.
Like Kathie's pping, by default filter out RTTs for packets going to
the local machine (will only include local processing delays). This
behavior can be disabled by passing the -l/--include-local option.
As packets that are timestamped on ingress and matched on egress will
include the local machine's processing delay, add the "match_on_egress"
member to the JSON output that can be used to differentiate between
RTTs that include the local processing delay, and those which don't.
Finally, report the source and destination addresses from the perspective
of the reply packet, rather than the timestamped packet, to be
consistent with Kathie's pping.
Overall, refactor large parts of pping_kern to allow both timestamping
and matching, as well as updating both the flow and reverse flow and
handle flow-events related to them, in one go. Also update README to
reflect changes.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add an option (-R, --rtt-rate) to adapt the sampling rate based on the
RTT of the flow. The sampling interval will be C * RTT, where C is a
configurable constant (ex 1.0 to get one sample every RTT), and RTT
is either the current minimum (default) or smoothed RTT of the
flow (chosen via the -t or --rtt-type option).
The smoothed RTT (sRTT) is updated for each calculated RTT, and is
calculated in a similar manner to srtt in the kernel's TCP stack. The
sRTT is a moving average of all RTTs, and is calculated according to
the formula:
srtt = 7/8 * prev_srtt + 1/8 * rtt
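In integer arithmetic, the formula above can be sketched as follows.
The function name and the choice to seed the average with the first
sample (when prev_srtt is 0) are assumptions for illustration.

```c
#include <stdint.h>

/* Exponentially weighted moving average with weight 1/8 for the new
 * sample. Assumption: prev_srtt == 0 means no sample has been seen yet,
 * so the first RTT is used directly. */
static inline uint64_t update_srtt(uint64_t prev_srtt, uint64_t rtt)
{
	if (prev_srtt == 0)
		return rtt;
	/* srtt = 7/8 * prev_srtt + 1/8 * rtt, using shifts instead of
	 * division (each shifted term is rounded down) */
	return prev_srtt - (prev_srtt >> 3) + (rtt >> 3);
}
```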
To allow the user to pass a non-integer C (ex 0.1 to get 10 RTT
samples for every RTT-period), fixed-point arithmetic has been used
in the eBPF programs (due to lack of support for floats). The maximum
value for C has been limited to 10000 in order for it to be unlikely
that the C * RTT calculation will overflow (with C = 10000, overflow
will only occur if RTT > 28 seconds).
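A sketch of the fixed-point calculation, assuming 16 fractional bits
and 64-bit RTT values in nanoseconds. The number of fractional bits is
an assumption, chosen here because it reproduces the bound mentioned
above: 2^64 / (10000 * 2^16) ns is roughly 28 seconds. The helper
names are hypothetical.

```c
#include <stdint.h>

#define FIXPOINT_SHIFT 16 /* assumed number of fractional bits */

/* Userspace side: convert a non-negative constant such as 0.1 or 1.0
 * to fixed point before passing it to the BPF programs. */
static inline uint64_t double_to_fixpoint(double c)
{
	return (uint64_t)(c * (double)(1ULL << FIXPOINT_SHIFT));
}

/* BPF side: minimum spacing between samples, C * RTT, using integers
 * only. The multiplication overflows once c_fixpoint * rtt_ns no longer
 * fits in 64 bits. */
static inline uint64_t sample_interval_ns(uint64_t c_fixpoint,
					  uint64_t rtt_ns)
{
	return (c_fixpoint * rtt_ns) >> FIXPOINT_SHIFT;
}
```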
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Only push flow events for opening/closing flows if the
creation/deletion of the flow-state was successful (as indicated by
the bpf_map_*_elem() return value). This should avoid outputting
several flow creation/deletion messages in case multiple instances are
trying to create/delete a flow concurrently, as could theoretically
occur previously.
Also set the last_timestamp value before creating a new flow, to avoid
a race condition where the userspace cleanup might incorrectly
determine that a flow is old before the last_timestamp value can be
set. Explicitly skip the rate-limit for the first packet of a new flow
to avoid it failing the rate-limit. This also fixes an issue where the
first packet of a new flow would previously fail the rate-limit if the
rate-limit was higher than the current uptime (CLOCK_MONOTONIC).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add command-line flags for each protocol that pping should attempt to
parse and report RTTs for (currently -T/--tcp and -C/--icmp). If no
protocol is specified, assume TCP. To clarify this, output a message
before starting on how ePPing has been configured (stating output
format, tracked protocols and which interface to run on).
Additionally, as the ppviz format was only designed for TCP it does
not have any field for which protocol an entry belongs to. Therefore,
emit a warning in case the user selects the ppviz format with anything
other than TCP.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Allow pping to passively monitor RTT for ICMP echo request/reply
flows. Use the echo identifier as ports, and echo sequence as packet
identifier.
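The mapping from ICMP echo fields to the flow/packet identifier could
be sketched as below. The struct layout and names are hypothetical,
the fields are assumed to already be in host byte order, and using the
echo identifier for both the source and destination port is an
assumption about how "as ports" is realized.

```c
#include <stdint.h>

#define ICMP_TYPE_ECHO_REPLY 0   /* ICMP type values from RFC 792 */
#define ICMP_TYPE_ECHO_REQUEST 8

/* Hypothetical, simplified identifier; the real one also has addresses. */
struct packet_id {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t identifier;
};

/* Fill in pid from an ICMP echo request/reply. Returns 0 on success and
 * -1 for ICMP types that are not echo request/reply. */
static inline int icmp_echo_to_packet_id(uint8_t type, uint16_t echo_id,
					 uint16_t echo_seq,
					 struct packet_id *pid)
{
	if (type != ICMP_TYPE_ECHO_REQUEST && type != ICMP_TYPE_ECHO_REPLY)
		return -1;
	pid->src_port = echo_id;
	pid->dst_port = echo_id;
	pid->identifier = echo_seq;
	return 0;
}
```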
Additionally, add protocol to standard output format in order to be
able to distinguish between TCP and ICMP flows.
The ppviz format does not include protocol, making it impossible to
distinguish between TCP and ICMP traffic. A future commit will add a
warning if the ppviz format is used together with ICMP traffic.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The echoed TCP timestamp (TSecr) is only valid if the ACK flag is
set. So make sure to only attempt to match on ACK packets.
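As a small sketch of the check, assuming the TCP flags byte has
already been extracted from the header (the macro and function names
are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define TCP_FLAGS_ACK_BIT 0x10 /* ACK bit in the TCP flags byte */

/* TSecr should only be used for matching if the segment carries ACK. */
static inline bool tsecr_is_valid(uint8_t tcp_flags)
{
	return (tcp_flags & TCP_FLAGS_ACK_BIT) != 0;
}
```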
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The libbpf API has deprecated a number of functions used by the pping
loader. While a couple of functions have simply been renamed,
bpf_object__find_program_by_title has been completely deprecated in
favor of bpf_object__find_program_by_name. Therefore, find BPF
programs based on their C function names rather than their section
names.
Also remove defines of section names as they are no longer used, and
change the section names in pping_kern.c to use "tc" instead of
"classifier/ingress" and "classifier/egress".
Finally, replace the flags json_format and json_ppviz in pping_config
with a single enum for the different output formats. This makes the
logic for which output format to use clearer compared to relying on
multiple (supposedly) mutually exclusive flags (and implicitly
assuming standard format if neither flag was set).
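A sketch of what a single output-format enum might look like. The
enum name, its members and the format strings accepted by the parser
are hypothetical and only illustrate the idea of replacing several
mutually exclusive flags with one value.

```c
#include <string.h>

/* Hypothetical enum replacing the json_format and json_ppviz flags. */
enum pping_output_format {
	PPING_OUTPUT_STANDARD,
	PPING_OUTPUT_JSON,
	PPING_OUTPUT_PPVIZ,
};

/* Map a format name to the enum. Returns 0 on success, -1 if the name
 * is not recognized (in which case *format is left untouched). */
static inline int parse_output_format(const char *name,
				      enum pping_output_format *format)
{
	if (strcmp(name, "standard") == 0)
		*format = PPING_OUTPUT_STANDARD;
	else if (strcmp(name, "json") == 0)
		*format = PPING_OUTPUT_JSON;
	else if (strcmp(name, "ppviz") == 0)
		*format = PPING_OUTPUT_PPVIZ;
	else
		return -1;
	return 0;
}
```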
One potential concern with this commit is that it introduces some
"magic strings". If the function names in pping_kern.c are changed,
multiple changes will also be required in pping.c.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Make several changes to functions related to attaching and detaching
the BPF programs:
- Check the BPF program id when detaching programs to ensure that the
correct programs are removed.
- When attaching tc-programs, keep track of whether the clsact qdisc
was created or existed previously. Attempt to delete the qdisc if it
was created and attaching failed. If the --force argument was given,
also attempt to delete the qdisc on shutdown in case it did not
previously exist.
- Rely on XDP flags to replace existing XDP program if --force is used
rather than explicitly detaching any XDP program first.
- Print out hints for why pping might have failed attaching the XDP
program.
Also, use libbpf_strerror instead of strerror to better display
libbpf-specific error codes, and for more reliable error handling in
general (don't need to ensure the error codes are positive).
Finally, change return codes of tc programs to TC_ACT_UNSPEC from
TC_ACT_OK to allow other TC-BPF programs to be used on the same
interface as pping.
Concerns with this commit:
- When attaching a tc program libbpf will emit a warning if the
clsact qdisc already exists on the interface. The fact that the
clsact already exists is not an issue, and is handled in tc_attach
by checking for EEXIST, so the warning could be a bit
misleading/confusing for the user.
- The tc_attach and xdp_attach functions attempt to return the u32
prog_id in an int. In case the programs are assigned a very high
id (> 2^31) this may cause it to be interpreted as an error instead.
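The second concern can be illustrated with the sketch below. Both
functions are hypothetical stand-ins, not the actual tc_attach or
xdp_attach. Converting a u32 above INT_MAX to int is
implementation-defined in C; on common compilers it wraps to a
negative value, which a caller would then treat as an error. Passing
the id through an out-parameter is one possible way around this.

```c
#include <errno.h>
#include <stdint.h>

/* Problematic pattern: a u32 id squeezed into an int return value. */
static inline int attach_returning_id(uint32_t assigned_id)
{
	return (int)assigned_id;
}

/* Possible alternative: return 0 or a negative errno, and hand the id
 * back through an out-parameter so the full u32 range is preserved. */
static inline int attach_with_out_param(uint32_t assigned_id,
					uint32_t *prog_id)
{
	if (!prog_id)
		return -EINVAL;
	*prog_id = assigned_id;
	return 0;
}
```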
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
For some machines, XDP may not be suitable due to ex. lack of XDP
support in NIC drivers or another program already being attached to
the XDP hook on the desired interface. Therefore, add an option to use
the tc-ingress hook instead of XDP to attach the pping ingress BPF
program on.
In practice, this adds an additional BPF program to the object file (a
TC ingress program). To avoid loading an unnecessary BPF program, also
explicitly disable autoloading for the ingress program not selected.
Also, change the tc programs to return TC_ACT_OK instead of
BPF_OK. While both should be compatible, the TC_ACT_* return codes
seem to be more commonly used for TC-BPF programs.
Concerns with this commit:
- The error messages for XDP attach failure have become slightly less
descriptive. I plan to improve the code for attaching and detaching
XDP programs in a separate commit, and will then address that.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Make the flow_timeout function call the current output function to
simulate a flow-closing event. Also some other minor cleanup/fixes.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add "flow events" (flow opening or closing so far) which will trigger
a printout of a message.
Note: The ppviz format will only print out the traditional rtt events
as the format does not include opening/closing messages.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add per-flow tracking of number of packets and bytes
sent/received. Add these to the JSON output format.
Also update README regarding concurrency issue when updating these
statistics.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Also, remove comments about concurrency issues from code in
pping_kern.c as it is now documented in README.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add the option to output in JSON format by passing '-j' or '--json' to
pping. Include the protocol in the JSON format, and fix the kernel
side so that it actually stores the protocol in the flow_address
struct.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
To add a timestamp to the output, push the time at which the packet
was processed from the kernel as part of the rtt-event. Also keep
track of the minimum encountered RTT for each flow in the kernel, and
push that as part of the RTT-event as well.
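Tracking the minimum RTT per flow amounts to something like the
following sketch. The function name and the convention that 0 means
"no RTT seen yet" are assumptions for illustration.

```c
#include <stdint.h>

/* Returns the new minimum RTT for the flow. Assumption: min_rtt == 0
 * means that no RTT has been recorded for the flow yet. */
static inline uint64_t update_min_rtt(uint64_t min_rtt, uint64_t rtt)
{
	if (min_rtt == 0 || rtt < min_rtt)
		return rtt;
	return min_rtt;
}
```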
Additionally, avoid pushing RTT messages at all if no flow-state
information can be found (due to ex. being deleted from egress side),
as no valid min-RTT can then be given. Furthermore, no longer delete
flow-information once seeing the FIN-flag on egress in order to keep
useful flow-state around for RTT-messages longer. Due to the
FIN-handshake process, it is sufficient if the ingress program deletes
the flow-state upon seeing FIN. However, still delete flow-state from
either ingress or egress upon seeing RST flag, as RST does not have a
handshake process allowing for delayed deletion.
While the minimum RTT could also be tracked from the userspace
process, userspace is not aware of when the flow is closed, so it
would have to add additional logic to keep track of the minimum RTT
for each flow and periodically clean them up. Furthermore, keeping RTT
statistics in the
flow-state map is useful for implementing future features, such as an
RTT-based sampling interval. It would also be useful in case pping is
changed to no longer have a long-running userspace process printing
out all the calculated RTTs, but instead simply occasionally looks up
the RTT from the flow-state map.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
When both BPF programs are kept in the same file, no longer need to
pin the maps in order to share them between the programs.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Reduce IPV6_EXT_MAX_CHAIN to 3 to avoid hitting the verifier limit of
processing 1 million instructions. This results in fewer loops in
parsing_helpers.h/skip_ip6hdrnext which simplifies the verifier
analysis. IPv6 extension headers do not appear to be that common, so
this is unlikely to cause a considerable limitation.
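To show why the constant bounds the work the verifier has to analyze,
here is a simplified userspace sketch of a bounded extension-header
walk over a byte buffer. It is not the code from parsing_helpers.h;
the function name and buffer-based interface are hypothetical. The
header types and length encodings follow the IPv6 specifications.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_EXT_MAX_CHAIN 3

/* Walk IPv6 extension headers starting at buf[*off]. Returns the first
 * non-extension next-header value and advances *off past the extension
 * headers, or returns -1 if the buffer is too short or the upper-layer
 * protocol is not reached within IPV6_EXT_MAX_CHAIN iterations. */
static inline int skip_ip6_ext_headers(const uint8_t *buf, size_t len,
				       size_t *off, uint8_t next_hdr)
{
	for (int i = 0; i < IPV6_EXT_MAX_CHAIN; i++) {
		size_t hdr_len;

		switch (next_hdr) {
		case 0:   /* hop-by-hop options */
		case 43:  /* routing */
		case 60:  /* destination options */
		case 135: /* mobility */
			if (*off + 2 > len)
				return -1;
			hdr_len = ((size_t)buf[*off + 1] + 1) * 8;
			break;
		case 44:  /* fragment: fixed size of 8 bytes */
			hdr_len = 8;
			break;
		case 51:  /* authentication header: 4-byte units, minus 2 */
			if (*off + 2 > len)
				return -1;
			hdr_len = ((size_t)buf[*off + 1] + 2) * 4;
			break;
		default:
			return next_hdr; /* not an extension header */
		}
		if (*off + hdr_len > len)
			return -1;
		next_hdr = buf[*off]; /* first byte is the next-header field */
		*off += hdr_len;
	}
	return -1;
}
```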
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Merge the pping_kern_tc.c, pping_kern_xdp.c and pping_helpers.h into
the single file pping_kern.c. Do not change any of the BPF code,
except renaming the map ts_start to packet_ts.
To handle both BPF programs being kept in a single ELF file, change
the loading mechanism to extract and attach both the tc and XDP
programs from it. Also refactor the main function into several smaller
functions to reduce its size.
Finally, add the --force (-f) and --cleanup-interval (-c) options to
the argument parsing, and improve the parsing of the
--rate-limit (-r) option.
NOTE: The verifier rejects the program in its current state as too
large (over 1 million instructions). Setting the TCP_MAX_OPTIONS in
pping_kern.c to 5 (or less) solves this. Unsure at the moment what
causes the verifier to think the program is so large, as the code in
pping_kern.c is identical to the code in the three files it was
merged from.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>