If there were no more transmit slots, we freed the packet's UMEM frame,
but then continued sending it anyhow.
In the places where tx_pkt() is currently used, this never happened.
Still, fix the bug.
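A minimal sketch of the corrected control flow, using hypothetical, simplified stand-ins for the AF_XDP TX ring and the UMEM frame pool (struct tx_ring, struct umem_pool and umem_free_frame() are illustrative, not the real xsk/libxdp types). The point is only the early return after the frame has been freed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the AF_XDP TX ring and the
 * UMEM frame allocator; not the real xsk/libxdp types. */
struct tx_ring {
	uint32_t free_slots;
	uint32_t queued;
};

struct umem_pool {
	uint32_t free_frames;
};

static void umem_free_frame(struct umem_pool *umem)
{
	umem->free_frames++;
}

/* Returns true if the packet was queued for transmission. */
static bool tx_pkt(struct tx_ring *tx, struct umem_pool *umem)
{
	if (tx->free_slots == 0) {
		/* No more transmit slots: drop the packet by returning its
		 * frame to the UMEM, and stop here instead of continuing to
		 * send a frame that has already been freed. */
		umem_free_frame(umem);
		return false;
	}
	tx->free_slots--;
	tx->queued++;
	return true;
}
```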
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Collect potential issues under a new section in the TODO list. These
are issues that I generally don't consider very severe, but that may
still be useful to note down and keep in mind.
Move the section on potential concurrency issues from the README to the
new section in the TODO list.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Send a warning notifying the user that PPing failed to create a
flow/timestamp entry due to the corresponding map being full. To avoid
sending a warning for every packet, emit at most one warning per
WARN_MAP_FULL_INTERVAL (currently hard-coded to 1s).
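A minimal sketch of this kind of interval-based warning suppression, assuming nanosecond timestamps from a monotonic clock. Only the name WARN_MAP_FULL_INTERVAL comes from the text; expressing it in nanoseconds, and the struct and function names, are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 1s interval expressed in nanoseconds. */
#define WARN_MAP_FULL_INTERVAL 1000000000ULL

/* Hypothetical per-map bookkeeping for the last emitted warning. */
struct map_full_warner {
	uint64_t last_warn_ns;
	bool warned; /* false until the first warning has been emitted */
};

/* Returns true if a "map full" warning should be emitted at now_ns. */
static bool should_warn_map_full(struct map_full_warner *w, uint64_t now_ns)
{
	if (w->warned && now_ns - w->last_warn_ns < WARN_MAP_FULL_INTERVAL)
		return false; /* still inside the suppression interval */
	w->warned = true;
	w->last_warn_ns = now_ns;
	return true;
}
```

Every failed map insertion can call this, while at most one warning per interval actually reaches the user.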
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Refactor how events are handled in the user space application. This is
in preparation for adding an additional event type, which should not be
handled by the normal functions for printing RTT and flow events.
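One way to structure such a refactor is a single dispatcher that switches on the event type, so a new type can get its own handler without passing through the RTT/flow printing functions. A minimal sketch; the event types, struct layout and handler names are hypothetical, and the handlers only count calls instead of printing:

```c
/* Hypothetical event types; the real pping event layout may differ. */
enum event_type {
	EVENT_TYPE_RTT,
	EVENT_TYPE_FLOW,
	EVENT_TYPE_MAP_FULL, /* example of an additional, non-RTT/flow type */
};

struct event {
	enum event_type type;
	/* event-specific payload omitted */
};

/* Counts which handler ran, standing in for the actual output code. */
struct handler_counts {
	int rtt;
	int flow;
	int map_full;
};

static void print_rtt_event(struct handler_counts *c, const struct event *e)
{
	(void)e;
	c->rtt++;
}

static void print_flow_event(struct handler_counts *c, const struct event *e)
{
	(void)e;
	c->flow++;
}

static void warn_map_full(struct handler_counts *c, const struct event *e)
{
	(void)e;
	c->map_full++;
}

/* Dispatch on type; returns -1 for unknown event types. */
static int handle_event(struct handler_counts *c, const struct event *e)
{
	switch (e->type) {
	case EVENT_TYPE_RTT:
		print_rtt_event(c, e);
		break;
	case EVENT_TYPE_FLOW:
		print_flow_event(c, e);
		break;
	case EVENT_TYPE_MAP_FULL:
		warn_map_full(c, e);
		break;
	default:
		return -1;
	}
	return 0;
}
```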
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Wait to send a flow-opening message until a reply has been seen for
the flow. Likewise, only emit a flow-closing event if the flow has
first been opened (that is, a reply has been seen).
This introduces potential (but unlikely) concurrency issues for flow
opening/closing messages which are further described in the README.
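A minimal single-threaded sketch of the open/close gating, with hypothetical names. It deliberately ignores the concurrency issue mentioned above: two CPUs could both observe has_opened == false and both report an opening.

```c
#include <stdbool.h>

/* Hypothetical, reduced flow state. */
struct flow_state {
	bool has_opened; /* set once a reply has been seen */
};

enum flow_event {
	FLOW_EVENT_NONE,
	FLOW_EVENT_OPENING,
	FLOW_EVENT_CLOSING,
};

/* Called when a reply is seen: report the opening only the first time. */
static enum flow_event on_reply_seen(struct flow_state *f)
{
	if (f->has_opened)
		return FLOW_EVENT_NONE;
	f->has_opened = true;
	return FLOW_EVENT_OPENING;
}

/* Called when the flow ends: only report closing for opened flows. */
static enum flow_event on_flow_closed(const struct flow_state *f)
{
	return f->has_opened ? FLOW_EVENT_CLOSING : FLOW_EVENT_NONE;
}
```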
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Perform both timestamping and matching on both ingress and egress
hooks. This makes it more similar to Kathie's pping, allowing the tool
to capture RTTs in both directions when deployed on just a single
interface.
Like Kathie's pping, filter out RTTs for packets going to the local
machine by default (as these RTTs only include local processing delays).
This behavior can be disabled by passing the -l/--include-local option.
As packets that are timestamped on ingress and matched on egress will
include the local machine's processing delay, add a "match_on_egress"
member to the JSON output, which can be used to differentiate between
RTTs that include the local processing delay and those that don't.
Finally, report the source and destination addresses from the perspective
of the reply packet, rather than the timestamped packet, to be
consistent with Kathie's pping.
Overall, refactor large parts of pping_kern to allow timestamping and
matching, as well as updating both the flow and the reverse flow and
handling the flow events related to them, in one go. Also update the
README to reflect the changes.
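Handling both the flow and the reverse flow in one go requires deriving the reverse flow's key from the packet's own key. A minimal sketch, assuming a hypothetical IPv4-only flow tuple: a packet can then be timestamped under its own key and matched against timestamps stored under the reverse key.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key; IPv4 only for brevity (assumption). */
struct flow_tuple {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t proto;
};

/* Swap source and destination to get the key of the reverse flow. */
static struct flow_tuple reverse_flow(const struct flow_tuple *f)
{
	struct flow_tuple r;

	/* Zero padding bytes so the struct can be used as a hash-map key. */
	memset(&r, 0, sizeof(r));
	r.saddr = f->daddr;
	r.daddr = f->saddr;
	r.sport = f->dport;
	r.dport = f->sport;
	r.proto = f->proto;
	return r;
}
```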
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add an option (-R, --rtt-rate) to adapt the sampling rate based on the
RTT of the flow. The minimum interval between samples will be C * RTT,
where C is a configurable constant (e.g. 1.0 to get one sample every
RTT), and RTT is either the current minimum (default) or smoothed RTT
of the flow (chosen via the -t or --rtt-type option).
The smoothed RTT (sRTT) is updated for each calculated RTT, in a
similar manner to srtt in the kernel's TCP stack. The sRTT is an
exponentially weighted moving average of all RTTs, calculated
according to the formula:
srtt = 7/8 * prev_srtt + 1/8 * rtt
To allow the user to pass a non-integer C (e.g. 0.1 to get 10 RTT
samples for every RTT period), fixed-point arithmetic is used in the
eBPF programs (as they lack support for floating point). The maximum
value for C has been limited to 10000 to make it unlikely that the
C * RTT calculation will overflow (with C = 10000, overflow will only
occur if RTT > 28 seconds).
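A minimal sketch of both integer-only calculations. Using 16 fractional bits and nanosecond RTTs is an assumption, not taken from pping_kern; it is chosen here because it reproduces the stated bound, since UINT64_MAX / (10000 * 2^16) ns is roughly 28.1 s. The helper names are hypothetical, and seeding the sRTT with the first sample is also an assumption.

```c
#include <stdint.h>

/* Assumption: 16 fractional bits for the fixed-point constant C. */
#define FIXPOINT_SHIFT 16

typedef uint64_t fixpoint64;

/* Convert C once in userspace, where floating point is available. */
static fixpoint64 double_to_fixpoint(double x)
{
	return (fixpoint64)(x * (double)(1ULL << FIXPOINT_SHIFT));
}

/* C * rtt with C in fixed point and rtt in ns; overflows if
 * c * rtt > UINT64_MAX, i.e. rtt > ~28 s for C = 10000. */
static uint64_t fixpoint_mul(fixpoint64 c, uint64_t rtt)
{
	return (c * rtt) >> FIXPOINT_SHIFT;
}

/* srtt = 7/8 * prev_srtt + 1/8 * rtt, using integer arithmetic only. */
static uint64_t update_srtt(uint64_t prev_srtt, uint64_t rtt)
{
	if (prev_srtt == 0)
		return rtt; /* assumption: the first sample seeds the average */
	return (7 * prev_srtt + rtt) / 8;
}
```

For example, C = 0.25 becomes 16384 in fixed point, and with an RTT of 10 ms (10,000,000 ns) the resulting sampling interval is 2,500,000 ns.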
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Only push flow events for opening/closing flows if the
creation/deletion of the flow-state was successful (as indicated by
the bpf_map_*_elem() return value). This should avoid outputting
several flow creation/deletion messages in case multiple instances are
trying to create/delete a flow concurrently, as could theoretically
occur previously.
Also set the last_timestamp value before creating a new flow, to avoid
a race condition where the userspace cleanup might incorrectly
determine that a flow is old before the last_timestamp value has been
set. Explicitly skip the rate limit for the first packet of a new flow
so that it does not fail the rate limit. This also fixes an issue where
the first packet of a new flow would previously fail the rate limit if
the rate limit was higher than the current uptime (CLOCK_MONOTONIC).
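A minimal sketch of the rate-limit check with the first-packet exemption, using a hypothetical reduced flow state. It also shows the old failure mode: with last_timestamp left at 0, now - 0 < rate_limit whenever the uptime is below the rate limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced flow state; only the field relevant here. */
struct flow_state {
	uint64_t last_timestamp; /* ns, CLOCK_MONOTONIC */
};

/* Set last_timestamp before the flow is inserted into the map, so a
 * concurrent userspace cleanup never sees a stale zero value. */
static struct flow_state init_new_flow(uint64_t now)
{
	struct flow_state f = { .last_timestamp = now };
	return f;
}

static bool passes_rate_limit(const struct flow_state *f, uint64_t now,
			      uint64_t rate_limit, bool is_new_flow)
{
	/* The first packet of a new flow always passes; otherwise it would
	 * be rejected, since now - last_timestamp == 0 < rate_limit. */
	if (is_new_flow)
		return true;
	return now - f->last_timestamp >= rate_limit;
}
```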
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add command-line flags for each protocol that pping should attempt to
parse and report RTTs for (currently -T/--tcp and -C/--icmp). If no
protocol is specified, assume TCP. To make this clear, output a message
before starting that shows how ePPing has been configured (output
format, tracked protocols and which interface to run on).
Additionally, as the ppviz format was only designed for TCP it does
not have any field for which protocol an entry belongs to. Therefore,
emit a warning in case the user selects the ppviz format with anything
other than TCP.
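A minimal sketch of the defaulting and warning logic described above; the struct and function names are hypothetical, and the actual option parsing is omitted.

```c
#include <stdbool.h>

/* Hypothetical subset of the pping configuration. */
struct proto_config {
	bool track_tcp;
	bool track_icmp;
	bool ppviz_format;
};

/* Apply the TCP default; returns true if a ppviz warning is needed. */
static bool finalize_proto_config(struct proto_config *cfg)
{
	if (!cfg->track_tcp && !cfg->track_icmp)
		cfg->track_tcp = true; /* no protocol given: assume TCP */

	/* ppviz has no protocol field, so warn for anything but TCP. */
	return cfg->ppviz_format && cfg->track_icmp;
}
```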
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Allow pping to passively monitor RTT for ICMP echo request/reply
flows. Use the echo identifier in place of the ports, and the echo
sequence number as the packet identifier.
Additionally, add the protocol to the standard output format in order
to be able to distinguish between TCP and ICMP flows.
The ppviz format does not include the protocol, making it impossible to
distinguish between TCP and ICMP traffic. A warning for when the ppviz
format is used together with ICMP traffic will be added in the future.
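A minimal sketch of the mapping, using a hand-written ICMPv4 echo header (the layout follows RFC 792; struct echo_hdr, struct flow_ids and parse_icmp_echo() are hypothetical names). The identifier fills both port fields, so a request and its reply map to the same (reversed) flow, and the sequence number plays the role that TSval/TSecr play for TCP.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

#define ICMP_TYPE_ECHO_REPLY 0
#define ICMP_TYPE_ECHO_REQUEST 8

/* ICMPv4 echo header (RFC 792); multi-byte fields in network order. */
struct echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;
	uint16_t sequence;
};

/* Hypothetical output: pseudo-ports and packet identifier. */
struct flow_ids {
	uint16_t sport;
	uint16_t dport;
	uint32_t pkt_id;
};

static bool parse_icmp_echo(const struct echo_hdr *h, struct flow_ids *out)
{
	if (h->type != ICMP_TYPE_ECHO_REQUEST &&
	    h->type != ICMP_TYPE_ECHO_REPLY)
		return false;
	if (h->code != 0)
		return false;

	/* Echo identifier stands in for both ports. */
	out->sport = ntohs(h->id);
	out->dport = ntohs(h->id);
	/* Echo sequence number identifies the individual packet. */
	out->pkt_id = ntohs(h->sequence);
	return true;
}
```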
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The echoed TCP timestamp (TSecr) is only valid if the ACK flag is
set. So only attempt to match on packets with the ACK flag set.
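A minimal sketch of extracting TSecr only from ACK packets. It walks the raw TCP option bytes (EOL = 0, NOP = 1, Timestamps = kind 8, length 10, TSval then TSecr, both big-endian). The function and macro names are hypothetical, and the caller is assumed to pass the TCP flags byte (byte 13 of the TCP header, where ACK is 0x10).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_ACK_BIT 0x10 /* ACK in the TCP flags byte */
#define OPT_EOL 0
#define OPT_NOP 1
#define OPT_TIMESTAMP 8
#define OPT_TIMESTAMP_LEN 10

static uint32_t read_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Returns true and sets *tsecr only if ACK is set and a Timestamps
 * option is present in the option bytes. */
static bool get_tsecr(uint8_t tcp_flags, const uint8_t *opts, size_t len,
		      uint32_t *tsecr)
{
	size_t i = 0;

	if (!(tcp_flags & TCP_ACK_BIT))
		return false; /* TSecr is not valid without ACK */

	while (i < len) {
		uint8_t kind = opts[i];
		uint8_t optlen;

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			break;
		optlen = opts[i + 1];
		if (optlen < 2 || i + optlen > len)
			break; /* malformed option */
		if (kind == OPT_TIMESTAMP && optlen == OPT_TIMESTAMP_LEN) {
			*tsecr = read_be32(opts + i + 6); /* skip kind, len, TSval */
			return true;
		}
		i += optlen;
	}
	return false;
}
```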
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Dig into the return value of netdev_pick_tx().
The goal is to be able to debug the case where a socket
selects a different queue_id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The BPF-prog "not_txq_zero" also needs to take into account
that skb->queue_mapping usually isn't set for locally
generated traffic.
I worry that sockets can set a different queue id that could
override our (BPF) choice in netdev_pick_tx().
See sk_tx_queue_set() and sk_tx_queue_get().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
This version of the XPS script has been modified to
work with the ash shell, as bash was not available on
the Yocto target host.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Also intercept SIGTERM (in addition to the previously intercepted
SIGINT) and perform a graceful shutdown.
Perhaps it also makes sense to perform graceful shutdown on some
additional signals, like SIGHUP and SIGQUIT?
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
This seems to be a commonly occurring issue with the tc command line,
and the C code has inherited the issue in its API.
Trying to replace a TC-BPF prog often results in appending a new prog
(as a new tc filter instance) instead.
Be careful to set the handle, the prio and the replace flag.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The reason for going this route is that it allows us to
create a user binary that contains the BPF object file.
Thus, we avoid having to load the BPF file from a
specific location or having to be in the same dir as the file.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The Yocto build this is intended for doesn't have /bin/bash,
so adapt the script.
The external program "getopt" is not available either.
The 'sort' tool is also different, as it comes from busybox,
so adapt the cmdline options for 'sort'.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The Yocto build has a problem loading this via tc:
# tc filter replace dev eth1 egress prio 0xC000 handle 1 bpf da obj tc_txq_policy_kern.o
Continuing without mounted eBPF fs. Too old kernel?
mkdir (null)/globals failed: No such file or directory
Unable to load program
It can be worked around by mounting the BPF file system manually:
# mount -t bpf bpf /sys/fs/bpf/
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The libbpf API has deprecated a number of functions used by the pping
loader. While a couple of functions have simply been renamed,
bpf_object__find_program_by_title has been deprecated entirely in
favor of bpf_object__find_program_by_name. Therefore, look up the BPF
programs by their C function names rather than by their section
names.
Also remove defines of section names as they are no longer used, and
change the section names in pping_kern.c to use "tc" instead of
"classifier/ingress" and "classifier/egress".
Finally, replace the flags json_format and json_ppviz in pping_config
with a single enum for the different output formats. This makes the
logic for which output format to use clearer than relying on
multiple (supposedly) mutually exclusive flags (and implicitly
assuming the standard format if neither flag was set).
One potential concern with this commit is that it introduces some
"magic strings". If the function names in pping_kern.c are changed,
multiple corresponding changes in pping.c will be required.
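A minimal sketch of a single output-format enum replacing two boolean flags. The enum constants, the accepted strings and parse_output_format() are hypothetical, not pping's actual option values. With one enum, the "neither flag set" case becomes an explicit default instead of an implicit assumption.

```c
#include <string.h>

/* Hypothetical replacement for the json_format/json_ppviz flags. */
enum pping_output_format {
	PPING_OUTPUT_STANDARD, /* explicit default */
	PPING_OUTPUT_JSON,
	PPING_OUTPUT_PPVIZ,
};

/* Returns 0 on success; leaves *fmt untouched on unknown input. */
static int parse_output_format(const char *s, enum pping_output_format *fmt)
{
	if (strcmp(s, "standard") == 0)
		*fmt = PPING_OUTPUT_STANDARD;
	else if (strcmp(s, "json") == 0)
		*fmt = PPING_OUTPUT_JSON;
	else if (strcmp(s, "ppviz") == 0)
		*fmt = PPING_OUTPUT_PPVIZ;
	else
		return -1;
	return 0;
}
```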
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The rate-limit and cleanup-interval arguments were only verified to be
positive. Add a check for an upper bound to prevent the user from
passing values that result in an internal overflow. The limits for both
rate-limit and cleanup-interval have been set to one week, which should
be more than enough for any reasonable use.
Additionally, disable the periodic cleanup entirely if the value 0 is
passed to cleanup-interval.
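A minimal sketch of a bounded argument parser, assuming the value is given in (possibly fractional) seconds and stored internally in nanoseconds. The units and the parse_bounded_ns() name are assumptions. One week (604,800 s) is about 6.0e14 ns, far below the 64-bit limit, so the conversion cannot overflow once the bound holds.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SECONDS_PER_WEEK (7ULL * 24 * 60 * 60)
#define NS_PER_SEC 1000000000ULL

/* Parse a number of seconds in [0, one week] and convert it to ns.
 * Returns 0, -EINVAL (not a number) or -ERANGE (out of bounds). */
static int parse_bounded_ns(const char *arg, uint64_t *out_ns)
{
	char *end;
	double secs;

	errno = 0;
	secs = strtod(arg, &end);
	if (errno || end == arg || *end != '\0')
		return -EINVAL;
	/* Written this way so that NaN is rejected as well. */
	if (!(secs >= 0 && secs <= (double)SECONDS_PER_WEEK))
		return -ERANGE;

	*out_ns = (uint64_t)(secs * NS_PER_SEC);
	return 0;
}
```

For cleanup-interval, a parsed value of 0 would then mean that the periodic cleanup is not scheduled at all.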
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>