Update README, mainly adding a new section with brief descriptions and
some examples of the output formats.
Also, update the files and maps list to reflect recent changes (BPF
programs can now push flow-events, and the map rtt_events has been
renamed to just events).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Simplify the three output functions by breaking them up into smaller
helper functions. Also introduce the pping_event union, which can hold
either an rtt_event or flow_event.
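
As a rough sketch of the idea (not the actual pping definitions), such a
union could look like the following. The field names, the event type
tags and the event_name() helper are assumptions made for illustration;
the only property relied on is that every event starts with a type
field.

```c
#include <stdint.h>

/* Hypothetical event type tags; the real values live in the pping headers. */
#define EVENT_TYPE_RTT  1
#define EVENT_TYPE_FLOW 2

struct rtt_event {
	uint64_t event_type; /* first member of every event */
	uint64_t rtt;        /* nanoseconds */
};

struct flow_event {
	uint64_t event_type; /* first member of every event */
	uint8_t opening;     /* 1 = flow opened, 0 = flow closed */
};

/*
 * Every member starts with event_type, so the type can be read through
 * the union before knowing which kind of event was received.
 */
union pping_event {
	uint64_t event_type;
	struct rtt_event rtt_event;
	struct flow_event flow_event;
};

/*
 * Return a short label for the event. An output function would dispatch
 * to its per-type helper function at the same place.
 */
static const char *event_name(const union pping_event *e)
{
	switch (e->event_type) {
	case EVENT_TYPE_RTT:
		return "rtt";
	case EVENT_TYPE_FLOW:
		return e->flow_event.opening ? "flow-open" : "flow-close";
	default:
		return "unknown";
	}
}
```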
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Make the flow_timeout function call the current output function to
simulate a flow-closing event. Also some other minor cleanup/fixes.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add "flow events" (flow opening or closing so far) which will trigger
a printout of a message.
Note: The ppviz format will only print out the traditional rtt events
as the format does not include opening/closing messages.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Use a JSON-writer library from iproute instead of a complicated printf
statement. Also output timestamp, rtt and min_rtt as integers in
nanoseconds, rather than floats in seconds.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Change order of parameters for format_ip_address to follow the
convention of the printf functions where buffer is placed first,
instead of the conventions of the inet_ntop functions where buffer is
placed last.
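
A minimal sketch of the buffer-first convention, assuming a simplified
signature (the real format_ip_address in pping may take different
parameters). It only wraps inet_ntop, which takes the buffer last:

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical signature: the destination buffer and its size come
 * first, as with snprintf(buf, size, ...); inet_ntop() takes them last.
 * Returns 0 on success, -1 if the address could not be formatted.
 */
static int format_ip_address(char *buf, size_t size, int af, const void *addr)
{
	return inet_ntop(af, addr, buf, (socklen_t)size) ? 0 : -1;
}
```

A caller would then write format_ip_address(buf, sizeof(buf), AF_INET,
&addr), mirroring the argument order of snprintf.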
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add per-flow tracking of number of packets and bytes
sent/received. Add these to the JSON output format.
Also update README regarding concurrency issue when updating these
statistics.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Also, remove comments about concurrency issues from code in
pping_kern.c as it is now documented in README.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
The format option can take the values "standard" (default), "json" and
"ppviz" (new name for "machine-friendly").
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add Kathie's "machine friendly" as an optional output format when
passing '-m' or '--machine-friendly' to pping. This format can be used
together with Kathie's ppviz tool to visualize the output.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add the option to output in JSON format by passing '-j' or '--json' to
pping. Include the protocol in the JSON format, and fix so kernel-side
actually stores the protocol in the flow_address struct.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
To add timestamp to output, push the timestamp when packet was
processed from kernel as part of the rtt-event. Also keep track of
minimum encountered RTT for each flow in kernel, and also push that as
part of the RTT-event.
Additionally, avoid pushing RTT messages at all if no flow-state
information can be found (e.g. due to being deleted from egress side),
as no valid min-RTT can then be given. Furthermore, no longer delete
flow-information once seeing the FIN-flag on egress in order to keep
useful flow-state around for RTT-messages longer. Due to the
FIN-handshake process, it is sufficient if the ingress program deletes
the flow-state upon seeing FIN. However, still delete flow-state from
either ingress or egress upon seeing RST flag, as RST does not have a
handshake process allowing for delayed deletion.
While minimum RTT could also be tracked from the userspace process,
userspace is not aware of when the flow is closed so would have to add
additional logic to keep track of minimum RTT for each flow and
periodically clean them up. Furthermore, keeping RTT statistics in the
flow-state map is useful for implementing future features, such as an
RTT-based sampling interval. It would also be useful in case pping is
changed to no longer have a long-running userspace process printing
out all the calculated RTTs, but instead simply occasionally looks up
the RTT from the flow-state map.
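
The min-RTT bookkeeping described above can be sketched in plain C as
follows. This is an illustration under assumptions, not the code in the
BPF programs: the struct layouts and the make_rtt_event() helper are
hypothetical, and a NULL flow pointer stands in for a failed map lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of the per-flow state kept in the flow-state map. */
struct flow_state {
	uint64_t min_rtt; /* nanoseconds, 0 = no sample seen yet */
};

/* Hypothetical slice of the event pushed to userspace. */
struct rtt_event {
	uint64_t timestamp; /* when the packet was processed */
	uint64_t rtt;
	uint64_t min_rtt;
};

/*
 * Fold a new RTT sample into the flow state and fill in the event.
 * Returns -1 without touching the event when no flow state exists,
 * as no valid min-RTT can be given in that case.
 */
static int make_rtt_event(struct flow_state *flow, uint64_t now,
			  uint64_t rtt, struct rtt_event *ev)
{
	if (!flow)
		return -1;

	if (flow->min_rtt == 0 || rtt < flow->min_rtt)
		flow->min_rtt = rtt;

	ev->timestamp = now;
	ev->rtt = rtt;
	ev->min_rtt = flow->min_rtt;
	return 0;
}
```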
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Create an initial working version of a DHCP relay using XDP. Currently, this
code has a user program and an XDP eBPF program. The user program takes a
network interface and the DHCP relay server IP as inputs and stores them in a
map. The XDP program filters the incoming DHCP requests, inserts option 82 in
the DHCP request packets and overwrites the destination IP with that of the
DHCP relay server. An optional argument for the user program is also provided
to unload the XDP program.
The README file provides instructions to build and load the XDP program.
Signed-off-by: Sachin Tiptur <sachin.tiptur.satyanarayana.gupta@hof-university.de>
[ whitespace fixes ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
There's a reason why Wireguard doesn't preserve DSCP marks across the
encapsulation, so let's make sure we warn about bypassing this in the
README.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
This is a fun example showing how to use BPF to preserve DSCP values across
an encapsulating interface, such as Wireguard. It relies on the
encapsulation layer preserving the skb->hash value across the
encapsulation, which is commonly the case on kernel encapsulation
protocols (including Wireguard), and uses a pair of TC BPF programs and a
map to re-match the packets after encapsulation and add back the DSCP
value.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Add a check that the protocol version field matches the expected value when
parsing IPv4 and IPv6 headers. This makes it possible to parse an IP header
that we don't know the version of (such as on interfaces that don't use an
Ethernet header).
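
As an illustration of why the version check helps, the following
userspace sketch picks the parser for a header of unknown version by
looking at the version nibble. The ip_version() helper is hypothetical
and is not the code added to the parsing helpers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 4 or 6 when the first header byte carries that IP version and
 * the fixed-size part of the header fits in the buffer, otherwise -1.
 */
static int ip_version(const uint8_t *data, size_t len)
{
	if (len < 1)
		return -1;

	switch (data[0] >> 4) { /* version is the upper nibble of byte 0 */
	case 4:
		return len >= 20 ? 4 : -1; /* minimum IPv4 header */
	case 6:
		return len >= 40 ? 6 : -1; /* fixed IPv6 header */
	default:
		return -1;
	}
}
```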
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Seems we need to copy a few more bytes at once for the DHCP relay daemon,
so let's extend __bpf_memcpy to handle copies of up to 288 bytes.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Add the directory and Makefile rules to prepare for storing library
functions in lib/util like we do in xdp-tools. With this, library code can
be added by just dropping the .c and .h into lib/util and updating
lib/util/util.mk with the object name.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
We want to be able to use the new bpf_tc_attach() function for attaching TC
programs, so check for it in configure.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
This updates the libbpf submodule to the latest upstream version, which
notably includes the new API for directly attaching TC programs without
shelling out to the 'tc' binary.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The wildcards picking up header files only included specific subdirectories
of include/ and headers/. There are actually files in multiple subdirs,
though, so just expand the wildcard to include all subdirectories to make
sure objects are rebuilt properly.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
It's way too difficult to read packet data in XDP because LLVM will mostly
generate code that doesn't pass the verifier. Thankfully, Cilium has a nice
workaround for this in the form of hand-written BPF assembly to perform the
reads in a way that the verifier will understand. Let's import these
helpers so they can be used by the examples in this repository, along with
some of the other BPF helpers that it relies on.
This commit imports these files wholesale from Cilium:
- include/bpf/builtins.h
- include/bpf/compiler.h
- include/bpf/errno.h
And also adds include/xdp/context_helpers.h which only contains the
xdp_load_bytes() and xdp_store_bytes() helpers from Cilium's
include/bpf/ctx/xdp.h (as the other functions in that file are specific to
how the Cilium code is structured).
We also extend the maximum size supported by the efficient memcpy()
implementation in builtins.h to 280 bytes, and the mask size applied to
packet data copies up to 0x3ff.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
When both BPF programs are kept in the same file, no longer need to
pin the maps in order to share them between the programs.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Update documentation to reflect the current state of pping (after
merging pping_kern_tc and pping_kern_xdp into a single file).
Also add another point to the TODO list that has been discussed at a
previous meeting.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Reduce IPV6_EXT_MAX_CHAIN to 3 to avoid hitting the verifier limit of
processing 1 million instructions. This results in fewer loops in
parsing_helpers.h/skip_ip6hdrnext which simplifies the verifier
analysis. IPv6 extension headers do not appear to be that common, so
this is unlikely to cause a considerable limitation.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Merge the pping_kern_tc.c, pping_kern_xdp.c and pping_helpers.h into
the single file pping_kern.c. Do not change any of the BPF code,
except renaming the map ts_start to packet_ts.
To handle both BPF programs kept in single ELF-file, change loading
mechanism to extract and attach both tc and XDP programs from it. Also
refactor main-method into several smaller functions to reduce its
size.
Finally, added the --force (-f) and --cleanup-interval (-c) options to
the argument parsing, and improved the parsing of the
--rate-limit (-r) option.
NOTE: The verifier rejects the program in its current state as too
large (over 1 million instructions). Setting the TCP_MAX_OPTIONS in
pping_kern.c to 5 (or less) solves this. Unsure at the moment what
causes the verifier to think the program is so large, as the code in
pping_kern.c is identical to the one from the three files it was
merged from.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add a check that opt_size is at least 2 in
pping_helpers.h/parse_tcp_ts, otherwise terminate the loop
unsuccessfully. Only check the lower bound of opt_size, the upper
bound will be checked in the first step of the next loop iteration
anyways.
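
A userspace sketch of the option loop with the lower-bound check might
look as follows. It is an approximation for illustration only: the
signature, the TCP_MAX_OPTIONS value and the byte-order conversion are
assumptions, and the BPF version works on packet pointers rather than a
length.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCP_MAX_OPTIONS 10 /* hypothetical loop bound */

/*
 * Scan TCP options for the timestamp option (kind 8, length 10).
 * Returns 0 and fills tsval/tsecr on success, -1 otherwise.
 */
static int parse_tcp_ts(const uint8_t *opts, size_t len,
			uint32_t *tsval, uint32_t *tsecr)
{
	size_t pos = 0;

	for (int i = 0; i < TCP_MAX_OPTIONS; i++) {
		/* Also catches an opt_size that stepped past the end. */
		if (pos + 1 > len)
			return -1;

		uint8_t kind = opts[pos];
		if (kind == 0) /* end of option list */
			return -1;
		if (kind == 1) { /* NOP is a single byte */
			pos++;
			continue;
		}

		if (pos + 2 > len)
			return -1;
		uint8_t opt_size = opts[pos + 1];
		if (opt_size < 2) /* lower bound: kind + length bytes */
			return -1;

		if (kind == 8 && opt_size == 10) {
			if (pos + 10 > len)
				return -1;
			memcpy(tsval, opts + pos + 2, 4);
			memcpy(tsecr, opts + pos + 6, 4);
			*tsval = ntohl(*tsval);
			*tsecr = ntohl(*tsecr);
			return 0;
		}

		pos += opt_size;
	}
	return -1;
}
```

Without the lower-bound check, an option with opt_size 0 would leave pos
unchanged and the loop would re-read the same option until the iteration
limit is hit.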
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Declare opt_size in pping_helpers.h/parse_tcp_ts volatile to ensure
compiler always reads it from stack as u8, which avoids confusing the
verifier into thinking it might have a negative value.
Old solution of having &=0x3f before adding opt_size to pos could
potentially cause weird behavior if a packet with an invalid TCP
option size arrived (for example, if opt_size was 64 it would be
interpreted as 0, and the loop would simply check the same position
again on each iteration). Simply changing the check to 0xff was not
possible because the compiler would optimize that away (as it knows
that to have no effect on a u8).
Also change check that TCP timestamp is not outside of boundaries from
pos+opt_size to pos+10. Before declaring opt_size as volatile compiler
automatically did this transformation, but now have to explicitly do
this. If this conversion is not done, the verifier will reject the
program because, due to its goldfish memory, it isn't sure that
opt_size has to be 10 at this point.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Refactor init_rodata to search for the first map with ".rodata" in its
name. Should be more robust than previous solution which first tried
to construct the name for the rodata map, and then find the map by
name.
Also remove some commented-out code that was not used.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Update the README, the pping diagram (eBPF_pping_design.png) and TODO
to be more up to date with the current implementation.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Minimal example to show that the close() operation on a bpf_link can hang
indefinitely if the kernel is loaded (for example by traffic on an
interface with an XDP program loaded).
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Implement basic mechanic for parsing arguments from userspace and
passing them to a global config variable in the BPF programs.
This also changes the basic use of the program from:
$./pping interface
to:
$./pping -i interface
Also, revert to using the memset solution for the map_ipv4_to_ipv6
function to avoid the ipv4_prefix constant being stored in the .rodata
section. This makes it easier to set the value for the global config
variable from userspace, as the only thing left in the .rodata section
is the config struct.
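
A minimal sketch of the argument parsing, assuming getopt_long is used.
The pping_args struct, the parse_arguments() signature and the
"--interface" long option name are assumptions for illustration; only
the "-i interface" usage comes from this commit.

```c
#include <getopt.h>
#include <net/if.h>
#include <string.h>

/* Hypothetical userspace-side settings filled in from the command line. */
struct pping_args {
	char ifname[IF_NAMESIZE];
};

static const struct option long_options[] = {
	{ "interface", required_argument, NULL, 'i' }, /* long name assumed */
	{ 0, 0, NULL, 0 }
};

/*
 * Returns 0 on success, -1 on an unknown option or a missing or too
 * long interface name.
 */
static int parse_arguments(int argc, char *argv[], struct pping_args *args)
{
	int opt;

	args->ifname[0] = '\0';
	optind = 1; /* restart scanning so the function can be called again */

	while ((opt = getopt_long(argc, argv, "i:", long_options, NULL)) != -1) {
		switch (opt) {
		case 'i':
			if (strlen(optarg) >= sizeof(args->ifname))
				return -1;
			strcpy(args->ifname, optarg);
			break;
		default:
			return -1;
		}
	}

	return args->ifname[0] ? 0 : -1;
}
```

The parsed values would then be copied into the global config variable
of the BPF programs before they are loaded.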
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
This example demonstrates how to write a simple eBPF Qdisc classifier
that classifies flows depending on their destination TCP port. The
example script, runner.sh shows how you can use the eBPF Qdisc
classifier and implement the same functionality using u32. The script
creates two network namespaces called Left and Right, representing two
different hosts. The script then illustrates the classifiers in action
using iperf3 by starting clients on the Left namespace that connect to
iperf3 servers on the Right namespace. The Qdisc classifiers give TCP
ports 8080 and 8081 a high rate limit, while TCP port 8082 represents
all other traffic capped at 20 Mbps.
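
The classification decision itself can be sketched in plain C as below.
The class identifiers and the classify_dst_port() helper are
hypothetical and only illustrate the port-to-class mapping; the actual
example runs as an eBPF program attached to the Qdisc.

```c
#include <stdint.h>

/* Hypothetical class identifiers; the example's real handles may differ. */
enum {
	CLASS_PORT_8080 = 1, /* high rate limit */
	CLASS_PORT_8081 = 2, /* high rate limit */
	CLASS_DEFAULT = 3,   /* all other traffic */
};

/* Map a destination TCP port (host byte order) to a class. */
static int classify_dst_port(uint16_t dst_port)
{
	switch (dst_port) {
	case 8080:
		return CLASS_PORT_8080;
	case 8081:
		return CLASS_PORT_8081;
	default:
		return CLASS_DEFAULT;
	}
}
```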
Signed-off-by: Frey Alfredsson <freysteinn@freysteinn.com>
Add a check to assure the verifier that opt_size is positive in case
it's been read in from stack. Also enable (uncomment) the flow-state
cleanup from the XDP program as the added check avoids verifier
rejection.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Verifier might have rejected XDP program due to opt_size being loaded
from memory, see
https://blog.path.net/ebpf-xdp-and-network-security. Add check of
opt_size to attempt to convince verifier that it's not a negative
value or anything else crazy. Leads to verifier instead thinking the
program is too large (over 1m instructions).
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
As reported in the xdp-tutorial (where this code is from), there were a
couple of sizeof checks in parsing_helpers.h that were using the pointer
size instead of the size of the struct being pointed to.
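
The class of bug can be illustrated with a small sketch. The header
struct and the hdr_fits() helper are hypothetical and not taken from
parsing_helpers.h:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte header, larger than a pointer on common ABIs. */
struct example_hdr {
	uint32_t words[5];
};

/* Return 1 when a full header starting at pos fits before data_end. */
static int hdr_fits(const void *pos, const void *data_end)
{
	const struct example_hdr *h = pos;
	ptrdiff_t avail = (const uint8_t *)data_end - (const uint8_t *)pos;

	/*
	 * sizeof(h) would be the size of the pointer (typically 4 or 8);
	 * sizeof(*h) is the size of the struct being pointed to.
	 */
	return avail >= (ptrdiff_t)sizeof(*h);
}
```

With sizeof(h) in place of sizeof(*h), a buffer holding only 8 bytes
would wrongly pass the check on a 64-bit system.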
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Add parsing of TCP FIN/RST to determine if connection is being closed,
and if so delete state directly from BPF programs.
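
A sketch of the flag check on a raw TCP header, for illustration. The
tcp_is_closing() helper and the flag macro names are assumptions; the
BPF programs read the flags from the parsed header instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header. */
#define FLAG_FIN 0x01
#define FLAG_RST 0x04

/*
 * Return 1 when the segment has FIN or RST set, meaning the flow state
 * can be deleted, and 0 otherwise (including for truncated headers).
 */
static int tcp_is_closing(const uint8_t *tcp, size_t len)
{
	if (len < 20) /* minimum TCP header length */
		return 0;

	return (tcp[13] & (FLAG_FIN | FLAG_RST)) != 0;
}
```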
Only enabled on tc-program, as verifier is unhappy about it on the XDP
side for some reason.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>