Merge pull request #16 from simosund/pping_Better_output

Pping better output
Toke Høiland-Jørgensen
2021-06-24 14:28:19 +02:00
committed by GitHub
9 changed files with 1155 additions and 128 deletions


@@ -3,7 +3,7 @@
USER_TARGETS := pping
BPF_TARGETS := pping_kern
LDFLAGS += -pthread
LDLIBS += -pthread
EXTRA_DEPS += pping.h
LIB_DIR = ../lib


@@ -15,7 +15,7 @@ echo-reply messages. See the [TODO-list](./TODO.md) for more potential features
The fundamental logic of pping is to timestamp a pseudo-unique identifier for
outgoing packets, and then look for matches in the incoming packets. If a match
is found, the RTT is simply calculated as the time difference between the
current time and the timestamp.
current time and the stored timestamp.
This tool, just as Kathie's original pping implementation, uses TCP timestamps
as identifiers. For outgoing packets, the TSval (which is a timestamp in and of
@@ -28,6 +28,89 @@ the identifier. The mechanism to ensure only the first packet is timestamped and
matched differs from the one in Kathie's pping, and is further described in
[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
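Condensed to its essence, the timestamp-and-match logic described above looks
roughly like the sketch below. This is a simplification of the BPF programs
included in this commit (parsing, flow state and sampling are omitted), not the
complete implementation:

```c
/* Simplified sketch only - see pping_kern.c for the real programs */

/* Egress: timestamp the identifier (TSval) of an outgoing packet */
__u64 now = bpf_ktime_get_ns();
bpf_map_update_elem(&packet_ts, &p_id, &now, BPF_NOEXIST);

/* Ingress: a reply echoing the identifier (TSecr) yields the RTT */
__u64 *ts = bpf_map_lookup_elem(&packet_ts, &p_id);
if (ts) {
	__u64 rtt = bpf_ktime_get_ns() - *ts;
	bpf_map_delete_elem(&packet_ts, &p_id);
}
```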
## Output formats
pping currently supports three different formats: *standard*, *ppviz* and *json*. In
general, the output consists of two different types of events: flow-events, which
indicate that a flow has started or ended, and RTT-events, which provide
information on a computed RTT within a flow.
### Standard format
The standard format is quite similar to the default output of Kathie's pping, and
is generally intended to be an easily understood, human-readable format that writes
a single line per event.
An example of the format is provided below:
```shell
16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
```
### ppviz format
The ppviz format is primarily intended to be used to generate data that can be
visualized by Kathie's [ppviz](https://github.com/pollere/ppviz) tool. The
format is essentially a CSV format, using a single space as the separator, and
is further described [here](http://www.pollere.net/ppviz.html).
Note that the optional *FBytes*, *DBytes* and *PBytes* from the format
specification have not been included here, and do not appear to be used by
ppviz. Furthermore, flow events are not included in the output, as those are
not used by ppviz.
An example of the format is provided below:
```shell
1623420121.483727575 0.005298909 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420122.484530934 0.006016639 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420123.485899736 0.005590783 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420124.490584753 0.006123511 0.005298909 10.11.1.1:5201+10.11.1.2:59532
1623420125.492190751 0.005624835 0.005298909 10.11.1.1:5201+10.11.1.2:59532
```
### JSON format
The JSON format is primarily intended to be machine-readable, and thus uses no
spacing or newlines between entries to reduce the overhead. External tools such
as [jq](https://stedolan.github.io/jq/) can be used to pretty-print the format.
The format consists of an array at the root-level, and each flow or RTT event is
added as an object to the root-array. The events contain some additional fields
in the JSON format which are not displayed by the other formats. All times
(*timestamp*, *rtt* and *min_rtt*) are provided as integers in nanoseconds.
An example of a (pretty-printed) flow-event is provided below:
```json
{
"timestamp": 1623420837244545000,
"src_ip": "10.11.1.1",
"src_port": 5201,
"dest_ip": "10.11.1.2",
"dest_port": 59572,
"protocol": "TCP",
"flow_event": "opening",
"reason": "SYN-ACK",
"triggered_by": "src"
}
```
An example of a (pretty-printed) RTT-event is provided below:
```json
{
"timestamp": 1623420838254558500,
"src_ip": "10.11.1.1",
"src_port": 5201,
"dest_ip": "10.11.1.2",
"dest_port": 59572,
"protocol": "TCP",
"rtt": 5977708,
"min_rtt": 5441848,
"sent_packets": 9393,
"sent_bytes": 492457296,
"rec_packets": 5922,
"rec_bytes": 37
}
```
## Design and technical description
!["Design of eBPF pping](./eBPF_pping_design.png)
@@ -47,7 +130,8 @@ matched differs from the one in Kathie's pping, and is further described in
reverse flow (to match source/dest on egress). If there is a match, it
calculates the RTT from the stored timestamp and deletes the entry. The
calculated RTT (together with the flow-tuple) is pushed to the perf-buffer
`rtt_events`.
`events`. Both `pping_egress()` and `pping_ingress()` can also push flow-events
to the `events` buffer.
- **bpf_egress_loader.sh:** A shell script that's used by `pping.c` to set up a
clsact qdisc and attach the `pping_egress()` program to egress using
tc. **Note**: Unless your iproute2 comes with libbpf support, tc will use
@@ -65,13 +149,111 @@ matched differs from the one in Kathie's pping, and is further described in
last seen identifier for the flow and when the last timestamp entry for the
flow was created. Entries are created by `pping_egress()`, and can be updated
or deleted by both `pping_egress()` and `pping_ingress()`. Leftover entries
are eventually removed by `pping.c`. Pinned at `/sys/fs/bpf/pping`.
are eventually removed by `pping.c`.
- **packet_ts:** A hash-map storing a timestamp for a specific packet
identifier. Entries are created by `pping_egress()` and removed by
`pping_ingress()` if a match is found. Leftover entries are eventually
removed by `pping.c`. Pinned at `/sys/fs/bpf/pping`.
- **rtt_events:** A perf-buffer used by `pping_ingress()` to push calculated RTTs
to `pping.c`, which continuously polls the map the print out the RTTs.
`pping_ingress()` if a match is found. Leftover entries are eventually removed
by `pping.c`.
- **events:** A perf-buffer used by the BPF programs to push flow or RTT events
to `pping.c`, which continuously polls the map and prints them out. A rough
sketch of the map declarations is given below the list.
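For orientation, the maps described in this list might be declared roughly as
below using libbpf's BTF-defined map syntax. This is a sketch: the `events`
definition matches the one further down in this diff, while the hash-map sizes
(`max_entries`) are assumed placeholders and the authoritative definitions live
in `pping_kern.c`:

```c
/* Sketch of the map layout; max_entries values are assumptions */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, struct network_tuple);  /* the flow */
	__type(value, struct flow_state);   /* last id, counters, min RTT */
	__uint(max_entries, 16384);         /* assumed capacity */
} flow_state SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, struct packet_id);      /* flow + identifier */
	__type(value, __u64);               /* timestamp in ns */
	__uint(max_entries, 16384);         /* assumed capacity */
} packet_ts SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} events SEC(".maps");
```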
### A note on concurrency
The program uses "global" (not `PERCPU`) hash maps to keep state. As the BPF
programs need to see the global view to function properly, using `PERCPU` maps
is not an option. The program must be able to match against stored packet
timestamps regardless of the CPU the packets are processed on, and must also
have a global view of the flow state in order for the sampling to work
correctly.
As the BPF programs may run concurrently on different CPU cores accessing these
global hash maps, this may result in some concurrency issues. In practice, I do
not believe these will occur particularly often, as I'm under the impression
that packets from the same flow will typically be processed by the same
CPU. Furthermore, most of the concurrency issues will not be that problematic
even if they do occur. For now, I've therefore left these concurrency issues
unattended, even if some of them could be avoided with atomic operations and/or
spinlocks, in order to keep things simple and not hurt performance.
The (known) potential concurrency issues are:
#### Tracking last seen identifier
The tc/egress program keeps track of the last seen outgoing identifier for each
flow, by storing it in the `flow_state` map. This is done to detect the first
packet with a new identifier. If multiple packets are processed concurrently,
several of them could potentially detect themselves as being first with the same
identifier (which only matters if they also pass the rate-limit check),
alternatively if the concurrent packets have different identifiers there may be
a lost update (but for TCP timestamps, concurrent packets would typically be
expected to have the same timestamp).
A possibly more severe issue is out-of-order packets. If a packet with an old
identifier arrives out of order, that identifier could be detected as a new
identifier. If for example the following flow of four packets with just two
different identifiers (id1 and id2) were to occur:
`id1 -> id2 -> id1 -> id2`
Then the tc/egress program would consider each of these packets to have new
identifiers and try to create a new timestamp for each of them if the sampling
strategy allows it. However, even if the sampling strategy allows it, the
(incorrect) creation of timestamps for id1 and id2 the second time would only
succeed if the first timestamps for id1 and id2 have already been matched
against (and thus deleted). Even then, they would only result in reporting an
incorrect RTT if there are also new matches against these identifiers.
This issue could be avoided entirely by requiring that new-id > old-id instead
of simply checking that new-id != old-id, as TCP timestamps should monotonically
increase. That may however not be a suitable solution if/when we add support for
other types of identifiers.
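For concreteness, the current check and the stricter variant suggested above
would look roughly like this. Only the first check mirrors the egress program in
this commit; the wrap-safe `>` comparison is just a sketch of the idea for TCP
timestamps:

```c
/* Current behaviour: any changed identifier counts as new */
if (f_state->last_id == p_id.identifier)
	goto out;

/* Sketch of the stricter idea: only accept increasing identifiers,
 * using serial-number arithmetic so that TSval wrap-around is tolerated */
if ((__s32)(p_id.identifier - f_state->last_id) <= 0)
	goto out;

f_state->last_id = p_id.identifier;
```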
#### Rate-limiting new timestamps
In the tc/egress program packets to timestamp are sampled by using a per-flow
rate-limit, which is enforced by storing when the last timestamp was created in
the `flow_state` map. If multiple packets perform this check concurrently, it's
possible that multiple packets think they are allowed to create timestamps
before any of them are able to update the `last_timestamp`. When they update
`last_timestamp` it might also be slightly incorrect, however if they are
processed concurrently then they should also generate very similar timestamps.
If the packets have different identifiers (which would typically not be
expected for concurrent TCP timestamps), then this would allow some packets to
bypass the rate-limit. By bypassing the rate-limit, the flow would use up some
additional map space and report some additional RTT(s) more than expected
(however the reported RTTs should still be correct).
If the packets have the same identifier, they must first have managed to bypass
the previous check for unique identifiers (see the [previous
point](#tracking-last-seen-identifier)), and only one of them will be able to successfully store a
timestamp entry.
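The window in question sits between the rate-limit check and the update of
`last_timestamp`, roughly as in this simplified excerpt of the egress logic
found later in this diff:

```c
/* Check rate-limit; several CPUs may pass this check concurrently */
now = bpf_ktime_get_ns();
if (now < f_state->last_timestamp ||
    now - f_state->last_timestamp < config.rate_limit)
	goto out;

/* Another CPU may already have passed the check before this update lands */
f_state->last_timestamp = now;
bpf_map_update_elem(&packet_ts, &p_id, &now, BPF_NOEXIST);
```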
#### Matching against stored timestamps
The XDP/ingress program could potentially match multiple concurrent packets with
the same identifier against a single timestamp entry in `packet_ts`, before any
of them manage to delete the timestamp entry. This would result in multiple RTTs
being reported for the same identifier, but if they are processed concurrently
these RTTs should be very similar, so would mainly result in over-reporting
rather than reporting incorrect RTTs.
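If this over-reporting ever needs to be avoided, one possible mitigation (a
sketch, not what the current code does) would be to report the RTT only when the
delete of the timestamp entry succeeds, since only one concurrent deleter can
win:

```c
/* Sketch: only the packet whose delete succeeds reports the RTT */
__u64 *p_ts = bpf_map_lookup_elem(&packet_ts, &p_id);
if (!p_ts)
	goto out;
re.rtt = bpf_ktime_get_ns() - *p_ts;
if (bpf_map_delete_elem(&packet_ts, &p_id) == 0)
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &re, sizeof(re));
```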
#### Updating flow statistics
Both the tc/egress and XDP/ingress programs will try to update some flow
statistics each time they successfully parse a packet with an
identifier. Specifically, they'll update the number of packets and bytes
sent/received. This is not done in an atomic fashion, so there could potentially
be some lost updates, resulting in an underestimate.
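If those lost updates ever turn out to matter, the counters could be bumped with
the atomic-add builtin that clang supports for BPF instead of plain increments,
at some extra cost. A sketch of that alternative (not the committed code):

```c
/* Current, non-atomic updates (may lose increments under concurrency):
 *	f_state->sent_pkts++;
 *	f_state->sent_bytes += remaining_pkt_payload(&pctx);
 * Atomic alternative: */
__sync_fetch_and_add(&f_state->sent_pkts, 1);
__sync_fetch_and_add(&f_state->sent_bytes, remaining_pkt_payload(&pctx));
```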
Furthermore, whenever the XDP/ingress program calculates an RTT, it will check
if this is the lowest RTT seen so far for the flow. If multiple RTTs are
calculated concurrently, then several could pass this check concurrently and
there may be a lost update. It should only be possible for multiple RTTs to be
calculated concurrently in case either the [timestamp rate-limit was
bypassed](#rate-limiting-new-timestamps) or [multiple packets managed to match
against the same timestamp](#matching-against-stored-timestamps).
It's worth noting that with sampling the reported minimum-RTT is only an
estimate anyway (the RTT may never be calculated for the packet with the true
minimum RTT). And even without sampling there is some inherent sampling due to TCP
timestamps only being updated at a limited rate (1000 Hz).
## Similar projects
Passively measuring the RTT for TCP traffic is not a novel concept, and there
@@ -84,9 +266,9 @@ RTT calculation using TCP timestamps (as in this project) works is provided in
implementing some filtering logic the hope is to be able to create an always-on
tool that can scale well even to a large number of massive flows.
- [ppviz](https://github.com/pollere/ppviz): Web-based visualization tool for
the "machine-friendly" output from Kathie's pping tool. If/when we implement a
similar machine readable output option it should hopefully work with this
implementation as well.
the "machine-friendly" (-m) output from Kathie's pping tool. Running this
implementation of pping with --format="ppviz" will generate output that can be
used by ppviz.
- [tcptrace](https://github.com/blitz/tcptrace): A post-processing tool which
can analyze a tcpdump file and among other things calculate RTTs based on
seq/ACK numbers (`-r` or `-R` flag).


@@ -25,7 +25,7 @@
- [ ] Could potentially include keeping track of average RTT, which
may be useful for some decisions (ex. how often to sample,
when entry can be removed etc)
- [ ] Could potentially include keeping track of minimum RTT (as
- [x] Could potentially include keeping track of minimum RTT (as
done by the original pping), ex. to track bufferbloat
- [ ] Could potentially include keeping track of if flow is
bi-directional
@@ -38,11 +38,6 @@
unnecessarily large, which slows down the cleaning and may block
new entries
- [ ] Use libxdp to load XDP program
- [ ] Add option for machine-readable output (as original pping)
- It may be a good idea to keep the same format as original pping,
so that tools such as [ppviz](https://github.com/pollere/ppviz)
works for both pping implementations.
- [ ] Add timestamps to output (as original pping)
- [ ] Add support for other hooks
- Ex TC-BPF on ingress instead of XDP?
@@ -59,3 +54,9 @@
- [x] Add IPv6 support
- [x] Refactor to support easy addition of other protocols
- [x] Load tc-bpf program with libbpf (only attach it with tc)
- [x] Switch to libbpf TC-BPF API for attaching the TC-BPF program
- [x] Add option for machine-readable output (as original pping)
- It may be a good idea to keep the same format as original pping,
so that tools such as [ppviz](https://github.com/pollere/ppviz)
works for both pping implementations.
- [x] Add timestamps to output (as original pping)


@@ -23,6 +23,7 @@ static const char *__doc__ =
#include <time.h>
#include <pthread.h>
#include "json_writer.h"
#include "pping.h" //common structs for user-space and BPF parts
#define NS_PER_SECOND 1000000000UL
@@ -40,6 +41,9 @@ static const char *__doc__ =
#define MAX_PATH_LEN 1024
#define MON_TO_REAL_UPDATE_FREQ \
(1 * NS_PER_SECOND) // Update offset between CLOCK_MONOTONIC and CLOCK_REALTIME once per second
/*
* BPF implementation of pping using libbpf
* Uses TC-BPF for egress and XDP for ingress
@@ -64,20 +68,24 @@ struct map_cleanup_args {
struct pping_config {
struct bpf_config bpf_config;
__u64 cleanup_interval;
int xdp_flags;
int ifindex;
char ifname[IF_NAMESIZE];
bool force;
char *object_path;
char *ingress_sec;
char *egress_sec;
char *pin_dir;
char *packet_map;
char *flow_map;
char *rtt_map;
char *event_map;
int xdp_flags;
int ifindex;
char ifname[IF_NAMESIZE];
bool json_format;
bool ppviz_format;
bool force;
};
static volatile int keep_running = 1;
static json_writer_t *json_ctx = NULL;
static void (*print_event_func)(void *, int, void *, __u32) = NULL;
static const struct option long_options[] = {
{ "help", no_argument, NULL, 'h' },
@@ -85,6 +93,7 @@ static const struct option long_options[] = {
{ "rate-limit", required_argument, NULL, 'r' }, // Sampling rate-limit in ms
{ "force", no_argument, NULL, 'f' }, // Detach any existing XDP program on interface
{ "cleanup-interval", required_argument, NULL, 'c' }, // Map cleaning interval in s
{ "format", required_argument, NULL, 'F' }, // Which format to output in (standard/json/ppviz)
{ 0, 0, NULL, 0 }
};
@@ -135,8 +144,11 @@ static int parse_arguments(int argc, char *argv[], struct pping_config *config)
double rate_limit_ms, cleanup_interval_s;
config->ifindex = 0;
config->force = false;
config->json_format = false;
config->ppviz_format = false;
while ((opt = getopt_long(argc, argv, "hfi:r:c:", long_options,
while ((opt = getopt_long(argc, argv, "hfi:r:c:F:", long_options,
NULL)) != -1) {
switch (opt) {
case 'i':
@@ -173,6 +185,16 @@ static int parse_arguments(int argc, char *argv[], struct pping_config *config)
config->cleanup_interval =
cleanup_interval_s * NS_PER_SECOND;
break;
case 'F':
if (strcmp(optarg, "json") == 0) {
config->json_format = true;
} else if (strcmp(optarg, "ppviz") == 0) {
config->ppviz_format = true;
} else if (strcmp(optarg, "standard") != 0) {
fprintf(stderr, "format must be \"standard\", \"json\" or \"ppviz\"\n");
return -EINVAL;
}
break;
case 'f':
config->force = true;
break;
@@ -336,16 +358,16 @@ static int tc_bpf_clear(char *interface)
* Returns the time of the given clockid as nanoseconds in a single __u64.
* On failure, the value 0 is returned (and errno will be set).
*/
static __u64 get_time_ns(void)
static __u64 get_time_ns(clockid_t clockid)
{
struct timespec t;
if (clock_gettime(CLOCK_MONOTONIC, &t) != 0)
if (clock_gettime(clockid, &t) != 0)
return 0;
return (__u64)t.tv_sec * NS_PER_SECOND + (__u64)t.tv_nsec;
}
static bool packet_ts_timeout(void *val_ptr, __u64 now)
static bool packet_ts_timeout(void *key_ptr, void *val_ptr, __u64 now)
{
__u64 ts = *(__u64 *)val_ptr;
if (now > ts && now - ts > TIMESTAMP_LIFETIME)
@@ -353,11 +375,23 @@ static bool packet_ts_timeout(void *val_ptr, __u64 now)
return false;
}
static bool flow_timeout(void *val_ptr, __u64 now)
static bool flow_timeout(void *key_ptr, void *val_ptr, __u64 now)
{
struct flow_event fe;
__u64 ts = ((struct flow_state *)val_ptr)->last_timestamp;
if (now > ts && now - ts > FLOW_LIFETIME)
if (now > ts && now - ts > FLOW_LIFETIME) {
if (print_event_func) {
fe.event_type = EVENT_TYPE_FLOW;
fe.timestamp = now;
memcpy(&fe.flow, key_ptr, sizeof(struct network_tuple));
fe.event_info.event = FLOW_EVENT_CLOSING;
fe.event_info.reason = EVENT_REASON_FLOW_TIMEOUT;
fe.source = EVENT_SOURCE_USERSPACE;
print_event_func(NULL, 0, &fe, sizeof(fe));
}
return true;
}
return false;
}
@@ -369,12 +403,12 @@ static bool flow_timeout(void *val_ptr, __u64 now)
*/
//TODO - maybe add some pointer to arguments for del_decision_func?
static int clean_map(int map_fd, size_t key_size, size_t value_size,
bool (*del_decision_func)(void *, __u64))
bool (*del_decision_func)(void *, void *, __u64))
{
int removed = 0;
void *key, *prev_key, *value;
bool delete_prev = false;
__u64 now_nsec = get_time_ns();
__u64 now_nsec = get_time_ns(CLOCK_MONOTONIC);
#ifdef DEBUG
int entries = 0;
@@ -401,7 +435,7 @@ static int clean_map(int map_fd, size_t key_size, size_t value_size,
}
if (bpf_map_lookup_elem(map_fd, key, value) == 0)
delete_prev = del_decision_func(value, now_nsec);
delete_prev = del_decision_func(key, value, now_nsec);
#ifdef DEBUG
entries++;
#endif
@@ -412,10 +446,11 @@ static int clean_map(int map_fd, size_t key_size, size_t value_size,
removed++;
}
#ifdef DEBUG
duration = get_time_ns() - now_nsec;
printf("%d: Gone through %d entries and removed %d of them in %llu.%09llu s\n",
map_fd, entries, removed, duration / NS_PER_SECOND,
duration % NS_PER_SECOND);
duration = get_time_ns(CLOCK_MONOTONIC) - now_nsec;
fprintf(stderr,
"%d: Gone through %d entries and removed %d of them in %llu.%09llu s\n",
map_fd, entries, removed, duration / NS_PER_SECOND,
duration % NS_PER_SECOND);
#endif
cleanup:
free(key);
@@ -441,12 +476,33 @@ static void *periodic_map_cleanup(void *args)
pthread_exit(NULL);
}
static __u64 convert_monotonic_to_realtime(__u64 monotonic_time)
{
static __u64 offset = 0;
static __u64 offset_updated = 0;
__u64 now_mon = get_time_ns(CLOCK_MONOTONIC);
__u64 now_rt;
if (offset == 0 ||
(now_mon > offset_updated &&
now_mon - offset_updated > MON_TO_REAL_UPDATE_FREQ)) {
now_mon = get_time_ns(CLOCK_MONOTONIC);
now_rt = get_time_ns(CLOCK_REALTIME);
if (now_rt < now_mon)
return 0;
offset = now_rt - now_mon;
offset_updated = now_mon;
}
return monotonic_time + offset;
}
/*
* Wrapper around inet_ntop designed to handle the "bug" that mapped IPv4
* addresses are formatted as IPv6 addresses for AF_INET6
*/
static int format_ip_address(int af, const struct in6_addr *addr, char *buf,
size_t size)
static int format_ip_address(char *buf, size_t size, int af,
const struct in6_addr *addr)
{
if (af == AF_INET)
return inet_ntop(af, &addr->s6_addr[12],
@@ -456,18 +512,193 @@ static int format_ip_address(int af, const struct in6_addr *addr, char *buf,
return -EINVAL;
}
static void handle_rtt_event(void *ctx, int cpu, void *data, __u32 data_size)
static const char *proto_to_str(__u16 proto)
{
static char buf[8];
switch (proto) {
case IPPROTO_TCP:
return "TCP";
case IPPROTO_ICMP:
return "ICMP";
case IPPROTO_ICMPV6:
return "ICMPv6";
default:
snprintf(buf, sizeof(buf), "%d", proto);
return buf;
}
}
static const char *flowevent_to_str(enum flow_event_type fe)
{
switch (fe) {
case FLOW_EVENT_NONE:
return "none";
case FLOW_EVENT_OPENING:
return "opening";
case FLOW_EVENT_CLOSING:
return "closing";
default:
return "unknown";
}
}
static const char *eventreason_to_str(enum flow_event_reason er)
{
switch (er) {
case EVENT_REASON_SYN:
return "SYN";
case EVENT_REASON_SYN_ACK:
return "SYN-ACK";
case EVENT_REASON_FIRST_OBS_PCKT:
return "first observed packet";
case EVENT_REASON_FIN:
return "FIN";
case EVENT_REASON_FIN_ACK:
return "FIN-ACK";
case EVENT_REASON_RST:
return "RST";
case EVENT_REASON_FLOW_TIMEOUT:
return "flow timeout";
default:
return "unknown";
}
}
static const char *eventsource_to_str(enum flow_event_source es)
{
switch (es) {
case EVENT_SOURCE_EGRESS:
return "src";
case EVENT_SOURCE_INGRESS:
return "dest";
case EVENT_SOURCE_USERSPACE:
return "userspace-cleanup";
default:
return "unknown";
}
}
static void print_flow_ppvizformat(FILE *stream, const struct network_tuple *flow)
{
const struct rtt_event *e = data;
char saddr[INET6_ADDRSTRLEN];
char daddr[INET6_ADDRSTRLEN];
format_ip_address(e->flow.ipv, &e->flow.saddr.ip, saddr, sizeof(saddr));
format_ip_address(e->flow.ipv, &e->flow.daddr.ip, daddr, sizeof(daddr));
format_ip_address(saddr, sizeof(saddr), flow->ipv, &flow->saddr.ip);
format_ip_address(daddr, sizeof(daddr), flow->ipv, &flow->daddr.ip);
fprintf(stream, "%s:%d+%s:%d", saddr, ntohs(flow->saddr.port), daddr,
ntohs(flow->daddr.port));
}
printf("%llu.%06llu ms %s:%d+%s:%d\n", e->rtt / NS_PER_MS,
e->rtt % NS_PER_MS, saddr, ntohs(e->flow.saddr.port), daddr,
ntohs(e->flow.daddr.port));
static void print_ns_datetime(FILE *stream, __u64 monotonic_ns)
{
char timestr[9];
__u64 ts = convert_monotonic_to_realtime(monotonic_ns);
time_t ts_s = ts / NS_PER_SECOND;
strftime(timestr, sizeof(timestr), "%H:%M:%S", localtime(&ts_s));
fprintf(stream, "%s.%09llu", timestr, ts % NS_PER_SECOND);
}
static void print_event_standard(void *ctx, int cpu, void *data,
__u32 data_size)
{
const union pping_event *e = data;
if (e->event_type == EVENT_TYPE_RTT) {
print_ns_datetime(stdout, e->rtt_event.timestamp);
printf(" %llu.%06llu ms %llu.%06llu ms ",
e->rtt_event.rtt / NS_PER_MS,
e->rtt_event.rtt % NS_PER_MS,
e->rtt_event.min_rtt / NS_PER_MS,
e->rtt_event.min_rtt % NS_PER_MS);
print_flow_ppvizformat(stdout, &e->rtt_event.flow);
printf("\n");
} else if (e->event_type == EVENT_TYPE_FLOW) {
print_ns_datetime(stdout, e->flow_event.timestamp);
printf(" ");
print_flow_ppvizformat(stdout, &e->flow_event.flow);
printf(" %s due to %s from %s\n",
flowevent_to_str(e->flow_event.event_info.event),
eventreason_to_str(e->flow_event.event_info.reason),
eventsource_to_str(e->flow_event.source));
}
}
static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
{
const struct rtt_event *e = data;
__u64 time = convert_monotonic_to_realtime(e->timestamp);
if (e->event_type != EVENT_TYPE_RTT)
return;
printf("%llu.%09llu %llu.%09llu %llu.%09llu ", time / NS_PER_SECOND,
time % NS_PER_SECOND, e->rtt / NS_PER_SECOND,
e->rtt % NS_PER_SECOND, e->min_rtt / NS_PER_SECOND, e->min_rtt % NS_PER_SECOND);
print_flow_ppvizformat(stdout, &e->flow);
printf("\n");
}
static void print_common_fields_json(json_writer_t *ctx,
const union pping_event *e)
{
const struct network_tuple *flow = &e->rtt_event.flow;
char saddr[INET6_ADDRSTRLEN];
char daddr[INET6_ADDRSTRLEN];
format_ip_address(saddr, sizeof(saddr), flow->ipv, &flow->saddr.ip);
format_ip_address(daddr, sizeof(daddr), flow->ipv, &flow->daddr.ip);
jsonw_u64_field(ctx, "timestamp",
convert_monotonic_to_realtime(e->rtt_event.timestamp));
jsonw_string_field(ctx, "src_ip", saddr);
jsonw_hu_field(ctx, "src_port", ntohs(flow->saddr.port));
jsonw_string_field(ctx, "dest_ip", daddr);
jsonw_hu_field(ctx, "dest_port", ntohs(flow->daddr.port));
jsonw_string_field(ctx, "protocol", proto_to_str(flow->proto));
}
static void print_rttevent_fields_json(json_writer_t *ctx,
const struct rtt_event *re)
{
jsonw_u64_field(ctx, "rtt", re->rtt);
jsonw_u64_field(ctx, "min_rtt", re->min_rtt);
jsonw_u64_field(ctx, "sent_packets", re->sent_pkts);
jsonw_u64_field(ctx, "sent_bytes", re->sent_bytes);
jsonw_u64_field(ctx, "rec_packets", re->rec_pkts);
jsonw_u64_field(ctx, "rec_bytes", re->rec_bytes);
}
static void print_flowevent_fields_json(json_writer_t *ctx,
const struct flow_event *fe)
{
jsonw_string_field(ctx, "flow_event",
flowevent_to_str(fe->event_info.event));
jsonw_string_field(ctx, "reason",
eventreason_to_str(fe->event_info.reason));
jsonw_string_field(ctx, "triggered_by", eventsource_to_str(fe->source));
}
static void print_event_json(void *ctx, int cpu, void *data, __u32 data_size)
{
const union pping_event *e = data;
if (e->event_type != EVENT_TYPE_RTT && e->event_type != EVENT_TYPE_FLOW)
return;
if (!json_ctx) {
json_ctx = jsonw_new(stdout);
jsonw_start_array(json_ctx);
}
jsonw_start_object(json_ctx);
print_common_fields_json(json_ctx, e);
if (e->event_type == EVENT_TYPE_RTT)
print_rttevent_fields_json(json_ctx, &e->rtt_event);
else // flow-event
print_flowevent_fields_json(json_ctx, &e->flow_event);
jsonw_end_object(json_ctx);
}
static void handle_missed_rtt_event(void *ctx, int cpu, __u64 lost_cnt)
@@ -593,17 +824,18 @@ int main(int argc, char *argv[])
.pin_dir = "/sys/fs/bpf/pping",
.packet_map = "packet_ts",
.flow_map = "flow_state",
.rtt_map = "rtt_events",
.event_map = "events",
.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST,
.force = false,
};
struct perf_buffer *pb = NULL;
struct perf_buffer_opts pb_opts = {
.sample_cb = handle_rtt_event,
.sample_cb = print_event_standard,
.lost_cb = handle_missed_rtt_event,
};
print_event_func = print_event_standard;
// Detect if running as root
if (geteuid() != 0) {
printf("This program must be run as root.\n");
@@ -626,6 +858,14 @@ int main(int argc, char *argv[])
return EXIT_FAILURE;
}
if (config.json_format) {
pb_opts.sample_cb = print_event_json;
print_event_func = print_event_json;
} else if (config.ppviz_format) {
pb_opts.sample_cb = print_event_ppviz;
print_event_func = print_event_ppviz;
}
err = load_attach_bpfprogs(&obj, &config, &tc_attached, &xdp_attached);
if (err) {
fprintf(stderr,
@@ -643,13 +883,13 @@ int main(int argc, char *argv[])
// Set up perf buffer
pb = perf_buffer__new(bpf_object__find_map_fd_by_name(obj,
config.rtt_map),
config.event_map),
PERF_BUFFER_PAGES, &pb_opts);
err = libbpf_get_error(pb);
if (err) {
pb = NULL;
fprintf(stderr, "Failed to open perf buffer %s: %s\n",
config.rtt_map, strerror(err));
config.event_map, strerror(err));
goto cleanup;
}
@@ -695,5 +935,10 @@ cleanup:
config.pin_dir, strerror(-err));
}
if (config.json_format && json_ctx) {
jsonw_end_array(json_ctx);
jsonw_destroy(&json_ctx);
}
return err != 0;
}


@@ -4,10 +4,37 @@
#include <linux/types.h>
#include <linux/in6.h>
#include <stdbool.h>
#define INGRESS_PROG_SEC "xdp"
#define EGRESS_PROG_SEC "classifier"
/* For the event_type members of rtt_event and flow_event */
#define EVENT_TYPE_FLOW 1
#define EVENT_TYPE_RTT 2
enum __attribute__((__packed__)) flow_event_type {
FLOW_EVENT_NONE,
FLOW_EVENT_OPENING,
FLOW_EVENT_CLOSING
};
enum __attribute__((__packed__)) flow_event_reason {
EVENT_REASON_SYN,
EVENT_REASON_SYN_ACK,
EVENT_REASON_FIRST_OBS_PCKT,
EVENT_REASON_FIN,
EVENT_REASON_FIN_ACK,
EVENT_REASON_RST,
EVENT_REASON_FLOW_TIMEOUT
};
enum __attribute__((__packed__)) flow_event_source {
EVENT_SOURCE_EGRESS,
EVENT_SOURCE_INGRESS,
EVENT_SOURCE_USERSPACE
};
struct bpf_config {
__u64 rate_limit;
};
@@ -39,7 +66,12 @@ struct network_tuple {
};
struct flow_state {
__u64 min_rtt;
__u64 last_timestamp;
__u64 sent_pkts;
__u64 sent_bytes;
__u64 rec_pkts;
__u64 rec_bytes;
__u32 last_id;
__u32 reserved;
};
@@ -49,10 +81,52 @@ struct packet_id {
__u32 identifier; //tsval for TCP packets
};
/*
* An RTT event message that can be passed from the bpf-programs to user-space.
* The initial event_type member is used to allow multiplexing between
* different event types in a single perf buffer. Members up to and including
* flow are identical to other event types.
* Uses explicit padding instead of packing based on recommendations in cilium's
* BPF reference documentation at https://docs.cilium.io/en/stable/bpf/#llvm.
*/
struct rtt_event {
__u64 rtt;
__u64 event_type;
__u64 timestamp;
struct network_tuple flow;
__u32 padding;
__u64 rtt;
__u64 min_rtt;
__u64 sent_pkts;
__u64 sent_bytes;
__u64 rec_pkts;
__u64 rec_bytes;
__u32 reserved;
};
struct flow_event_info {
enum flow_event_type event;
enum flow_event_reason reason;
};
/*
* A flow event message that can be passed from the bpf-programs to user-space.
* The initial event_type member is used to allow multiplexing between
* different event types in a single perf buffer. Members up to and including
* flow are identical to other event types.
*/
struct flow_event {
__u64 event_type;
__u64 timestamp;
struct network_tuple flow;
struct flow_event_info event_info;
enum flow_event_source source;
__u8 reserved;
};
union pping_event {
__u64 event_type;
struct rtt_event rtt_event;
struct flow_event flow_event;
};
#endif


@@ -60,7 +60,7 @@ struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(__u32));
} rtt_events SEC(".maps");
} events SEC(".maps");
// Help functions
@@ -130,14 +130,15 @@ static int parse_tcp_ts(struct tcphdr *tcph, void *data_end, __u32 *tsval,
/*
* Attempts to fetch an identifier for TCP packets, based on the TCP timestamp
* option. If sucessful, identifier will be set to TSval if is_ingress, TSecr
* otherwise, the port-members of saddr and daddr will be set the the TCP source
* and dest, respectively, and 0 will be returned. On failure, -1 will be
* returned. Additionally, if the connection is closing (FIN or RST flag), sets
* flow_closing to true.
* option.
* If successful, identifier will be set to TSval if is_ingress, or TSecr
* otherwise, the port-members of saddr and daddr will be set to the TCP source
* and dest, respectively, fei will be filled appropriately (based on
* SYN/FIN/RST) and 0 will be returned.
* On failure, -1 will be returned.
*/
static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
__be16 *dport, bool *flow_closing,
__be16 *dport, struct flow_event_info *fei,
__u32 *identifier)
{
__u32 tsval, tsecr;
@@ -146,18 +147,27 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
if (parse_tcphdr(&ctx->nh, ctx->data_end, &tcph) < 0)
return -1;
// Check if connection is closing
if (tcph->fin || tcph->rst) {
*flow_closing = true;
/* bpf_printk("Detected connection closing on %d\n", */
/* ctx->is_egress); //Upsets verifier? */
}
// Do not timestamp pure ACKs
if (ctx->is_egress && ctx->nh.pos - ctx->data >= ctx->pkt_len &&
!tcph->syn)
return -1;
// Check if connection is opening/closing
if (tcph->syn) {
fei->event = FLOW_EVENT_OPENING;
fei->reason =
tcph->ack ? EVENT_REASON_SYN_ACK : EVENT_REASON_SYN;
} else if (tcph->rst) {
fei->event = FLOW_EVENT_CLOSING;
fei->reason = EVENT_REASON_RST;
} else if (!ctx->is_egress && tcph->fin) {
fei->event = FLOW_EVENT_CLOSING;
fei->reason =
tcph->ack ? EVENT_REASON_FIN_ACK : EVENT_REASON_FIN;
} else {
fei->event = FLOW_EVENT_NONE;
}
if (parse_tcp_ts(tcph, ctx->data_end, &tsval, &tsecr) < 0)
return -1; //Possible TODO, fall back on seq/ack instead
@@ -170,7 +180,7 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
/*
* Attempts to parse the packet limited by the data and data_end pointers,
* to retrieve a protocol dependent packet identifier. If successful, the
* pointed to p_id will be filled with parsed information from the packet
* pointed to p_id and fei will be filled with parsed information from the
* packet, and 0 will be returned. On failure, -1 will be returned.
* If is_egress saddr and daddr will match source and destination of packet,
* respectively, and identifier will be set to the identifier for an outgoing
@@ -179,7 +189,8 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
* set to the identifier of a response.
*/
static int parse_packet_identifier(struct parsing_context *ctx,
struct packet_id *p_id, bool *flow_closing)
struct packet_id *p_id,
struct flow_event_info *fei)
{
int proto, err;
struct ethhdr *eth;
@@ -201,18 +212,18 @@ static int parse_packet_identifier(struct parsing_context *ctx,
// Parse IPv4/6 header
if (proto == bpf_htons(ETH_P_IP)) {
p_id->flow.ipv = AF_INET;
proto = parse_iphdr(&ctx->nh, ctx->data_end, &iph);
p_id->flow.proto = parse_iphdr(&ctx->nh, ctx->data_end, &iph);
} else if (proto == bpf_htons(ETH_P_IPV6)) {
p_id->flow.ipv = AF_INET6;
proto = parse_ip6hdr(&ctx->nh, ctx->data_end, &ip6h);
p_id->flow.proto = parse_ip6hdr(&ctx->nh, ctx->data_end, &ip6h);
} else {
return -1;
}
// Add new protocols here
if (proto == IPPROTO_TCP) {
if (p_id->flow.proto == IPPROTO_TCP) {
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port,
flow_closing, &p_id->identifier);
fei, &p_id->identifier);
if (err)
return -1;
} else {
@@ -230,6 +241,32 @@ static int parse_packet_identifier(struct parsing_context *ctx,
return 0;
}
/*
* Returns the number of unparsed bytes left in the packet (bytes after nh.pos)
*/
static __u32 remaining_pkt_payload(struct parsing_context *ctx)
{
// pkt_len - (pos - data) fails because compiler transforms it to pkt_len - pos + data (pkt_len - pos not ok because value - pointer)
// data + pkt_len - pos fails on (data+pkt_len) - pos due to math between pkt_pointer and unbounded register
__u32 parsed_bytes = ctx->nh.pos - ctx->data;
return parsed_bytes < ctx->pkt_len ? ctx->pkt_len - parsed_bytes : 0;
}
/*
* Fills in event_type, timestamp, flow, source and reserved.
* Does not fill in the event_info.
*/
static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
struct network_tuple *flow,
enum flow_event_source source)
{
fe->event_type = EVENT_TYPE_FLOW;
fe->timestamp = timestamp;
__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple));
fe->source = source;
fe->reserved = 0; // Make sure it's initialized
}
// Programs
// TC-BPF for parsing packet identifier from egress traffic and add to map
@@ -237,7 +274,8 @@ SEC(EGRESS_PROG_SEC)
int pping_egress(struct __sk_buff *skb)
{
struct packet_id p_id = { 0 };
__u64 p_ts;
struct flow_event fe;
__u64 now;
struct parsing_context pctx = {
.data = (void *)(long)skb->data,
.data_end = (void *)(long)skb->data_end,
@@ -245,63 +283,56 @@ int pping_egress(struct __sk_buff *skb)
.nh = { .pos = pctx.data },
.is_egress = true,
};
bool flow_closing = false;
struct flow_state *f_state;
struct flow_state new_state = { 0 };
if (parse_packet_identifier(&pctx, &p_id, &flow_closing) < 0)
if (parse_packet_identifier(&pctx, &p_id, &fe.event_info) < 0)
goto out;
// Delete flow and create no timestamp entry if flow is closing
if (flow_closing) {
bpf_map_delete_elem(&flow_state, &p_id.flow);
now = bpf_ktime_get_ns(); // or bpf_ktime_get_boot_ns
f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow);
// Flow closing - try to delete flow state and push closing-event
if (fe.event_info.event == FLOW_EVENT_CLOSING) {
if (f_state) {
bpf_map_delete_elem(&flow_state, &p_id.flow);
fill_flow_event(&fe, now, &p_id.flow,
EVENT_SOURCE_EGRESS);
bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU,
&fe, sizeof(fe));
}
goto out;
}
// Check flow state
f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow);
if (!f_state) { // No previous state - attempt to create it
// No previous state - attempt to create it and push flow-opening event
if (!f_state) {
bpf_map_update_elem(&flow_state, &p_id.flow, &new_state,
BPF_NOEXIST);
f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow);
if (!f_state)
if (!f_state) // Creation failed
goto out;
if (fe.event_info.event != FLOW_EVENT_OPENING) {
fe.event_info.event = FLOW_EVENT_OPENING;
fe.event_info.reason = EVENT_REASON_FIRST_OBS_PCKT;
}
fill_flow_event(&fe, now, &p_id.flow, EVENT_SOURCE_EGRESS);
bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &fe,
sizeof(fe));
}
// Check if identifier is new
/* The gap between checking and updating last_id may cause concurrency
* issues where multiple packets may simultaneously think they are the
* first with a new identifier. As long as all of the identifiers are
* the same though, only one should be able to create a timestamp entry.
f_state->sent_pkts++;
f_state->sent_bytes += remaining_pkt_payload(&pctx);
* A bigger issue is that older identifiers (for example due to
* out-of-order packets) may pass this check and update the current
* identifier to an old one. This means that both the packet with the
* old identifier itself as well the next packet with the current
* identifier may be considered packets with new identifiers (even if
* both have been seen before). For TCP timestamps this could be
* prevented by changing the check to '>=' instead, but it may not be
* suitable for other protocols, such as QUIC and its spinbit.
*
* For now, just hope that the rate limit saves us from creating an
* incorrect timestamp. That may however also fail, either due to the
* to it happening in a time it's not limited by rate sampling, or
* because of rate check failing due to concurrency issues.
*/
// Check if identifier is new
if (f_state->last_id == p_id.identifier)
goto out;
f_state->last_id = p_id.identifier;
// Check rate-limit
/*
* The window between checking and updating last_timestamp may cause
* concurrency issues, where multiple packets simultaneously pass the
* rate limit. However, as long as they have the same identifier, only
* a single timestamp entry should successfully be created.
*/
p_ts = bpf_ktime_get_ns(); // or bpf_ktime_get_boot_ns
if (p_ts < f_state->last_timestamp ||
p_ts - f_state->last_timestamp < config.rate_limit)
if (now < f_state->last_timestamp ||
now - f_state->last_timestamp < config.rate_limit)
goto out;
/*
@@ -310,8 +341,8 @@ int pping_egress(struct __sk_buff *skb)
* the next available map slot somewhat fairer between heavy and sparse
* flows.
*/
f_state->last_timestamp = p_ts;
bpf_map_update_elem(&packet_ts, &p_id, &p_ts, BPF_NOEXIST);
f_state->last_timestamp = now;
bpf_map_update_elem(&packet_ts, &p_id, &now, BPF_NOEXIST);
out:
return BPF_OK;
@@ -323,7 +354,9 @@ int pping_ingress(struct xdp_md *ctx)
{
struct packet_id p_id = { 0 };
__u64 *p_ts;
struct rtt_event event = { 0 };
struct flow_event fe;
struct rtt_event re = { 0 };
struct flow_state *f_state;
struct parsing_context pctx = {
.data = (void *)(long)ctx->data,
.data_end = (void *)(long)ctx->data_end,
@@ -331,30 +364,51 @@ int pping_ingress(struct xdp_md *ctx)
.nh = { .pos = pctx.data },
.is_egress = false,
};
bool flow_closing = false;
__u64 now;
if (parse_packet_identifier(&pctx, &p_id, &flow_closing) < 0)
if (parse_packet_identifier(&pctx, &p_id, &fe.event_info) < 0)
goto out;
// Delete flow, but allow final attempt at RTT calculation
if (flow_closing)
bpf_map_delete_elem(&flow_state, &p_id.flow);
f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow);
if (!f_state)
goto out;
f_state->rec_pkts++;
f_state->rec_bytes += remaining_pkt_payload(&pctx);
now = bpf_ktime_get_ns();
p_ts = bpf_map_lookup_elem(&packet_ts, &p_id);
if (!p_ts)
goto out;
if (!p_ts || now < *p_ts)
goto validflow_out;
event.rtt = bpf_ktime_get_ns() - *p_ts;
/*
* Attempt to delete timestamp entry as soon as RTT is calculated.
* But could have potential concurrency issue where multiple packets
* manage to match against the identifier before it can be deleted.
*/
re.rtt = now - *p_ts;
// Delete timestamp entry as soon as RTT is calculated
bpf_map_delete_elem(&packet_ts, &p_id);
__builtin_memcpy(&event.flow, &p_id.flow, sizeof(struct network_tuple));
bpf_perf_event_output(ctx, &rtt_events, BPF_F_CURRENT_CPU, &event,
sizeof(event));
if (f_state->min_rtt == 0 || re.rtt < f_state->min_rtt)
f_state->min_rtt = re.rtt;
re.event_type = EVENT_TYPE_RTT;
re.timestamp = now;
re.min_rtt = f_state->min_rtt;
re.sent_pkts = f_state->sent_pkts;
re.sent_bytes = f_state->sent_bytes;
re.rec_pkts = f_state->rec_pkts;
re.rec_bytes = f_state->rec_bytes;
// Push event to perf-buffer
__builtin_memcpy(&re.flow, &p_id.flow, sizeof(struct network_tuple));
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &re, sizeof(re));
validflow_out:
// Wait with deleting flow until having pushed final RTT message
if (fe.event_info.event == FLOW_EVENT_CLOSING && f_state) {
bpf_map_delete_elem(&flow_state, &p_id.flow);
fill_flow_event(&fe, now, &p_id.flow, EVENT_SOURCE_INGRESS);
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &fe,
sizeof(fe));
}
out:
return XDP_PASS;