From e7201c9eab4d7bf547a1185d67e839d68b6ccd9c Mon Sep 17 00:00:00 2001
From: Simon Sundberg <simon.sundberg@kau.se>
Date: Thu, 17 Feb 2022 12:29:01 +0100
Subject: [PATCH] pping: Update TODO based on map cleaning improvements

Also add description of three potential issues introduced by the
changes to cleanup process.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
---
 pping/TODO.md | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 64 insertions(+), 9 deletions(-)

diff --git a/pping/TODO.md b/pping/TODO.md
index ae7e782..588659d 100644
--- a/pping/TODO.md
+++ b/pping/TODO.md
@@ -35,11 +35,6 @@
     - Original pping checks if flow is bi-directional before adding
       timestamps, but this could miss shorter flows
 - [ ] Dynamically grow the maps if they are starting to get full
-- [ ] Improve map cleaning: Use a dynamic time to live for map entries
-      based on flow's RTT, instead of static 10s limit
-  - Keeping entries around for a long time allows the map to grow
-    unnecessarily large, which slows down the cleaning and may block
-    new entries
 - [ ] Use libxdp to load XDP program
 
 ## Done
@@ -63,6 +58,15 @@
 - [x] Add timestamps to output (as original pping)
 - [x] Add support for other hooks
   - TC-BFP on ingress instead of XDP
+- [x] Improve map cleaning:
+  - [x] Use BPF iter to clean up maps with BPF programs instead of
+        looping through entries in userspace.
+  - [x] Use a dynamic time to live for map entries based on flow's RTT
+        instead of static 10s limit
+    - Keeping entries around for a long time allows the map to grow
+    unnecessarily large, which slows down the cleaning and may block
+    new entries
+
 
 # Potential issues
 ## Limited information in different output formats
@@ -90,10 +94,10 @@ delete its flow-state entry (and send a timely flow closing event).
 
 A consequence of this is that the ICMP flow entries will stick around
 and occupy a space in the flow state map until they are cleaned out by
-the periodic cleanup process. The current timeout for inactive flows
-is a very generous 5 minutes, which means a lot of short ICMP flows
-could quickly fill up the flow map and potentially block other flows
-for a long while.
+the periodic cleanup process. The current timeout for ICMP flows is 30
+seconds, which means a lot of short ICMP flows could quickly fill up
+the flow map and potentially block other flows for a considerable
+time.
 
 ## RTT-based sampling
 The RTT-based sampling features means that timestamp entries may only
@@ -111,6 +115,57 @@ growing faster than it shrinks, as if it starts with a low RTT it will
 quickly update it to higher RTTs, but with high RTTs it will take
 longer for it do decrease to a lower RTT again.
 
+## Losing debug/warning information
+The "map full" and "map cleaning" events are pushed through the same
+perf-buffer as the rtt and flow events. Just like rtt events may be
+lost if too many of them are pushed, these debug/warning messages may
+also be lost if there's a lot of other events being pushed. In case
+one considers these debug/warning messages more critical than the
+normal RTT reports, it may be worth considering pushing them through a
+separate channel to make it less likely that they are lost.
+
+## RTT-based cleanup of timestamp entries
+For most flows the RTT will likely be well below one second, which
+allows for removing timestamp entries much earlier than the 10s static
+limit. This helps keep the timestamp map size down, thereby decreasing
+the risk of there being no room for new entries as well as reducing
+the number of entries the cleanup process must go through. However, if
+the RTT for a flow grows quickly (due to ex. buffer bloat) then the
+actual RTT of the flow may increase to beyond 8 times the initial RTT
+before ePPing has collected enough RTT samples to increase sRTT to a
+similar level. This may cause timestamps being deleted too early
+before they have time to actually match against a reply, in which case
+one also loses the RTT sample that would be used to update the sRTT
+causing the sRTT to remain too low.
+
+In practice this issue is limited by the fact that the cleanup process
+only runs periodically, so even for flows with a very low RTT the
+average time to delete a timestamp entry would still be 500ms (at
+default rate of running at 1Hz). But this also limits the usefulness
+of trying to delete entries earlier.
+
+Overall, a better approach in the future might be to get rid of the
+periodic cleanup process entirely, and instead only evict old entries
+if they're blocking a new entry from being inserted. This would be
+similar to using LRU maps, but would need some way to prevent an
+existing entry from being removed in case it's too young.
+
+## Periodical map cleanup may miss entries on delete
+Due to how the hash map traversal is implemented, all entries may not
+be traversed each time a map is iterated through. Specifically, if an
+entry is deleted and there are other entries remaining in the same
+bucket, those remaining entries will not be traversed. As the
+periodical cleanup may delete entries as it is traversing the map,
+this may result in some of the entries which share the bucket
+with deleted entries not always be traversed.
+
+In general this should not cause a large problem as those entries will
+simply be traversed the next time the map is iterated over
+instead. However, it may cause certain entries to remain in the hash
+map a bit longer than expected (and if the map is full subsequently
+block new entries from being created for that duration)
+
+
 ## Concurrency issues
 
 The program uses "global" (not `PERCPU`) hash maps to keep state. As