When setting up GoFlow2 for the first time, it can be difficult to estimate the settings and resources required.
This software has been tested with hundreds of thousands of flows per second on common hardware, but the default settings may not be optimal for every environment.
It is important to understand the pattern of your flows.
Some environments have predictable trends: for instance, a regional ISP will likely see a traffic peak around 20:00 local time,
whereas a hosting provider may see large, sudden bursts of traffic caused by DDoS attacks.
We need to consider the following:
* R: The rate of packets (controlled by sampling and traffic)
* C: The decoding capacity of a worker (dependent on CPU)
* L: The allowed latency (dependent on buffer size)
In a typical environment, capacity matches or exceeds the rate (C >= R).
When the rate goes above the capacity (e.g., during bursts), packets waiting to be processed pile up.
Latency increases as long as the rate exceeds the capacity. It remains stable if the rate equals the capacity.
It can only decrease when there is spare capacity (C - R > 0).
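The relationship between R, C and latency can be illustrated with a small simulation. This is a simplified model (a single queue with an unlimited buffer, rates averaged per second, hypothetical figures), not a description of GoFlow2's internals:

```go
package main

import "fmt"

// simulate models a single decoding queue second by second.
// rates[i] is the incoming packet rate R (packets/s) during second i,
// and capacity is the decoding capacity C (packets/s).
// It returns the backlog (packets waiting) and the latency
// (seconds a newly queued packet waits) at the end of each second.
func simulate(rates []float64, capacity float64) (backlog, latency []float64) {
	queued := 0.0
	for _, r := range rates {
		// The queue grows by R-C per second. It cannot go below zero,
		// so spare capacity (C-R) only helps while there is a backlog.
		queued += r - capacity
		if queued < 0 {
			queued = 0
		}
		backlog = append(backlog, queued)
		latency = append(latency, queued/capacity)
	}
	return backlog, latency
}

func main() {
	// Hypothetical numbers: C = 100k packets/s, a 3 s burst at 150k packets/s.
	rates := []float64{80000, 150000, 150000, 150000, 100000, 50000, 50000, 50000}
	backlog, latency := simulate(rates, 100000)
	for i := range rates {
		fmt.Printf("t=%ds R=%.0f backlog=%.0f latency=%.2fs\n", i, rates[i], backlog[i], latency[i])
	}
}
```

With these figures, latency climbs by 0.5 s for each second of the burst, stays at 1.5 s while R equals C, and only drains once spare capacity returns.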
A buffer that is too large can cause "bufferbloat", where latency becomes too high for normal operations (e.g., delayed DDoS detection),
whereas a small buffer (or no buffer, for real-time processing) may drop information during a temporary increase in rate.
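The buffer trade-off can be quantified: a full buffer of B packets takes B / C seconds to drain, so a latency budget L bounds the buffer at B = C × L, and any burst excess beyond B is dropped. The sketch below uses hypothetical figures and counts packets; kernel socket buffers are sized in bytes, so multiply by a typical datagram size (plus per-packet kernel overhead) when translating.

```go
package main

import "fmt"

// maxBufferForLatency returns the largest buffer (in packets) that a worker
// with capacity C packets/s can drain within a latency budget of L seconds.
func maxBufferForLatency(capacity, latency float64) float64 {
	return capacity * latency
}

// droppedDuringBurst estimates how many packets are lost when the rate R
// exceeds the capacity C for burstSeconds, given an initially empty buffer
// of bufferSize packets.
func droppedDuringBurst(rate, capacity, burstSeconds, bufferSize float64) float64 {
	excess := (rate - capacity) * burstSeconds
	if excess <= bufferSize {
		return 0
	}
	return excess - bufferSize
}

func main() {
	// Hypothetical figures: C = 100k packets/s, latency budget L = 0.5 s.
	buf := maxBufferForLatency(100000, 0.5)
	fmt.Printf("buffer <= %.0f packets\n", buf) // prints "buffer <= 50000 packets"
	// A 2 s burst at 150k packets/s produces 100k excess packets.
	fmt.Printf("dropped: %.0f packets\n", droppedDuringBurst(150000, 100000, 2, buf)) // prints "dropped: 50000 packets"
}
```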
The listen URI can be customized to meet the requirements of each environment.
GoFlow2 will work better in an environment with guaranteed resources.
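As an illustration, a listen URI carrying query options can be decomposed with Go's standard `net/url` package. The URI shape `netflow://:2055?count=4&workers=8`, the `listenConfig` type and the defaults of 1 are assumptions made for this sketch (the `count` and `workers` option names are taken from the section below); refer to GoFlow2's own documentation for the exact syntax it accepts.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listenConfig is a hypothetical struct holding the options of one listen URI.
type listenConfig struct {
	Scheme  string
	Addr    string
	Count   int // number of sockets
	Workers int // number of decoding workers
}

// parseListenURI splits a URI such as "netflow://:2055?count=4&workers=8".
// Options that are absent default to 1 (an assumption of this sketch).
func parseListenURI(raw string) (listenConfig, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return listenConfig{}, err
	}
	cfg := listenConfig{Scheme: u.Scheme, Addr: u.Host, Count: 1, Workers: 1}
	q := u.Query()
	for name, dst := range map[string]*int{"count": &cfg.Count, "workers": &cfg.Workers} {
		if v := q.Get(name); v != "" {
			n, err := strconv.Atoi(v)
			if err != nil || n < 1 {
				return listenConfig{}, fmt.Errorf("invalid %s: %q", name, v)
			}
			*dst = n
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseListenURI("netflow://:2055?count=4&workers=8")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // prints "{Scheme:netflow Addr::2055 Count:4 Workers:8}"
}
```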
## Life of a packet
When a packet is received by the collector's machine, the kernel delivers it to a socket.
The socket is buffered. On Linux, the maximum buffer size is a global kernel setting: `rmem_max` (`net.core.rmem_max`).
If the buffer is full, new packets are discarded and the count of UDP receive errors increases.
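On Linux, these drops show up in the `Udp:` counters of `/proc/net/snmp` (`RcvbufErrors` in particular). A minimal sketch that parses them, assuming the usual layout of a header line of counter names followed by a line of values with the same prefix:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseUDPCounters extracts the "Udp:" counters from the contents of
// /proc/net/snmp, mapping each counter name to its value.
func parseUDPCounters(snmp string) (map[string]uint64, error) {
	var header []string
	sc := bufio.NewScanner(strings.NewReader(snmp))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip other protocols, including "UdpLite:".
		if len(fields) == 0 || fields[0] != "Udp:" {
			continue
		}
		if header == nil {
			header = fields[1:] // first "Udp:" line holds the names
			continue
		}
		if len(fields)-1 != len(header) {
			return nil, errors.New("udp header/value length mismatch")
		}
		counters := make(map[string]uint64, len(header))
		for i, name := range header {
			v, err := strconv.ParseUint(fields[i+1], 10, 64)
			if err != nil {
				return nil, err
			}
			counters[name] = v
		}
		return counters, nil
	}
	return nil, errors.New("no Udp: section found")
}

func main() {
	data, err := os.ReadFile("/proc/net/snmp")
	if err != nil {
		panic(err)
	}
	c, err := parseUDPCounters(string(data))
	if err != nil {
		panic(err)
	}
	// RcvbufErrors counts datagrams dropped because a socket buffer was full.
	fmt.Println("RcvbufErrors:", c["RcvbufErrors"], "InErrors:", c["InErrors"])
}
```

Sampling these counters periodically and watching the difference is one way to tell whether the buffer is undersized.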
A first level of load-balancing can be done by having multiple sockets listening
on the same port.
On Linux, this is done with the `SO_REUSEPORT` and `SO_REUSEADDR` socket options.
In GoFlow2, you can set the `count` option to define the number of sockets.
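A minimal Linux-only sketch of the technique in Go: each socket sets `SO_REUSEADDR` and `SO_REUSEPORT` through `net.ListenConfig.Control` before binding, and `count` sockets are opened on the same address. The helper `listenReusePort` is hypothetical and is not GoFlow2's code; `soReusePort` is defined locally (value 15 on most Linux architectures, including x86 and arm64) to avoid importing `golang.org/x/sys/unix`, which exports it as `unix.SO_REUSEPORT`.

```go
//go:build linux

package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// soReusePort is SO_REUSEPORT on most Linux architectures; a few differ.
const soReusePort = 0xf

// listenReusePort opens count UDP sockets bound to the same address.
func listenReusePort(addr string, count int) ([]net.PacketConn, error) {
	lc := net.ListenConfig{
		// Control runs on the raw socket before bind().
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
				if sockErr == nil {
					sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
				}
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	conns := make([]net.PacketConn, 0, count)
	for i := 0; i < count; i++ {
		pc, err := lc.ListenPacket(context.Background(), "udp", addr)
		if err != nil {
			for _, open := range conns {
				open.Close()
			}
			return nil, err
		}
		conns = append(conns, pc)
		// Reuse the concrete address so that a ":0" port stays identical.
		addr = pc.LocalAddr().String()
	}
	return conns, nil
}

func main() {
	conns, err := listenReusePort("127.0.0.1:0", 4)
	if err != nil {
		panic(err)
	}
	for _, c := range conns {
		fmt.Println("listening on", c.LocalAddr())
		defer c.Close()
	}
}
```

Because the kernel picks the socket from a hash of the source and destination addresses and ports, all packets from a single exporter (one source address and port) land on the same socket; the load is only spread when there are several exporters.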
Each socket will put the packet in a queue to be decoded.
The number of `workers` should ideally match the number of CPUs available.
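The socket, queue and workers pipeline can be sketched with a buffered channel and a pool of goroutines sized with `runtime.NumCPU()`. `packet` and `decode` are hypothetical stand-ins for received datagrams and flow decoding; this illustrates the pattern, not GoFlow2's implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// packet stands in for a received datagram; decode is a placeholder for the
// NetFlow/sFlow decoding done by a worker. Both are hypothetical.
type packet []byte

func decode(p packet) int { return len(p) }

// runWorkers starts n workers that drain a shared queue until it is closed
// and returns the total number of bytes "decoded".
func runWorkers(queue <-chan packet, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range queue {
				size := decode(p)
				mu.Lock()
				total += size
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	// The buffered channel plays the role of the decoding queue: when it is
	// full, the sender blocks (a sender using select with a default case
	// could drop packets instead).
	queue := make(chan packet, 1024)
	go func() {
		for i := 0; i < 10000; i++ {
			queue <- make(packet, 100)
		}
		close(queue)
	}()
	workers := runtime.NumCPU() // one worker per CPU
	fmt.Println("decoded bytes:", runWorkers(queue, workers)) // prints "decoded bytes: 1000000"
}
```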