.. _performance:

Performance Guide
=================

The BNG Blaster handles all traffic sent and received (I/O) in the main thread by default.
With this default behavior, you can achieve between 100,000 and 250,000 PPS of bidirectional
traffic in most environments. Depending on the actual setup, this can be less or much more,
which is primarily driven by the single-thread performance of the given CPU.

Those numbers can be increased by splitting the workload over multiple I/O worker threads.
Every I/O thread will handle only one interface and direction. It is also possible to start
multiple threads for the same interface and direction.

The number of I/O threads can be configured globally for all interfaces or per interface link.

.. code-block:: json

    {
        "interfaces": {
            "rx-threads": 2,
            "tx-threads": 1,
            "links": [
                {
                    "interface": "eth1",
                    "rx-threads": 4,
                    "tx-threads": 2
                }
            ]
        }
    }

The configuration per interface link allows asymmetric thread pools. Assume, for example, that
you send massive unidirectional traffic from eth1 to eth2. In such a scenario, you would set up
multiple TX threads and one RX thread on eth1. For eth2 you would do the opposite, meaning
multiple RX threads but only one TX thread, as shown in the sketch below.
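
A minimal sketch of this asymmetric setup, using only the per-link options shown above (the
interface names eth1 and eth2 match the example scenario), could look like this:

.. code-block:: json

    {
        "interfaces": {
            "links": [
                {
                    "interface": "eth1",
                    "tx-threads": 4,
                    "rx-threads": 1
                },
                {
                    "interface": "eth2",
                    "tx-threads": 1,
                    "rx-threads": 4
                }
            ]
        }
    }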

It is also possible to start dedicated threads for TX while keeping RX in the main thread, or
vice versa, by setting the corresponding number of threads to zero (the default).
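
For example, the following sketch keeps RX in the main thread while moving TX to two worker
threads; the explicit ``"rx-threads": 0`` simply restates the default described above:

.. code-block:: json

    {
        "interfaces": {
            "tx-threads": 2,
            "rx-threads": 0
        }
    }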

With multithreading, you should be able to scale up to at least 1 million PPS of bidirectional
traffic, depending on the actual configuration and setup. This allows, for example, starting
1 million flows with 1 PPS per flow over at least 4 TX threads to verify all prefixes of a
full BGP table.

The configured traffic streams are automatically balanced over all TX threads of the
corresponding interfaces, but a single stream can't be split over multiple threads, in order
to prevent re-ordering issues.

Enabling multithreaded I/O comes with some limitations. First of all, it works only on systems
with CPU cache coherence, which should apply to all modern CPU architectures. TX threads are
not allowed for LAG (Link Aggregation) interfaces, but RX threads are supported. It is also not
possible to capture traffic streams sent or received on threaded interfaces. All other traffic
is still captured on threaded interfaces.

.. note::

    The BNG Blaster is currently tested for 1 million PPS with 1 million flows. This is not a
    hard limit, but everything beyond it should be approached with caution. It is also possible
    to scale far beyond this using DPDK-enabled interfaces.

A single stream will always be handled by a single thread to prevent re-ordering. The single
stream performance is limited by the max burst size (`traffic->max-burst`) sent per TX
interval, which is 32 in the default configuration. With the default TX interval of 1
millisecond, each stream is therefore limited to around 32K PPS. This can be increased by
lowering the TX interval. With a TX interval of `0.1`, the single stream performance
increases to 320K PPS. The max burst size should not be increased, in order to prevent
microbursts.
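
As a sketch, raising the per-stream limit to roughly 320K PPS would only require lowering the
global TX interval (the ``tx-interval`` option also used in the recommended settings below);
whether this rate is actually reached still depends on your hardware and overall setup:

.. code-block:: json

    {
        "interfaces": {
            "tx-interval": 0.1
        }
    }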

The following settings are recommended for most tests with 1M PPS or beyond.

.. code-block:: json

    {
        "interfaces": {
            "tx-threads": 4,
            "tx-interval": 0.01,
            "rx-threads": 4,
            "rx-interval": 0.1,
            "io-slots": 32768
        }
    }

It is also recommended to increase the hardware and software queue sizes of your network
interface links to the maximum for higher throughput, as explained in the
:ref:`Operating System Settings <interfaces>`.

The packet receive performance is also limited by the ability of your network interfaces to
properly distribute the traffic over multiple hardware queues. Some network interfaces are
not able to distribute traffic based on VLAN or PPPoE session identifiers. In this case, all
traffic is received by the same hardware queue and the corresponding thread. If CPU
utilization is not properly distributed over all cores, this could be the reason.

.. note::

    We are continuously working to increase performance. Contributions, proposals,
    or recommendations on how to further increase performance are welcome!