Before this PR, when graceful restart was configured for a neighbor
and when the restart flag was set by the restarting speaker, if
the neighbor was not advertising the GR capability, the initial
paths list was never sent by the restarting speaker to its neighbor
This is a problem when the server is configured with graceful
restart for all its peers without knowing if the peer supports it.
If some of the peers don't support it, they may never receive the
routes from the restarting speaker, leading to an inconsistent
routing state.
The gRPC code paths uses different functions than the BGP code path.
Thus is did not receive the fix for the mac mobility handling.
Fixes: c393f43 ("evpn: fix quadratic evpn mac-mobility handling")
Func capabilitiesFromConfig was always taken under the read lock.
However, when graceful restart is enabled for some families, this
function writes to the neighbor config which creates a data race.
Send Cease/Hard Reset notification for certain scenario when graceful
restart + notification support (RFC8538) are enabled. In this
implementation, we follow the suggestion of RFC8538 and map following
notification subcodes to Hard Reset subcode.
1. BGP_ERROR_SUB_MAXIMUM_NUMBER_OF_PREFIXES_REACHED
In this case, GoBGP is in the resource shortage and not working
properly. Thus, the peer should stop forwarding packet immediately.
2. BGP_ERROR_SUB_ADMINISTRATIVE_SHUTDOWN
This happens when the user uses DisablePeer API. This clearly indicates
user's intention of shutting down the session. Thus, we should send Hard
Reset.
3. BGP_ERROR_SUB_PEER_DECONFIGURED
This happens when the user uses DeletePeer API or StopBgp API or there's
an ASN mismatch found in the Open phase. The former two cases, the user
shows the intention to shutdown the session, so we should Hard Reset.
The latter case is not so obvious, but I think it's ok to do Hard Reset
because it is an unrecoverable error that cannot be solved without
user's involvement.
4. BGP_ERROR_SUB_HARD_RESET
This case currently doesn't exist, but obviously we should send Hard
Reset when someone explicitly specifies it.
The behavior for the remaining subcodes are unchanged. We may want to
expose a knob to adjust the behavior of BGP_ERROR_SUB_ADMINISTRATIVE_RESET
as suggested by RFC8538, but for this initial implementation, we kept it
as is.
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
This patch adds an 'identifier %d' option to the route add / del
CLI. This is helpful for testing add-paths capabilities in vpv46
contexts.
Signed-off-by: Nathan Skrzypczak <nathan.skrzypczak@gmail.com>
Currently, even though `ApplyPolicy` is called for determining the
accepted routes after apply policy, the new route with attribute
modifications is not returned. This is problematic for gRPC API users.
Tests are added for all four cases that were described in
https://github.com/osrg/gobgp/issues/2765. This PR makes the behaviour
correct for "Case/Attempt 2" described in the issue.
- Changes table.(*packerV4).pack.func1 (split) to adjust the max
parameter before using it to allocate slice size. Previously the full
max size was allocated then max was (possibly) truncated before
further use.
- Compares IPs using net.IP.Equal instead of using a string comparison
in order to avoid unnecessary allocations.
- Adds peer.routerID to access IP, called by peer.RouterID.
Moves a number of heavily called debug lines behind if checks to avoid
needless allocations of Fields objects and stringification of fields.
For cases where the server is not set to "debug" log level, these fields
were allocated on the heap and then immediately discarded - as well a
number of these were stringifying state / NLRIs regardless of log level.
In servers with significant amounts of routes and BGP peers, this lead
to a large amount of wasted allocations - in our case looking at Go's
memory profiler, 25% of all allocations were from these lines alone.
Fixes a deadlock condition that can happen if StopBgp is called when the
pending connection queue is full. During teardown, StopBgp calls a
mgmtOp on the server goroutine which attempts to stop the goroutine
accepting inbound connections, and waits for it to finish before
continuing.
This connection goroutine can block if the connection queue is full,
which is read by the same goroutine that processes all mgmtOps. This
means that if the queue is full and the goroutine is currently blocked,
then calling StopBgp will lead to a complete deadlock, as the connection
goroutine will never close as it is trying to send to the queue, but the
queue will not be read as the server goroutine is currently waiting for
the connection goroutine to exit.
To correct this, a context has been added that gets passed to the
connection goroutine. This is then checked in a new select statement on
the connection queue which gets cancelled by tcpListener.Close() ensuring
the goroutine exits correctly even if the queue is full.
Additionally change the `Key` to be the listener address, which is hopefully
more useful than a difficult-to-decipher dump of the listener struct.
The previous behavior would result in log lines like the following, even if
nothing went wrong:
```
time="2024-01-31T17:30:25Z" level=warning msg="accept failed" Error="<nil>" Key="&{0x140002e4000 {<nil> 0 0}}" Topic=grpc
```
With this change, the message is only logged if there was an error, and it will look like this:
```
time="2024-01-31T17:40:25Z" level=warning msg="accept failed" Error="lolol just testing" Key="127.0.0.1:59289" Topic=grpc
```
Currently, graceful restart waits for the EoR message for all address
families "enabled" for the peer, but it should only wait for "received"
address families (the address families the peer is capable of handling).
Fixes: #2524
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
This patch adds a special case in the destination hashmap for EVPN
Type-2 routes, to index them by MAC address. This allows for direct
access to the destination struct, instead of iterating over all
destination and all paths.
In effect, this replaces an iteration over all known paths by a quick
lookup to the MAC, leaving only an iteration to multiple paths to the
same MAC (e.g. multihoming or through multiple VNIs).
The practical effect is a reasonable convergence time for large EVPN
instances.
- before: 6m 7s
- after: 11s
The comparison was performed on a Xeon Silver 4209T, and an EVPN
instance comprising of 13k EVPN type-2 routes. The time is measured
by comparing the timestamp of the first and the last routes logged by
the cli's monitor mode.
Given the extreme difference, no further work was done for a more
accurate measurment.
It could be that a peer gets deleted and added back during the
transition from active state to open confirm. In that case, the TCP
connection from the old version of the peer could still be up. This
is a problem if the server is a listener only as the remote peer
would consider the old TCP connection as being valid and it won't
be able to connect until the TCP connection is eventually cleaned
by the Golang GC.
Adds usage of the "prefix-based" TCP MD5 for dynamic
neighbors. Non-dynamic neighbors will continue to use
non-prefix based, which makes it more compatible with
running on older kernels, as only 4.14+ includes the
necessary support.
This change also includes tests of dynamic peers in general.
This method was attempting to read from peer.fsm before acquiring a read
lock, leading to a data race as this struct is written by a different
goroutine in parallel. Commit moves the call to RLock before the first
read from the struct.