History log of /freebsd-current/sys/netinet/tcp_lro.h
Revision Date Author Comments
# 2c6fc36a 04-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

hpts/lro: make tcp_lro_flush_tcphpts() and tcp_run_hpts() pointers

Rename tcp_run_hpts() to tcp_hpts_softlock() to better describe its
function. This makes loadable hpts.ko working correctly with LRO.

Reviewed by: tuexen, rrs
Differential Revision: https://reviews.freebsd.org/D42858


# 4f9c93f1 04-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

lro: separate HPTS specific code into tcp_lro_hpts.c

Put same copyright header as tcp_hpts.c has, since all this code
was developed by Randall Stewart <rrs@FreeBSD.org> as a part of
the HPTS work. Also copy Mellanox copyright from tcp_lro.c as
Hans Petter Selasky also participated in restructuring the code.

Reviewed by: imp, tuexen, rrs
Differential Revision: https://reviews.freebsd.org/D42854


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# e0d8add4 28-Nov-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

tcp_lro: Fix for undefined behaviour.

Make sure the size of the raw[] array in the lro_address union is
correctly set at compile time, so that static code analysis tools
do not report undefined behaviour.

MFC after: 1 week
Sponsored by: NVIDIA Networking


# b4f60fab 10-Feb-2022 Mark Johnston <markj@FreeBSD.org>

tcp: Avoid conditionally defined fields in union lro_address

The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure. Fix this by
unconditionally defining the address fields.

As a side effect, this eliminates some duplication in the kernel's CTF
type graph.

Reviewed by: rscheff, tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34242


# 93e28d6e 01-Feb-2022 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: LRO code to deal with all 12 TCP header flags

TCP per RFC793 has 4 reserved flag bits for future use. One
of those bits may be used for Accurate ECN.
This patch is to include these bits in the LRO code to ease
the extensibility if/when these bits are used.

Reviewed By: hselasky, rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34127


# 3284f492 01-Jan-2022 Ryan Stone <rstone@FreeBSD.org>

LRO: Don't merge ACK and non-ACK packets together

LRO was willing to merge ACK and non-ACK packets together. This
can cause incorrect th_ack values to be reported up the stack.
While non-ACKs are quite unlikely to appear in practice, LRO's
behaviour is against the spec. Make LRO unwilling to merge
packets with different TH_ACK flag values in order to fix the
issue.

Found by: Sysunit test
Differential Revision: https://reviews.freebsd.org/D33775
Reviewed by: rrs


# 24fe6643 30-Dec-2021 Ryan Stone <rstone@FreeBSD.org>

LRO: Fix lost packets when merging 1 payload with an ACK

To check if it needed to regenerate a packet's header before
sending it up the stack, LRO was checking if more than one payload
had been merged into the packet. This failed in the case where
a single payload was merged with one or more pure ACKs. This
results in lost ACKs.

Fix this by precisely tracking whether header regeneration is
required instead of using an incorrect heuristic.

Found with: Sysunit test
Differential Revision: https://reviews.freebsd.org/D33774
Reviewed by: rrs


# bb5cd80e 02-Aug-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

Update the TCP LRO code to handle both encrypted and un-encrypted traffic.

Encrypted and un-encrypted traffic needs to be coalesced separately.
Split the 16-bit lro_type field in the address information into two
8-bit fields, and then use the last 8-bit field for flags, which among
other indicate if the received mbuf is encrypted or un-encrypted.

Differential Revision: https://reviews.freebsd.org/D31377
Reviewed by: gallatin
MFC after: 1 week
Sponsored by: NVIDIA Networking


# d7955cc0 06-Jul-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: HPTS performance enhancements

HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. It does this by now instead of relying on the timer/callout to
drive it instead use user return from a system call as well as lro flushes
to drive hpts. The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31083


# b45daaea 09-Jun-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: LRO timestamps have lost their previous precision

Recently we had a rewrite to tcp_lro.c that was tested but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in tcp's that do pacing and needs to be fixed to use the more precise
time that was there before.

Reviewed by: mtuexen, gallatin, hselasky
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30695


# 9ca874cf 30-Mar-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

Add TCP LRO support for VLAN and VxLAN.

This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.

Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.

To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure.
This easily allows to reuse parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but
instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed
recursivly, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of
big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once
per TCP LRO flush. This gives a minor performance improvement and
keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing
of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and predict_false() to optimise CPU branch
predictions.

Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.

Tested by: Netflix
Reviewed by: rrs (transport)
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 69a34e8d 26-Jan-2021 Randall Stewart <rrs@FreeBSD.org>

Update the LRO processing code so that we can support
a further CPU enhancements for compressed acks. These
are acks that are compressed into an mbuf. The transport
has to be aware of how to process these, and an upcoming
update to rack will do so. You need the rack changes
to actually test and validate these since if the transport
does not support mbuf compression, then the old code paths
stay in place. We do in this commit take out the concept
of logging if you don't have a lock (which was quite
dangerous and was only for some early debugging but has
been left in the code).

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28374


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


# e57b2d0e 06-Sep-2019 Randall Stewart <rrs@FreeBSD.org>

This adds the final tweaks to LRO that will now allow me
to add BBR. These changes make it so you can get an
array of timestamps instead of a compressed ack/data segment.
BBR uses this to aid with its delivery estimates. We also
now (via Drew's suggestions) will not go to the expense of
the tcb lookup if no stack registers to want this feature. If
HPTS is not present the feature is not present either and you
just get the compressed behavior.

Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D21127


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 05cde7ef 02-Aug-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

tcp/lro: Implement hash table for LRO entries.

This significantly improves HTTP workload performance and reduces
HTTP workload latency.

Reviewed by: rrs, gallatin, hps
Obtained from: rrs, gallatin
Sponsored by: Netflix (rrs, gallatin) , Microsoft (sephe)
Differential Revision: https://reviews.freebsd.org/D6689


# fc271df3 26-May-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Use optimised complexity safe sorting routine instead of the kernel's
"qsort()".

The kernel's "qsort()" routine can in worst case spend O(N*N) amount of
comparisons before the input array is sorted. It can also recurse a
significant amount of times using up the kernel's interrupt thread
stack.

The custom sorting routine takes advantage of that the sorting key is
only 64 bits. Based on set and cleared bits in the sorting key it
partitions the array until it is sorted. This process has a recursion
limit of 64 times, due to the number of set and cleared bits which can
occur. Compiled with -O2 the sorting routine was measured to use
64-bytes of stack. Multiplying this by 64 gives a maximum stack
consumption of 4096 bytes for AMD64. The same applies to the execution
time, that the array to be sorted will not be traversed more than 64
times.

When serving roughly 80Gb/s with 80K TCP connections, the old method
consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4%
CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting
as measured by Intel Vtune. The testing was done using a sysctl to
toggle between "qsort()" and "tcp_lro_sort()".

Differential Revision: https://reviews.freebsd.org/D6472
Sponsored by: Mellanox Technologies
Tested by: Netflix
Reviewed by: gallatin, rrs, sephe, transport


# 1ea44822 01-Apr-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

tcp/lro: Change SLIST to LIST, so that removing an entry is O(1)

This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.

Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765


# 489f0c3c 24-Mar-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

tcp/lro: Return TCP_LRO_NO_ENTRIES if we are short of LRO entries.

So that callers could react accordingly.

Reviewed by: gallatin (no objection)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5695


# 7ae3d4bf 17-Feb-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit

ACK aggregation limit is append count based, while the TCP data segment
aggregation limit is length based. Unless the network driver sets these
two limits, it's an NO-OP.

Reviewed by: adrian, gallatin (previous version), hselasky (previous version)
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5185


# e936121d 19-Jan-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Add optimizing LRO wrapper:

- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.

- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.

- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.

- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.

- Add own memory allocation counter for LRO.

- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.

Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914


# 7127e6ac 28-Aug-2013 Navdeep Parhar <np@FreeBSD.org>

Merge r254336 from user/np/cxl_tuning.

Add a last-modified timestamp to each LRO entry and provide an interface
to flush all inactive entries. Drivers decide when to flush and what
the inactivity threshold should be.

Network drivers that process an rx queue to completion can enter a
livelock type situation when the rate at which packets are received
reaches equilibrium with the rate at which the rx thread is processing
them. When this happens the final LRO flush (normally when the rx
routine is done) does not occur. Pure ACKs and segments with total
payload < 64K can get stuck in an LRO entry. Symptoms are that TCP
tx-mostly connections' performance falls off a cliff during heavy,
unrelated rx on the interface.

Flushing only inactive LRO entries works better than any of these
alternates that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every
'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining
work for later.

Reviewed by: andre


# 62b5b6ec 24-May-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 bz_ipv6_fast:

Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right,
especially also in the IPv4 case for the IP and TCP header.

Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...

Leave a few comments on further things to look at in the future,
though that is not the full list.

Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


# 27f190a3 15-May-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Switch to a standard 2 clause BSD license (from bsd-style-copyright).

Approved by: Myricom Inc. (gallatin)
Approved by: Intel Corporation (jfv)


# 79e955ed 07-Jan-2011 John Baldwin <jhb@FreeBSD.org>

Trim extra spaces before tabs.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 6c5087a8 11-Jun-2008 Jack F Vogel <jfv@FreeBSD.org>

Add generic TCP LOR into netinet