History log of /freebsd-current/sys/netinet/tcp_hpts.c
Revision Date Author Comments
# aaaa01c0 05-Apr-2024 Michael Tuexen <tuexen@FreeBSD.org>

tcp hpts: initialize variable

Ensure that tv.tv_sec is zero in all code paths.

Reported by: Coverity Scan
CID: 1527724
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44584


# b600644f 01-Apr-2024 Michael Tuexen <tuexen@FreeBSD.org>

tcp hpts: improve consistency

The target_slot argument of max_slots_available() can be NULL.
Therefore, check for this in all places.
Right now, all callers provide non-NULL pointer.

Reported by: Coverity Scan
CID: 1527732
Reviewed by: rrs
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44527


# b7b78c1c 28-Mar-2024 Randall Stewart <rrs@FreeBSD.org>

Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold

HPTS inserts a softclock for system call return that optimizes performance. However when
no HPTS threads need the help (i.e. when they have less than 100 or so connections) then
there should be little work done i.e. check the counter and return instead of running through
all the threads getting locks etc.ptimize HPTS so that little work is done until we have a hpts
thread that is over the connection threshold.

Reported by: eduardo
Reviewed by: gallatin, glebius, tuexen
Tested by: gallatin
Differential Revision: https://reviews.freebsd.org/D44420


# 638b5ae1 01-Mar-2024 Randall Stewart <rrs@FreeBSD.org>

HTPS has actually three states not two so the macro needs to account for that.

Ok lets fix up the tcp_in_hpts() so that it also says yes if you
are in the race state moving and you are scheduled to be put in.
This also requires changing the MPASS to be the old version non
inline function of tcp_in_hpts().

This change also adds a new inline macro so that a uint64_t timestamp can be
obtained by a transport (aka Rack will use this).

Reviewed by: glebius, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D44157


# ef0ac0a1 20-Jan-2024 Gordon Bergling <gbe@FreeBSD.org>

tcp_hpts: Fix a typo of a function name in a comment

- s/tcp_ouput/tcp_output/

MFC after: 3 days


# 08c33cd9 26-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

hpts: avoid duplicate call to tcp_output()

Obtained from: rrs


# 48b55a7c 19-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: make the module unloadable

Although the HPTS subsytem wasn't initially designed as a loadable
module, now it is so. Make it possible to also unload it, but for
safety reasons hide that under 'kldunload -f'.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43092


# 175d4d69 19-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: use tcp_pace.cts_last_ran for last ran table

Remove the global cts_last_ran and use already existing unused field of
struct tcp_hptsi, which seems originally planned to hold this table. This
makes it consistent with other malloc-ed tables, like main array of HPTS
entities and CPU groups.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43091


# 3f46be6a 07-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: let tcp_hpts_init() set a random CPU only once

After d2ef52ef3dee the tcp_hpts_init() function can be called multiple
times on a tcpcb if it is switched there and back between two TCP stacks.
First, this makes existing assertion in tcp_hpts_init() incorrect. Second,
it creates possibility to change a randomly set t_hpts_cpu to a different
random value, while a tcpcb is already in the HPTS wheel, triggering other
assertions later in tcp_hptsi().

The best approach here would be to work on the stacks to really clear a
tcpcb out of HPTS wheel in tfb_tcp_fb_fini, draining the IHPTS_MOVING
state. But that's pretty intrusive change, so let's just get back to the
old logic (pre d2ef52ef3dee) where t_hpts_cpu was set to a random value
only once in a CPU lifetime and a newly switched stack inherits t_hpts_cpu
from the previous stack.

Reviewed by: rrs, tuexen
Differential Revision: https://reviews.freebsd.org/D42946
Reported-by: syzbot+fab29fe1ab089c52998d@syzkaller.appspotmail.com
Reported-by: syzbot+ca5f2aa0fda15dcfe6d7@syzkaller.appspotmail.com
Fixes: 2b3a77467dd3d74a7170f279fb25f9736b46ef8a


# 2c6fc36a 04-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

hpts/lro: make tcp_lro_flush_tcphpts() and tcp_run_hpts() pointers

Rename tcp_run_hpts() to tcp_hpts_softlock() to better describe its
function. This makes loadable hpts.ko working correctly with LRO.

Reviewed by: tuexen, rrs
Differential Revision: https://reviews.freebsd.org/D42858


# 6a79e480 27-Nov-2023 Randall Stewart <rrs@FreeBSD.org>

Fix two latent bugs in hpts. One where a static is put on
a local variable, the other an initialization bug where
we should be setting tv.tv_sec to 0.

PR: 275482


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# c3c20de3 25-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: move HPTS/LRO flags out of inpcb to tcpcb

These flags are TCP specific. While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39698


# c2a69e84 25-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: move HPTS related fields from inpcb to tcpcb

This makes inpcb lighter and allows future cache line optimizations
of tcpcb. The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193ab), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39697


# 01216268 21-Apr-2023 Randall Stewart <rrs@FreeBSD.org>

tcp: hpts needs to still call output even after input.

The other stacks it turns out actually expect the output to be called and can become stuck if it is
not. This is because they run there timer code from there and the input routine does not always
assure a timer is running. The real longterm fix here might be to go into the other stacks (rack and bbr)
and make sure that a timer is running after input if you don't do output.. as well as call the timer functions.
This would cut down on calls from hpts. But I think its too dramatic of a change for the immediate time.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39738


# 2ad584c5 17-Apr-2023 Randall Stewart <rrs@FreeBSD.org>

tcp: Inconsistent use of hpts_calling flag

Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One
such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to
rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the
"no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside
the hpts assurance of 250useconds.

Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go
from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed.
This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing
is going on may be harmful.

Lets fix all this and explicitly state the contract that hpts is making with transports that care about the
flag.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39653


# a540cdca 17-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: use queue(9) STAILQ for the input queue

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39574


# 35bc0bcc 07-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: reduce argument list to functions that pass a segment

The socket argument is superfluous, as a tcpcb always has one and
only one socket.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39434


# 2ff8187e 04-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: remove dead code tcp_drop_in_pkts()

Should have gone in f971e791391.


# 69c7c811 16-Mar-2023 Randall Stewart <rrs@FreeBSD.org>

Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.

The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831


# d68f1542 11-Jan-2023 Gordon Bergling <gbe@FreeBSD.org>

tcp_hpts: Fix a typo in a source code comment

- s/subract/subtract/

MFC after: 3 days


# eaabc937 14-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: retire TCPDEBUG

This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.

We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket. Also the tcp::debug DTrace probes look at
this flag on a socket.

Reviewed by: gnn, tuexen
Discussed with: rscheff, rrs, jtl
Differential revision: https://reviews.freebsd.org/D37694


# e68b3792 07-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: embed inpcb into tcpcb

For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision: https://reviews.freebsd.org/D37127


# 9eb0e832 08-Nov-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: provide macros to access inpcb and socket from a tcpcb

There should be no functional changes with this commit.

Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37123


# 53af6903 06-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: remove INP_TIMEWAIT flag

Mechanically cleanup INP_TIMEWAIT from the kernel sources. After
0d7445193ab, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions. Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision: https://reviews.freebsd.org/D36400


# d07a5018 03-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

tcp_hpts: Correct some typos in source code comments

- s/occured/occurred/
- s/the the/the/

MFC after: 3 days


# b33bfe6e 15-Aug-2022 Dimitry Andric <dim@FreeBSD.org>

Fix unused variable warnings in tcp_hpts.c

With clang 15, the following -Werror warning is produced:

sys/netinet/tcp_hpts.c:1114:10: error: variable 'paced_cnt' set but not used [-Werror,-Wunused-but-set-variable]
int32_t paced_cnt = 0;
^
sys/netinet/tcp_hpts.c:1112:11: error: variable 'total_slots_processed' set but not used [-Werror,-Wunused-but-set-variable]
uint64_t total_slots_processed = 0;
^

The 'paced_cnt' variable was in tcp_hpts.c when it was first added, and
the 'total_slots_processed' variable was added in d7955cc0ffdf9, but
both appear to have been debugging aids that have never been used, so
remove them.

MFC after: 3 days


# db6b3286 15-Aug-2022 Dimitry Andric <dim@FreeBSD.org>

Adjust function definition in tcp_hpts.c to avoid clang 15 warning

With clang 15, the following -Werror warning is produced:

sys/netinet/tcp_hpts.c:1594:23: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
tcp_choose_hpts_to_run()
^
void

This is because tcp_choose_hpts_to_run() is declared with a (void)
argument list, but defined with an empty argument list. Make the
definition match the declaration.

MFC after: 3 days


# 6e6439b2 14-Apr-2022 Randall Stewart <rrs@FreeBSD.org>

tcp - hpts timing is off when we are above 1200 connections.

HPTS timing begins to go off when we reach the threshold of connections (1200 by default)
where we have any returning syscall or LRO stop finding the oldest hpts thread that
has not run but instead using the CPU it is on. This ends up causing quite a lot of times
where hpts threads may not run for extended periods of time. On top of all that which
causes heartburn if you are pacing in tcp, you also have the fact that where AMD's
podded L3 cache may have sets of 8 CPU's that share a L3, hpts is unaware of this
and thus on amd you can generate a lot of cache misses.

So to fix this we will get rid of the CPU mode, and always use oldest. But also make
HPTS aware of the CPU topology and keep the "oldest" to be within the same L3 cache.
This also works nicely for NUMA as well couple with Drew's earlier NUMA changes.

Reviewed by: glebius, gallatin, tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D34916


# 1f2aaef2 09-Apr-2022 Gordon Bergling <gbe@FreeBSD.org>

tcp_htps: Fix a typo in a source code comment

- s/postion/position/

MFC after: 3 days


# 47ded797 07-Feb-2022 Franco Fichtner <franco@opnsense.org>

netinet: simplify RSS ifdef statements

Approved by: transport (rrs)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D31583


# aac52f94 18-Jan-2022 Randall Stewart <rrs@FreeBSD.org>

tcp: Warning cleanup from new compiler.

The clang compiler recently got an update that generates warnings of unused
variables where they were set, and then never used. This revision goes through
the tcp stack and cleans all of those up.

Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision:


# a370832b 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: remove delayed drop KPI

No longer needed after tcp_output() can ask caller to drop.

Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D33371


# f64dc2ab 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: TCP output method can request tcp_drop

The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection
when they do output on it. The default stack never does this, thus
existing framework expects tcp_output() always to return locked and
valid tcpcb.

Provide KPI extension to satisfy demands of advanced stacks. If the
output method returns negative error code, it means that caller must
call tcp_drop().

In tcp_var() provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that
default stack can continue using it internally without modifications.
For advanced stacks it would perform tcp_drop() and unlock and report
that with negative error code.
- tcp_output_unlock() handles the negative code and always converts
it to positive and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility
to drop on the caller.

Sweep over the advanced stacks and use new KPI instead of using HPTS
delayed drop queue for that.

Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D33370


# 40fa3e40 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: mechanically substitute call to tfb_tcp_output to new method.

Made with sed(1) execution:

sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)

sed:
s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/
s/to tfb_tcp_output\(\)/to tcp_output()/

Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D33366


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# 2e27230f 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: rewrite inpcb synchronization

Just trust the pcb database, that if we did in_pcbref(), no way
an inpcb can go away. And if we never put a dropped inpcb on
our queue, and tcp_discardcb() always removes an inpcb to be
dropped from the queue, then any inpcb on the queue is valid.

Now, to solve LOR between inpcb lock and HPTS queue lock do the
following trick. When we are about to process a certain time
slot, take the full queue of the head list into on stack list,
drop the HPTS lock and work on our queue. This of course opens
a race when an inpcb is being removed from the on stack queue,
which was already mentioned in comments. To address this race
introduce generation count into queues. If we want to remove
an inpcb with generation count mismatch, we can't do that, we
can only mark it with desired new time slot or -1 for remove.

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33026


# f971e791 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: rename input queue to drop queue and trim dead code

The HPTS input queue is in reality used only for "delayed drops".
When a TCP stack decides to drop a connection on the output path
it can't do that due to locking protocol between main tcp_output()
and stacks. So, rack/bbr utilize HPTS to drop the connection in
a different context.

In the past the queue could also process input packets in context
of HPTS thread, but now no stack uses this, so remove this
functionality.

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33025


# b0a7c008 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: make struct tcp_hpts_entry private to the module.

Also, make some of the functions also private to the module. Remove
unused functions discovered after that.

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33024


# 50f081ec 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: provide tcp_in_hpts().

It will hide some internal HPTS knowledge from the consumers.

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33023


# de2d4784 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

SMR protection for inpcbs

With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc). However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree(). However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it. Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
Once the desired inpcb is found we need to transition from SMR
section to r/w lock on the inpcb itself, with a check that inpcb
isn't yet freed. This requires some compexity, since SMR section
itself is a critical(9) section. The complexity is hidden from
KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
notification) also a new KPI is provided, that hides internals of
the database - inp_next(struct inp_iterator *).

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33022


# 7312e4e5 08-Jul-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: Fix 32 bit platform breakage

This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms its
not a u64. Instead use the long directive.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31107


# d7955cc0 06-Jul-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: HPTS performance enhancements

HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. It does this by now instead of relying on the timer/callout to
drive it instead use user return from a system call as well as lro flushes
to drive hpts. The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D31083


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 4e1a3ff8 03-Mar-2020 Bjoern A. Zeeb <bz@FreeBSD.org>

tcp_hpts: make RSS kernel compile again.

Add proper #includes, and #ifdefs and some style fixes to make RSS
kernels compile again. There are still possible issues with uin16_t
vs. uint_t cpuid which I am not going near.

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D23726


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# df341f59 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

Whitespace, remove from three files trailing white
space (leftover presents from emacs).

Sponsored by: Netflix Inc.


# 43e8b279 07-Nov-2019 Gleb Smirnoff <glebius@FreeBSD.org>

In TCP HPTS enter the epoch in tcp_hpts_thread() and assert it in
the leaf functions.


# 3b0b41e6 10-Jul-2019 Randall Stewart <rrs@FreeBSD.org>

This commit updates rack to what is basically being used at NF as
well as sets in some of the groundwork for committing BBR. The
hpts system is updated as well as some other needed utilities
for the entrance of BBR. This is actually part 1 of 3 more
needed commits which will finally complete with BBRv1 being
added as a new tcp stack.

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20834


# 4e255d74 10-May-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Bind TCP HPTS (pacer) threads to NUMA domains

Bind the TCP pacer threads to NUMA domains and build per-domain
pacer-thread lookup tables. These tables allow us to use the
inpcb's NUMA domain information to match an inpcb with a pacer
thread on the same domain.

The motivation for this is to keep the TCP connection local to a
NUMA domain as much as possible.

Thanks to jhb for pre-reviewing an earlier version of the patch.

Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20134


# 52467047 04-Feb-2019 Warner Losh <imp@FreeBSD.org>

Regularize the Netflix copyright

Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.

Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)


# 384a5c3c 01-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of
INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
it is possible, that the code is running in net_epoch_preempt section,
and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case.

PR: 231428
Reviewed by: mmacy, tuexen
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17335


# 6d2b0c01 06-Sep-2018 Bjoern A. Zeeb <bz@FreeBSD.org>

Make tcp_hpts.c compile a LINT kernel with options RSS and PCBGROUPS added by
adding the missing include files and changing a the type of cpuid which
would otherwise cause a false comparison with NETISR_CPUID_NONE.

Reviewed by: rrs
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16891


# 16bbf600 10-Aug-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Remove unneeded ipsec-related includes.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D16637


# 6573d758 03-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

epoch(9): allow preemptible epochs to compose

- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066


# f923a734 18-Jun-2018 Randall Stewart <rrs@FreeBSD.org>

Move the tp set back to where it was before
we started playing with the VNET sets. This
way we have verified the INP settings before
we go to the trouble of de-referencing it.

Reviewed by: and suggested by lstewart
Sponsored by: Netflix Inc.


# 9e58ff6f 18-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

convert inpcbinfo hash and info rwlocks to epoch + mutex

- Convert inpcbinfo info & hash locks to epoch for read and mutex for write
- Garbage collect code that handled INP_INFO_TRY_RLOCK failures as
INP_INFO_RLOCK which can no longer fail

When running 64 netperfs sending minimal sized packets on a 2x8x2 reduces
unhalted core cycles samples in rwlock rlock/runlock in udp_send from 51% to
3%.

Overall packet throughput rate limited by CPU affinity and NIC driver design
choices.

On the receiver unhalted core cycles samples in in_pcblookup_hash went from
13% to to 1.6%

Tested by LLNW and pho@

Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15686


# f994ead3 18-Jun-2018 Randall Stewart <rrs@FreeBSD.org>

Move to using the inp->vnet pointer has suggested by lstewart.
This is far better since the hpts system is using the inp
as its basis anyway. Unfortunately his comments came late.

Sponsored by: Netflix Inc.


# 9293873e 14-Jun-2018 Gleb Smirnoff <glebius@FreeBSD.org>

TCPOUTFLAGS no longer exists since r334843.


# c9b4ac75 12-Jun-2018 Randall Stewart <rrs@FreeBSD.org>

This fixes missing VNET sets in the hpts system. Basically
without this and running vnets with a TCP stack that uses
some of the features is a recipe for panic (without this commit).

Reported by: Larry Rosenman
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15757


# cff21e48 11-Jun-2018 Jonathan T. Looney <jtl@FreeBSD.org>

Change RACK dependency on TCPHPTS from a build-time dependency to a load-
time dependency.

At present, RACK requires the TCPHPTS option to run. However, because
modules can be moved from machine to machine, this dependency is really
best assessed at load time rather than at build time.

Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15756


# afbd6cfa 08-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

hpts: remove redundant decl breaking gcc build


# 603bbd06 09-May-2018 Warner Losh <imp@FreeBSD.org>

Minor style nits

Use full copyright year.
Remove 'All Rights Reserved' from new file (rights holder OK'd)
Minor #ifdef motion and #endif tagging
Remove __FBSDID macro from comments

Sponsored by: Netflix
OK'd by: rrs@


# 3ee9c3c4 19-Apr-2018 Randall Stewart <rrs@FreeBSD.org>

This commit brings in the TCP high precision timer system (tcp_hpts).
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.

MFC after: Never
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15020