History log of /freebsd-current/sys/netinet/tcp_timer.c
Revision	Date	Author	Comments
# fce03f85	05-May-2024	Randall Stewart <rrs@FreeBSD.org>	TCP can be subject to SACK attacks; let's fix this issue. There is a type of attack that a TCP peer can launch on a connection. This is for sure in Rack or BBR and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker is driving you to look at hundreds of sack blocks that only update 1 byte. So for example if you have 1 - 10,000 bytes outstanding the attacker sends in something like: ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704) This first sack looks fine but then the attacker sends ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705) ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707) ... These blocks are making you hunt across your linked list and split things up so that you have an entry for every other byte. As your list grows you spend more and more CPU running through the lists. The idea here is the attacker chooses entries as far apart as possible that make you run through the list. This example is small but in theory if the window is open to say 1Meg you could end up with hundreds of thousands of linked-list entries. To combat this we introduce three things. When the peer requests a very small MSS we stop processing SACKs from them. This prevents a malicious peer from just using a small MSS to do the same thing. Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. If we get such an attacker that means we basically start skipping all but MSS sized Sacked blocks. 
The sack filter used to throw away data when its bounds were exceeded; instead we now increase its size to 15 and then throw away SACKs if the filter gets over-run, to prevent the malicious attacker from over-running the sack filter and thus making us process things anyway. The default stack will need to start using the sack-filter, which we have talked about in past conference calls, to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks). After this set of changes is in, rack can drop its SAD detection completely. Reviewed by: tuexen@, rscheff@ Differential Revision: https://reviews.freebsd.org/D44903
# e34ea019	18-Mar-2024	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: clear all TCP timers in tcp_timer_stop() when in callout When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling tcp_close(), we must also clear all other possible timers. Otherwise, upon return, the callout would be scheduled again in tcp_timer_enter(). Revert 57e27ff07aff, which was a temporary partial revert of otherwise correct 62d47d73b7eb, that exposed the problem being fixed now. Add an extra assertion in tcp_timer_enter() to check we aren't arming callout for a closed connection. Reviewed by: rscheff
# 62d47d73	10-Feb-2024	Richard Scheffenegger <rscheff@FreeBSD.org>	tcp: stop timers and clean scoreboard in tcp_close() Stop timers when in tcp_close() instead of doing that in tcp_discardcb(). A connection in CLOSED state shall not need any timers. Assert that no timer is rescheduled after that in tcp_timer_activate() and verify that this is also the expected state in tcp_discardcb(). PR: 276761 Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43792
# e21c6687	23-Jan-2024	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: pass positive errno to tcp_drop() Fixes: 446ccdd08e2a9f704f6348cd7f679e59183b99b3
# 30409ecd	06-Jan-2024	Richard Scheffenegger <rscheff@FreeBSD.org>	tcp: do not purge SACK scoreboard on first RTO Keeping the SACK scoreboard intact after the first RTO and retransmitting all data anew only on subsequent RTOs allows a more timely and efficient loss recovery under many adverse circumstances. Reviewed By: tuexen, #transport MFC after: 10 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D42906
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
# 43b117f8	06-Jun-2023	Richard Scheffenegger <rscheff@FreeBSD.org>	tcp: make the maximum number of retransmissions tunable per VNET Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2) allow restricting the maximum number of consecutive timer-based retransmissions. Add that same capability on a per-VNet basis to FreeBSD. Reviewed By: cc, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D40424
# 69c7c811	16-Mar-2023	Randall Stewart <rrs@FreeBSD.org>	Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities. The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints. Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D38831
# 76578d60	21-Feb-2023	Michael Tuexen <tuexen@FreeBSD.org>	bblog: improve timeout event handling Extend the BBLog RTO event to deal with all timers of the base stack. Also provide information about starting, stopping, and running off. The expiration of the retransmission timer is reported as it was done before. Reviewed by: rscheff@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D38710
# eaabc937	14-Dec-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: retire TCPDEBUG This subsystem is superseded by modern debugging facilities, e.g. DTrace probes and TCP black box logging. We intentionally leave SO_DEBUG in place, as many utilities may set it on a socket. Also the tcp::debug DTrace probes look at this flag on a socket. Reviewed by: gnn, tuexen Discussed with: rscheff, rrs, jtl Differential revision: https://reviews.freebsd.org/D37694
# 446ccdd0	07-Dec-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: use single locked callout per tcpcb for the TCP timers Use only one callout structure per tcpcb that is responsible for handling all five TCP timeouts. Use locked version of callout, of course. The callout function tcp_timer_enter() chooses soonest timer and executes it with lock held. Unless the timer reports that the tcpcb has been freed, the callout is rescheduled for next soonest timer, if there is any. With single callout per tcpcb on connection teardown we should be able to fully stop the callout and immediately free it, avoiding use of callout_async_drain(). There is one gotcha here: callout_stop() can actually touch our memory when a rare race condition happens. See comment above tcp_timer_stop(). Synchronous stop of the callout makes tcp_discardcb() the single entry point for tcpcb destructor, merging the tcp_freecb() to the end of the function. While here, also remove lots of lingering checks in the beginning of TCP timer functions. With a locked callout they are unnecessary. While here, clean unused parts of timer KPI for the pluggable TCP stacks. While here, remove TCPDEBUG from tcp_timer.c, as this allows for more simplification of TCP timers. The TCPDEBUG is scheduled for removal. Move the DTrace probes in timers to the beginning of a function, where a tcpcb is always existing. Discussed with: rrs, tuexen, rscheff (the TCP part of the diff) Reviewed by: hselasky, kib, mav (the callout part) Differential revision: https://reviews.freebsd.org/D37321
# 918fa422	07-Dec-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: remove tcp_timer_suspend() It was temporary code added together with RACK to fight against TCP timer races.
# e68b3792	07-Dec-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: embed inpcb into tcpcb For the TCP protocol inpcb storage specify allocation size that would provide space to most of the data a TCP connection needs, embedding into struct tcpcb several structures, that previously were allocated separately. The most important one is the inpcb itself. With embedding we can provide strong guarantee that with a valid TCP inpcb the tcpcb is always valid and vice versa. Also we reduce number of allocs/frees per connection. The embedded inpcb is placed in the beginning of the struct tcpcb, since in_pcballoc() requires that. However, later we may want to move it around for cache line efficiency, and this can be done with a little effort. The new intotcpcb() macro is ready for such move. The congestion algorithm data, the TCP timers and osd(9) data are also embedded into tcpcb, and temporary struct tcpcb_mem goes away. There was no extra allocation here, but we went through extra pointer every time we accessed this data. One interesting side effect is that now TCP data is allocated from SMR-protected zone. Potentially this allows the TCP stacks or other TCP related modules to utilize that for their own synchronization. Large part of the change was done with sed script: s/tp->ccv->/tp->t_ccv./g s/tp->ccv/\&tp->t_ccv/g s/tp->cc_algo/tp->t_cc/g s/tp->t_timers->tt_/tp->tt_/g s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g Dependency side effect is that code that needs to know struct tcpcb should also know struct inpcb, that added several <netinet/in_pcb.h>. Differential revision: https://reviews.freebsd.org/D37127
# b40ae8c9	08-Nov-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: fix build without INVARIANTS and VIMAGE Lines from upcoming changes crept in and broke certain builds. Fixes: 9eb0e8326d0fe73ae947959c1df327238d3b2d53
# 8840ae22	08-Nov-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: don't store VNET in every tcpcb, take it from the inpcbinfo Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37125
# 9eb0e832	08-Nov-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: provide macros to access inpcb and socket from a tcpcb There should be no functional changes with this commit. Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37123
# f71cb9f7	08-Nov-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: inp_socket is valid through the lifetime of a TCP inpcb The inp_socket is cleared only in in_pcbdetach(), which for TCP is always accompanied with in_pcbfree(). An inpcb that went through in_pcbfree() shall never be returned by any kind of pcb lookup. Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D37062
# 53af6903	06-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: remove INP_TIMEWAIT flag Mechanically cleanup INP_TIMEWAIT from the kernel sources. After 0d7445193ab, this commit shall not cause any functional changes. Note: this flag was very often checked together with INP_DROPPED. If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we will be able to remove most of these checks and turn them to assertions. Some of them can be turned into assertions right now, but that should be carefully done on a case by case basis. Differential revision: https://reviews.freebsd.org/D36400
# 0d744519	06-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: remove tcptw, the compressed timewait state structure The memory savings the tcptw brought back in 2003 (see 340c35de6a2) no longer justify the complexity required to maintain it. For longer explanation please check out the email [1]. Surprisingly through almost 20 years the TCP stack functionality of handling the TIME_WAIT state with a normal tcpcb did not bitrot. The existing tcp_input() properly handles a tcpcb in TCPS_TIME_WAIT state, which is confirmed by the packetdrill tcp-testsuite [2]. This change just removes tcptw and leaves INP_TIMEWAIT. The flag will be removed in a separate commit. This makes it easier to review and possibly debug the changes. [1] https://lists.freebsd.org/archives/freebsd-net/2022-January/001206.html [2] https://github.com/freebsd-net/tcp-testsuite Differential revision: https://reviews.freebsd.org/D36398
# 77198a94	03-Oct-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp_timers: provide tcp_timer_drop() and tcp_timer_close() Two functions to call tcp_drop() and tcp_close() from a callout context. Garbage collect tcp_inpinfo_lock_del(), it has a single use now. Differential revision: https://reviews.freebsd.org/D36397
# 08af8aac	27-Sep-2022	Randall Stewart <rrs@FreeBSD.org>	Tcp progress timeout Rack has had the ability to timeout connections that just sit idle automatically. This feature of course is off by default and requires the user set it on (though the socket option has been missing in tcp_usrreq.c). Let's get the progress timeout fully supported in the base stack as well as rack. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36716
# d1b07f36	26-Sep-2022	Randall Stewart <rrs@FreeBSD.org>	TCP complete end status work. The ending of a connection can tell us a lot about what happened, i.e. did it fail to set up, did it time out, was it a normal close. Oftentimes this is useful information to help analyze and debug issues. Rack has had end status for some time but the base stack has not. Let's go ahead and add in the missing bits to populate the end status. Reviewed by: tuexen, rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36712
# d9f6ac88	17-Aug-2022	Gleb Smirnoff <glebius@FreeBSD.org>	protosw: retire PRU_ flags and their char names For many years only TCP debugging used them, but relatively recently TCP DTrace probes also started to use them. Move their declarations into tcp_debug.h, but start including tcp_debug.h unconditionally, so that compilation with DTrace and without TCPDEBUG is possible.
# 6c452841	17-Aug-2022	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: use callout(9) directly instead of pr_slowtimo Modern TCP stacks use multiple callouts per tcpcb, and a global callout is an ancient artifact. However it is still used to garbage collect compressed timewait entries. Reviewed by: melifaro, tuexen Differential revision: https://reviews.freebsd.org/D36159
# 47ded797	07-Feb-2022	Franco Fichtner <franco@opnsense.org>	netinet: simplify RSS ifdef statements Approved by: transport (rrs) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D31583
# f64dc2ab	26-Dec-2021	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: TCP output method can request tcp_drop The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing framework expects tcp_output() always to return locked and valid tcpcb. Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop(). In tcp_var() provide three inline methods to call tcp_output(): - tcp_output() is a drop-in replacement for the default stack, so that default stack can continue using it internally without modifications. For advanced stacks it would perform tcp_drop() and unlock and report that with negative error code. - tcp_output_unlock() handles the negative code and always converts it to positive and always unlocks. - tcp_output_nodrop() just calls the method and leaves the responsibility to drop on the caller. Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370
# 40fa3e40	26-Dec-2021	Gleb Smirnoff <glebius@FreeBSD.org>	tcp: mechanically substitute call to tfb_tcp_output to new method. Made with sed(1) execution: sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/) sed: s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/ s/to tfb_tcp_output/to tcp_output()/ Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366
# c2c8e360	04-Dec-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	tcp: virtualise net.inet.tcp.msl sysctl. VNET teardown waits 2*MSL (60 seconds by default) before expiring tcp PCBs. These PCBs hold references to nexthops, which, in turn, reference ifnets. This chain results in VNET interfaces being destroyed and moved to default VNET only after 60 seconds. Allow tcp_msl to be set in jail by virtualising net.inet.tcp.msl sysctl, permitting more predictable VNET tests outcomes. MFC after: 1 week Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D33270
# ff945008	18-Nov-2021	Gleb Smirnoff <glebius@FreeBSD.org>	Add tcp_freecb() - single place to free tcpcb. Until this change there were two places where we would free tcpcb - tcp_discardcb() in case all timers are drained and tcp_timer_discard() otherwise. They were pretty much copy-n-paste, except that in the default case we would run tcp_hc_update(). Merge this into single function tcp_freecb() and move new short version of tcp_timer_discard() to tcp_timer.c and make it static. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32965
# 9a06a824	09-Nov-2021	Gleb Smirnoff <glebius@FreeBSD.org>	tcp_timers: check for (INP_TIMEWAIT | INP_DROPPED) only once All timers keep inpcb locked through their execution. We need to check these flags only once. Checking for INP_TIMEWAIT earlier is also safer, since such inpcbs point into tcptw rather than tcpcb, and any dereferences of inp_ppcb as tcpcb are erroneous. Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32967
# b89af8e1	14-Apr-2020	Michael Tuexen <tuexen@FreeBSD.org>	Improve the TCP blackhole detection. The principle is to reduce the MSS in two steps and try each candidate two times. However, if two candidates are the same (which is the case in TCP/IPv6), this candidate was tested four times. This patch ensures that each candidate actually reduced the MSS and is only tested 2 times. This reduces the time window of misclassifying a temporary outage as an MTU issue. Reviewed by: jtl MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24308
# 413c3db1	31-Mar-2020	Michael Tuexen <tuexen@FreeBSD.org>	Allow the TCP blackhole detection to be disabled entirely, enabled only for IPv4, enabled only for IPv6, or enabled for both IPv4 and IPv6. The current blackhole detection might classify a temporary outage as an MTU issue and permanently reduce the MSS. Since the consequences of such a reduction due to a misclassification are much more drastic for IPv4 than for IPv6, allow the administrator to enable it for IPv6 only. Reviewed by: bcr@ (man page), Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D24219
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# 481be5de	12-Feb-2020	Randall Stewart <rrs@FreeBSD.org>	White space cleanup -- remove trailing tabs or spaces from any line. Sponsored by: Netflix Inc.
# 109eb549	21-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Make tcp_output() require network epoch. Enter the epoch before calling into tcp_output() from those functions, that didn't do that before. This eliminates a bunch of epoch recursions in TCP.
# b9555453	21-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Make ip6_output() and ip_output() require network epoch. All callers that previously may have called into these functions without network epoch now must enter it.
# 334fc582	08-Jan-2020	Bjoern A. Zeeb <bz@FreeBSD.org>	vnet: virtualise more network stack sysctls. Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are set in the netoptions startup script, which we would love to run for VNETs as well [1]. While virtualising the log_in_vain sysctls seems pointless at first for as long as the kernel message buffer is not virtualised, it at least allows an administrator to debug the base system or an individual jail if needed without turning the logging on for all jails running on a system. PR: 243193 [1] MFC after: 2 weeks
# 5773ac11	10-Dec-2019	John Baldwin <jhb@FreeBSD.org>	Use callout_func_t instead of the deprecated timeout_t. Reviewed by: kib, imp Differential Revision: https://reviews.freebsd.org/D22752
# 58d94bd0	06-Nov-2019	Gleb Smirnoff <glebius@FreeBSD.org>	TCP timers are executed in callout context, so they need to enter network epoch to look into PCB lists. Mechanically convert INP_INFO_RLOCK() to NET_EPOCH_ENTER(). No functional change here.
# 0999766d	23-Mar-2019	Michael Tuexen <tuexen@FreeBSD.org>	Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial used by TCP. Reviewed by: rrs@, 0mp@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D19355
# c6dcb64b	20-Feb-2019	Michael Tuexen <tuexen@FreeBSD.org>	Use exponential backoff for retransmitting SYN segments as specified in the TCP RFCs. Reviewed by: rrs@, Richard Scheffenegger Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18974
# 6573d758	03-Jul-2018	Matt Macy <mmacy@FreeBSD.org>	epoch(9): allow preemptible epochs to compose - Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate. Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066
# 89e560f4	07-Jun-2018	Randall Stewart <rrs@FreeBSD.org>	This commit brings in a new refactored TCP stack called Rack. Rack includes the following features: - A different SACK processing scheme (the old sack structures are not used). - RACK (Recent acknowledgment) where counting dup-acks is no longer done; instead time is used to know when to retransmit. (see the I-D) - TLP (Tail Loss Probe) where we will probe for tail-losses to attempt to try not to take a retransmit time-out. (see the I-D) - Burst mitigation using TCPHPTS - PRR (Proportional Rate Reduction) see the RFC. Once built into your kernel, you can select this stack either by socket option, giving "rack" as the name of the stack, or by setting the global sysctl so the default is rack. Note that any connection that does not support SACK will be kicked back to the "default" base FreeBSD stack (currently known as "default"). To build this into your kernel you will need to enable in your kernel: makeoptions WITH_EXTRA_TCP_STACKS=1 options TCPHPTS Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D15525
# 10d20c84	07-May-2018	Matt Macy <mmacy@FreeBSD.org>	Fix spurious retransmit recovery on low latency networks TCP's smoothed RTT (SRTT) can be much larger than an actual observed RTT. This can be either because of hz restricting the calculable RTT to 10ms in VMs or 1ms using the default 1000hz or simply because SRTT recently incorporated a larger value. If an ACK arrives before the calculated badrxtwin (now + SRTT): tp->t_badrxtwin = ticks + (tp->t_srtt >> (TCP_RTT_SHIFT + 1)); We'll erroneously reset snd_una to snd_max. If multiple segments were dropped and this happens repeatedly the transmit rate will be limited to 1MSS per RTO until we've retransmitted all drops. Reported by: rstone Reviewed by: hiren, transport Approved by: sbruno MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8556
# 4c6a1090	02-May-2018	Michael Tuexen <tuexen@FreeBSD.org>	Simplify the call to tcp_drop(), since the handling of soft error is also done in tcp_drop(). No functional change. Sponsored by: Netflix, Inc.
# 2529f56e	22-Mar-2018	Jonathan T. Looney <jtl@FreeBSD.org>	Add the "TCP Blackbox Recorder" which we discussed at the developer summits at BSDCan and BSDCam in 2017. The TCP Blackbox Recorder allows you to capture events on a TCP connection in a ring buffer. It stores metadata with the event. It optionally stores the TCP header associated with an event (if the event is associated with a packet) and also optionally stores information on the sockets. It supports setting a log ID on a TCP connection and using this to correlate multiple connections that share a common log ID. You can log connections in different modes. If you are doing a coordinated test with a particular connection, you may tell the system to put it in mode 4 (continuous dump). Or, if you just want to monitor for errors, you can put it in mode 1 (ring buffer) and dump all the ring buffers associated with the connection ID when we receive an error signal for that connection ID. You can set a default mode that will be applied to a particular ratio of incoming connections. You can also manually set a mode using a socket option. This commit includes only basic probes. rrs@ has added quite an abundance of probes in his TCP development work. He plans to commit those soon. There are user-space programs which we plan to commit as ports. These read the data from the log device and output pcapng files, and then let you analyze the data (and metadata) in the pcapng files. Reviewed by: gnn (previous version) Obtained from: Netflix, Inc. Relnotes: yes Differential Revision: https://reviews.freebsd.org/D11085
# f1798531	30-Jan-2018	John Baldwin <jhb@FreeBSD.org>	Export tcp_always_keepalive for use by the Chelsio TOM module. This used to work by accident with ld.bfd even though always_keepalive was marked as static. LLD honors static more correctly, so export this variable properly (including moving it into the tcp_* namespace). Reviewed by: bz, emaste MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D14129
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# e29c55e4	06-Oct-2017	Gleb Smirnoff <glebius@FreeBSD.org>	Declare pmtud_blackhole global variables in tcp_timer.h, so that alternative TCP stacks can legally use them.
# 3d5af7a1	28-Aug-2017	Michael Tuexen <tuexen@FreeBSD.org>	Fix blackhole detection. There were two bugs related to the blackhole detection: * The smallest size was tried more than two times. * The restored MSS was not the original one, but the second candidate. MFC after: 1 week Sponsored by: Netflix, Inc.
# 32a04bb8	25-Aug-2017	Sean Bruno <sbruno@FreeBSD.org>	Use counter(9) for PLPMTUD counters. Remove unused PLPMTUD sysctl counters. Bump UPDATING and FreeBSD Version to indicate a rebuild is required. Submitted by: kevin.bowling@kev009.com Reviewed by: jtl Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12003
# cc65eb4e	21-Mar-2017	Gleb Smirnoff <glebius@FreeBSD.org>	Hide struct inpcb, struct tcpcb from the userland. This is a painful change, but it is needed. On the one hand, we avoid modifying them, and this slows down some ideas, on the other hand we still eventually modify them and tools like netstat(1) never work on next version of FreeBSD. We maintain a ton of spares in them, and we already got some ifdef hell at the end of tcpcb. Details: - Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO. - Make struct xinpcb, struct xtcpcb pure API structures, not including kernel structures inpcb and tcpcb inside. Export into these structures the fields from inpcb and tcpcb that are known to be used, and put there a ton of spare space. - Make kernel and userland utilities compilable after these changes. - Bump __FreeBSD_version. Reviewed by: rrs, gnn Differential Revision: D10018
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# 5ede40dc	11-Feb-2017	Ryan Stone <rstone@FreeBSD.org>	Don't zero out srtt after excess retransmits If the TCP stack has retransmitted more than 1/4 of the total number of retransmits before a connection drop, it decides that its current RTT estimate is hopelessly out of date and decides to recalculate it from scratch starting with the next ACK. Unfortunately, it implements this by zeroing out the current RTT estimate. Drop this hack entirely, as it makes it significantly more difficult to debug connection issues. Instead check for excessive retransmits at the point where srtt is updated from an ACK being received. If we've exceeded 1/4 of the maximum retransmits, discard the previous srtt estimate and replace it with the latest rtt measurement. Differential Revision: https://reviews.freebsd.org/D9519 Reviewed by: gnn Sponsored by: Dell EMC Isilon
# 6d172f58	14-Oct-2016	Jonathan T. Looney <jtl@FreeBSD.org>	The code currently resets the keepalive timer each time a packet is received on a TCP session that has entered the ESTABLISHED state. This results in a lot of calls to reset the keepalive timer. This patch changes the behavior so we set the keepalive timer for the keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will first check to see if the session has been idle for TP_KEEPIDLE ticks. If not, it will reschedule the keepalive timer for the time the session will have been idle for TP_KEEPIDLE ticks. For a session with regular communication, the keepalive timer should fire approximately once every TP_KEEPIDLE ticks. For sessions with irregular communication, the keepalive timer might fire more often. But, the disruption from a periodic keepalive timer should be less than the regular cost of resetting the keepalive timer on every packet. (FWIW, this change saved approximately 1.73% of the busy CPU cycles on a particular test system with a heavy TCP output load. Of course, the actual impact is very specific to the particular hardware and workload.) Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8243
# eadd00f8	16-Aug-2016	Randall Stewart <rrs@FreeBSD.org>	A few more wording tweaks as suggested (with some modifications as well) by Ravi Pokala. Thanks for the comments :-) Sponsored by: Netflix Inc.
# 0fa047b9	16-Aug-2016	Randall Stewart <rrs@FreeBSD.org>	Comments describing how to properly use the new lock_add functions and its respective companion. Sponsored by: Netflix Inc.
# b07fef50	15-Aug-2016	Randall Stewart <rrs@FreeBSD.org>	This cleans up the timer code in TCP and also makes it so we do not take the INFO lock unless we are really going to delete the TCB. Differential Revision: D7136
# 5105a92c	17-May-2016	Randall Stewart <rrs@FreeBSD.org>	This small change adopts the excellent suggestion for using named structures in the add of a new tcp-stack that came in late to me via email after the last commit. It also makes it so that a new stack may optionally get a callback during a retransmit timeout. This allows the new stack to clear specific state (think sack scoreboards or other such structures). Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D6303
# a4641f4e	03-May-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/net*: minor spelling fixes. No functional change.
# e5ad6456	28-Apr-2016	Randall Stewart <rrs@FreeBSD.org>	This cleans up the timers code in TCP to start using the new async_drain functionality. This has been tested in NF as well as by Verisign. Still to do in here is to remove all the old flags. They are currently left being maintained but probably are no longer needed. Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D5924
# 84cc0778	24-Mar-2016	George V. Neville-Neil <gnn@FreeBSD.org>	FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306
# 4644fda3	27-Jan-2016	Gleb Smirnoff <glebius@FreeBSD.org>	Rename netinet/tcp_cc.h to netinet/cc/cc.h. Discussed with: lstewart
# 0645c604	26-Jan-2016	Hiren Panchasara <hiren@FreeBSD.org>	Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and 60 seconds, respectively. Turn them into sysctls that can be tuned live. The default values of 5 seconds and 60 seconds have been retained. Submitted by: Jason Wolfe (j at nitrology dot com) Reviewed by: gnn, rrs, hiren, bz MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D5024
# 2de3e790	21-Jan-2016	Gleb Smirnoff <glebius@FreeBSD.org>	- Rename cc.h to more meaningful tcp_cc.h. - Declare it a kernel only include, which it already is. - Don't include tcp.h implicitly from tcp_cc.h
# 0c39d38d	06-Jan-2016	Gleb Smirnoff <glebius@FreeBSD.org>	Historically we have two fields in tcpcb to describe sender MSS: t_maxopd, and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places where precision was not important, like cwnd or ssthresh calculations. With this change: - t_maxopd goes away. - t_maxseg now stores MSS not adjusted by options. - new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The function gives a better estimate, since it takes into account SACK state as well. Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D3593
# 281a0fd4	24-Dec-2015	Patrick Kelsey <pkelsey@FreeBSD.org>	Implementation of server-side TCP Fast Open (TFO) [RFC7413]. TFO is disabled by default in the kernel build. See the top comment in sys/netinet/tcp_fastopen.c for implementation particulars. Reviewed by: gnn, jch, stas MFC after: 3 days Sponsored by: Verisign, Inc. Differential Revision: https://reviews.freebsd.org/D4350
# 55bceb1e	15-Dec-2015	Randall Stewart <rrs@FreeBSD.org>	First cut of the modularization of our TCP stack. Still to do is to clean up the timer handling using the async-drain. Other optimizations may be coming to go with this. What's here will allow different tcp implementations (one included). Reviewed by: jtl, hiren, transports Sponsored by: Netflix Inc. Differential Revision: D4055
# 7c4676dd	13-Nov-2015	Randall Stewart <rrs@FreeBSD.org>	This fixes several places where callout_stops return is examined. The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions. MFC after: 1 Month with associated callout change.
# adf43a92	14-Oct-2015	Hiren Panchasara <hiren@FreeBSD.org>	Fix an unnecessarily aggressive behavior where mtu clamping begins on first retransmission timeout (rto) when blackhole detection is enabled. Make sure it only happens when the second attempt to send the same segment also fails with rto. Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows the same pattern and gets 2 chances (rto) before further clamping down. Note: RFC4821 doesn't specify implementation details on how this situation should be handled. Differential Revision: https://reviews.freebsd.org/D3434 Reviewed by: sbruno, gnn (previous version) MFC after: 2 weeks Sponsored by: Limelight Networks
# 5d06879a	13-Sep-2015	George V. Neville-Neil <gnn@FreeBSD.org>	Add DTrace probe points, translators and a corresponding script to provide the TCPDEBUG functionality with pure DTrace. Reviewed by: rwatson MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: D3530
# d6de19ac	30-Aug-2015	Julien Charbon <jch@FreeBSD.org>	Put r284245 back in place: If at first this fix was seen as a temporary workaround for a callout(9) issue, it turns out it is instead the right way to use callout in mpsafe mode without using callout_drain(). r284245 commit message: Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763
# bcf9b913	24-Aug-2015	Julien Charbon <jch@FreeBSD.org>	Revert r284245: "Fix a callout race condition introduced in TCP timers callouts with r281599." r281599 fixed a TCP timer race condition, but due to a callout(9) bug it also introduced another race condition, which was worked around with r284245. The callout(9) bug being fixed with r286880, we can now revert the workaround (r284245). Differential Revision: https://reviews.freebsd.org/D2079 (Initial change) Differential Revision: https://reviews.freebsd.org/D2763 (Workaround) Differential Revision: https://reviews.freebsd.org/D3078 (Fix) Sponsored by: Verisign, Inc. MFC after: 2 weeks
# 31a7749d	18-Aug-2015	Julien Charbon <jch@FreeBSD.org>	Make clear that TIME_WAIT timeout expiration is managed solely by tcp_tw_2msl_scan(). Sponsored by: Verisign, Inc.
# ff9b006d	02-Aug-2015	Julien Charbon <jch@FreeBSD.org>	Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability: - The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()). - A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters. It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections. PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.
# cad814ee	10-Jun-2015	Julien Charbon <jch@FreeBSD.org>	Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763 Reviewed by: hiren Approved by: hiren MFC after: 1 day Sponsored by: Verisign, Inc.
# 5571f9cf	16-Apr-2015	Julien Charbon <jch@FreeBSD.org>	Fix an old and well-documented use-after-free race condition in TCP timers: - Add a reference from tcpcb to its inpcb - Defer tcpcb deletion until TCP timers have finished Differential Revision: https://reviews.freebsd.org/D2079 Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com> Reviewed by: imp, rrs, adrian, jhb, bz Approved by: jhb Sponsored by: Verisign, Inc.
# 03374917	02-Apr-2015	Julien Charbon <jch@FreeBSD.org>	Provide better debugging information in tcp_timer_activate() and tcp_timer_active() Differential Revision: https://reviews.freebsd.org/D2179 Suggested by: bz Reviewed by: jhb Approved by: jhb
# 18832f1f	31-Mar-2015	Julien Charbon <jch@FreeBSD.org>	Use appropriate timeout_t* instead of void* in tcp_timer_activate() Suggested by: imp Differential Revision: https://reviews.freebsd.org/D2154 Reviewed by: imp, jhb Approved by: jhb
# b2bdc62a	18-Jan-2015	Adrian Chadd <adrian@FreeBSD.org>	Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future. * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc; * toeplitz.[ch] is now in net/ rather than netinet/; * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.) * netinet/in_rss.[ch] now just contains the IPv4 specific methods; * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods. This should have no functional impact on anyone currently using the RSS support. Differential Revision: D1383 Reviewed by: gnn, jfv (intel driver bits)
# cea40c48	30-Oct-2014	Julien Charbon <jch@FreeBSD.org>	Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and tcp_tw_2msl_scan(). This race condition drives unplanned timewait timeout cancellation. Also simplify implementation by holding inpcb reference and removing tcptw reference counting. Differential Revision: https://reviews.freebsd.org/D826 Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com> Submitted by: jch Reviewed By: jhb (mentor), adrian, rwatson Sponsored by: Verisign, Inc. MFC after: 2 weeks X-MFC-With: r264321
# f0188618	21-Oct-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies
# 882ac53e	13-Oct-2014	Sean Bruno <sbruno@FreeBSD.org>	Handle small file case with regards to plpmtud blackhole detection. Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes
# f6f6703f	07-Oct-2014	Sean Bruno <sbruno@FreeBSD.org>	Implement PLPMTUD blackhole detection (RFC 4821), inspired by code from xnu sources. If we encounter a network where ICMP is blocked the Needs Frag indicator may not propagate back to us. Attempt to downshift the mss once to a preconfigured value. Default this feature to off for now while we do not have a full PLPMTUD implementation in our stack. Adds the following new sysctl's for control: net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4 net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6 Adds the following new sysctl's for monitoring: -- Number of times the code was activated to attempt a mss downshift net.inet.tcp.pmtud_blackhole_activated -- Number of times the blackhole mss was used in an attempt to downshift net.inet.tcp.pmtud_blackhole_min_activated -- Number of times that we failed to connect after we downshifted the mss net.inet.tcp.pmtud_blackhole_failed Phabricator: https://reviews.freebsd.org/D506 Reviewed by: rpaulo bz MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks
# 8f7e75cb	29-Jun-2014	Adrian Chadd <adrian@FreeBSD.org>	If we're doing RSS then ensure the TCP timer selection uses the multi-CPU callwheel setup, rather than just dumping all the timers on swi0.
# 883831c6	18-May-2014	Adrian Chadd <adrian@FreeBSD.org>	When RSS is enabled and per cpu TCP timers are enabled, do an RSS lookup for the inp flowid/flowtype to destination CPU. This only modifies the case where RSS is enabled and the per-cpu tcp timer option is enabled. Otherwise the behaviour should be the same as before.
# 66eefb1e	10-Apr-2014	John Baldwin <jhb@FreeBSD.org>	Currently, the TCP slow timer can starve TCP input processing while it walks the list of connections in TIME_WAIT closing expired connections due to contention on the global TCP pcbinfo lock. To remediate, introduce a new global lock to protect the list of connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when closing an expired connection. This limits the window of time when TCP input processing is stopped to the amount of time needed to close a single connection. Submitted by: Julien Charbon <jcharbon@verisign.com> Reviewed by: rwatson, rrs, adrian MFC after: 2 months
# 5b999a6b	04-Mar-2013	Davide Italiano <davide@FreeBSD.org>	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil
# 6c0ef895	09-Jan-2013	John Baldwin <jhb@FreeBSD.org>	Don't drop options from the third retransmitted SYN by default. If the SYNs (or SYN/ACK replies) are dropped due to network congestion, then the remote end of the connection may act as if options such as window scaling are enabled but the local end will think they are not. This can result in very slow data transfers in the case of window scaling disagreements. The old behavior can be obtained by setting the net.inet.tcp.rexmit_drop_options sysctl to a non-zero value. Reviewed by: net@ MFC after: 2 weeks
# 825fd1e4	26-Nov-2012	Navdeep Parhar <np@FreeBSD.org>	Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not).
# 322181c9	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	If the user has closed the socket then drop a persisting connection after a much reduced timeout. Typically web servers close their sockets quickly under the assumption that the TCP connection goes away as well. That is not entirely true however. If the peer closed the window we're going to wait for a long time with lots of data in the send buffer. MFC after: 2 weeks
# 77339e1c	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	Update comment to reflect the change made in r242263. MFC after: 2 weeks
# c4ab59c1	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	Add SACK_PERMIT to the list of TCP options that are switched off after retransmitting a SYN three times. MFC after: 2 weeks
# f4748ef5	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE, the default retransmit timeout, as base to calculate the backoff time until next try instead of the TCP_REXMTVAL() macro which only works correctly when we already have measured an actual RTT+RTTVAR. Before it would cause the first retransmit at RTOBASE, the next four at the same time (!) about 200ms later, and then another one again RTOBASE later. MFC after: 2 weeks
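Illustration for f4748ef5: a minimal sketch of choosing the backoff base by connection state. The function `rexmt_timeout()` and its parameters are hypothetical, and the plain doubling capped at a shift of 6 is an assumption made for brevity; the kernel uses a backoff table and range-clamping macros instead.

```c
#include <assert.h>

/*
 * Compute the next retransmit timeout in ticks.
 * in_syn_sent: nonzero while no RTT has been measured (SYN_SENT state).
 * rto_base:    default RTO, used as the base in SYN_SENT.
 * measured:    RTO derived from srtt and rttvar, used otherwise.
 * shift:       number of retransmits so far.
 */
static int
rexmt_timeout(int in_syn_sent, int rto_base, int measured, int shift,
    int rto_max)
{
	int base = in_syn_sent ? rto_base : measured;
	long t;

	assert(shift >= 0 && base >= 0);
	t = (long)base << (shift > 6 ? 6 : shift);	/* assumed doubling */
	return (t > rto_max ? rto_max : (int)t);
}
```

Using the default RTO as the base in SYN_SENT keeps successive SYN retransmits spaced out even though srtt and rttvar are still zero.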
# 602e8e45	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	Remove bogus 'else' in #ifdef that prevented the rttvar from being reset in tcp_timer_rexmt() on retransmit for IPv6 sessions. MFC after: 2 weeks
# cf8f04f4	28-Oct-2012	Andre Oppermann <andre@FreeBSD.org>	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks
# 655f934b	05-Aug-2012	Mikolaj Golub <trociny@FreeBSD.org>	In tcp timers, check INP_DROPPED flag a little later, after callout_deactivate(), so if INP_DROPPED is set we return with the timer active flag cleared. For me this fixes negative keep timer values reported by `netstat -x' for connections in CLOSE state. Approved by: net (silence) MFC after: 2 weeks
# 09fe6320	19-Jun-2012	Navdeep Parhar <np@FreeBSD.org>	- Updated TOE support in the kernel. - Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features. - iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon. Build-tested with make universe. 30s overview: What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface: # ifconfig -m | grep TOE Enable/disable TCP offload on an interface (just like any other ifnet capability): # ifconfig cxgbe0 toe # ifconfig cxgbe0 -toe Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat: # netstat -np tcp | grep toe # sockstat -46c | grep toe Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
# 9077f387	05-Feb-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Add new socket options: TCP_KEEPINIT, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT, that allow to control initial timeout, idle time, idle re-send interval and idle send count on a per-socket basis. Reviewed by: andre, bz, lstewart
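Illustration for 9077f387: a short userland sketch of setting the per-socket keepalive options named above. The helper `set_keepalive()` is hypothetical. It assumes the values are given in seconds and guards TCP_KEEPINIT with `#ifdef`, since that option is not defined on every platform.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: enable keepalive on a TCP socket and set its
 * per-socket timers. Returns 0 on success, -1 on the first failure.
 */
static int
set_keepalive(int fd, int init_s, int idle_s, int intvl_s, int cnt)
{
	int on = 1;

	assert(idle_s > 0 && intvl_s > 0 && cnt > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
#ifdef TCP_KEEPINIT
	/* Timeout for connection establishment. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINIT, &init_s,
	    sizeof(init_s)) == -1)
		return (-1);
#else
	(void)init_s;	/* option not available on this platform */
#endif
	/* Idle time before the first keepalive probe. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s,
	    sizeof(idle_s)) == -1)
		return (-1);
	/* Interval between unanswered probes. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s,
	    sizeof(intvl_s)) == -1)
		return (-1);
	/* Number of unanswered probes before the connection is dropped. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return (-1);
	return (0);
}
```

A caller would typically invoke this right after socket() or accept(), for example `set_keepalive(fd, 75, 60, 10, 5)`.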
# aa4b09c5	12-Oct-2011	Navdeep Parhar <np@FreeBSD.org>	Make sure the inp wasn't dropped when rexmt let go of the inp and pcbinfo locks. Reviewed by: andre@ MFC after: 7 days
# fa046d87	30-May-2011	Robert Watson <rwatson@FreeBSD.org>	Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. 
pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. 
It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.
# 672dc4ae	29-Apr-2011	John Baldwin <jhb@FreeBSD.org>	TCP reuses t_rxtshift to determine the backoff timer used for both the persist state and the retransmit timer. However, the code that implements "bad retransmit recovery" only checks t_rxtshift to see if an ACK has been received during the first retransmit timeout window. As a result, if ticks has wrapped over to a negative value and a socket is in the persist state, it can incorrectly treat an ACK from the remote peer as a "bad retransmit recovery" and restore saved values such as snd_ssthresh and snd_cwnd. However, if the socket has never had a retransmit timeout, then these saved values will be zero, so snd_ssthresh and snd_cwnd will be set to 0. If the socket is in fast recovery (this can be caused by excessive duplicate ACKs such as those fixed by 220794), then each ACK that arrives triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd to be no larger than snd_ssthresh. In effect, the socket's send window is permanently stuck at 0 even though the remote peer is advertising a much larger window and pending data is only sent via TCP window probes (so one byte every few seconds). Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that the various snd_*_prev fields in the pcb are valid and only perform "bad retransmit recovery" if this flag is set in the pcb. The flag is set on the first retransmit timeout that occurs and is cleared on subsequent retransmit timeouts or when entering the persist state. Reviewed by: bz MFC after: 2 weeks
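Illustration for 672dc4ae: a minimal sketch of the flag protocol described above, with window sizes counted in segments. The struct `cc_state`, the three functions, and the flag value are hypothetical, and the timing-window check that the kernel also performs is omitted. The sketch only shows why the saved values cannot be restored unless a first retransmit timeout actually stored them.

```c
#include <assert.h>

#define TF_PREVVALID	0x01	/* illustrative value, not the kernel's */

struct cc_state {
	unsigned flags;
	int cwnd, ssthresh;		/* current values, in segments */
	int cwnd_prev, ssthresh_prev;	/* saved by the first timeout */
	int rxtshift;
};

static void
on_rexmt_timeout(struct cc_state *s)
{
	if (++s->rxtshift == 1) {
		/* First timeout: save state and mark it valid. */
		s->cwnd_prev = s->cwnd;
		s->ssthresh_prev = s->ssthresh;
		s->flags |= TF_PREVVALID;
	} else {
		s->flags &= ~TF_PREVVALID;
	}
	s->ssthresh = s->cwnd / 2 < 2 ? 2 : s->cwnd / 2;
	s->cwnd = 1;
}

static void
enter_persist(struct cc_state *s)
{
	s->flags &= ~TF_PREVVALID;
}

/* Returns 1 if the saved values were restored, 0 otherwise. */
static int
try_bad_rexmt_recovery(struct cc_state *s)
{
	if ((s->flags & TF_PREVVALID) == 0)
		return (0);
	s->cwnd = s->cwnd_prev;
	s->ssthresh = s->ssthresh_prev;
	s->flags &= ~TF_PREVVALID;
	return (1);
}
```

A socket that only ever entered the persist state never has the flag set, so the zero-initialized `*_prev` fields are never copied into the live window.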
# 79e955ed	07-Jan-2011	John Baldwin <jhb@FreeBSD.org>	Trim extra spaces before tabs.
# b5224580	21-Dec-2010	John Baldwin <jhb@FreeBSD.org>	Fix a typo in a comment. MFC after: 1 week
# b5af1b88	01-Dec-2010	Lawrence Stewart <lstewart@FreeBSD.org>	Pass NULL instead of 0 for the th pointer value. NULL != 0 on all platforms. Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166
# dbc42409	11-Nov-2010	Lawrence Stewart <lstewart@FreeBSD.org>	This commit marks the first formal contribution of the "Five New TCP Congestion Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details about the project are available at: http://caia.swin.edu.au/freebsd/5cc/ - Add a KPI and supporting infrastructure to allow modular congestion control algorithms to be used in the net stack. Algorithms can maintain per-connection state if required, and connections maintain their own algorithm pointer, which allows different connections to concurrently use different algorithms. The TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to programmatically query or change the congestion control algorithm respectively from within an application at runtime. - Integrate the framework with the TCP stack in as least intrusive a manner as possible. Care was also taken to develop the framework in a way that should allow integration with other congestion aware transport protocols (e.g. SCTP) in the future. The hope is that we will one day be able to share a single set of congestion control algorithm modules between all congestion aware transport protocols. - Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack and use it to decouple the meaning of recovery from a congestion event and recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based congestion control protocols don't generally need to recover from packet loss and need a different way to note a congestion recovery episode within the stack. - Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code and ensures the stack always uses the appropriate mechanisms for recovering from packet loss during a congestion recovery episode. - Extract the NewReno congestion control algorithm from the TCP stack and massage it into module form. 
NewReno is always built into the kernel and will remain the default algorithm for the foreseeable future. Implementations of additional different algorithms will become available in the near future. - Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code that relies on the size of "struct tcpcb" is required. Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: Cisco URP, FreeBSD Foundation Reviewed by: rpaulo Tested by: David Hayes (and many others over the years) MFC after: 3 months
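Illustration for dbc42409: a short userland sketch of querying the congestion control algorithm of a socket through the TCP_CONGESTION option mentioned above. The helper `get_cc_algo()` is hypothetical; the 16-byte buffer in the usage note is an assumed upper bound on the algorithm name length.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: copy the name of the congestion control
 * algorithm used by fd into buf as a NUL-terminated string.
 * Returns 0 on success, -1 on failure (errno set by getsockopt()).
 */
static int
get_cc_algo(int fd, char *buf, size_t buflen)
{
	socklen_t len;

	assert(buflen > 1);
	memset(buf, 0, buflen);			/* guarantees termination */
	len = (socklen_t)(buflen - 1);		/* keep the final NUL */
	return (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len));
}
```

Called as `char name[16]; get_cc_algo(fd, name, sizeof(name));`, this leaves the algorithm name in `name`. Changing the algorithm works the same way with setsockopt() and the name of a loaded module.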
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# e5dbe8ec	03-Jun-2010	Robert Watson <rwatson@FreeBSD.org>	Merge r204830 from head to stable/8 Locking the tcbinfo structure should not be necessary in tcp_timer_delack(), so don't. Reviewed by: bz Sponsored by: Juniper Networks Approved by: re (kib)
# 87aedea4	20-Mar-2010	Kip Macy <kmacy@FreeBSD.org>	- spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1 - don't default to acquiring tcbinfo lock exclusively in rexmt MFC after: 7 days
# 1f821c53	07-Mar-2010	Robert Watson <rwatson@FreeBSD.org>	Locking the tcbinfo structure should not be necessary in tcp_timer_delack(), so don't. MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks
# b8614722	15-Sep-2009	Mike Silbersack <silby@FreeBSD.org>	Add the ability to see TCP timers via netstat -x. This can be a useful feature when you have a seemingly stuck socket and want to figure out why it has not been closed yet. No plans to MFC this, as it changes the netstat sysctl ABI. Reviewed by: andre, rwatson, Eric Van Gyzen
# 530c0060	01-Aug-2009	Robert Watson <rwatson@FreeBSD.org>	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
# 5ee847d3	19-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep locks with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list. Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock. Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader. Update various consumers of these KPIs based on whether they may sleep or not. Reviewed by: bz Approved by: re (kib)
# eddfbb76	14-Jul-2009	Robert Watson <rwatson@FreeBSD.org>	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
# 6b0c5521	16-Jun-2009	John Baldwin <jhb@FreeBSD.org>	Trim extra sets of ()'s. Requested by: bde
# 78b50714	11-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures. MFC after: 3 days
# ad71fe3c	15-Mar-2009	Robert Watson <rwatson@FreeBSD.org>	Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers): INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account. MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz
# 24cb0f22	14-Jan-2009	Lawrence Stewart <lstewart@FreeBSD.org>	Add TCP Appropriate Byte Counting (RFC 3465) support to kernel. The new behaviour is on by default, and can be disabled by setting the net.inet.tcp.rfc3465 sysctl to 0 to obtain previous behaviour. The patch changes struct tcpcb in sys/netinet/tcp_var.h which breaks the ABI. Bump __FreeBSD_version to 800061 accordingly. User space tools that rely on the size of struct tcpcb (e.g. sockstat) need to be recompiled. Reviewed by: rpaulo, gnn Approved by: gnn, kmacy (mentors) Sponsored by: FreeBSD Foundation
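The byte-counting idea behind this entry can be sketched in a few lines of C. This is a minimal illustration of the RFC 3465 rules as commonly summarized, not the code this commit added: the struct, the function name `abc_on_ack`, and the `limit_l` parameter are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state; not the layout of struct tcpcb. */
struct abc_state {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t ssthresh;	/* slow start threshold, bytes */
	uint32_t bytes_acked;	/* bytes ACKed since the last increase */
	uint32_t smss;		/* sender maximum segment size, bytes */
};

/*
 * Grow cwnd by the number of bytes acknowledged rather than by the
 * number of ACKs received. limit_l is the slow-start cap expressed in
 * segments (assumed here to be supplied by the caller).
 */
static void
abc_on_ack(struct abc_state *s, uint32_t acked, uint32_t limit_l)
{
	assert(s->smss > 0);
	if (s->cwnd < s->ssthresh) {
		/* Slow start: increase by min(acked, L * SMSS). */
		uint32_t cap = limit_l * s->smss;
		s->cwnd += (acked < cap) ? acked : cap;
	} else {
		/* Congestion avoidance: one SMSS per cwnd of ACKed bytes. */
		s->bytes_acked += acked;
		if (s->bytes_acked >= s->cwnd) {
			s->bytes_acked -= s->cwnd;
			s->cwnd += s->smss;
		}
	}
}
```

Counting bytes instead of ACKs means a receiver that acknowledges every other segment no longer halves the sender's window growth.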
# 4b79449e	02-Dec-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 8b615593	02-Oct-2008	Marko Zec <zec@FreeBSD.org>	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
# 603724d3	17-Aug-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
# 41698ebf	20-Jul-2008	Tom Rhodes <trhodes@FreeBSD.org>	Document a few sysctls. Reviewed by: rwatson
# 53640b0e	02-Jun-2008	Robert Watson <rwatson@FreeBSD.org>	When allocating temporary storage to hold a TCP/IP packet header template, use an M_TEMP malloc(9) allocation rather than an mbuf with mtod(9) and dtom(9). This eliminates the last use of dtom(9) in TCP. MFC after: 3 weeks
# 8501a69c	17-Apr-2008	Robert Watson <rwatson@FreeBSD.org>	Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock remain exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committed patch)
# 4b421e2d	07-Oct-2007	Mike Silbersack <silby@FreeBSD.org>	Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)
# 586b4a0e	24-Sep-2007	Konstantin Belousov <kib@FreeBSD.org>	Revert rev. 1.94. After recent tcp backouts, tcp_close() may return NULL. Check the return value of tcp_close() being NULL before dereferencing it in #ifdef TCPDEBUG block. Reviewed by: rwatson Approved by: re (gnn)
# e2f2059f	23-Sep-2007	Mike Silbersack <silby@FreeBSD.org>	Two changes: - Reintegrate the ANSI C function declaration change from tcp_timer.c rev 1.92 - Reorganize the tcpcb structure so that it has a single pointer to the "tcp_timer" structure which contains all of the tcp timer callouts. This change means that when the single tcp timer change is reintegrated, tcpcb will not change in size, and therefore the ABI between netstat and the kernel will not change. Neither of these changes should have any functional impact. Reviewed by: bmah, rrs Approved by: re (bmah)
# 85d94372	07-Sep-2007	Robert Watson <rwatson@FreeBSD.org>	Back out tcp_timer.c:1.93 and associated changes that reimplemented the many TCP timers as a single timer, but retain the API changes necessary to reintroduce this change. This will back out the source of at least two reported problems: lock leaks in certain timer edge cases, and TCP timers continuing to fire after a connection has closed (a bug previously fixed and then reintroduced with the timer rewrite). In a follow-up commit, some minor restylings and comment changes performed after the TCP timer rewrite will be reapplied, and a further change to allow the TCP timer rewrite to be added back without disturbing the ABI. The new design is believed to be a good thing, but the outstanding issues are leading to significant stability/correctness problems that are holding up 7.0. This patch was generated by silby, but is being committed by proxy due to poor network connectivity for silby this week. Approved by: re (kensmith) Submitted by: silby Tested by: rwatson, kris Problems reported by: peter, kris, others
# f5874737	09-Jun-2007	Andre Oppermann <andre@FreeBSD.org>	Handle a race condition on >2 core machines in tcp_timer() when a timer issues a shutdown and a simultaneous close on the socket happens. This race condition is inherent in the current socket/ inpcb life cycle system but can be handled well. Reported by: kris Tested by: kris (on 8-core machine)
# c214db75	27-May-2007	Robert Watson <rwatson@FreeBSD.org>	In tcp_timer_2msl(), tp can never become NULL, so don't check it for NULL before entering tcp_trace(). Found with: Coverity Prevent(tm) CID: 1840
# 2104448f	16-May-2007	Andre Oppermann <andre@FreeBSD.org>	Move TIME_WAIT related functions and timer handling from files other than repo copied tcp_subr.c into tcp_timewait.c#1.284: tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck() tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset() tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop() tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan() This is a mechanical move with appropriate renames and making them static if used only locally. The tcp_tw_2msl_scan() cleanup function is still run from the tcp_slowtimo() in tcp_timer.c.
# f2565d68	10-May-2007	Robert Watson <rwatson@FreeBSD.org>	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.
# 37ba9d11	06-May-2007	Andre Oppermann <andre@FreeBSD.org>	Fix two comments.
# b8152ba7	11-Apr-2007	Andre Oppermann <andre@FreeBSD.org>	Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)
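The merged model described in this entry, where only the next timer to fire holds a callout, comes down to picking the earliest armed deadline. The sketch below shows that selection step only. The struct, the constant `TT_NTIMERS`, and the function `next_to_fire` are hypothetical names for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TT_NTIMERS	5	/* e.g. delack, rexmt, persist, keep, 2msl */

/* Hypothetical per-connection timer table. */
struct tcp_timer_sketch {
	int	armed[TT_NTIMERS];	/* nonzero if the timer is pending */
	int32_t	deadline[TT_NTIMERS];	/* absolute expiry, in ticks */
};

/*
 * Return the index of the armed timer that expires first, or -1 if no
 * timer is armed. Deadlines are compared by signed difference so the
 * result stays correct when the tick counter wraps.
 */
static int
next_to_fire(const struct tcp_timer_sketch *t)
{
	int best = -1;

	assert(t != NULL);
	for (int i = 0; i < TT_NTIMERS; i++) {
		if (!t->armed[i])
			continue;
		if (best == -1 ||
		    (int32_t)((uint32_t)t->deadline[i] -
		    (uint32_t)t->deadline[best]) < 0)
			best = i;
	}
	return best;
}
```

A single dispatcher would then register one callout for the returned timer and re-run the selection whenever a timer is armed, stopped, or fires.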
# 5dd9dfef	04-Apr-2007	Andre Oppermann <andre@FreeBSD.org>	Retire unused TCP_SACK_DEBUG.
# ad3f9ab3	21-Mar-2007	Andre Oppermann <andre@FreeBSD.org>	ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.
# 6489fe65	19-Mar-2007	Andre Oppermann <andre@FreeBSD.org>	Match up SYSCTL declaration style.
# 7c72af87	26-Feb-2007	Mohan Srinivasan <mohans@FreeBSD.org>	Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby.
# 751dea29	07-Sep-2006	Ruslan Ermilov <ru@FreeBSD.org>	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).
# 3c89486c	07-Sep-2006	Ruslan Ermilov <ru@FreeBSD.org>	Remove a microoptimization for i386 that was a micropessimization for amd64.
# 2c857a9b	06-Sep-2006	Gleb Smirnoff <glebius@FreeBSD.org>	o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely badly under high load. For example with 40k sockets and 25k tcptw entries, connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. No stale entry purging should be done here. Second, it is a layering violation. o Return back the tcptw purging cycle to tcp_timer_2msl_tw(), that was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed. o Disable probably necessary, but now unused tcp_twrecycleable() function. Reviewed by: ru
# 464469c7	11-Aug-2006	Mohan Srinivasan <mohans@FreeBSD.org>	Fixes an edge case bug in timewait handling where ticks rolling over causing the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby
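The log does not say why an expiry of exactly 0 corrupted the queues. One plausible reading, labeled here as an assumption, is that 0 doubled as a "not queued" sentinel. Under that assumption, a guard might look like the sketch below; `tw_expiry` is a hypothetical helper, not a function from the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute an absolute expiry time in ticks. ASSUMPTION: the value 0 is
 * reserved to mean "entry is not on the queue", so a wrapped sum that
 * lands on 0 is nudged forward by one tick.
 */
static int32_t
tw_expiry(int32_t now, int32_t timeout)
{
	int32_t expiry;

	assert(timeout > 0);
	/* Add as unsigned so wraparound is well defined. */
	expiry = (int32_t)((uint32_t)now + (uint32_t)timeout);
	if (expiry == 0)
		expiry = 1;	/* never collide with the sentinel */
	return expiry;
}
```

The one-tick error is negligible next to a 2*MSL timeout, which is why this kind of nudge is a common way to keep a sentinel value out of a wrapping counter.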
# d8ab0ec6	03-Jun-2006	Robert Watson <rwatson@FreeBSD.org>	When entering a timer on a tcpcb, don't continue processing if it has been dropped. This prevents a bug introduced during the socket/pcb refcounting work from occurring, in which occasionally the retransmit timer may fire after a connection has been reset, resulting in the R|A TCP packet having a source port of 0, as the port reservation has been released. While here, fix up some RUNLOCK->WUNLOCK bugs. MFC after: 1 month
# ffb761f6	16-May-2006	Gleb Smirnoff <glebius@FreeBSD.org>	- Backout one line from 1.78. The tp can be freed by tcp_drop(). - Style next line. Coverity ID: 912
# 31272868	05-May-2006	Robert Watson <rwatson@FreeBSD.org>	Only return (tw) from tcp_twclose() if reuse is passed, otherwise return NULL. In principle this shouldn't change the behavior, but avoids returning a potentially invalid/inappropriate pointer to the caller. Found with: Coverity Prevent (tm) Submitted by: pjd MFC after: 3 months
# 623dce13	01-Apr-2006	Robert Watson <rwatson@FreeBSD.org>	Update TCP for infrastructural changes to the socket/pcb refcount model, pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases. - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock. - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. 
- Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue. - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid. - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit. - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work. These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces. MFC after: 3 months
# 1c53f806	25-Mar-2006	Robert Watson <rwatson@FreeBSD.org>	Explicitly assert socket pointer is non-NULL in tcp_input() so as to provide better debugging information. Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans. MFC after: 1 month
# a4684d74	16-Feb-2006	Andre Oppermann <andre@FreeBSD.org>	Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead of being private to tcp_timer.c. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
# f59a9ebf	18-Jul-2005	Robert Watson <rwatson@FreeBSD.org>	Remove no-op spl's and most comment references to spls, as TCP locking is believed to be basically done (modulo any remaining bugs). MFC after: 3 days
# 2cdbfa66	20-May-2005	Paul Saab <ps@FreeBSD.org>	Replace t_force with a t_flag (TF_FORCEDATA). Submitted by: Raja Mukerji. Reviewed by: Mohan, Silby, Andre Oppermann.
# c398230b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# db0aae38	22-Dec-2004	Robert Watson <rwatson@FreeBSD.org>	Remove the now unused tcp_canceltimers() function. tcpcb timers are now stopped as part of tcp_discardcb(). MFC after: 2 weeks
# 950ab1e4	22-Dec-2004	Robert Watson <rwatson@FreeBSD.org>	Remove an annotation of a minor race relating to the update of multiple MIB entries using sysctl in short order, which might result in unexpected values for tcp_maxidle being generated by tcp_slowtimo. In practice, this will not happen, or at least, doesn't require an explicit comment. MFC after: 2 weeks
# 79a9e59c	05-Dec-2004	Robert Watson <rwatson@FreeBSD.org>	Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields in the tcptw undergo non-atomic read-modify-writes. MFC after: 2 weeks
# cce83ffb	23-Nov-2004	Robert Watson <rwatson@FreeBSD.org>	tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks
# b42ff86e	23-Nov-2004	Robert Watson <rwatson@FreeBSD.org>	De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely. Annotate that tcp_canceltimers() is currently unused. De-spl tcp_timer_delack(). De-spl tcp_timer_2msl(). MFC after: 2 weeks
# c94c54e4	02-Nov-2004	Andre Oppermann <andre@FreeBSD.org>	Remove RFC1644 T/TCP support from the TCP side of the network stack. A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality. Discussed on: -arch
# a4f757cd	16-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	White space cleanup for netinet before branch: - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
# 6d90faf3	23-Jun-2004	Paul Saab <ps@FreeBSD.org>	Add support for TCP Selective Acknowledgements. The work for this originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there. You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit. Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
# f36cfd49	07-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
# 97d8d152	20-Nov-2003	Andre Oppermann <andre@FreeBSD.org>	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
# 2a074620	08-Nov-2003	Sam Leffler <sam@FreeBSD.org>	use local values instead of chasing pointers Supported by: FreeBSD Foundation
# 9d11646d	15-Jul-2003	Jeffrey Hsu <hsu@FreeBSD.org>	Unify the "send high" and "recover" variables as specified in the latest rev of the spec. Use an explicit flag for Fast Recovery. [1] Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2] Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
# f058535d	04-Jun-2003	Jeffrey Hsu <hsu@FreeBSD.org>	Compensate for decreasing the minimum retransmit timeout. Reviewed by: jlemon
# 607b0b0c	08-Mar-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs
# 340c35de	19-Feb-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
# 79909384	19-Feb-2003	Jonathan Lemon <jlemon@FreeBSD.org>	Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs
# cb942153	13-Jan-2003	Jeffrey Hsu <hsu@FreeBSD.org>	Fix NewReno. Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
# abe239cf	24-Dec-2002	Jeffrey Hsu <hsu@FreeBSD.org>	Validate inp to prevent a use after free.
# c74af4fa	05-Sep-2002	Bruce Evans <bde@FreeBSD.org>	Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 4 layers deep in <netinet/in_pcb.h>. Removed unused includes. Sorted includes.
# 8ea8a680	20-Jul-2002	John Polstra <jdp@FreeBSD.org>	Fix overflows in intermediate calculations in sysctl_msec_to_ticks(). At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days
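The overflow this entry describes is easy to reproduce outside the kernel. The sketch below is not the committed fix. It contrasts a conversion that keeps a 32-bit intermediate with one that widens to 64 bits, using hypothetical function names and assuming a keepidle of two hours (7,200,000 ticks at hz = 1000).

```c
#include <assert.h>
#include <stdint.h>

/* Convert ticks to milliseconds using a 64-bit intermediate product. */
static int32_t
ticks_to_msec(int32_t ticks, int32_t hz)
{
	assert(hz > 0);
	return (int32_t)(((int64_t)ticks * 1000) / hz);
}

/*
 * The same conversion with a 32-bit intermediate, kept only to show the
 * failure. The multiply is done in unsigned arithmetic so the wraparound
 * is well defined; the wrapped product is then reinterpreted as signed,
 * which is what a two's-complement machine does in practice.
 */
static int32_t
ticks_to_msec_narrow(int32_t ticks, int32_t hz)
{
	int32_t product;

	assert(hz > 0);
	product = (int32_t)((uint32_t)ticks * 1000u);
	return product / hz;
}
```

With 7,200,000 ticks the product is 7.2e9, which does not fit in 32 bits, so the narrow version yields a negative value while the widened version returns 7,200,000 ms.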
# 701bec5a	18-Jul-2002	Matthew Dillon <dillon@FreeBSD.org>	Introduce two new sysctl's: net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the original code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debatable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.
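How a floor and a slop term combine can be sketched as below. The entry does not give the order in which the minimum and the slop are applied, so the ordering here (clamp first, then add slop) is an assumption, as are the function name `rto_with_slop` and the `rexmit_max` ceiling parameter. All values share one unit, milliseconds in the test values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply a floor, add slop, and cap the result. ASSUMPTION: the ordering
 * and the rexmit_max ceiling are illustrative, not taken from the commit.
 */
static int32_t
rto_with_slop(int32_t computed, int32_t rexmit_min, int32_t rexmit_slop,
    int32_t rexmit_max)
{
	int32_t rto;

	assert(rexmit_min >= 0 && rexmit_min <= rexmit_max);
	assert(rexmit_slop >= 0);

	/* Clamp the estimator output before adding, to avoid overflow. */
	rto = computed;
	if (rto < rexmit_min)
		rto = rexmit_min;
	if (rto > rexmit_max)
		rto = rexmit_max;

	/* Slop is added to every timeout, including ones above the floor. */
	if (rexmit_slop > rexmit_max - rto)
		return rexmit_max;
	return rto + rexmit_slop;
}
```

The point the commit message makes is visible in the middle step: a large computed RTO still gains the slop, whereas a bare one-second minimum would have left it untouched.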
# f76fcf6d	10-Jun-2002	Jeffrey Hsu <hsu@FreeBSD.org>	Lock up inpcb. Submitted by: Jennifer Yang <yangjihui@yahoo.com>
# 4cc20ab1	31-May-2002	Seigo Tanimura <tanimura@FreeBSD.org>	Back out my last commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu
# 243917fe	19-May-2002	Seigo Tanimura <tanimura@FreeBSD.org>	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each member in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
# c39a614e	07-Dec-2001	Robert Watson <rwatson@FreeBSD.org>	o Our current userland boot code (due to rc.conf and rc.network) always enables TCP keepalives using the net.inet.tcp.always_keepalive by default. Synchronize the kernel default with the userland default.
# b0e3ad75	21-Aug-2001	Mike Silbersack <silby@FreeBSD.org>	Much delayed but now present: RFC 1948 style sequence numbers In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major cryptographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present. Reviewed by: jesper
# 2d610a50	07-Jul-2001	Mike Silbersack <silby@FreeBSD.org>	Temporary feature: Runtime tuneable tcp initial sequence number generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method. While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks. To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default. Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed. Reviewed by: jlemon Tested by: numerous subscribers of -net
# 08517d53	22-Jun-2001	Mike Silbersack <silby@FreeBSD.org>	Eliminate the allocation of a tcp template structure for each connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed. Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs. Reviewed by: bmilekic, jlemon, ru MFC after: 2 weeks
# 7ceb7783	31-May-2001	Jesper Skriver <jesper@FreeBSD.org>	Disable rfc1323 and rfc1644 TCP extensions if we haven't got any response to our third SYN to work-around some broken terminal servers (most of which have hopefully been retired) that have bad VJ header compression code which trashes TCP segments containing unknown-to-them TCP options. PR: kern/1689 Submitted by: jesper Reviewed by: wollman MFC after: 2 weeks
# d1745f45	20-Apr-2001	Jesper Skriver <jesper@FreeBSD.org>	Say goodbye to TCP_COMPAT_42 Reviewed by: wollman Requested by: wollman
# f0a04f3f	17-Apr-2001	Kris Kennaway <kris@FreeBSD.org>	Randomize the TCP initial sequence numbers more thoroughly. Obtained from: OpenBSD Reviewed by: jesper, peter, -developers
# 7d42e30c	26-Feb-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Use more aggressive retransmit timeouts for the initial SYN packet. As we currently drop the connection after 4 retransmits + 2 ICMP errors, this allows initial connection attempts to be dropped much faster.
# d17e895b	02-Oct-2000	Jonathan Lemon <jlemon@FreeBSD.org>	If TCPDEBUG is defined, we could dereference a tp which was freed.
# af1270f8	15-Sep-2000	Jonathan Lemon <jlemon@FreeBSD.org>	It is possible for a TCP callout to be removed from the timing wheel, but have a network interrupt arrive and deactivate the timeout before the callout routine runs. Check for this case in the callout routine; it should only run if the callout is active and not on the wheel.
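The check this entry describes, running the handler only if the callout is active and no longer on the wheel, can be modeled with two flags. The sketch is a userland simulation: `struct callout_sketch` and `timer_should_run` are hypothetical stand-ins, and the two booleans only mimic the pending and active states that the kernel's callout interface tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated callout state; not the kernel's struct callout. */
struct callout_sketch {
	bool pending;	/* still queued on the timing wheel */
	bool active;	/* armed and not yet stopped or consumed */
};

/*
 * Decide at the top of a timer handler whether it should proceed.
 * Returns false if the timer was re-armed (pending again) or stopped
 * (no longer active) between dequeue and handler entry.
 */
static bool
timer_should_run(struct callout_sketch *c)
{
	assert(c != NULL);
	if (c->pending)		/* rescheduled after being dequeued */
		return false;
	if (!c->active)		/* stopped after being dequeued */
		return false;
	c->active = false;	/* claim this invocation */
	return true;
}
```

In the scenario from the commit message, the network interrupt clears the active flag after the callout has left the wheel, so the handler sees "not pending, not active" and returns without touching the connection.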
# 77978ab8	04-Jul-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. Pointed out by: bde
# 82d9ae4e	03-Jul-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our sources: -sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
# 46f58482	05-May-2000	Jonathan Lemon <jlemon@FreeBSD.org>	Implement TCP NewReno, as documented in RFC 2582. This allows better recovery for multiple packet losses in a single window. The algorithm can be toggled via the sysctl net.inet.tcp.newreno, which defaults to "on". Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
# fb59c426	09-Jan-2000	Yoshinobu Inoue <shin@FreeBSD.org>	tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
# 9fc2bcf6	31-Aug-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Simplify, and return an error if the user attempts to set a TCP time value which results in < 1 tick. Suggested by: bde
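The validation this entry describes, rejecting a user-supplied time that rounds to less than one tick, might look like the following. It is a standalone sketch with a hypothetical function name and signature, not the sysctl handler itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a millisecond value to ticks. Returns 0 and stores the result
 * on success, or EINVAL if the value would be less than one tick or
 * would not fit in 32 bits. *ticks_out is left untouched on error.
 */
static int
msec_to_ticks(int32_t msec, int32_t hz, int32_t *ticks_out)
{
	int64_t t;

	assert(hz > 0 && ticks_out != NULL);
	t = ((int64_t)msec * hz) / 1000;
	if (t < 1 || t > INT32_MAX)
		return EINVAL;
	*ticks_out = (int32_t)t;
	return 0;
}
```

At hz = 100 a tick is 10 ms, so a request for 5 ms truncates to zero ticks and is refused instead of silently arming a timer that fires immediately.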
# ccb4d0c6	30-Aug-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Add a SYSCTL_PROC so that TCP timer values are now expressed to the user in ms, while they are stored internally as ticks. Note that there probably are rounding bogons here, especially on the alpha.
# 9b8b58e0	30-Aug-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Restructure TCP timeout handling: - eliminate the fast/slow timeout lists for TCP and instead use a callout entry for each timer. - increase the TCP timer granularity to HZ - implement "bad retransmit" recovery, as presented in "On Estimating End-to-End Network Path Properties", by Allman and Paxson. Submitted by: jlemon, wollmann
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 3d177f46	03-May-1999	Bill Fumerola <billf@FreeBSD.org>	Add sysctl descriptions to many SYSCTL_XXXs PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
# 552b7df4	24-Apr-1998	David Greenman <dg@FreeBSD.org>	Ensure that TCP_REXMTVAL doesn't return a value less than t_rttmin. This is believed to have been broken with the Brakmo/Peterson srtt calculation changes. The result of this bug is that TCP connections could time out extremely quickly (in 12 seconds). Also backed out jdp's partial fix for this problem in rev 1.17 of tcp_timer.c as it is obsoleted by this commit. Bug was pointed out by Kevin Lehey <kml@roller.nas.nasa.gov>. PR: 6068
# 8e5db87c	06-Apr-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the last traces of TUBA. Inspired by: PR kern/3317
# f498eeee	25-Feb-1998	David Greenman <dg@FreeBSD.org>	Changes to support the addition of a new sysctl variable: net.inet.tcp.delack_enabled Which defaults to 1 and can be set to 0 to disable TCP delayed-ack processing (i.e. all acks are immediate).
# 92252381	24-Jan-1998	Eivind Eklund <eivind@FreeBSD.org>	Make TCP_COMPAT_42 a new style option.
# 0cc12cc5	16-Sep-1997	Joerg Wunsch <joerg@FreeBSD.org>	Make TCPDEBUG a new-style option.
# 1fd0b058	02-Aug-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 7b40aa32	13-Sep-1996	Paul Traina <pst@FreeBSD.org>	Make the misnamed tcp initial keepalive timer value (which is really the time, in seconds, that state for non-established TCP sessions stays about) a sysctl modifiable variable. [part 1 of two commits, I just realized I can't play with the indices as I was typing this commit message.]
# af7a2999	12-Jul-1996	David Greenman <dg@FreeBSD.org>	Fixed two bugs in previous commit: be sure to include tcp_debug.h when TCPDEBUG is defined, and fix typo in TCPDEBUG2() macro.
# 2c37256e	11-Jul-1996	Garrett Wollman <wollman@FreeBSD.org>	Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant to -current and doesn't look at all like my current code.
# 588c9225	03-Jun-1996	John Polstra <jdp@FreeBSD.org>	Fix a bug in the handling of the "persist" state which, under certain circumstances, caused perfectly good connections to be dropped. This happened for connections over a LAN, where the retransmit timer calculation TCP_REXMTVAL(tp) returned 0. If sending was blocked by flow control for long enough, the old code dropped the connection, even though timely replies were being received for all window probes. Reviewed by: W. Richard Stevens <rstevens@noao.edu>
# 2d8266af	14-Apr-1996	David Greenman <dg@FreeBSD.org>	Two fixes from Rich Stevens: 1) Set the persist timer to help time-out connections in the CLOSING state. 2) Honor the keep-alive timer in the CLOSING state. This fixes problems with connections getting "stuck" due to incompletion of the final connection shutdown which can be a BIG problem on busy WWW servers.
# 34be9bf3	04-Apr-1996	Poul-Henning Kamp <phk@FreeBSD.org>	Add a sysctl (net.inet.tcp.always_keepalive: 0) that when set will force keepalive on all tcp sessions. Setsockopt(2) cannot override this setting. Maybe another one is needed that just changes the default for SO_KEEPALIVE ? Requested by: Joe Greco <jgreco@brasil.moneng.mei.com>
# 2ee45d7d	11-Mar-1996	David Greenman <dg@FreeBSD.org>	Move or add #include <queue.h> in preparation for upcoming struct socket changes.
# 74b48c1d	04-Jan-1996	Andras Olah <olah@FreeBSD.org>	Reverse the modification which caused the annoying m_copydata crash: set the TF_ACKNOW flag when the REXMT timer goes off to force a retransmission. In certain situations pulling snd_nxt back to snd_una is not sufficient.
# 0312fbe9	14-Nov-1995	Poul-Henning Kamp <phk@FreeBSD.org>	New style sysctl & staticize a lot of stuff.
# 98163b98	09-Nov-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Start adding new style sysctl here too.
# 356ad1b0	03-Nov-1995	Andras Olah <olah@FreeBSD.org>	Setting the TF_ACKNOW flag was redundant in the REXMT timeout because tcp_output() checks for the condition snd_nxt == snd_una. Reviewed by: davidg, wollman, olah Suggested by: Richard Stevens
# e79adb8e	03-Oct-1995	Garrett Wollman <wollman@FreeBSD.org>	Finish 4.4-Lite-2 merge: randomize TCP initial sequence numbers to make ISS-guessing spoofing attacks harder.
# efe4b0eb	21-Sep-1995	Garrett Wollman <wollman@FreeBSD.org>	Second try: get 4.4-Lite-2 into the source tree. The conflicts don't matter because none of our working source files are on the CSRG branch any more. Obtained from: 4.4BSD-Lite-2
# cc0964fb	29-Jul-1995	David Greenman <dg@FreeBSD.org>	Add connection drop capability for persist timeouts. Reviewed by: Andras Olah Obtained from: 4.4BSD-lite2 via W. Richard Stevens
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# 23062062	12-Apr-1995	David Greenman <dg@FreeBSD.org>	Fixed bug I introduced when changing PCB list to use 4.4BSD style queue macros. Basically, detect 'tp' going away differently.
# 15bd2b43	08-Apr-1995	David Greenman <dg@FreeBSD.org>	Implemented PCB hashing. Includes new functions in_pcbinshash, in_pcbrehash, and in_pcblookuphash.
# 41f82abe	15-Feb-1995	Garrett Wollman <wollman@FreeBSD.org>	Transaction TCP support now standard. Hack away!
# a0292f23	09-Feb-1995	Garrett Wollman <wollman@FreeBSD.org>	Merge Transaction TCP, courtesy of Andras Olah <olah@cs.utwente.nl> and Bob Braden <braden@isi.edu>. NB: This has not had David's TCP ACK hack re-integrated. It is not clear what the correct solution to this problem is, if any. If a better solution doesn't pop up in response to this message, I'll put David's code back in (or he's welcome to do so himself).
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources
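
Note on the 30-Aug-1999 and 31-Aug-1999 entries (ccb4d0c6, 9fc2bcf6): they describe exposing timer values to the user in milliseconds while storing them as ticks, and rejecting a setting that would come out below 1 tick. The sketch below is only an illustration of that conversion and is not the code from tcp_timer.c. The helper names `ms_to_ticks` and `ticks_to_ms` and the explicit `hz` parameter are assumptions made for this example.

```c
#include <errno.h>

/*
 * Hypothetical helper (not from tcp_timer.c): convert a user-supplied
 * value in milliseconds to ticks, given hz ticks per second.
 * Returns EINVAL if the result would be less than 1 tick.
 */
static int
ms_to_ticks(int ms, int hz, int *ticks_out)
{
	long long t = (long long)ms * hz / 1000;	/* truncating division */

	if (t < 1)
		return (EINVAL);
	*ticks_out = (int)t;
	return (0);
}

/* Hypothetical helper: convert stored ticks back to milliseconds. */
static int
ticks_to_ms(int ticks, int hz)
{
	return ((int)((long long)ticks * 1000 / hz));
}
```

With `hz` set to 100, a request of 15 ms is stored as 1 tick and reads back as 10 ms, which is the kind of rounding loss the ccb4d0c6 message warns about.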