History log of /freebsd-current/sys/netinet/siftr.c
Revision Date Author Comments
# a95cd6e4 07-Dec-2023 Richard Scheffenegger <rscheff@FreeBSD.org>

siftr: refactor batch log processing

Refactoring to perform the batch processing of
log messaged in two phases. First cycling through a limited
number of collected packets, and only thereafter freeing
the processed packets. This prevents any chance of calling
free while in a critical / spinlocked section.

Reviewed By: tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42949


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# d79a9edb 23-Nov-2023 Mitchell Horne <mhorne@FreeBSD.org>

alq, siftr: add panic/debugger checks to shutdown hooks

Don't try to gracefully terminate the pkt_manager thread if the
scheduler is not running.

We should not attempt to shutdown ald if RB_NOSYNC is set, and must not
if the scheduler is stopped (the function calls wakeup()).

Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42340


# fafb03ab 25-Jul-2023 Cheng Cui <cc@FreeBSD.org>

siftr: flush pkt_nodes to the log file in batch

Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D41175


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 2176c9ab 01-Jul-2023 Michael Tuexen <tuexen@FreeBSD.org>

dtrace: improve siftr probe

Improve consistency of the field names with tcpsinfo_t:
* Use mss instead of max_seg_size.
* Use lport and rport instead of tcp_localport and tcp_foreignport.

Use t_flags instead of flags to improve consistency with t_flags2.

Add laddr and raddr, since the addresses were missing when compared
to the output of siftr.

Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40834


# cd9da8d0 01-Jul-2023 Michael Tuexen <tuexen@FreeBSD.org>

siftr: unbreak dtrace support

This patch adds back some fields needed by the siftr probe, which were
removed in
https://cgit.freebsd.org/src/commit/?id=aa61cff4249c92689d7a1f15db49d65d082184cb

With this fix, you can run dtrace scripts again when the siftr
module is loaded. And the siftr probe works again.

Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40826


# dc2d26df 30-Jun-2023 Michael Tuexen <tuexen@FreeBSD.org>

siftr: provide dtrace with the correct pointer to data

This fixes a bug which was introduced in the commit
https://svnweb.freebsd.org/changeset/base/282276

Reviewed by: cc, rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D40806


# 7a52b570 30-May-2023 Cheng Cui <cc@FreeBSD.org>

siftr: bring back the siftr_pkts_per_log feature

Summary: this missing feature is introduced by commit aa61cff4249c

Test Plan: verified in Emulab.net

Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40336


# b71f2784 29-May-2023 Cheng Cui <cc@FreeBSD.org>

siftr: convert this tval.tv_sec to type intmax_t to print across platforms

Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40323


# 78914cd6 29-May-2023 Cheng Cui <cc@FreeBSD.org>

siftr: sync-up man page with recent code changes, and cleanup code

Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40322


# d29a9a61 29-May-2023 Cheng Cui <cc@FreeBSD.org>

siftr: fix a build error for powerpc and arm platforms

Summary: This is introduced by commit aa61cff4249c.

Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40320


# aa61cff4 27-May-2023 Cheng Cui <cc@FreeBSD.org>

siftr: three changes that improve performance

Summary:
(1) use inp_flowid or a new packet hash for a flow identification
(2) cache constant connection info into struct flow_hash_node
(3) use compressed notation for IPv6 address representation

Reviewers: rscheff, tuexen
Approved by: tuexen (mentor)
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D40302


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 60167184 26-Apr-2023 Cheng Cui <cc@FreeBSD.org>

siftr: remove barely used hash generation per record

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39835


# d090464e 25-Apr-2023 Cheng Cui <cc@FreeBSD.org>

Change the unit of srtt and rto to usec, inspired by these in struct "tcp_info". Therefore, no need hz and tcp_rtt_scale in the headline of the log. Update the man page as well.

Summary: Simplify srtt and rto values in siftr log.

Test Plan:
Tested in Emulab testbed:
cc@s1:~ % sudo sysctl net.inet.siftr
net.inet.siftr.port_filter: 0
net.inet.siftr.genhashes: 0
net.inet.siftr.ppl: 1
net.inet.siftr.logfile: /var/log/siftr.log
net.inet.siftr.enabled: 0
cc@s1:~ % sudo sysctl net.inet.siftr.port_filter=5001
net.inet.siftr.port_filter: 0 -> 5001
cc@s1:~ % sudo sysctl net.inet.siftr.enabled=1
net.inet.siftr.enabled: 0 -> 1
cc@s1:~ %
cc@s1:~ % iperf -c r1 -n 1M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[ 1] local 10.1.1.2 port 33817 connected with 10.1.1.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-0.91 sec 1.00 MBytes 9.22 Mbits/sec
cc@s1:~ % sudo sysctl net.inet.siftr.enabled=0
net.inet.siftr.enabled: 1 -> 0

cc@s1:~ % ll /var/log/siftr.log
-rw-r--r-- 1 root wheel 91K Apr 25 09:38 /var/log/siftr.log
cc@s1:~ % cat /var/log/siftr.log
enable_time_secs=1682437111 enable_time_usecs=121115 siftrver=1.3.0 sysname=FreeBSD sysver=1400088 ipmode=4
o,0x00000000,1682437125.907343,10.1.1.2,33817,10.1.1.3,5001,1073725440,1073725440,2,0,0,0,0,2,536,0,1,672,1000000,32768,0,65536,0,0,0,0,0
i,0x00000000,1682437126.106759,10.1.1.2,33817,10.1.1.3,5001,1073725440,1073725440,2,0,0,0,0,2,536,0,1,672,1000000,32768,0,65536,0,1,0,0,0
o,0x00000000,1682437126.106767,10.1.1.2,33817,10.1.1.3,5001,1073725440,14480,2,65535,65700,9,9,4,1460,201000,1,16778209,803000,33580,0,65700,0,0,0,0,0
o,0x00000000,1682437126.107141,10.1.1.2,33817,10.1.1.3,5001,1073725440,14480,2,65535,65700,9,9,4,1460,201000,1,16778208,803000,33580,60,65700,0,0,0,0,0
...
i,0x00000000,1682437127.016754,10.1.1.2,33817,10.1.1.3,5001,1073725440,606109,1030,748544,66048,9,9,9,1460,100812,1,1008,303000,475948,0,65700,0,0,0,0,0
o,0x00000000,1682437127.016759,10.1.1.2,33817,10.1.1.3,5001,1073725440,606109,1030,748544,66048,9,9,10,1460,100812,1,1011,303000,475948,0,65700,0,0,0,0,0
disable_time_secs=1682437131 disable_time_usecs=767582 num_inbound_tcp_pkts=371 num_outbound_tcp_pkts=186 total_tcp_pkts=557 num_inbound_skipped_pkts_malloc=0 num_outbound_skipped_pkts_malloc=0 num_inbound_skipped_pkts_tcpcb=0 num_outbound_skipped_pkts_tcpcb=0 num_inbound_skipped_pkts_inpcb=0 num_outbound_skipped_pkts_inpcb=0 total_skipped_tcp_pkts=0 flow_list=10.1.1.2;33817-10.1.1.3;5001,

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39803


# 1f782fcc 24-Apr-2023 Cheng Cui <cc@FreeBSD.org>

Remove unused fields in siftr_stats. Thus, update the man page as well.

Summary: Remove unused fields in siftr_stats. Thus, update the man page as well.

Test Plan: Tested in Emulab testbed.

Reviewers: rscheff, tuexen
Approved by: rscheff, tuexen
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39776


# caf32b26 14-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

pfil: add pfil_mem_{in,out}() and retire pfil_run_hooks()

The 0b70e3e78b0 changed the original design of a single entry point
into pfil(9) chains providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching. The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.

o Hooks now provide two functions: one for mbufs and optional for
memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As it was before, a hook function for a memory pointer may realloc
into an mbuf. Such mbuf would be returned via a pointer that must
be provided in argument.
o The only hook that supports memory pointers is ipfw:default-link.
It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
alignment are retired. Internal pfil_realloc() reduces down to
m_devget() and thus is retired, too.

Reviewed by: mjg, ocochard
Differential revision: https://reviews.freebsd.org/D37977


# 9c655838 06-Oct-2022 Richard Scheffenegger <rscheff@FreeBSD.org>

siftr: apply filter early on

Quickly check TCP port filter, before investing into
expensive operations.

No functional change.

Obtained from: guest-ccui
Reviewed By: #transport, tuexen, guest-ccui
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36842


# 53af6903 06-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: remove INP_TIMEWAIT flag

Mechanically cleanup INP_TIMEWAIT from the kernel sources. After
0d7445193ab, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions. Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision: https://reviews.freebsd.org/D36400


# c198adf3 12-Sep-2022 Dag-Erling Smørgrav <des@FreeBSD.org>

siftr: spell PFIL_PASS correctly.

Sponsored by: NetApp
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D36539


# 1241e8e7 07-Apr-2022 Tom Jones <thj@FreeBSD.org>

siftr: expose t_flags2 in siftr output

Replace the old snd_bwnd field which was kept for compatibility with the
t_flags2 field from the tcpcb. This exposes in siftr logs interesting
things such as ECN, PLPMTUD, Accurate ECN and if first bytes are
complete.

Reviewed by: rscheff (transport), chengc_netapp.com, debdrup (manpages)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
X-NetApp-PR: #73
Differential Revision: https://reviews.freebsd.org/D34672


# 34d8ffff 03-Nov-2021 Allan Jude <allanjude@FreeBSD.org>

SIFTR: Fix compilation with -DSIFTR_IPV6

A few pieces of the SIFTR code that are behind #ifdef SIFTR_IPV6 have
not been updated as APIs have changed, etc.

Reported by: Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
Reviewed by: rscheff, lstewart
Sponsored by: NetApp
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D32698


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


# 746c7ae5 07-Oct-2019 Michael Tuexen <tuexen@FreeBSD.org>

In r343587 a simple port filter as sysctl tunable was added to siftr.
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.

Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619


# 54739273 01-Feb-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Repair siftr(4): PFIL_IN and PFIL_OUT are defines of some value, relying
on them having particular values can break things.


# b252313f 31-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

New pfil(9) KPI together with newborn pfil API and control utility.

The KPI have been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In nutshell [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface, as soon as interface would pride their
filtering points.

Another future feature is possiblity to create pfil heads, that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision: https://reviews.freebsd.org/D18951


# 435a8c15 30-Jan-2019 Brooks Davis <brooks@FreeBSD.org>

Add a simple port filter to SIFTR.

SIFTR does not allow any kind of filtering, but captures every packet
processed by the TCP stack.
Often, only a specific session or service is of interest, and doing the
filtering in post-processing of the log adds to the overhead of SIFTR.

This adds a new sysctl net.inet.siftr.port_filter. When set to zero, all
packets get captured as previously. If set to any other value, only
packets where either the source or the destination ports match, are
captured in the log file.

Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18897


# c53d6b90 18-Jan-2019 Brooks Davis <brooks@FreeBSD.org>

Make SIFTR work again after r342125 (D18443).

Correct a logic error.

Only disable when already enabled or enable when disabled.

Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Obtained from: Cheng Cui
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18885


# 855acb84 15-Dec-2018 Brooks Davis <brooks@FreeBSD.org>

Fix bugs in plugable CC algorithm and siftr sysctls.

Use the sysctl_handle_int() handler to write out the old value and read
the new value into a temporary variable. Use the temporary variable
for any checks of values rather than using the CAST_PTR_INT() macro on
req->newptr. The prior usage read directly from userspace memory if the
sysctl() was called correctly. This is unsafe and doesn't work at all on
some architectures (at least i386.)

In some cases, the code could also be tricked into reading from kernel
memory and leaking limited information about the contents or crashing
the system. This was true for CDG, newreno, and siftr on all platforms
and true for i386 in all cases. The impact of this bug is largest in
VIMAGE jails which have been configured to allow writing to these
sysctls.

Per discussion with the security officer, we will not be issuing an
advisory for this issue as root access and a non-default config are
required to be impacted.

Reviewed by: markj, bz
Discussed with: gordon (security officer)
MFC after: 3 days
Security: kernel information leak, local DoS (both require root)
Differential Revision: https://reviews.freebsd.org/D18443


# 384a5c3c 01-Oct-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of
INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
it is possible, that the code is running in net_epoch_preempt section,
and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case.

PR: 231428
Reviewed by: mmacy, tuexen
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17335


# 2bf95012 05-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Create a new macro for static DPCPU data.

On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140


# f6960e20 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

netinet silence warnings


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 47cedcbd 11-Mar-2016 John Baldwin <jhb@FreeBSD.org>

Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem.

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5515


# bee68e94 30-Apr-2015 George V. Neville-Neil <gnn@FreeBSD.org>

Move the SIFTR DTrace probe out of the writing thread context
and directly into the place where the data is collected.


# 981ad3ec 29-Apr-2015 George V. Neville-Neil <gnn@FreeBSD.org>

Brief demo script showing the various values that can be read via
the new SIFTR statically defined tracepoint (SDT).

Differential Revision: https://reviews.freebsd.org/D2387
Reviewed by: bz, markj


# efca1668 24-Mar-2015 Lawrence Stewart <lstewart@FreeBSD.org>

The addition of flowid and flowtype in r280233 and r280237 respectively forgot
to extend the IPv6 packet node format string, which causes a build failure when
SIFTR is compiled with IPv6 support.

Reported by: Lars Eggert


# d0a8b2a5 18-Mar-2015 Hiren Panchasara <hiren@FreeBSD.org>

Add connection flow type to siftr(4).

Suggested by: adrian
Sponsored by: Limelight Networks


# a025fd14 18-Mar-2015 Hiren Panchasara <hiren@FreeBSD.org>

Add connection flowid to siftr(4).

Reviewed by: lstewart
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D2089


# cfa6009e 12-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 0e1152fc 27-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

MFC after: 3 days
Sponsored by: Mellanox Technologies


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 1e0e83d7 06-Mar-2013 Lawrence Stewart <lstewart@FreeBSD.org>

The hashmask returned by hashinit() is a valid index in the returned hash array.
Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in
hashdestroy() by ensuring the last array index in the flow counter hash table is
flushed of entries.

MFC after: 3 days


# 8f134647 22-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


# fa046d87 30-May-2011 Robert Watson <rwatson@FreeBSD.org>

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


# 6bed196c 13-Apr-2011 Sergey Kandaurov <pluknet@FreeBSD.org>

Staticize malloc types.

Approved by: lstewart
MFC after: 1 week


# 891b8ed4 12-Apr-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Use the full and proper company name for Swinburne University of Technology
throughout the source tree.

Requested by: Grenville Armitage, Director of CAIA at Swinburne University of
Technology
MFC after: 3 days


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 92ea5581 20-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Fix a minor code redundancy nit.

MFC after: 3 days


# 052aec12 20-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

When enabling or disabling SIFTR with a VIMAGE kernel, ensure we add or remove
the SIFTR pfil(9) hook functions to or from all network stacks. This patch
allows packets inbound or outbound from a vnet to be "seen" by SIFTR.

Additional work is required to allow SIFTR to actually generate log messages for
all vnet related packets because the siftr_findinpcb() function does not yet
search for inpcbs across all vnets. This issue will be fixed separately.

Reported and tested by: David Hayes <dahayes at swin edu au>
MFC after: 3 days


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# 619ad9eb 11-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Standardise all Swinburne related copyright/licence statements throughout the
tree in preparation for another large code import. Swinburne University is the
legal entity that owns copyright and the 2-clause BSD licence is acceptable.


# 67f285a2 11-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

The university does not require that its CRICOS number be included in source
code. Remove all references from the tree.

MFC after: 3 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d4d3e218 25-Sep-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Log the number of segments currently in the reassembly queue.

Sponsored by: FreeBSD Foundation


# 1c18314d 16-Sep-2010 Andre Oppermann <andre@FreeBSD.org>

Remove the TCP inflight bandwidth limiter as announced in r211315
to give way for the pluggable congestion control framework. It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.

In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.

In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI. It is always set to 0.

In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI. It is always set to 0.

These unused variable in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.

No MFC is planned. The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.


# 79848522 17-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

- Move common code from the hook functions that fills in a packet node struct to
a separate inline function. This further reduces duplicate code that didn't
have a good reason to stay as it was.

- Reorder the malloc of a pkt_node struct in the hook functions such that it
only occurs if we managed to find a usable tcpcb associated with the packet.

- Make the inp_locally_locked variable's type consistent with the prototype of
siftr_siftdata().

Sponsored by: FreeBSD Foundation


# adc5f010 13-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

The SIFTR DPCPU statistics struct was not being zeroed between enable/disable
cycles so the values would accumulate rather than reset for each cycle.

Sponsored by: FreeBSD Foundation


# 985147de 13-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Catch up with the rename of DPCPU_SUM to DPCPU_VARSUM in r209978.

Sponsored by: FreeBSD Foundation


# a5548bf6 03-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Import the Statistical Information For TCP Research (SIFTR) kernel module into
FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log
file, providing the ability to make highly granular measurements of TCP
connection state. The tool is aimed at system administrators, developers and
researchers alike. Please take it for a spin and test it out - the man page
should have all the information required to get you going.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: dwmalone, gnn, rpaulo
Tested by: Many on freebsd-current@ and elsewhere over the years
MFC after: 1 month