History log of /freebsd-current/sys/sys/socket.h
Revision Date Author Comments
# 3fa95784 27-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: define shutdown(2) constants in cpp namespace

There is software that uses SHUT_RD, SHUT_WR as preprocessor defines and
its build was broken by enum declaration. Keep the enum, but provide
defines to propagate the constants to cpp namespace.

PR: 277994
PR: 277995
Fixes: c3276e02beab825824e3147b31af33af66298430


# d62c4607 18-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: remove unused KPIs to manipulate sockets

These KPIs were added in dd0e6c383a9f0 and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the sockets it is unclear what could be a large
collection of externally written kernel modules that need extensively use
sockets and not be aware of their internals at the same time. This
isolation of a structure knowledge requires a lot of work, and just
throwing in a few KPIs isn't helpful.

Reviewed by: kib, olce, markj
Differential Revision: https://reviews.freebsd.org/D44311


# 5bba2728 16-Jan-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: make pr_shutdown fully protocol specific method

Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else. This also fixes a
couple recent regressions and reduces risk of future regressions. The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one. Particularly for SCTP this change streamlines
a lot of code.

Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
connect(2), can SO_LINGER or can accept(2).

There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9). Those protocols partially supported
shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost
shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43413


# c3276e02 16-Jan-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: make shutdown(2) how argument a enum

Reviwed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43412


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# b6a816f1 19-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: garbage collect so_sototcpcb()

It had very little use and required inpcb layer to know tcpcb.


# 8624f434 30-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

divert: declare PF_DIVERT domain and stop abusing PF_INET

The divert(4) is not a protocol of IPv4. It is a socket to
intercept packets from ipfw(4) to userland and re-inject them
back. It can divert and re-inject IPv4 and IPv6 packets today,
but potentially it is not limited to these two protocols. The
IPPROTO_DIVERT does not belong to known IP protocols, it
doesn't even fit into u_char. I guess, the implementation of
divert(4) was done the way it is done basically because it was
easier to do it this way, back when protocols for sockets were
intertwined with IP protocols and domains were statically
compiled in.

Moving divert(4) out of inetsw accomplished two important things:

1) IPDIVERT is getting much closer to be not dependent on INET.
This will be finalized in following changes.
2) Now divert socket no longer aliases with raw IPv4 socket.
Domain/proto selection code won't need a hack for SOCK_RAW and
multiple entries in inetsw implementing different flavors of
raw socket can merge into one without requirement of raw IPv4
being the last member of dom_protosw.

Differential revision: https://reviews.freebsd.org/D36379


# a4b8c6d9 12-Aug-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

netlink: add AF_NETLINK / PF_NETLINK definitions

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36174
MFC after: 1 weeks


# 37351133 14-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

uipc_socket.c: Modify MSG_TLSAPPDATA to only do Alert Records

Without this patch, the MSG_TLSAPPDATA flag would cause
soreceive_generic() to return ENXIO for any non-application
data record in a TLS receive stream.

This works ok for TLS1.2, since Alert records appear to be
the only non-application data records received.
However, for TLS1.3, there can be post-handshake handshake
records, such as NewSessionKey sent to the client from the
server. These handshake records cannot be handled by the
upcall which does an SSL_read() with length == 0.

It appears that the client can simply throw away these
NewSessionKey records, but to do so, it needs to receive
them within the kernel.

This patch modifies the semantics of MSG_TLSAPPDATA slightly,
so that it only applies to Alert records and not Handshake
records. It is needed to allow the krpc to work with KTLS1.3.

Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35170


# 7045b160 28-Jul-2021 Roy Marples <roy@marples.name>

socket: Implement SO_RERROR

SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652


# 924d1c9a 08-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.


# 4a01b854 07-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.


# ede4af47 17-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI

As this ABI is still fresh (r367287), let's correct some mistakes now:

- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess

Discussed with: kib, markj, trasz
Differential Revision: https://reviews.freebsd.org/D27084


# fedeb08b 03-Oct-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce scalable route multipath.

This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.

Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now comes with weight, default weight is 1, maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449


# f6e54eb3 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

sys: clean up empty lines in .c and .h files


# 102829aa 19-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add the MSG_TLSAPPDATA flag to indicate "return ENXIO" for non-application TLS
data records.

The kernel RPC cannot process non-application data records when
using TLS. It must to an upcall to a userspace daemon that will
call SSL_read() to process them.

This patch adds a new flag called MSG_TLSAPPDATA that the kernel
RPC can use to tell sorecieve() to return ENXIO instead of a non-application
data record, when that is what is at the top of the receive queue.
I put the code in #ifdef KERN_TLS/#endif, although it will build without
that, so that it is recognized as only useful when KERN_TLS is enabled.
The alternative to doing this is to have the kernel RPC re-queue the
non-application data message after receiving it, but that seems more
complicated and might introduce message ordering issues when there
are multiple non-application data records one after another.

I do not know what, if any, changes will be required to support TLS1.3.

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D25923


# a560f3eb 20-May-2020 Wei Hu <whu@FreeBSD.org>

HyperV socket implementation for FreeBSD

This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.

Submitted by: Wei Hu <weh@microsoft.com>
Reviewed by: Dexuan Cui <decui@microsoft.com>
Relnotes: yes
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D24061


# a6663252 12-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce nexthop objects and new routing KPI.

This is the foundational change for the routing subsytem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
routing KPI.

Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address goes
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows to store more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 function are intended to replace all all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous
fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by: ae,glebius(initial version)
Differential Revision: https://reviews.freebsd.org/D24232


# 6b01d4d4 21-Aug-2018 Michael Tuexen <tuexen@FreeBSD.org>

Add SOL_SOCKET level socket option with name SO_DOMAIN to get
the domain of a socket.

This is helpful when testing and Solaris and Linux have the same
socket option using the same name.

Reviewed by: bcr@, rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16791


# 1a43cff9 06-Jun-2018 Sean Bruno <sbruno@FreeBSD.org>

Load balance sockets with new SO_REUSEPORT_LB option.

This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Obtained from: DragonflyBSD
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003


# 7875017c 24-Apr-2018 Sean Bruno <sbruno@FreeBSD.org>

Revert r332894 at the request of the submitter.

Submitted by: Johannes Lundberg <johalun0_gmail.com>
Sponsored by: Limelight Networks


# 7b7796ee 23-Apr-2018 Sean Bruno <sbruno@FreeBSD.org>

Load balance sockets with new SO_REUSEPORT_LB option

This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 06193f0b 07-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Use hardware timestamps to report packet timestamps for SO_TIMESTAMP
and other similar socket options.

Provide new control message SCM_TIME_INFO to supply information about
timestamp. Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned. Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.

Reviewed by: gallatin (previous version), hselasky
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12638


# 779f106a 08-Jun-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() to use atomic(9). This allows in some situations
to do soref() without owning socket lock. There is place for improvement
here, it is possible to make sorele() also to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.

o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
unp_list_lock.
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parralelism in the UNIX sockets.
- Now call into uipc_attach() may happen with unp_link_lock hold if, we
are accepting, or without unp_link_lock in case if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basicly did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# f3b3aa83 18-Jan-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Format and sort MSG_* flags, to prevent misedits in future. There is no
functional change.


# cf66bb8d 18-Jan-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Fix regression from r311568: collision of MSG_NOSIGNAL with MSG_MORETOCOME
lead to delayed send of data sent with sendto(MSG_NOSIGNAL).

Submitted by: rrs


# f3e7afe2 18-Jan-2017 Hans Petter Selasky <hselasky@FreeBSD.org>

Implement kernel support for hardware rate limited sockets.

- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by: wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision: https://reviews.freebsd.org/D3687
Sponsored by: Mellanox Technologies
MFC after: 3 months


# 339efd75 16-Jan-2017 Maxim Sobolev <sobomax@FreeBSD.org>

Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by: adrian, gnn
Differential Revision: https://reviews.freebsd.org/D9171


# 14da48cb 06-Jan-2017 John Baldwin <jhb@FreeBSD.org>

Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by: adrian
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D8955


# 00b5ffde 17-Nov-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't
do any speculations about readahead, and use exactly the amount of readahead
specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee
that no readahead at all will be performed.


# 455ee98d 31-May-2016 Ed Schouten <ed@FreeBSD.org>

Make CMSG_*() work without having NULL available.

The <sys/socket.h> is not supposed to declare NULL, according to POSIX.
Our implementation complies with that, meaning that we need to make sure
that CMSG_*() doesn't use it.


# 9c64cfe5 29-Mar-2016 Gleb Smirnoff <glebius@FreeBSD.org>

The sendfile(2) allows to send extra data from userspace before the file
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.

Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix


# bf420ace 29-Jan-2016 Konstantin Belousov <kib@FreeBSD.org>

Add implementations of sendmmsg(3) and recvmmsg(3) functions which
wraps sendmsg(2) and recvmsg(2) into batch send and receive operation.
The goal of this implementation is only to provide API compatibility
with Linux.

The cancellation behaviour of the functions is not quite right, but
due to relative rare use of cancellation it is considered acceptable
comparing with the complexity of the correct implementation. If
functions are reimplemented as syscalls, the fix would come almost
trivial. The direct use of the syscall trampolines instead of libc
wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on
cancellation.

Submitted by: Boris Astardzhiev <boris.astardzhiev@gmail.com>
Discussed with: jilles (cancellation behaviour)
MFC after: 1 month


# 2bab0c55 08-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.

Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.

The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates: Netflix global launch!
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Relnotes: yes


# 0e87b36e 11-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by: Netflix


# 5b26ea5d 25-Feb-2014 John Baldwin <jhb@FreeBSD.org>

Remove more constants related to static sysctl nodes. The MAXID constants
were primarily used to size the sysctl name list macros that were removed
in r254295. A few other constants either did not have an associated
sysctl node, or the associated node used OID_AUTO instead.

PR: ports/184525 (exp-run)


# 2b80a756 16-Jan-2014 Adrian Chadd <adrian@FreeBSD.org>

Implement the extension api for sendfile to allow for kqueue notifications.

This is still under a bit of flux, as the final API hasn't been nailed
down. It's also unclear whether we should define the two new types in the
header or not - it may allow bad code to compile that shouldn't (ie,
since uintX's are defined, the developer may not include sys/types.h.)

Reviewed by: peter, imp, bde
Sponsored by: Netflix, Inc.


# fd77bbb9 26-Aug-2013 John Baldwin <jhb@FreeBSD.org>

Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by: bde
Tested by: exp-run by bdrewery


# ca04d21d 15-Aug-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 863c7e45 08-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

- Reserve a special AF for SDP. The one we were incorrectly using before
was taken by another AF.

Sponsored by: EMC / Isilon Storage Division


# 988fb7a6 12-Jun-2013 Kevin Lo <kevlo@FreeBSD.org>

Add PF_IEEE80211 definition.

Reviewed by: rpaulo


# da7d2afb 01-May-2013 Jilles Tjoelker <jilles@FreeBSD.org>

Add accept4() system call.

The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.


# 937c9165 30-Mar-2013 Jilles Tjoelker <jilles@FreeBSD.org>

Improve namespacing in <sys/socket.h>:

* MSG_NOSIGNAL is in POSIX.1-2008.
* MSG_NOTIFICATION (SCTP) is not in POSIX.
* PRU_FLUSH_* (SCTP) are not in POSIX.
* bindat()/connectat() are not in POSIX.

Discussed with: rrs (PRU_FLUSH_*)


# c2e3c52e 19-Mar-2013 Jilles Tjoelker <jilles@FreeBSD.org>

Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.

This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by: kib


# 7493f24e 02-Mar-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Implement two new system calls:

int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des


# 0d25fab4 01-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after: 2 weeks


# 747d2fa1 26-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks


# bfe09c55 11-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Properly name the sysctl to "iflistl" rather than "iflist2", which had been
the prototype name and slipped in in r231505.

Spotted in a reply from: bde
MFC after: 3 days


# 6d076ae8 10-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by: glebius, brooks
MFC after: 3 days


# a4646b93 17-Apr-2011 Jilles Tjoelker <jilles@FreeBSD.org>

Allow using CMSG_NXTHDR with -Wcast-align.

If various checks are omitted, the CMSG_NXTHDR macro expands to
(struct cmsghdr *)((char *)(cmsg) + \
_ALIGN(((struct cmsghdr *)(cmsg))->cmsg_len))

Although there is no alignment problem (assuming cmsg is properly aligned
and _ALIGN is correct), this violates -Wcast-align on strict-alignment
architectures. Therefore an intermediate cast to void * is appropriate here.

There is no workaround other than not using -Wcast-align.

MFC after: 2 weeks


# 5c9d0a9a 12-Nov-2010 Luigi Rizzo <luigi@FreeBSD.org>

This commit implements the SO_USER_COOKIE socket option, which lets
you tag a socket with an uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its
implementation).

The ipfw-related code that uses the optopn will be committed separately.

This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be
harmless.

See the discussion at
http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html

Idea and code from Paul Joe, small modifications and manpage
changes by myself.

Submitted by: Paul Joe
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 2a46409d 11-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

MFC r201955:
Improve the comment about CMGROUP_MAX.


# 7cca94f3 09-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

Improve the comment about CMGROUP_MAX.

MFC after: 3 days


# a254d1f1 08-Sep-2009 Poul-Henning Kamp <phk@FreeBSD.org>

Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.


# 2ac047d1 08-Sep-2009 Poul-Henning Kamp <phk@FreeBSD.org>

Move the duplicate definition of struct sockaddr_storage to its own
include file, and include this where the previous duplicate definitions were.

Static program checkers like FlexeLint rightfully take a dim view of
duplicate definitions, even if they currently are identical.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# cb752f1d 08-Aug-2008 Xin LI <delphij@FreeBSD.org>

Add prototype defination for setfib(2) to sys/socket.h.


# dd0e6c38 20-Jul-2008 Kip Macy <kmacy@FreeBSD.org>

Add accessor functions for socket fields.

MFC after: 1 week


# 8b07e49a 09-May-2008 Julian Elischer <julian@FreeBSD.org>

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# cf71e438 14-Apr-2008 Randall Stewart <rrs@FreeBSD.org>

Add pru_flush routine so a transport can
flush itself during Shutdown

MFC after: 1 week


# b75a1171 03-Feb-2008 Poul-Henning Kamp <phk@FreeBSD.org>

Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs
referencing the files VM pages are returned from the network stack,
making changes to the file safe.

This flag does not guarantee that the data has been transmitted to the
other end.


# 76b262c4 12-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

Fix style issues with initial TCP offload commit

Requested by: rwatson
Submitted by: rwatson


# 620721db 12-Dec-2007 Kip Macy <kmacy@FreeBSD.org>

Add driver independent interface to offload active established TCP connections

Reviewed by: silby


# 915fabcd 18-Sep-2007 Alfred Perlstein <alfred@FreeBSD.org>

Reserve AF_ constants for vendors by giving them the odd numbered
AF_ constants ranging from 39 to 133.

Approved by: re (kensmith)


# 71498f30 12-Jun-2007 Bruce M Simpson <bms@FreeBSD.org>

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


# 18a60731 19-Apr-2007 Mike Makonnen <mtm@FreeBSD.org>

Make inet6_rth_* family of functions more compliant with RFC3542:
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.

MFC-After: 1 month


# f8829a4a 03-Nov-2006 Randall Stewart <rrs@FreeBSD.org>

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


# d99b0dd2 02-Nov-2006 Andre Oppermann <andre@FreeBSD.org>

Rewrite kern_sendfile() to work in two loops, the inner which turns as many
VM pages into mbufs as it can -- up to the free send socket buffer space.
The outer loop then drops the whole mbuf chain into the send socket buffer,
calls tcp_output() on it and then waits until 50% of the socket buffer are
free again to repeat the cycle. This way tcp_output() gets the full amount
of data to work with and can issue up to 64K sends for TSO to chop up in
the network adapter without using any CPU cycles. Thus it gets very efficient
especially with the readahead the VM and I/O system do.

The previous sendfile(2) code simply looped over the file, turned each 4K
page into an mbuf and sent it off. This had the effect that TSO could only
generate 2 packets per send instead of up to 44 at its maximum of 64K.

Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of
sleeping on mbuf allocation failures.

Benchmarking shows significant improvements (95% confidence):
45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO)
83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO)

(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month


# 246b5467 25-Jul-2006 Sam Leffler <sam@FreeBSD.org>

add support for 802.11 packet injection via bpf

Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by: arch@
MFC after: 1 month


# 8434c29b 18-Sep-2005 Robert Watson <rwatson@FreeBSD.org>

Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them
consistent.

Discussed with: andre
MFC after: 1 week


# 6a2989fd 12-Apr-2005 Matthew N. Dodd <mdodd@FreeBSD.org>

Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.

- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)

Obtained from: NetBSD (with some changes)


# d025278a 08-Mar-2005 Alfred Perlstein <alfred@FreeBSD.org>

Make MSG_NOSIGNAL available to native programs.
Bump FreeBSD_version to note this change.

Reviewed by: sobomax


# 8d6e40c3 08-Mar-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by: alfred


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# d297f702 29-Nov-2004 Paul Saab <ps@FreeBSD.org>

If soreceive() is called from a socket callback, there's no reason
to do a window update to the peer (thru an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.

Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson


# 8f078016 11-Aug-2004 Andre Oppermann <andre@FreeBSD.org>

RFC 2292 requires to check msg_controllen, in case that the kernel returns
an empty list for some reasons.

Obtained from:

NetBSD: socket.h,v 1.62 2001/09/07 08:13:01 itojun
OpenBSD: socket.h,v 1.39 2001/09/07 16:45:25 itojun

MFC after: 2 weeks


# 14956200 16-Jul-2004 Hartmut Brandt <harti@FreeBSD.org>

According to POSIX sys/socket.h must define CMSG_NXTHDR but most not
define NULL. This means we cannot use NULL in the definition of CMSG_NXTHDR.
So replace NULL with 0.

PR: kern/60309
Submitted by: Jeff King <peff-freebsd@peff.net>


# 38665043 01-Jun-2004 Don Lewis <truckman@FreeBSD.org>

Whitespace correction - #define should be followed by a tab.


# 866046f5 31-May-2004 Don Lewis <truckman@FreeBSD.org>

Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.


# 73a4a9a7 09-May-2004 Maksim Yevmenkin <emax@FreeBSD.org>

Mode few Bluetooth defines into system include files

Reviewed by: imp


# 82c6e879 06-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# cc4ca9da 13-Mar-2004 Matthew N. Dodd <mdodd@FreeBSD.org>

Define AF_ARP/PF_ARP.


# b49d824e 08-Feb-2004 Mike Silbersack <silby@FreeBSD.org>

Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.

Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver. Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.

Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.

Idea by: Yaoping Ruan & Vivek Pai


# be8a62e8 31-Jan-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead. Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.


# 9f144cff 24-Dec-2003 Alfred Perlstein <alfred@FreeBSD.org>

Add restrict qualifiers.

PR: 44394
Submitted by: Craig Rodrigues <rodrige@attbi.com>


# 05b2efe0 14-Nov-2003 Bruce M Simpson <bms@FreeBSD.org>

Add a sysctl MIB, NET_RT_IFMALIST, to retrieve multicast group memberships
in a protocol-independent way.

Submitted by: harti


# 3c6b084e 05-Mar-2003 Peter Wemm <peter@FreeBSD.org>

Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.


# f0603d5b 26-Feb-2003 Mike Barcroft <mike@FreeBSD.org>

Move the typedef for size_t into _iovec.h, so that size_t is available
for struct iovec.


# 5c648bac 28-Dec-2002 Poul-Henning Kamp <phk@FreeBSD.org>

It is bad style to define the same structure in multiple header
files which might be included together.

Things like debuggers and lint-like programs get their knickers in
a twist (rightly so one might add) when they find different locations
for the same named struct depending on which .h file were included
first.

This is a stellar example of Very Bad Thinking on the part of the
standards dudes who wrote that both sys/uio.h and sys/socket.h
should define struct iovec the same way.

Fix this by putting struct iovec into its own miniature sys/_iovec.h
file and #include that from sys/socket.h and sys/uio.h.

Sensible people could just put iovec into sys/_types.h but there
is probably some standard or other which will be violated if we
did something that horrible.


# 6059c38b 14-Dec-2002 Bill Fenner <fenner@FreeBSD.org>

Add prototype for sockatmark().


# 6c2f2bc5 13-Nov-2002 Mike Barcroft <mike@FreeBSD.org>

Fix a constant in the standard namespace not to depend on another
constant in the BSD namespace.


# d59298df 12-Oct-2002 Mike Barcroft <mike@FreeBSD.org>

o Add typedefs for size_t and ssize_t.
o Add typedefs for gid_t, off_t, pid_t, and uid_t in the non-standards
case.
o Add struct iovec (also defined in <sys/uio.h>).
o Add visibility conditionals to avoid defining non-standard
extentions in the standards case.
o Change spelling of some types so they work without including
<sys/types.h> (u_char -> unsigned char, u_short -> unsigned short,
int64 -> __int64, caddr_t -> char *)
o Add comments about missing restrict type-qualifiers and missing
function.


# abbd8902 21-Aug-2002 Mike Barcroft <mike@FreeBSD.org>

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


# c33c8251 20-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin


# 8973ba1b 14-Jun-2002 Robert Watson <rwatson@FreeBSD.org>

Reserve two constants for managing socket MAC labels via socket options.


# 9d697beb 14-Jun-2002 Robert Watson <rwatson@FreeBSD.org>

Whitespaec consistency.


# b837f53a 11-Jun-2002 Garrett Wollman <wollman@FreeBSD.org>

SO_PRIVSTATE has been commented out for long enough now....


# 6e330f3e 01-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

bde noticed that SOMAXCONN breaks pretty badly as an option for LINT.
so back it out.


# c6e43821 19-Apr-2002 Mike Barcroft <mike@FreeBSD.org>

Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h>
and <sys/socket.h>. Previously, sa_family_t was only typedef'd in
<sys/socket.h>.


# 789f12fe 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P


# dc20def4 03-Feb-2002 Mark Murray <markm@FreeBSD.org>

Zero functional difference; make some integer constants unsigned, as
they are used in unsigned context. This shuts lint(1) up in a few
significant ways with "signed/unsigned" arithmetic warnings.


# 5752bffd 04-Sep-2001 David E. O'Brien <obrien@FreeBSD.org>

style(9) the structure definitions.


# 75f5bc80 12-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

FreeBSD already avoided namespace pollution (rev.1.45).

Submitted by: bde


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 4c68f41d 22-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Add address families AF_SLOW and AF_SCLUSTER. These are used by the
Sitara QoSworks box.

Obtained from: Sitara Networks Inc.


# 7bbd138e 12-Apr-2001 Alfred Perlstein <alfred@FreeBSD.org>

Make SOMAXCONN a kernel option.

Submitted by: Terry Lambert <terry@lambert.org>


# 392df6bc 22-Mar-2001 Alfred Perlstein <alfred@FreeBSD.org>

Remove struct cmessage from sys/socket.h and reintroduce the private
definitions.

Requested by: wollman


# 4ed6d634 21-Mar-2001 Alfred Perlstein <alfred@FreeBSD.org>

Hopefully fix some of the bugs in passing credentials over UNIX domain sockets.

Make struct cmessage visible from socket.h (about 4 places were
defining it for themselves which wasn't good)

Make __rpc_get_local_uid() useable and give it prototype that's
visible.

Fix some issues with printing out usernames from rpcbind and keyserv.


# 5981ddec 16-Feb-2001 Bruce Evans <bde@FreeBSD.org>

Fixed disordering in previous commit.


# ad9fdc8f 15-Feb-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Correct 2nd argument of getnameinfo(3) to socklen_t.

Reviewed by: itojun


# ef4cf3e0 19-Dec-2000 Assar Westerlund <assar@FreeBSD.org>

remove pfctlinput


# e255a226 22-Nov-2000 Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org>

Reduce number of #ifdef nestings.

Submitted by: bde


# 6b1d8cea 08-Nov-2000 Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org>

Fix CMSG and ALIGN macro usage.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h. However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
definitions.
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as
before.

PR: 19971, 20530
Submitted by: Martin Kaeske <MartinKaeske@lausitz.net>
Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by: shin
Reviewed by: bde


# 6878e27e 22-Sep-2000 Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org>

Document which RFC introduced CMSG_SPACE() and CMSG_LEN().


# 04da7244 22-Sep-2000 Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org>

Fix comment about the bsd-api-new-02a draft. This has been superceded
by RFC 2553.


# a79b7128 19-Jun-2000 Alfred Perlstein <alfred@FreeBSD.org>

return of the accept filter part II

accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg


# a72fda71 18-Jun-2000 Alfred Perlstein <alfred@FreeBSD.org>

backout accept optimizations.

Requested by: jmg, dcs, jdp, nate


# 8f4e4aa5 15-Jun-2000 Alfred Perlstein <alfred@FreeBSD.org>

add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()
until the incoming connection has either data waiting or what looks like a
HTTP request header already in the socketbuffer. This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers and has
been cleaned up and a more lightweight DELAYACCEPT for non-http servers
has been added

Reviewed by: silence on hackers.


# 19a938aa 11-Mar-2000 Yoshinobu Inoue <shin@FreeBSD.org>

Fix sockaddr_storage related macro definition, as ss_family member type change.
(Currently, no effect but for future portability)

Approved by: jkh

Reviewed by: bde


# 7d0d8dc3 03-Mar-2000 Yoshinobu Inoue <shin@FreeBSD.org>

CMSG_XXX macros alignment fixes to follow RFC2292.

Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun


# 5d60ed0e 13-Jan-2000 Yoshinobu Inoue <shin@FreeBSD.org>

Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 6a800098 22-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 9b962c56 24-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

General clean-up of socket.h and associated sources to synchronise up
with NetBSD and the Single Unix Specification v2.

This updates some structures with other, almost equivalent types and
effort is under way to get the whole more consistent.

Also removes a double definition of INET6 and some other clean-ups.

Reviewed by: green, bde, phk
Some part obtained from: NetBSD, SUSv2 specification


# 76429de4 05-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


# 43aebc0e 21-Oct-1999 Julian Elischer <julian@FreeBSD.org>

Add missing entries in a structure.


# b551ea1b 21-Oct-1999 Julian Elischer <julian@FreeBSD.org>

fix typo


# 4cf49a43 21-Oct-1999 Julian Elischer <julian@FreeBSD.org>

Whistle's Netgraph link-layer (sometimes more) networking infrastructure.
Been in production for 3 years now. Gives Instant Frame relay to if_sr
and if_ar drivers, and PPPOE support soon. See:
ftp://ftp.whistle.com/pub/archie/netgraph/index.html
for on-line manual pages.

Reviewed by: Doug Rabson (dfr@freebsd.org)
Obtained from: Whistle CVS tree


# 114ae644 14-Oct-1999 Mike Smith <msmith@FreeBSD.org>

Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# dd0b2081 05-Nov-1998 David Greenman <dg@FreeBSD.org>

Implemented zero-copy TCP/IP extensions via sendfile(2) - send a
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.


# 3f8c4506 15-Sep-1998 Poul-Henning Kamp <phk@FreeBSD.org>

(this is an extract from src/share/examples/atm/README)

===================================
HARP | Host ATM Research Platform
===================================

HARP 3

What is this stuff?
-------------------
The Advanced Networking Group (ANG) at the Minnesota Supercomputer Center,
Inc. (MSCI), as part of its work on the MAGIC Gigabit Testbed, developed
the Host ATM Research Platform (HARP) software, which allows IP hosts to
communicate over ATM networks using standard protocols. It is intended to
be a high-quality platform for IP/ATM research.

HARP provides a way for IP hosts to connect to ATM networks. It supports
standard methods of communication using IP over ATM. A host's standard IP
software sends and receives datagrams via a HARP ATM interface. HARP provides
functionality similar to (and typically replaces) vendor-provided ATM device
driver software.

HARP includes full source code, making it possible for researchers to
experiment with different approaches to running IP over ATM. HARP is
self-contained; it requires no other licenses or commercial software packages.

HARP implements support for the IETF Classical IP model for using IP over ATM
networks, including:

o IETF ATMARP address resolution client
o IETF ATMARP address resolution server
o IETF SCSP/ATMARP server
o UNI 3.1 and 3.0 signalling protocols
o Fore Systems's SPANS signalling protocol

What's supported
----------------
The following are supported by HARP 3:

o ATM Host Interfaces
- FORE Systems, Inc. SBA-200 and SBA-200E ATM SBus Adapters
- FORE Systems, Inc. PCA-200E ATM PCI Adapters
- Efficient Networks, Inc. ENI-155p ATM PCI Adapters

o ATM Signalling Protocols
- The ATM Forum UNI 3.1 signalling protocol
- The ATM Forum UNI 3.0 signalling protocol
- The ATM Forum ILMI address registration
- FORE Systems's proprietary SPANS signalling protocol
- Permanent Virtual Channels (PVCs)

o IETF "Classical IP and ARP over ATM" model
- RFC 1483, "Multiprotocol Encapsulation over ATM Adaptation Layer 5"
- RFC 1577, "Classical IP and ARP over ATM"
- RFC 1626, "Default IP MTU for use over ATM AAL5"
- RFC 1755, "ATM Signaling Support for IP over ATM"
- RFC 2225, "Classical IP and ARP over ATM"
- RFC 2334, "Server Cache Synchronization Protocol (SCSP)"
- Internet Draft draft-ietf-ion-scsp-atmarp-00.txt,
"A Distributed ATMARP Service Using SCSP"

o ATM Sockets interface
- The file atm-sockets.txt contains further information

What's not supported
--------------------
The following major features of the above list are not currently supported:

o UNI point-to-multipoint support
o Driver support for Traffic Control/Quality of Service
o SPANS multicast and MPP support
o SPANS signalling using Efficient adapters

This software was developed under the sponsorship of the Defense Advanced
Research Projects Agency (DARPA).

Reviewed (lightly) by: phk
Submitted by: Network Computing Services, Inc.


# 65d1dabb 12-Sep-1998 Garrett Wollman <wollman@FreeBSD.org>

Define the Posix.1g names for the howto argument to shutdown(2).


# fdcef9e6 01-Feb-1998 Alexander Langer <alex@FreeBSD.org>

Added inet6 to CTL_NET_NAMES.


# cb3453e8 21-Dec-1997 Bruce Evans <bde@FreeBSD.org>

Moved some declarations from <sys/socket.h> to the correct places, and
fixed everything that depended on them being misplaced.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 57bf258e 16-Aug-1997 Garrett Wollman <wollman@FreeBSD.org>

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


# 68713f97 08-May-1997 Kenjiro Cho <kjc@FreeBSD.org>

merge ATM driver


# 8fa5c00f 30-Apr-1997 Garrett Wollman <wollman@FreeBSD.org>

Remove SO_PRIVSTATE socket option; it is no longer necessary, nor implemented
in the kernel. inetd should automatically notic that it has gone away
once it is recompiled.


# 0b788fa1 21-Mar-1997 Bill Paul <wpaul@FreeBSD.org>

Add support to sendmsg()/recvmsg() for passing credentials between
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
UID of a local process on the other side of a transport endpoint).

What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmesgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the follwing tests:

- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.

- The the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.

- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.

The receiver can now inspect the credential information and use it to
authenticate the client.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 96c19658 29-Aug-1996 Peter Wemm <peter@FreeBSD.org>

Grab the next slot for AF_INET6/PF_INET6, the resolver uses it.


# ad8a9541 14-Aug-1996 John Polstra <jdp@FreeBSD.org>

Fix a typo in the #define for PF_RTIP, even though I doubt it will
ever make one bit of difference to anybody.


# 96ccf1cb 18-Jun-1996 Garrett Wollman <wollman@FreeBSD.org>

When bringing the netkey stuff over, I forgot that I had decided to change
AF_KEY into pseudo_AF_KEY, and defined PF_KEY incorrectly. Fix.

Noticed by: pst


# bd22f58e 14-Jun-1996 Garrett Wollman <wollman@FreeBSD.org>

This is the `netkey' kernel key-management service (the PF_KEY analogue
to PF_ROUTE) from NRL's IPv6 distribution, heavily modified by me for
better source layout, formatting, and textual conventions. I am told
that this code is no longer under active development, but it's a useful
hack for those interested in doing work on network security, key management,
etc. This code has only been tested twice, so it should be considered
highly experimental.

Obtained from: ftp.ripe.net


# 82dab6ce 09-May-1996 Garrett Wollman <wollman@FreeBSD.org>

Make it possible to return more than one piece of control information
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).

Submitted by: Louis Mamakos <loiue@TransSys.com>


# 02e2c406 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[new sys/syscallargs.h file, to be "cvs rm"ed]


# b1358054 07-Feb-1996 Garrett Wollman <wollman@FreeBSD.org>

Define a new socket option, SO_PRIVSTATE. Getting it returns the state
of the SS_PRIV flag in so_state; setting it always clears same.


# 6c5e9bbd 30-Jan-1996 Mike Pritchard <mpp@FreeBSD.org>

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 8ee124f5 05-Jan-1996 David Greenman <dg@FreeBSD.org>

Increased default SOMAXCONN from 32 to 128. 128 is the largest value I
consider "safe" for most systems. Note that this is (has been for some
time) also tunable with sysctl (via kern.somaxconn) should the operator
wish to increase this value even higher. Also note that 128 is what
the Netscape WWW server reportedly asks for.


# a89f7290 12-Sep-1995 David Greenman <dg@FreeBSD.org>

Increased SOMAXCONN from 5 to 32. 5 was too small a value for just about
any reasonably busy machine, and by any measure is a lousy "max" value.
32 was chosen after a careful analysis of typical listen queue depths
on several busy Internet servers (both web and ftp). I also intend to
add a statistics counter for dropped connection requests due to the limit
being exceeded.


# 24e16f2e 06-Feb-1995 Garrett Wollman <wollman@FreeBSD.org>

Merge in the socket-level support for Transaction TCP from the OLAH_TTCP
branch.

Submitted by: Andras Olah <olah@cs.utwente.nl>


# 62397647 05-Jan-1995 Stefan Eßer <se@FreeBSD.org>

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


# b4a8d575 08-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Added prototypes here and there. Moved pfctlinput into socket.h.


# f86eaaca 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Prototypes, prototypes and even more prototypes. Not quite done yet, but
getting closer all the time.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources