History log of /freebsd-current/sys/netinet6/raw_ip6.c
Revision Date Author Comments
# 60d8dbbe 18-Jan-2024 Kristof Provost <kp@FreeBSD.org>

netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters

When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.

Add a static probe point for them so that we can use dtrace to get
futher information (e.g. a stack trace).

For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'

This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT

Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504


# ce69e373 03-Feb-2024 Gleb Smirnoff <glebius@FreeBSD.org>

Revert "sockets: retire sorflush()"

Provide a comment in sorflush() why the socket I/O sx(9) lock is actually
important.

This reverts commit 507f87a799cf0811ce30f0ae7f10ba19b2fd3db3.


# 507f87a7 16-Jan-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: retire sorflush()

With removal of dom_dispose method the function boils down to two
meaningful function calls: socantrcvmore() and sbrelease(). The latter is
only relevant for protocols that use generic socket buffers.

The socket I/O sx(9) lock acquisition in sorflush() is not relevant for
shutdown(2) operation as it doesn't do any I/O that may interleave with
read(2) or write(2). The socket buffer mutex acquisition inside
sbrelease() is what guarantees thread safety. This sx(9) acquisition in
soshutdown() can be tracked down to 4.4BSD times, where it used to be
sblock(), and it was carried over through the years evolving together with
sockets with no reconsideration of why do we carry it over. I can't tell
if that sblock() made sense back then, but it doesn't make any today.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43415


# 5bba2728 16-Jan-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: make pr_shutdown fully protocol specific method

Disassemble a one-for-all soshutdown() into protocol specific methods.
This creates a small amount of copy & paste, but makes code a lot more
self documented, as protocol specific method would execute only the code
that is relevant to that protocol and nothing else. This also fixes a
couple recent regressions and reduces risk of future regressions. The
extended KPI for the new pr_shutdown removes need for the extra pr_flush
which was added for the sake of SCTP which could not perform its shutdown
properly with the old one. Particularly for SCTP this change streamlines
a lot of code.

Some notes on why certain parts of code were copied or were not to certain
protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is
needed only for those protocols that may be connected or disconnected.
* The above reduces into only SS_ISCONNECTED for those protocols that
always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram
protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied almost to every protocol, but that
will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant
connect(2), can SO_LINGER or can accept(2).

There are three protocols (netgraph(4), Bluetooth, SDP) that did not have
pr_shutdown, but old soshutdown() would still perform sorflush() on
SHUT_RD for them and also wakeup(9). Those protocols partially supported
shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost
shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay
about that and SDP is almost abandoned anyway.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D43413


# a13039e2 27-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for
in_pcbdetach() was no longer correct. Then, make sure we remove
the inpcb from the hash before we commit any destructive actions
on it. There are couple functions that rely on the hash lock
skipping SMR + inpcb lock to lookup an inpcb. Although there are
no known functions that similarly rely on the global inpcb list
lock, also do list removal before destructive actions.

PR: 273890
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43122


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# fcb3f813 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: remove PRC_ constants and streamline ICMP processing

In the original design of the network stack from the protocol control
input method pr_ctlinput was used notify the protocols about two very
different kinds of events: internal system events and receival of an
ICMP messages from outside. These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input(). The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header. This
allows protocols to not deduct it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
It has all the information needed to the protocols. In the structure,
change ip6c_finaldst fields to sockaddr_in6. The reason is that
icmp6_input() already has this address wrapped in sockaddr, and the
protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
change the prototypes to accept a transparent union of either ICMP
header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
count bad packets. The translation of ICMP codes to errors/actions is
done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommend,
or do our own logic based on the ICMP header.

Differential revision: https://reviews.freebsd.org/D36731


# 43d39ca7 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: de-void control input IP protocol methods

After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for
IPv4 we got vector ip_ctlprotox[] that is executed only and only from
icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed
only and only from icmp6_input(). This allows to use protocol specific
argument types in these methods instead of struct sockaddr and void.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36727


# 46ddeb6b 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet6: retire ip6protosw.h

The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with
the former removed in f0ffb944d25 in 2001 and the latter survived until
today. It has been reduced down to only one useful declaration that
moves to ip6_var.h

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36726


# 74ed2e8a 02-Sep-2022 Gleb Smirnoff <glebius@FreeBSD.org>

raw ip: fix regression with multicast and RSVP

With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol
of a specific pcb is stored in inp_ip_p.

Reviewed by: karels
Reported by: karels
Differential revision: https://reviews.freebsd.org/D36429
Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4


# 61f7427f 30-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: cleanup protocols that existed merely to provide pr_input

Since 4.4BSD the protosw was used to implement socket types created
by socket(2) syscall and at the same to demultiplex incoming IPv4
datagrams (later copied to IPv6). This story ended with 78b1fc05b20.

These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch
packets in ip_input(), they would also be returned by pffindproto()
if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw
sockets to work correctly, all the entries were pointing at raw_usrreq
differentiating only in the value of pr_protocol.

With 78b1fc05b20 all these entries are no longer needed, as ip_protox
is independent of protosw. Any socket syscall requesting SOCK_RAW type
would end up with rip_protosw. And this protosw has its pr_protocol
set to 0, allowing to mark socket with any protocol.

For IPv6 raw socket the change required two small fixes:
o Validate user provided protocol value
o Always use protocol number stored in inp in rip6_attach, instead
of protosw value, which is now always 0.

Differential revision: https://reviews.freebsd.org/D36380


# e7d02be1 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: refactor protosw and domain static declaration and load

o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d18ef), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232


# e0b40500 11-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

raw ip6: merge rip6_output() into rip6_send()

While here remove some code that was compat legacy back in 2005, added
in a1f7e5f8ee7fe.

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36128


# c7a62c92 10-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: gather v4/v6 handling code into in_pcballoc() from protocols

Reviewed by: rrs, tuexen
Differential revision: https://reviews.freebsd.org/D36062


# a14465e1 14-Jun-2022 Mark Johnston <markj@FreeBSD.org>

rip6: Fix a lock order reversal in rip6_bind()

See also commit 71a1539e3783.

Reported by: syzbot+9b461b6a07a83cc10daa@syzkaller.appspotmail.com
Reported by: syzbot+b6ce0aec16f5fdab3282@syzkaller.appspotmail.com
Reviewed by: glebius
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35472


# a98bb75f 13-Apr-2022 John Baldwin <jhb@FreeBSD.org>

netinet6: Use __diagused for variables only used in KASSERT().


# 71a1539e 16-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

inet6: fix a LOR between rip and rawinp

Running sys/netpfil/pf/fragmentation v6 results in:

lock order reversal:
1st 0xfffffe00050429a8 rip (rip, sleep mutex) @ /usr/src/sys/netinet6/raw_ip6.c:803
2nd 0xfffff8009491e1d0 rawinp (rawinp, rw) @ /usr/src/sys/netinet6/raw_ip6.c:804
lock order rawinp -> rip established at:
0xffffffff8068e26a at witness_lock_order_add+0x28a
0xffffffff8068d087 at witness_checkorder+0x627
0xffffffff805a9f05 at __mtx_lock_flags+0x205
0xffffffff808102e4 at in_pcballoc+0x204
0xffffffff808d53c6 at rip6_attach+0x116
0xffffffff806dc4e8 at socreate+0x368
0xffffffff806eaedc at kern_socket+0xfc
0xffffffff806eadcd at sys_socket+0x2d
0xffffffff80abc774 at syscallenter+0x5c4
0xffffffff80abbeeb at amd64_syscall+0x1b
0xffffffff80a8044b at fast_syscall_common+0xf8
lock order rip -> rawinp attempted at:
0xffffffff8068dc2a at witness_checkorder+0x11ca
0xffffffff805d1b7f at _rw_wlock_cookie+0x18f
0xffffffff808d596c at rip6_connect+0x19c
0xffffffff806e0842 at soconnectat+0x142
0xffffffff806ebe36 at kern_connectat+0x136
0xffffffff806ebcdf at sys_connect+0x4f
0xffffffff80abc774 at syscallenter+0x5c4
0xffffffff80abbeeb at amd64_syscall+0x1b
0xffffffff80a8044b at fast_syscall_common+0xf8

Reviewed by: glebius
Fixes: de2d47842e880281 ("SMR protection for inpcbs")
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33508


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# de2d4784 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

SMR protection for inpcbs

With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc). However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree(). However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it. Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
Once the desired inpcb is found we need to transition from SMR
section to r/w lock on the inpcb itself, with a check that inpcb
isn't yet freed. This requires some compexity, since SMR section
itself is a critical(9) section. The complexity is hidden from
KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
notification) also a new KPI is provided, that hides internals of
the database - inp_next(struct inp_iterator *).

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33022


# 7045b160 28-Jul-2021 Roy Marples <roy@marples.name>

socket: Implement SO_RERROR

SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652


# 2290dfb4 19-May-2021 Ryan Stone <rstone@FreeBSD.org>

Enter the net epoch before calling ip6_setpktopts

ip6_setpktopts() can look up ifnets via ifnet_by_index(), which
is only safe in the net epoch. Ensure that callers are in the net
epoch before calling this function.

Sponsored by: Dell EMC Isilon
MFC after: 4 weeks
Reviewed by: donner, kp
Differential Revision: https://reviews.freebsd.org/D30630


# d8acd268 12-May-2021 Mark Johnston <markj@FreeBSD.org>

Fix mbuf leaks in various pru_send implementations

The various protocol implementations are not very consistent about
freeing mbufs in error paths. In general, all protocols must free both
"m" and "control" upon an error, except if PRUS_NOTREADY is specified
(this is only implemented by TCP and unix(4) and requires further work
not handled in this diff), in which case "control" still must be freed.

This diff plugs various leaks in the pru_send implementations.

Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30151


# f161d294 02-May-2021 Mark Johnston <markj@FreeBSD.org>

Add missing sockaddr length and family validation to various protocols

Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.

Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519


# 605284b8 13-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Enforce net epoch in in6_selectsrc().

in6_selectsrc() may call fib6_lookup() in some cases, which requires
epoch. Wrap in6_selectsrc* calls into epoch inside its users.
Mark it as requiring epoch by adding NET_EPOCH_ASSERT().

MFC after: 1 weeek
Differential Revision: https://reviews.freebsd.org/D28647


# 924d1c9a 08-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.


# 4a01b854 07-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

SO_RERROR indicates that receive buffer overflows should be handled as errors.
Historically receive buffer overflows have been ignored and programs
could not tell if they missed messages or messages had been truncated
because of overflows. Since programs historically do not expect to get
receive overflow errors, this behavior is not the default.

This is really really important for programs that use route(4) to keep in sync
with the system. If we loose a message then we need to reload the full system
state, otherwise the behaviour from that point is undefined and can lead
to chasing bogus bug reports.


# 0c325f53 18-Oct-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Implement flowid calculation for outbound connections to balance
connections over multiple paths.

Multipath routing relies on mbuf flowid data for both transit
and outbound traffic. Current code fills mbuf flowid from inp_flowid
for connection-oriented sockets. However, inp_flowid is currently
not calculated for outbound connections.

This change creates simple hashing functions and starts calculating hashes
for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
system.

Reviewed by: glebius (previous version),ae
Differential Revision: https://reviews.freebsd.org/D26523


# b9555453 21-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Make ip6_output() and ip_output() require network epoch.

All callers that before may called into these functions
without network epoch now must enter it.


# f42347c3 07-Nov-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Remove unnecessary recursive epoch enter via INP_INFO_RLOCK
macro in raw input functions for IPv4 and IPv6. They shall
always run in the network epoch.


# 3af7f97c 08-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Revert changes to rip6_bind() from r353292. This function is always
called in syscall context, so it must enter epoch itself. This
changeset originates from early version of the patch, and somehow
slipped to the final version.

Reported by: pho


# b8a6e03f 07-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111


# 0ecd976e 02-Aug-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

IPv6 cleanup: kernel

Finish what was started a few years ago and harmonize IPv6 and IPv4
kernel names. We are down to very few places now that it is feasible
to do the change for everything remaining with causing too much disturbance.

Remove "aliases" for IPv6 names which confusingly could indicate
that we are talking about a different data structure or field or
have two fields, one for each address family.
Try to follow common conventions used in FreeBSD.

* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
- in6pcb which really is an inpcb and nothing separate
- sotoin6pcb which is sotoinpcb (as per above)
- in6p_sp which is inp_sp
- in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as
that is what we call it in most of the network stack including
parts of netinet6.

The reasons behind this cleanup are that we try to further
unify netinet and netinet6 code where possible and that people
will less ignore one or the other protocol family when doing
code changes as they may not have spotted places due to different
names for the same thing.

No functional changes.

Discussed with: tuexen (SCTP changes)
MFC after: 3 months
Sponsored by: Netflix


# fb288770 19-Apr-2019 Michael Tuexen <tuexen@FreeBSD.org>

When an IPv6 packet is received for a raw socket which has the
IPPROTO_IPV6 level socket option IPV6_CHECKSUM enabled and the
checksum check fails, drop the message. Without this fix, an
ICMP6 message was sent indicating a parameter problem.

Thanks to bz@ for suggesting a way to simplify this fix.

Reviewed by: bz@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19969


# 70a0f3dc 19-Apr-2019 Michael Tuexen <tuexen@FreeBSD.org>

When a checksum has to be computed for a received IPv6 packet because it
is requested by the application using the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM on a raw socket, ensure that the packet contains enough
bytes to contain the checksum at the specified offset.

Reported by: syzbot+6295fcc5a8aced81d599@syzkaller.appspotmail.com
Reviewed by: bz@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19968


# ae7c65b1 19-Apr-2019 Michael Tuexen <tuexen@FreeBSD.org>

Avoid a buffer overwrite in rip6_output() when computing the checksum
as requested by the user via the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM. The check if there are enough bytes in the packet to
store the checksum at the requested offset was wrong by 1.

Reviewed by: bz@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D19967


# a68cc388 08-Jan-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin


# 1ff6e7a8 24-Oct-2018 Bjoern A. Zeeb <bz@FreeBSD.org>

rip6_input() inp validation after epoch(9)

After r335924 rip6_input() needs inp validation to avoid
working on FREED inps.

Apply the relevant bits from r335497,r335501 (rip_input() change)
to the IPv6 counterpart.

PR: 232194
Reviewed by: rgrimes, ae (,hps)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17594


# 6573d758 03-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

epoch(9): allow preemptible epochs to compose

- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066


# b872626d 12-Jun-2018 Matt Macy <mmacy@FreeBSD.org>

mechanical CK macro conversion of inpcbinfo lists

This is a dependency for converting the inpcbinfo hash and info rlocks
to epoch.


# 4f6c66cc 23-May-2018 Matt Macy <mmacy@FreeBSD.org>

UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409


# 68e0e5a6 05-Feb-2018 Andrey V. Elsukov <ae@FreeBSD.org>

Modify ip6_get_prevhdr() to be able use it safely.

Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.

In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.

Reported by: Maxime Villard <max at m00nbsd dot net>
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14158


# 151ba793 24-Dec-2017 Alexander Kabaev <kan@FreeBSD.org>

Do pass removing some write-only variables from the kernel.

This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 3e85b721 16-May-2017 Ed Maste <emaste@FreeBSD.org>

Remove register keyword from sys/ and ANSIfy prototypes

A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.

ANSIfy related prototypes while here.

Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193


# dce33a45 05-Mar-2017 Ermal Luçi <eri@FreeBSD.org>

The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Reviewed by: adrian, aw
Approved by: ae (mentor)
Sponsored by: rsync.net
Differential Revision: D9235


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# c10c5b1e 11-Feb-2017 Ermal Luçi <eri@FreeBSD.org>

Committed without approval from mentor.

Reported by: gnn


# ed55edce 09-Feb-2017 Ermal Luçi <eri@FreeBSD.org>

The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.

The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.

This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.

Sponsored-by: rsync.net
Differential Revision: D9235
Reviewed-by: adrian


# fcf59617 06-Feb-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Merge projects/ipsec into head/.

Small summary
-------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
build as part of ipsec.ko module (or with IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. The only one header file <netipsec/ipsec_support.h>
should be included to declare all the needed things to work
with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
- now all security associations stored in the single SPI namespace,
and all SAs MUST have unique SPI.
- several hash tables added to speed up lookups in SADB.
- SADB now uses rmlock to protect access, and concurrent threads
can do SA lookups in the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
avoid locking protection for ipsecrequest. Now we support
only limited number (4) of bundled SAs, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
check for full history of applied IPsec transforms.
o References counting rules for security policies and security
associations were changed. The proper SA locking added into xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by: gnn, wblock
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9352


# c3bef61e 15-Sep-2016 Kevin Lo <kevlo@FreeBSD.org>

Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.

Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D7878


# 155d72c4 15-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net* : for pointers replace 0 with NULL.

Mostly cosmetical, no functional change.

Found with devel/coccinelle.


# 601c0b8b 10-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().

in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
socket-less (ND code). The main reason for that change is inability to
specify non-default FIB for callers w/o socket since (internally) inpcb
is used to determine fib.

As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
pass IPv6 address w/ explicitly specified scope as separate argument.

Reviewed by: ae (previous version)


# 357ce739 03-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove 'struct route_int6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
output interface used for scope checks). No data-path code uses
route argument for caching. The only users are icmp6 (reflect code),
ND6 ns/na generation code. All this fucntions are control-plane, so
there is no reason to try to 'optimize' something by passing cached
route into to ip6_output(). Given that, simplify code by eliminating
in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
too. While here, reshape rte-related code inside in6_selectif() to
free lookup result immediately after saving all the needed fields.


# 44eb8bbe 11-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Do not count security policy violation twice.
ipsec*_in_reject() do this by their own.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 603eaf79 09-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Renove faith(4) and faithd(8) from base. It looks like industry
have chosen different (and more traditional) stateless/statuful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from: net@


# 257480b8 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert netinet6/ to use new routing API.

* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
Currently it is used by 2 differenct type of customers:
- socket-based one, which all are unsure about provided
address scope and
- in-kernel ones (ND code mostly), which don't have
any sockets, options, crededentials, etc.
So, we provide two different wrappers to in6_selectsrc()
returning select source.
* Make different versions of selectroute():
Currenly selectroute() is used in two scenarios:
- SAS, via in6_selecsrc() -> in6_selectif() -> selectroute()
- output, via in6_output -> wrapper -> selectroute()
Provide different versions for each customer:
- fib6_lookup_nh_basic()-based in6_selectif() which is
capable of returning interface only, without MTU/NHOP/L2
calculations
- full-blown fib6_selectroute() with cached route/multipath/
MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes


# 8f5a8818 07-Aug-2014 Kevin Lo <kevlo@FreeBSD.org>

Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric: D476
Reviewed by: jhb


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 6794f460 23-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by: Taku YAMAMOTO, Maciej Milewski


# a786f679 09-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.


# 6659296c 20-Jun-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics
accounting.

MFC after: 2 weeks


# f1d7ebfe 19-Jun-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use RIP6STAT_INC() macro for raw ip6 statistics accounting.

MFC after: 2 weeks


# 9cb8d207 09-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


# b7c896c9 11-Mar-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Take the inpcb rlock before calculating checksum, it was accidentally
moved in r191672.

Obtained from: Yandex LLC
MFC after: 1 week


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# ee799639 03-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Add SO_SETFIB option support on PF_INET6 sockets and allow inheriting the
FIB number from the process, as set by setfib(2), on socket creation.

Sponsored by: Cisco Systems, Inc.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 365ccde0 20-Aug-2010 Hajimu UMEMOTO <ume@FreeBSD.org>

optp may be NULL.


# 1db8d1f8 19-Aug-2010 Ana Kukec <anchie@FreeBSD.org>

MFp4: anchie_soc2009 branch:

Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space. The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by: bz (mentor)


# 388288b2 17-Aug-2010 Hajimu UMEMOTO <ume@FreeBSD.org>

Make `ping6 -I' work with net.inet6.ip6.use_defaultzone=1.

MFC after: 2 weeks


# 8c09aa57 14-Aug-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

In rip6_input(), in case of multicast, we might skip the normal processing
and go to the next iteration early if multicast filtering would decide that
this socket shall not receive the data.
Unlock the pcb in that case or we leak the read lock and next time trying
to get a write lock, would hang forever.

PR: kern/149608
Submitted by: Chris Luke (chrisy flirble.org)
MFC after: 3 days


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 7f6b24dc 02-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207277:
Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by: (ten 211.ru)
Tested by: (ten 211.ru)


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 7a657e63 27-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.

This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.

Reported by: (ten 211.ru)
Tested by: (ten 211.ru)
MFC after: 4 days


# 950cde50 28-Dec-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r200473:

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)


# de0bd6f7 13-Dec-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days


# 28d8e43c 20-Nov-2009 Bruce M Simpson <bms@FreeBSD.org>

MFC r199518:
Adapt the fix for IGMPv2 in r199287 for the IPv6 stack.
Only multicast routing is affected by the issue.


# 7ab5a5cd 19-Nov-2009 Bruce M Simpson <bms@FreeBSD.org>

Adapt the fix for IGMPv2 in r199287 for the IPv6 stack.
Only multicast routing is affected by the issue.

MFC after: 1 day


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 88d166bf 23-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.

Asked for by: rwatson
Reviewed by: rwatson


# 8c0fec80 23-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# e40bae9a 21-Jun-2009 Roman Divacky <rdivacky@FreeBSD.org>

Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by: rwatson
Approved by: ed (mentor)


# 8d8bc018 08-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# f6dfe47a 30-Apr-2009 Marko Zec <zec@FreeBSD.org>

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


# 33cde130 29-Apr-2009 Bruce M Simpson <bms@FreeBSD.org>

Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:

* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.

NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.


# e27b0c87 12-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel. This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after: 3 days


# 33553d6e 27-Feb-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


# b89e82dd 05-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


# 1cecba0f 25-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by: jamie (as part of a larger patch)
MFC after: 1 week


# 97590249 17-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Another step assimilating IPv[46] PCB code:
normalize IN6P_* compat flags usage to their equialent
INP_* counterpart.

Discussed with: rwatson
Reviewed by: rwatson
MFC after: 4 weeks


# 75bab8b8 15-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

check pointers against NULL


# aba53ef0 15-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

convert more pointer validation checks to checking against NULL


# fc384fa5 15-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().

Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.

Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 413628a7 29-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# 6aee2fc5 26-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Merge in6_pcbfree() into in_pcbfree() which after the previous
IPsec change in r185366 only differed in two additonal IPv6 lines.
Rather than splattering conditional code everywhere add the v6
check centrally at this single place.

Reviewed by: rwatson (as part of a larger changset)
MFC after: 6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.


# 97021c24 26-Nov-2008 Marko Zec <zec@FreeBSD.org>

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 0206cdb8 26-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove in6_pcbdetach() as it is exactly the same function
as in_pcbdetach() and we don't need the code twice.

Reviewed by: rwatson
MFC after: 6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# e11e3f18 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.


# 1ede983c 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 2209e8f1 30-Jul-2008 Robert Watson <rwatson@FreeBSD.org>

Adopt the slightly weaker consistency locking approach used in IPv4 raw
sockets for IPv6 raw sockets: separately lock the inpcb for determining
the destination address for a connect()'d raw socket at the rip6_send()
layer, and then re-acquire the inpcb lock in the rip6_output() layer to
query other options on the socket. Previously, the global raw IP socket
lock was used, which while correct and marginally more consistent, could
add significantly to global raw IP socket lock contention.

MFC after: 1 week


# 2f1ff0cd 29-Jul-2008 Robert Watson <rwatson@FreeBSD.org>

Since we fail IPv6 raw socket allocation if inp->in6p_icmp6filt can't
be allocated, there's no need to conditionize use and freeing of it
later.

MFC after: 1 week


# 6c5bbf5c 28-Jul-2008 Alexander Motin <mav@FreeBSD.org>

Move inpcb lock higher to protect some nonbinding fields reading.
It fixes nothing at this time, but decided to be more correct.


# b11e21ae 27-Jul-2008 Alexander Motin <mav@FreeBSD.org>

According to in_pcb.h protocol binding information has double locking.
It allows access it while list travercing holding only global pcbinfo lock.


# f2f877d3 08-Jul-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Change the parameters to in6_selectsrc():
- pass in the inp instead of both in6p_moptions and laddr.
- pass in cred for upcoming prison checks.

Reviewed by: rwatson


# 0ae76120 05-Jul-2008 Robert Watson <rwatson@FreeBSD.org>

Improve approximation of style(9) in raw socket code.


# 9ad11dd8 20-Apr-2008 Robert Watson <rwatson@FreeBSD.org>

With IPv4 raw sockets, read lock rather than write lock the inpcb when
receiving or transmitting.

With IPv6 raw sockets, read lock rather than write lock the inpcb when
receiving. Unfortunately, IPv6 source address selection appears to
require a write lock on the inpcb for the time being.

MFC after: 3 months


# 8501a69c 17-Apr-2008 Robert Watson <rwatson@FreeBSD.org>

Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock main exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after: 3 months
Tested by: kris (superset of committered patch)


# 79ba3952 24-Jan-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Replace the last susers calls in netinet6/ with privilege checks.

Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by: rwatson (older version, before addressing his comments)


# 9233d8f3 08-Jan-2008 David E. O'Brien <obrien@FreeBSD.org>

un-__P()


# b48287a3 10-Dec-2007 David E. O'Brien <obrien@FreeBSD.org>

Clean up VCS Ids.


# 2a463222 05-Jul-2007 Xin LI <delphij@FreeBSD.org>

Space cleanup

Approved by: re (rwatson)


# 1272577e 05-Jul-2007 Xin LI <delphij@FreeBSD.org>

ANSIfy[1] plus some style cleanup nearby.

Discussed with: gnn, rwatson
Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1]
Approved by: re (rwatson)


# b2630c29 02-Jul-2007 George V. Neville-Neil <gnn@FreeBSD.org>

Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing


# 2cb64cb2 01-Jul-2007 George V. Neville-Neil <gnn@FreeBSD.org>

Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by: bz
Approved by: re
Supported by: Secure Computing


# c2259ba4 13-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Include priv.h to pick up suser(9) definitions, missed in an earlier
commit.

Warnings spotted by: kris


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 54d642bb 11-May-2007 Robert Watson <rwatson@FreeBSD.org>

Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.


# 1291e2a0 24-Feb-2007 Bruce M Simpson <bms@FreeBSD.org>

Fix tinderbox. ip6_mrouter should be defined in raw_ip6.c as it is
tested to determine if the userland socket is open; this, in turn, is
used to determine if the module has been loaded.

Tested with: LINT


# 6be2e366 24-Feb-2007 Bruce M Simpson <bms@FreeBSD.org>

Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).


# a152f8a3 21-Jul-2006 Robert Watson <rwatson@FreeBSD.org>

Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket. pru_abort is now a
notification of close also, and no longer detaches. pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket. This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree(). With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by: gnn


# d915b280 18-Jul-2006 Stephan Uphoff <ups@FreeBSD.org>

Fix race conditions on enumerating pcb lists by moving the initialization
( and where appropriate the destruction) of the pcb mutex to the init/finit
functions of the pcb zones.
This allows locking of the pcb entries and race condition free comparison
of the generation count.
Rearrange locking a bit to avoid extra locking operation to update the generation
count in in_pcballoc(). (in_pcballoc now returns the pcb locked)

I am planning to convert pcb list handling from a type safe to a reference count
model soon. ( As this allows really freeing the PCBs)

Reviewed by: rwatson@, mohans@
MFC after: 1 week


# 1e0acb68 25-Jun-2006 Robert Watson <rwatson@FreeBSD.org>

Use suser_cred() instead of a direct comparison of cr_uid with 0 in
rip6_output().

MFC after: 1 week


# ff7425ce 11-Apr-2006 Robert Watson <rwatson@FreeBSD.org>

Don't use spl around call to in_pcballoc() in IPv6 raw socket support;
all necessary synchronization appears present.

MFC after: 3 months


# 14ba8add 01-Apr-2006 Robert Watson <rwatson@FreeBSD.org>

Update in_pcb-derived basic socket types following changes to
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.

- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.

- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.

MFC after: 3 months


# bc725eaf 01-Apr-2006 Robert Watson <rwatson@FreeBSD.org>

Chance protocol switch method pru_detach() so that it returns void
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.

MFC after: 3 months


# ac45e92f 01-Apr-2006 Robert Watson <rwatson@FreeBSD.org>

Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.

MFC after: 3 months


# 4350fcab 18-Oct-2005 SUZUKI Shinsuke <suz@FreeBSD.org>

Raw IPv6 checksum must use the protocol number of the last header, instead of the first next-header value.

Obtained from: KAME
MFC after: 1 day


# 971b154c 15-Oct-2005 SUZUKI Shinsuke <suz@FreeBSD.org>

added a missing unlock

Submitted by: JINMEI Tatuya
MFC After: 1 day


# 2af9b919 18-Aug-2005 SUZUKI Shinsuke <suz@FreeBSD.org>

added a missing unlock (just do the same thing as in netinet/raw_ip.c)

Obtained from: KAME
MFC after: 3 days


# a1f7e5f8 24-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME


# 885adbfa 21-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

always copy ip6_pktopt. remove needcopy and needfree
argument/structure member accordingly.

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


# e07db7aa 21-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

simplified udp6_output() and rip6_output(): do not override
in6p_outputopts at the entrance of the functions. this trick was
necessary when we passed an in6 pcb to in6_embedscope(), within which
the in6p_outputopts member was used, but we do not use this kind of
interface any more.

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


# d5e3406d 21-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

be consistent on naming advanced API functions; use ip6_XXXpktopt(s).

Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME


# 8a9d54df 28-Mar-2005 Sam Leffler <sam@FreeBSD.org>

check for malloc failure (also move malloc up to simplify error recovery)

Noticed by: Coverity Prevent analysis tool
Reviewed by: gnn


# 87609341 06-Feb-2005 Robert Watson <rwatson@FreeBSD.org>

Remove a comment from the raw IPv6 output function regarding
M_TRYWAIT allocations: M_PREPEND() now uses M_DONTWAIT.

MFC after: 3 days


# caf43b02 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes, separate for KAME


# 756d52a1 08-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.


# 0b7851fa 02-Sep-2004 Robert Watson <rwatson@FreeBSD.org>

Unlock rather than lock the ripcbinfo lock at the end of rip6_input().

RELENG_5 candidate.

Foot provided by: Patrick Guelat <pg at imp dot ch>


# 8a0c4da8 12-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

When allocating the IPv6 header to stick in front of raw packet being
sent via a raw IPv6 socket, use M_DONTWAIT not M_TRYWAIT, as we're
holding the raw pcb mutex.

Reported, tested by: kuriyama


# f31f65a7 05-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events. This prevents inpcb
assertions from firing when IPv6 generates and delievers event
notifications for inpcbs.

Reported by: kuriyama
Tested by: kuriyama


# 07385abd 27-Jul-2004 Robert Watson <rwatson@FreeBSD.org>

Commit a first pass at in6pcb and pcbinfo locking for IPv6,
synchronizing IPv6 protocol control blocks and lists. These changes
are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang,
and committed by Jeffrey Hsu. With these locking changes, IPv6 use of
inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking
assertions to be run in the presence of IPv6 compiled into the kernel.


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 6823b823 27-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused argument.

Reviewed by: ume


# da0f4099 17-Feb-2004 Hajimu UMEMOTO <ume@FreeBSD.org>

IPSEC and FAST_IPSEC have the same internal API now;
so merge these (IPSEC has an extra ipsecstat)

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


# efddf5c6 13-Feb-2004 Hajimu UMEMOTO <ume@FreeBSD.org>

supported IPV6_RECVPATHMTU socket option.

Obtained from: KAME


# f073c60f 03-Feb-2004 Hajimu UMEMOTO <ume@FreeBSD.org>

pass pcb rather than so. it is expected that per socket policy
works again.


# 5bd311a5 25-Nov-2003 Sam Leffler <sam@FreeBSD.org>

Split the "inp" mutex class into separate classes for each of divert,
raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness
complaints.

Reviewed by: rwatson
Approved by: re (rwatson)


# fc906dd8 20-Nov-2003 Andre Oppermann <andre@FreeBSD.org>

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


# a557af22 17-Nov-2003 Robert Watson <rwatson@FreeBSD.org>

Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 0f9ade71 04-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.

Tested by: nork
Obtained from: KAME


# af12e09e 02-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

rename variables.

Obtained from: KAME


# 11de19f4 28-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

ip6_savecontrol() argument is redundant


# 02b9a206 26-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

re-add wrongly disappered IPV6_CHECKSUM stuff by introducing
ip6_raw_ctloutput().

Obtained from: KAME


# f95d4633 24-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542
(aka RFC2292bis). Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.

Obtained from: KAME


# 07eb2995 09-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- typo in comment
- style
- ANSIfy
(there is no functional change.)

Obtained from: KAME


# 40e39bbb 06-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

return(code) -> return (code)
(reduce diffs against KAME)


# 340c35de 19-Feb-2003 Jonathan Lemon <jlemon@FreeBSD.org>

Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block. Allow the socket and tcpcb structures to be freed
earlier than inpcb. Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# b9234faf 15-Oct-2002 Sam Leffler <sam@FreeBSD.org>

Tie new "Fast IPsec" code into the build. This involves the usual
configuration stuff as well as conditional code in the IPv4 and IPv6
areas. Everything is conditional on FAST_IPSEC which is mutually
exclusive with IPSEC (KAME IPsec implmentation).

As noted previously, don't use FAST_IPSEC with INET6 at the moment.

Reviewed by: KAME, rwatson
Approved by: silence
Supported by: Vernier Networks


# 5d846453 15-Oct-2002 Sam Leffler <sam@FreeBSD.org>

Replace aux mbufs with packet tags:

o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month


# 3ce144ea 14-Jun-2002 Jeffrey Hsu <hsu@FreeBSD.org>

Notify functions can destroy the pcb, so they have to return an
indication of whether this happenned so the calling function
knows whether or not to unlock the pcb.

Submitted by: Jennifer Yang (yangjihui@yahoo.com)
Bug reported by: Sid Carter (sidcarter@symonds.net)


# 4cc20ab1 31-May-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Back out my lats commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


# 243917fe 19-May-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


# 960ed29c 29-Apr-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.

Requested by: bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# c4c0c592 26-Feb-2002 Alfred Perlstein <alfred@FreeBSD.org>

Fix warnings caused by discarding const.

Hairy Eyeball At: peter


# 9494d596 25-Sep-2001 Brooks Davis <brooks@FreeBSD.org>

Make faith loadable, unloadable, and clonable.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# f9132ceb 05-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


# 13cf67f3 26-Jul-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

move ipsec security policy allocation into in_pcballoc, before
making pcbs available to the outside world. otherwise, we will see
inpcb without ipsec security policy attached (-> panic() in ipsec.c).

Obtained from: KAME
MFC after: 3 days


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 2a0c503e 21-Dec-2000 Bosko Milekic <bmilekic@FreeBSD.org>

* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.


# 46aa3347 27-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


# 686cdd19 04-Jul-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


# e7e3ecb6 27-Feb-2000 Yoshinobu Inoue <shin@FreeBSD.org>

At detaching IPv6 raw socket, also finish IPv6 multicast router.

Approved by: jkh

Submitted by: fenner


# 91ec0a1e 27-Jan-2000 Yoshinobu Inoue <shin@FreeBSD.org>

Sorry I didn't commit these files at the commit just a few minutes before.
(IPv6 multicast routing)
I think I mistakenly touched TAB and the last arg sys/netinet6 to
the cvs commit changed to sys/netinet6/in6_proto.c.


# 6a800098 22-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# cfa1ca9d 07-Dec-1999 Yoshinobu Inoue <shin@FreeBSD.org>

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# e1da8747 22-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

Removed IPSEC and IPV6FIREWALL because they are not ready yet.


# 82cd038d 21-Nov-1999 Yoshinobu Inoue <shin@FreeBSD.org>

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project