History log of /freebsd-current/sys/netinet6/in6_src.c
Revision Date Author Comments
# 60d8dbbe 18-Jan-2024 Kristof Provost <kp@FreeBSD.org>

netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters

When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.

Add a static probe point for them so that we can use dtrace to get
further information (e.g. a stack trace).

For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'

This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT

Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree using automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# 2756774c 04-Jul-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

netinet6: simplify selectroute()

Effectively selectroute() addresses two different cases:
providing interface info for multicast destinations and providing
nexthop data for unicast ones. Current implementation intertwines
handling of both cases, especially in the error handling part.
Factor out all route lookup logic in a separate function,
lookup_route() to simplify the code.
Ensure a consistent KPI: no error means *retifp is set, and vice versa.

Differential Revision: https://reviews.freebsd.org/D35711
MFC after: 2 weeks


# 81a235ec 04-Jul-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

netinet6: factor out cached route lookups from selectroute().

Currently selectroute() contains two nearly-identical versions of
the route lookup logic - one for original destination and another
for the case when IPV6_NEXTHOP option was set on the socket.

Factor out handling of these route lookups into a separate function to
improve readability.
This change also fixes handling of link-local IPV6_NEXTHOPs.

Differential Revision: https://reviews.freebsd.org/D35710
MFC after: 2 weeks


# 147f018a 18-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Move in6_pcbsetport() to in6_pcb.c

This function was originally carved out of in6_pcbbind(), which
is in in6_pcb.c. This function also uses KPI private to the PCB
database - in_pcb_lport().


# 8268d82c 15-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove per-packet ifa refcounting from IPv6 fast path.

Currently ip6_input() calls in6ifa_ifwithaddr() for
every local packet, in order to check if the target ip
belongs to the local ifa in proper state and increase
its counters.

in6ifa_ifwithaddr() references found ifa.
With epoch changes, both `ip6_input()` and all other current callers
of `in6ifa_ifwithaddr()` do not need this reference
anymore, as epoch provides stability guarantee.

Given that, update `in6ifa_ifwithaddr()` to allow
it to return ifa without referencing it, while preserving
option for getting referenced ifa if so desired.

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28648


# 605284b8 13-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Enforce net epoch in in6_selectsrc().

in6_selectsrc() may call fib6_lookup() in some cases, which requires
epoch. Wrap in6_selectsrc* calls into epoch inside its users.
Mark it as requiring epoch by adding NET_EPOCH_ASSERT().

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28647


# 1440f622 20-Sep-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove unused nhop_ref_any() function.
Remove "opt_mpath.h" header where not needed.

No functional changes.


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 6ad7446c 02-Jul-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().

fib[46]_lookup_nh_ represents the pre-epoch generation of the fib API, providing fewer guarantees
over pointer validity and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445


# 49c9f84f 26-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix IPv6 link-local operations with RADIX_MPATH.

It was broken by r360292 as fib6_lookup() assumes de-embedded addresses
while rtalloc_mpath_fib() requires sockaddr with embedded ones.

New fib6_lookup() transparently supports multipath, hence
remove old RADIX_MPATH condition.


# 983066f0 25-Apr-2020 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert route caching to nexthop caching.

This change is built on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
to perform packet forwarding such as gateway interface and mtu. Nexthops
are shared among the routes, providing more pre-computed cache-efficient
data while requiring less memory. Splitting the LPM code and the attached
data solves multiple long-standing problems in the routing layer,
drastically reduces the coupling with other parts of the stack and allows
faster lookup algorithms to be introduced transparently.

Route caching was (re)introduced to minimise (slow) routing lookups, allowing
for notably better performance for large TCP senders. Caching works by
acquiring rtentry reference, which is protected by per-rtentry mutex.
If the routing table is changed (checked by comparing the rtable generation id)
or link goes down, cache record gets withdrawn.

Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nexthop as a
cached object, which is mostly mechanical. Other moving parts like cache
cleanup on rtable change remain the same.

Differential Revision: https://reviews.freebsd.org/D24340
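
The caching scheme described in 983066f0 above (keep a referenced object, drop it when the routing table generation changes) can be illustrated with a small userland sketch. Everything here is hypothetical: the demo_* types and functions are stand-ins invented for illustration and are not the kernel's nexthop or route-cache structures, and a plain counter stands in for refcount(9).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel's nexthop or rtable types. */
struct demo_nhop {
	unsigned int refcnt;	/* plain counter standing in for refcount(9) */
	uint32_t     mtu;	/* example of precomputed forwarding data */
};

struct demo_rtable {
	uint32_t          gen;	/* bumped on every routing table change */
	struct demo_nhop *best;	/* what the (slow) lookup would return */
};

struct demo_route_cache {
	struct demo_nhop *nh;	/* cached nexthop, or NULL */
	uint32_t          gen;	/* table generation when nh was cached */
};

/* Drop the cached nexthop reference, if any. */
static void
demo_cache_invalidate(struct demo_route_cache *rc)
{
	if (rc->nh != NULL) {
		rc->nh->refcnt--;
		rc->nh = NULL;
	}
}

/*
 * Return the cached nexthop if the table generation still matches;
 * otherwise withdraw the cache record and repeat the lookup.
 * *hit reports whether the cache was used.
 */
static struct demo_nhop *
demo_cache_lookup(struct demo_route_cache *rc, struct demo_rtable *rt,
    bool *hit)
{
	if (rc->nh != NULL && rc->gen == rt->gen) {
		*hit = true;
		return (rc->nh);
	}
	*hit = false;
	demo_cache_invalidate(rc);
	rc->nh = rt->best;	/* stands in for the slow routing lookup */
	if (rc->nh != NULL)
		rc->nh->refcnt++;
	rc->gen = rt->gen;
	return (rc->nh);
}
```

The generation comparison is the whole invalidation mechanism in this sketch: nothing walks the caches when the table changes, each cache simply notices the mismatch on its next use.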


# 8483fce6 03-Mar-2020 Bjoern A. Zeeb <bz@FreeBSD.org>

ip6: retire in6_selectroute_fib() as promised 8 years ago

In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late, retire this function again and add the fib to in6_selectroute(),
which also only has a single consumer now and was an orphan function before.


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 0ecd976e 02-Aug-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

IPv6 cleanup: kernel

Finish what was started a few years ago and harmonize IPv6 and IPv4
kernel names. We are down to so few places now that it is feasible
to do the change for everything remaining without causing too much disturbance.

Remove "aliases" for IPv6 names which confusingly could indicate
that we are talking about a different data structure or field or
have two fields, one for each address family.
Try to follow common conventions used in FreeBSD.

* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
- in6pcb which really is an inpcb and nothing separate
- sotoin6pcb which is sotoinpcb (as per above)
- in6p_sp which is inp_sp
- in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as
that is what we call it in most of the network stack including
parts of netinet6.

The reasons behind this cleanup are that we try to further
unify netinet and netinet6 code where possible and that people
will be less likely to overlook one or the other protocol family when
making code changes, as they may not have spotted places due to different
names for the same thing.

No functional changes.

Discussed with: tuexen (SCTP changes)
MFC after: 3 months
Sponsored by: Netflix


# b1536a81 24-May-2019 Andrey V. Elsukov <ae@FreeBSD.org>

Restore IPV6_NEXTHOP option support that seems to have been partially broken
since r286195.

Do not forget results of route lookup and initialize rt and ifp pointers.

PR: 238098
Submitted by: Masse Nicolas <nicolas.masse at stormshield eu>
MFC after: 1 week


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 1a43cff9 06-Jun-2018 Sean Bruno <sbruno@FreeBSD.org>

Load balance sockets with new SO_REUSEPORT_LB option.

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Obtained from: DragonflyBSD
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1, before the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
4.98    0.00   4.42   0.00  4235592  33       83.80  4720653   2149771  1235  247.32
4.73    0.00   4.20   0.00  4025260  33       82.99  4724900   2139833  1204  247.32
4.72    0.00   4.20   0.00  4035252  33       82.14  4719162   2132023  1264  247.32
4.71    0.00   4.21   0.00  4073206  33       83.68  4744973   2123317  1347  247.32
4.72    0.00   4.21   0.00  4061118  33       80.82  4713615   2188091  1490  247.32
4.72    0.00   4.21   0.00  4051675  33       85.29  4727399   2109011  1205  247.32
4.73    0.00   4.21   0.00  4039056  33       84.65  4724735   2102603  1053  247.32

After the patch:

InMpps  OMpps  InGbs  OGbs  err      TCP Est  %CPU   syscalls  csw      irq   GBfree
5.43    0.00   4.20   0.00  3313143  33       84.96  5434214   1900162  2656  245.51
5.43    0.00   4.20   0.00  3308527  33       85.24  5439695   1809382  2521  245.51
5.42    0.00   4.19   0.00  3316778  33       87.54  5416028   1805835  2256  245.51
5.42    0.00   4.19   0.00  3317673  33       90.44  5426044   1763056  2332  245.51
5.42    0.00   4.19   0.00  3314839  33       88.11  5435732   1792218  2499  245.52
5.44    0.00   4.19   0.00  3293228  33       91.84  5426301   1668597  2121  245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# 7875017c 24-Apr-2018 Sean Bruno <sbruno@FreeBSD.org>

Revert r332894 at the request of the submitter.

Submitted by: Johannes Lundberg <johalun0_gmail.com>
Sponsored by: Limelight Networks


# 7b7796ee 23-Apr-2018 Sean Bruno <sbruno@FreeBSD.org>

Load balance sockets with new SO_REUSEPORT_LB option

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As in DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003


# 151ba793 24-Dec-2017 Alexander Kabaev <kan@FreeBSD.org>

Do a pass removing some write-only variables from the kernel.

This reduces noise when the kernel is compiled by newer GCC versions,
such as the one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well-known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 559b4296 17-Mar-2017 Alan Somers <asomers@FreeBSD.org>

Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0

sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.

sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.

sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.

sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point to point interfaces.

sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.

sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.

sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.

tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.

PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 4616026f 09-Feb-2017 Ermal Luçi <eri@FreeBSD.org>

Revert r313527

Heh svn is not git


# c0fadfdb 09-Feb-2017 Ermal Luçi <eri@FreeBSD.org>

Correct missed variable name.

Reported-by: ohartmann@walstatt.org


# 0b519fe4 08-Apr-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Fix regression introduced in r296986.

Currently we don't keep the zoneid in the in6_ifaddr structure, because there
is still some code that doesn't properly initialize sin6_scope_id,
but some functions use sa_equal() for address comparison. sa_equal()
compares full sockaddr_in6 structures, and such a comparison will fail.
For now use a zero zoneid in in6ifa_ifwithaddr(). It is safe, because
the address used is in embedded form. In the future we will use the zoneid, so mark it
with an XXX comment.

Reported by: kp
Tested by: kp


# 3e16fab3 17-Mar-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Reduce the number of local variables. Remove the redundant check that the inp
pointer isn't NULL; this is safe because we are handling the IPV6_PKTINFO
socket option in this block of code. Also, use in6ifa_ifwithaddr() instead
of ifa_ifwithaddr().


# 04894769 17-Mar-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Change in6_selectsrc() to allow usage of non-local IPv6 addresses in
IPV6_PKTINFO ancillary data when IPV6_BINDANY socket option is set.

Submitted by: n_hibma
MFC after: 2 weeks


# 601c0b8b 10-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().

in6_selectsrc() has 2 classes of users: socket-based ones (raw/udp/pcb/etc) and
socket-less ones (ND code). The main reason for this change is the inability to
specify a non-default FIB for callers w/o a socket, since (internally) the inpcb
is used to determine the fib.

As a result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
pass IPv6 address w/ explicitly specified scope as separate argument.

Reviewed by: ae (previous version)


# 357ce739 03-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove 'struct route_in6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
output interface used for scope checks). No data-path code uses
route argument for caching. The only users are icmp6 (reflect code) and
ND6 ns/na generation code. All these functions are control-plane, so
there is no reason to try to 'optimize' something by passing a cached
route into ip6_output(). Given that, simplify code by eliminating
in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
too. While here, reshape rte-related code inside in6_selectif() to
free lookup result immediately after saving all the needed fields.


# 9977be4a 09-Dec-2015 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix the lookup key in fib6_lookup_nh_basic(), which was lost during merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has its scope embedded, do the same for non-gatewayed
destinations.

Sponsored by: Yandex LLC


# 51a01baf 01-Aug-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Properly handle IPV6_NEXTHOP socket option in selectroute().

o remove disabled code;
o if nexthop address is link-local, use embedded scope zone id to
determine outgoing interface;
o properly fill ro_dst before doing route lookup;
o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by: Yandex LLC


# a6f7dea1 02-Aug-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Remove redundant check.


# cc0a3c8c 29-Jul-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149


# f9723c77 20-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp
we need to return.
Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags
fields for all relevant functions.


# 45d1880a 10-Nov-2014 Andrey V. Elsukov <ae@FreeBSD.org>

For now handle only multicast addresses, as we still use routes for
LLA unicasts.

Sponsored by: Yandex LLC


# ea455de9 09-Nov-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Use embedded scope zone id to determine outgoing interface for link-local
and node-local addresses.
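
For readers unfamiliar with the "embedded" form that ea455de9 above refers to: in the KAME-derived stack, the zone id of a link-local address is carried inside the address itself, in its second 16-bit word, while the address is handled in the kernel. The userland sketch below shows that encoding with byte accesses only. The demo_* function names are hypothetical and are not the kernel's helpers.

```c
#include <stdint.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: store a 16-bit zone id in bytes 2-3 of a
 * link-local address, in network byte order.
 */
static void
demo_embed_zone(struct in6_addr *a, uint16_t zone)
{
	a->s6_addr[2] = (uint8_t)(zone >> 8);
	a->s6_addr[3] = (uint8_t)(zone & 0xff);
}

/*
 * Hypothetical helper: read the embedded zone id back and clear it,
 * restoring the form in which the address appears on the wire.
 */
static uint16_t
demo_extract_zone(struct in6_addr *a)
{
	uint16_t zone;

	zone = (uint16_t)((a->s6_addr[2] << 8) | a->s6_addr[3]);
	a->s6_addr[2] = 0;
	a->s6_addr[3] = 0;
	return (zone);
}
```

With the zone id available directly from the address, the outgoing interface for a link-local destination can be determined without consulting the routing table, which is the point of the change.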


# 257480b8 04-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert netinet6/ to use new routing API.

* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
  Currently it is used by 2 different types of customers:
  - socket-based ones, which are all unsure about the provided
    address scope, and
  - in-kernel ones (ND code mostly), which don't have
    any sockets, options, credentials, etc.
  So, we provide two different wrappers to in6_selectsrc()
  returning the selected source.
* Make different versions of selectroute():
  Currently selectroute() is used in two scenarios:
  - SAS, via in6_selectsrc() -> in6_selectif() -> selectroute()
  - output, via in6_output -> wrapper -> selectroute()
  Provide different versions for each customer:
  - fib6_lookup_nh_basic()-based in6_selectif() which is
    capable of returning interface only, without MTU/NHOP/L2
    calculations
  - full-blown fib6_selectroute() with cached route/multipath/
    MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes


# 30514718 25-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Convert several places inside netinet6/ to new api.


# 5dbfa43f 09-Sep-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Add the ability to set the `prefer_source' flag on an IPv6 address.
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows overriding the last rule ("longest matching prefix") when
choosing among equivalent addresses. The address with `prefer_source'
will be the preferred source address.

Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
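
To make the interaction in 5dbfa43f above concrete, here is a simplified sketch of only the final tie-break between two otherwise equivalent candidates: a preference flag is consulted first, and the longest-matching-prefix comparison is used only when the flag does not decide. The demo_* names and the struct layout are hypothetical; the real algorithm applies many earlier rules that are omitted here.

```c
#include <netinet/in.h>

/* Number of leading bits two IPv6 addresses have in common (0..128). */
static int
demo_common_prefix_len(const struct in6_addr *a, const struct in6_addr *b)
{
	int bits = 0;

	for (int i = 0; i < 16; i++) {
		unsigned char diff = a->s6_addr[i] ^ b->s6_addr[i];

		if (diff == 0) {
			bits += 8;
			continue;
		}
		/* Count matching high-order bits of the first differing byte. */
		while ((diff & 0x80) == 0) {
			bits++;
			diff <<= 1;
		}
		break;
	}
	return (bits);
}

/* Hypothetical candidate record; not a kernel structure. */
struct demo_candidate {
	struct in6_addr addr;
	int             prefer_source;	/* mirrors the `prefer_source' flag */
};

/*
 * Return 1 if candidate b should replace candidate a as the source
 * address for dst, 0 otherwise. The flag wins over prefix length.
 */
static int
demo_better(const struct demo_candidate *a, const struct demo_candidate *b,
    const struct in6_addr *dst)
{
	if ((a->prefer_source != 0) != (b->prefer_source != 0))
		return (b->prefer_source != 0);
	return (demo_common_prefix_len(&b->addr, dst) >
	    demo_common_prefix_len(&a->addr, dst));
}
```

In this sketch, a candidate that shares fewer leading bits with the destination still wins if it alone carries the flag, which is the override the commit message describes.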


# ccc53de9 01-Sep-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Add the reverse part to rule #9. Also change its description in the
netstat(8) output.

MFC after: 1 week


# d6e6b994 30-Jul-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Add a new rule to the source address selection algorithm. It prefers the address
with the better virtual status. Use ifa_preferred() to choose the better address.

PR: 187341
Tested by: des
MFC after: 1 week


# 1dc8f6a8 08-Jan-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code
to do IPv6 addr & mask in different places.

MFC after: 2 weeks
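
The "addr & mask" operation that 1dc8f6a8 above unifies amounts to a bitwise AND over all 128 bits of the address. A portable userland sketch of the same operation follows; demo_mask_addr() is a hypothetical function written for illustration and is not the kernel's IN6_MASK_ADDR() macro.

```c
#include <netinet/in.h>

/*
 * Hypothetical helper: apply mask m to address a in place,
 * one byte at a time (a &= m over all 16 bytes).
 */
static void
demo_mask_addr(struct in6_addr *a, const struct in6_addr *m)
{
	for (int i = 0; i < 16; i++)
		a->s6_addr[i] &= m->s6_addr[i];
}
```

Applying a /32 mask to 2001:db8::1 this way leaves 2001:db8::, the network prefix of the address.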


# b88aef1d 02-Jan-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Use pointer to struct sockaddr_in6 in lla_lookup() call.
This prevents triggering the KASSERT in in6_lltable_lookup.


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

r48589 promised to remove the implicit inclusion of if_var.h soon. Prepare
for this event by adding if_var.h to files that do need it. Also, add
all includes that are currently pulled in only via implicit pollution from if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 4ff7c740 16-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Fix accounting after r249528; also add several other counters to
the statistics.


# e7a87117 15-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

The source address selection algorithm tries to apply several rules
to the set of IPv6 addresses. Currently each attempt is counted in the IPv6
statistics, even if the given rule did not win. Change this and count only
those rules that won. Also add accounting for cases when the algorithm
fails to select an address.


# ecc5c738 12-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Free memory after deleting an address policy entry.

MFC after: 1 week


# 9cb8d207 09-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


# 164051ce 09-Jan-2013 Hajimu UMEMOTO <ume@FreeBSD.org>

Add the no_prefer_iface option.
It stops the source address selection rule from treating an address on
the interface as special, even when that interface is the outgoing interface.
This is desired in some situations.

Requested by: hrs
Reviewed by: IHANet folks including hrs
MFC after: 1 week


# 73cb2f38 15-Nov-2012 Andrey V. Elsukov <ae@FreeBSD.org>

Reduce the overhead of locking, use IF_AFDATA_RLOCK() when we are doing
simple lookups.

Sponsored by: Yandex LLC
MFC after: 1 week


# f1e0de69 12-Nov-2012 David E. O'Brien <obrien@FreeBSD.org>

Use consistent style.


# 6f56329a 22-Oct-2012 Xin LI <delphij@FreeBSD.org>

Remove __P.

Submitted by: kevlo
Reviewed by: md5(1)
MFC after: 2 months


# 23bc7025 22-Jun-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Just add a comment to further investigate when being closer to that code
again next time. The condition of the 2nd if() is very unlikely ever met.


# f8315b5f 24-May-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 bz_ipv6_fast:

Simplify the code removing a return from an earlier else case,
not differing from the default function return called now.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


# 5aa7e8ed 24-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

In selectroute() add a missing fibnum argument to an in6_rtalloc()
call in an #if 0 section.

In in6_selecthlim() optimize a case where in6p cannot be NULL due to an
earlier check.

More consistently use u_int instead of int for fibnum function arguments.

Sponsored by: Cisco Systems, Inc.
MFC after: 3 days


# 1b46c7f8 14-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Allow providing a hint to in6_selectsrc() for the interface using the
return ifnet double pointer.
Pass that hint down to in6_selectif() to be used when i) the default FIB
is queried and ii) route lookup fails because the network is not present
(i.e. someone deleted the connected subnet).
This hint should not be generally used from anywhere outside the neighbor
discovery code. We just make use of it from nd6_ns_output().

Extend the nd6_na_output() interface by a nd6_na_output_fib() version
and pass the FIB number from the NS mbuf on to NA to allow the new mbuf
to inherit the FIB tag and a later lookup from ip6_output() to succeed
in the aforementioned example case.
Provide a wrapper function for the old public interface also used from
CARP but mark it with BURN_BRIDGES to cleanup in HEAD after MFC.

Sponsored by: Cisco Systems, Inc.


# 81d5d46b 03-Feb-2012 Bjoern A. Zeeb <bz@FreeBSD.org>

Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).

Sponsored by: Cisco Systems, Inc.


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# fa046d87 30-May-2011 Robert Watson <rwatson@FreeBSD.org>

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags,
supplementing the existing INPLOOKUP_WILDCARD flag, are:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
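
The "second check once the inpcb lock is held" idea from the multicast note
above can be sketched in user space as follows. This is not the committed
code: pthread mutexes stand in for inpcb locks, and the structure and
function names (demo_mpcb, demo_match, demo_deliver) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inpcb bound to an address and port. */
struct demo_mpcb {
	pthread_mutex_t	lock;
	unsigned short	lport;
	unsigned int	laddr;
};

/* Compare a pcb's binding against the packet's destination. */
static int
demo_match(const struct demo_mpcb *pcb, unsigned int laddr,
    unsigned short lport)
{
	return (pcb->laddr == laddr && pcb->lport == lport);
}

/* Count the pcbs that still match once their own lock is held. */
static int
demo_deliver(struct demo_mpcb *pcbs, size_t n, unsigned int laddr,
    unsigned short lport)
{
	size_t i;
	int delivered = 0;

	assert(pcbs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* First check, unlocked: a cheap filter that may race. */
		if (!demo_match(&pcbs[i], laddr, lport))
			continue;
		pthread_mutex_lock(&pcbs[i].lock);
		/* Second check, locked: the binding is now stable. */
		if (demo_match(&pcbs[i], laddr, lport))
			delivered++;
		pthread_mutex_unlock(&pcbs[i].lock);
	}
	return (delivered);
}
```

Only pcbs that pass the unlocked filter pay for a lock acquisition, which is
the property the note asks for.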

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


# 68e0d7e0 23-May-2011 Robert Watson <rwatson@FreeBSD.org>

Move from passing a wildcard boolean to a general set of lookup flags into
in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly
for IPv6 functions. In the future, we would like to support other flags
relating to locking strategy.

This change doesn't appear to modify the KBI in practice, as callers already
passed in INPLOOKUP_WILDCARD rather than a simple boolean.

MFC after: 3 weeks
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


# efc76f72 12-Mar-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Merge the two identical implementations for local port selections from
in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport().

MFC after: 2 weeks


# 4a2b2562 12-Mar-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Push a possible "unbind" in some situation from in6_pcbsetport() to
callers. This also fixes a problem when the prison call could set
the inp->in6p_laddr (laddr) and a following priv_check_cred() call
would return an error and will allow us to merge the IPv4 and IPv6
implementation.

MFC after: 2 weeks


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 6377d380 02-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207276:
Make sure IPv6 source address selection does not change interface
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.

Reported by: Pieter de Boer (pieter thedarkside.nl)
Tested by: Pieter de Boer (pieter thedarkside.nl)


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitespace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 877fc3b6 27-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Make sure IPv6 source address selection does not change interface
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.

Reported by: Pieter de Boer (pieter thedarkside.nl)
Tested by: Pieter de Boer (pieter thedarkside.nl)
MFC after: 4 days


# 3bcceea4 23-Jan-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r202468:

Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primary jail IP they had been using for
years, as well as for people preferring to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]


# 592bcae8 16-Jan-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primary jail IP they had been using for
years, as well as for people preferring to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]
MFC After: 2 weeks
Asked for by: Jase Thew (bazerka beardz.net)


# 3d2a8d36 05-Sep-2009 Qing Li <qingli@FreeBSD.org>

MFC r196864

This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by: bz
Approved by: re


# 9452b0d2 05-Sep-2009 Qing Li <qingli@FreeBSD.org>

This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by: bz
MFC after: immediately


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
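
A rough user-space analogue of the reference copy and accessor macro is
sketched below. It is only an approximation: a plain struct stands in for
the contiguous linker-set region (which the real change builds without any
hand-written container), and demo_globals, demo_vnet, demo_curvnet and
DEMO_VNET() are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the linker-set region holding virtualized globals. */
struct demo_globals {
	int	defhlim;
	int	forwarding;
};

/* The "reference copy": statically initialized defaults. */
static const struct demo_globals demo_reference = { 64, 0 };

/* One network stack instance with its private copy of the region. */
struct demo_vnet {
	void	*data;
};

/* The instance that accessor lookups currently resolve against. */
static struct demo_vnet *demo_curvnet;

/* Accessor: map (current vnet, symbol) to that vnet's private variable. */
#define	DEMO_VNET(field) \
	(((struct demo_globals *)demo_curvnet->data)->field)

/* Create an instance whose variables start out as the reference copy. */
static struct demo_vnet *
demo_vnet_create(void)
{
	struct demo_vnet *vnet = malloc(sizeof(*vnet));

	assert(vnet != NULL);
	vnet->data = malloc(sizeof(demo_reference));
	assert(vnet->data != NULL);
	memcpy(vnet->data, &demo_reference, sizeof(demo_reference));
	return (vnet);
}

/* Free an instance and its private region. */
static void
demo_vnet_destroy(struct demo_vnet *vnet)
{
	free(vnet->data);
	free(vnet);
}
```

Writing through DEMO_VNET() in one instance leaves both the reference copy
and every other instance untouched.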

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# d1da0a06 25-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by: bz
MFC after: 6 weeks


# 80af0152 24-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt
the code styles and conventions present in netinet where possible.

Reviewed by: gnn, bz
MFC after: 6 weeks (possibly not MFCable?)


# 88d166bf 23-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.
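
The calling convention described here, where the caller supplies storage
for the selected address, might be sketched as follows. demo_selectsrc() and
its trivial "first candidate wins" policy are assumptions for illustration
and do not reflect the real in6_selectsrc() signature or selection rules.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Hypothetical selector: copy the chosen address into caller-owned
 * storage instead of returning a pointer into a list that may change.
 * Returns 0 on success or an errno value on failure.
 */
static int
demo_selectsrc(const struct in6_addr *candidates, size_t n,
    struct in6_addr *srcp)
{
	assert(srcp != NULL);
	if (n == 0)
		return (EADDRNOTAVAIL);
	*srcp = candidates[0];	/* placeholder policy: first candidate */
	return (0);
}
```

Because the result is a copy, later changes to the candidate list cannot
invalidate what the caller holds.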

Asked for by: rwatson
Reviewed by: rwatson


# 8c0fec80 23-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# 1ed81b73 06-Apr-2009 Marko Zec <zec@FreeBSD.org>

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


# b89e82dd 05-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.
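
The return convention described above could be sketched like this. The
structures and demo_prison_check_ip4() are hypothetical simplifications (a
NULL prison pointer stands for a non-jailed credential); they are not the
kernel's prison or ucred definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical prison: just a list of permitted IPv4 addresses. */
struct demo_prison {
	const unsigned int	*ip4;
	size_t			 nip4;
};

/* Hypothetical credential; prison == NULL means "not jailed". */
struct demo_cred {
	const struct demo_prison	*prison;
};

/*
 * Return 0 if addr may be used, EAFNOSUPPORT if the prison has no IPv4
 * addresses at all, or EADDRNOTAVAIL if addr is not one of them.
 */
static int
demo_prison_check_ip4(const struct demo_cred *cred, unsigned int addr)
{
	const struct demo_prison *pr;
	size_t i;

	assert(cred != NULL);
	pr = cred->prison;
	if (pr == NULL)
		return (0);		/* non-jailed: always succeeds */
	if (pr->nip4 == 0)
		return (EAFNOSUPPORT);	/* no addresses in this family */
	for (i = 0; i < pr->nip4; i++) {
		if (pr->ip4[i] == addr)
			return (0);
	}
	return (EADDRNOTAVAIL);		/* address not in the prison */
}
```

Callers can then propagate the returned value directly instead of
substituting a hard-coded error.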

Approved by: bz (mentor)


# 688d079b 15-Dec-2008 Kip Macy <kmacy@FreeBSD.org>

check return from lla_lookup against NULL not zero


# fc384fa5 15-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().

Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.

Discussed with: rwatson
Reviewed by: rwatson (version before review requested changes)
MFC after: 4 weeks (set the timer and see then)


# 6e6b3f7c 14-Dec-2008 Qing Li <qingli@FreeBSD.org>

The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 413628a7 29-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# 97021c24 26-Nov-2008 Marko Zec <zec@FreeBSD.org>

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instantiation,
assign initial values to them in initializer functions. As a rule,
initialization at instantiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentially, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methodology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to initialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 1ede983c 23-Oct-2008 Dag-Erling Smørgrav <des@FreeBSD.org>

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# dc3c09c8 20-Oct-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Bring over the change switching from using sequential to random
ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143
(initially from OpenBSD) and follow-up commits during the last four and
a half years including rev. 1.157, 1.162 and 1.199.
This now is relying on the same infrastructure as has been implemented
in in_pcb.c since rev. 1.199.
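
For readers unfamiliar with the technique, random ephemeral port selection
can be sketched in user space as below: start at a random point in the range
and probe forward, wrapping around, until a free port is found. This is a
simplified illustration rather than the in_pcb.c code; rand() is only a
placeholder for a proper random source, and demo_pick_port() and the two
sample predicates are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Callback reporting whether a port is already taken. */
typedef int (*demo_inuse_fn)(unsigned short port);

/* Sample predicate: pretend every even-numbered port is taken. */
static int
demo_even_in_use(unsigned short port)
{
	return (port % 2 == 0);
}

/* Sample predicate: pretend the whole range is exhausted. */
static int
demo_all_in_use(unsigned short port)
{
	(void)port;
	return (1);
}

/* Return a free port in [first, last], or 0 if every port is in use. */
static unsigned short
demo_pick_port(unsigned short first, unsigned short last,
    demo_inuse_fn inuse)
{
	unsigned int count, i, port;

	assert(first != 0 && first <= last);
	count = (unsigned int)last - first + 1;
	/* Random starting point instead of "last port used + 1". */
	port = first + (unsigned int)rand() % count;
	for (i = 0; i < count; i++) {
		if (!inuse((unsigned short)port))
			return ((unsigned short)port);
		if (++port > last)
			port = first;	/* wrap around the range */
	}
	return (0);
}
```

Bounding the loop by the size of the range guarantees termination even when
no port is free.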

Reviewed by: silby, rpaulo, mlaier
MFC after: 2 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
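
The shape of such a mapping might be illustrated as follows. The variable
and function names are made up for this example; only the idea of a V_ macro
expanding to the plain global name comes from the commit message.

```c
#include <assert.h>

/* An ordinary global, as it existed before virtualization. */
static int demo_defhlim = 64;

/* For now the V_ name simply expands to the global: no behavior change. */
#define	V_demo_defhlim	demo_defhlim

/* Code is written against the V_ name so it need not change later. */
static int
demo_get_hlim(void)
{
	return (V_demo_defhlim);
}

/* Update the hop limit through the V_ name. */
static void
demo_set_hlim(int hlim)
{
	assert(hlim > 0 && hlim <= 255);
	V_demo_defhlim = hlim;
}
```

Later steps can redefine the macro to reach per-instance storage without
touching the call sites.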

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 078b7042 10-Jul-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Pass the ucred along into in{,6}_pcblookup_local for upcoming
prison checks.

Reviewed by: rwatson


# a55b8b20 09-Jul-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Document required locking in in6_selectsrc() in case an inp is
passed in by adding an assert.

Requested by: rwatson
Reviewed by: rwatson


# f2f877d3 08-Jul-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Change the parameters to in6_selectsrc():
- pass in the inp instead of both in6p_moptions and laddr.
- pass in cred for upcoming prison checks.

Reviewed by: rwatson


# 8501a69c 17-Apr-2008 Robert Watson <rwatson@FreeBSD.org>

Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock remain exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after: 3 months
Tested by: kris (superset of committed patch)


# e440aed9 12-Apr-2008 Qing Li <qingli@FreeBSD.org>

This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
was disallowed. For example,

route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testing on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by: robert, sam, gnn, julian, kmacy


# ab569b9c 20-Jan-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Correct the commented out debugging printf()s in REPLACE and NEXT macros.
ip6_sprintf() needs a buffer as first argument these days.

MFC after: 2 weeks


# 9233d8f3 08-Jan-2008 David E. O'Brien <obrien@FreeBSD.org>

un-__P()


# b48287a3 10-Dec-2007 David E. O'Brien <obrien@FreeBSD.org>

Clean up VCS Ids.


# 2a463222 05-Jul-2007 Xin LI <delphij@FreeBSD.org>

Space cleanup

Approved by: re (rwatson)


# 1272577e 05-Jul-2007 Xin LI <delphij@FreeBSD.org>

ANSIfy[1] plus some style cleanup nearby.

Discussed with: gnn, rwatson
Submitted by: Karl Sjödahl - dunceor <dunceor gmail com> [1]
Approved by: re (rwatson)


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 712fc218 30-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Rename some fields of struct inpcbinfo to have the ipi_ prefix,
consistent with the naming of other structure field members, and
reducing improper grep matches. Clean up and comment structure
fields in structure definition.


# 4e7f640d 31-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to those of rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd


# 1d54aa3b 11-Dec-2006 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
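
A minimal user-space sketch of the calling convention described above,
assuming a hypothetical helper named fmt_in6() built on inet_ntop(3); it is
not the kernel's ip6_sprintf(). Because the caller owns the buffer, two
formatted addresses can be live at once and no static storage is shared
between callers.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the new convention: the caller hands in a
 * buffer of at least INET6_ADDRSTRLEN bytes as the first argument, and
 * the function returns a pointer into that same buffer (or NULL on error).
 */
static const char *
fmt_in6(char *buf, size_t len, const struct in6_addr *addr)
{
	assert(buf != NULL && len >= INET6_ADDRSTRLEN);
	return (inet_ntop(AF_INET6, addr, buf, (socklen_t)len));
}
```

A caller would typically declare "char ip6buf[INET6_ADDRSTRLEN];" on its own
stack and pass it in, one buffer per address it wants to print in the same
statement.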


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 43bc7a9c 04-Aug-2006 Brooks Davis <brooks@FreeBSD.org>

With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entirely since all modern
BSDs have if_xname variables rendering it unnecessary.


# f8366b03 22-May-2006 Seigo Tanimura <tanimura@FreeBSD.org>

Avoid spurious release of an rtentry.


# 8deea4a8 24-Apr-2006 Robert Watson <rwatson@FreeBSD.org>

Move lock assertions to top of in6_pcbladdr(): we still want them to run
even if we're going to return an argument-based error.

Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since
they walk pcbinfo inpcb lists.

Assert inpcb and pcbinfo locks in in6_pcbsetport(), since
port reservations are changing.

MFC after: 3 months


# c1a049ac 19-Oct-2005 SUZUKI Shinsuke <suz@FreeBSD.org>

sync with KAME (removed an unnecessary non-standard macro)

Obtained from: KAME
Reviewed by: ume, gnn


# 5d525653 17-Aug-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

- fix race condition using sx lock.
- use TAILQ_FOREACH() for readability.

Suggested by: jhb


# 1c446786 16-Aug-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

avoid exclusive sleep mutex.


# cd0fdcf7 12-Aug-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

- fix typo in comment.
- nuke unused code.

Submitted by: suz
Obtained from: KAME


# d6bb0cb7 26-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

nuke duplicate inclusion of scope6_var.h.


# a1f7e5f8 24-Jul-2005 Hajimu UMEMOTO <ume@FreeBSD.org>

scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME
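
The stricter boundary check in the ::1 example above can be sketched in user
space as follows. The helper name loopback_src_leaves_node() is hypothetical
and this is not the kernel's scope-checking code; it only illustrates the one
case the message describes, and it deliberately ignores the case where the
destination is one of the node's own non-loopback addresses.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when a packet with this source and
 * destination pair should be rejected because the loopback address ::1
 * is only meaningful within a single node.
 */
static bool
loopback_src_leaves_node(const struct in6_addr *src,
    const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	return (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst));
}
```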


# 403cbcf5 14-May-2005 George V. Neville-Neil <gnn@FreeBSD.org>

Fixes for various nits found by the Coverity tool.

In particular 2 missed return values and an inappropriate bcopy from
a possibly NULL pointer.

Reviewed by: jake
Approved by: rwatson
MFC after: 1 week


# caf43b02 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes, separate for KAME


# f36cfd49 07-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# b0330ed9 27-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
- in_pcbbind_setup(),
- in_pcbconnect(),
- in_pcbconnect_setup(),
- in6_pcbbind(),
- in6_pcbconnect(),
- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson

Requested by: rwatson
Reviewed by: rwatson, ume


# 68efda09 03-Feb-2004 Hajimu UMEMOTO <ume@FreeBSD.org>

KNF

Obtained from: KAME


# a89ec05e 22-Dec-2003 Peter Wemm <peter@FreeBSD.org>

Catch a few places where NULL (pointer) was used where 0 (integer) was
expected.


# 97d8d152 20-Nov-2003 Andre Oppermann <andre@FreeBSD.org>

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


# f4dec803 12-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

reflect ip6_pktopts and ip6_moptions into embedded scope of
destination address. it makes `ping6 -I <if> <link-local>'
work again. since we don't merge scope cleanup yet, we need
this for workaround.


# e6a27350 05-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

make sure to treat destination address as KAME internal form
of embedscope.


# d6385b1c 04-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

source address selection part of RFC3484.
TODO: since there is scope issue to be solved, multicast and
link-local address are treated as special for workaround for
now.

Obtained from: KAME


# 4a201dfc 04-Nov-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- update comments to reflect recent BSDs.
- nuke unused macro PSEUDO_SET().
- I believe our if_xname stuff is nothing strange against other BSDs.

Obtained from: KAME


# 349b668a 30-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- unlock on error.
- don't call malloc with M_WAITOK within lock context.


# 7fc91b3f 30-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

add management part of address selection policy described in
RFC3484.

Obtained from: KAME


# 37bdc280 23-Oct-2003 Sam Leffler <sam@FreeBSD.org>

check return result from rtalloc1 before invoking RTUNLOCK


# 66bb118e 22-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

drop the code of HAVE_NRL_INPCB part. our system doesn't
use NRL style INPCB.


# 9a4f9608 21-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- change scope to zone.
- change node-local to interface-local.
- better error handling of address-to-scope mapping.
- use in6_clearscope().

Obtained from: KAME


# 31b1bfe1 17-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


# 7efe5d92 08-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

- fix typo in comments.
- style.
- NULL is not 0.
- some variables were renamed.
- nuke unused logic.
(there is no functional change.)

Obtained from: KAME


# 40e39bbb 06-Oct-2003 Hajimu UMEMOTO <ume@FreeBSD.org>

return(code) -> return (code)
(reduce diffs against KAME)


# d1dd20be 03-Oct-2003 Sam Leffler <sam@FreeBSD.org>

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do not know about mutexes. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


# 4cc20ab1 31-May-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Back out my last commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


# 243917fe 19-May-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each member in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 54c1b882 21-Jan-2002 Hajimu UMEMOTO <ume@FreeBSD.org>

- Check the address family of a cached destination, in case of
sharing the cache with IPv4.
- Check if the cached route is up in in6_selectsrc().

Obtained from: KAME


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# f9132ceb 05-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
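
A minimal sketch of a macro that wraps an array access and is still an
lvalue, in the spirit of the change above. The names ifnet_stub,
ifindex_table, MAX_IF and ifnet_byindex_sketch() are hypothetical stand-ins,
not the kernel's definitions. Because the macro expands to the array element
itself, existing code can keep assigning through it, and the storage behind
it can later be changed in one place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_stub {
	int if_index;
};

#define MAX_IF 8

/* Hypothetical index-to-interface table; static storage starts out NULL. */
static struct ifnet_stub *ifindex_table[MAX_IF + 1];

/*
 * Expands to the array element, not a copy of it, so the macro can be
 * used on either side of an assignment.
 */
#define ifnet_byindex_sketch(idx)	(ifindex_table[(idx)])
```

Usage then looks the same for reads and writes, for example
"ifnet_byindex_sketch(1) = ifp;" followed by
"ifp = ifnet_byindex_sketch(1);".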


# 33841545 10-Jun-2001 Hajimu UMEMOTO <ume@FreeBSD.org>

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz, and some
critical problems found after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 0634b4a7 29-Jan-2001 Peter Wemm <peter@FreeBSD.org>

Yikes, these files bogusly #include "loop.h" but didn't use the value.
My searching for NLOOP missed them. :-(


# cf9fa8e7 29-Oct-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


# 686cdd19 04-Jul-2000 Jun-ichiro itojun Hagino <itojun@FreeBSD.org>

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)