History log of /freebsd-current/sys/netipsec/ipsec_input.c
Revision Date Author Comments
# 80044c78 16-Jan-2024 Xavier Beaudouin <xavier.beaudouin@klarasystems.com>

Add UDP encapsulation of ESP in IPv6

This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.

Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by: Stormshield
Sponsored-by: Wiktel
Sponsored-by: Klara, Inc.


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# e68b3792 07-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: embed inpcb into tcpcb

For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision: https://reviews.freebsd.org/D37127


# fcb3f813 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet*: remove PRC_ constants and streamline ICMP processing

In the original design of the network stack from the protocol control
input method pr_ctlinput was used notify the protocols about two very
different kinds of events: internal system events and receival of an
ICMP messages from outside. These events were coded with PRC_ codes.
Today these methods are removed from the protosw(9) and are isolated
to IPv4 and IPv6 stacks and are called only from icmp*_input(). The
PRC_ codes now just create a shim layer between ICMP codes and errors
or actions taken by protocols.

- Change ipproto_ctlinput_t to pass just pointer to ICMP header. This
allows protocols to not deduct it from the internal IP header.
- Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer.
It has all the information needed to the protocols. In the structure,
change ip6c_finaldst fields to sockaddr_in6. The reason is that
icmp6_input() already has this address wrapped in sockaddr, and the
protocols want this address as sockaddr.
- For UDP tunneling control input, as well as for IPSEC control input,
change the prototypes to accept a transparent union of either ICMP
header pointer or struct ip6ctlparam pointer.
- In icmp_input() and icmp6_input() do only validation of ICMP header and
count bad packets. The translation of ICMP codes to errors/actions is
done by protocols.
- Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap,
inet6ctlerrmap arrays.
- In protocol ctlinput methods either trust what icmp_errmap() recommend,
or do our own logic based on the ICMP header.

Differential revision: https://reviews.freebsd.org/D36731


# 809fef29 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netipsec: move specific ipsecmethods declarations to ipsec_support.h

where struct ipsec_methods is defined. Not a functional change.
Allows further modification of method prototypes without breaking
compilation of other ipsec compilation units.

Differential revision: https://reviews.freebsd.org/D36730


# 46ddeb6b 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

netinet6: retire ip6protosw.h

The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with
the former removed in f0ffb944d25 in 2001 and the latter survived until
today. It has been reduced down to only one useful declaration that
moves to ip6_var.h

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36726


# 78b1fc05 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: separate pr_input and pr_ctlinput out of protosw

The protosw KPI historically has implemented two quite orthogonal
things: protocols that implement a certain kind of socket, and
protocols that are IPv4/IPv6 protocol. These two things do not
make one-to-one correspondence. The pr_input and pr_ctlinput methods
were utilized only in IP protocols. This strange duality required
IP protocols that doesn't have a socket to declare protosw, e.g.
carp(4). On the other hand developers of socket protocols thought
that they need to define pr_input/pr_ctlinput always, which lead to
strange dead code, e.g. div_input() or sdp_ctlinput().

With this change pr_input and pr_ctlinput as part of protosw disappear
and IPv4/IPv6 get their private single level protocol switch table
ip_protox[] and ip6_protox[] respectively, pointing at array of
ipproto_input_t functions. The pr_ctlinput that was used for
control input coming from the network (ICMP, ICMPv6) is now represented
by ip_ctlprotox[] and ip6_ctlprotox[].

ipproto_register() becomes the only official way to register in the
table. Those protocols that were always static and unlikely anybody
is interested in making them loadable, are now registered by ip_init(),
ip6_init(). An IP protocol that considers itself unloadable shall
register itself within its own private SYSINIT().

Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36157


# 489482e2 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

ipsec: isolate knowledge about protocols that are last header

Retire PR_LASTHDR protosw flag.

Reviewed by: ae
Differential revision: https://reviews.freebsd.org/D36155


# 863871d3 27-Jul-2022 Kornel Dulęba <kd@FreeBSD.org>

ipsec: Improve validation of PMTU

Currently there is no upper bound on the PMTU value that is accepted.
Update hostcache only if the new pmtu is smaller than the current entry
and the link MTU.

Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D35872


# 590d0715 17-Sep-2021 Mateusz Guzik <mjg@FreeBSD.org>

ipsec: enter epoch before calling into ipsec_run_hhooks

pfil_run_hooks which eventually can get called asserts on it.

Reviewed by: ae
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32007


# 10eb2a2b 10-Sep-2021 Mark Johnston <markj@FreeBSD.org>

ipsec: Validate the protocol identifier in ipsec4_ctlinput()

key_allocsa() expects to handle only IPSec protocols and has an
assertion to this effect. However, ipsec4_ctlinput() has to handle
messages from ICMP unreachable packets and was not validating the
protocol number. In practice such a packet would simply fail to match
any SADB entries and would thus be ignored.

Reported by: syzbot+6a9ef6fcfadb9f3877fe@syzkaller.appspotmail.com
Reviewed by: ae
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31890


# d9d59bb1 08-Aug-2021 Wojciech Macek <wma@FreeBSD.org>

ipsec: Handle ICMP NEEDFRAG message.

It will be needed for upcoming PMTU implementation in ipsec.
For now simply create/update an entry in tcp hostcache when needed.
The code is based on https://people.freebsd.org/~ae/ipsec_transport_mode_ctlinput.diff

Authored by: Kornel Duleba <mindal@semihalf.com>
Differential revision: https://reviews.freebsd.org/D30992
Reviewed by: tuxen
Sponsored by: Stormshield
Obtained from: Semihalf


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# f82eb2a6 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Enter and exit the network epoch for async IPsec callbacks.

When an IPsec packet has been encrypted or decrypted, the next step in
the packet's traversal through the network stack is invoked from a
crypto worker thread, not from the original calling thread. These
threads need to enter the network epoch before passing packets down to
IP output routines or up to transport protocols.

Reviewed by: ae
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25444


# 1a01e0e7 31-Jul-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
from enc_hhook().

This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.

Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.

PR: 220217
MFC after: 3 weeks
Sponsored by: Yandex LLC


# 7f1f6591 29-May-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Disable IPsec debugging code by default when IPSEC_DEBUG kernel option
is not specified.

Due to the long call chain IPsec code can produce the kernel stack
exhaustion on the i386 architecture. The debugging code usually is not
used, but it requires a lot of stack space to keep buffers for strings
formatting. This patch conditionally defines macros to disable building
of IPsec debugging code.

IPsec currently has two sysctl variables to configure debug output:
* net.key.debug variable is used to enable debug output for PF_KEY
protocol. Such debug messages are produced by KEYDBG() macro and
usually they can be interesting for developers.
* net.inet.ipsec.debug variable is used to enable debug output for
DPRINTF() macro and ipseclog() function. DPRINTF() macro usually
is used for development debugging. ipseclog() function is used for
debugging by administrator.

The patch disables KEYDBG() and DPRINTF() macros, and formatting buffers
declarations when IPSEC_DEBUG is not present in kernel config. This reduces
stack requirement for up to several hundreds of bytes.
The net.inet.ipsec.debug variable still can be used to enable ipseclog()
messages by administrator.

PR: 219476
Reported by: eugen
No objection from: #network
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10869


# 5f7c516f 23-May-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible double releasing for SA reference.

There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For inbound packets the direct call chain is the following:
IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
-> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
-> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().

The SA reference is held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did SA reference releasing in
xform_input_cb(). But when the crypto callback called directly, in case of
error (e.g. data authentification failed) the error handling in
ipsec_common_input() also did SA reference releasing.

To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().

PR: 219356
MFC after: 10 days


# fcf59617 06-Feb-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Merge projects/ipsec into head/.

Small summary
-------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
build as part of ipsec.ko module (or with IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. The only one header file <netipsec/ipsec_support.h>
should be included to declare all the needed things to work
with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
- now all security associations stored in the single SPI namespace,
and all SAs MUST have unique SPI.
- several hash tables added to speed up lookups in SADB.
- SADB now uses rmlock to protect access, and concurrent threads
can do SA lookups in the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
avoid locking protection for ipsecrequest. Now we support
only limited number (4) of bundled SAs, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
check for full history of applied IPsec transforms.
o References counting rules for security policies and security
associations were changed. The proper SA locking added into xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by: gnn, wblock
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9352


# 0c127808 31-Aug-2016 Andrey V. Elsukov <ae@FreeBSD.org>

Remove redundant sanity checks from ipsec[46]_common_input_cb().

This check already has been done in the each protocol callback.


# ef91a976 25-Nov-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Overhaul if_enc(4) and make it loadable in run-time.

Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).


# 705f4d9c 21-Jul-2015 Ermal Luçi <eri@FreeBSD.org>

IPSEC, remove variable argument function its already due.

Differential Revision: https://reviews.freebsd.org/D3080
Reviewed by: gnn, ae
Approved by: gnn(mentor)


# 574fde00 28-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Since PFIL can change mbuf pointer, we should update pointers after
calling ipsec_filter().

Sponsored by: Yandex LLC


# 962ac6c7 18-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.

ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.

ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.

To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).

And now ipsec_address() uses inet_ntop() function from libkern.

PR: 185996
Differential Revision: https://reviews.freebsd.org/D2321
Reviewed by: gnn
Sponsored by: Yandex LLC


# 1d3b268c 18-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6.

ipsec6_common_input_cb() uses partial copy of ip6_input() to parse
headers. But this isn't correct, when we use tunnel mode IPSec.

When we stripped outer IPv6 header from the decrypted packet, it
can become IPv4 packet and should be handled by ip_input. Also when
we use tunnel mode IPSec with IPv6 traffic, we should pass decrypted
packet with inner IPv6 header to ip6_input, it will correctly handle
it and also can decide to forward it.

The "skip" variable points to offset where payload starts. In tunnel
mode we reset it to zero after stripping the outer header. So, when
it is zero, we should requeue mbuf via netisr.

Differential Revision: https://reviews.freebsd.org/D2306
Reviewed by: adrian, gnn
Sponsored by: Yandex LLC


# 1ae800e7 18-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Fix handling of scoped IPv6 addresses in IPSec code.

* in ipsec_encap() embed scope zone ids into link-local addresses
in the new IPv6 header, this helps ip6_output() disambiguate the
scope;
* teach key_ismyaddr6() use in6_localip(). in6_localip() is less
strict than key_sockaddrcmp(). It doesn't compare all fileds of
struct sockaddr_in6, but it is faster and it should be safe,
because all SA's data was checked for correctness. Also, since
IPv6 link-local addresses in the &V_in6_ifaddrhead are stored in
kernel-internal form, we need to embed scope zone id from SA into
the address before calling in6_localip.
* in ipsec_common_input() take scope zone id embedded in the address
and use it to initialize sin6_scope_id, then use this sockaddr
structure to lookup SA, because we keep addresses in the SADB without
embedded scope zone id.

Differential Revision: https://reviews.freebsd.org/D2304
Reviewed by: gnn
Sponsored by: Yandex LLC


# f0514a8b 11-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Remove now unused mtag argument from ipsec*_common_input_cb.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 2d957916 01-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by: Yandex LLC


# 612faae7 13-Nov-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Strip IP header only when we act in tunnel mode.

MFC after: 1 week
Sponsored by: Yandex LLC


# b6e1ad3a 06-Nov-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Pass mbuf to pfil processing before stripping outer IP header as it
is described in if_enc(4).

MFC after: 2 week
Sponsored by: Yandex LLC


# 1f194d8a 06-Nov-2014 Andrey V. Elsukov <ae@FreeBSD.org>

When mode isn't explicitly specified (wildcard) and inner protocol isn't
IPv4 or IPv6, assume it is the transport mode.

Reported by: jmg
MFC after: 1 week
Sponsored by: Yandex LLC


# a28b277a 01-Oct-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Do not strip outer header when operating in transport mode.
Instead requeue mbuf back to IPv4 protocol handler. If there is one extra IP-IP
encapsulation, it will be handled with tunneling interface. And thus proper
interface will be exposed into mbuf's rcvif. Also, tcpdump that listens on tunneling
interface will see packets in both directions.

Sponsored by: Yandex LLC


# 6ff8af1c 19-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically convert to if_inc_counter().


# 8f5a8818 07-Aug-2014 Kevin Lo <kevlo@FreeBSD.org>

Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have
only one protocol switch structure that is shared between ipv4 and ipv6.

Phabric: D476
Reviewed by: jhb


# aaf2cfc0 27-May-2014 VANHULLEBUS Yvan <vanhu@FreeBSD.org>

Fixed IPv4-in-IPv6 and IPv6-in-IPv4 IPsec tunnels.
For IPv6-in-IPv4, you may need to do the following command
on the tunnel interface if it is configured as IPv4 only:
ifconfig <interface> inet6 -ifdisabled

Code logic inspired from NetBSD.

PR: kern/169438
Submitted by: emeric.poupon@netasq.com
Reviewed by: fabient, ae
Obtained from: NETASQ


# 00a689c4 11-Nov-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Initialize prot variable.

PR: 177417
MFC after: 1 week


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# a04d64d8 20-Jun-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP,
PFKEY.

MFC after: 2 weeks


# 9cb8d207 09-Apr-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


# d2bffb14 23-Oct-2012 Gleb Smirnoff <glebius@FreeBSD.org>

- Fix one more miss from r241913.
- Add XXX comment about necessity of the entire block,
that "fixes up" the IP header.


# d6d3f01e 08-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by: Florian Smeets <flo freebsd.org>
Tested by: Chekaluk Vitaly <artemrts ukr.net>
Tested by: Ben Wilber <ben desync.com>
Tested by: Ian FREISLICH <ianf cloudseed.co.za>


# 7f63ba51 04-Jun-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Remove completely the m_addr_changed() hack, and support of reverse
pointer in pf_state_ket, that ware 'if 0' since beginning of
SMP-friendly pf project. In the new locking scheme we can't reference
state keys from mbuf tags, nor a key can reference another key.


# db178eb8 27-Apr-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Make IPsec compile without INET adding appropriate #ifdef checks.

Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


# 94294cad 25-Oct-2010 Thomas Quinot <thomas@FreeBSD.org>

Fix typo in comment.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 3abaa086 24-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4 @178283:

Improve IPsec flow distribution for better netisr parallelism.
Instead of using the pointer that would have the last bits masked in a %
statement in netisr_select_cpuid() to select the queue, use the SPI.

Reviewed by: rwatson
MFC after: 4 weeks


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 7b495c44 12-Jun-2009 VANHULLEBUS Yvan <vanhu@FreeBSD.org>

Added support for NAT-Traversal (RFC 3948) in IPsec stack.

Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...

X-MFC: never

Reviewed by: bz
Approved by: gnn(mentor)
Obtained from: NETASQ


# fc228fbf 10-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Properly hide IPv4 only variables and functions under #ifdef INET.


# d4b5cae4 01-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
workstream, or set of per-protocol queues. Threads may be bound
to specific CPUs, or allowed to migrate, based on a global policy.

In the future it would be desirable to support topology-centric
policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
currently be one of:

NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
an implicit or explicit source (such as an interface or socket).

NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
as well as allowing protocols to provide a flow generation function
for mbufs without flow identifers (m2flow). Falls back on
NETISR_POLICY_SOURCE if now flow ID is available.

NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
being used, as well as a mapping function from workstream to CPU ID,
which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
get and clear drop counters, which query data or apply changes across
all workstreams.

- Add a more extensible netisr registration interface, in which
protocols declare 'struct netisr_handler' structures for each
registered NETISR_ type. These include name, handler function,
optional mbuf to flow ID function, optional mbuf to CPU ID function,
queue limit, and ordering policy. Padding is present to allow these
to be expanded in the future. If no queue limit is declared, then
a default is used.

- Queue limits are now per-workstream, and raised from the previous
IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
with the exception of netnatm, default queue limits. Most protocols
register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
NETISR_POLICY_FLOW, and will therefore take advantage of driver-
generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
the netisr, rather than having polling pretend to be two protocols.
Provide two explicit hooks in the netisr worker for start and end
events for runs: netisr_poll() and netisr_pollmore(), as well as a
function, netisr_sched_poll(), to allow the polling code to schedule
netisr execution. DEVICE_POLLING still embeds single-netisr
assumptions in its implementation, so for now if it is compiled into
the kernel, a single and un-bound netisr thread is enforced
regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking. It will be enabled once further rmlock
optimization has taken place. However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by: bz


# 4b79449e 02-Dec-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 97c2a697 12-Aug-2008 VANHULLEBUS Yvan <vanhu@FreeBSD.org>

Increase statistic counters for enc0 interface when enabled
and processing IPSec traffic.

Approved by: gnn (mentor)
MFC after: 1 week


# eaa9325f 24-May-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

In addition to the ipsec_osdep.h removal a week ago, now also eliminate
IPSEC_SPLASSERT_SOFTNET which has been 'unused' since FreeBSD 5.0.


# 19ad9831 28-Nov-2007 Bjoern A. Zeeb <bz@FreeBSD.org>

Add sysctls to if_enc(4) to control whether the firewalls or
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.

This is useful in bpf to not have over long lines for debugging
or selcting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.

Last but not least it gives you if_enc(4) for IPv6 as well.

[ As some auxiliary state was not available in the later
input path we save it in the tdbi. That way tcpdump can give a
consistent view of either of (authentic,confidential) for both
before and after states. ]

Discussed with: thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by: thompsa, gnn


# e61a9df5 11-Sep-2007 George V. Neville-Neil <gnn@FreeBSD.org>

Fix for an infinite loop in processing ESP, IPv6 packets.

The control input routine passes a NULL as its void argument when it
has reached the innermost header, which terminates the loop.

Reported by: Pawel Worach <pawel.worach@gmail.com>
Approved by: re


# b28cd334 19-Jul-2007 Bjoern A. Zeeb <bz@FreeBSD.org>

Replace hard coded options by their defined PFIL_{IN,OUT} names.

Approved by: re (hrs)


# 0e41ce65 15-Jun-2007 Bjoern A. Zeeb <bz@FreeBSD.org>

Looking at {ah,esp}_input_cb it seems we might be able to end up
without an mtag in ipsec4_common_input_cb.
So in case of !IPCOMP (AH,ESP) only change the m_tag_id if an mtag
was passed to ipsec4_common_input_cb.

Found with: Coverity Prevent(tm)
CID: 2523


# ceda1e7c 15-Jun-2007 Bjoern A. Zeeb <bz@FreeBSD.org>

s,#,*, in a multi-line comment. This is C.
No functional change.


# f4760821 15-Jun-2007 Bjoern A. Zeeb <bz@FreeBSD.org>

Though we are only called for the three security protocols we can
handle, document those sprotos using an IPSEC_ASSERT so that it will
be clear that 'spi' will always be initialized when used the first time.

Found with: Coverity Prevent(tm)
CID: 2533


# 224c45c4 14-Dec-2006 Bjoern A. Zeeb <bz@FreeBSD.org>

s,#if INET6,#ifdef INET6,
This unbreaks the build for FAST_IPSEC && !INET6 and was wrong anyway.

Reported by: Dmitry Pryanishnikov <dmitry atlantis.dp.ua>


# 1d54aa3b 11-Dec-2006 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.


# bdea400f 26-Jun-2006 Andrew Thompson <thompsa@FreeBSD.org>

Add a pseudo interface for packet filtering IPSec connections before or after
encryption. There are two functions, a bpf tap which has a basic header with
the SPI number which our current tcpdump knows how to display, and handoff to
pfil(9) for packet filtering.

Obtained from: OpenBSD
Based on: kern/94829
No objections: arch, net
MFC after: 1 month


# 49ddabdf 04-Jun-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change '#if INET' and '#if INET6' to '#ifdef INET' and '#ifdef INET6'.
This unbreaks compiling a kernel with FAST_IPSEC and no INET6.


# 79bc655b 03-Jun-2006 George V. Neville-Neil <gnn@FreeBSD.org>

Extend the notdef #ifdef to cover the packet copy as there is no point in doing that if we're not doing the rest of the work.

Submitted by: thompsa
MFC after: 1 week


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 3161f583 27-Aug-2004 Andre Oppermann <andre@FreeBSD.org>

Apply error and success logic consistently to the function netisr_queue() and
its users.

netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.

Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.

PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days


# 9ffa9677 29-Sep-2003 Sam Leffler <sam@FreeBSD.org>

MFp4: portability work, general cleanup, locking fixes

change 38496
o add ipsec_osdep.h that holds os-specific definitions for portability
o s/KASSERT/IPSEC_ASSERT/ for portability
o s/SPLASSERT/IPSEC_SPLASSERT/ for portability
o remove function names from ASSERT strings since line#+file pinpints
the location
o use __func__ uniformly to reduce string storage
o convert some random #ifdef DIAGNOSTIC code to assertions
o remove some debuggging assertions no longer needed

change 38498
o replace numerous bogus panic's with equally bogus assertions
that at least go away on a production system

change 38502 + 38530
o change explicit mtx operations to #defines to simplify
future changes to a different lock type

change 38531
o hookup ipv4 ctlinput paths to a noop routine; we should be
handling path mtu changes at least
o correct potential null pointer deref in ipsec4_common_input_cb

chnage 38685
o fix locking for bundled SA's and for when key exchange is required

change 38770
o eliminate recursion on the SAHTREE lock

change 38804
o cleanup some types: long -> time_t
o remove refrence to dead #define

change 38805
o correct some types: long -> time_t
o add scan generation # to secpolicy to deal with locking issues

change 38806
o use LIST_FOREACH_SAFE instead of handrolled code
o change key_flush_spd to drop the sptree lock before purging
an entry to avoid lock recursion and to avoid holding the lock
over a long-running operation
o misc cleanups of tangled and twisty code

There is still much to do here but for now things look to be
working again.

Supported by: FreeBSD Foundation


# 6464079f 31-Aug-2003 Sam Leffler <sam@FreeBSD.org>

Locking and misc cleanups; most of which I've been running for >4 months:

o add locking
o strip irrelevant spl's
o split malloc types to better account for memory use
o remove unused IPSEC_NONBLOCK_ACQUIRE code
o remove dead code

Sponsored by: FreeBSD Foundation


# 4dbc6e51 13-Aug-2003 Sam Leffler <sam@FreeBSD.org>

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec

Submitted by: Markus Friedl <markus@openbsd.org>
Obtained from: OpenBSD (rev 1.66 ipsec_input.c)


# aaea26ef 28-Mar-2003 Sam Leffler <sam@FreeBSD.org>

add missing copyright notices

Noticed by: Robert Watson


# 1cafed39 04-Mar-2003 Jonathan Lemon <jlemon@FreeBSD.org>

Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs


# e8539d32 08-Nov-2002 Sam Leffler <sam@FreeBSD.org>

FAST_IPSEC fixups:

o fix #ifdef typo
o must use "bounce functions" when dispatched from the protosw table

don't know how this stuff was missed in my testing; must've committed
the wrong bits

Pointy hat: sam
Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>


# 88768458 15-Oct-2002 Sam Leffler <sam@FreeBSD.org>

"Fast IPsec": this is an experimental IPsec implementation that is derived
from the KAME IPsec implementation, but with heavy borrowing and influence
of openbsd. A key feature of this implementation is that it uses the kernel
crypto framework to do all crypto work so when h/w crypto support is present
IPsec operation is automatically accelerated. Otherwise the protocol
implementations are rather differet while the SADB and policy management
code is very similar to KAME (for the moment).

Note that this implementation is enabled with a FAST_IPSEC option. With this
you get all protocols; i.e. there is no FAST_IPSEC_ESP option.

FAST_IPSEC and IPSEC are mutually exclusive; you cannot build both into a
single system.

This software is well tested with IPv4 but should be considered very
experimental (i.e. do not deploy in production environments). This software
does NOT currently support IPv6. In fact do not configure FAST_IPSEC and
INET6 in the same system.

Obtained from: KAME + openbsd
Supported by: Vernier Networks