History log of /freebsd-current/sys/netipsec/xform_esp.c
Revision Date Author Comments
# 1a56620b 24-Feb-2024 Konstantin Belousov <kib@FreeBSD.org>

ipsec esp: avoid dereferencing freed secasindex

It is possible that SA was removed while processing packed, in which
case it is changed to the DEAD state and it index is removed from the
tree. Dereferencing sav->sah then touches freed memory.

Reviewed by: ae
Sponsored by: NVIDIA networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44079


# 71625ec9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c comment pattern

Remove /^/[*/]\s*\$FreeBSD\$.*\n/


# 9f8f3a8e 18-Oct-2022 Kristof Provost <kp@FreeBSD.org>

ipsec: add support for CHACHA20POLY1305

Based on a patch by ae@.

Reviewed by: gbe (man page), pauamma (man page)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D37180


# 0361f165 23-Jun-2022 Kristof Provost <kp@FreeBSD.org>

ipsec: replace SECASVAR mtx by rmlock

This mutex is a significant point of contention in the ipsec code, and
can be relatively trivially replaced by a read-mostly lock.
It does require a separate lock for the replay protection, which we do
here by adding a separate mutex.

This improves throughput (without replay protection) by 10-15%.

MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D35763


# 91c35dd7 17-Feb-2022 Mateusz Guzik <mjg@FreeBSD.org>

ipsec: extend vnet coverage in esp_input/output_cb

key_delsav used to conditionally dereference vnet, leading to panics as
it was getting unset too early.

While the particular condition was removed, it makes sense to handle all
operations of the sort with correct vnet set so change it.

Reviewed by: ae
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34313


# 35d9e00d 24-Jan-2022 John Baldwin <jhb@FreeBSD.org>

IPsec: Use protocol-specific malloc types instead of M_XDATA.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33992


# 68f6800c 08-Feb-2021 Mark Johnston <markj@FreeBSD.org>

opencrypto: Introduce crypto_dispatch_async()

Currently, OpenCrypto consumers can request asynchronous dispatch by
setting a flag in the cryptop. (Currently only IPSec may do this.) I
think this is a bit confusing: we (conditionally) set cryptop flags to
request async dispatch, and then crypto_dispatch() immediately examines
those flags to see if the consumer wants async dispatch. The flag names
are also confusing since they don't specify what "async" applies to:
dispatch or completion.

Add a new KPI, crypto_dispatch_async(), rather than encoding the
requested dispatch type in each cryptop. crypto_dispatch_async() falls
back to crypto_dispatch() if the session's driver provides asynchronous
dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().

Similarly, add crypto_dispatch_batch() to request processing of a tailq
of cryptops, rather than encoding the scheduling policy using cryptop
flags. Convert GELI, the only user of this interface (disabled by
default) to use the new interface.

Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine
whether crypto requests will be dispatched synchronously. This is just
a helper macro. Use it instead of looking at cap flags directly.

Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and
just check the relevant queues directly. This could result in some
unnecessary wakeups but I think it's very uncommon to be using more than
one queue per worker in a given workload, so checking all three queues
is a waste of cycles.

Reviewed by: jhb
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28194


# 4d36d1fd 16-Oct-2020 Marcin Wojtas <mw@FreeBSD.org>

Add support for IPsec ESN and pass relevant information to crypto layer

Implement support for including IPsec ESN (Extended Sequence Number) to
both encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined
mode (eg. AES-GCM). Both ESP and AH protocols are updated. Additionally
pass relevant information about ESN to crypto layer.

For the ETA mode the ESN is stored in separate crp_esn buffer because
the high-order 32 bits of the sequence number are appended after the
Next Header (RFC 4303).

For the AEAD modes the high-order 32 bits of the sequence number
[e.g. RFC 4106, Chapter 5 AAD Construction] are included as part of
crp_aad (SPI + ESN (32 high order bits) + Seq nr (32 low order bits)).

Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb, gnn
Differential revision: https://reviews.freebsd.org/D22369
Obtained from: Semihalf
Sponsored by: Stormshield


# 8b7f3994 16-Oct-2020 Marcin Wojtas <mw@FreeBSD.org>

Implement anti-replay algorithm with ESN support

As RFC 4304 describes there is anti-replay algorithm responsibility
to provide appropriate value of Extended Sequence Number.

This patch introduces anti-replay algorithm with ESN support based on
RFC 4304, however to avoid performance regressions window implementation
was based on RFC 6479, which was already implemented in FreeBSD.

To keep things clean and improve code readability, implementation of window
is kept in seperate functions.

Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D22367
Obtained from: Semihalf
Sponsored by: Stormshield


# dae61c9d 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Simplify IPsec transform-specific teardown.

- Rename from the teardown callback from 'zeroize' to 'cleanup' since
this no longer zeroes keys.

- Change the callback return type to void. Nothing checked the return
value and it was always zero.

- Don't have esp call into ah since it no longer needs to depend on
this to clear the auth key. Instead, both are now private and
self-contained.

Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25443


# 20869b25 25-Jun-2020 John Baldwin <jhb@FreeBSD.org>

Use zfree() to explicitly zero IPsec keys.

Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25442


# 28d2a72b 29-May-2020 John Baldwin <jhb@FreeBSD.org>

Consistently include opt_ipsec.h for consumers of <netipsec/ipsec.h>.

This fixes ipsec.ko to include all of IPSEC_DEBUG.

Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25046


# b01edfb5 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Fix AES-CTR compatibility issue in ipsec

r361390 decreased blocksize of AES-CTR from 16 to 1.
Because of that ESP payload is no longer aligned to 16 bytes
before being encrypted and sent.
This is a good change since RFC3686 specifies that the last block
doesn't need to be aligned.
Since FreeBSD before r361390 couldn't decrypt partial blocks encrypted
with AES-CTR we need to enforce 16 byte alignment in order to preserve
compatibility.
Add a sysctl(on by default) to control it.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D24999


# 9c0e3d3a 25-May-2020 John Baldwin <jhb@FreeBSD.org>

Add support for optional separate output buffers to in-kernel crypto.

Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer. Using a separate output buffer avoids this copy.

- Create a new 'struct crypto_buffer' describing a crypto buffer
containing a type and type-specific fields. crp_ilen is gone,
instead buffers that use a flat kernel buffer have a cb_buf_len
field for their length. The length of other buffer types is
inferred from the backing store (e.g. uio_resid for a uio).
Requests now have two such structures: crp_buf for the input buffer,
and crp_obuf for the output buffer.

- Consumers now use helper functions (crypto_use_*,
e.g. crypto_use_mbuf()) to configure the input buffer. If an output
buffer is not configured, the request still modifies the input
buffer in-place. A consumer uses a second set of helper functions
(crypto_use_output_*) to configure an output buffer.

- Consumers must request support for separate output buffers when
creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
only permitted to queue a request with a separate output buffer on
sessions with this flag set. Existing drivers already reject
sessions with unknown flags, so this permits drivers to be modified
to support this extension without requiring all drivers to change.

- Several data-related functions now have matching versions that
operate on an explicit buffer (e.g. crypto_apply_buf,
crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).

- Most of the existing data-related functions operate on the input
buffer. However crypto_copyback always writes to the output buffer
if a request uses a separate output buffer.

- For the regions in input/output buffers, the following conventions
are followed:
- AAD and IV are always present in input only and their
fields are offsets into the input buffer.
- payload is always present in both buffers. If a request uses a
separate output buffer, it must set a new crp_payload_start_output
field to the offset of the payload in the output buffer.
- digest is in the input buffer for verify operations, and in the
output buffer for compute operations. crp_digest_start is relative
to the appropriate buffer.

- Add a crypto buffer cursor abstraction. This is a more general form
of some bits in the cryptosoft driver that tried to always use uio's.
However, compared to the original code, this avoids rewalking the uio
iovec array for requests with multiple vectors. It also avoids
allocate an iovec array for mbufs and populating it by instead walking
the mbuf chain directly.

- Update the cryptosoft(4) driver to support separate output buffers
making use of the cursor abstraction.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545


# 897e4312 01-May-2020 John Baldwin <jhb@FreeBSD.org>

Don't pass bogus keys down for NULL algorithms.

The changes in r359374 added various sanity checks in sessions and
requests created by crypto consumers in part to permit backend drivers
to make assumptions instead of duplicating checks for various edge
cases. One of the new checks was to reject sessions which provide a
pointer to a key while claiming the key is zero bits long.

IPsec ESP tripped over this as it passes along whatever key is
provided for NULL, including a pointer to a zero-length key when an
empty string ("") is used with setkey(8). One option would be to
teach the IPsec key layer to not allocate keys of zero length, but I
went with a simpler fix of just not passing any keys down and always
using a key length of zero for NULL algorithms.

PR: 245832
Reported by: CI


# 16aabb76 01-May-2020 John Baldwin <jhb@FreeBSD.org>

Remove support for IPsec algorithms deprecated in r348205 and r360202.

Examples of depecrated algorithms in manual pages and sample configs
are updated where relevant. I removed the one example of combining
ESP and AH (vs using a cipher and auth in ESP) as RFC 8221 says this
combination is NOT RECOMMENDED.

Specifically, this removes support for the following ciphers:
- des-cbc
- 3des-cbc
- blowfish-cbc
- cast128-cbc
- des-deriv
- des-32iv
- camellia-cbc

This also removes support for the following authentication algorithms:
- hmac-md5
- keyed-md5
- keyed-sha1
- hmac-ripemd160

Reviewed by: cem, gnn (older verisons)
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24342


# 69a3eb62 22-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Fix name of 3DES cipher in deprecation warning.

Submitted by: cem
MFC after: 1 week


# e27a9ad8 22-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Deprecate 3des support in IPsec for FreeBSD 13.

RFC 8221 does not outright ban 3des as the algorithms deprecated for
13 in r348205, but it is listed as a SHOULD NOT and will likely be a
MUST NOT by the time 13 ships.

Discussed with: bjk
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24341


# c161c46d 20-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Update comments about IVs used in IPsec ESP.

Add some prose and a diagram describing the layout of the cipher IV
for AES-CTR and AES-GCM and how it relates to the ESP IV stored in the
packet after the ESP header. Also, remove an XXX comment about the
initial block counter value used for AES-CTR in esp_output as the
current code matches the RFC (and the equivalent code in esp_input
didn't have the XXX comment).

Discussed with: cem


# 8cbde414 20-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Generate IVs directly in esp_output.

This is the only place that uses CRYPTO_F_IV_GENERATE. All crypto
drivers currently duplicate the same boilerplate code to handle this
case. Doing the generation directly removes complexity from drivers.
It also simplifies support for separate input and output buffers.

Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24449


# c0341432 27-Mar-2020 John Baldwin <jhb@FreeBSD.org>

Refactor driver and consumer interfaces for OCF (in-kernel crypto).

- The linked list of cryptoini structures used in session
initialization is replaced with a new flat structure: struct
crypto_session_params. This session includes a new mode to define
how the other fields should be interpreted. Available modes
include:

- COMPRESS (for compression/decompression)
- CIPHER (for simply encryption/decryption)
- DIGEST (computing and verifying digests)
- AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
- ETA (combined auth and encryption using encrypt-then-authenticate)

Additional modes could be added in the future (e.g. if we wanted to
support TLS MtE for AES-CBC in the kernel we could add a new mode
for that. TLS modes might also affect how AAD is interpreted, etc.)

The flat structure also includes the key lengths and algorithms as
before. However, code doesn't have to walk the linked list and
switch on the algorithm to determine which key is the auth key vs
encryption key. The 'csp_auth_*' fields are always used for auth
keys and settings and 'csp_cipher_*' for cipher. (Compression
algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms. This
doesn't quite work when you factor in modes (e.g. a driver might
support both AES-CBC and SHA2-256-HMAC separately but not combined
for ETA). Instead, a new 'crypto_probesession' method has been
added to the kobj interface for symmteric crypto drivers. This
method returns a negative value on success (similar to how
device_probe works) and the crypto framework uses this value to pick
the "best" driver. There are three constants for hardware
(e.g. ccr), accelerated software (e.g. aesni), and plain software
(cryptosoft) that give preference in that order. One effect of this
is that if you request only hardware when creating a new session,
you will no longer get a session using accelerated software.
Another effect is that the default setting to disallow software
crypto via /dev/crypto now disables accelerated software.

Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
structure. The linked list of descriptors has been removed.

A separate enum has been added to describe the type of data buffer
in use instead of using CRYPTO_F_* flags to make it easier to add
more types in the future if needed (e.g. wired userspace buffers for
zero-copy). It will also make it easier to re-introduce separate
input and output buffers (in-kernel TLS would benefit from this).

Try to make the flags related to IV handling less insane:

- CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
member of the operation structure. If this flag is not set, the
IV is stored in the data buffer at the 'crp_iv_start' offset.

- CRYPTO_F_IV_GENERATE means that a random IV should be generated
and stored into the data buffer. This cannot be used with
CRYPTO_F_IV_SEPARATE.

If a consumer wants to deal with explicit vs implicit IVs, etc. it
can always generate the IV however it needs and store partial IVs in
the buffer and the full IV/nonce in crp_iv and set
CRYPTO_F_IV_SEPARATE.

The layout of the buffer is now described via fields in cryptop.
crp_aad_start and crp_aad_length define the boundaries of any AAD.
Previously with GCM and CCM you defined an auth crd with this range,
but for ETA your auth crd had to span both the AAD and plaintext
(and they had to be adjacent).

crp_payload_start and crp_payload_length define the boundaries of
the plaintext/ciphertext. Modes that only do a single operation
(COMPRESS, CIPHER, DIGEST) should only use this region and leave the
AAD region empty.

If a digest is present (or should be generated), it's starting
location is marked by crp_digest_start.

Instead of using the CRD_F_ENCRYPT flag to determine the direction
of the operation, cryptop now includes an 'op' field defining the
operation to perform. For digests I've added a new VERIFY digest
mode which assumes a digest is present in the input and fails the
request with EBADMSG if it doesn't match the internally-computed
digest. GCM and CCM already assumed this, and the new AEAD mode
requires this for decryption. The new ETA mode now also requires
this for decryption, so IPsec and GELI no longer do their own
authentication verification. Simple DIGEST operations can also do
this, though there are no in-tree consumers.

To eventually support some refcounting to close races, the session
cookie is now passed to crypto_getop() and clients should no longer
set crp_sesssion directly.

- Assymteric crypto operation structures should be allocated via
crypto_getkreq() and freed via crypto_freekreq(). This permits the
crypto layer to track open asym requests and close races with a
driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
crypto_contiguous_subsegment now accept the 'crp' object as the
first parameter instead of individual members. This makes it easier
to deal with different buffer types in the future as well as
separate input and output buffers. It's also simpler for driver
writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
This understands the various types of buffers so that drivers that
use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
and OPAD. This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
device drivers. However, session key buffers provided when a session
is created are expected to remain alive for the duration of the
session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
key. The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
callback now invokes a function pointer in the session. This
function pointer is set based on the mode (in effect) though it
simplifies a few edge cases that would otherwise be in the switch in
'process'.

It does split up GCM vs CCM which I think is more readable even if there
is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
This was actually documented as being true in crypto(4) before, but
the code had not implemented this before I added the CIPHER_FIRST
flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
sessions. I will probably do that at some point in the future as well
as teach it about IV/nonce and tag lengths for AEAD so we can support
all of the NIST KAT tests for GCM and CCM.

- I've split up the exising crypto.9 manpage into several pages
of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
that they compile, but I have not tested all of them. I have tested
the following drivers:

- cryptosoft
- aesni (AES only)
- blake2
- ccr

and the following consumers:

- cryptodev
- IPsec
- ktls_ocf
- GELI (lightly)

I have not tested the following:

- ccp
- aesni with sha
- hifn
- kgssapi_krb5
- ubsec
- padlock
- safe
- armv8_crypto (aarch64)
- glxsb (i386)
- sec (ppc)
- cesa (armv7)
- cryptocteon (mips64)
- nlmsec (mips64)

Discussed with: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D23677


# a4adf6cc 30-Nov-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros.

r354748-354750 replaced the KAME macros with m_pulldown() calls.
Contrary to the rest of the network stack m_len checks before m_pulldown()
were not put in placed (see r354748).
Put these m_len checks in place for now (to go along with the style of the
network stack since the initial commits). These are not put in for
performance but to avoid an error scenario (even though it also will help
performance at the moment as it avoid allocating an extra mbuf; not because
of the unconditional function call).

The observed error case went like this:
(1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it.
(2) m_pullup() will call m_get() unless the requested length is larger than
MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the
requested length of data and pkthdr into the new mbuf.
(3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail.
This was observed with failing auto-configuration as an RA packet of
200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input()
dropped the mbuf.
(Re-)adding the m_len checks before m_pullup() calls avoids this problems
with mbufs using external storage for now.

MFC after: 3 weeks
Sponsored by: Netflix


# 3f44ee8e 27-Nov-2019 Andrey V. Elsukov <ae@FreeBSD.org>

Add support for dummy ESP packets with next header field equal to
IPPROTO_NONE.

According to RFC4303 2.6 they should be silently dropped.

Submitted by: aurelien.cazuc.external_stormshield.eu
MFC after: 10 days
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D22557


# 63abacc2 15-Nov-2019 Bjoern A. Zeeb <bz@FreeBSD.org>

netinet*: replace IP6_EXTHDR_GET()

In a few places we have IP6_EXTHDR_GET() left in upper layer protocols.
The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data
fragment is not contiguous.

Convert these last remaining instances into m_pullup()s instead.
In CARP, for example, we will a few lines later call m_pullup() anyway,
the IPsec code coming from OpenBSD would otherwise have done the m_pullup()
and are copying the data a bit later anyway, so pulling it in seems no
better or worse.

Note: this leaves very few m_pulldown() cases behind in the tree and we
might want to consider removing them as well to make mbuf management
easier again on a path to variable size mbufs, especially given
m_pulldown() still has an issue not re-checking M_WRITEABLE().

Reviewed by: gallatin
MFC after: 8 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22335


# 0f702183 11-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Make the warning intervals for deprecated crypto algorithms tunable.

New sysctl/tunables can now set the interval (in seconds) between
rate-limited crypto warnings. The new sysctls are:
- kern.cryptodev_warn_interval for /dev/crypto
- net.inet.ipsec.crypto_warn_interval for IPsec
- kern.kgssapi_warn_interval for KGSSAPI

Reviewed by: cem
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20555


# c2fd516f 23-May-2019 John Baldwin <jhb@FreeBSD.org>

Add deprecation warnings for IPsec algorithms deprecated in RFC 8221.

All of these algorithms are either explicitly marked MUST NOT, or they
are implicitly MUST NOTs by virtue of not being included in IETF's
list of protocols at all despite having assignments from IANA.

Specifically, this adds warnings for the following ciphers:
- des-cbc
- blowfish-cbc
- cast128-cbc
- des-deriv
- des-32iv
- camellia-cbc

Warnings for the following authentication algorithms are also added:
- hmac-md5
- keyed-md5
- keyed-sha1
- hmac-ripemd160

Reviewed by: cem, gnn
MFC after: 3 days
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20340


# a8a16c71 03-Apr-2019 Conrad Meyer <cem@FreeBSD.org>

Replace read_random(9) with more appropriate arc4rand(9) KPIs

Reviewed by: ae, delphij
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19760


# 1b0909d5 17-Jul-2018 Conrad Meyer <cem@FreeBSD.org>

OpenCrypto: Convert sessions to opaque handles instead of integers

Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers. Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various
high bits. Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface. Discard existing session tracking as much as
possible (quick pass). There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface. The conversion is largely mechnical.

The change is documented in crypto.9.

Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .

No objection from: ae (ipsec portion)
Reported by: jhb


# 2e08e39f 13-Jul-2018 Conrad Meyer <cem@FreeBSD.org>

OCF: Add a typedef for session identifiers

No functional change.

This should ease the transition from an integer session identifier model to
an opaque pointer model.


# fd40ecf3 20-Mar-2018 John Baldwin <jhb@FreeBSD.org>

Set the proper vnet in IPsec callback functions.

When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec. In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.

This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.

Reported by: Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by: bz
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14763


# 151ba793 24-Dec-2017 Alexander Kabaev <kan@FreeBSD.org>

Do pass removing some write-only variables from the kernel.

This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385


# 39bbca6f 03-Nov-2017 Fabien Thomas <fabient@FreeBSD.org>

crypto(9) is called from ipsec in CRYPTO_F_CBIFSYNC mode. This is working
fine when a lot of different flows to be ciphered/deciphered are involved.

However, when a software crypto driver is used, there are
situations where we could benefit from making crypto(9) multi threaded:
- a single flow is to be ciphered: only one thread is used to cipher it,
- a single ESP flow is to be deciphered: only one thread is used to
decipher it.

The idea here is to call crypto(9) using a new mode (CRYPTO_F_ASYNC) to
dispatch the crypto jobs on multiple threads, if the underlying crypto
driver is working in synchronous mode.

Another flag is added (CRYPTO_F_ASYNC_KEEPORDER) to make crypto(9)
dispatch the crypto jobs in the order they are received (an additional
queue/thread is used), so that the packets are reinjected in the network
using the same order they were posted.

A new sysctl net.inet.ipsec.async_crypto can be used to activate
this new behavior (disabled by default).

Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae, jmg, jhb
Differential Revision: https://reviews.freebsd.org/D10680
Sponsored by: Stormshield


# 7f1f6591 29-May-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Disable IPsec debugging code by default when IPSEC_DEBUG kernel option
is not specified.

Due to the long call chain IPsec code can produce the kernel stack
exhaustion on the i386 architecture. The debugging code usually is not
used, but it requires a lot of stack space to keep buffers for strings
formatting. This patch conditionally defines macros to disable building
of IPsec debugging code.

IPsec currently has two sysctl variables to configure debug output:
* net.key.debug variable is used to enable debug output for PF_KEY
protocol. Such debug messages are produced by KEYDBG() macro and
usually they can be interesting for developers.
* net.inet.ipsec.debug variable is used to enable debug output for
DPRINTF() macro and ipseclog() function. DPRINTF() macro usually
is used for development debugging. ipseclog() function is used for
debugging by administrator.

The patch disables KEYDBG() and DPRINTF() macros, and formatting buffers
declarations when IPSEC_DEBUG is not present in kernel config. This reduces
stack requirement for up to several hundreds of bytes.
The net.inet.ipsec.debug variable still can be used to enable ipseclog()
messages by administrator.

PR: 219476
Reported by: eugen
No objection from: #network
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10869


# 3aee7099 23-May-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible double releasing for SA and SP references.

There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For outbound packets the direct call chain is the following:
IPSEC_OUTPUT() method -> ipsec[46]_common_output() ->
-> ipsec[46]_perform_request() -> xform_output() ->
-> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
-> xform_output_cb() -> ipsec_process_done() -> ip[6]_output().

The SA and SP references are held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did references releasing in
xform_output_cb(). But when the crypto callback called directly, in case of
error the error handling code in ipsec[46]_perform_request() also did
references releasing.

To fix this, remove error handling from ipsec[46]_perform_request() and do it
in xform_output() before crypto_dispatch().

MFC after: 10 days


# 5f7c516f 23-May-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible double releasing for SA reference.

There are two possible ways how crypto callback are called: directly from
caller and deffered from crypto thread.

For inbound packets the direct call chain is the following:
IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
-> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
-> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().

The SA reference is held while crypto processing is not finished.
The error handling code wrongly expected that crypto callback always called
from the crypto thread context, and it did SA reference releasing in
xform_input_cb(). But when the crypto callback called directly, in case of
error (e.g. data authentification failed) the error handling in
ipsec_common_input() also did SA reference releasing.

To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().

PR: 219356
MFC after: 10 days


# fcf59617 06-Feb-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Merge projects/ipsec into head/.

Small summary
-------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
option IPSEC_SUPPORT added. It enables support for loading
and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
support was removed. Added TCP/UDP checksum handling for
inbound packets that were decapsulated by transport mode SAs.
setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
build as part of ipsec.ko module (or with IPSEC kernel).
It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
methods. The only one header file <netipsec/ipsec_support.h>
should be included to declare all the needed things to work
with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
- now all security associations stored in the single SPI namespace,
and all SAs MUST have unique SPI.
- several hash tables added to speed up lookups in SADB.
- SADB now uses rmlock to protect access, and concurrent threads
can do SA lookups in the same time.
- many PF_KEY message handlers were reworked to reflect changes
in SADB.
- SADB_UPDATE message was extended to support new PF_KEY headers:
SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
avoid locking protection for ipsecrequest. Now we support
only limited number (4) of bundled SAs, but they are supported
for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
check for full history of applied IPsec transforms.
o References counting rules for security policies and security
associations were changed. The proper SA locking added into xform
code.
o xform code was also changed. Now it is possible to unregister xforms.
tdb_xxx structures were changed and renamed to reflect changes in
SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by: gnn, wblock
Obtained from: Yandex LLC
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9352


# bf435626 25-Nov-2016 Fabien Thomas <fabient@FreeBSD.org>

IPsec RFC6479 support for replay window sizes up to 2^32 - 32 packets.

Since the previous algorithm, based on bit shifting, does not scale
with large replay windows, the algorithm used here is based on
RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting.
The replay window will be fast to be updated, but will cost as many bits
in RAM as its size.

The previous implementation did not provide a lock on the replay window,
which may lead to replay issues.

Reviewed by: ae
Obtained from: emeric.poupon@stormshield.eu
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D8468


# 0c80e7df 16-Nov-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Use explicitly specified ivsize instead of blocksize when we mean IV size.
Set zero ivsize for enc_xform_null and remove special handling from
xform_esp.c.

Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D1503


# f3677984 30-Sep-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Take extra reference to security policy before calling crypto_dispatch().

Currently we perform crypto requests for IPSEC synchronous for most of
crypto providers (software, aesni) and only VIA padlock calls crypto
callback asynchronous. In synchronous mode it is possible, that security
policy will be removed during the processing crypto request. And crypto
callback will release the last reference to SP. Then upon return into
ipsec[46]_process_packet() IPSECREQUEST_UNLOCK() will be called to already
freed request. To prevent this we will take extra reference to SP.

PR: 201876
Sponsored by: Yandex LLC


# a2bc81bf 04-Aug-2015 John-Mark Gurney <jmg@FreeBSD.org>

Make IPsec work with AES-GCM and AES-ICM (aka CTR) in OCF... IPsec
defines the keys differently than NIST does, so we have to muck with
key lengths and nonce/IVs to be standard compliant...

Remove the iv from secasvar as it was unused...

Add a counter protected by a mutex to ensure that the counter for GCM
and ICM will never be repeated.. This is a requirement for security..
I would use atomics, but we don't have a 64bit one on all platforms..

Fix a bug where IPsec was depending upon the OCF to ensure that the
blocksize was always at least 4 bytes to maintain alignment... Move
this logic into IPsec so changes to OCF won't break IPsec...

In one place, espx was always non-NULL, so don't test that it's
non-NULL before doing work..

minor style cleanups...

drop setting key and klen as they were not used...

Enforce that OCF won't pass invalid key lengths to AES that would
panic the machine...

This was has been tested by others too... I tested this against
NetBSD 6.1.5 using mini-test suite in
https://github.com/jmgurney/ipseccfgs and the only things that don't
pass are keyed md5 and sha1, and 3des-deriv (setkey syntax error),
all other modes listed in setkey's man page... The nice thing is
that NetBSD uses setkey, so same config files were used on both...

Reviewed by: gnn


# 42e5fcbf 30-Jul-2015 John-Mark Gurney <jmg@FreeBSD.org>

these are comparing authenticators and need to be constant time...
This could be a side channel attack... Now that we have a function
for this, use it...

jmgurney/ipsecgcm: 24d704cc and 7f37a14


# 817c7ed9 30-Jul-2015 John-Mark Gurney <jmg@FreeBSD.org>

Clean up this header file...

use CTASSERTs now that we have them...

Replace a draft w/ RFC that's over 10 years old.

Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be.. This is part of the KABI... I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything.. and it shortens a loop by quite
a bit..

remove SKIPJACK IPsec support... SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...

jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6

Reviewed by: gnn (earlier version)


# a09a7146 29-Jul-2015 John-Mark Gurney <jmg@FreeBSD.org>

RFC4868 section 2.3 requires that the output be half... This fixes
problems that was introduced in r285336... I have verified that
HMAC-SHA2-256 both ah only and w/ AES-CBC interoperate w/ a NetBSD
6.1.5 vm...

Reviewed by: gnn


# 0b75d21e 09-Jul-2015 George V. Neville-Neil <gnn@FreeBSD.org>

Summary: Fix LINT build. The names of the new AES modes were not
correctly used under the REGRESSION kernel option.


# 16de9ac1 09-Jul-2015 George V. Neville-Neil <gnn@FreeBSD.org>

Add support for AES modes to IPSec. These modes work both in software only
mode and with hardware support on systems that have AESNI instructions.

Differential Revision: D2936
Reviewed by: jmg, eri, cognet
Sponsored by: Rubicon Communications (Netgate)


# 3d80e82d 26-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible use after free due to security policy deletion.

When we are passing mbuf to IPSec processing via ipsec[46]_process_packet(),
we hold one reference to security policy and release it just after return
from this function. But IPSec processing can be deffered and when we release
reference to security policy after ipsec[46]_process_packet(), user can
delete this security policy from SPDB. And when IPSec processing will be
done, xform's callback function will do access to already freed memory.

To fix this move KEY_FREESP() into callback function. Now IPSec code will
release reference to SP after processing will be finished.

Differential Revision: https://reviews.freebsd.org/D2324
No objections from: #network
Sponsored by: Yandex LLC


# 962ac6c7 18-Apr-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.

ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.

ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.

To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).

And now ipsec_address() uses inet_ntop() function from libkern.

PR: 185996
Differential Revision: https://reviews.freebsd.org/D2321
Reviewed by: gnn
Sponsored by: Yandex LLC


# f0514a8b 11-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Remove now unused mtag argument from ipsec*_common_input_cb.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 08537f45 11-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Remove code related to PACKET_TAG_IPSEC_IN_CRYPTO_DONE mbuf tag.
It isn't used in FreeBSD.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC


# 2d957916 01-Dec-2014 Andrey V. Elsukov <ae@FreeBSD.org>

Remove route chaching support from ipsec code. It isn't used for some time.
* remove sa_route_union declaration and route_cache member from struct secashead;
* remove key_sa_routechange() call from ICMP and ICMPv6 code;
* simplify ip_ipsec_mtu();
* remove #include <net/route.h>;

Sponsored by: Yandex LLC


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# eedc7fd9 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Provide includes that are needed in these files, and before were read
in implicitly via if.h -> if_var.h pollution.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# db8c0879 09-Jul-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat,
ipsec4stat, ipsec6stat to PCPU counters.


# a04d64d8 20-Jun-2013 Andrey V. Elsukov <ae@FreeBSD.org>

Use corresponding macros to update statistics for AH, ESP, IPIP, IPCOMP,
PFKEY.

MFC after: 2 weeks


# d3e8e66d 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused 'plen' variable.


# cdb7ebe3 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

The esp_max_ivlen global variable is not needed, we can just use
EALG_MAX_BLOCK_LEN.


# 5be4c9b9 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

malloc(M_WAITOK) never fails, so there is no need to check for NULL.


# 0e4fb1db 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Eliminate 'err' variable and just use existing 'error'.


# 0a95a08e 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Simplify code a bit.


# b6a4c9ac 26-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

There is no need to virtualize esp_max_ivlen.


# db178eb8 27-Apr-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Make IPsec compile without INET adding appropriate #ifdef checks.

Unfold the IPSEC_COMMON_INPUT_CB() macro in xform_{ah,esp,ipcomp}.c
to not need three different versions depending on INET, INET6 or both.

Mark two places preparing for not yet supported functionality with IPv6.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


# 73171433 31-Mar-2011 Fabien Thomas <fabient@FreeBSD.org>

Optimisation in IPSEC(4):
- Remove contention on ISR during the crypto operation by using rwlock(9).
- Remove a second lookup of the SA in the callback.

Gain on 6 cores CPU with SHA1/AES128 can be up to 30%.

Reviewed by: vanhu
MFC after: 1 month


# 442da28a 18-Feb-2011 VANHULLEBUS Yvan <vanhu@FreeBSD.org>

Fixed IPsec's HMAC_SHA256-512 support to be RFC4868 compliant.
This will break interoperability with all older versions of
FreeBSD for those algorithms.

Reviewed by: bz, gnn
Obtained from: NETASQ
MFC after: 1w


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 31c6a003 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 480d7c6c 06-May-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH


# 82cea7e6 29-Apr-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# a45bff04 01-Oct-2009 VANHULLEBUS Yvan <vanhu@FreeBSD.org>

Changed an IPSEC_ASSERT to a simple test, as such invalid packets
may come from outside without being discarded before.

Submitted by: aurelien.ansel@netasq.com
Reviewed by: bz (secteam)
Obtained from: NETASQ
MFC after: 1m


# 530c0060 01-Aug-2009 Robert Watson <rwatson@FreeBSD.org>

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 1e77c105 16-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# bfe1aba4 10-Apr-2009 Marko Zec <zec@FreeBSD.org>

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


# 1ed81b73 06-Apr-2009 Marko Zec <zec@FreeBSD.org>

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


# 44e33a07 19-Nov-2008 Marko Zec <zec@FreeBSD.org>

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# eaa9325f 24-May-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

In addition to the ipsec_osdep.h removal a week ago, now also eliminate
IPSEC_SPLASSERT_SOFTNET which has been 'unused' since FreeBSD 5.0.


# 0bf686c1 06-Aug-2007 Robert Watson <rwatson@FreeBSD.org>

Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.

Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)


# 559d3390 09-May-2007 George V. Neville-Neil <gnn@FreeBSD.org>

Integrate the Camellia Block Cipher. For more information see RFC 4132
and its bibliography.

Submitted by: Tomoyuki Okazaki <okazaki at kick dot gr dot jp>
MFC after: 1 month


# 80e35494 17-May-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- The authsize field from auth_hash structure was removed.
- Define that we want to receive only 96 bits of HMAC.
- Names of the structues have no longer _96 suffix.

Reviewed by: sam


# 6131838b 10-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Hide net.inet.ipsec.test_{replay,integrity} sysctls under #ifdef REGRESSION.

Requested by: sam, rwatson


# dfa9422b 09-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Introduce two new sysctls:

net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.

net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.

I used the first one to discover that we don't have proper replay attacks
detection in ESP (in fast_ipsec(4)).


# 2320ec8b 09-Apr-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Be consistent with the rest of the code.


# a0196c3c 25-Mar-2006 George V. Neville-Neil <gnn@FreeBSD.org>

First steps towards IPSec cleanup.

Make the kernel side of FAST_IPSEC not depend on the shared
structures defined in /usr/include/net/pfkeyv2.h The kernel now
defines all the necessary in kernel structures in sys/netipsec/keydb.h
and does the proper massaging when moving messages around.

Sponsored By: Secure Computing


# ec31427d 23-Mar-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow to use fast_ipsec(4) on debug.mpsafenet=0 and INVARIANTS-enabled
systems. Without the change it will panic on assertions.

MFC after: 2 weeks


# d16f6f50 22-Mar-2006 Colin Percival <cperciva@FreeBSD.org>

Add missing code needed for the detection of IPSec packet replays. [1]

Correctly identify the user running opiepasswd(1) when the login name
differs from the account name. [2]

Security: FreeBSD-SA-06:11.ipsec [1]
Security: FreeBSD-SA-06:12.opie [2]


# 47e2996e 15-Mar-2006 Sam Leffler <sam@FreeBSD.org>

promote fast ipsec's m_clone routine for public use; it is renamed
m_unshare and the caller can now control how mbufs are allocated

Reviewed by: andre, luigi, mlaier
MFC after: 1 week


# c398230b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 8976be94 27-Jan-2004 Sam Leffler <sam@FreeBSD.org>

change SYSINIT starting point to be consistent with other modules


# 9ffa9677 29-Sep-2003 Sam Leffler <sam@FreeBSD.org>

MFp4: portability work, general cleanup, locking fixes

change 38496
o add ipsec_osdep.h that holds os-specific definitions for portability
o s/KASSERT/IPSEC_ASSERT/ for portability
o s/SPLASSERT/IPSEC_SPLASSERT/ for portability
o remove function names from ASSERT strings since line#+file pinpints
the location
o use __func__ uniformly to reduce string storage
o convert some random #ifdef DIAGNOSTIC code to assertions
o remove some debuggging assertions no longer needed

change 38498
o replace numerous bogus panic's with equally bogus assertions
that at least go away on a production system

change 38502 + 38530
o change explicit mtx operations to #defines to simplify
future changes to a different lock type

change 38531
o hookup ipv4 ctlinput paths to a noop routine; we should be
handling path mtu changes at least
o correct potential null pointer deref in ipsec4_common_input_cb

chnage 38685
o fix locking for bundled SA's and for when key exchange is required

change 38770
o eliminate recursion on the SAHTREE lock

change 38804
o cleanup some types: long -> time_t
o remove refrence to dead #define

change 38805
o correct some types: long -> time_t
o add scan generation # to secpolicy to deal with locking issues

change 38806
o use LIST_FOREACH_SAFE instead of handrolled code
o change key_flush_spd to drop the sptree lock before purging
an entry to avoid lock recursion and to avoid holding the lock
over a long-running operation
o misc cleanups of tangled and twisty code

There is still much to do here but for now things look to be
working again.

Supported by: FreeBSD Foundation


# 6464079f 31-Aug-2003 Sam Leffler <sam@FreeBSD.org>

Locking and misc cleanups; most of which I've been running for >4 months:

o add locking
o strip irrelevant spl's
o split malloc types to better account for memory use
o remove unused IPSEC_NONBLOCK_ACQUIRE code
o remove dead code

Sponsored by: FreeBSD Foundation


# d8409aaf 29-Jun-2003 Sam Leffler <sam@FreeBSD.org>

consolidate callback optimization check in one location by adding a flag
for crypto operations that indicates the crypto code should do the check
in crypto_done

MFC after: 1 day


# 1bb98f3b 27-Jun-2003 Sam Leffler <sam@FreeBSD.org>

Check crypto driver capabilities and if the driver operates synchronously
mark crypto requests with ``callback immediately'' to avoid doing a context
switch to return crypto results. This completes the work to eliminate
context switches for using software crypto via the crypto subsystem (with
symmetric crypto ops).


# eb73a605 23-Feb-2003 Sam Leffler <sam@FreeBSD.org>

o add a CRYPTO_F_CBIMM flag to symmetric ops to indicate the callback
should be done in crypto_done rather than in the callback thread
o use this flag to mark operations from /dev/crypto since the callback
routine just does a wakeup; this eliminates the last unneeded ctx switch
o change CRYPTO_F_NODELAY to CRYPTO_F_BATCH with an inverted meaning
so "0" becomes the default/desired setting (needed for user-mode
compatibility with openbsd)
o change crypto_dispatch to honor CRYPTO_F_BATCH instead of always
dispatching immediately
o remove uses of CRYPTO_F_NODELAY
o define COP_F_BATCH for ops submitted through /dev/crypto and pass
this on to the op that is submitted

Similar changes and more eventually coming for asymmetric ops.

MFC if re gives approval.


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 88768458 15-Oct-2002 Sam Leffler <sam@FreeBSD.org>

"Fast IPsec": this is an experimental IPsec implementation that is derived
from the KAME IPsec implementation, but with heavy borrowing and influence
of openbsd. A key feature of this implementation is that it uses the kernel
crypto framework to do all crypto work so when h/w crypto support is present
IPsec operation is automatically accelerated. Otherwise the protocol
implementations are rather differet while the SADB and policy management
code is very similar to KAME (for the moment).

Note that this implementation is enabled with a FAST_IPSEC option. With this
you get all protocols; i.e. there is no FAST_IPSEC_ESP option.

FAST_IPSEC and IPSEC are mutually exclusive; you cannot build both into a
single system.

This software is well tested with IPv4 but should be considered very
experimental (i.e. do not deploy in production environments). This software
does NOT currently support IPv6. In fact do not configure FAST_IPSEC and
INET6 in the same system.

Obtained from: KAME + openbsd
Supported by: Vernier Networks