History log of /openbsd-current/sys/netinet/ip_spd.c
Revision Date Author Comments
# 1.120 17-Apr-2024 bluhm

Use struct ipsec_level within inpcb.

Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is globally visible for netstat(1), so
put struct ipsec_level outside of #ifdef _KERNEL.

OK deraadt@ mvs@


Revision tags: OPENBSD_7_5_BASE
# 1.119 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


Revision tags: OPENBSD_7_4_BASE
# 1.118 22-Apr-2023 mvs

Call pfkeyv2_sysctl_policydumper() with shared netlock. It performs
read-only access to netlock protected data, so the radix tree will
not be modified during spd_table_walk() run.

Also change netlock assertion within spd_table_add() and
ipsec_delete_policy() to exclusive. These are the corresponding functions
which modify the radix tree, so this makes sure that running
spd_table_walk() with the shared netlock is safe.

Feedback and ok by bluhm@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.117 17-Jun-2022 bluhm

The timeout for ipsec acquire does not decrement the reference
counter to 0 properly. We have one reference count for the lists,
and one for the timeout handler. When the timeout fires, it has to
decrement the reference to itself. Then the ipa is removed from
the lists and decremented again.
from Stefan Butz; OK tobhe@ mvs@
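
The two-reference scheme described in this entry can be sketched in userland C. This is only an illustration under stated assumptions, not the kernel code: `struct acquire`, `acq_create()`, `acq_unref()` and `acq_timeout()` are hypothetical names, and a plain counter stands in for the kernel's refcount API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an acquire entry; not the kernel's struct. */
struct acquire {
	unsigned int refs;	/* one ref for the lists, one for the timeout */
	bool on_list;
};

/* Create an entry that already holds both references. */
struct acquire *
acq_create(void)
{
	struct acquire *a = calloc(1, sizeof(*a));

	if (a == NULL)
		return NULL;
	a->refs = 2;
	a->on_list = true;
	return a;
}

/* Drop one reference; return true if this freed the entry. */
bool
acq_unref(struct acquire *a)
{
	assert(a != NULL && a->refs > 0);
	if (--a->refs > 0)
		return false;
	free(a);
	return true;
}

/* Timeout handler: drop the timeout's own ref, then the lists' ref. */
bool
acq_timeout(struct acquire *a)
{
	acq_unref(a);		/* reference held by the timeout itself */
	a->on_list = false;	/* unlink from the lists */
	return acq_unref(a);	/* reference held by the lists; frees */
}
```

If the handler skipped the first `acq_unref()`, the counter would stop at 1 and the entry would never be freed, which is the kind of leak the commit message describes.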


# 1.116 04-May-2022 bluhm

In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it is
always the incoming TDB that has to be checked.
from markus@


Revision tags: OPENBSD_7_1_BASE
# 1.115 13-Mar-2022 bluhm

Hrvoje has hit a crash with IPsec acquire while testing the parallel
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
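
The calling convention described here (error code as return value, optional refcounted output pointer) might look roughly like the following userland sketch. All names (`struct sa`, `policy_lookup()`, the `allowed` flag) are hypothetical and chosen for illustration; this is not the signature of ipsp_spd_lookup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a TDB. */
struct sa {
	int refs;
};

/*
 * Return 0 or an errno value. If out is non-NULL and the lookup
 * succeeds, *out receives a referenced object that the caller
 * must release; callers that only need the verdict pass NULL.
 */
int
policy_lookup(int allowed, struct sa *cached, struct sa **out)
{
	assert(cached != NULL);
	if (out != NULL)
		*out = NULL;
	if (!allowed)
		return EACCES;
	if (out != NULL) {
		cached->refs++;	/* caller must drop this reference */
		*out = cached;
	}
	return 0;
}
```

A caller that passes NULL never takes a reference, so it has nothing to release afterwards.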
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool(9) within pfkey_init() instead of
doing that at runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
at runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
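
One common shape for a lockless read loop of the kind mentioned above is a generation counter that readers check before and after copying the value. The sketch below is a userland illustration with C11 atomics, assuming a single writer; `struct stamp`, `stamp_write()` and `stamp_read()` are hypothetical names, and this is not the kernel's timehands code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit timestamp published as two 32-bit halves. */
struct stamp {
	atomic_uint gen;	/* odd while an update is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer: bump gen to odd, store both halves, bump to even. */
void
stamp_write(struct stamp *s, uint64_t v)
{
	assert((atomic_load(&s->gen) & 1) == 0);
	atomic_fetch_add(&s->gen, 1);
	atomic_store(&s->lo, (uint32_t)v);
	atomic_store(&s->hi, (uint32_t)(v >> 32));
	atomic_fetch_add(&s->gen, 1);
}

/* Reader: retry until both halves were read under one even gen. */
uint64_t
stamp_read(struct stamp *s)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&s->gen);
		lo = atomic_load(&s->lo);
		hi = atomic_load(&s->hi);
	} while ((g & 1) != 0 || g != atomic_load(&s->gen));
	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer. It pays for that with extra loads and an occasional retry, which is consistent with the cost on 32-bit platforms that the commit message mentions.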


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *bytes* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return whether it actually succeeded in freeing some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
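
The hint above can be illustrated with a small userland sketch. `addr_equal()` is a hypothetical helper written for this example, not a function from ip_spd.c, and it uses memcmp(3), which later revisions adopted in place of bcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both buffers hold the same len bytes. */
bool
addr_equal(const void *a, const void *b, size_t len)
{
	assert(a != NULL && b != NULL);
	/*
	 * memcmp() returns 0 on a match, so "if (memcmp(...))" is true
	 * when the buffers DIFFER. Writing the comparison out with a
	 * relational operator makes the intent explicit.
	 */
	return memcmp(a, b, len) == 0;
}
```

A bare `if (memcmp(a, b, len))` is easy to misread as "if equal"; spelling out `== 0` or `!= 0` avoids that inversion.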


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.119 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


Revision tags: OPENBSD_7_4_BASE
# 1.118 22-Apr-2023 mvs

Call pfkeyv2_sysctl_policydumper() with shared netlock. It performs
read-olny access to netlock protected data, so the radix tree will
not be modified during spd_table_walk() run.

Also change netlock assertion within spd_table_add() and
ipsec_delete_policy() to exclusive. These are correlating functions
which modifies radix tree, so make us sure spd_table_walk() run with
shared netlock is safe.

Feedback and ok by bluhm@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.117 17-Jun-2022 bluhm

The timeout for ipsec acquire does not decrement the reference
counter to 0 properly. We have one reference count for the lists,
and one for the timeout handler. When the timout fires, it has to
decrement the reference to itself. Then the ipa is removed from
the lists and decremented again.
from Stefan Butz; OK tobhe@ mvs@


# 1.116 04-May-2022 bluhm

In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it is
always the incoming TDB that has to be checked.
from markus@


Revision tags: OPENBSD_7_1_BASE
# 1.115 13-Mar-2022 bluhm

Hrvoje has hit a crash with IPsec acquire while testing the parallel
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
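
The relationship described in revision 1.71 can be pictured with a small
userland sketch. Everything here is hypothetical: demo_rtable_l2() and the
demo_rt_to_rdomain[] array only mimic the idea that every routing table ID
maps to one routing domain ID, and that a routing domain ID is itself a
valid routing table ID. It is not the kernel's rtable_l2(9) implementation.

```c
#include <assert.h>

#define DEMO_RT_MAX	8	/* hypothetical number of routing tables */

/*
 * Hypothetical map: index is a routing table ID, value is the routing
 * domain that table belongs to.  Tables 1 and 5 belong to rdomain 1,
 * every other table belongs to rdomain 0.
 */
static const unsigned int demo_rt_to_rdomain[DEMO_RT_MAX] = {
	0, 1, 0, 0, 0, 1, 0, 0
};

/* Return the routing domain ID for a given routing table ID. */
static unsigned int
demo_rtable_l2(unsigned int rtableid)
{
	assert(rtableid < DEMO_RT_MAX);
	return demo_rt_to_rdomain[rtableid];
}
```

In this sketch, assigning a routing domain ID to a table ID variable is
always safe because demo_rtable_l2(rdomain) == rdomain, while the reverse
direction needs the lookup: table 5 is not a routing domain, and its domain
is 1.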


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to look up the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
interface in addition to enc0 is required per rdomain, e.g. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return whether it actually succeeded in freeing some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
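
The hint in revision 1.38 can be illustrated with a short sketch. The
struct demo_addr type and the demo_addr_equal() helper are hypothetical and
are not taken from ip_spd.c; memcmp() is used here in place of bcmp(), and
the point is only that the result of a *cmp call is compared explicitly
against 0 rather than used as a bare truth value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical four-byte address, standing in for a real sockaddr. */
struct demo_addr {
	unsigned char bytes[4];
};

/*
 * Return nonzero when both addresses are identical.  memcmp() returns 0
 * on a match, so writing "if (memcmp(...))" would test for a mismatch;
 * the explicit "== 0" makes the intended sense visible.
 */
static int
demo_addr_equal(const struct demo_addr *a, const struct demo_addr *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}
```

Spelling out the comparison costs nothing at run time and makes an
inverted condition much easier to spot in review.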


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.118 22-Apr-2023 mvs

Call pfkeyv2_sysctl_policydumper() with shared netlock. It performs
read-olny access to netlock protected data, so the radix tree will
not be modified during spd_table_walk() run.

Also change netlock assertion within spd_table_add() and
ipsec_delete_policy() to exclusive. These are correlating functions
which modifies radix tree, so make us sure spd_table_walk() run with
shared netlock is safe.

Feedback and ok by bluhm@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.117 17-Jun-2022 bluhm

The timeout for ipsec acquire does not decrement the reference
counter to 0 properly. We have one reference count for the lists,
and one for the timeout handler. When the timout fires, it has to
decrement the reference to itself. Then the ipa is removed from
the lists and decremented again.
from Stefan Butz; OK tobhe@ mvs@


# 1.116 04-May-2022 bluhm

In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it is
always the incoming TDB that has to be checked.
from markus@


Revision tags: OPENBSD_7_1_BASE
# 1.115 13-Mar-2022 bluhm

Hrvoje has hit a crash with IPsec acquire while testing the parallel
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the rafdix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change start the dance of "struct route" & friends removal by
sending the completly useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionnality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Remove the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to look up the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
interface in addition to enc0 is required per rdomain, e.g. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 is still hardcoded
for many callers, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return whether it actually succeeded in freeing some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
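
A minimal sketch of the hint above, written with memcmp() in userland C. The helper name `addr_equal()` is hypothetical and is not taken from ip_spd.c; the point is only that the result of a *cmp call is compared against 0 explicitly instead of being used as a truth value.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether two equal-length buffers match.
 * memcmp() returns 0 on a match, so "if (memcmp(a, b, len))" is true
 * when the buffers DIFFER.  Writing the relational operator makes the
 * intended sense visible at the call site.
 */
static bool
addr_equal(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```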


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.
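
For readers unfamiliar with the macro, a small userland sketch of list traversal with TAILQ_FOREACH() from <sys/queue.h>. The `struct item` type and `sum_items()` function are made up for this example and do not appear in ip_spd.c.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element carrying one integer. */
struct item {
	int			value;
	TAILQ_ENTRY(item)	link;
};
TAILQ_HEAD(item_list, item);

/* Walk the list with TAILQ_FOREACH() and add up the values. */
static int
sum_items(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	/*
	 * A hand-written equivalent would be:
	 * for (it = TAILQ_FIRST(head); it != NULL; it = TAILQ_NEXT(it, link))
	 */
	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return sum;
}
```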


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.117 17-Jun-2022 bluhm

The timeout for ipsec acquire does not decrement the reference
counter to 0 properly. We have one reference count for the lists,
and one for the timeout handler. When the timout fires, it has to
decrement the reference to itself. Then the ipa is removed from
the lists and decremented again.
from Stefan Butz; OK tobhe@ mvs@


# 1.116 04-May-2022 bluhm

In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it is
always the incoming TDB that has to be checked.
from markus@


Revision tags: OPENBSD_7_1_BASE
# 1.115 13-Mar-2022 bluhm

Hrvoje has hit a crash with IPsec acquire while testing the parallel
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the rafdix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change start the dance of "struct route" & friends removal by
sending the completly useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionnality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Remove the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possble to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec porion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, noone uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.115 13-Mar-2022 bluhm

Hrvoje has hit a crash with IPsec acquire while testing the parallel
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *bytes* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to look up the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
interface in addition to enc0 is required per rdomain, e.g. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 is still hardcoded
for many callers; that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a
separate struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return whether it actually succeeded in freeing
some memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
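
A minimal illustration of that hint, using memcmp() (the function name `demo_same_bytes()` is hypothetical). Like bcmp(), memcmp() returns 0 when the buffers are equal, so a bare call in a condition tests for a difference, which is easy to misread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when both buffers hold the same len bytes, 0 otherwise. */
int
demo_same_bytes(const void *a, const void *b, size_t len)
{
	assert(a != NULL && b != NULL);
	/*
	 * "if (memcmp(a, b, len))" would be true when the buffers DIFFER.
	 * The explicit "== 0" states the intended test for equality.
	 */
	return memcmp(a, b, len) == 0;
}
```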


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.
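
For comparison, a hand-crafted loop and the TAILQ_FOREACH() form side by side, using the queue(3) macros from <sys/queue.h>. The `demo_acquire` structure and both lookup functions are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; only the sequence number matters here. */
struct demo_acquire {
	int				seq;
	TAILQ_ENTRY(demo_acquire)	next;
};
TAILQ_HEAD(demo_acquire_head, demo_acquire);

/* Hand-crafted traversal: first/next spelled out in the for statement. */
struct demo_acquire *
demo_find_by_hand(struct demo_acquire_head *head, int seq)
{
	struct demo_acquire *ipa;

	assert(head != NULL);
	for (ipa = TAILQ_FIRST(head); ipa != NULL; ipa = TAILQ_NEXT(ipa, next))
		if (ipa->seq == seq)
			return ipa;
	return NULL;
}

/* Same traversal with TAILQ_FOREACH(). */
struct demo_acquire *
demo_find(struct demo_acquire_head *head, int seq)
{
	struct demo_acquire *ipa;

	assert(head != NULL);
	TAILQ_FOREACH(ipa, head, next) {
		if (ipa->seq == seq)
			return ipa;
	}
	return NULL;
}
```

Neither form is safe if elements are removed during the walk.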


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com
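
The kind of check this fix refers to can be sketched on a flat buffer. This is an assumption-laden illustration, not the kernel code: the real path works on mbuf chains, while the hypothetical `demo_get_ports()` below takes a contiguous IPv4 packet and also rejects an undersized header length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IP_MINHLEN	20	/* minimum IPv4 header length in bytes */

/*
 * Extract the TCP/UDP source and destination ports from a contiguous
 * IPv4 packet. Returns 0 on success, -1 if the packet is too short.
 */
int
demo_get_ports(const uint8_t *pkt, size_t len, uint16_t *sport,
    uint16_t *dport)
{
	size_t hlen;

	assert(pkt != NULL && sport != NULL && dport != NULL);

	if (len < DEMO_IP_MINHLEN)
		return -1;
	/* Header length is the low nibble of byte 0, in 32-bit words. */
	hlen = (size_t)(pkt[0] & 0x0f) * 4;
	if (hlen < DEMO_IP_MINHLEN)
		return -1;
	/* The two 16-bit ports need 4 bytes after the IP header. */
	if (len < hlen + 4)
		return -1;

	*sport = (uint16_t)((pkt[hlen] << 8) | pkt[hlen + 1]);
	*dport = (uint16_t)((pkt[hlen + 2] << 8) | pkt[hlen + 3]);
	return 0;
}
```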


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.114 08-Mar-2022 bluhm

In IPsec policy replace integer refcount with atomic refcount.
OK tobhe@ mvs@


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@
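
The calling convention described here (error code as return value, optional output pointer that receives a referenced TDB, caller unrefs) can be sketched in userland C11. All names are hypothetical stand-ins (`demo_tdb`, `demo_tdb_ref()`, `demo_tdb_unref()`, `demo_spd_lookup()`); the kernel's own refcount and TDB code is not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a TDB. */
struct demo_tdb {
	atomic_uint	refcnt;
};

/* Allocate an object holding one reference for the creator. */
struct demo_tdb *
demo_tdb_alloc(void)
{
	struct demo_tdb *tdb = malloc(sizeof(*tdb));

	if (tdb != NULL)
		atomic_init(&tdb->refcnt, 1);
	return tdb;
}

/* Take an additional reference; NULL is passed through. */
struct demo_tdb *
demo_tdb_ref(struct demo_tdb *tdb)
{
	if (tdb != NULL)
		atomic_fetch_add(&tdb->refcnt, 1);
	return tdb;
}

/* Drop a reference; returns 1 if this was the last one and tdb was freed. */
int
demo_tdb_unref(struct demo_tdb *tdb)
{
	if (tdb == NULL)
		return 0;
	assert(atomic_load(&tdb->refcnt) > 0);
	if (atomic_fetch_sub(&tdb->refcnt, 1) == 1) {
		free(tdb);
		return 1;
	}
	return 0;
}

/*
 * Lookup returning an error code. If tdbout is not NULL it receives a
 * referenced pointer that the caller must release with demo_tdb_unref().
 */
int
demo_spd_lookup(struct demo_tdb *cached, struct demo_tdb **tdbout)
{
	if (tdbout != NULL)
		*tdbout = demo_tdb_ref(cached);
	return 0;
}
```

A caller that only needs the error passes NULL and takes no reference.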


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool(9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.113 06-Mar-2022 bluhm

Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Remove the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possble to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec porion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, noone uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and clean up accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.112 22-Feb-2022 guenther

Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>

net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@
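
The calling convention described in 1.107 and 1.108 (return an error,
hand back the SA through an optional output pointer, caller drops the
reference) can be sketched in plain C. This is only an illustration:
`struct sa', `struct policy', sa_ref(), sa_unref() and policy_lookup()
are hypothetical names, not the kernel's, and the counter is a plain int
where real MP-safe code would need an atomic or locked counter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDB; only the reference count matters here. */
struct sa {
	int refcnt;
};

/* Hypothetical stand-in for a policy entry with a cached SA pointer. */
struct policy {
	struct sa *cached;
};

/* Take one reference and return the same pointer for convenience. */
struct sa *
sa_ref(struct sa *sa)
{
	sa->refcnt++;
	return sa;
}

/* Drop one reference; NULL is accepted so callers need no extra check. */
void
sa_unref(struct sa *sa)
{
	if (sa != NULL)
		sa->refcnt--;
}

/*
 * Return 0 or an errno value.  If saout is not NULL and the policy has a
 * cached SA, hand back a new reference that the caller must drop.
 */
int
policy_lookup(struct policy *pol, struct sa **saout)
{
	if (saout != NULL)
		*saout = NULL;
	if (pol == NULL)
		return EINVAL;
	if (pol->cached == NULL)
		return ENOENT;
	if (saout != NULL)
		*saout = sa_ref(pol->cached);
	return 0;
}
```

A caller that only needs the verdict passes NULL and takes no reference;
a caller that passes a pointer pairs every successful lookup with one
sa_unref().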


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool(9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
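
The "lockless read loop" mentioned in 1.102 can be illustrated with a
generation counter. The sketch below is not the kernel's timehands code:
`struct clock_slot', clock_store() and clock_load() are hypothetical
names, a single writer is assumed, and C11 atomics with their default
ordering stand in for whatever barriers the kernel actually uses.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands slot; not the real layout. */
struct clock_slot {
	atomic_uint gen;		/* odd while an update is in progress */
	_Atomic uint32_t sec_lo;	/* low half of the 64-bit seconds value */
	_Atomic uint32_t sec_hi;	/* high half of the 64-bit seconds value */
};

/* Single writer: make gen odd, store both halves, make gen even again. */
void
clock_store(struct clock_slot *cs, uint64_t sec)
{
	unsigned int gen = atomic_load(&cs->gen);

	atomic_store(&cs->gen, gen + 1);
	atomic_store(&cs->sec_lo, (uint32_t)sec);
	atomic_store(&cs->sec_hi, (uint32_t)(sec >> 32));
	atomic_store(&cs->gen, gen + 2);
}

/* Reader: retry until both halves were read under one even generation. */
uint64_t
clock_load(struct clock_slot *cs)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&cs->gen);
		lo = atomic_load(&cs->sec_lo);
		hi = atomic_load(&cs->sec_hi);
	} while ((gen & 1) != 0 || gen != atomic_load(&cs->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer; the cost is the possible retry, which
is the overhead the commit message accepts on 32-bit platforms.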


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
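
The hint in 1.38 can be shown with a small C sketch. The log does not
quote the faulty line, so the buggy variant below is only an assumed
example of the mistake, and `struct addr4' and both function names are
hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical 4-byte address type, used only for illustration. */
struct addr4 {
	unsigned char bytes[4];
};

/*
 * Buggy: memcmp() returns 0 when the buffers are equal, so using the raw
 * result as a truth value reports "equal" exactly when they differ.
 */
bool
addr_equal_buggy(const struct addr4 *a, const struct addr4 *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes));
}

/* Correct: the explicit comparison against 0 states the intent. */
bool
addr_equal(const struct addr4 *a, const struct addr4 *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}
```

Writing `== 0' or `!= 0' at every call site makes an inverted test stand
out in review, which is the point of the hint.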


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and clean up accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.111 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the rafdix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to look up the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
interface in addition to enc0 is required per rdomain, e.g. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
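
The hint above can be illustrated with a minimal sketch. It uses memcmp(),
the standard counterpart of bcmp(), and a hypothetical struct addr_example
that stands in for whatever was compared in the original code; it does not
reproduce the actual bug that was fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an address record (not a kernel type). */
struct addr_example {
	unsigned char bytes[16];
};

/*
 * memcmp() returns 0 when the buffers are equal, so a bare
 * "if (memcmp(...))" is true when they DIFFER. Writing the
 * relational operator makes the intent visible at the call site.
 */
bool
addr_equal(const struct addr_example *a, const struct addr_example *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

bool
addr_differs(const struct addr_example *a, const struct addr_example *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) != 0;
}
```

Spelling out "== 0" or "!= 0" costs nothing and avoids reading the result
of a *cmp call as a boolean "matches" flag.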


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.110 16-Dec-2021 bluhm

Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink()
and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink().
Then ipsp_spd_lookup() could add it to tdb_policy_head after
tdb_cleanspd(). There it would stay until it hits the kassert in
tdb_free().
OK tobhe@


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@
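
The calling convention described in 1.107 and 1.108 (error code as return
value, optional output pointer that receives a counted reference) can be
sketched roughly as follows. All names here (sa_example, sa_ref, sa_unref,
lookup_example) are hypothetical, the counter is a plain int rather than
the kernel's reference counting, and the lookup itself is reduced to a
single cached pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDB; only the reference count matters. */
struct sa_example {
	int refcnt;
};

/* Take one more reference and hand the same pointer back. */
struct sa_example *
sa_ref(struct sa_example *sa)
{
	sa->refcnt++;
	return sa;
}

/* Drop one reference; a real implementation would free at zero. */
void
sa_unref(struct sa_example *sa)
{
	assert(sa->refcnt > 0);
	sa->refcnt--;
}

/*
 * Return an error code. If the caller passes a non-NULL output
 * pointer, store a counted reference there; the caller must then
 * call sa_unref() when done. With out == NULL no reference is taken.
 */
int
lookup_example(struct sa_example *cached, struct sa_example **out)
{
	if (out != NULL)
		*out = NULL;
	if (cached == NULL)
		return ENOENT;
	if (out != NULL)
		*out = sa_ref(cached);
	return 0;
}
```

A caller that only needs the verdict passes NULL and never touches the
count; a caller that needs the object pairs every successful lookup with
one sa_unref().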


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
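
The split-read problem and the idea of a lockless read loop can be
sketched as below. This is an assumption-laden illustration, not the
kernel's timehands code: struct clock_example and its functions are
hypothetical, a single writer is assumed, and C11 atomics with default
(sequentially consistent) ordering are used for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit timestamp stored as two 32-bit halves, guarded
 * by a generation counter that is odd while an update is in progress.
 */
struct clock_example {
	atomic_uint gen;
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

void
clock_init(struct clock_example *c)
{
	atomic_init(&c->gen, 0);
	atomic_init(&c->lo, 0);
	atomic_init(&c->hi, 0);
}

/* Single writer: bump gen to odd, write both halves, bump to even. */
void
clock_store(struct clock_example *c, int64_t now)
{
	unsigned int g = atomic_load(&c->gen);

	atomic_store(&c->gen, g + 1);
	atomic_store(&c->lo, (uint32_t)now);
	atomic_store(&c->hi, (uint32_t)((uint64_t)now >> 32));
	atomic_store(&c->gen, g + 2);
}

/*
 * Reader: retry until both halves were read under the same even
 * generation, so a half-updated value is never returned.
 */
int64_t
clock_load(struct clock_example *c)
{
	unsigned int before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load(&c->gen);
		lo = atomic_load(&c->lo);
		hi = atomic_load(&c->hi);
		after = atomic_load(&c->gen);
	} while (before != after || (before & 1) != 0);

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The extra loads and the possible retry are where the cost on 32-bit
platforms comes from; on a platform with atomic 64-bit loads the loop is
unnecessary.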


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.109 14-Dec-2021 bluhm

To cache lookups, the policy ipo is linked to its SA tdb. There
is also a list of SAs that belong to a policy. To make it MP safe,
protect these pointers with a mutex.
tested by Hrvoje Popovski; OK mvs@


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.108 03-Dec-2021 bluhm

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool (9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the rafdix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
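
The hint in revision 1.38 can be illustrated with a small user-space sketch. The *cmp functions return 0 when the operands are equal, so a bare `if (memcmp(a, b, n))` is true when the buffers differ, which is easy to misread. The helper names `same_bytes` and `different_bytes` are hypothetical and chosen only for this example; the actual code fixed in ip_spd.c is not shown in this log.

```c
#include <string.h>

/* Returns 1 when both buffers hold identical bytes, 0 otherwise. */
int
same_bytes(const void *a, const void *b, size_t len)
{
	/* Explicit comparison: memcmp() returns 0 on equality. */
	return (memcmp(a, b, len) == 0);
}

/* Returns 1 when the buffers differ, 0 otherwise. */
int
different_bytes(const void *a, const void *b, size_t len)
{
	return (memcmp(a, b, len) != 0);
}
```

Writing the `== 0` or `!= 0` out makes the intended sense of the test visible at the call site.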


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com
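
The kind of length check described in revision 1.4 (and the undersized IP header check of revision 1.47) can be sketched in user space as follows. This is only an illustration under stated assumptions: it works on a flat byte buffer rather than an mbuf chain, and the function name `extract_ports` is hypothetical, not a kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read the TCP/UDP source and destination ports
 * from a raw IPv4 packet held in a flat buffer. Returns 0 on success,
 * -1 if the header is undersized or the buffer is too short.
 */
int
extract_ports(const uint8_t *pkt, size_t len, uint16_t *sport,
    uint16_t *dport)
{
	size_t hlen;

	if (len < 20)				/* minimum IPv4 header */
		return (-1);
	hlen = (size_t)(pkt[0] & 0x0f) * 4;	/* IHL, in 32-bit words */
	if (hlen < 20)				/* undersized IP header */
		return (-1);
	if (len < hlen + 4)			/* ports need 4 more bytes */
		return (-1);
	/* Ports are stored in network byte order. */
	*sport = (uint16_t)((pkt[hlen] << 8) | pkt[hlen + 1]);
	*dport = (uint16_t)((pkt[hlen + 2] << 8) | pkt[hlen + 3]);
	return (0);
}
```

Every read is preceded by a check that the bytes are actually present, so a truncated packet is rejected instead of being read past its end.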


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change start the dance of "struct route" & friends removal by
sending the completly useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionnality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Remove the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possble to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec porion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return whether it actually succeeded in freeing some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
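
The hint above can be illustrated in userland C with memcmp(3), which,
like bcmp, returns zero when the buffers are equal. This is a minimal
sketch only; the helper name addr_equal is hypothetical and does not
come from ip_spd.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from ip_spd.c: report whether two
 * buffers of the same length hold identical bytes.
 */
static bool
addr_equal(const void *a, const void *b, size_t len)
{
	/*
	 * memcmp() returns 0 when the buffers match, so a bare
	 * "if (memcmp(a, b, len))" is true when they DIFFER.
	 * Spelling out the comparison against 0 makes the intent
	 * explicit and avoids the inverted test.
	 */
	return memcmp(a, b, len) == 0;
}
```

A caller would write "if (addr_equal(x, y, sizeof(x)))" for the equal
case and "if (!addr_equal(...))" for the mismatch case.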


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.107 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@
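
The calling convention described above (error code as return value,
result through an optional output pointer) might look roughly like the
following userland sketch. The names lookup, struct entry and table are
hypothetical stand-ins; this is not the ipsp_spd_lookup() signature.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a looked-up object; not the kernel's struct tdb. */
struct entry {
	int id;
};

static struct entry table[] = { { 1 }, { 2 }, { 3 } };

/*
 * Return 0 on success or an errno value on failure.  The matching
 * entry is stored through "out" only when the caller passes a
 * non-NULL pointer, so callers that only need the verdict pass NULL.
 */
static int
lookup(int id, struct entry **out)
{
	size_t i;

	if (out != NULL)
		*out = NULL;
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].id == id) {
			if (out != NULL)
				*out = &table[i];
			return 0;
		}
	}
	return ENOENT;
}
```

In a refcounted design the success path would also take a reference
before storing the pointer, which is one reason not to hand the object
to callers that never use it.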


# 1.106 30-Nov-2021 bluhm

Remove unused parameter from ipsp_spd_inp().
OK mvs@ yasuoka@


# 1.105 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


Revision tags: OPENBSD_7_0_BASE
# 1.104 08-Jul-2021 mvs

Initialize `ipsec_acquire_pool' pool(9) within pfkey_init() instead of
doing that in runtime within ipsp_acquire_sa().

ok bluhm@


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
in runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
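
The split-read problem and the retry loop mentioned above can be
sketched in portable C11. This is a generic generation-counter
(seqlock-style) illustration under the assumption of a single writer;
struct stamp, stamp_write and stamp_read are hypothetical names, and
this is not the kernel's timehands implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical illustration: a 64-bit timestamp stored as two 32-bit
 * halves, guarded by a generation counter that is odd while an
 * update is in progress.
 */
struct stamp {
	atomic_uint gen;
	atomic_uint_least32_t lo;
	atomic_uint_least32_t hi;
};

/* Writer side; a single writer is assumed. */
static void
stamp_write(struct stamp *s, uint64_t v)
{
	unsigned int g = atomic_load(&s->gen);

	atomic_store(&s->gen, g + 1);	/* odd: update in progress */
	atomic_store(&s->lo, (uint32_t)v);
	atomic_store(&s->hi, (uint32_t)(v >> 32));
	atomic_store(&s->gen, g + 2);	/* even: update complete */
}

/* Reader side: retry until both halves come from one generation. */
static uint64_t
stamp_read(struct stamp *s)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&s->gen);
		lo = atomic_load(&s->lo);
		hi = atomic_load(&s->hi);
	} while ((g & 1) != 0 || g != atomic_load(&s->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

The loop is what costs extra on 32-bit hardware: a reader may have to
re-read both halves if an update lands in between.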


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@
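
As a rough userland sketch of the two replacements described above:
union addr and struct policy below are hypothetical stand-ins for
sockaddr_union and the policy structure, not the kernel definitions.
Note that memcpy takes (dst, src, len) while bcopy takes (src, dst, len).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a sockaddr_union-like type. */
union addr {
	uint8_t  bytes[16];
	uint32_t words[4];
};

struct policy {
	union addr src;
	union addr dst;
};

/* Both sides are properly aligned objects of the same type: assign. */
static void
policy_set_dst(struct policy *p, const union addr *dst)
{
	p->dst = *dst;
}

/*
 * Raw byte buffer that does not overlap the destination: memcpy.
 * Overlapping regions would need memmove instead.
 */
static void
policy_set_src_bytes(struct policy *p, const uint8_t *raw)
{
	memcpy(p->src.bytes, raw, sizeof(p->src.bytes));
}
```

Assignment lets the compiler check the types, which a byte copy with a
hand-written length cannot do.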


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good for our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@
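
A userland analogy for this change, assuming nothing about the kernel
pool(9) internals: allocating and then clearing in two steps versus
asking the allocator for zeroed memory directly. struct acquire and the
two function names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type used only for this illustration. */
struct acquire {
	int  seq;
	char pad[28];
};

/* Two-step form: allocate, then clear by hand. */
static struct acquire *
acquire_alloc_two_step(void)
{
	struct acquire *a = malloc(sizeof(*a));

	if (a != NULL)
		memset(a, 0, sizeof(*a));
	return a;
}

/* One-step form: the allocator returns memory that is already zeroed. */
static struct acquire *
acquire_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct acquire));
}
```

Both forms hand back the same zeroed object; the one-step form removes
the chance of forgetting the clear or passing it the wrong size.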


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@



Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.103 04-May-2021 mvs

Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that
at runtime within pfkeyv2_send(). Also set its interrupt protection
level to IPL_SOFTNET.

ok bluhm@ mpi@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
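
The "lockless read loop" mentioned above can be illustrated with a small
userland sketch. This is not the kernel's timehands code: the structure
demo_timehands and the functions demo_settime() and demo_gettime() are
hypothetical names, and the 64-bit value is deliberately split into two
32-bit halves to mimic a CPU that cannot load it atomically. The reader
retries whenever a generation counter shows that a writer was active.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's timehands; not the real layout. */
struct demo_timehands {
	atomic_uint gen;		/* odd while an update is in progress */
	_Atomic uint32_t lo;		/* low half of the 64-bit time value */
	_Atomic uint32_t hi;		/* high half of the 64-bit time value */
};

static struct demo_timehands demo_th;

/* Writer: make the generation odd, store both halves, make it even again. */
static void
demo_settime(int64_t now)
{
	uint64_t v = (uint64_t)now;

	atomic_fetch_add_explicit(&demo_th.gen, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&demo_th.lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&demo_th.hi, (uint32_t)(v >> 32),
	    memory_order_relaxed);
	atomic_fetch_add_explicit(&demo_th.gen, 1, memory_order_release);
}

/* Reader: retry until the generation is even and unchanged across the read. */
static int64_t
demo_gettime(void)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load_explicit(&demo_th.gen, memory_order_acquire);
		lo = atomic_load_explicit(&demo_th.lo, memory_order_relaxed);
		hi = atomic_load_explicit(&demo_th.hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((gen & 1) != 0 ||
	    gen != atomic_load_explicit(&demo_th.gen, memory_order_relaxed));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

A plain read of the two halves without the generation check could pair a
new low half with an old high half; that is the split-read problem the
commit message describes. The retry loop is also where the extra cost on
32-bit platforms comes from.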


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@
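
As a rough userland illustration of the two replacements named above,
the sketch below copies a socket address union by plain assignment and
copies a non-overlapping byte range with memcpy(). The union
demo_sockaddr_union and both helper functions are hypothetical stand-ins,
not the kernel's definitions. Note that bcopy() takes (src, dst, len)
while memcpy() takes (dst, src, len), so the arguments swap places in
such a conversion.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the kernel's sockaddr_union. */
union demo_sockaddr_union {
	struct sockaddr		sa;
	struct sockaddr_in	sin;
	struct sockaddr_in6	sin6;
};

/* Same type, properly aligned on both sides: a simple assignment is enough. */
static void
demo_copy_union(union demo_sockaddr_union *dst,
    const union demo_sockaddr_union *src)
{
	*dst = *src;
}

/* Raw bytes that are known not to overlap: memcpy(dst, src, len). */
static void
demo_copy_bytes(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

The assignment form lets the compiler check the types of both operands,
which a byte-copy call cannot do.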


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *bytes* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
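
The hint can be shown with a minimal sketch. The helper name
demo_same_bytes() is hypothetical; the point is only that memcmp() and
bcmp() return 0 when the buffers are equal, so a bare "if (memcmp(...))"
is true when they differ, which is easy to read the wrong way around.
Spelling out the comparison against 0 makes the intent explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when both buffers hold the same bytes.
 * The explicit "== 0" states the intent; writing "!memcmp(...)" or a bare
 * "memcmp(...)" in a condition invites the inverted-test mistake.
 */
static int
demo_same_bytes(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```

The same convention applies to strcmp() and the other *cmp functions,
where "!= 0" means "different" and "== 0" means "equal".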


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.102 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger bigger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allows us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the rafdix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped reviewed these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This put the rtable* layer at the same level of the if* level. These
two subsystem are organized around the two global data structure used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change start the dance of "struct route" & friends removal by
sending the completly useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionnality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
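
The hint above can be illustrated with a minimal userland sketch. It is
not the code from this revision: the helper names addr_equal() and
addr_equal_buggy() are hypothetical, and memcmp() stands in for bcmp().
Both return 0 when the buffers match, so a bare call used as a
condition is true on a mismatch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when the two buffers hold identical bytes. */
bool
addr_equal(const void *a, const void *b, size_t len)
{
	/* Explicit relational operator: memcmp() returns 0 on a match. */
	return memcmp(a, b, len) == 0;
}

/*
 * Hypothetical helper showing the error-prone form. It reads like
 * "if equal" but the result is true when the buffers DIFFER, because
 * any nonzero return value converts to true.
 */
bool
addr_equal_buggy(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}
```

Writing the comparison against 0 at every call site keeps the intended
sense (match or mismatch) visible to the reader.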


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.101 10-Dec-2019 tobhe

Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE.
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.

ok bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.100 08-Jul-2019 mpi

free(9) sizes for M_RTABLE.

ok kn@


Revision tags: OPENBSD_6_5_BASE
# 1.99 22-Oct-2018 cheloha

ipsec: use monotonic clock for SA creation/lookup timestamps; ok dlg@


Revision tags: OPENBSD_6_4_BASE
# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't do
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Reduce the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows isakmpd/iked/ipsecctl to run in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
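
The hint above can be illustrated with a small userland sketch. This is
not the code that was fixed in 1.38 (the diff is not part of this log);
struct example_id and both helper functions are hypothetical names used
only for illustration, and memcmp() stands in for bcmp().

```c
#include <string.h>

/* Hypothetical stand-in for an identity payload; not a kernel structure. */
struct example_id {
	unsigned char data[8];
};

/*
 * Error-prone form: memcmp() returns 0 when the buffers are equal, so a
 * bare call used as a condition is true exactly when the buffers DIFFER.
 * Despite its name, this helper therefore reports a match on mismatch.
 */
static int
ids_match_buggy(const struct example_id *a, const struct example_id *b)
{
	if (memcmp(a->data, b->data, sizeof(a->data)))
		return 1;
	return 0;
}

/*
 * Explicit form: comparing the result against 0 with a relational
 * operator states the intent (equality) directly.
 */
static int
ids_match(const struct example_id *a, const struct example_id *b)
{
	return memcmp(a->data, b->data, sizeof(a->data)) == 0;
}
```

With two identical buffers, ids_match() returns 1 while ids_match_buggy()
returns 0; writing "== 0" or "!= 0" at every call site makes this kind of
inversion easier to spot in review.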


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.98 25-Jun-2018 mpi

Assert that the NET_LOCK() is held when iterating over `ipsec_acquire_head'.

ok visa@ as part of a larger diff


# 1.97 16-May-2018 reyk

Fix kernel builds without IPSEC.

OK mikeb@


Revision tags: OPENBSD_6_3_BASE
# 1.96 20-Nov-2017 mpi

Flush flows using the radix-tree instead of a global list.

This will allow us to get rid of the list.

ok visa@


# 1.95 07-Nov-2017 mpi

Remove unused debug macro.


# 1.94 27-Oct-2017 mpi

Dump IPsec flows by iterating over the radix-tree.

This enforces an order and will allow us to get rid of the global list.

ok millert@, visa@, markus@


# 1.93 16-Oct-2017 mpi

Last changes before running IPsec w/o KERNEL_LOCK().

Put more NET_ASSERT_LOCK() and document which globals it protects.

Add a mutex for pfkeyv2 globals.

Convert ipsp_delete_acquire() to timeout_set_proc().

Tested by Hrvoje Popovski, ok bluhm@ visa@


Revision tags: OPENBSD_6_2_BASE
# 1.92 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.91 27-Sep-2016 fcambus

Remove empty #ifdef and #ifndef blocks

OK natano@


# 1.90 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


# 1.89 06-Sep-2016 dlg

pool_setipl for various netinet and netinet6 bits

thank you to everyone who helped review these diffs

ok mpi@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.88 07-Oct-2015 mpi

Initialize the routing table before domains.

The routing table is not an optional component of the network stack
and initializing it inside the "routing domain" requires some ugly
introspection in the domain interface.

This puts the rtable* layer at the same level as the if* layer. These
two subsystems are organized around the two global data structures used
in the network stack:

- the global &ifnet list, to be used in process context only, and
- the routing table which can be read in interrupt context.

This change makes the rtable_* layer domain-aware and extends the
"struct domain" such that INET, INET6 and MPLS can specify the length
of the binary key used in lookups. This allows us to keep, or move
towards, AF-free route and rtable layers.

While here stop the madness and pass the size of the maximum key length
in *byte* to rn_inithead0().

ok claudio@, mikeb@


# 1.87 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.86 17-Jul-2015 blambert

manage spd entries by using the radix api directly instead of
reaching around through the routing table

original diff by myself, much improved by mikeb@ and mpi@

ok and testing mikeb@ mpi@


# 1.85 23-May-2015 markus

introduce ipsec-id bundles and use them for ipsecflowinfo,
fixes rekeying for l2tp/ipsec against multiple windows clients
and saves memory (for many SAs to same peers); feedback and ok mikeb@


# 1.84 30-Apr-2015 millert

Merge two identical if() statements in ipsp_acquire_sa(). The
change in ip_spd.c 1.59 makes it appear that there is a cut & pasto.
OK mikeb@


# 1.83 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.82 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


# 1.81 13-Apr-2015 mikeb

Perform IPsec bypass check on a socket before performing TDB lookups.
OK markus, hshoexer


# 1.80 13-Apr-2015 mikeb

Rename gettdbbyaddr to gettdbbydst; OK markus, hshoexer, mpi


# 1.79 13-Apr-2015 mikeb

Remove unused arguments from gettdb* functions; OK markus, hshoexer, mpi


# 1.78 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.77 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.76 25-Nov-2014 mpi

The proliferation of "struct route" in all its flavors didn't make
any good to our network stack.

The most visible effect is the maze of #ifdef's and casts. But the
real problem is the very fragile way of checking if a (cached) route
entry is still valid or not. What should we do if the route jumped
to another ifaddr or if its gateway has been changed?

This change starts the dance of "struct route" & friends removal by
sending the completely useless "struct route_enc" to the bucket.

Tweak & ok claudio@


# 1.75 01-Nov-2014 mpi

Rename rtalloc1() into rtalloc(9) and convert its flags to only enable
functionality instead of a mix of enable/disable.

ok bluhm@, jca@


# 1.74 14-Oct-2014 mpi

Use rtfree() instead of RTFREE(), NULLify some free'd route pointers and
kill the macro.

ok mikeb@, henning@


# 1.73 27-Sep-2014 mpi

Kill rtalloc() and update rtalloc1() and rtalloc_mpath() to no longer
rely on "struct route" that should die.

ok claudio@


Revision tags: OPENBSD_5_6_BASE
# 1.72 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.71 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.70 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.69 24-Oct-2013 mpi

Remove the number of in6_var.h inclusions by moving some functions and
global variables to in6.h.

ok deraadt@


# 1.68 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


Revision tags: OPENBSD_5_4_BASE
# 1.67 14-May-2013 mpi

Fix build with ENCDEBUG defined.


# 1.66 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.65 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.64 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.63 28-Sep-2010 deraadt

missing PR_NOWAIT


Revision tags: OPENBSD_4_8_BASE
# 1.62 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.61 02-Jul-2010 david

don't reference an item after it has been returned to the pool
an 8 year old bug exposed by recent uvm changes

ok thib@ tedu@ deraadt@


Revision tags: OPENBSD_4_7_BASE
# 1.60 15-Jan-2010 chl

Replace pool_get() + bzero() with pool_get(..., PR_ZERO).

With input from oga@ and krw@

ok oga@ krw@ thib@ markus@ mk@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.59 27-Jan-2009 bluhm

In IPsec acquire mode, if the flow was configured for the "any"
network 0.0.0.0/0 or ::/0, the SA was established for the IP address
in the packet instead of the network in the flow. That means the
SA was not negotiated for the network 0.0.0.0 with mask 0 but for
the remote IP with mask 255.255.255.255. This SA did not match the
flow and did not work.

To differentiate between general flows that are used to trigger
specific host-to-host SAs and flows for matching network SAs, the
if condition only uses the ipo->ipo_dst field now. For a flow
without peer, an SA must be negotiated for each host-to-host
combination. Otherwise, if a peer exists at the flow, the kernel
acquires one SA for the whole network.

tested by todd@, ok hshoexer@, angelos@, todd@


# 1.58 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.57 22-Jul-2008 bluhm

Assign the struct size to sin6_len instead of sin6_family.
ok hshoexer claudio mpf henning


# 1.56 11-Jun-2008 blambert

0 -> PR_NOWAIT (which is defined as 0) in pool_get
as an aid to readability

ok and thinko-catching henning@


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the
MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations is also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's question on how are IDs for IKE
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.


# 1.55 09-May-2008 claudio

more rtrequest() to rtrequest1() replacement.
OK henning@


Revision tags: OPENBSD_4_3_BASE
# 1.54 01-Sep-2007 henning

since the MGET* macros were changed to function calls, there wasn't any
need for the pool declarations and the inclusion of pool.h
From: tbert <bret.lambert@gmail.com>


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE
# 1.53 14-Feb-2007 jsg

Consistently spell FALLTHROUGH to appease lint.
ok kettenis@ cloder@ tom@ henning@


Revision tags: OPENBSD_4_0_BASE
# 1.52 16-Jun-2006 henning

adjust functions dealing with the routing table to take a table ID as
parameter so they can work on alternate tables. table 0 hardcoded for
many callers yet, that will be adapted step by step.
input + ok claudio norby hshoexer


Revision tags: OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE
# 1.51 17-Feb-2005 jfb

miscellaneous typo fixes:
- sturct -> struct (spotted by pedro)
- elimination of consecutive 'the' words

ok jmc@, henning@, krw@, robert@, some whining by jolan@


Revision tags: OPENBSD_3_6_BASE
# 1.50 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.49 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.48 14-Apr-2004 markus

simpler ipsp_aux_match() API; ok henning, hshoexer


Revision tags: OPENBSD_3_3_BASE OPENBSD_3_4_BASE OPENBSD_3_5_BASE UBC_SYNC_A
# 1.47 12-Nov-2002 dhartmei

Check for undersized IP header, found by jbm@, ok angelos@


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.46 09-Jun-2002 itojun

branches: 1.46.2;
whitespace


# 1.45 31-May-2002 angelos

Per-socket policies and authentication. Finally.


Revision tags: OPENBSD_3_1_BASE
# 1.44 18-Feb-2002 angelos

Search the correct ACQUIRE list --- shifflett@nps.navy.mil


# 1.43 23-Jan-2002 art

It looks like there has been one crack smoking and a few cut and pastes.
PR_FREEHEADER should not be set in pool_init by the caller. It shouldn't
be set in pool_init at all. Besides, it's going away soon anyway.


# 1.42 23-Jan-2002 art

Pool deals fairly well with physical memory shortage, but it doesn't deal
well (not at all) with shortages of the vm_map where the pages are mapped
(usually kmem_map).

Try to deal with it:
- group all information about the backend allocator for a pool in a separate
struct. The pool will only have a pointer to that struct.
- change the pool_init API to reflect that.
- link all pools allocating from the same allocator on a linked list.
- Since an allocator is responsible to wait for physical memory it will
only fail (waitok) when it runs out of its backing vm_map, carefully
drain pools using the same allocator so that va space is freed.
(see comments in code for caveats and details).
- change pool_reclaim to return if it actually succeeded to free some
memory, use that information to make draining easier and more efficient.
- get rid of PR_URGENT, no one uses it.


# 1.41 02-Jan-2002 deraadt

at least ; required after label or case; openbsd@davidkrause.com


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.40 24-Sep-2001 angelos

branches: 1.40.4;
Reset the error return value if the cached TDB matches the
policy. Pointed out by jdmcbride@iol.ie


# 1.39 21-Aug-2001 angelos

When the outgoing socket has BYPASS set, don't bother calling the
PCB-checking routine.


# 1.38 15-Aug-2001 niklas

bcmp done wrong, detected at bakeoff. Hint: always use
relational operators when using *cmp APIs in conditional expressions.
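
The log does not show the faulty expression, so the following is only a
generic illustration of the hint, written with memcmp() (which follows the
same "0 means equal" convention as bcmp()). The struct and function names
are hypothetical and are not taken from ip_spd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an address record; not a kernel structure. */
struct addr {
	unsigned char bytes[4];
};

/*
 * Wrong: memcmp() returns 0 when the buffers are equal, so using the
 * raw result as a truth value yields "true" when the addresses DIFFER.
 */
static bool
addr_equal_wrong(const struct addr *a, const struct addr *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes));
}

/* Right: the explicit relational operator states the intent. */
static bool
addr_equal(const struct addr *a, const struct addr *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

/* Small self-check showing that the two variants disagree. */
static void
check_examples(void)
{
	struct addr x = {{10, 0, 0, 1}};
	struct addr y = {{10, 0, 0, 1}};
	struct addr z = {{10, 0, 0, 2}};

	assert(addr_equal(&x, &y));
	assert(!addr_equal(&x, &z));
	assert(!addr_equal_wrong(&x, &y));	/* inverted result */
	assert(addr_equal_wrong(&x, &z));	/* inverted result */
}
```

Writing "== 0" or "!= 0" at every call site keeps the sense of the
comparison visible to the reader.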


# 1.37 06-Aug-2001 angelos

Don't drop packets if we're using an ACQUIRE policy and some error
occurs while notifying key mgmt; also, always check for new TDBs for
policies where the destination gateway is left unspecified (end-to-end
IPsec case), to avoid asking for new SAs from key mgmt.


# 1.36 27-Jun-2001 angelos

Use TAILQ_FOREACH() instead of hand-crafted for loops.


# 1.35 27-Jun-2001 angelos

When determining whether there's a pending acquire wrt a policy, look
at the acquires associated with the policy only.


# 1.34 27-Jun-2001 angelos

Attach IPsec acquire state to policy entries, and relevant cleanups.


# 1.33 27-Jun-2001 angelos

Don't cache packets that hit policies -- we'll do that at the PCB for
local packets.


# 1.32 26-Jun-2001 angelos

Use the ACQUIRE sequence number to "wake up" acquire state kept and
cause retransmission of outgoing packets. Also, only store outgoing
packets -- just drop incoming packets that cause an SA
acquisition. Some comment fixup.


# 1.31 26-Jun-2001 angelos

ifdef out some currently unused code


# 1.30 26-Jun-2001 angelos

Rewrite ipsp_clear_acquire() to be more readable, after all the KNF'ing


# 1.29 26-Jun-2001 angelos

Use pool(9) for IPsec acquires too.


# 1.28 26-Jun-2001 angelos

Use pool(9) for IPsec policy structures.


# 1.27 26-Jun-2001 angelos

Keep the PFKEY sequence number at the TDB, plus a little bit of KNF


# 1.26 26-Jun-2001 angelos

KNF


# 1.25 25-Jun-2001 angelos

Copyright.


# 1.24 24-Jun-2001 mickey

use new timeouts for spd expirations; ho@ ok


# 1.23 08-Jun-2001 angelos

Trim include files.


# 1.22 07-Jun-2001 angelos

Simplify SPD logic (and correct some input cases).


# 1.21 30-May-2001 angelos

Match prototype.


# 1.20 30-May-2001 angelos

Correctly free information attached to the policy.


# 1.19 05-May-2001 angelos

branches: 1.19.2;
Check that SAs also match on the credentials and the IDs. This means
that flows with different source/destination ID requirements will
cause different SAs to be established by IKE (or whatever other
protocol). Also, use the new data types for allocated memory.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Apr-2001 art

Missing splx in error handling.


# 1.17 14-Apr-2001 angelos

Minor changes, preparing for real socket-attached TDBs; also, more
information will be stored in the TDB. ok ho@ provos@


# 1.16 10-Apr-2001 provos

allow host-to-host negotiations if no gateway has been specified.
from angelos@


# 1.15 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.14 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.13 15-Mar-2001 bjc

include <machine/cpu.h>, since schednetisr needs to do a splsoftnet


# 1.12 28-Feb-2001 angelos

Pretty.


# 1.11 28-Feb-2001 angelos

Handle failures more gracefully.


# 1.10 28-Feb-2001 angelos

Keep the last packet sent or received that matched an SPD entry, and
retransmit if we eventually have an SA setup for that policy.


# 1.9 14-Dec-2000 angelos

Compile in non-INET6 kernels.


# 1.8 14-Dec-2000 angelos

Always look for a suitable TDB if the gateway is left unspecified.


# 1.7 17-Nov-2000 angelos

All-1s addresses as policy destinations are also reserved for future
use (policy discovery).


Revision tags: OPENBSD_2_8_BASE
# 1.6 18-Oct-2000 chris

branches: 1.6.2;
Fix compile error if lacking -DINET6


# 1.5 14-Oct-2000 angelos

ASKPOLICY message; used by key management to inquire about policy
triggering an ACQUIRE.


# 1.4 29-Sep-2000 angelos

Make sure there's enough data on the mbuf for the TCP/UDP ports (if
applicable) -- bug located thanks to a crashdump from HJungheim@vpnet.com


# 1.3 27-Sep-2000 angelos

Fix checking for incoming packets when the remote gateway has been
fully specified in the flow.


# 1.2 20-Sep-2000 angelos

Add IDENTITY payloads to flow establishment (and cleanup accordingly)
-- this will address one of itojun's questions on how IDs for IKE are
to be determined (need to add support for this to ipsecadm).


# 1.1 19-Sep-2000 angelos

Lots and lots of changes.