History log of /openbsd-current/sys/netinet/ipsec_input.c
Revision Date Author Comments
# 1.206 16-Sep-2023 mpi

Allow counters_read(9) to take an optional scratch buffer.

Using a scratch buffer makes it possible to take a consistent snapshot of
per-CPU counters without having to allocate memory.

Makes ddb(4) show uvmexp command work in OOM situations.

ok kn@, mvs@, cheloha@
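
The scratch-buffer idea can be sketched in user space as follows. This is not the kernel's counters_read(9): NCPU, NCOUNTERS, percpu_counters and counters_sum() are hypothetical names, and the per-CPU generation checks a real implementation needs for a consistent view are left out. The point is only that a caller-supplied buffer removes the allocation from the read path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NCPU		4	/* assumed CPU count for this sketch */
#define NCOUNTERS	3	/* assumed number of counters */

/* Hypothetical per-CPU storage: one row of counters per CPU. */
uint64_t percpu_counters[NCPU][NCOUNTERS];

/*
 * Sum every CPU's counters into out[].  Each row is first copied
 * into a temporary buffer.  If the caller passes a scratch buffer,
 * no memory is allocated; otherwise a temporary one is malloc'd.
 * Returns 0 on success, -1 if the allocation fails.
 */
int
counters_sum(uint64_t *out, uint64_t *scratch)
{
	uint64_t *tmp = scratch;
	unsigned int cpu, i;

	assert(out != NULL);
	if (tmp == NULL) {
		tmp = malloc(NCOUNTERS * sizeof(*tmp));
		if (tmp == NULL)
			return -1;
	}
	memset(out, 0, NCOUNTERS * sizeof(*out));
	for (cpu = 0; cpu < NCPU; cpu++) {
		memcpy(tmp, percpu_counters[cpu], NCOUNTERS * sizeof(*tmp));
		for (i = 0; i < NCOUNTERS; i++)
			out[i] += tmp[i];
	}
	if (scratch == NULL)
		free(tmp);
	return 0;
}
```

A caller that must not allocate, such as a debugger command running when memory is exhausted, would pass a buffer from its own stack as scratch.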


# 1.205 07-Aug-2023 dlg

add the glue between ipsec security associations and sec(4) interfaces.

if TDBF_IFACE is set on a tdb, the ipsec stack will pass it to the
sec(4) driver to keep track of instead of wiring it up for security
associations to use.

when sec(4) transmits a packet, it will look up its list of tdbs
to find the right SA to encrypt and send the packet out with.

if an incoming ipsec packet arrives with TDBF_IFACE set, it's passed
to sec(4) to be injected back into the network stack as if it was
received on the sec interface, instead of being reinjected into the
IP stack like normal SA/SPD processing does.

note that this means you do not have to configure tunnel endpoints
on sec(4) interfaces, instead you line the interface unit number
in the ipsec config up with the minor number of the sec(4) interfaces.
the peer IPs used on the SAs are what's used as the traffic endpoints.

support from many including markus@ tobhe@ claudio@ sthen@ patrick@
now is a good time deraadt@


# 1.204 13-May-2023 bluhm

Instead of implementing IPv4 header checksum creation everywhere,
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out().
OK claudio@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.203 22-Feb-2022 guenther

Delete unnecessary #includes of <netinet6/ip6protosw.h>: some never
needed it and some no longer need it after moving the externs from
there to <sys/protosw.h>

ok jsg@


# 1.202 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
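
The combination of reference counts and a delete-once flag might look roughly like the sketch below. All names (struct sa, SA_DELETED, sa_ref(), sa_unref(), sa_delete()) are hypothetical and do not mirror the real tdb code. The sketch is single-threaded; in the kernel the count and the flag would have to be updated atomically or under a mutex.

```c
#include <assert.h>
#include <stdlib.h>

#define SA_DELETED	0x01	/* hypothetical "delete already ran" flag */

/* Hypothetical stand-in for a tdb. */
struct sa {
	unsigned int	refcnt;
	unsigned int	flags;
};

int sa_frees;			/* counts real frees, for illustration */

struct sa *
sa_alloc(void)
{
	struct sa *s = calloc(1, sizeof(*s));

	if (s != NULL)
		s->refcnt = 1;	/* reference owned by the SA table */
	return s;
}

/* Lookup functions would return their result through this. */
struct sa *
sa_ref(struct sa *s)
{
	s->refcnt++;
	return s;
}

/* Drop one reference; the last one frees the object. */
void
sa_unref(struct sa *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0) {
		sa_frees++;
		free(s);
	}
}

/* Drop the table's reference, but only the first time it is called. */
void
sa_delete(struct sa *s)
{
	if (s->flags & SA_DELETED)
		return;
	s->flags |= SA_DELETED;
	sa_unref(s);
}
```

A second sa_delete() is only safe while the caller still holds its own reference, which is the situation the flag is meant to cover.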


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@
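
The "pass down an mbuf pointer" pattern can be illustrated with a toy buffer type. struct buf, buf_pullup() and input_step() are hypothetical stand-ins, not the mbuf API. Like m_pullup(), buf_pullup() frees its argument on failure and may return a different object on success, so every caller in the chain has to see the updated pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a heap buffer with a length. */
struct buf {
	size_t		 len;
	unsigned char	*data;
};

struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (b->data == NULL) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

void
buf_free(struct buf *b)
{
	if (b != NULL) {
		free(b->data);
		free(b);
	}
}

/*
 * Make sure len bytes are available.  On failure the buffer is freed
 * and NULL is returned.  On success the data is always moved to a new
 * buffer here, to model the "pointer may change" case.
 */
struct buf *
buf_pullup(struct buf *b, size_t len)
{
	struct buf *n;

	assert(b != NULL);
	if (b->len < len || (n = buf_alloc(b->len)) == NULL) {
		buf_free(b);
		return NULL;
	}
	memcpy(n->data, b->data, b->len);
	buf_free(b);
	return n;
}

/*
 * Takes a pointer to the caller's pointer, so the caller ends up with
 * either the replacement buffer or NULL, never a stale pointer.
 * Returns the first header byte, or -1 if the packet was dropped.
 */
int
input_step(struct buf **bp, size_t hdrlen)
{
	*bp = buf_pullup(*bp, hdrlen);
	if (*bp == NULL)
		return -1;
	return (*bp)->data[0];
}
```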


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
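
The idea behind m_freemp() can be shown with a toy packet type. struct pkt, pkt_free() and pkt_freep() are hypothetical names used only for this sketch. The helper clears the caller's pointer before freeing, so a later accidental use or second free sees NULL instead of a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct pkt {
	int	len;
};

int pkt_frees;			/* counts real frees, for illustration */

void
pkt_free(struct pkt *p)
{
	if (p != NULL) {
		pkt_frees++;
		free(p);
	}
}

/* Reset the caller's pointer first, then free what it pointed to. */
void
pkt_freep(struct pkt **pp)
{
	struct pkt *p;

	assert(pp != NULL);
	p = *pp;
	*pp = NULL;
	pkt_free(p);
}
```

Calling pkt_freep() twice on the same variable is harmless in this sketch, because the second call only sees NULL.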


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues id have to add sepearate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
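
One way to enforce "uncompress only once" is a per-packet flag, sketched below. The names (struct pkt, PKT_DECOMPRESSED, ipcomp_input_once()) are hypothetical and the real mechanism may differ. A packet that still carries an IPcomp header after it has already been uncompressed is dropped instead of being expanded again.

```c
#include <assert.h>
#include <stddef.h>

#define PKT_DECOMPRESSED	0x01	/* hypothetical per-packet flag */

/* Hypothetical packet model: flags plus remaining IPcomp layers. */
struct pkt {
	unsigned int	flags;
	unsigned int	ipcomp_layers;
};

/*
 * Uncompress one IPcomp layer.  Returns 0 on success, -1 if the
 * packet has no IPcomp layer or was already uncompressed once.
 */
int
ipcomp_input_once(struct pkt *p)
{
	assert(p != NULL);
	if (p->ipcomp_layers == 0)
		return -1;		/* nothing to uncompress */
	if (p->flags & PKT_DECOMPRESSED)
		return -1;		/* nested IPcomp: drop */
	p->flags |= PKT_DECOMPRESSED;
	p->ipcomp_layers--;
	return 0;
}
```

Because the flag travels with the packet, it also covers the nested case where an IPIP header sits between two IPcomp headers.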


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.205 07-Aug-2023 dlg

add the glue between ipsec security associations and sec(4) interfaces.

if TDBF_IFACE is set on a tdb, the ipsec stack will pass it to the
sec(4) driver to keep track of instead of wiring it up for security
associations to use.

when sec(4) transmits a packet, it will look up it's list of tdbs
to find the right SA to encrypt and send the packet out with.

if an incoming ipsec packet arrives with TDBF_IFACE set, it's passed
to sec(4) to be injected back into the network stack as if it was
received on the sec interface, instead of being reinjected into the
IP stack like normal SA/SPD processing does.

note that this means you do not have to configure tunnel endpoints
on sec(4) interfaces, instead you line the interface unit number
in the ipsec config up with the minor number of the sec(4) interfaces.
the peer IPs used on the SAs are what's used as the traffic endpoints.

support from many including markus@ tobhe@ claudio@ sthen@ patrick@
now is a good time deraadt@


# 1.204 13-May-2023 bluhm

Instead of implementing IPv4 header checksum creation everywhere,
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out().
OK claudio@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.203 22-Feb-2022 guenther

Delete unnecessary #includes of <netinet6/ip6protosw.h>: some never
needed it and some no longer need it after moving the externs from
there to <sys/protosw.h>

ok jsg@


# 1.202 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lockup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistently for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occured.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during havy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for autmatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic seperated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept seperate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' afer IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefor rdomain save.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parrallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both addresss families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel look. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation makes sense neither for the ip input path nor for
asynchronous callbacks. Make the IPsec functions void; there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.204 13-May-2023 bluhm

Instead of implementing IPv4 header checksum creation everywhere,
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out().
OK claudio@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.203 22-Feb-2022 guenther

Delete unnecessary #includes of <netinet6/ip6protosw.h>: some never
needed it and some no longer need it after moving the externs from
there to <sys/protosw.h>

ok jsg@


# 1.202 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistently for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have fewer dangling pointers in the kernel.
OK krw@ mpi@ claudio@
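
The idea of resetting the pointer before freeing can be sketched like
this. It is an illustration under stated assumptions: struct
sketch_buf, sketch_free() and sketch_freep() are hypothetical userland
stand-ins, and the counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer type standing in for an mbuf. */
struct sketch_buf {
	int	payload;
};

static int sketch_frees;	/* counts real frees, for demonstration */

static void
sketch_free(struct sketch_buf *b)
{
	if (b == NULL)
		return;
	sketch_frees++;
	free(b);
}

/* Clear the caller's pointer first, then free what it pointed to. */
static void
sketch_freep(struct sketch_buf **bp)
{
	struct sketch_buf *b = *bp;

	*bp = NULL;
	sketch_free(b);
}
```

Because the caller's pointer is NULL afterwards, an accidental second
call is a harmless no-op instead of a double free.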


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
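
The asymmetry between the two IDs can be sketched with a small lookup
table. rtable_l2(9) is named in the commit message above; everything
else here (the sketch_l2 array, sketch_table_attach(),
sketch_rtable_l2() and the table size) is a hypothetical model and not
the kernel implementation.

```c
#include <assert.h>

#define SKETCH_NTABLES	8

/* Hypothetical map: sketch_l2[id] is the routing domain of table id. */
static unsigned int sketch_l2[SKETCH_NTABLES];

/* Record that routing table `rtableid' belongs to domain `rdomain'. */
static void
sketch_table_attach(unsigned int rtableid, unsigned int rdomain)
{
	assert(rtableid < SKETCH_NTABLES && rdomain < SKETCH_NTABLES);
	sketch_l2[rtableid] = rdomain;
}

/* Model of the table-to-domain lookup: may differ from its argument. */
static unsigned int
sketch_rtable_l2(unsigned int rtableid)
{
	assert(rtableid < SKETCH_NTABLES);
	return sketch_l2[rtableid];
}
```

A routing domain ID can be used directly as a table ID, since in this
model a domain is simply a table attached to itself, but going the
other way requires the lookup.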


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
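
The overflow mentioned above arises when a long SA lifetime in seconds
is multiplied by hz in a fixed-width integer. A minimal sketch of a
saturating conversion follows, assuming a hypothetical helper
sketch_secs_to_ticks(); it is not hzto(9) itself, only an illustration
of clamping instead of wrapping.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a relative timeout in seconds to ticks, clamping to INT_MAX
 * when secs * hz would not fit in an int.
 */
static int
sketch_secs_to_ticks(long long secs, int hz)
{
	assert(hz > 0 && secs >= 0);

	if (secs > INT_MAX / hz)
		return INT_MAX;		/* saturate instead of overflowing */
	return (int)(secs * hz);
}
```

The comparison is done by division, so the check itself cannot
overflow even for very large values of secs.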


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.203 22-Feb-2022 guenther

Delete unnecessary #includes of <netinet6/ip6protosw.h>: some never
needed it and some no longer need it after moving the externs from
there to <sys/protosw.h>

ok jsg@


# 1.202 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
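
The reference-counting pattern described above can be sketched as follows. This is a simplified userland illustration, not the kernel code: `struct sa`, `sa_ref()`, `sa_unref()`, `sa_delete()` and the `SA_DELETED` flag are hypothetical names, and the sketch assumes callers of `sa_delete()` are serialized by some outer lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SA_DELETED	0x1	/* hypothetical "delete already ran" flag */

/* Hypothetical stand-in for a tdb. */
struct sa {
	atomic_uint	refcnt;
	unsigned int	flags;	/* assumed to be protected by an outer lock */
};

/* Allocate an object holding one reference (the "list" reference). */
struct sa *
sa_alloc(void)
{
	struct sa *sa = calloc(1, sizeof(*sa));

	if (sa != NULL)
		atomic_init(&sa->refcnt, 1);
	return sa;
}

/* Take an additional reference; the caller must unref it later. */
struct sa *
sa_ref(struct sa *sa)
{
	assert(sa != NULL);
	atomic_fetch_add(&sa->refcnt, 1);
	return sa;
}

/* Drop one reference; returns 1 if this was the last one and sa was freed. */
int
sa_unref(struct sa *sa)
{
	if (sa == NULL)
		return 0;
	if (atomic_fetch_sub(&sa->refcnt, 1) == 1) {
		free(sa);
		return 1;
	}
	return 0;
}

/* Drop the list reference exactly once; returns 1 only on the first call. */
int
sa_delete(struct sa *sa)
{
	assert(sa != NULL);
	if (sa->flags & SA_DELETED)
		return 0;
	sa->flags |= SA_DELETED;
	sa_unref(sa);
	return 1;
}
```

A lookup function in this style would return `sa_ref(sa)`, and every caller would pair it with `sa_unref()` once it is done with the object.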


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
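
The "lockless read loop" mentioned above can be illustrated with a generation-counter sketch. This is not the timehands code: `struct clock_snapshot` and the `snapshot_*()` functions are hypothetical, a single writer is assumed, and the 64-bit value is deliberately stored as two 32-bit halves to mimic a platform without atomic 64-bit reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands slot. */
struct clock_snapshot {
	atomic_uint		gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t	sec_lo;	/* low 32 bits of the seconds value */
	_Atomic uint32_t	sec_hi;	/* high 32 bits of the seconds value */
};

void
snapshot_init(struct clock_snapshot *cs)
{
	assert(cs != NULL);
	atomic_init(&cs->gen, 0);
	atomic_init(&cs->sec_lo, 0);
	atomic_init(&cs->sec_hi, 0);
}

/* Single writer: mark the slot busy, store both halves, then publish. */
void
snapshot_write(struct clock_snapshot *cs, int64_t sec)
{
	unsigned int gen;

	gen = atomic_load_explicit(&cs->gen, memory_order_relaxed);
	atomic_store_explicit(&cs->gen, gen + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&cs->sec_lo, (uint32_t)sec,
	    memory_order_relaxed);
	atomic_store_explicit(&cs->sec_hi, (uint32_t)((uint64_t)sec >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&cs->gen, gen + 2, memory_order_release);
}

/* Reader: retry until the generation is even and unchanged across the reads. */
int64_t
snapshot_read(struct clock_snapshot *cs)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load_explicit(&cs->gen, memory_order_acquire);
		lo = atomic_load_explicit(&cs->sec_lo, memory_order_relaxed);
		hi = atomic_load_explicit(&cs->sec_hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((gen & 1) != 0 ||
	    gen != atomic_load_explicit(&cs->gen, memory_order_relaxed));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The reader never blocks the writer; it pays for consistency with a possible retry, which is the performance cost on 32-bit platforms the message refers to.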


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
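
The idea of resetting the pointer before freeing can be sketched in userland C. This is an analogy, not the kernel's m_freemp(): `struct buf` and `buf_freep()` are hypothetical stand-ins, with free() used in place of the mbuf allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct buf {
	size_t	len;
};

/*
 * Free *bp, clearing the caller's pointer first so that no dangling
 * pointer survives the call. Calling it again on the same pointer is
 * harmless because free(NULL) does nothing.
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	free(b);
}
```

Because the caller passes the address of its own pointer, an accidental second free or a later dereference hits NULL instead of freed memory.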


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
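
A minimal sketch of the index-instead-of-pointer scheme, under stated assumptions: the fixed-size table, `iface_attach()`, `iface_detach()`, `iface_get()` and `pkt_input()` are hypothetical names for illustration and do not reflect how if_get() is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF	8	/* hypothetical table size */

struct iface {
	unsigned int	index;		/* unique ID, 0 is never valid */
};

struct pkt {
	unsigned int	rcvif_idx;	/* index stored instead of a pointer */
};

static struct iface *iftable[MAXIF];

void
iface_attach(struct iface *ifp)
{
	assert(ifp != NULL && ifp->index > 0 && ifp->index < MAXIF);
	iftable[ifp->index] = ifp;
}

void
iface_detach(unsigned int idx)
{
	if (idx < MAXIF)
		iftable[idx] = NULL;
}

/* Resolve an index; NULL means the interface is gone. */
struct iface *
iface_get(unsigned int idx)
{
	return (idx < MAXIF) ? iftable[idx] : NULL;
}

/* Returns 0 if the packet can be processed, -1 if it must be dropped. */
int
pkt_input(const struct pkt *p)
{
	assert(p != NULL);
	if (iface_get(p->rcvif_idx) == NULL)
		return -1;	/* interface destroyed: free the packet */
	return 0;
}
```

A packet queued before its interface was removed simply fails the lookup, so no stale pointer is ever dereferenced.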


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions that convert IPv4/6 addresses from
network to presentation format with inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic from being linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcmp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).
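
The header chasing mentioned above can be illustrated with a small userland
sketch. It is not the kernel code: it works on a flat byte buffer instead of
an mbuf chain, and the function name nxt_field_offset() and the PROTO_*/IP6_*
macros are hypothetical names chosen for the example. It only assumes the
standard IPv6 layout (40-byte base header with Next Header at byte 6;
extension header length in 8-byte units, AH length in 4-byte units).

```c
#include <stddef.h>
#include <stdint.h>

/* Protocol numbers of the extension headers this sketch can skip. */
#define PROTO_HOPOPTS	0
#define PROTO_ROUTING	43
#define PROTO_AH	51
#define PROTO_DSTOPTS	60

#define IP6_HDRLEN	40	/* fixed IPv6 header size */
#define IP6_NXT_OFF	6	/* offset of Next Header in the IPv6 header */

/*
 * Hypothetical helper: find the offset of the Next Header byte whose
 * value is `target', i.e. the field in the *previous* header that
 * names the header we are looking for.  Returns 0 and sets *offp on
 * success, -1 if the chain ends or the packet is too short.
 */
int
nxt_field_offset(const uint8_t *pkt, size_t len, uint8_t target, size_t *offp)
{
	size_t nxtoff = IP6_NXT_OFF;	/* current Next Header byte */
	size_t hdroff = IP6_HDRLEN;	/* start of the header it names */
	size_t extlen;
	uint8_t nxt;

	if (len < IP6_HDRLEN)
		return -1;

	for (;;) {
		nxt = pkt[nxtoff];
		if (nxt == target) {
			*offp = nxtoff;
			return 0;
		}
		/* Need the first two bytes of the next header. */
		if (hdroff > len - 2)
			return -1;
		switch (nxt) {
		case PROTO_HOPOPTS:
		case PROTO_ROUTING:
		case PROTO_DSTOPTS:
			/* Length in 8-byte units, not counting the first 8. */
			extlen = ((size_t)pkt[hdroff + 1] + 1) * 8;
			break;
		case PROTO_AH:
			/* AH length is in 4-byte units, minus 2. */
			extlen = ((size_t)pkt[hdroff + 1] + 2) * 4;
			break;
		default:
			return -1;	/* not a header we know how to skip */
		}
		nxtoff = hdroff;	/* its first byte is the Next Header */
		hdroff += extlen;
	}
}
```

Every call restarts from the IPv6 header, which is the cost the "note to
self" above is about.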


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.202 04-Jan-2022 yasuoka

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
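
The get/unref discipline described above can be sketched in plain C. This is
a single-threaded illustration only, not the kernel implementation: struct sa
and the sa_alloc(), sa_ref() and sa_unref() functions are hypothetical names,
and a real kernel version would need atomic operations or locking around the
counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; the real struct tdb is far larger. */
struct sa {
	unsigned int	refs;	/* number of holders of this object */
};

/* Allocate an object; the caller owns the initial reference. */
struct sa *
sa_alloc(void)
{
	struct sa *s = calloc(1, sizeof(*s));

	if (s != NULL)
		s->refs = 1;
	return s;
}

/* Take an additional reference, as a lookup function would. */
struct sa *
sa_ref(struct sa *s)
{
	s->refs++;
	return s;
}

/*
 * Drop a reference.  The object is freed when the last holder lets
 * go.  Returns 1 if the object was freed, 0 otherwise.
 */
int
sa_unref(struct sa *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0) {
		free(s);
		return 1;
	}
	return 0;
}
```

Every successful lookup is paired with exactly one sa_unref(), so the object
cannot be freed while a caller still uses it.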


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@
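
A minimal sketch of such a delivery loop, assuming each input handler
returns the next protocol to process. The PROTO_* values, struct pkt and
the handler table are hypothetical stand-ins, not the kernel's protosw
or ip_deliver() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol numbers used only by this sketch. */
enum {
	PROTO_DONE = -1,
	PROTO_IPIP = 4,
	PROTO_TCP = 6,
	PROTO_ESP = 50,
	PROTO_MAX = 64
};

struct pkt {
	int steps;	/* number of handlers that saw the packet */
};

typedef int (*input_fn)(struct pkt *);

/* Each handler consumes one header and names the protocol that follows. */
static int esp_in(struct pkt *p) { p->steps++; return PROTO_IPIP; }
static int ipip_in(struct pkt *p) { p->steps++; return PROTO_TCP; }
static int tcp_in(struct pkt *p) { p->steps++; return PROTO_DONE; }

static input_fn handlers[PROTO_MAX] = {
	[PROTO_IPIP] = ipip_in,
	[PROTO_TCP] = tcp_in,
	[PROTO_ESP] = esp_in,
};

/*
 * Walk the header chain in one loop, with no requeueing between
 * headers.  Returns 0 when a handler reports PROTO_DONE, or -1 when
 * no handler is registered for the next protocol.
 */
int
deliver(struct pkt *p, int nxt)
{
	while (nxt != PROTO_DONE) {
		assert(nxt >= 0 && nxt < PROTO_MAX);
		if (handlers[nxt] == NULL)
			return -1;
		nxt = handlers[nxt](p);
	}
	return 0;
}
```

In this model an IP in IP header is handled like any other extension
header: its handler simply returns the inner protocol.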


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
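
The pattern can be illustrated with a userland analogue. struct buf and
buf_freep() are hypothetical names for this sketch. It is not the
kernel's m_freemp(), only the same idea of clearing the caller's pointer
before the memory is released.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	int len;
};

/*
 * Free the buffer that *bp points to and leave the caller with NULL,
 * so a later accidental use or second free cannot touch freed memory.
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;	/* reset the caller's pointer first */
	free(b);	/* free(NULL) is a no-op, so a repeat call is safe */
}
```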


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
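
A rough sketch of the index-based lookup, with hypothetical names
(if_table, iface_get(), pkt_input()) and without the locking or
reference counting a kernel needs. The packet carries only an index.
The pointer is resolved when needed, and a NULL result means the
interface is gone and the packet is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define NIFS 8	/* hypothetical table size */

struct iface {
	unsigned int index;
	const char *name;
};

/* Index to interface table; a NULL slot means no such interface. */
static struct iface *if_table[NIFS];

void
iface_attach(struct iface *ifp)
{
	assert(ifp != NULL && ifp->index < NIFS);
	if_table[ifp->index] = ifp;
}

void
iface_detach(unsigned int idx)
{
	if (idx < NIFS)
		if_table[idx] = NULL;
}

/* Resolve an index to an interface, or NULL if it has been removed. */
struct iface *
iface_get(unsigned int idx)
{
	return idx < NIFS ? if_table[idx] : NULL;
}

struct pkt {
	unsigned int rcvidx;	/* index of the receiving interface */
};

/* Returns 0 if the packet can be processed, -1 if it must be dropped. */
int
pkt_input(const struct pkt *p)
{
	struct iface *ifp = iface_get(p->rcvidx);

	if (ifp == NULL)
		return -1;	/* interface vanished: drop the packet */
	return 0;
}
```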


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
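
One way to enforce "only once" is a per-packet marker, sketched below.
The flag name, struct pkt and ipcomp_in() are hypothetical, and how the
actual commit records the state is not shown in this log. A packet that
has already been decompressed is refused on a second pass, so a payload
that decompresses to itself cannot loop.

```c
#include <assert.h>
#include <stddef.h>

#define PKT_DECOMPRESSED 0x01	/* hypothetical per-packet flag */

struct pkt {
	unsigned int flags;
};

/*
 * Accept a packet for decompression at most once.
 * Returns 0 on the first pass and -1 (drop) on any later pass.
 */
int
ipcomp_in(struct pkt *p)
{
	assert(p != NULL);
	if (p->flags & PKT_DECOMPRESSED)
		return -1;	/* nested or looping IPcomp: drop */
	p->flags |= PKT_DECOMPRESSED;
	/* the actual decompression would happen here */
	return 0;
}
```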


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
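
The overflow in question comes from multiplying a timeout in seconds by
hz in a fixed-width integer. The helper below is a hypothetical
userland sketch of a saturating conversion. It is not hzto()'s
implementation, only an illustration of clamping instead of wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert seconds to clock ticks at 'hz' ticks per second, clamping
 * to INT_MAX when the product would not fit in an int.
 */
int
secs_to_ticks(uint64_t secs, int hz)
{
	assert(hz > 0);
	if (secs > (uint64_t)INT_MAX / (uint64_t)hz)
		return INT_MAX;	/* would overflow: saturate */
	return (int)(secs * (uint64_t)hz);
}
```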


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistently for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
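
The reference counting rules described above can be illustrated with a
minimal, single-threaded sketch. Everything here is hypothetical: struct
toy_tdb, toy_tdb_ref(), toy_tdb_unref() and toy_tdb_delete() are stand-ins
invented for illustration, not the kernel's tdb API, and a real
implementation would need atomic operations or a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tdb; not the kernel structure. */
struct toy_tdb {
	unsigned int refcnt;	/* number of holders, including the table */
	int deleted;		/* set once delete has run */
};

static int toy_freed;		/* counts objects actually released */

/* Allocate an object holding one reference on behalf of the table. */
static struct toy_tdb *
toy_tdb_alloc(void)
{
	struct toy_tdb *t = calloc(1, sizeof(*t));

	if (t != NULL)
		t->refcnt = 1;
	return t;
}

/* Take an additional reference, as a lookup function would. */
static struct toy_tdb *
toy_tdb_ref(struct toy_tdb *t)
{
	t->refcnt++;
	return t;
}

/* Drop a reference; the last holder frees the object. */
static void
toy_tdb_unref(struct toy_tdb *t)
{
	if (t == NULL)
		return;
	assert(t->refcnt > 0);
	if (--t->refcnt == 0) {
		free(t);
		toy_freed++;
	}
}

/* Remove the object from the table; the flag makes this run only once. */
static void
toy_tdb_delete(struct toy_tdb *t)
{
	if (t->deleted)
		return;
	t->deleted = 1;
	toy_tdb_unref(t);	/* drop the table's reference */
}
```

In this sketch a caller that obtained the object via toy_tdb_ref() can keep
using it after toy_tdb_delete(), and the memory is released only when that
caller drops its own reference.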


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@
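
A rough sketch of the "return the next protocol" pattern follows. It is a
toy model only: the toy_* names, the protocol numbering and the
TOY_PROTO_DONE marker are assumptions made for illustration and do not
reproduce the real ip_deliver() or protocol switch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PROTO_DONE	(-1)	/* hypothetical "stop walking" marker */

enum { TOY_ESP, TOY_IPIP, TOY_UDP, TOY_NPROTO };

/* Each handler consumes its own header and returns the next protocol. */
typedef int (*toy_input_fn)(int *depth);

static int
toy_esp_input(int *depth)
{
	(*depth)++;
	return TOY_IPIP;	/* decrypted payload is an IP-in-IP packet */
}

static int
toy_ipip_input(int *depth)
{
	(*depth)++;
	return TOY_UDP;		/* inner packet carries UDP */
}

static int
toy_udp_input(int *depth)
{
	(*depth)++;
	return TOY_PROTO_DONE;	/* final protocol, nothing left to walk */
}

static const toy_input_fn toy_protosw[TOY_NPROTO] = {
	[TOY_ESP] = toy_esp_input,
	[TOY_IPIP] = toy_ipip_input,
	[TOY_UDP] = toy_udp_input,
};

/* Walk the header chain in one loop instead of recursing per header. */
static int
toy_deliver(int proto)
{
	int depth = 0;

	while (proto != TOY_PROTO_DONE) {
		assert(proto >= 0 && proto < TOY_NPROTO);
		proto = toy_protosw[proto](&depth);
	}
	return depth;		/* number of headers processed */
}
```

Because every handler returns to the loop, the stack depth stays constant
no matter how many headers are chained.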


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.
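
The general shape of such a lockless read loop can be sketched as below.
This is an illustration under stated assumptions, not the kernel code:
struct toy_timehands, toy_settime() and toy_gettime() are hypothetical
names, the 64-bit value is deliberately split into two 32-bit words to
mimic a 32-bit platform, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands entry; not the kernel's struct. */
struct toy_timehands {
	atomic_uint gen;	/* odd while an update is in progress */
	atomic_uint sec_lo;	/* low 32 bits of the seconds value */
	atomic_uint sec_hi;	/* high 32 bits of the seconds value */
};

/* Writer: bump the generation before and after the two-word update. */
static void
toy_settime(struct toy_timehands *th, int64_t sec)
{
	unsigned int gen = atomic_load(&th->gen);

	assert((gen & 1) == 0);	/* single writer, no update pending */
	atomic_store(&th->gen, gen + 1);
	atomic_store(&th->sec_lo, (uint32_t)sec);
	atomic_store(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32));
	atomic_store(&th->gen, gen + 2);
}

/* Reader: retry until both halves were read under one even generation. */
static int64_t
toy_gettime(struct toy_timehands *th)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&th->gen);
		lo = atomic_load(&th->sec_lo);
		hi = atomic_load(&th->sec_hi);
	} while ((gen & 1) != 0 || gen != atomic_load(&th->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The reader never blocks the writer; it pays for consistency with the extra
generation loads and an occasional retry, which is the cost the message
describes for 32-bit platforms.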

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
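
The idea can be shown with a small userland sketch. The names toy_buf,
toy_freep() and toy_input() are hypothetical and only mimic the pattern
described; they are not the mbuf API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy buffer standing in for struct mbuf. */
struct toy_buf {
	int len;
};

/* Clear the caller's pointer first, then free what it pointed to. */
static void
toy_freep(struct toy_buf **bp)
{
	struct toy_buf *b = *bp;

	*bp = NULL;
	free(b);
}

/*
 * Consumer that drops invalid buffers. Returns 0 on success and -1 when
 * the buffer was dropped, in which case *bp is NULL afterwards.
 */
static int
toy_input(struct toy_buf **bp)
{
	assert(bp != NULL);
	if (*bp == NULL || (*bp)->len <= 0) {
		toy_freep(bp);
		return -1;
	}
	return 0;
}
```

After a drop the caller's pointer is NULL, so a second free is harmless and
an accidental dereference fails immediately instead of touching freed
memory.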


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by, debugged together with, and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
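
The overflow this entry describes can be illustrated with a small sketch. The helper below is hypothetical and is not the kernel's hzto(); it only shows how a seconds-to-ticks conversion can clamp to INT_MAX when (hz * timeout) would not fit in an int.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not hzto()): convert a timeout in seconds to
 * clock ticks, saturating at INT_MAX instead of overflowing.
 */
int
secs_to_ticks_sat(int hz, unsigned long long secs)
{
	assert(hz > 0);

	/* hz * secs would exceed INT_MAX, so clamp the result. */
	if (secs > (unsigned long long)INT_MAX / (unsigned long long)hz)
		return INT_MAX;

	return (int)(secs * (unsigned long long)hz);
}
```

With hz = 100, a ten second timeout yields 1000 ticks, while a very long SA lifetime saturates rather than wrapping to a small or negative value.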


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.201 23-Dec-2021 bluhm

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lookup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@


# 1.200 22-Dec-2021 tobhe

Consolidate enc_getif() lookups in IPsec input path to save one lookup
per packet and improve readability.

ok bluhm@


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
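
The scheme in this entry (lookups return a counted reference, and a flag makes deletion happen only once) can be sketched in userland C. Everything below is an illustrative assumption: struct sa, sa_ref(), sa_unref() and sa_delete() are hypothetical names, not the kernel's tdb API.

```c
#include <stdatomic.h>
#include <stdlib.h>

int sa_free_count;		/* demo only: counts objects actually freed */

/* Hypothetical stand-in for a reference-counted SA. */
struct sa {
	atomic_uint	refcnt;		/* number of live references */
	atomic_flag	deleted;	/* set once sa_delete() has run */
};

struct sa *
sa_alloc(void)
{
	struct sa *sa = malloc(sizeof(*sa));

	if (sa == NULL)
		return NULL;
	atomic_init(&sa->refcnt, 1);	/* reference held by the table */
	atomic_flag_clear(&sa->deleted);
	return sa;
}

/* Take an extra reference, as a lookup function would. */
struct sa *
sa_ref(struct sa *sa)
{
	atomic_fetch_add(&sa->refcnt, 1);
	return sa;
}

/* Drop one reference; the last one frees the object. */
void
sa_unref(struct sa *sa)
{
	if (atomic_fetch_sub(&sa->refcnt, 1) == 1) {
		sa_free_count++;
		free(sa);
	}
}

/* Drop the table's reference, but only on the first call. */
void
sa_delete(struct sa *sa)
{
	if (atomic_flag_test_and_set(&sa->deleted))
		return;
	sa_unref(sa);
}
```

A caller that still holds a reference keeps the object alive across sa_delete(); the memory is released only when that caller drops its own reference with sa_unref().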


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
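
The split-read problem and the lockless read loop mentioned above can be sketched with C11 atomics. This is a generic generation-counter pattern under stated assumptions, not the kernel's timehands code; struct split_time and both functions are hypothetical, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit value stored as two 32-bit halves, as on hardware
 * without atomic 64-bit loads. The generation counter is odd while an
 * update is in progress.
 */
struct split_time {
	atomic_uint		gen;
	_Atomic uint32_t	lo;
	_Atomic uint32_t	hi;
};

/* Single writer: bump the generation around the two half-stores. */
void
split_time_set(struct split_time *st, uint64_t v)
{
	atomic_fetch_add(&st->gen, 1);		/* odd: update in progress */
	atomic_store(&st->lo, (uint32_t)v);
	atomic_store(&st->hi, (uint32_t)(v >> 32));
	atomic_fetch_add(&st->gen, 1);		/* even: update complete */
}

/* Reader: retry until both halves were read under one even generation. */
uint64_t
split_time_get(struct split_time *st)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&st->gen);
		lo = atomic_load(&st->lo);
		hi = atomic_load(&st->hi);
	} while ((gen & 1) != 0 || gen != atomic_load(&st->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer, which matches the cost described in the entry: a retry loop on 32-bit platforms in place of two plain register reads.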


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
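
The idea of resetting the caller's pointer as part of freeing can be shown with a small userland sketch. struct pkt and pkt_freep() are hypothetical stand-ins for illustration; they are not struct mbuf or the real m_freemp().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct pkt {
	unsigned char	*data;
	size_t		 len;
};

struct pkt *
pkt_alloc(size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p == NULL)
		return NULL;
	p->data = calloc(1, len);
	if (p->data == NULL) {
		free(p);
		return NULL;
	}
	p->len = len;
	return p;
}

/*
 * Free the buffer that *pp points to and clear the caller's pointer,
 * so a later use or a second free sees NULL instead of stale memory.
 */
void
pkt_freep(struct pkt **pp)
{
	struct pkt *p = *pp;

	*pp = NULL;		/* reset first: no dangling pointer remains */
	if (p != NULL) {
		free(p->data);
		free(p);
	}
}
```

Calling pkt_freep(&p) twice is harmless in this sketch, because the second call finds NULL; a plain free-style function taking the pointer by value could not offer that.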


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
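
A minimal sketch of the relationship described above, assuming a tiny
hypothetical lookup table. toy_rtable_l2() and toy_table_to_domain are
illustrative names only; the kernel's rtable_l2(9) is implemented
differently. The sketch shows that a table ID maps to a domain ID, and
that a domain ID used as a table ID maps back to itself.

```c
#include <assert.h>

#define TOY_NTABLES 4

/*
 * Hypothetical mapping from routing table ID (index) to the routing
 * domain that table belongs to.  Tables 0 and 1 live in rdomain 0,
 * tables 2 and 3 live in rdomain 2.
 */
static const unsigned int toy_table_to_domain[TOY_NTABLES] = { 0, 0, 2, 2 };

/* Toy stand-in for rtable_l2(9): table ID in, routing domain ID out. */
unsigned int
toy_rtable_l2(unsigned int rtableid)
{
	assert(rtableid < TOY_NTABLES);
	return toy_table_to_domain[rtableid];
}
```

With this table, assigning "rtableid = rdomain" is safe for domains 0 and
2, while going the other way from table 1 or 3 needs the lookup.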


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
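
A minimal sketch of the overflow being avoided, assuming a 32-bit int
tick count. toy_secs_to_ticks() is a hypothetical helper written for
illustration and is not the kernel's hzto(); it only shows why a plain
(hz * timeout) multiplication needs a clamp when the timeout is very
large.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a relative timeout in seconds to clock
 * ticks, saturating at INT_MAX instead of overflowing.
 */
int
toy_secs_to_ticks(long long secs, int hz)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* (hz * secs) would not fit in an int */
	return (int)(secs * hz);
}
```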


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.199 20-Dec-2021 mvs

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 1.198 20-Dec-2021 bluhm

Fix function name in panic string.


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistently for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
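
A minimal single-threaded sketch of the scheme described above, assuming
a plain integer counter and a hypothetical TOY_F_DELETED flag. The
toy_tdb names are illustrative; the kernel uses its own refcount and
locking primitives. It shows the two ideas from the commit: every lookup
hands out a counted reference that must be dropped later, and a flag
makes a repeated delete a no-op.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_F_DELETED	0x01	/* hypothetical "delete already ran" flag */

struct toy_tdb {
	unsigned int refcnt;
	unsigned int flags;
};

static int toy_freed;		/* number of objects freed, for demonstration */

/* Allocate an object; the initial reference belongs to the lookup table. */
struct toy_tdb *
toy_tdb_alloc(void)
{
	struct toy_tdb *t = calloc(1, sizeof(*t));

	assert(t != NULL);
	t->refcnt = 1;
	return t;
}

/* Take an extra reference, as a lookup function would before returning. */
struct toy_tdb *
toy_tdb_ref(struct toy_tdb *t)
{
	t->refcnt++;
	return t;
}

/* Drop a reference; the last one frees the object. */
void
toy_tdb_unref(struct toy_tdb *t)
{
	assert(t->refcnt > 0);
	if (--t->refcnt == 0) {
		free(t);
		toy_freed++;
	}
}

/* Remove from the table exactly once, however often it is called. */
void
toy_tdb_delete(struct toy_tdb *t)
{
	if (t->flags & TOY_F_DELETED)
		return;
	t->flags |= TOY_F_DELETED;
	toy_tdb_unref(t);	/* drop the table's reference */
}
```

A caller that still holds a reference from toy_tdb_ref() can keep using
the object after toy_tdb_delete(); memory is released only on the final
toy_tdb_unref().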


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@
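
A minimal sketch of the "pass down a pointer to the mbuf pointer"
pattern, assuming a toy buffer type. struct toy_buf, toy_pullup() and
toy_input() are hypothetical stand-ins, not kernel code; toy_pullup()
only mimics the property of m_pullup() that matters here: it may return
a different buffer, or free the buffer and return NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_buf {
	size_t len;			/* bytes of valid data */
	unsigned char data[64];
};

/* Allocate a zeroed toy buffer of `len` bytes whose first byte is `first`. */
struct toy_buf *
toy_alloc(size_t len, unsigned char first)
{
	struct toy_buf *b = calloc(1, sizeof(*b));

	assert(b != NULL && len <= sizeof(b->data));
	b->len = len;
	b->data[0] = first;
	return b;
}

/*
 * Toy stand-in for m_pullup(): on failure the buffer is freed and NULL
 * is returned; on success the data lives in a new buffer and the old
 * pointer is stale.
 */
static struct toy_buf *
toy_pullup(struct toy_buf *b, size_t need)
{
	struct toy_buf *n;

	if (b->len < need || (n = malloc(sizeof(*n))) == NULL) {
		free(b);
		return NULL;
	}
	*n = *b;
	free(b);
	return n;
}

/*
 * Input routine taking a pointer to the caller's buffer pointer, so
 * the caller always sees the current buffer (or NULL once it is gone).
 */
int
toy_input(struct toy_buf **bp, size_t hdrlen)
{
	*bp = toy_pullup(*bp, hdrlen);
	if (*bp == NULL)
		return -1;	/* already freed, caller must not touch it */
	return (*bp)->data[0];
}
```

Because the callee writes through bp, a caller can neither keep using a
stale pointer after a successful pullup nor double free after a failed
one.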


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
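
A minimal sketch of a lockless read loop of the kind mentioned above,
assuming a single writer and C11 atomics. struct toy_clock and its
functions are hypothetical and are not the kernel's timehands code. The
64-bit value is stored as two 32-bit halves to mimic a 32-bit platform;
a generation counter lets the reader detect and retry a split read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_clock {
	atomic_uint gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t lo;	/* low half of the 64-bit value */
	_Atomic uint32_t hi;	/* high half of the 64-bit value */
};

/* Writer side (single writer assumed): bump gen around the update. */
void
toy_clock_set(struct toy_clock *c, uint64_t v)
{
	unsigned int g = atomic_load_explicit(&c->gen, memory_order_relaxed);

	assert((g & 1) == 0);	/* no other update in progress */
	atomic_store_explicit(&c->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&c->gen, g + 2, memory_order_release);
}

/* Reader side: retry until both halves come from the same generation. */
uint64_t
toy_clock_get(struct toy_clock *c)
{
	unsigned int g1, g2;
	uint32_t lo, hi;

	do {
		g1 = atomic_load_explicit(&c->gen, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g2 = atomic_load_explicit(&c->gen, memory_order_relaxed);
	} while ((g1 & 1) != 0 || g1 != g2);

	return ((uint64_t)hi << 32) | lo;
}
```

The retry loop is the extra cost the commit message refers to: the
reader does several loads and a comparison instead of two plain
register reads.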


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have fewer dangling pointers in the kernel.
OK krw@ mpi@ claudio@
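
A minimal userland sketch of the pointer-resetting idea described in this
entry, assuming plain malloc(3) memory instead of mbufs. buf_freep() is a
hypothetical name used only for illustration; it is not the kernel's
m_freemp() and makes no claim about that function's signature.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of the m_freemp() idea: the caller passes the
 * address of its pointer, which is cleared before the memory is released.
 */
static void
buf_freep(void **bufp)
{
	void *buf;

	assert(bufp != NULL);
	buf = *bufp;
	*bufp = NULL;	/* the caller's pointer can no longer dangle */
	free(buf);	/* free(NULL) is a no-op, so a repeated call is harmless */
}
```

Because the caller's variable is NULL afterwards, a second call through the
same variable frees nothing, and a later dereference faults immediately
instead of touching freed memory.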


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
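
A sketch of the overflow this entry refers to, assuming an int tick count
and a 64-bit timeout in seconds. secs_to_ticks_clamped() is a hypothetical
helper written for illustration; it is not hzto(9) and does not mirror that
function's interface.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a timeout in seconds to clock ticks,
 * saturating at INT_MAX instead of letting (hz * secs) wrap around.
 */
static int
secs_to_ticks_clamped(int hz, long long secs)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* the product would not fit in an int */
	return (int)(secs * hz);
}
```

With hz at 100, a timeout longer than 21474836 seconds (roughly 248 days)
would overflow a 32-bit int product, so the sketch returns INT_MAX for
anything beyond that.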


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.197 08-Dec-2021 bluhm

Start documenting the locking strategy of struct tdb fields. Note
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in
udpencap_ctlinput() to protect the access to tdb_snext. Make the
braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED()
into the functions where the read access happens.
OK mvs@


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
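
The ownership rule described above (every lookup returns a counted reference that must be released, and deletion happens only once) can be sketched roughly as follows. This is a single-threaded, user-space illustration only: `struct sa`, `sa_alloc()`, `sa_ref()`, `sa_unref()` and `sa_delete()` are hypothetical names, not the kernel's tdb functions, and a real implementation would need atomic or lock-protected counters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a security association; not struct tdb. */
struct sa {
	unsigned int refs;	/* number of outstanding references */
	int deleted;		/* set once the delete path has run */
};

/* Allocate an SA that starts with one reference, held by the "table". */
struct sa *
sa_alloc(void)
{
	struct sa *s = calloc(1, sizeof(*s));

	if (s != NULL)
		s->refs = 1;
	return s;
}

/* Take an additional reference, as a lookup function would. */
struct sa *
sa_ref(struct sa *s)
{
	s->refs++;
	return s;
}

/* Drop one reference; returns 1 if this was the last one and s was freed. */
int
sa_unref(struct sa *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0) {
		free(s);
		return 1;
	}
	return 0;
}

/*
 * Remove the SA from the "table". The flag makes a second call a no-op,
 * so the table's reference is dropped exactly once.
 * Returns 1 if the delete ran, 0 if it had already been done.
 */
int
sa_delete(struct sa *s)
{
	if (s->deleted)
		return 0;
	s->deleted = 1;
	sa_unref(s);
	return 1;
}
```

In this sketch a caller that obtained the SA through `sa_ref()` can keep using it after `sa_delete()` and frees it with its own final `sa_unref()`.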


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
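
The split-read problem and the "lockless read loop" mentioned above can be illustrated with a generic generation-counter (seqlock-style) reader. This is a user-space sketch with C11 atomics and a single writer, assuming a 32-bit `unsigned int`; `struct timesnap` and the `timesnap_*()` functions are hypothetical and are not the kernel's timehands code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock snapshot published by a single writer. */
struct timesnap {
	atomic_uint gen;	/* odd while an update is in progress */
	atomic_uint lo;		/* low 32 bits of the time value */
	atomic_uint hi;		/* high 32 bits of the time value */
};

void
timesnap_init(struct timesnap *ts)
{
	atomic_init(&ts->gen, 0);
	atomic_init(&ts->lo, 0);
	atomic_init(&ts->hi, 0);
}

/* Writer: bump the generation to odd, store both halves, bump to even. */
void
timesnap_write(struct timesnap *ts, int64_t now)
{
	unsigned int g = atomic_load(&ts->gen);

	atomic_store(&ts->gen, g + 1);
	atomic_store(&ts->lo, (uint32_t)now);
	atomic_store(&ts->hi, (uint32_t)((uint64_t)now >> 32));
	atomic_store(&ts->gen, g + 2);
}

/*
 * Reader: retry until both halves were read under the same even
 * generation, so a half-updated value is never returned.
 */
int64_t
timesnap_read(struct timesnap *ts)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&ts->gen);
		lo = atomic_load(&ts->lo);
		hi = atomic_load(&ts->hi);
	} while ((g & 1) != 0 || g != atomic_load(&ts->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The extra loads and the possible retry are the cost the message refers to on 32-bit platforms; a machine with atomic 64-bit loads can read the value in one step instead.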


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@
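
The delivery loop described above, where each protocol input function consumes its header and returns the next protocol number, might look roughly like this toy version. All names here (`deliver()`, `struct pkt`, the `PROTO_*` constants and the two handlers) are hypothetical and only illustrate the control flow; they are not the kernel's ip_deliver() or its protosw tables.

```c
#include <stddef.h>

#define PROTO_DONE	0	/* hypothetical "processing finished" marker */
#define PROTO_OUTER	1	/* hypothetical encapsulating header */
#define PROTO_INNER	2	/* hypothetical inner payload protocol */
#define PROTO_MAX	3

/* Hypothetical packet: only tracks how far parsing has advanced. */
struct pkt {
	size_t off;	/* current offset into the packet */
	int hops;	/* number of handlers that ran */
};

typedef int (*input_fn)(struct pkt *);

/* Strip a 20-byte outer header and hand over to the inner protocol. */
static int
outer_input(struct pkt *p)
{
	p->off += 20;
	p->hops++;
	return PROTO_INNER;
}

/* Consume an 8-byte inner header; nothing follows it. */
static int
inner_input(struct pkt *p)
{
	p->off += 8;
	p->hops++;
	return PROTO_DONE;
}

static const input_fn handlers[PROTO_MAX] = {
	[PROTO_OUTER] = outer_input,
	[PROTO_INNER] = inner_input,
};

/* Walk the header chain in one loop instead of requeueing or recursing. */
void
deliver(struct pkt *p, int proto)
{
	while (proto != PROTO_DONE) {
		if (proto < 0 || proto >= PROTO_MAX || handlers[proto] == NULL)
			break;		/* unknown protocol: stop */
		proto = handlers[proto](p);
	}
}
```

Because every handler only returns a number, an encapsulating header is handled like any other extension header and the packet is processed in a single pass.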


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
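
A minimal user-space sketch of the "reset the pointer, then free" idea follows. `struct buf`, `buf_alloc()` and `buf_freep()` are hypothetical stand-ins chosen for illustration; they are not the mbuf API and do not claim to match how m_freemp() is implemented.

```c
#include <stdlib.h>

/* Hypothetical packet buffer; not the kernel's struct mbuf. */
struct buf {
	size_t len;
	unsigned char *data;
};

/* Allocate a zeroed buffer of len bytes, or return NULL on failure. */
struct buf *
buf_alloc(size_t len)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (b->data == NULL) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/*
 * Free the buffer *bp points to and clear the caller's pointer first,
 * so a later use or a second free through the same variable is harmless.
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b = *bp;

	*bp = NULL;
	if (b != NULL) {
		free(b->data);
		free(b);
	}
}
```

Calling `buf_freep(&b)` twice is safe in this sketch because the second call sees a NULL pointer.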


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
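
The index-instead-of-pointer scheme described above can be sketched as a small lookup table. Everything in this block (`struct iface`, `iface_attach()`, `iface_detach()`, `iface_get()`, `pkt_input()`) is a hypothetical user-space illustration; the real if_get() also involves reference handling that is not modelled here.

```c
#include <stddef.h>

#define IF_MAX	8	/* hypothetical table size */

/* Hypothetical interface record. */
struct iface {
	unsigned int index;	/* unique ID, 0 is reserved as "none" */
	const char *name;
};

/* Hypothetical packet header: stores an index, never a pointer. */
struct pkt {
	unsigned int rcvif_idx;
};

static struct iface *if_table[IF_MAX];

/* Register an interface under its index; -1 if the index is unusable. */
int
iface_attach(struct iface *ifp)
{
	if (ifp->index == 0 || ifp->index >= IF_MAX)
		return -1;
	if_table[ifp->index] = ifp;
	return 0;
}

/* Remove an interface; packets still carrying its index stay harmless. */
void
iface_detach(unsigned int idx)
{
	if (idx < IF_MAX)
		if_table[idx] = NULL;
}

/* Resolve an index to an interface, or NULL if it no longer exists. */
struct iface *
iface_get(unsigned int idx)
{
	return idx < IF_MAX ? if_table[idx] : NULL;
}

/* Returns 0 if the packet can be processed, -1 if it must be dropped. */
int
pkt_input(const struct pkt *p)
{
	struct iface *ifp = iface_get(p->rcvif_idx);

	if (ifp == NULL)
		return -1;	/* interface is gone: drop the packet */
	return 0;
}
```

A packet queued before `iface_detach()` simply fails the lookup afterwards, which is what avoids the dangling pointer.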


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
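
The "only once" rule can be pictured as a per-packet flag that is checked before decompression and set afterwards. The sketch below is a toy model under that assumption: `PKT_COMP_SEEN`, `struct pkt` and `comp_input()` are hypothetical names, and the log does not say how the kernel actually records that a packet was already decompressed.

```c
#define PKT_COMP_SEEN	0x01	/* hypothetical "already decompressed" flag */

/* Hypothetical packet state. */
struct pkt {
	unsigned int flags;
	int comp_layers;	/* nested compression layers still present */
};

/*
 * Remove one compression layer. A packet that already went through this
 * path is rejected, so nested or self-reproducing payloads cannot loop.
 * Returns 0 on success, -1 if the packet must be dropped.
 */
int
comp_input(struct pkt *p)
{
	if (p->flags & PKT_COMP_SEEN)
		return -1;
	p->flags |= PKT_COMP_SEEN;
	if (p->comp_layers > 0)
		p->comp_layers--;
	return 0;
}
```

With the flag in place, the amount of decompression work per packet is bounded regardless of what the payload expands to.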


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
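
The overflow mentioned above arises when a timeout in seconds is multiplied by the tick rate and the product no longer fits the tick type. The following is only a sketch of that hazard and one way to clamp it; it is not how hzto() is implemented, and `secs_to_ticks` is a hypothetical helper introduced here for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a timeout in seconds to clock ticks.
 * A naive (hz * secs) can overflow an int for very long timeouts,
 * so the result is clamped to INT_MAX instead.
 */
int
secs_to_ticks(uint64_t secs, int hz)
{
	if (hz <= 0)
		return 0;
	/* Would secs * hz exceed INT_MAX?  Check by division first. */
	if (secs > (uint64_t)INT_MAX / (uint64_t)hz)
		return INT_MAX;
	return (int)(secs * (uint64_t)hz);
}
```

With this check an extremely long SA expiration yields the largest representable tick count rather than a wrapped, possibly negative, value.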


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
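
The pattern described above (lookups return a counted reference that must be dropped later, and a flag makes deletion idempotent) can be sketched in userland C. This is an illustration under stated assumptions, not the kernel code: `struct sa`, `sa_alloc`, `sa_ref`, `sa_unref` and `sa_delete` are hypothetical names, and the `deleted` flag is assumed to be protected by a lock held by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tdb; not the real struct tdb. */
struct sa {
	atomic_uint refcnt;	/* number of outstanding references */
	int deleted;		/* set once by sa_delete(); caller holds a lock */
};

/* Allocate an object holding one reference (owned by the "table"). */
struct sa *
sa_alloc(void)
{
	struct sa *s = calloc(1, sizeof(*s));

	if (s != NULL)
		atomic_init(&s->refcnt, 1);
	return s;
}

/* Take an additional reference, as a lookup function would. */
struct sa *
sa_ref(struct sa *s)
{
	atomic_fetch_add(&s->refcnt, 1);
	return s;
}

/* Drop a reference; free on the last one.  Returns 1 if freed. */
int
sa_unref(struct sa *s)
{
	if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
		free(s);
		return 1;
	}
	return 0;
}

/*
 * Drop the table's reference exactly once, however often this is
 * called.  Returns 1 if the object was freed by this call.
 */
int
sa_delete(struct sa *s)
{
	if (s->deleted)
		return 0;
	s->deleted = 1;
	return sa_unref(s);
}
```

A caller that obtained the object through `sa_ref()` keeps it alive across a concurrent `sa_delete()`; the memory is released only when that caller finally calls `sa_unref()`.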


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
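
The idea of resetting the caller's pointer before freeing can be sketched in userland C as follows. This is only an illustration of the pattern, not the kernel's m_freemp(): `struct buf`, `buf_freem` and `buf_freemp` are hypothetical names, and `malloc`/`free` stand in for the mbuf allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel mbuf; not the real struct mbuf. */
struct buf {
	struct buf *next;
	int len;
};

/* Free a whole chain and return NULL. */
struct buf *
buf_freem(struct buf *b)
{
	while (b != NULL) {
		struct buf *n = b->next;

		free(b);
		b = n;
	}
	return NULL;
}

/*
 * Take the address of the caller's pointer, clear that pointer
 * first, then free the chain.  After the call the caller cannot
 * touch the freed memory through *bp, and a second call on the
 * same pointer frees nothing.
 */
struct buf *
buf_freemp(struct buf **bp)
{
	struct buf *b = *bp;

	*bp = NULL;
	return buf_freem(b);
}
```

Because the pointer is passed by address, every function up the call chain that shares it sees NULL afterwards, which turns a potential use-after-free or double free into a harmless NULL check.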


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.196 02-Dec-2021 bluhm

ipsec_common_input_cb() extracted the inner IP header of IPsec
tunnels. It is never used, so this is useless code. Remove ipn
and ip6n IP header variables and the m_copydata() to fill them.
OK mvs@ kn@ sthen@


# 1.195 02-Dec-2021 bluhm

Allow to build kernel without IPSEC or INET6 defines.
OK mpi@ mvs@


# 1.194 01-Dec-2021 bluhm

Let ipsp_spd_lookup() return an error instead of a TDB. The TDB
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes no sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@
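
The calling convention described above (return an error, hand back the
object through an optional output pointer) can be sketched as follows.
This is a toy illustration, not the real ipsp_spd_lookup() signature: the
struct, the function names, and the integer key are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_tdb { int id; };			/* hypothetical stand-in for a TDB */
static struct toy_tdb toy_sa = { 42 };

/*
 * Hypothetical lookup: returns 0 or an errno value. The match is stored
 * through tdbout only when the caller passed a non-NULL pointer.
 */
static int
toy_spd_lookup(int key, struct toy_tdb **tdbout)
{
	if (tdbout != NULL)
		*tdbout = NULL;
	if (key != toy_sa.id)
		return ENOENT;
	if (tdbout != NULL)
		*tdbout = &toy_sa;
	return 0;
}

/* Usage: one caller wants the object, another only needs the verdict. */
static void
toy_spd_lookup_usage(void)
{
	struct toy_tdb *tdb;

	assert(toy_spd_lookup(42, &tdb) == 0 && tdb == &toy_sa);
	assert(toy_spd_lookup(42, NULL) == 0);
}
```

A caller that only needs the error value passes NULL and never touches an
object it would otherwise have to release.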


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
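
As a rough illustration of the two ideas in this entry (counted references,
plus a flag so that deletion takes effect only once), here is a minimal
single-threaded sketch. All names are hypothetical and the counter is a
plain int; it does not model the locking or atomic operations the kernel
needs.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_tdb {
	int refcnt;	/* number of outstanding references */
	int deleted;	/* set once the table's reference was dropped */
};

/* Allocate an object holding one reference, owned by the "table". */
static struct toy_tdb *
toy_tdb_alloc(void)
{
	struct toy_tdb *t = calloc(1, sizeof(*t));

	if (t != NULL)
		t->refcnt = 1;
	return t;
}

static struct toy_tdb *
toy_tdb_ref(struct toy_tdb *t)
{
	t->refcnt++;
	return t;
}

/* Drop one reference; returns 1 if the object was freed. */
static int
toy_tdb_unref(struct toy_tdb *t)
{
	assert(t->refcnt > 0);
	if (--t->refcnt > 0)
		return 0;
	free(t);
	return 1;
}

/* Drop the table's reference, but only on the first call. */
static void
toy_tdb_delete(struct toy_tdb *t)
{
	if (t->deleted)
		return;
	t->deleted = 1;
	toy_tdb_unref(t);
}
```

In this sketch a second toy_tdb_delete() is a no-op only while some other
holder still keeps the object alive; the last toy_tdb_unref() is what frees
it.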


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
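
The "lockless read loop" idea can be sketched with a generic generation
counter. This is an assumption-laden illustration using C11 atomics and a
single writer; the names are hypothetical and it is not the kernel's
timehands code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_clock {
	atomic_uint gen;		/* odd while the writer is mid-update */
	_Atomic uint32_t lo;		/* low half of the 64-bit value */
	_Atomic uint32_t hi;		/* high half of the 64-bit value */
};

/* Single writer: bump gen to odd, store both halves, bump gen to even. */
static void
toy_settime(struct toy_clock *c, uint64_t t)
{
	unsigned int g = atomic_load(&c->gen);

	assert((g & 1) == 0);		/* no update may be in flight */
	atomic_store(&c->gen, g + 1);
	atomic_store(&c->lo, (uint32_t)t);
	atomic_store(&c->hi, (uint32_t)(t >> 32));
	atomic_store(&c->gen, g + 2);
}

/* Reader: retry until both halves were read under one even generation. */
static uint64_t
toy_gettime(struct toy_clock *c)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&c->gen);
		lo = atomic_load(&c->lo);
		hi = atomic_load(&c->hi);
	} while ((g & 1) != 0 || g != atomic_load(&c->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer; it pays for consistency with extra loads
and an occasional retry, which matches the cost trade-off the commit message
describes for 32-bit platforms.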


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
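
The pointer-resetting idea can be illustrated in userland. This is only
a sketch: struct buf and buf_freep() are hypothetical stand-ins built on
malloc/free, not the kernel's mbuf or m_freemp() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not the kernel structure. */
struct buf {
	int len;
};

/*
 * Free *bp and reset the caller's pointer, so a later use or a second
 * free through the same variable sees NULL instead of freed memory.
 */
static void
buf_freep(struct buf **bp)
{
	assert(bp != NULL);
	free(*bp);		/* free(NULL) is a no-op */
	*bp = NULL;
}
```

Because the caller's variable is NULL afterwards, calling buf_freep()
twice on the same variable is harmless in this sketch.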


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
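
The overflow in question can be shown with a small userland sketch. The
helper secs_to_ticks() is hypothetical and only illustrates clamping a
seconds-to-ticks conversion; the kernel's hzto(9) has a different
interface and is not reproduced here.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks, clamping to INT_MAX
 * instead of letting hz * secs overflow an int.  Hypothetical helper
 * for illustration only.
 */
static int
secs_to_ticks(long long secs, int hz)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* product would not fit in an int */
	return (int)(secs * hz);
}
```

With hz = 100, an expiration of a few hundred days already exceeds what
a 32-bit int tick count can hold, which is where the clamp takes effect.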


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.193 25-Nov-2021 bluhm

Implement reference counting for IPsec tdbs. Not all cases are
covered yet, more ref counts to come. The timeouts are protected,
so the racy tdb_reaper() gets retired. The tdb_policy_head, onext
and inext lists are protected. All gettdb...() functions return a
tdb that is ref counted and has to be unrefed later. A flag ensures
that tdb_delete() is called only once.
Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
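
The reference-counting pattern described above can be sketched in
userland. Everything here is an assumption made for illustration:
struct sa, sa_ref(), sa_unref() and sa_delete() are hypothetical names,
and the plain counter is single-threaded, unlike the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tdb; not the kernel structure. */
struct sa {
	unsigned int refcnt;
	int deleted;		/* set once sa_delete() has run */
};

static struct sa *
sa_alloc(void)
{
	struct sa *sa = calloc(1, sizeof(*sa));

	if (sa != NULL)
		sa->refcnt = 1;	/* reference held by the creator */
	return sa;
}

/* Take an extra reference, as a lookup function would before returning. */
static struct sa *
sa_ref(struct sa *sa)
{
	assert(sa->refcnt > 0);
	sa->refcnt++;
	return sa;
}

/* Drop one reference; returns 1 if this was the last one and sa was freed. */
static int
sa_unref(struct sa *sa)
{
	assert(sa->refcnt > 0);
	if (--sa->refcnt > 0)
		return 0;
	free(sa);
	return 1;
}

/* A flag makes sure the delete work happens only once. */
static int
sa_delete(struct sa *sa)
{
	if (sa->deleted)
		return 0;	/* already deleted, nothing to do */
	sa->deleted = 1;
	return 1;
}
```

The object stays valid for any holder of a reference even after
sa_delete(), and is freed only when the last sa_unref() runs.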


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
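
The rule for moving a packet after IPsec processing can be sketched as
below. Only the two attribute names come from the text above; `struct
sa_rd`, `struct pkt` and `sa_move_rdomain()` are hypothetical and greatly
simplified compared to the kernel's tdb and mbuf handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of an SA: only the two rdomain fields. */
struct sa_rd {
	unsigned int	rdomain;	/* where the SA is installed */
	unsigned int	rdomain_post;	/* where the packet goes afterwards */
};

/* Hypothetical packet header carrying a routing table id. */
struct pkt {
	unsigned int	rtableid;
};

/*
 * After IPsec processing, move the packet if the SA asks for it.
 * Returns 1 if the packet changed rdomain, 0 if it stayed put.
 */
int
sa_move_rdomain(const struct sa_rd *sa, struct pkt *p)
{
	assert(sa != NULL && p != NULL);
	if (sa->rdomain_post == sa->rdomain)
		return 0;
	p->rtableid = sa->rdomain_post;
	return 1;
}
```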


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@
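
The delivery loop idea, where each input function strips its header and
returns the next protocol instead of requeueing the packet, can be sketched
as follows. Everything here is hypothetical (the protocol numbers, the
2-byte wrapper header, `struct pkt`, `deliver()`); it is not the kernel's
ip_deliver().

```c
#include <assert.h>
#include <stddef.h>

#define P_WRAP	1	/* hypothetical wrapper protocol */
#define P_FINAL	2	/* hypothetical upper-layer protocol */
#define P_DONE	(-1)	/* handler consumed or dropped the packet */
#define P_MAX	3

struct pkt {
	const unsigned char	*data;
	size_t			 len;
	int			 delivered;
};

typedef int (*input_fn)(struct pkt *, size_t *);

/* Strip a 2-byte wrapper header and return the protocol that follows. */
static int
wrap_input(struct pkt *p, size_t *offp)
{
	int next;

	if (*offp + 2 > p->len)
		return P_DONE;		/* truncated header: drop */
	next = p->data[*offp];
	*offp += 2;
	return next;
}

/* Final consumer: hand the payload to the upper layer. */
static int
final_input(struct pkt *p, size_t *offp)
{
	(void)offp;
	p->delivered = 1;
	return P_DONE;
}

static const input_fn handlers[P_MAX] = {
	[P_WRAP] = wrap_input,
	[P_FINAL] = final_input,
};

/* Walk the header chain in one loop instead of recursing or requeueing. */
int
deliver(struct pkt *p, size_t off, int proto)
{
	int hops = 0;

	assert(p != NULL);
	while (proto != P_DONE) {
		if (proto < 0 || proto >= P_MAX || handlers[proto] == NULL ||
		    ++hops > 8)
			return -1;	/* unknown protocol or chain too long */
		proto = handlers[proto](p, &off);
	}
	return 0;
}
```

With this shape a nested header is handled like any other extension
header: the wrapper's handler returns, and the loop dispatches the inner
protocol.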


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
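
The pattern of resetting the caller's pointer before freeing can be
sketched in userland C as below. `struct buf`, `buf_freem()` and
`buf_freemp()` are illustrative stand-ins, not the kernel's mbuf functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer chain standing in for an mbuf chain. */
struct buf {
	struct buf	*next;
};

/* Free a whole chain; accepts NULL. */
void
buf_freem(struct buf *b)
{
	while (b != NULL) {
		struct buf *n = b->next;

		free(b);
		b = n;
	}
}

/*
 * Free the chain that *bp points to and clear the caller's pointer
 * first, so no dangling pointer is left behind in the caller.
 */
void
buf_freemp(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	buf_freem(b);
}
```

Because the caller's pointer is NULL afterwards, an accidental second free
through the same pointer becomes harmless instead of a double free.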


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
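
The relationship between routing table IDs and routing domain IDs can be
illustrated with a small lookup table. This is only a sketch: the array,
`table_attach()` and `table_l2()` are hypothetical and do not reflect how
rtable_l2(9) is implemented.

```c
#include <assert.h>

#define NTABLES	8

/* Hypothetical map: for each routing table id, the domain it belongs to. */
static unsigned int table_domain[NTABLES];	/* all tables start in domain 0 */

/*
 * Put a table into a domain. A domain id is itself a table id, and a
 * table can only join an id that currently is a domain (maps to itself).
 */
int
table_attach(unsigned int tableid, unsigned int domain)
{
	if (tableid >= NTABLES || domain >= NTABLES)
		return -1;
	if (tableid != domain && table_domain[domain] != domain)
		return -1;
	table_domain[tableid] = domain;
	return 0;
}

/* Table id -> domain id; the reverse of the "rtableid = rdomain" idiom. */
unsigned int
table_l2(unsigned int tableid)
{
	assert(tableid < NTABLES);
	return table_domain[tableid];
}
```

Assigning a domain ID to a table ID variable is always valid because every
domain is also a table; going the other way needs the lookup.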


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Correctly handle return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
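
The overflow mentioned above can be illustrated with a minimal userland sketch. The helper `secs_to_ticks_clamped()` is hypothetical and is not the kernel's hzto(); it only shows why a plain (hz * timeout) product is unsafe for very long expirations and how saturating avoids the wraparound.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper, not the kernel's hzto(): convert a timeout in
 * seconds to clock ticks, saturating at INT_MAX instead of overflowing.
 */
static int
secs_to_ticks_clamped(int hz, long long secs)
{
	assert(hz > 0);

	if (secs <= 0)
		return 0;
	/* hz * secs would not fit in an int: saturate. */
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```

With hz = 100, any timeout above roughly 21 million seconds would overflow a 32-bit int if multiplied directly; the check against INT_MAX / hz catches that before the multiplication happens.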


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.192 21-Nov-2021 bluhm

Fix whitespace and long lines.


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@
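
The "pass down an mbuf pointer mp" pattern described above can be sketched in userland. Everything here is an assumption made for illustration: `struct pkt`, `pkt_alloc()` and `pkt_pullup()` are hypothetical stand-ins, not the kernel's struct mbuf or m_pullup(). The point is only that a callee which may replace or free the buffer takes a pointer to the caller's pointer, so every caller sees the update.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userland stand-in for an mbuf; not the kernel structure. */
struct pkt {
	size_t		len;	/* number of valid bytes in data[] */
	unsigned char	data[];
};

static struct pkt *
pkt_alloc(const void *src, size_t len)
{
	struct pkt *p = malloc(sizeof(*p) + len);

	if (p == NULL)
		return NULL;
	p->len = len;
	if (len > 0)
		memcpy(p->data, src, len);
	return p;
}

/*
 * Similar in spirit to m_pullup(): the packet may be moved or freed.
 * On success *pp points at the (possibly new) packet.  On failure the
 * packet has been freed and *pp is NULL, so no caller can reuse it.
 */
static int
pkt_pullup(struct pkt **pp, size_t need)
{
	struct pkt *p, *n;

	assert(pp != NULL && *pp != NULL);
	p = *pp;
	if (p->len < need) {
		free(p);
		*pp = NULL;
		return -1;
	}
	/* Simulate relocation by always moving to a fresh allocation. */
	n = pkt_alloc(p->data, p->len);
	free(p);
	*pp = n;
	return n == NULL ? -1 : 0;
}
```

A caller that kept using its old local pointer after such a call would touch freed memory; routing every access through the shared pointer avoids that class of bug.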


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
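
The idea of resetting the pointer before freeing can be sketched in userland as follows. `struct buf` and `buf_freep()` are hypothetical names chosen for illustration; this is not the kernel's m_freemp(), only a minimal analogue of the pattern it describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf. */
struct buf {
	size_t	len;
};

/*
 * Free *bp and leave NULL behind, so that a later accidental use or a
 * second free through the same pointer is harmless or fails loudly.
 */
static void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;	/* reset first: no dangling pointer remains */
	free(b);
}
```

Because free(NULL) is a no-op, calling `buf_freep()` twice on the same pointer is safe, which is exactly the double-free case the commit message is concerned with.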


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation makes sense neither for the ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.191 11-Nov-2021 bluhm

Do not call ip_deliver() recursively from IPsec. As there is no
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
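
The index-instead-of-pointer idea described in this entry can be illustrated with a small userland sketch. All names here (toy_ifnet, toy_pkt, toy_if_get(), toy_input()) are hypothetical and only model the pattern; this is not the kernel's if_get() implementation and it ignores locking and reference counting.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF 4	/* hypothetical size of the interface table */

struct toy_ifnet {
	unsigned int	if_index;	/* unique ID of this interface */
	unsigned long	if_ipackets;	/* packets accounted to it */
};

struct toy_pkt {
	unsigned int	ph_ifidx;	/* index of the receiving interface */
};

/* Index -> interface lookup table; NULL means "no such interface". */
static struct toy_ifnet *toy_iftable[TOY_MAXIF];

static void
toy_if_attach(struct toy_ifnet *ifp)
{
	if (ifp->if_index < TOY_MAXIF)
		toy_iftable[ifp->if_index] = ifp;
}

static void
toy_if_detach(unsigned int idx)
{
	if (idx < TOY_MAXIF)
		toy_iftable[idx] = NULL;
}

static struct toy_ifnet *
toy_if_get(unsigned int idx)
{
	return (idx < TOY_MAXIF ? toy_iftable[idx] : NULL);
}

/*
 * Returns 0 if the packet was accounted to its interface, -1 if the
 * interface is gone, in which case the caller should free the packet.
 */
static int
toy_input(const struct toy_pkt *p)
{
	struct toy_ifnet *ifp = toy_if_get(p->ph_ifidx);

	if (ifp == NULL)
		return (-1);
	ifp->if_ipackets++;
	return (0);
}
```

Because the packet only stores an index, detaching the interface cannot leave a dangling pointer behind; the lookup simply starts returning NULL.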


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
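
The relationship between routing table IDs and routing domain IDs in this entry can be sketched with a toy mapping. The table contents and the names toy_table_domain, toy_rtable_l2() and toy_is_rdomain() are hypothetical; in particular, returning 0 for an unknown table is an assumption of this sketch, not a statement about the real rtable_l2(9).

```c
#include <assert.h>

#define TOY_NTABLES 8	/* hypothetical number of routing tables */

/*
 * toy_table_domain[t] is the routing domain that routing table t
 * belongs to.  A table d with toy_table_domain[d] == d is itself a
 * routing domain, so domain IDs are a subset of table IDs.
 */
static const unsigned int toy_table_domain[TOY_NTABLES] = {
	0, 1, 0, 1, 4, 4, 0, 1
};

/* Map a routing table ID to its routing domain ID. */
static unsigned int
toy_rtable_l2(unsigned int rtableid)
{
	if (rtableid >= TOY_NTABLES)
		return (0);	/* assumption: fall back to domain 0 */
	return (toy_table_domain[rtableid]);
}

/* True if the given ID names a routing domain, not just a table. */
static int
toy_is_rdomain(unsigned int id)
{
	return (id < TOY_NTABLES && toy_table_domain[id] == id);
}
```

With this mapping, assigning a domain ID to a table ID variable is always valid, while going the other way requires the lookup: table 5 is not a domain, but it belongs to domain 4.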


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.190 01-Nov-2021 bluhm

In ipsec_common_input_cb() pass mbuf pointer to pf_test() so that
all callers get an update if the mbuf changes.
OK tobhe@


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@
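
The "pass down an mbuf pointer" pattern from this entry can be sketched in userland C. The types and functions below (toy_buf, toy_pullup(), toy_ah_input()) are hypothetical stand-ins; they mimic only the property that matters here, namely that the pull-up step may hand back a different buffer or free it, and they are not the kernel's mbuf API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
	size_t		len;
	unsigned char	data[];
};

static struct toy_buf *
toy_buf_new(size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b) + len);

	if (b != NULL) {
		b->len = len;
		memset(b->data, 0, len);
	}
	return (b);
}

/*
 * On success the returned buffer may differ from the one passed in.
 * On failure the input buffer is freed and NULL is returned, so the
 * caller must not touch the old pointer in either case.
 */
static struct toy_buf *
toy_pullup(struct toy_buf *b, size_t len)
{
	struct toy_buf *n;

	if (b->len < len) {
		free(b);
		return (NULL);
	}
	n = toy_buf_new(b->len);
	if (n == NULL) {
		free(b);
		return (NULL);
	}
	memcpy(n->data, b->data, b->len);
	free(b);
	return (n);
}

/*
 * Taking a pointer to the caller's pointer lets every level of the
 * call chain see the replacement buffer, or NULL once it was freed.
 */
static int
toy_ah_input(struct toy_buf **bp, size_t hdrlen)
{
	*bp = toy_pullup(*bp, hdrlen);
	if (*bp == NULL)
		return (-1);
	return (0);
}
```

A caller that kept its own copy of the old pointer would risk a use-after-free or a double free; updating through the pointer-to-pointer avoids that class of mistake.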


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
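
The "lockless read loop" described above can be sketched roughly as
follows. This is a minimal userland illustration, assuming a single
writer and C11 atomics; `clock_sketch`, `clock_write()` and
`clock_read()` are hypothetical names, not the kernel's timehands code
or the gettime(9) implementation. It only shows why a reader on a
32-bit machine can retry instead of returning a torn 64-bit value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit value stored as two 32-bit halves. */
struct clock_sketch {
	atomic_uint gen;	/* odd while an update is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer: bump the generation before and after the update. */
static void
clock_write(struct clock_sketch *c, uint64_t v)
{
	unsigned int g = atomic_load_explicit(&c->gen, memory_order_relaxed);

	atomic_store_explicit(&c->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&c->gen, g + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even generation. */
static uint64_t
clock_read(struct clock_sketch *c)
{
	unsigned int g1, g2;
	uint32_t lo, hi;

	do {
		g1 = atomic_load_explicit(&c->gen, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g2 = atomic_load_explicit(&c->gen, memory_order_relaxed);
	} while (g1 != g2 || (g1 & 1));

	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer. The cost is the extra generation
loads and a possible retry, which matches the slowdown on 32-bit
platforms that the commit message accepts.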


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have fewer dangling pointers in the kernel.
OK krw@ mpi@ claudio@
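
The idea of resetting the caller's pointer before freeing can be
sketched as below. This is a userland analogy under stated assumptions:
`struct buf` and `buf_freep()` are hypothetical stand-ins that use
free(3), not the kernel's struct mbuf or the actual m_freemp() code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not the kernel structure. */
struct buf {
	char data[64];
};

/*
 * Free the buffer *bp points to. The caller's pointer is cleared
 * before the memory is released, so a later use or a second free
 * through the same variable sees NULL instead of a dangling pointer.
 */
static void
buf_freep(struct buf **bp)
{
	struct buf *b = *bp;

	*bp = NULL;
	free(b);	/* free(NULL) is a no-op, so a repeat call is harmless */
}
```

A caller that holds `struct buf *b` would call `buf_freep(&b)` and can
afterwards test `b == NULL` to see that ownership is gone.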


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.189 24-Oct-2021 bluhm

Remove code duplication by merging the v4 and v6 input functions
for ah, esp, and ipcomp. Move common code into ipsec_protoff()
which finds the offset of the next protocol field in the previous
header.
OK tobhe@


# 1.188 24-Oct-2021 bluhm

There are more m_pullup() in IPsec input. Pass down the pointer
to the mbuf to update it globally. At the end it will reach
ip_deliver() which expects a pointer to an mbuf.
OK sashan@


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
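Editorial sketch of the overflow described above: multiplying a very long expiration (in seconds) by hz can exceed the range of an int tick count. The helper below is a hypothetical illustration of clamping instead of overflowing. It is not hzto() itself, and the name toy_secs_to_ticks and its signature are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a timeout in seconds to clock ticks,
 * clamping at INT_MAX rather than letting (hz * timeout) overflow.
 */
static int
toy_secs_to_ticks(long long secs, int hz)
{
	if (secs <= 0 || hz <= 0)
		return 0;
	/* secs * hz would not fit in an int: saturate. */
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```

With hz = 100, a ten-second timeout yields 1000 ticks, while a timeout of several years saturates at INT_MAX instead of wrapping to a small or negative value.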


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.187 23-Oct-2021 bluhm

There is an m_pullup() down in AH input. As it may free or change
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@


# 1.186 23-Oct-2021 tobhe

Retire asynchronous crypto API as it is no longer required by any driver and
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.

Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().

ok bluhm@ mvs@ patrick@


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
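Editorial sketch of the "lockless read loop" idea mentioned above: a reader retries until it has read both halves of a 64-bit value under one stable generation number. This is a minimal illustration using C11 atomics and assuming a single writer. The names toy_timehands, toy_settime and toy_gettime are hypothetical and do not reflect the kernel's actual timehands layout or the gettime(9) implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a timehands slot; not the kernel's layout. */
struct toy_timehands {
	atomic_uint	gen;	/* odd while the writer is mid-update */
	atomic_uint	sec_lo;	/* low 32 bits of the 64-bit seconds value */
	atomic_uint	sec_hi;	/* high 32 bits of the 64-bit seconds value */
};

static void
toy_init(struct toy_timehands *th)
{
	atomic_init(&th->gen, 0);
	atomic_init(&th->sec_lo, 0);
	atomic_init(&th->sec_hi, 0);
}

/* Single writer: make gen odd, store both halves, make gen even again. */
static void
toy_settime(struct toy_timehands *th, int64_t sec)
{
	atomic_fetch_add(&th->gen, 1);
	atomic_store(&th->sec_lo, (uint32_t)sec);
	atomic_store(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32));
	atomic_fetch_add(&th->gen, 1);
}

/* Reader: loop until both halves were read under one even, unchanged gen. */
static int64_t
toy_gettime(struct toy_timehands *th)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&th->gen);
		lo = atomic_load(&th->sec_lo);
		hi = atomic_load(&th->sec_hi);
	} while ((gen & 1) != 0 || gen != atomic_load(&th->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The extra loads and the possible retry are the cost the commit message refers to on 32-bit platforms. On a platform with atomic 64-bit loads the loop would be unnecessary.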


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
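Editorial sketch of the rule stated above: after IPsec processing the packet moves to 'tdb_rdomain_post' only when it differs from 'tdb_rdomain'. The two field names come from the commit message. The toy_tdb and toy_pkt structures and the toy_move_rdomain() helper are hypothetical and only model the decision, not the kernel code path.

```c
#include <assert.h>

/* Hypothetical, reduced SA: only the two rdomain attributes. */
struct toy_tdb {
	unsigned int	tdb_rdomain;		/* rdomain the SA lives in */
	unsigned int	tdb_rdomain_post;	/* rdomain after processing */
};

/* Hypothetical packet header carrying a routing table ID. */
struct toy_pkt {
	unsigned int	rtableid;
};

/* Move the packet only if the SA names a different post-processing rdomain. */
static void
toy_move_rdomain(struct toy_pkt *pkt, const struct toy_tdb *tdb)
{
	if (tdb->tdb_rdomain_post != tdb->tdb_rdomain)
		pkt->rtableid = tdb->tdb_rdomain_post;
}
```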


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
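Editorial sketch of the pattern described above: a free routine that takes a pointer to the caller's pointer and clears it before releasing the chain, so the caller is not left holding a dangling pointer. The toy_mbuf type and the toy_* functions are hypothetical userland stand-ins built on malloc/free and are not the kernel's mbuf API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced buffer chain; only the link is modelled. */
struct toy_mbuf {
	struct toy_mbuf	*m_next;
};

/* Free a whole chain; a NULL chain is a no-op. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->m_next;

		free(m);
		m = next;
	}
}

/* Reset the caller's pointer first, then free what it pointed to. */
static void
toy_freemp(struct toy_mbuf **mp)
{
	struct toy_mbuf *m = *mp;

	*mp = NULL;
	toy_freem(m);
}
```

Because the caller's pointer is NULL afterwards, an accidental second call is harmless and a later dereference fails immediately instead of touching freed memory.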


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
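Editorial sketch of the relationship described above: every routing domain ID is also a valid routing table ID, but going from a table ID back to its domain needs a lookup. The table contents and the toy_rtable_l2() helper below are hypothetical illustration data. They only mimic the role the commit message assigns to rtable_l2(9).

```c
#include <assert.h>

#define TOY_RT_MAX	8

/*
 * Hypothetical mapping: toy_rt2dom[t] is the routing domain that routing
 * table t belongs to. Domains here are 0, 1 and 4, and each domain's own
 * table maps to itself.
 */
static const unsigned int toy_rt2dom[TOY_RT_MAX] = { 0, 1, 0, 1, 4, 4, 0, 0 };

/* Look up the routing domain for a routing table ID (toy version). */
static unsigned int
toy_rtable_l2(unsigned int rtableid)
{
	return rtableid < TOY_RT_MAX ? toy_rt2dom[rtableid] : 0;
}
```

In this toy data, assigning rtableid = rdomain is always valid for domains 0, 1 and 4, whereas table 3 must be looked up to learn that it belongs to domain 1.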


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
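
A decompress-once guard of the kind described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed code: `struct toy_pkt`, `TOY_PKT_DECOMPRESSED` and `toy_ipcomp_input()` are hypothetical names, and the actual decompression step is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PKT_DECOMPRESSED	0x01	/* hypothetical per-packet flag */

struct toy_pkt {
	unsigned int flags;
};

/*
 * Returns 0 if the packet was accepted for decompression,
 * -1 if it is dropped because it was already decompressed once.
 */
int
toy_ipcomp_input(struct toy_pkt *p)
{
	assert(p != NULL);

	/* A second IPcomp header on the same packet is refused. */
	if (p->flags & TOY_PKT_DECOMPRESSED)
		return -1;

	/* ... the real decompression work would happen here ... */

	p->flags |= TOY_PKT_DECOMPRESSED;
	return 0;
}
```

Because the flag travels with the packet, a payload that decompresses to another IPcomp packet cannot re-enter the decompressor, which bounds the work per packet.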


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
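
The overflow this commit guards against can be illustrated with a saturating seconds-to-ticks conversion. The sketch below does not reproduce the interface or internals of hzto(); `toy_secs_to_ticks()` is a hypothetical helper that only shows the idea of clamping instead of letting `hz * timeout` wrap around.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks without overflowing int.
 * Non-positive timeouts give 0 ticks; very long ones saturate at INT_MAX.
 */
int
toy_secs_to_ticks(int hz, long long secs)
{
	assert(hz > 0);

	if (secs <= 0)
		return 0;
	/* Check the bound by division so the multiply below cannot wrap. */
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```

With hz = 100, a naive `int` multiplication would wrap for expirations longer than roughly 248 days; the clamped version simply schedules the longest representable timeout.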


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.185 22-Oct-2021 bluhm

Make error handling in IPsec consistent. Pass errors to the callers.
OK tobhe@


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
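
The split-read problem and the "lockless read loop" mentioned above can be sketched in portable C11. This is an illustration only and not the kernel's timehands implementation: `struct toy_timehands`, `toy_settime()` and `toy_gettime()` are hypothetical names, the 64-bit value is deliberately stored as two 32-bit halves to mimic a 32-bit platform, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct toy_timehands {
	atomic_uint	 gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t lo;	/* low half of the 64-bit seconds value */
	_Atomic uint32_t hi;	/* high half of the 64-bit seconds value */
};

/* Single writer: bump gen to odd, store both halves, bump gen to even. */
void
toy_settime(struct toy_timehands *th, uint64_t secs)
{
	assert(th != NULL);
	atomic_fetch_add(&th->gen, 1);
	atomic_store(&th->lo, (uint32_t)secs);
	atomic_store(&th->hi, (uint32_t)(secs >> 32));
	atomic_fetch_add(&th->gen, 1);
}

/*
 * Reader: retry until both halves were read under the same, even
 * generation, so a half-updated value is never returned.
 */
uint64_t
toy_gettime(struct toy_timehands *th)
{
	unsigned int before, after;
	uint32_t lo, hi;

	assert(th != NULL);
	do {
		before = atomic_load(&th->gen);
		lo = atomic_load(&th->lo);
		hi = atomic_load(&th->hi);
		after = atomic_load(&th->gen);
	} while (before != after || (before & 1));

	return ((uint64_t)hi << 32) | lo;
}
```

The extra generation loads and the possible retry are the cost the commit message refers to; on a platform with atomic 64-bit loads the loop would be unnecessary.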


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
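
The pattern behind m_freemp() can be shown with a small userland analogue. This is a sketch under assumptions, not the kernel function: `struct toy_buf` and `toy_freemp()` are hypothetical stand-ins for an mbuf chain and its free routine, and the return type is chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_buf {
	struct toy_buf *next;	/* next buffer in the chain */
};

/*
 * Free the chain that *bp points to, clearing the caller's pointer
 * first so that no dangling reference is left behind.
 */
void
toy_freemp(struct toy_buf **bp)
{
	struct toy_buf *b, *n;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;		/* reset before freeing */

	while (b != NULL) {
		n = b->next;
		free(b);
		b = n;
	}
}
```

Because the caller's pointer is NULL afterwards, an accidental second free becomes a harmless no-op and a later dereference fails immediately instead of touching freed memory.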


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@
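
A minimal sketch of the two replacements mentioned in 1.146, assuming a
hypothetical union addr_union as a stand-in for the real address union
(the names below are illustrative, not taken from the kernel sources):

```c
#include <string.h>

/* Hypothetical stand-in for a union of address structures. */
union addr_union {
	unsigned char	bytes[16];
	unsigned int	words[4];
};

/* Both operands are properly aligned objects of the same type: assign. */
static void
addr_assign(union addr_union *dst, const union addr_union *src)
{
	*dst = *src;
}

/* Source and destination are known not to overlap: memcpy is enough. */
static void
addr_copy_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
	memcpy(dst, src, len);
}
```

Unlike bcopy, memcpy takes the destination first and is undefined for
overlapping buffers, which is why the overlap condition matters.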


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
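
The index-instead-of-pointer scheme from 1.133 can be sketched as below.
This is an illustration under stated assumptions: iface_table,
iface_lookup() and packet_usable() are hypothetical names, and the real
if_get() also deals with reference counting, which is omitted here.

```c
#include <stddef.h>

#define NIFACES	4	/* hypothetical table size */

struct iface {
	const char *name;
};

/* Hypothetical index-to-interface table; a NULL slot means "gone". */
static struct iface *iface_table[NIFACES];

/* Return the interface for an index, or NULL if it no longer exists. */
static struct iface *
iface_lookup(unsigned int idx)
{
	if (idx >= NIFACES)
		return NULL;
	return iface_table[idx];
}

/* Returns 1 if the packet can be processed, 0 if it should be dropped. */
static int
packet_usable(unsigned int rcvidx)
{
	return iface_lookup(rcvidx) != NULL;
}
```

A packet that only stores the index can outlive its interface safely:
the lookup fails and the packet is dropped instead of a stale pointer
being followed.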


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions that convert IPv4/6 addresses from
network to presentation format with inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
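
The "uncompress only once" rule from 1.102 can be sketched with a
per-packet flag. The struct, flag name and function below are
hypothetical and only illustrate the idea; the actual uncompression step
is left out.

```c
#include <errno.h>

#define PKT_COMP_DONE	0x01	/* hypothetical flag: already uncompressed */

struct pkt {
	unsigned int flags;
};

/*
 * Accept one IPcomp layer per packet. Returns 0 the first time and
 * EINVAL if the same packet shows up with another IPcomp header, which
 * stops quine-style and nested IPcomp-IPIP-IPcomp loops.
 */
static int
pkt_uncompress_once(struct pkt *p)
{
	if (p->flags & PKT_COMP_DONE)
		return EINVAL;
	p->flags |= PKT_COMP_DONE;
	/* the real uncompression work would happen here */
	return 0;
}
```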


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
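
The overflow in (hz * timeout) that 1.54 refers to can be avoided by
clamping before multiplying. The helper below is a hypothetical sketch
of that idea only; it is not how hzto() itself is implemented.

```c
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks, clamping to INT_MAX
 * instead of letting hz * secs overflow an int.
 */
static int
secs_to_ticks(int hz, long long secs)
{
	if (hz <= 0 || secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* would overflow: clamp */
	return (int)(secs * hz);
}
```

With hz = 100, any expiration longer than roughly 248 days would
overflow a 32-bit int without the clamp.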


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.184 13-Oct-2021 bluhm

Remove redundant NULL checks in IPsec which are never reached.
ok mvs@


# 1.183 13-Oct-2021 bluhm

The function crypto_dispatch() never returns an error. Make it
void and remove error handling in the callers.
OK patrick@ mvs@


# 1.182 05-Oct-2021 bluhm

Cleanup the error handling in ipsec ipip_output() and consistently
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6
case. Also use combined packet and byte counter.
OK sthen@ dlg@


# 1.181 05-Oct-2021 bluhm

Move setting ipsec mtu into a function. The NULL and invalid check
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu()
does that anyway. udpencap_ctlinput() did not work for bundled SA,
this also needs the loop in ipsec_set_mtu().
OK sthen@


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
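
The lockless read loop mentioned above can be illustrated with a small userland sketch. This is not the kernel's timehands code: `demo_timehands`, `demo_update()` and `demo_gettime()` are hypothetical names, and the generation-counter scheme is only assumed to resemble what gettime(9) does. The 64-bit value is stored as two 32-bit halves, as a 32-bit machine would see it, and the reader retries until it observes a stable, non-zero generation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one timehands slot; not the kernel struct. */
struct demo_timehands {
	atomic_uint gen;		/* 0 means "update in progress" */
	_Atomic uint32_t sec_lo;	/* low half of the 64-bit seconds */
	_Atomic uint32_t sec_hi;	/* high half of the 64-bit seconds */
};

/* Writer: invalidate the slot, store both halves, publish a new generation. */
void
demo_update(struct demo_timehands *th, int64_t now)
{
	unsigned int g = atomic_load(&th->gen);

	atomic_store(&th->gen, 0);
	atomic_store(&th->sec_lo, (uint32_t)now);
	atomic_store(&th->sec_hi, (uint32_t)((uint64_t)now >> 32));
	if (++g == 0)			/* never publish generation 0 */
		g = 1;
	atomic_store(&th->gen, g);
}

/* Reader: retry until both halves were read under one stable generation. */
int64_t
demo_gettime(struct demo_timehands *th)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&th->gen);
		lo = atomic_load(&th->sec_lo);
		hi = atomic_load(&th->sec_hi);
	} while (g == 0 || g != atomic_load(&th->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The reader takes no lock; it pays for consistency with the extra generation loads and an occasional retry, which matches the cost the message attributes to 32-bit platforms.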


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
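
The rule for moving a packet between rdomains can be sketched as follows. Only the field names 'tdb_rdomain' and 'tdb_rdomain_post' come from the message; the structs and `demo_move_rdomain()` are hypothetical and greatly simplified compared to the kernel's struct tdb and mbuf packet header.

```c
/* Hypothetical, minimal stand-ins for an SA and a packet header. */
struct demo_tdb {
	unsigned int tdb_rdomain;	/* rdomain the SA is installed in */
	unsigned int tdb_rdomain_post;	/* rdomain after IPsec processing */
};

struct demo_pkt {
	unsigned int rtableid;		/* routing table the packet is in */
};

/*
 * After IPsec processing, move the packet only if the SA asks for a
 * different rdomain.  Returns 1 if the packet was moved, 0 otherwise.
 */
int
demo_move_rdomain(const struct demo_tdb *tdb, struct demo_pkt *pkt)
{
	if (tdb->tdb_rdomain_post == tdb->tdb_rdomain)
		return 0;
	pkt->rtableid = tdb->tdb_rdomain_post;
	return 1;
}
```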


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@
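
The idea behind per-CPU statistics can be sketched in userland as below. This is an illustration only: the fixed CPU count, the counter names and the `demo_*` functions are assumptions, and the kernel's counters(9) API handles allocation and consistency differently. Each CPU increments its own slot, and a reader sums the slots.

```c
#include <stdint.h>

#define DEMO_NCPUS	4	/* assumed fixed CPU count for the sketch */

/* Hypothetical counter indices, loosely modelled on input/drop stats. */
enum demo_ipsec_counter {
	DEMO_IPACKETS,
	DEMO_IDROPS,
	DEMO_NCOUNTERS
};

struct demo_percpu {
	uint64_t c[DEMO_NCPUS][DEMO_NCOUNTERS];
};

/* Increment a counter in the slot owned by the given CPU. */
void
demo_counter_inc(struct demo_percpu *pc, unsigned int cpu,
    enum demo_ipsec_counter which)
{
	pc->c[cpu % DEMO_NCPUS][which]++;
}

/* Read a counter by summing every CPU's private slot. */
uint64_t
demo_counter_read(const struct demo_percpu *pc, enum demo_ipsec_counter which)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_NCPUS; cpu++)
		sum += pc->c[cpu][which];
	return sum;
}
```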


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
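
The free-and-reset pattern described above can be sketched in userland. `demo_buf` and `demo_freep()` are hypothetical stand-ins; the real m_freemp() operates on struct mbuf chains inside the kernel, and only the "reset the caller's pointer before freeing" behaviour is taken from the message.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a buffer passed down by pointer-to-pointer. */
struct demo_buf {
	int len;
};

/*
 * Reset the caller's pointer first, then free the buffer.  A later
 * accidental use sees NULL instead of a dangling pointer, and a second
 * call is harmless because free(NULL) does nothing.
 */
void
demo_freep(struct demo_buf **bp)
{
	struct demo_buf *b = *bp;

	*bp = NULL;
	free(b);
}
```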


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
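
The relation between routing table IDs and routing domain IDs can be sketched with a toy lookup table. The table contents and `demo_rtable_l2()` are made up for illustration; the kernel's rtable_l2(9) consults its own data structures. A routing domain's ID is the ID of its main table, so domain entries map to themselves.

```c
#define DEMO_NTABLES	4

/*
 * Hypothetical mapping from routing table ID to the routing domain it
 * belongs to: tables 0 and 1 are domains, table 2 is an extra table in
 * rdomain 1, table 3 is an extra table in rdomain 0.
 */
static const unsigned int demo_table_domain[DEMO_NTABLES] = { 0, 1, 1, 0 };

/* Return the routing domain for a table ID; unknown tables map to 0. */
unsigned int
demo_rtable_l2(unsigned int rtableid)
{
	if (rtableid >= DEMO_NTABLES)
		return 0;
	return demo_table_domain[rtableid];
}
```

Assigning a domain ID to a table ID variable is always valid, while the reverse direction needs the lookup.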


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcmp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
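
The overflow in question can be illustrated with a small sketch. This is not how hzto() is implemented; `demo_secs_to_ticks()` is a hypothetical helper that only shows why a naive (hz * timeout) product needs a guard when the timeout is very large, here by clamping to INT_MAX.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Convert a timeout in seconds to clock ticks without overflowing an
 * int.  Values that would not fit are clamped to INT_MAX instead of
 * wrapping around to a small or negative tick count.
 */
int
demo_secs_to_ticks(uint64_t secs, int hz)
{
	if (hz <= 0)
		return 0;
	if (secs > (uint64_t)INT_MAX / (uint64_t)hz)
		return INT_MAX;
	return (int)(secs * (uint64_t)hz);
}
```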


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.180 29-Sep-2021 bluhm

Global variables to track initialisation behave poorly with MP.
Move the tdb pool init into an init function.
OK mvs@


Revision tags: OPENBSD_7_0_BASE
# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
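
The idea of resetting the pointer before freeing can be sketched in
userland as follows. This is an illustration only: struct toy_mbuf,
toy_freem() and toy_freemp() are hypothetical stand-ins built on free(3),
not the kernel's mbuf code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mbuf; only the chain link matters. */
struct toy_mbuf {
	struct toy_mbuf	*m_next;
};

/* Free a whole chain; a NULL chain is a no-op. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *n = m->m_next;

		free(m);
		m = n;
	}
}

/* Reset the caller's pointer first, then free what it pointed to. */
static void
toy_freemp(struct toy_mbuf **mp)
{
	struct toy_mbuf *m;

	assert(mp != NULL);
	m = *mp;
	*mp = NULL;
	toy_freem(m);
}
```

Because the caller's pointer is NULL afterwards, an accidental second call
through the same pointer frees nothing, and a later dereference fails
immediately instead of touching freed memory.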


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
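The overflow this revision guards against can be sketched outside the kernel. The following is only an illustration, not hzto(): the helper name secs_to_ticks() is hypothetical, and it assumes hz is positive and that saturating at INT_MAX is an acceptable answer for very long expirations.

```c
#include <limits.h>

/*
 * Hypothetical helper (not a kernel function): convert a timeout in
 * seconds to clock ticks without letting (hz * timeout) overflow.
 * Assumes hz > 0. Very long timeouts saturate at INT_MAX ticks.
 */
int
secs_to_ticks(long secs, int hz)
{
	if (secs <= 0)
		return 0;
	/* Compare before multiplying so the product is never computed
	 * when it would not fit in an int. */
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```
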


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, dont unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.179 27-Jul-2021 mvs

Revert "Use per-CPU counters for tunnel descriptor block" diff.

Panic reported by Hrvoje Popovski.


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
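The "lockless read loop" mentioned above can be illustrated with a generation counter. This is only a sketch under stated assumptions, not the kernel's timehands code: struct clock64 and its functions are hypothetical names, a single writer is assumed, and C11 atomics stand in for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit value stored as two 32-bit halves. */
struct clock64 {
	atomic_uint gen;		/* odd while an update is in progress */
	atomic_uint_least32_t lo;
	atomic_uint_least32_t hi;
};

void
clock64_init(struct clock64 *c, uint64_t v)
{
	atomic_init(&c->gen, 0);
	atomic_init(&c->lo, (uint32_t)v);
	atomic_init(&c->hi, (uint32_t)(v >> 32));
}

/* Single writer: mark the value unstable, update both halves, publish. */
void
clock64_write(struct clock64 *c, uint64_t v)
{
	unsigned int g = atomic_load_explicit(&c->gen, memory_order_relaxed);

	atomic_store_explicit(&c->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&c->gen, g + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even generation. */
uint64_t
clock64_read(struct clock64 *c)
{
	unsigned int g0, g1;
	uint32_t lo, hi;

	do {
		g0 = atomic_load_explicit(&c->gen, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g1 = atomic_load_explicit(&c->gen, memory_order_relaxed);
	} while (g0 != g1 || (g0 & 1));

	return ((uint64_t)hi << 32) | lo;
}
```

The retry loop is where the extra cost on 32-bit platforms comes from: a reader that races with an update reads the halves again instead of returning a torn value.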


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
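The idea of resetting the caller's pointer as part of freeing can be shown in userland. This is a minimal sketch, not m_freemp() itself: struct buf, buf_alloc() and buf_freep() are hypothetical stand-ins for the mbuf API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not the kernel's struct mbuf. */
struct buf {
	size_t len;
	unsigned char *data;
};

/* Allocate a zeroed buffer of len bytes; returns NULL on failure. */
struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (b->data == NULL) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/*
 * Free *bp and reset the caller's pointer first, so a later use or a
 * second free through the same variable sees NULL instead of a
 * dangling pointer.
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b = *bp;

	*bp = NULL;
	if (b != NULL) {
		free(b->data);
		free(b);
	}
}
```

Calling buf_freep(&b) twice in a row is harmless in this sketch, which is the class of bug the commit message is concerned with.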


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
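A "decompress only once" guard can be sketched with a per-packet flag. The commit message does not say how the kernel records that a packet was already uncompressed, so everything here is an assumption for illustration: struct pkt, its decompressed field and ipcomp_accept_once() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-packet state; the kernel's bookkeeping may differ. */
struct pkt {
	bool decompressed;	/* set after the first IPcomp pass */
};

/*
 * Returns 0 if the packet may be decompressed now, -1 if it has
 * already been through IPcomp once and must be dropped. Refusing a
 * second pass bounds the work per packet, so a payload that expands
 * into another IPcomp packet cannot loop forever.
 */
int
ipcomp_accept_once(struct pkt *p)
{
	if (p->decompressed)
		return -1;
	p->decompressed = true;
	return 0;
}
```
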


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.178 26-Jul-2021 mvs

Use per-CPU counters for tunnel descriptor block (tdb) statistics.
'tdb_data' struct became unused and was removed.

ok bluhm@


# 1.177 26-Jul-2021 bluhm

Do not queue crypto operations for IPsec. The packet entries in
task queues were unlimited and could overflow during heavy traffic.
Even if we still use hardware drivers that sleep, softnet task
instead of soft interrupt can handle this now. Without queues net
lock is inherited and kernel lock is only needed once per packet.
This results in less lock contention and faster IPsec.
Also protect tdb drop counters with net lock and avoid a leak in
crypto dispatch error handling.
intense testing Hrvoje Popovski; OK mpi@


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
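
The "lockless read loop" mentioned above can be illustrated with a small userland sketch. This is not the kernel's timehands code: `struct split_time`, `split_time_set()` and `split_time_get()` are hypothetical names, and the sketch assumes a single writer and a generation counter that is odd while an update is in progress.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit seconds value stored as two 32-bit halves. */
struct split_time {
	_Atomic uint32_t gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t hi;
	_Atomic uint32_t lo;
};

/* Single writer: bump the generation around the two-word update. */
static void
split_time_set(struct split_time *st, uint64_t secs)
{
	atomic_fetch_add(&st->gen, 1);		/* generation is now odd */
	atomic_store(&st->hi, (uint32_t)(secs >> 32));
	atomic_store(&st->lo, (uint32_t)secs);
	atomic_fetch_add(&st->gen, 1);		/* generation is even again */
}

/* Readers retry until both halves come from the same generation. */
static uint64_t
split_time_get(struct split_time *st)
{
	uint32_t gen, hi, lo;

	do {
		gen = atomic_load(&st->gen);
		hi = atomic_load(&st->hi);
		lo = atomic_load(&st->lo);
	} while ((gen & 1) != 0 || gen != atomic_load(&st->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

A reader never takes a lock; it only pays for a retry when it races with an update, which is the extra cost on 32-bit platforms that the message describes.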


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
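
The rule for 'tdb_rdomain_post' described above can be sketched in a few lines. Only the two field names come from the commit message; `struct toy_tdb`, `struct toy_pkt` and `toy_move_rdomain()` are hypothetical stand-ins, not the kernel's structures or functions.

```c
/* Hypothetical, reduced stand-ins for the SA (tdb) and the packet header. */
struct toy_tdb {
	unsigned int tdb_rdomain;	/* rdomain the SA lives in */
	unsigned int tdb_rdomain_post;	/* rdomain after IPsec processing */
};

struct toy_pkt {
	unsigned int rtableid;		/* routing table the packet is in */
};

/*
 * After IPsec processing, move the packet only if the SA asks for a
 * different rdomain. Returns 1 if the packet was moved, 0 otherwise.
 */
static int
toy_move_rdomain(const struct toy_tdb *tdb, struct toy_pkt *pkt)
{
	if (tdb->tdb_rdomain_post == tdb->tdb_rdomain)
		return 0;
	pkt->rtableid = tdb->tdb_rdomain_post;
	return 1;
}
```

With both fields equal, the packet stays where it is, which matches the behavior of an SA that has no post-processing rdomain configured.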


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
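
The reset-before-free idea can be shown with a userland sketch. `buf_freep()` is a hypothetical helper built on free(3); it only mirrors the pattern the message describes and is not the kernel's m_freemp().

```c
#include <stdlib.h>

/*
 * Hypothetical userland analogue: take the address of the caller's
 * pointer, clear it first, then free the object, so the caller is
 * not left holding a dangling pointer.
 */
static void
buf_freep(void **bufp)
{
	void *buf = *bufp;

	*bufp = NULL;	/* reset the caller's pointer before freeing */
	free(buf);	/* free(NULL) is a no-op, so a repeated call is harmless */
}
```

Because the caller's pointer is NULL afterwards, an accidental second call turns a double free into a no-op, and a stray dereference fails immediately instead of touching freed memory.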


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
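
The index-instead-of-pointer scheme can be sketched as follows. Everything here is a hypothetical toy (`toy_iftable`, `toy_if_get()`, `toy_input()`, the `rcv_ifidx` field); the real if_get() and the mbuf packet header are more involved, and this sketch ignores reference counting entirely.

```c
#include <stddef.h>

#define TOY_MAXIF 4

struct toy_ifnet {
	unsigned int if_index;
};

struct toy_mbuf {
	unsigned int rcv_ifidx;	/* index of the receiving interface */
};

/* Hypothetical index table; a NULL slot means "no such interface". */
static struct toy_ifnet *toy_iftable[TOY_MAXIF];

static struct toy_ifnet *
toy_if_get(unsigned int idx)
{
	if (idx >= TOY_MAXIF)
		return NULL;
	return toy_iftable[idx];
}

/* Returns 0 if the packet can be processed, -1 if it must be dropped. */
static int
toy_input(const struct toy_mbuf *m)
{
	struct toy_ifnet *ifp = toy_if_get(m->rcv_ifidx);

	if (ifp == NULL)
		return -1;	/* interface is gone: caller frees the mbuf */
	return 0;
}
```

A queued packet that outlives its interface then resolves to NULL on lookup rather than dereferencing a stale pointer.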


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
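
A toy illustration of the distinction drawn in 1.120, not kernel code.
The table contents and the toy_ names are invented for the example, and
the model assumes every routing table belongs to exactly one routing
domain whose ID is itself a valid table ID. The real lookup is
rtable_l2(9).

```c
#include <assert.h>

/*
 * Hypothetical mapping for illustration only: the index is a routing
 * table ID, the value is the routing domain that table belongs to.
 * Tables 0 and 1 belong to rdomain 0, tables 2 and 3 to rdomain 2.
 */
static const unsigned int toy_rtable_domain[] = { 0, 0, 2, 2 };

#define TOY_NRTABLES \
	(sizeof(toy_rtable_domain) / sizeof(toy_rtable_domain[0]))

/* Toy stand-in for rtable_l2(9): routing table ID -> routing domain ID. */
static unsigned int
toy_rtable_l2(unsigned int rtableid)
{
	assert(rtableid < TOY_NRTABLES);
	return toy_rtable_domain[rtableid];
}
```

In this model a domain ID can be used directly as a table ID (table 2
maps back to domain 2), while table 3 has to go through the lookup to
find out that it belongs to domain 2.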


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
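
The overflow mentioned in 1.54 can be illustrated with a small
standalone sketch. This is not the kernel's hzto(9), which works on an
absolute struct timeval; toy_secs_to_ticks() is a hypothetical helper
that only shows the clamping idea for a relative timeout in seconds.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a relative timeout in seconds to clock ticks without letting
 * (hz * secs) overflow an int.  Hypothetical helper for illustration;
 * the name and the clamp-to-INT_MAX policy are assumptions.
 */
static int
toy_secs_to_ticks(long long secs, int hz)
{
	assert(hz > 0);

	if (secs <= 0)
		return 0;
	/* Check before multiplying: secs * hz would exceed INT_MAX. */
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```

With hz = 100, a ten second timeout becomes 1000 ticks, while an SA
lifetime of several years is clamped to INT_MAX instead of wrapping
around to a small or negative tick count.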


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).

# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.176 21-Jul-2021 bluhm

Also count crypto errors in ipsec_input_cb() like IPsec output in
previous commit.


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
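
The idea behind m_freemp() in 1.155 can be sketched outside the kernel.
The struct and the toy_ functions below are stand-ins invented for the
example, with malloc(3)/free(3) in place of the mbuf allocator. Only
the pattern is taken from the commit message: clear the caller's
pointer, then free what it pointed to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct mbuf; not the real layout. */
struct toy_mbuf {
	struct toy_mbuf	*m_next;
	char		 m_data[64];
};

/* Free a whole chain; a NULL chain is a no-op. */
static void
toy_freem(struct toy_mbuf *m)
{
	struct toy_mbuf *n;

	while (m != NULL) {
		n = m->m_next;
		free(m);
		m = n;
	}
}

/*
 * Reset the caller's pointer before freeing, so a later use or a
 * second free through the same variable sees NULL rather than a
 * dangling pointer.
 */
static void
toy_freemp(struct toy_mbuf **mp)
{
	struct toy_mbuf *m;

	assert(mp != NULL);
	m = *mp;
	*mp = NULL;
	toy_freem(m);
}
```

A caller that holds "struct toy_mbuf *m" passes &m. After the call m is
NULL, so calling toy_freemp(&m) again by mistake does nothing instead
of freeing the chain twice.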


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@
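
The two replacements named in this commit can be sketched as follows. This is only an illustration: `union sockaddr_union_sketch` and both helper functions are hypothetical stand-ins, not the kernel's `union sockaddr_union` or any code from ipsec_input.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's sockaddr_union. */
union sockaddr_union_sketch {
	struct {
		uint8_t len, family;
		uint8_t data[14];
	} sa;
	struct {
		uint8_t len, family;
		uint16_t port;
		uint32_t addr;
		uint8_t zero[8];
	} sin;
};

/*
 * Both operands are properly aligned objects of the same type,
 * so a plain assignment replaces bcopy().
 */
void
copy_by_assignment(union sockaddr_union_sketch *dst,
    const union sockaddr_union_sketch *src)
{
	*dst = *src;
}

/*
 * Raw buffers that are known not to overlap: memcpy(dst, src, len).
 * Note the argument order is the reverse of bcopy(src, dst, len).
 */
void
copy_by_memcpy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

If the buffers could overlap, memmove() would be the safe choice instead of memcpy().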


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation makes sense neither for the IP input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
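
A minimal sketch of the index-instead-of-pointer idea described above. All names ending in `_sketch` are hypothetical, and the sketch leaves out any locking or reference counting that a real interface lookup would need.

```c
#include <stddef.h>

#define MAX_IF_SKETCH 8

struct ifnet_sketch {
	unsigned int if_index;
	const char *if_xname;
};

/* The packet header carries only the interface index, never a pointer. */
struct pkthdr_sketch {
	unsigned int ph_ifidx;
};

/* Index -> interface table; a NULL slot means "no such interface". */
static struct ifnet_sketch *if_table_sketch[MAX_IF_SKETCH];

void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_index < MAX_IF_SKETCH)
		if_table_sketch[ifp->if_index] = ifp;
}

void
if_detach_sketch(unsigned int idx)
{
	if (idx < MAX_IF_SKETCH)
		if_table_sketch[idx] = NULL;
}

struct ifnet_sketch *
if_get_sketch(unsigned int idx)
{
	if (idx >= MAX_IF_SKETCH)
		return NULL;
	return if_table_sketch[idx];
}

/* Returns 0 if the packet can be processed, -1 if it must be freed. */
int
input_sketch(const struct pkthdr_sketch *ph)
{
	struct ifnet_sketch *ifp = if_get_sketch(ph->ph_ifidx);

	if (ifp == NULL)
		return -1;	/* interface is gone: drop the mbuf */
	return 0;
}
```

Because a stale index simply fails the lookup, a packet that outlives its interface is dropped instead of dereferencing a dangling pointer.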


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
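
The relationship between the two kinds of ID can be illustrated with a toy mapping. This is a sketch under stated assumptions: the array and the `*_sketch` functions are hypothetical and only model the idea behind rtable_l2(9), not its implementation.

```c
#define RT_TABLEID_MAX_SKETCH 16

/*
 * rtable_to_rdomain_sketch[t] is the routing domain that table t
 * belongs to. Zero-initialized: every table starts in rdomain 0.
 */
static unsigned int rtable_to_rdomain_sketch[RT_TABLEID_MAX_SKETCH];

/* A routing domain is a table that maps to itself. */
int
rdomain_create_sketch(unsigned int rdomain)
{
	if (rdomain >= RT_TABLEID_MAX_SKETCH)
		return -1;
	rtable_to_rdomain_sketch[rdomain] = rdomain;
	return 0;
}

/* Put an additional routing table into an existing routing domain. */
int
rtable_attach_sketch(unsigned int tableid, unsigned int rdomain)
{
	if (tableid >= RT_TABLEID_MAX_SKETCH ||
	    rdomain >= RT_TABLEID_MAX_SKETCH)
		return -1;
	if (rtable_to_rdomain_sketch[rdomain] != rdomain)
		return -1;	/* not a routing domain */
	rtable_to_rdomain_sketch[tableid] = rdomain;
	return 0;
}

/* Table ID -> routing domain ID, the direction that needs a lookup. */
unsigned int
rtable_l2_sketch(unsigned int tableid)
{
	if (tableid >= RT_TABLEID_MAX_SKETCH)
		return 0;
	return rtable_to_rdomain_sketch[tableid];
}
```

In this model `rtableid = rdomain` is always valid because every domain ID is itself a table ID, while the reverse direction requires the lookup.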


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
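
The "only once" rule can be sketched with a per-packet marker. The log does not say how the kernel records that a packet was already decompressed, so the `decompressed` flag and every name below are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical packet state; the real kernel uses mbufs. */
struct pkt_sketch {
	bool decompressed;	/* set after the first IPcomp pass */
};

/*
 * Returns 0 if the packet may be decompressed now, -1 if it must be
 * dropped because it already went through IPcomp once. A payload that
 * decompresses into another IPcomp packet therefore cannot loop.
 */
int
ipcomp_input_sketch(struct pkt_sketch *p)
{
	if (p->decompressed)
		return -1;
	p->decompressed = true;
	/* ... actual decompression would happen here ... */
	return 0;
}
```

The marker travels with the packet, so the check also covers the nested case where an IP-in-IP layer sits between two IPcomp headers.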


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow specifying an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This allows
grouping individual ipsec policies into virtual interfaces and greatly
simplifies monitoring and pf filtering with many ipsec policies.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
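
The overflow this commit refers to can be shown with a small saturating conversion. This is a sketch only: `seconds_to_ticks_sketch` is a hypothetical helper that demonstrates the (hz * timeout) problem, not the implementation or interface of hzto().

```c
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks. The naive product
 * hz * seconds overflows a 32-bit int for very long timeouts, so
 * check the bound first and saturate at INT_MAX instead.
 */
int
seconds_to_ticks_sketch(long long seconds, int hz)
{
	if (seconds <= 0 || hz <= 0)
		return 0;
	if (seconds > INT_MAX / hz)
		return INT_MAX;		/* would overflow: clamp */
	return (int)(seconds * hz);
}
```

With hz set to 100, any expiration longer than roughly 248 days exceeds INT_MAX ticks, which is why very long SA lifetimes hit this case.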


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.175 08-Jul-2021 bluhm

Debug printfs in encdebug were inconsistent, some missing newlines
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
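
The "lockless read loop" mentioned above can be illustrated with a generation-counter (seqlock-style) reader. This is an assumption about the general technique, written with C11 atomics for a single writer; it is not the kernel's timehands code, and every name below is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves plus a generation. */
struct clock_sketch {
	_Atomic uint32_t gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer: bump gen to odd, update both halves, bump to even. */
void
clock_sketch_store(struct clock_sketch *c, uint64_t now)
{
	uint32_t g = atomic_load_explicit(&c->gen, memory_order_relaxed);

	atomic_store_explicit(&c->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)now, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(now >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&c->gen, g + 2, memory_order_release);
}

/*
 * Reader: retry until both halves were read under the same even
 * generation, so a torn (split) read is never returned.
 */
uint64_t
clock_sketch_load(struct clock_sketch *c)
{
	uint32_t g0, g1, lo, hi;

	do {
		g0 = atomic_load_explicit(&c->gen, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g1 = atomic_load_explicit(&c->gen, memory_order_relaxed);
	} while ((g0 & 1) != 0 || g0 != g1);

	return ((uint64_t)hi << 32) | lo;
}
```

On a 64-bit platform a single aligned 64-bit load would do, which matches the commit's note that the extra cost applies only to 32-bit platforms.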


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
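
The idea can be sketched in userland C as below. The types and the names freem_sketch() and freemp_sketch() are hypothetical stand-ins that use malloc'd nodes instead of real mbufs; only the "reset the pointer, then free" order is taken from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chain of mbufs. */
struct mbuf_sketch {
	struct mbuf_sketch *m_next;
};

/* Free a whole chain; a NULL chain is a no-op. */
static void
freem_sketch(struct mbuf_sketch *m)
{
	while (m != NULL) {
		struct mbuf_sketch *n = m->m_next;

		free(m);
		m = n;
	}
}

/* Reset the caller's pointer first, then free what it pointed to. */
static void
freemp_sketch(struct mbuf_sketch **mp)
{
	struct mbuf_sketch *m = *mp;

	*mp = NULL;
	freem_sketch(m);
}
```

Because the caller's pointer is NULL afterwards, a second call on the same pointer frees nothing, and any later dereference fails immediately instead of touching freed memory.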
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.
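
A rough sketch of the lookup-by-index pattern, assuming a small fixed interface table. Everything suffixed _sketch is hypothetical and much simpler than the kernel's if_get(); it only shows why a NULL result means "free the packet".

```c
#include <assert.h>
#include <stddef.h>

#define NIFS_SKETCH 4

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_sketch {
	int if_attached;	/* non-zero while the interface exists */
};

/* Toy interface table, indexed by interface index. */
static struct ifnet_sketch if_table_sketch[NIFS_SKETCH];

/* Packet header that records an index instead of a pointer. */
struct pkthdr_sketch {
	unsigned int ph_ifidx;
};

/* Return the interface for an index, or NULL if it is gone or invalid. */
static struct ifnet_sketch *
if_get_sketch(unsigned int idx)
{
	if (idx >= NIFS_SKETCH || !if_table_sketch[idx].if_attached)
		return NULL;
	return &if_table_sketch[idx];
}

/* Returns 0 if the packet can be processed, -1 if it must be freed. */
static int
input_ifidx_sketch(const struct pkthdr_sketch *ph)
{
	struct ifnet_sketch *ifp = if_get_sketch(ph->ph_ifidx);

	if (ifp == NULL)
		return -1;	/* interface went away: drop the mbuf */
	return 0;
}
```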

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).
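
The asymmetry can be illustrated with a toy lookup table. The mapping and the name rtable_l2_sketch() are invented for this example and say nothing about how rtable_l2(9) is implemented; they only show that the rdomain-to-rtable direction is the identity while the reverse needs a lookup.

```c
#include <assert.h>

#define RT_TABLES_SKETCH 6

/*
 * Hypothetical mapping: tables 0 and 1 belong to rdomain 0,
 * tables 2, 3 and 4 to rdomain 2, table 5 to rdomain 5.
 */
static const unsigned int rtable_rdomain_sketch[RT_TABLES_SKETCH] = {
	0, 0, 2, 2, 2, 5
};

/* Map a routing table ID to the routing domain it belongs to. */
static unsigned int
rtable_l2_sketch(unsigned int rtableid)
{
	assert(rtableid < RT_TABLES_SKETCH);
	return rtable_rdomain_sketch[rtableid];
}
```

In this toy setup table 3 is not a routing domain, so using 3 as an rdomain would be wrong, while every rdomain ID (0, 2, 5) maps to itself.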

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Remove the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
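
The overflow in question can be shown with a small saturating conversion. This is not the hzto() implementation, only a sketch of why a plain (hz * timeout) is unsafe for very long lifetimes; the function name timeout_ticks_sketch() is made up for the example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks without overflowing an
 * int. Very long timeouts saturate at INT_MAX instead of wrapping.
 */
static int
timeout_ticks_sketch(int hz, long long seconds)
{
	assert(hz > 0);
	if (seconds <= 0)
		return 0;
	if (seconds > INT_MAX / hz)
		return INT_MAX;		/* hz * seconds would not fit */
	return (int)(seconds * hz);
}
```

With hz = 100, any lifetime above roughly 248 days (INT_MAX / 100 seconds) would overflow a 32-bit int if multiplied directly.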


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.174 18-Jun-2021 bluhm

The crypto(9) framework used by IPsec runs on a kernel task that
is protected by kernel lock. There were crashes in swcr_authenc()
when it was accessing swcr_sessions. As a quick fix, protect all
calls from network stack to crypto with kernel lock. This also
covers the rekeying case that is called from pfkey via tdb_init().
OK mvs@


Revision tags: OPENBSD_6_8_BASE OPENBSD_6_9_BASE
# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
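
The "lockless read loop" mentioned above can be illustrated with a
generation counter. This is only a userland sketch of the general
technique, not the kernel's gettime(9) code: struct clock_snap,
clock_store() and clock_load() are hypothetical names, and a single
writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical snapshot record; not the kernel's timehands. */
struct clock_snap {
	_Atomic uint32_t gen;	/* odd while the writer is mid-update */
	_Atomic uint32_t lo;	/* low 32 bits of the 64-bit value */
	_Atomic uint32_t hi;	/* high 32 bits of the 64-bit value */
};

/* Single writer: publish a new 64-bit value in two 32-bit halves. */
void
clock_store(struct clock_snap *cs, uint64_t secs)
{
	uint32_t g = atomic_load(&cs->gen);

	atomic_store(&cs->gen, g + 1);	/* odd: update in progress */
	atomic_store(&cs->lo, (uint32_t)(secs & 0xffffffffu));
	atomic_store(&cs->hi, (uint32_t)(secs >> 32));
	atomic_store(&cs->gen, g + 2);	/* even: update complete */
}

/*
 * Reader: retry until both halves were read under the same even
 * generation, so a torn (split) read is never returned.
 */
uint64_t
clock_load(struct clock_snap *cs)
{
	uint32_t g, lo, hi;

	do {
		g = atomic_load(&cs->gen);
		lo = atomic_load(&cs->lo);
		hi = atomic_load(&cs->hi);
	} while ((g & 1) != 0 || g != atomic_load(&cs->gen));

	return ((uint64_t)hi << 32) | lo;
}
```

The retry loop is where the extra cost on 32-bit platforms comes from;
a platform with atomic 64-bit loads could read the value directly.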


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
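
The 'tdb_rdomain_post' rule described above can be sketched as follows.
Only the two tdb field names come from this commit message; struct
tdb_sketch, struct pkt_sketch and pkt_apply_rdomain_post() are
hypothetical stand-ins and not the kernel's definitions.

```c
/* Hypothetical, trimmed-down SA record with only the two rdomain fields. */
struct tdb_sketch {
	unsigned int tdb_rdomain;	/* rdomain the SA is installed in */
	unsigned int tdb_rdomain_post;	/* rdomain after IPsec processing */
};

/* Hypothetical packet header carrying a routing table ID. */
struct pkt_sketch {
	unsigned int rtableid;
};

/*
 * After IPsec processing, move the packet to tdb_rdomain_post when it
 * differs from tdb_rdomain. Returns 1 if the packet was moved, else 0.
 */
int
pkt_apply_rdomain_post(struct pkt_sketch *p, const struct tdb_sketch *t)
{
	if (t->tdb_rdomain_post == t->tdb_rdomain)
		return 0;
	p->rtableid = t->tdb_rdomain_post;
	return 1;
}
```

With both fields equal the packet stays where it is, which matches the
single-rdomain behaviour.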


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
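
The pattern behind m_freemp() can be shown in userland. This is a
sketch of the idea only, under the assumption that the helper takes a
pointer to the caller's pointer: struct buf, buf_alloc() and
buf_freep() are hypothetical names and do not reproduce the kernel's
mbuf code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not the kernel's struct mbuf. */
struct buf {
	size_t len;
};

/* Allocate a buffer record; returns NULL on allocation failure. */
struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b != NULL)
		b->len = len;
	return b;
}

/*
 * Free the buffer that *bp points to. The caller's pointer is reset to
 * NULL before the memory is released, so a later use shows up as a
 * NULL pointer rather than a dangling one, and a second call is a
 * harmless free(NULL).
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	free(b);
}
```

A caller would write buf_freep(&b) instead of free(b), leaving b set to
NULL afterwards.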


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.173 01-Sep-2020 gnezdo

Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr

The best-guessed limits will be tested by trial.


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
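
The split-read problem and the lockless read loop mentioned above can be
sketched in portable C11. This is a simplified, single-writer
illustration and not the kernel's timehands code; struct snap_clock,
clock_store() and clock_load() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value kept as two 32-bit halves, as on a 32-bit machine. */
struct snap_clock {
	atomic_uint gen;		/* odd while an update is in progress */
	_Atomic uint32_t lo, hi;	/* halves of the 64-bit value */
};

/* Single writer: bump gen to odd, write both halves, bump to even. */
static void
clock_store(struct snap_clock *c, uint64_t v)
{
	unsigned int g = atomic_load_explicit(&c->gen, memory_order_relaxed);

	atomic_store_explicit(&c->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&c->gen, g + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even generation. */
static uint64_t
clock_load(struct snap_clock *c)
{
	unsigned int g1, g2;
	uint32_t lo, hi;

	do {
		g1 = atomic_load_explicit(&c->gen, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g2 = atomic_load_explicit(&c->gen, memory_order_relaxed);
	} while (g1 != g2 || (g1 & 1));

	return ((uint64_t)hi << 32) | lo;
}
```

The retry loop is where the extra cost on 32-bit platforms comes from:
a reader never takes a lock, but it may have to read the halves more
than once if an update races with it.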


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
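
The idea of resetting the caller's pointer before freeing can be
sketched in userland C. This is an illustration of the pattern only,
not the kernel's m_freemp(); struct buf, buf_free_chain() and
buf_freep() are hypothetical stand-ins for mbuf handling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain. */
struct buf {
	struct buf *next;
};

/* Free every element of the chain; a NULL chain is a no-op. */
static void
buf_free_chain(struct buf *b)
{
	while (b != NULL) {
		struct buf *n = b->next;

		free(b);
		b = n;
	}
}

/*
 * Clear the caller's pointer first, then free what it pointed to,
 * so the caller is not left holding a dangling pointer.
 */
static void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	buf_free_chain(b);
}
```

Because the pointer is NULL afterwards, an accidental second call frees
nothing instead of causing a double free.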


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
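
The "only once" rule can be pictured as a per-packet marker that is
checked before decompression. This is a sketch under assumptions, not
the actual change: struct pkt, PKT_F_DECOMPRESSED and ipcomp_once() are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PKT_F_DECOMPRESSED	0x01	/* hypothetical per-packet flag */

/* Hypothetical stand-in for the packet header carrying flags. */
struct pkt {
	unsigned int flags;
};

/*
 * Return 0 if the packet may be decompressed now, or -1 if it has
 * already been decompressed once and must be dropped.
 */
static int
ipcomp_once(struct pkt *p)
{
	assert(p != NULL);
	if (p->flags & PKT_F_DECOMPRESSED)
		return -1;
	p->flags |= PKT_F_DECOMPRESSED;
	return 0;
}
```

A payload that decompresses to another IPcomp packet then hits the drop
path on the second pass, which bounds the work done per received packet.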


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
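
The overflow mentioned above can be illustrated with a small userland sketch. This is not the hzto(9) implementation (which works on an absolute struct timeval); the helper name secs_to_ticks_sat and its saturating behaviour are assumptions made for illustration only.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a relative timeout in seconds to clock
 * ticks. A plain (hz * secs) overflows int for very long SA lifetimes,
 * so saturate at INT_MAX instead.
 */
int
secs_to_ticks_sat(int secs, int hz)
{
	assert(secs >= 0 && hz > 0);

	if (secs > INT_MAX / hz)
		return INT_MAX;		/* product would not fit in an int */
	return secs * hz;
}
```

With hz = 100, any timeout above INT_MAX / 100 seconds would overflow without the check.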


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.172 01-Aug-2020 gnezdo

Move range check inside sysctl_int_arr

Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.

OK kn@


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
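
The "reset the pointer before freeing" idea can be sketched in userland C. This is not the kernel's m_freemp(); struct buf and buf_freemp() are hypothetical stand-ins that use malloc(3)/free(3) instead of mbufs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mbuf. */
struct buf {
	int len;
};

/*
 * Free the buffer that *bp points to, clearing the caller's pointer
 * first so that no dangling copy is left behind. A second call on the
 * same pointer is harmless because free(NULL) is a no-op.
 */
void
buf_freemp(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;	/* reset the caller's pointer before freeing */
	free(b);
}
```

A caller that reuses the pointer after the call then hits a NULL check instead of freed memory, and a repeated free becomes a no-op.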


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
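
The overflow this revision guards against can be illustrated with a small
userland sketch. It is not the kernel's hzto(9); sketch_secs_to_ticks() is a
hypothetical helper that only shows why multiplying hz by a very long SA
lifetime needs a range check before the result is stored in an int.

```c
#include <limits.h>

/*
 * Hypothetical helper, not the kernel's hzto(9): convert a lifetime in
 * seconds to clock ticks, clamping instead of overflowing an int.
 */
int
sketch_secs_to_ticks(long long secs, int hz)
{
	if (secs <= 0 || hz <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* hz * secs would not fit in an int */
	return (int)(secs * hz);
}
```

With hz = 100, any lifetime above roughly 248 days would overflow a 32-bit
int if multiplied naively; the sketch clamps such values instead.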


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, updating statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.171 24-Jun-2020 cheloha

kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)

time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.

This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).

There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.

There is no performance cost on 64-bit (__LP64__) platforms.

With input from visa@, dlg@, and tedu@.

Several bugs squashed by visa@.

ok kettenis@
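
The split-read problem and the lockless read loop described above can be
sketched in portable C11. This is an illustration only: struct th_sketch,
sketch_settime() and sketch_gettime() are hypothetical names, and the layout
is not the kernel's timehands. The 64-bit value is stored as two 32-bit
halves, and a generation counter lets the reader detect a concurrent update
and retry.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one timehands slot. */
struct th_sketch {
	_Atomic uint32_t gen;		/* 0 while an update is in progress */
	_Atomic uint32_t sec_lo;	/* low half of the 64-bit seconds */
	_Atomic uint32_t sec_hi;	/* high half of the 64-bit seconds */
};

/* Writer: invalidate the slot, store both halves, publish a new generation. */
void
sketch_settime(struct th_sketch *th, int64_t sec)
{
	uint32_t gen = atomic_load_explicit(&th->gen, memory_order_relaxed);

	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->sec_lo, (uint32_t)sec,
	    memory_order_relaxed);
	atomic_store_explicit(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32),
	    memory_order_relaxed);
	if (++gen == 0)
		gen = 1;		/* 0 is reserved for "in progress" */
	atomic_store_explicit(&th->gen, gen, memory_order_release);
}

/* Reader: retry until both halves were read under one stable generation. */
int64_t
sketch_gettime(struct th_sketch *th)
{
	uint32_t gen, lo, hi;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		lo = atomic_load_explicit(&th->sec_lo, memory_order_relaxed);
		hi = atomic_load_explicit(&th->sec_hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

On a 64-bit platform a single aligned load would do the same job, which is
consistent with the commit's note that only 32-bit platforms pay for the loop.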


Revision tags: OPENBSD_6_7_BASE
# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@
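
The rule for 'tdb_rdomain_post' described above can be sketched as follows.
Only the two field names come from the commit message; the struct layouts,
the integer types and sketch_post_process() are assumptions made for
illustration and do not reflect the kernel's struct tdb or mbuf packet header.

```c
#include <stdint.h>

/* Hypothetical, reduced view of an SA: only the two rdomain attributes. */
struct tdb_sketch {
	uint32_t tdb_rdomain;		/* rdomain the SA is installed in */
	uint32_t tdb_rdomain_post;	/* rdomain after IPsec processing */
};

/* Hypothetical, reduced view of a packet: only its routing table ID. */
struct pkt_sketch {
	uint32_t rtableid;
};

/*
 * After IPsec processing, move the packet only if the two rdomains differ.
 * Returns 1 if the packet was moved, 0 if it stayed where it was.
 */
int
sketch_post_process(const struct tdb_sketch *tdb, struct pkt_sketch *pkt)
{
	if (tdb->tdb_rdomain_post == tdb->tdb_rdomain)
		return 0;
	pkt->rtableid = tdb->tdb_rdomain_post;
	return 1;
}
```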


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
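
The idea behind m_freemp() can be shown with a userland sketch. The names
struct buf_sketch, buf_freem() and buf_freemp() are hypothetical stand-ins
for mbuf chains, not the kernel functions; the point is only that the
caller's pointer is cleared before the chain is freed, so a later accidental
use sees NULL instead of freed memory.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain. */
struct buf_sketch {
	struct buf_sketch *next;
};

/* Free a whole chain; a NULL chain is a no-op. */
void
buf_freem(struct buf_sketch *b)
{
	struct buf_sketch *n;

	while (b != NULL) {
		n = b->next;
		free(b);
		b = n;
	}
}

/* Reset the caller's pointer first, then free what it pointed to. */
void
buf_freemp(struct buf_sketch **bp)
{
	struct buf_sketch *b = *bp;

	*bp = NULL;
	buf_freem(b);
}
```

Calling buf_freemp() twice on the same pointer is harmless in this sketch,
because the second call only sees NULL.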


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues I'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
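
The relationship between routing table IDs and routing domain IDs described
above can be sketched with a small lookup table. This is not how rtable_l2(9)
is implemented; sketch_rtable_l2() and the table contents are hypothetical
and only illustrate that every rdomain ID is itself a valid table ID, while
the reverse mapping needs a lookup.

```c
#define SKETCH_NRTABLES 4

/*
 * Hypothetical mapping: index is a routing table ID, value is the routing
 * domain it belongs to. Tables 0 and 1 are domains themselves; table 2
 * belongs to rdomain 1 and table 3 to rdomain 0.
 */
static const unsigned int sketch_rdomain_of[SKETCH_NRTABLES] = { 0, 1, 1, 0 };

/* Return the rdomain for a table ID; unknown tables fall back to rdomain 0. */
unsigned int
sketch_rtable_l2(unsigned int rtableid)
{
	if (rtableid >= SKETCH_NRTABLES)
		return 0;
	return sketch_rdomain_of[rtableid];
}
```

In this sketch, assigning rtableid = 1 for rdomain 1 is always valid, but
recovering the rdomain of table 2 requires the lookup.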


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcmp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
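
The overflow mentioned above can be illustrated with a small user-space
sketch. This is not hzto(9) itself: toy_secs_to_ticks() is a hypothetical
helper that only shows why a plain (hz * timeout) product is unsafe for
very long expirations and how clamping avoids the wraparound.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a timeout in seconds to clock ticks.
 * Instead of computing hz * secs in an int, which can overflow for
 * very long timeouts, clamp the result to INT_MAX.
 */
static int
toy_secs_to_ticks(uint64_t secs, int hz)
{
	assert(hz > 0);

	/* Would secs * hz exceed INT_MAX? Check by division, not by multiplying. */
	if (secs > (uint64_t)INT_MAX / (uint64_t)hz)
		return INT_MAX;

	return (int)(secs * (uint64_t)hz);
}
```

With hz = 100, a 10 second timeout yields 1000 ticks, while a timeout
larger than INT_MAX / 100 seconds is clamped rather than wrapping.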


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.170 23-Apr-2020 tobhe

Add support for automatically moving traffic between rdomains on ipsec(4)
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic separated and reduces the attack surface for network
sidechannel attacks.

The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept separate.
The encrypted and unencrypted rdomains can have different default routes.

The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' after IPsec processing.
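
The rule described in this paragraph can be sketched in user-space C.
Only the field names tdb_rdomain and tdb_rdomain_post come from the
commit message; struct toy_tdb, struct toy_pkt and toy_move_rdomain()
are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>

/* Hypothetical, reduced SA: only the two rdomain attributes. */
struct toy_tdb {
	unsigned int tdb_rdomain;	/* rdomain the SA lives in */
	unsigned int tdb_rdomain_post;	/* rdomain after IPsec processing */
};

/* Hypothetical packet: only the routing table ID it is tagged with. */
struct toy_pkt {
	unsigned int rtableid;
};

/*
 * After IPsec processing, move the packet if the SA asks for a
 * different rdomain. Returns 1 if the packet was moved, 0 otherwise.
 */
static int
toy_move_rdomain(const struct toy_tdb *tdb, struct toy_pkt *pkt)
{
	if (tdb->tdb_rdomain_post == tdb->tdb_rdomain)
		return 0;

	pkt->rtableid = tdb->tdb_rdomain_post;
	return 1;
}
```

A packet handled by an SA with equal values stays where it is; one
handled by an SA with differing values ends up tagged with
tdb_rdomain_post.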

Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.

As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.

discussed with chris@ and kn@
ok markus@, patrick@


Revision tags: OPENBSD_6_6_BASE
# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
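
The idea behind m_freemp() in the entry above can be shown with a
user-space analogue. This is a sketch, not the kernel code: struct
toy_buf, toy_buf_alloc() and toy_buf_freep() are hypothetical names, and
malloc/free stand in for the mbuf allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space buffer standing in for an mbuf. */
struct toy_buf {
	size_t len;
	unsigned char *data;
};

/* Allocate a zeroed buffer; returns NULL on allocation failure. */
static struct toy_buf *
toy_buf_alloc(size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->data = calloc(1, len ? len : 1);
	if (b->data == NULL) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/*
 * Reset the caller's pointer first, then free the buffer. A second
 * call through the same pointer is a harmless no-op, and any later
 * dereference hits NULL instead of freed memory.
 */
static void
toy_buf_freep(struct toy_buf **bp)
{
	struct toy_buf *b = *bp;

	*bp = NULL;
	if (b != NULL) {
		free(b->data);
		free(b);
	}
}
```

Taking a pointer to the pointer is the design choice that matters: the
callee can clear the caller's copy, so a repeated free does nothing.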


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
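
The relationship between routing table IDs and routing domain IDs can be
sketched as below. This is a simplified userland model, assuming a small
fixed-size mapping array; `model_rt_domain` and `model_rtable_l2()` are
hypothetical names, and the real rtable_l2(9) is not implemented this way
for certain.

```c
#define MODEL_RT_MAX 8

/* Hypothetical map: routing table ID -> ID of the rdomain it belongs to. */
static unsigned int model_rt_domain[MODEL_RT_MAX];

/*
 * Model of the "table ID to domain ID" direction. Unknown table IDs
 * fall back to domain 0 in this sketch.
 */
static unsigned int
model_rtable_l2(unsigned int rtableid)
{
	if (rtableid >= MODEL_RT_MAX)
		return 0;
	return model_rt_domain[rtableid];
}
```

In this model a routing domain ID is also a valid table ID that maps to
itself, so assigning `rtableid = rdomain` is safe, while the reverse
direction needs the lookup.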


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
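
The "only once" rule from this entry can be illustrated with a per-packet
flag. This is a hedged sketch and not the actual fix: the flag
`MODEL_F_DECOMPRESSED`, the struct `pkt_model`, and the function
`model_ipcomp_input()` are hypothetical names.

```c
/* Hypothetical per-packet flag: set after the first decompression. */
#define MODEL_F_DECOMPRESSED	0x01

struct pkt_model {
	unsigned int	flags;
};

/*
 * Returns 0 if the packet may be decompressed now, -1 if it already
 * went through decompression and must be dropped. Remembering the
 * state on the packet bounds the work done per packet, so a payload
 * that decompresses to another IPcomp packet cannot loop forever.
 */
static int
model_ipcomp_input(struct pkt_model *p)
{
	if (p->flags & MODEL_F_DECOMPRESSED)
		return -1;
	p->flags |= MODEL_F_DECOMPRESSED;
	return 0;
}
```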


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcmp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
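
The overflow this entry refers to arises when a large expiration in seconds
is multiplied by the tick rate in a plain int. The sketch below shows one
way to saturate instead of wrapping; it is not how hzto() is implemented,
and `model_secs_to_ticks()` is a hypothetical helper written only for
illustration.

```c
#include <limits.h>

/*
 * Convert a relative timeout in seconds to clock ticks, clamping to
 * INT_MAX instead of overflowing when secs * hz does not fit.
 */
static int
model_secs_to_ticks(unsigned long long secs, int hz)
{
	if (hz <= 0)
		return 0;
	/* Check before multiplying so the product never overflows. */
	if (secs > (unsigned long long)INT_MAX / (unsigned long long)hz)
		return INT_MAX;
	return (int)(secs * (unsigned long long)hz);
}
```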


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.169 30-Sep-2019 dlg

remove the "copy function" argument to bpf_mtap_hdr.

it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.

ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.


Revision tags: OPENBSD_6_5_BASE
# 1.168 09-Nov-2018 claudio

Remove the last few XXX rdomain markers. Even those functions respect the
rdomain now and are therefore rdomain safe.
OK mpi@


Revision tags: OPENBSD_6_4_BASE
# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@
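
The idea behind per-CPU counters is that each CPU increments its own slot
and a reader sums all slots. The sketch below is a simplified single-threaded
model, assuming a fixed CPU count; `model_counters`, `model_inc()` and
`model_read()` are hypothetical names, and the synchronization a real
implementation needs to take a consistent snapshot is left out.

```c
#include <stdint.h>

#define MODEL_NCPU 4

enum { MODEL_IN, MODEL_DROPS, MODEL_NCOUNTERS };

/* One row of counters per CPU, so writers never share a slot. */
static uint64_t model_counters[MODEL_NCPU][MODEL_NCOUNTERS];

/* Increment counter `which` on behalf of CPU `cpu`. */
static void
model_inc(unsigned int cpu, unsigned int which)
{
	model_counters[cpu % MODEL_NCPU][which % MODEL_NCOUNTERS]++;
}

/* Sum every CPU's row into `out`, which the caller provides. */
static void
model_read(uint64_t out[MODEL_NCOUNTERS])
{
	unsigned int c, i;

	for (i = 0; i < MODEL_NCOUNTERS; i++) {
		out[i] = 0;
		for (c = 0; c < MODEL_NCPU; c++)
			out[i] += model_counters[c][i];
	}
}
```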


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows processing an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
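
The "reset the pointer before freeing" pattern can be shown with a userland
analog. This is a sketch and not m_freemp() itself: `struct buf_model` and
`model_freep()` are hypothetical names, and plain free(3) stands in for the
mbuf allocator.

```c
#include <stdlib.h>

/* Hypothetical chained buffer, loosely modeled on an mbuf chain. */
struct buf_model {
	struct buf_model *next;
};

/*
 * Free the chain that *bp points to. The caller's pointer is cleared
 * first, so a later use or a second free through the same variable
 * sees NULL rather than freed memory.
 */
static void
model_freep(struct buf_model **bp)
{
	struct buf_model *b = *bp;

	*bp = NULL;
	while (b != NULL) {
		struct buf_model *n = b->next;

		free(b);
		b = n;
	}
}
```

Calling the helper twice on the same variable is harmless in this model,
which is the class of double-free bug the entry aims to reduce.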


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
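
The lookup-then-check pattern described in this entry can be sketched in
userland C. This is only an illustration under stated assumptions: the
toy_* types, the table, and toy_if_get() are hypothetical stand-ins and
not the kernel's if_get() or mbuf structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF 4

/* Hypothetical interface record; if_alive is 0 once it is destroyed. */
struct toy_ifnet {
	unsigned int if_index;
	int          if_alive;
};

/* Hypothetical packet header that stores an index, not a pointer. */
struct toy_mbuf {
	unsigned int ph_ifidx;
};

static struct toy_ifnet toy_iftable[TOY_MAXIF] = {
	{ 0, 0 }, { 1, 1 }, { 2, 0 }, { 3, 1 },
};

/* Toy lookup: returns NULL for unknown or destroyed interfaces. */
static struct toy_ifnet *
toy_if_get(unsigned int idx)
{
	if (idx >= TOY_MAXIF || !toy_iftable[idx].if_alive)
		return NULL;
	return &toy_iftable[idx];
}

/* Returns 1 if the packet can be processed, 0 if it must be dropped. */
static int
toy_input(const struct toy_mbuf *m)
{
	struct toy_ifnet *ifp;

	assert(m != NULL);
	ifp = toy_if_get(m->ph_ifidx);
	if (ifp == NULL)
		return 0;	/* interface is gone: caller frees the mbuf */
	return 1;
}
```

Because the packet only carries an index, a destroyed interface shows up
as a failed lookup rather than as a dangling pointer.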


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
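
The relationship between the two ID spaces can be modelled in a few lines
of userland C. This is a sketch under stated assumptions: the
toy_rtable_l2() function and its table are hypothetical stand-ins, not
the kernel's rtable_l2(9) implementation.

```c
#include <assert.h>

#define TOY_NTABLES 4

/*
 * Hypothetical mapping from routing table ID to the routing domain it
 * belongs to.  Tables 0 and 1 are themselves routing domains; tables 2
 * and 3 are extra tables inside domains 0 and 1.
 */
static const unsigned int toy_table_to_domain[TOY_NTABLES] = { 0, 1, 0, 1 };

/* Toy counterpart of rtable_l2(9): routing table ID -> routing domain ID. */
static unsigned int
toy_rtable_l2(unsigned int rtableid)
{
	assert(rtableid < TOY_NTABLES);
	return toy_table_to_domain[rtableid];
}
```

In this model a routing domain ID can be used directly as a table ID,
while going the other way requires the lookup.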


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
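
The "uncompress only once" rule can be illustrated with a small userland
model. This is a sketch under stated assumptions: the flag name, the
toy_pkt structure and toy_ipcomp_input() are hypothetical and do not
reproduce the kernel's actual bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_F_DECOMPRESSED 0x01	/* hypothetical per-packet flag */

struct toy_pkt {
	unsigned int flags;
	int          layers;	/* nested compression layers remaining */
};

/* Returns 0 on success, -1 if the packet must be dropped. */
static int
toy_ipcomp_input(struct toy_pkt *p)
{
	assert(p != NULL);
	if (p->flags & TOY_F_DECOMPRESSED)
		return -1;	/* already uncompressed once: refuse */
	p->flags |= TOY_F_DECOMPRESSED;
	p->layers--;
	return 0;
}
```

A packet that decompresses into another compressed packet is rejected on
the second pass, so the input path cannot loop.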


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
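
The overflow this entry guards against can be sketched in userland C.
This is an illustration under stated assumptions: toy_secs_to_ticks() is
a hypothetical helper showing the general idea of clamping, not the
kernel's hzto() code.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks without overflowing int.
 * A plain (hz * seconds) overflows once seconds > INT_MAX / hz; this
 * helper saturates at INT_MAX instead.
 */
static int
toy_secs_to_ticks(int hz, long long seconds)
{
	assert(hz > 0);
	if (seconds <= 0)
		return 0;
	if (seconds > INT_MAX / hz)
		return INT_MAX;
	return (int)(seconds * hz);
}
```

With hz set to 100, any expiration longer than roughly 248 days would
overflow a 32-bit tick count, which is why the multiplication is checked
first.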


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
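
The index-instead-of-pointer scheme described above can be sketched as a small
userland model. All names here (`struct iface`, `iface_get()`, `iface_attach()`,
`iface_detach()`, `MAX_IFS`) are hypothetical; the real if_get() interface and
its reference counting are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS	8	/* hypothetical table size */

struct iface {
	unsigned int	 index;
	const char	*name;
};

/* Maps an interface index to the live interface, or NULL once removed. */
static struct iface *if_table[MAX_IFS];

static void
iface_attach(struct iface *ifp)
{
	assert(ifp != NULL && ifp->index < MAX_IFS);
	if_table[ifp->index] = ifp;
}

static void
iface_detach(unsigned int idx)
{
	if (idx < MAX_IFS)
		if_table[idx] = NULL;
}

/* A stale index yields NULL instead of a dangling pointer. */
static struct iface *
iface_get(unsigned int idx)
{
	if (idx >= MAX_IFS)
		return NULL;
	return if_table[idx];
}
```

A packet that carries only the index cannot outlive the interface in a
dangerous way: the consumer looks the index up, and drops the packet when the
lookup returns NULL.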


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues I'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcmp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
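
The overflow this revision guards against can be illustrated with a saturating
conversion from seconds to ticks. This is a sketch of the problem only, not the
kernel's hzto(); `secs_to_ticks_sat()` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a relative timeout in seconds to clock ticks, clamping to
 * INT_MAX instead of letting (hz * secs) overflow an int.
 */
static int
secs_to_ticks_sat(int hz, long long secs)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;
	return (int)(secs * hz);
}
```

With a naive `hz * timeout`, a very long SA lifetime would wrap around and
could schedule the expiration far too early.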


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, dont unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.167 14-Sep-2018 mestre

Initialize the TDB to NULL in ipsec_common_input() and
ipsec_{input,output}_cb() so that in the case of sending or receiving a bogus
mbuf (NULL) we don't end up trying to dereference the TDB, while being an
uninitialized pointer, to increase the drops.

Coverity IDs 1473312, 1473313 and 1473317.

OK mpi@ visa@


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packet.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@
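
The idea behind m_freemp() can be sketched in userland C: take a pointer to the
caller's pointer, clear it, then free the object. `struct buf` and
`buf_freep()` are hypothetical names, and free(3) stands in for the mbuf
release routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct buf {
	size_t	len;
};

/*
 * Free *bp and reset the caller's pointer first, so that a later
 * accidental use sees NULL rather than freed memory.
 */
static void
buf_freep(struct buf **bp)
{
	struct buf *b;

	assert(bp != NULL);
	b = *bp;
	*bp = NULL;
	free(b);
}
```

Because the caller's pointer is NULL afterwards, calling the helper a second
time degenerates to free(NULL), which is harmless, instead of a double free.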


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.166 28-Aug-2018 mpi

Add per-TDB counters and a new SADB extension to export them to
userland.

Inputs from markus@, ok sthen@


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit in using the macro in the
kernel. It was only used in IPsec sources. No binary change.
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows simplifying code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation makes sense neither for the IP input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have fewer different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
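
The index-instead-of-pointer idea in 1.133 can be sketched in userland C.
This is only an illustration under stated assumptions: struct iface,
if_table, iface_attach(), iface_detach() and iface_lookup() are
hypothetical names, not the kernel's struct ifnet or if_get(), and no
locking or reference counting is modelled.

```c
#include <stddef.h>

#define IF_TABLE_SIZE 8		/* hypothetical table size */

/* Hypothetical stand-in for the kernel's interface structure. */
struct iface {
	unsigned int if_index;
};

/* Index 0 is reserved to mean "no interface". */
static struct iface *if_table[IF_TABLE_SIZE];

/* Publish an interface under its index; returns 0 on success. */
int
iface_attach(struct iface *ifp)
{
	if (ifp->if_index == 0 || ifp->if_index >= IF_TABLE_SIZE)
		return -1;
	if_table[ifp->if_index] = ifp;
	return 0;
}

/* Remove an interface; a stored index now resolves to NULL. */
void
iface_detach(struct iface *ifp)
{
	if (ifp->if_index < IF_TABLE_SIZE && if_table[ifp->if_index] == ifp)
		if_table[ifp->if_index] = NULL;
}

/* Resolve an index kept in a packet header; NULL means it is gone. */
struct iface *
iface_lookup(unsigned int idx)
{
	if (idx >= IF_TABLE_SIZE)
		return NULL;
	return if_table[idx];
}
```

A packet that stores only the index cannot hold a dangling pointer: once
the interface is detached, the lookup returns NULL and the caller can
free the packet.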


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@
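
The "uncompress only once" rule from 1.102 can be sketched as a per-packet
marker. This is a minimal userland illustration, not the kernel's code:
the log does not say how the kernel records the state, so struct pkt,
PKT_COMP_DONE and ipcomp_may_uncompress() are assumptions made up for
this example.

```c
#include <stdbool.h>

#define PKT_COMP_DONE	0x01	/* hypothetical "already uncompressed" flag */

/* Hypothetical stand-in for a packet header. */
struct pkt {
	unsigned int flags;
};

/*
 * Return true the first time a packet asks to be uncompressed and mark
 * it.  Any later request for the same packet returns false, so a nested
 * or self-reproducing IPcomp payload is dropped instead of looping.
 */
bool
ipcomp_may_uncompress(struct pkt *p)
{
	if (p->flags & PKT_COMP_DONE)
		return false;
	p->flags |= PKT_COMP_DONE;
	return true;
}
```

The caller would drop the packet whenever the function returns false.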


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows running isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pick up the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow specifying an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This allows
grouping individual ipsec policies into virtual interfaces and greatly
simplifies monitoring and pf filtering with many ipsec policies.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
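
The overflow that 1.54 guards against is easy to reproduce: with a
32-bit int, hz * timeout wraps once the timeout exceeds INT_MAX / hz
seconds. The following is a hedged userland sketch of a clamped
conversion, not the implementation of hzto(); secs_to_ticks_clamped() is
a hypothetical helper written for this example.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Convert a relative timeout in seconds to clock ticks.  Instead of
 * letting (hz * secs) overflow, clamp the result to INT_MAX ticks.
 */
int
secs_to_ticks_clamped(uint64_t secs, int hz)
{
	if (hz <= 0)
		return 0;
	/* Check before multiplying so the product always fits in an int. */
	if (secs > (uint64_t)(INT_MAX / hz))
		return INT_MAX;
	return (int)(secs * (uint64_t)hz);
}
```

With hz = 100, any timeout longer than 21474836 seconds (about 248 days)
is clamped rather than wrapped to a negative or tiny tick count.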


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.165 11-Jul-2018 mpi

Convert AH & IPcomp to ipsec_input_cb() and count drops on input.

ok markus@


# 1.164 10-Jul-2018 mpi

Introduce new IPsec (per-CPU) statistics and refactor ESP input
callbacks to be able to count dropped packets.

Having more generic statistics will help troubleshooting problems
with specific tunnels. Per-TDB counters are coming once all the
refactoring bits are in.

ok markus@


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows processing an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have fewer dangling pointers in the kernel.
OK krw@ mpi@ claudio@
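
The pattern behind m_freemp() in 1.155 can be sketched in userland C:
take the address of the caller's pointer, clear it, then free the
buffer. This is an illustration only, assuming a plain malloc'd buffer;
struct buf and buf_freep() are hypothetical names and not the kernel's
mbuf API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct mbuf. */
struct buf {
	int len;
};

/*
 * Free the buffer that *bp points to and reset the caller's pointer,
 * so a later use or a second free sees NULL instead of stale memory.
 */
void
buf_freep(struct buf **bp)
{
	struct buf *b = *bp;

	*bp = NULL;
	free(b);	/* free(NULL) is a no-op, so a repeat call is harmless */
}
```

Because the pointer is reset in the callee, an accidental second
buf_freep() on the same variable becomes harmless instead of a double
free.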


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows simplifying code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@
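
The two replacements for bcopy mentioned in 1.146 can be sketched as follows.
This is a userland illustration under assumptions: union addr_u is a
hypothetical stand-in for sockaddr_union, and both helper names are invented
for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for sockaddr_union; not the kernel type. */
union addr_u {
	unsigned char	bytes[16];
	unsigned int	words[4];
};

/* Both operands are properly aligned objects of the same type: assign. */
void
addr_assign(union addr_u *dst, const union addr_u *src)
{
	*dst = *src;
}

/*
 * Raw byte ranges that are known not to overlap: memcpy.
 * Note the argument order: memcpy takes the destination first,
 * bcopy takes the source first.
 */
void
addr_copy_bytes(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

Plain assignment lets the compiler check the types, which a byte copy cannot
do; memcpy is kept for the cases where only a byte range is available.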


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation makes sense neither for the ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
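
The index-instead-of-pointer scheme from 1.133 can be sketched like this.
It is a simplified userland model, not the kernel implementation: struct
iface, struct pkt, the fixed-size table and all function names are
hypothetical, and the sketch leaves out any locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF	8	/* hypothetical table size */

/* Hypothetical stand-ins; not the kernel's struct ifnet or struct mbuf. */
struct iface {
	unsigned int	 idx;
	const char	*name;
};

struct pkt {
	unsigned int	 rcvif_idx;	/* index, not a pointer */
};

static struct iface *if_table[MAXIF];

void
iface_attach(struct iface *ifp)
{
	assert(ifp != NULL && ifp->idx < MAXIF);
	if_table[ifp->idx] = ifp;
}

void
iface_detach(unsigned int idx)
{
	if (idx < MAXIF)
		if_table[idx] = NULL;
}

/* Resolve an index; NULL means the interface no longer exists. */
struct iface *
iface_get(unsigned int idx)
{
	return (idx < MAXIF) ? if_table[idx] : NULL;
}

/* Return 1 if the packet can be processed, 0 if it must be freed. */
int
pkt_input(const struct pkt *p)
{
	struct iface *ifp = iface_get(p->rcvif_idx);

	if (ifp == NULL)
		return 0;	/* stale index: drop instead of dereferencing */
	return 1;
}
```

A packet queued before its interface was removed then resolves to NULL on
lookup and can be dropped, rather than carrying a dangling pointer.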


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places.

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions that convert IPv4/6 addresses from
network to presentation format with inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraad, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
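
The overflow that 1.54 guards against can be illustrated with a small
clamping conversion. This is a sketch of the concern only, not hzto() itself:
secs_to_ticks() and its parameters are hypothetical names chosen for the
example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a timeout in seconds to clock ticks, clamping instead of
 * overflowing when hz * secs does not fit in an int.
 */
int
secs_to_ticks(long long secs, int hz)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;		/* would overflow: clamp */
	return (int)(secs * hz);
}
```

The division-based check runs before the multiplication, so the product is
only computed when it is known to fit.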


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, updating statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.163 14-May-2018 bluhm

When checking the IPsec enable sysctls, ipsec_common_input() had
switches for protocol and address family. Move this code to the
specific functions from where the common function is called.
As a consequence the raw ip input functions can never be called
from udp_input() anymore. If IPsec is disabled, the functions
ah6_input(), esp6_input(), and ipcomp6_input() do not start processing
the header chain. The raw ip input functions are called with the
mbuf and offset pointers from the protocol walking loop which is
the usual behavior.
OK mpi@ markus@


# 1.162 12-May-2018 bluhm

Cleanup IPsec common input error handling with consistent goto drop.
from markus@; OK mpi@


Revision tags: OPENBSD_6_3_BASE
# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both address families. This
allows processing an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel lock. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have fewer dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() in ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only need a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs now get
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packets contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
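
The overflow in a plain (hz * timeout) product can be illustrated with
a small userland sketch. It does not reproduce hzto(9); the helper
name secs_to_ticks() and the choice to clamp at INT_MAX are
assumptions made only for this example.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: convert a relative timeout in seconds to
 * clock ticks without letting the product overflow an int.
 */
static int
secs_to_ticks(int hz, long long secs)
{
	assert(hz > 0);
	if (secs <= 0)
		return 0;
	if (secs > INT_MAX / hz)
		return INT_MAX;	/* clamp instead of wrapping around */
	return (int)(secs * hz);
}
```

With hz = 100, any expiration longer than roughly 248 days would wrap
a 32-bit product, which is why very long SA lifetimes need the
bounds check before multiplying.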


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c


# 1.161 20-Nov-2017 mpi

Sprinkle some NET_ASSERT_LOCKED(), const and co to prepare running
pr_input handlers without KERNEL_LOCK().

ok visa@


# 1.160 14-Nov-2017 mpi

Introduce ipsec_sysctl() and move IPsec tunables where they belong.

ok bluhm@, visa@


# 1.159 08-Nov-2017 visa

Make {ah,esp,ipcomp}stat use percpu counters.

OK bluhm@, mpi@


# 1.158 06-Nov-2017 mpi

Use %s and __func__ in DPRINTF() to reduce false positive with grep(1).

ok kettenis@, dhill@, visa@, jca@


# 1.157 09-Oct-2017 mpi

Reduces the scope of the NET_LOCK() in sysctl(2) path.

Exposes per-CPU counters to real parallelism.

ok visa@, bluhm@, jca@


Revision tags: OPENBSD_6_2_BASE
# 1.156 05-Jul-2017 bluhm

The IP in IP input function strips the outer header and reinserts
the inner IP packet into the internet queue. The IPv6 local delivery
code has a loop to deal with header chains. The idea is to use
this loop and avoid the queueing and rescheduling. The IPsec packet
will be processed in a single flow.
Merge the IP deliver loop from both IP versions into a single
ip_deliver() function that can handle both addresss families. This
allows to process an IP in IP header like a normal extension header.
If af != AF_UNSPEC, we are already in a deliver loop and have the
kernel look. Then we can just return the next protocol. Otherwise
we enqueue. The dequeue thread has the kernel lock and starts an
IP delivery loop.
OK mpi@


# 1.155 19-Jun-2017 bluhm

When dealing with mbuf pointers passed down as function parameters,
bugs could easily result in use-after-free or double free. Introduce
m_freemp() which automatically resets the pointer before freeing
it. So we have less dangling pointers in the kernel.
OK krw@ mpi@ claudio@


# 1.154 28-May-2017 bluhm

Rename ip_local() to ip_deliver() and give it the same parameters
as the pr_input functions. Add an assert that IPv4 delivery ends
in IP proto done to assure that IPv4 protocol functions work like
IPv6.
OK mpi@


# 1.153 22-May-2017 bluhm

Move IPsec forward and local policy check functions to ipsec_input.c
and give them better names.
input and OK mikeb@


# 1.152 16-May-2017 mpi

Replace remaining splsoftassert(IPL_SOFTNET) by NET_ASSERT_LOCKED().

ok visa@


# 1.151 12-May-2017 bluhm

IPsec packets were passed through ip_input() a second time after
they have been decrypted. That means that all the IP header fields
were checked twice. Also fragment reassembly was tried twice.
At pf incoming packets in tunnel mode appeared twice on the enc0
interface, once as IP-in-IP and once as the inner packet. In the
outgoing path pf only sees the inner packet. Asymmetry is bad for
stateful filtering.
IPv6 shows that IPsec works without that. After decrypting immediately
continue with local delivery. In tunnel mode the IP-in-IP protocol
functions pass the inner header to ip6_input(). In transport mode
only pf_test() has to be called for the enc0 device.
Introduce ip_local() to avoid needless processing and cleaner pf
behavior in IPv4 IPsec.
OK mikeb@


# 1.150 12-May-2017 bluhm

Instead of printing a debug message at the end of processing, panic
early if the IPsec security protocol is unknown. ipsec_common_input()
and ipsec_common_input_cb() can only be called with the IP protocols
ESP, AH, or IPComp. Everything else is a programming mistake.
OK claudio@


# 1.149 11-May-2017 bluhm

IPv6 IPsec transport mode did not work if pf is enabled. The
decrypted packets in the input path were not checked with pf. So
with stateful filtering on enc0, direction aware protocols like
ping or TCP did not pass. Add an explicit pf_test() in
ipsec_common_input_cb() for IPv6 transport mode to fix this.
OK mikeb@


# 1.148 05-May-2017 bluhm

Expand SA_LEN(), there is no benefit for using the macro in the
kernel. It was only used in IPsec sources. No binary change
OK deraadt@


# 1.147 14-Apr-2017 bluhm

Pass down the address family through the pr_input calls. This
allows to simplify code used for both IPv4 and IPv6.
OK mikeb@ deraadt@


# 1.146 06-Apr-2017 dhill

Replace bcopy with a simple assignment where both variables are
properly aligned and sockaddr_union fields, or with memcpy when
the memory doesn't overlap.

OK bluhm@


Revision tags: OPENBSD_6_1_BASE
# 1.145 28-Feb-2017 mpi

Some refactoring in ip6_input() needed to un-KERNEL_LOCK() the IPv6
forwarding path.

Rename ip6_ours() to ip6_local() as this function dispatches packets
to the upper layer.

Introduce ip6_ours() and get rid of 'goto hbhcheck'. This function
will be later used to enqueue local packets.

As a bonus this reduces differences with IPv4.

Inputs and ok bluhm@


# 1.144 08-Feb-2017 bluhm

Remove the ipsec protocol callbacks which all do the same. Implement
it in ipsec_common_input_cb() instead. The code that was copied
to ah6_input_cb() is now in ip6_ours() so we can call it directly.
OK mpi@


# 1.143 07-Feb-2017 bluhm

Error propagation does neither make sense for ip input path nor for
asynchronous callbacks. Make the IPsec functions void, there is
already a counter in the error path.
OK mpi@


# 1.142 05-Feb-2017 jca

Use percpu counters for ip6stat

Try to follow the existing examples. Some notes:
- don't implement counters_dec() yet, which could be used in two
similar chunks of code. Let's see if there are more users first.
- stop incrementing IPv6-specific mbuf stats, IPv4 has no equivalent.

Input from mpi@, ok bluhm@ mpi@


# 1.141 29-Jan-2017 bluhm

Change the IPv4 pr_input function to the way IPv6 is implemented,
to get rid of struct ip6protosw and some wrapper functions. It is
more consistent to have less different structures. The divert_input
functions cannot be called anyway, so remove them.
OK visa@ mpi@


# 1.140 26-Jan-2017 bluhm

Reduce the difference between struct protosw and ip6protosw. The
IPv4 pr_ctlinput functions did return a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@


# 1.139 25-Jan-2017 bluhm

Since raw_input() and route_input() are gone from pr_input, we can
make the variable parameters of the protocol input functions fixed.
Also add the proto to make it similar to IPv6.
OK mpi@ guenther@ millert@


# 1.138 23-Jan-2017 mpi

Assert for IPL_SOFTNET rather than raising SPL recursively.

ok benno@


# 1.137 20-Jan-2017 mpi

Kill recursive splsoftnet()/splx() dances.

Tested by Hrvoje Popovski, ok visa@


# 1.136 02-Sep-2016 vgross

Drop non-encapsulated ESP packets using a UDP-encapsulating TDB, and add
the relevant counters.

Ok mikeb@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.135 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.134 09-Sep-2015 mpi

Kill a couple of if_get()s only needed to increment per-ifp IPv6 stats.

We do not export those per-ifp statistics and they will soon all die.

"We're putting inet6 on a diet" claudio@

ok dlg@, mikeb@, claudio@


Revision tags: OPENBSD_5_8_BASE
# 1.133 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
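
A minimal userland sketch of the lookup-by-index idea, assuming a small
fixed table. All names ending in _sketch, and MAXIF, are hypothetical.
The sketch ignores locking and reference counting, which a real
implementation would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF 8	/* hypothetical table size */

struct ifnet_sketch {
	unsigned int if_index;
	const char *if_xname;
};

/* The packet records an index, not a pointer to the interface. */
struct pkt_sketch {
	unsigned int ph_ifidx;
};

/* Index -> interface table; a NULL slot means the interface is gone. */
static struct ifnet_sketch *if_table[MAXIF];

void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp != NULL && ifp->if_index < MAXIF);
	if_table[ifp->if_index] = ifp;
}

void
if_detach_sketch(unsigned int idx)
{
	if (idx < MAXIF)
		if_table[idx] = NULL;
}

/* Returns NULL when the index no longer names a live interface. */
struct ifnet_sketch *
if_get_sketch(unsigned int idx)
{
	if (idx >= MAXIF)
		return NULL;
	return if_table[idx];
}

/* Returns 1 if the packet may be processed, 0 if it must be dropped. */
int
pkt_input_sketch(const struct pkt_sketch *p)
{
	struct ifnet_sketch *ifp = if_get_sketch(p->ph_ifidx);

	if (ifp == NULL)
		return 0;	/* interface vanished: caller frees the packet */
	return 1;
}
```

A stale index fails the lookup and the packet is dropped, whereas a
stale pointer would have been dereferenced.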


# 1.132 11-Jun-2015 mikeb

Move away from using hzto(9); OK dlg


# 1.131 13-May-2015 jsg

test mbuf pointers against NULL not 0
ok krw@ miod@


# 1.130 17-Apr-2015 mikeb

Stubs and support code for NIC-enabled IPsec bite the dust.
No objection from reyk@, OK markus, hshoexer


# 1.129 14-Apr-2015 mikeb

make ipsp_address thread safe; ok mpi


# 1.128 10-Apr-2015 dlg

replace the use of ifqueues for most input queues serviced by netisr
with niqueues.

this change is so big because there's a lot of code that takes
pointers to different input queues (eg, ether_input picks between
ipv4, ipv6, pppoe, arp, and mpls input queues) and falls through
to code to enqueue packets against the pointer. if i changed only
one of the input queues i'd have to add separate code paths, one
for ifqueues and one for niqueues in each of these places

by flipping all these input queues at once i can keep the currently
common code common.

testing by mpi@ sthen@ and rafael zalamena
ok mpi@ sthen@ claudio@ henning@


# 1.127 26-Mar-2015 mikeb

Remove bits of unfinished IPsec proxy support. DNS' KX records, anyone?
ok markus, hshoexer


Revision tags: OPENBSD_5_7_BASE
# 1.126 24-Jan-2015 deraadt

Userland (base & ports) was adapted to always include <netinet/in.h>
before <net/pfvar.h> or <net/if_pflog.h>. The kernel files can be
cleaned up next. Some sockaddr_union steps make it into here as well.
ok naddy


# 1.125 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.124 05-Dec-2014 mpi

Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.123 20-Nov-2014 krw

Yet more #include de-duplication.

ok deraadt@ tedu@


Revision tags: OPENBSD_5_6_BASE
# 1.122 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.121 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.120 14-Apr-2014 mpi

"struct pkthdr" holds a routing table ID, not a routing domain one.
Avoid the confusion by using an appropriate name for the variable.

Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:

rtableid = rdomain

But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).

claudio@ likes it, ok mikeb@
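
A minimal sketch of the asymmetry described above, assuming a
hypothetical static mapping from routing table ID to its owning routing
domain. rtable_l2_sketch() only imitates the role of rtable_l2(9). The
table contents and NTABLES are invented for illustration.

```c
#include <assert.h>

#define NTABLES 8	/* hypothetical number of routing tables */

/*
 * Hypothetical layout: tables 2 and 3 belong to routing domain 2,
 * every other table belongs to routing domain 0.  Every domain ID
 * is itself a valid table ID that maps to itself.
 */
static const unsigned int table_to_domain[NTABLES] = {
	0, 0, 2, 2, 0, 0, 0, 0
};

/* Table ID -> routing domain ID; the direction that needs a lookup. */
unsigned int
rtable_l2_sketch(unsigned int rtableid)
{
	assert(rtableid < NTABLES);
	return table_to_domain[rtableid];
}

/* Routing domain ID -> table ID; a plain assignment is enough. */
unsigned int
rdomain_to_rtableid_sketch(unsigned int rdomain)
{
	assert(rdomain < NTABLES);
	assert(table_to_domain[rdomain] == rdomain);
	return rdomain;
}
```

Under this layout table 3 resolves to domain 2, while domain 2 used as a
table ID is simply table 2.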


Revision tags: OPENBSD_5_5_BASE
# 1.119 09-Jan-2014 tedu

bzero/bcmp -> memset/memcmp. ok matthew


# 1.118 11-Nov-2013 mpi

Replace most of our formatting functions to convert IPv4/6 addresses from
network to presentation format to inet_ntop().

The few remaining functions will be soon converted.

ok mikeb@, deraadt@ and moral support from henning@


# 1.117 23-Oct-2013 mpi

Reduce the number of in_var.h inclusions by moving some functions and
global variables to in.h.

ok mikeb@, deraadt@


# 1.116 17-Oct-2013 bluhm

The header file netinet/in_var.h included netinet6/in6_var.h. This
created a bunch of useless dependencies. Remove this implicit
inclusion and do an explicit #include <netinet6/in6_var.h> when it
is needed.
OK mpi@ henning@


Revision tags: OPENBSD_5_4_BASE
# 1.115 01-Jun-2013 bluhm

Fix typo backswards -> backwards.


# 1.114 24-Apr-2013 mpi

Instead of having various extern declarations for protocol variables,
declare them once in their corresponding header file.


# 1.113 11-Apr-2013 mpi

Remove the extern keyword from function declarations, document
sysctl declarations, move variables and functions used in only
one place in their corresponding file. No functional change.

No objection from markus@, ok mikeb@


# 1.112 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.111 31-Mar-2013 bluhm

Do not transfer diverted packets into IPsec processing. They should
reach the socket that the user has specified in pf.conf.
OK reyk@


# 1.110 28-Mar-2013 tedu

code that calls timeout functions should include timeout.h
slipped by on i386, but the zaurus doesn't automagically pick it up.
spotted by patrick


# 1.109 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.108 26-Sep-2012 markus

add M_ZEROIZE as an mbuf flag, so copied PFKEY messages (with embedded keys)
are cleared as well; from hshoexer@, feedback and ok bluhm@, ok claudio@


# 1.107 20-Sep-2012 blambert

spltdb() was really just #define'd to be splsoftnet(); replace the former
with the latter

no change in md5 checksum of generated files

ok claudio@ henning@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.106 22-Dec-2011 sperreault

Fix RFC reference section

spotted by bluhm@, ok yasuoka@


# 1.105 21-Dec-2011 sperreault

Compute mandatory UDP checksum for IPv6 packets

ok yasuoka@ bluhm@


# 1.104 19-Dec-2011 yasuoka

Fix checksum of UDP/TCP packets following RFC 3948. This is required for
transport mode IPsec NAT-T.

ok markus


Revision tags: OPENBSD_5_0_BASE
# 1.103 26-Apr-2011 bluhm

In ipsec_common_input() the packet can be either IPv4 or IPv6. So
pass it to the correct raw ip input function if IPsec is disabled.
ok todd@ mpf@ mikeb@ blambert@ matthew@ deraadt@


# 1.102 06-Apr-2011 markus

uncompress a packet with an IPcomp header only once; this prevents
endless loops by IPcomp-quine attacks as discovered by Tavis Ormandy;
it also prevents nested IPcomp-IPIP-IPcomp attacks provided by matthew@;
feedback and ok matthew@, deraadt@, djm@, claudio@


# 1.101 03-Apr-2011 henning

don't rely on implicit net/route.h inclusion via pf, claudio ok


# 1.100 05-Mar-2011 bluhm

The function pf_tag_packet() never fails. Remove a redundant check
and make it void.
ok henning@, markus@, mcbride@


Revision tags: OPENBSD_4_9_BASE
# 1.99 21-Dec-2010 markus

don't leak short packets; ok mikeb@


Revision tags: OPENBSD_4_8_BASE
# 1.98 09-Jul-2010 reyk

Add support for using IPsec in multiple rdomains.

This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
interface is required per rdomain in addition to enc0, e.g. enc1 rdomain 1.

Tested by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.

ok claudio@ naddy@


# 1.97 01-Jul-2010 reyk

Allow to specify an alternative enc(4) interface for an SA. All
traffic for this SA will appear on the specified enc interface instead
of enc0 and can be filtered and monitored separately. This will allow
to group individual ipsec policies to virtual interfaces and
simplifies monitoring and pf filtering with many ipsec policies a lot.

This diff includes the following changes:
- Store the enc interface unit (default 0) in the TDB of an SA and pass
it to the enc_getif() lookup when running the bpf or pf_test() handlers.
- Add the pfkey SADB_X_EXT_TAP extension to communicate the encX
interface unit for a specified SA between userland and kernel.
- Update enc(4) again to use an allocated array instead of the TAILQ to
lookup the matching enc interface in enc_getif() quickly.

Discussed with many, tested by a few, will need more testing & review.

ok deraadt@


# 1.96 29-Jun-2010 reyk

Replace enc(4) with a new implementation as a cloner device. We still
create enc0 by default, but it is possible to add additional enc
interfaces. This will be used later to allow alternative encs per
policy or to have an enc per rdomain when IPsec becomes rdomain-aware.

manpage bits ok jmc@
input from henning@ deraadt@ toby@ naddy@
ok henning@ claudio@


# 1.95 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


Revision tags: OPENBSD_4_7_BASE
# 1.94 02-Jan-2010 markus

uninitialized protocol version for ipv6; from mickey; ok claudio


# 1.93 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


# 1.92 09-Aug-2009 henning

once again ipsec tries to be clever and plays fast, this time by
recycling an mbuf tag and changing its type. just always get a new one.
theo ok


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.91 22-Oct-2008 mpf

#if INET => #ifdef INET
#if INET6 => #ifdef INET6


# 1.90 22-Oct-2008 markus

filter ipv6 ipsec packets on enc0 (in and out), similar to ipv4;
ok bluhm, fries, mpf; fixes pr 4188


# 1.89 26-Aug-2008 henning

call pf_pkt_addr_changed instead of manually clearing the pf state key ptr


Revision tags: OPENBSD_4_4_BASE
# 1.88 24-Jul-2008 henning

ipsec is glued into the stack in a very weird way, violating all kinds
of expected semantics. thus, for return packets coming out of an ipsec
tunnel, we need to clear the pf state key pointer in the mbuf header
to prevent a state for encapsulated traffic to be linked to the
decapsulated traffic one.
problem noticed by Oleg Safiullin <form@pdp-11.org.ru>, took me some
time to understand what the hell was going on. ok ryan


# 1.87 14-Jun-2008 todd

make easier to read, found during a bug hunt earlier
ok bluhm@


# 1.86 11-Jun-2008 canacar

fix an old typo that prevented outer ipv6 headers from being corrected,
also fix the correction amount. This was only really visible on tcpdump,
as a "truncated-ip6 - 48 bytes missing" warning. The inner packet made
it into the stack just fine, minus a few sanity checks.
reported by and debugged together with and ok todd@


Revision tags: OPENBSD_4_3_BASE
# 1.85 14-Dec-2007 deraadt

add sysctl entry points into various network layers, in particular to
provide netstat(1) with data it needs; ok claudio reyk


Revision tags: OPENBSD_4_2_BASE
# 1.84 28-May-2007 henning

double pf performance.
boring details:
pf used to use an mbuf tag to keep track of route-to etc, altq, tags,
routing table IDs, packets redirected to localhost etc. so each and every
packet going through pf got an mbuf tag. mbuf tags use malloc'd memory,
and that is kinda slow.
instead, stuff the information into the mbuf header directly.
bridging soekris with just "pass" as ruleset went from 29 MBit/s to
58 MBit/s with that (before ryan's randomness fix, now it is even betterer)
thanks to chris for the test setup!
ok ryan ryan ckuethe reyk


Revision tags: OPENBSD_4_1_BASE
# 1.83 08-Feb-2007 itojun

- AH: when computing crypto checksum for output, massage source-routing
header.
- ipsec_input: fix mistake in IPv6 next-header chasing.
- ipsec_output: look for the position to insert AH more carefully.
- ip6_output: enable use of AH with extension headers.
avoid tunnelling when source-routing header is present.

ok by deraadt, naddy, hshoexer


# 1.82 15-Dec-2006 otto

make enc(4) count; ok markus@ henning@ deraadt@


# 1.81 05-Dec-2006 markus

do not install pmtu routes for transport mode SAs, as they do not
the dest IP; PMTU debugging support; ok hshoexer


# 1.80 24-Nov-2006 reyk

add support to tag ipsec traffic belonging to specific IKE-initiated
phase 2 traffic. this allows policy-based filtering of encrypted and
unencrypted ipsec traffic with pf(4). see ipsec.conf(5) and
isakmpd.conf(5) for details and examples.

this is work in progress and still needs some testing and feedback,
but it is safe to put it in now.

ok hshoexer@


Revision tags: OPENBSD_4_0_BASE
# 1.79 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.78 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.77 13-Jan-2006 mpf

Path MTU discovery for NAT-T.
OK markus@, "looks good" hshoexer@


Revision tags: OPENBSD_3_8_BASE
# 1.76 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


Revision tags: OPENBSD_3_7_BASE
# 1.75 25-Nov-2004 markus

resolve conflict between M_TUNNEL and M_ANYCAST6, remove M_COMP (it's
only set and never read), update documentation; ok fgsch, deraadt, millert


Revision tags: OPENBSD_3_6_BASE
# 1.74 21-Jun-2004 tholo

First step towards more sane time handling in the kernel -- this changes
things such that code that only needs a second-resolution uptime or wall
time, and used to get that from time.tv_secs or mono_time.tv_secs, now gets
this from separate time_t globals time_second and time_uptime.

ok art@ niklas@ nordin@


# 1.73 21-Jun-2004 itojun

make it possible to use IPsec over link-local address (policy table uses
sin6_scope_id, IPsec portion uses embedded form). beck ok


Revision tags: SMP_SYNC_A SMP_SYNC_B
# 1.72 18-Apr-2004 markus

pass esp/ah/ipcomp to rawip if processing is disabled with sysctl;
allows userland ipsec; tested by sturm@; ok deraadt@, ho@, hshoexer@


Revision tags: OPENBSD_3_5_BASE
# 1.71 17-Feb-2004 markus

switch to sysctl_int_arr(); ok henning, deraadt


# 1.70 02-Dec-2003 markus

UDP encapsulation for ESP in transport mode (draft-ietf-ipsec-udp-encaps-XX.txt)
ok deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.69 28-Jul-2003 markus

allow gif(4) over ipsec: mark mbuf for transport mode SA,
so in_gif_input can detect whether a proto 4 header is due
to ipsec tunnel mode or gif(4) encapsulation; fixes pr 3023
ok itojun@. provos@ and angelos@ agree; tested by sturm@


# 1.68 24-Jul-2003 markus

update ip_len to reflect tunnel header removal (lost during ip_len
flip changes); ok itojun; noticed by jrrs@ice-nine.org


# 1.67 09-Jul-2003 itojun

do not flip ip_len/ip_off in netinet stack. deraadt ok.
(please test, especially PF portion)


# 1.66 08-Jul-2003 markus

make sure the packet contains a complete inner header
for ip{4,6}-in-ip{4,6} encapsulation; fixes panic
for truncated ip-in-ip over ipsec; ok angelos@


# 1.65 04-Jul-2003 markus

knf typo


Revision tags: UBC_SYNC_A
# 1.64 03-May-2003 itojun

just as a safety measure, set m_flags to 0 for mbufs allocated on stack.
dhartmei ok


Revision tags: OPENBSD_3_3_BASE
# 1.63 20-Feb-2003 deraadt

knf


# 1.62 20-Feb-2003 jason

If there's no tag to be reset, don't reset it (avoids a NULL deref in the IPCOMP case)


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.61 28-Jun-2002 angelos

Fix usage counter for IPCOMP --- sam@errno.com


# 1.60 25-Jun-2002 angelos

Forgot variable.


# 1.59 25-Jun-2002 angelos

Handle correctly return values from xf_input methods --- since the
return value was ignored anyway, this wasn't a problem so far. From
sam@errno.com


# 1.58 13-Jun-2002 angelos

Remove whitespace from the end of the file.


# 1.57 09-Jun-2002 itojun

whitespace


# 1.56 09-Jun-2002 angelos

Set/clear M_AUTH_AH.


Revision tags: OPENBSD_3_1_BASE
# 1.55 23-Jan-2002 provos

disable pmtu for ipsec when the sysctl says so; bug report cjkim2000@yahoo.com


Revision tags: UBC_BASE
# 1.54 06-Dec-2001 angelos

branches: 1.54.2;
Use hzto() to handle overflow of (hz * timeout) cases --- when using
extremely long SA expirations.
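
A minimal sketch of the overflow in question, not of hzto() itself:
with hz = 100 and an int tick count, any timeout above roughly 248 days
no longer fits. secs_to_ticks_sketch() is a hypothetical helper that
clamps instead of wrapping, assuming a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a timeout in seconds to clock ticks, returning INT_MAX
 * when hz * secs would not fit in an int.
 */
int
secs_to_ticks_sketch(int hz, uint64_t secs)
{
	assert(hz > 0);
	/* Compare before multiplying so the product cannot overflow. */
	if (secs > (uint64_t)INT_MAX / (uint64_t)hz)
		return INT_MAX;
	return (int)(secs * (uint64_t)hz);
}
```

The check divides first because performing the multiplication in int
and testing the result afterwards would already be too late.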


Revision tags: OPENBSD_3_0_BASE
# 1.53 09-Aug-2001 angelos

Don't check the source address on the packet vs. the one on the SA, as
this prevents use of ESP in mobility; pointed out on the IETF mailing
list by Francis Dupont.


# 1.52 08-Aug-2001 jjbg

Remove IPCOMP option, it's now part of IPSEC option. You still need to
enable ipcomp via sysctl to use it. deraadt@ ok.


# 1.51 07-Aug-2001 deraadt

enable ah & esp by default, now that we trust the code more


# 1.50 06-Jul-2001 jjbg

Don't use enc0 interface for IPComp. angelos@ ok.


# 1.49 05-Jul-2001 jjbg

IPComp support. angelos@ ok.


# 1.48 26-Jun-2001 angelos

KNF


# 1.47 25-Jun-2001 angelos

Copyright.


# 1.46 24-Jun-2001 provos

path mtu discovery for ipsec. on receiving a need fragment icmp match
against active tdb and store the ipsec header size corrected mtu


# 1.45 23-Jun-2001 fgsch

Remove unneeded ip_id conversions.
Instead of using HTONS macro in some places, use htons directly in the
struct member and save us a few bytes.
Fix comment.


# 1.44 19-Jun-2001 deraadt

mop up after angelos


# 1.43 08-Jun-2001 angelos

Trim include files.


# 1.42 05-Jun-2001 angelos

Add a few DPRINTF()'s


# 1.41 29-May-2001 angelos

Record last use time for SAs.


# 1.40 27-May-2001 angelos

If we are passed a packet tag, it's an IPSEC_IN_CRYPTO_DONE so convert
it to IPSEC_IN_DONE, rather than adding a new one.


# 1.39 27-May-2001 angelos

Forgot to convert this tag.


# 1.38 20-May-2001 angelos

Use packet tags to signal input IPsec processing to upper layer protocols.


# 1.37 11-May-2001 aaron

Check m_pullup() and m_pullup2() return for NULL, not 0; itojun@ ok


Revision tags: OPENBSD_2_9_BASE
# 1.36 06-Apr-2001 csapuntz

Move offsetof define into sys/param.h


# 1.35 30-Mar-2001 angelos

Protect the IF_XXX macros in the callback routines with splimp(). Doh!

Thanks to erik@ipunplugged.com


# 1.34 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.33 15-Mar-2001 mickey

convert SA expirations to the new timeouts.
simplifies expirations handling a lot.
tdb_exp_timeout and tdb_soft_timeout are made
consistent throughout the code to be relative time offsets,
just like first_use timeouts.
tested on singlehost isakmpd setup.
lots of dangling spaces and tabs removed.
angelos@ ok


Revision tags: OPENBSD_2_8_BASE
# 1.32 19-Sep-2000 angelos

Lots and lots of changes.


# 1.31 17-Sep-2000 angelos

Drop dubious ESP/AH packets without crashing (thanks to dr@kyx.net and
mfranz@cisco.com for finding the problem).


# 1.30 11-Jul-2000 millert

Correctly handle ip_off; angelos@


# 1.29 20-Jun-2000 itojun

do not play with rcvif, if the traffic is non-IPv4.
by setting rcvif to enc*, we break IPv6 scope considerations.


# 1.28 19-Jun-2000 itojun

correct header chasing code. take care of AH length.


# 1.27 18-Jun-2000 angelos

Arguments.


# 1.26 18-Jun-2000 angelos

Use ip6_sprintf() rather than the home-cooked inet6_ntoa4()


# 1.25 18-Jun-2000 itojun

IPv6 AH/ESP support, inbound side only. tested with KAME.


# 1.24 18-Jun-2000 angelos

Remove outdated comment.


Revision tags: OPENBSD_2_7_BASE
# 1.23 29-Mar-2000 angelos

branches: 1.23.2;
Be consistent about packet properties.


# 1.22 29-Mar-2000 angelos

Fix problem with TCP/UDP and ACLs.


# 1.21 29-Mar-2000 angelos

Minor cleanup.


# 1.20 17-Mar-2000 angelos

Cryptographic services framework, and software "device driver". The
idea is to support various cryptographic hardware accelerators (which
may be (detachable) cards, secondary/tertiary/etc processors,
software crypto, etc). Supports session migration between crypto
devices. What it doesn't (yet) support:
- multiple instances of the same algorithm used in the same session
- use of multiple crypto drivers in the same session
- asymmetric crypto

No support for a userland device yet.

IPsec code path modified to allow for asynchronous cryptography
(callbacks used in both input and output processing). Some unrelated
code simplification done in the process (especially for AH).

Development of this code kindly supported by Network Security
Technologies (NSTI). The code was written mostly in Greece, and is
being committed from Montreal.


Revision tags: SMP_BASE
# 1.19 07-Feb-2000 itojun

branches: 1.19.2;
fix include file path related to ip6.


# 1.18 27-Jan-2000 angelos

Merge "old" and "new" ESP and AH in two files (one for each).
Fix a couple of buglets with ingress flow deletion.
tcpdump on enc0 should now show all outgoing packets *before* being
processed, and all incoming packets *after* being processed.

Good to be in Canada (land of the free commits).


# 1.17 25-Jan-2000 espie

Ok, so setsoftnet is md.

Well, on the amiga, setsoftnet *REQUIRES* machine/cpu.h to work...
and no include mentioned in those files pulls machine/cpu.h...

Nit-fix: / * INET6 */ -> /* INET6 */


# 1.16 15-Jan-2000 angelos

Remove unnecessary definition.


# 1.15 15-Jan-2000 angelos

Add function prototype.


# 1.14 15-Jan-2000 angelos

Change function type to non-static.


# 1.13 10-Jan-2000 angelos

1) Setup a silent TDB expiration for embryonic SAs.
2) Fix check_ipsec_policy() to deal with v6 PCBs.
3) Fix ACL protocol check.


# 1.12 10-Jan-2000 angelos

Fix tdbi setup for TCP and UDP packets.


# 1.11 10-Jan-2000 angelos

Typo.


# 1.10 10-Jan-2000 angelos

Quick-drop packets (before real processing) if ingress filtering is on
and the SA ACL is empty.


# 1.9 10-Jan-2000 angelos

Fix error message.


# 1.8 09-Jan-2000 angelos

Add ingress ACL for IPsec: after being processed, IPsec packets are
matched against a list of acceptable packet classes, if
sysctl variable net.inet.ip.ipsec-acl is set to 1.


# 1.7 08-Jan-2000 angelos

Fix serious crash-and-burn bug I introduced with last revision.


# 1.6 03-Jan-2000 angelos

Chase down the IPv6 header chain to find the right place to swap the Next
Payload value. Note to self: it would be nice if we had a version of
m_copydata() with memory (so it wouldn't need to start the search from
the beginning of the mbuf).


# 1.5 02-Jan-2000 angelos

Move the requeueing logic from ipsec_input() to ah_input() and
esp_input(), since this is only needed for IPv4; IPv6 header
processing follows a different approach.


# 1.4 02-Jan-2000 angelos

Change ipsec_input() to return error.


# 1.3 31-Dec-1999 itojun

fix IPv6 ipsec template lossage.
- previous code grabbed new nexthdr mistakenly
- parameter passing must follow ip6protosw
(actually the code will never get called until in6_proto.c is updated)

the current code assumes that {AH,ESP} is right next to IPv6 header.
the assumption must be removed, but it means that we need to chase
header chain...


# 1.2 25-Dec-1999 angelos

Change some function prototypes, don't unnecessarily initialize some
variables.


# 1.1 09-Dec-1999 angelos

So I was lying...unify ESP and AH wrapper-input processing. The new
file contains a common routine for massaging the packet, doing
peripheral checks, update statistics, etc. common for both AH/ESP,
both IPv4/IPv6. Also wrapper routines for AH/ESP-v4/v6, and the sysctl
routines from ip_ah.c/ip_esp.c