History log of /freebsd-current/sys/net/if_tuntap.c
Revision Date Author Comments
# 2cb0fce2 22-Oct-2023 Seth Hoffert <seth.hoffert@gmail.com>

bpf: Make BPF interop consistent with if_loop

The pseudo_AF_HDRCMPLT check is already being done in if_loop and
just needed to be ported over to if_ic, if_wg, if_disc, if_gif,
if_gre, if_me, if_tuntap and ng_iface. This is needed in order to
allow these interfaces to work properly with e.g., tcpreplay.

PR: 256587
Reviewed by: markj
MFC after: 2 weeks
Pull Request: https://github.com/freebsd/freebsd-src/pull/876


# fa93ba40 29-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

if_tuntap: simplify storage of per-vnet cloners

There is no need for a separate structure neither for a linked list.
Provide each VNET with an array of pointers to if_clone that has the same
size as the driver list.

Reviewed by: zlei, kevans, kp
Differential Revision: https://reviews.freebsd.org/D44307


# 0365e5fc 11-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

if_tun: check device name

to avoid panic if the name already exists, which is possible with the
interface renaming.

PR: 266999
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43001


# 5b0010b4 04-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

if_tuntap: fix NOIP build

Note: this removes one TUNDEBUG() for the sake of not having one more
ifdefed variable declaration and for the overall code brevity. The call
from tuntap into LRO can be so easily traced with dtrace(1) that an
80-ish printf(9)-based debugging can be omitted.

Fixes: 99c79cab422705f92f05a2924a29bdf823372ebf


# 99c79cab 19-Nov-2023 Michael Tuexen <tuexen@FreeBSD.org>

if_tuntap: add LRO support to tap devices

This allows testing the LRO code with packetdrill in local mode.

Reviewed by: rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42548


# 44669b76 09-Nov-2023 Michael Tuexen <tuexen@FreeBSD.org>

if_tuntap: remove redundant check

eh can't be NULL, so there is no need to check for it.
Reported by: zlei
MFC after: 1 week
Sponsored by: Netflix, Inc.


# ff69d13a 09-Nov-2023 Michael Tuexen <tuexen@FreeBSD.org>

if_tuntap: support receive checksum offloading for tap interfaces

When enabled, pretend that the IPv4 and transport layer checksum
is correct for packets injected via the character device.
This is a prerequisite for adding support for LRO, which will
be added next. Then packetdrill can be used to test the LRO
code in local mode.

Reviewed by: rscheff, zlei
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42477


# 35af22ac 05-Nov-2023 Michael Tuexen <tuexen@FreeBSD.org>

if_tuntap: trigger the bpf hook on transmitting for the tap interface

The tun interface triggers the bpf hook when a packet is transmitted,
the tap interface triggers it when the packet is read from the
character device. This is inconsistent.
So fix the tap device such that it behaves like the tun device.
This is needed for adding support for the tap device to packetdrill.

Reviewed by: kevans, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42467


# 4ffe410e 04-Nov-2023 Michael Tuexen <tuexen@FreeBSD.org>

if_tuntap: improve code consistency

No functional change intended.

Reviewed by: rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D42462


# 27f1ec0b 21-Sep-2023 Konstantin Belousov <kib@FreeBSD.org>

tun/tap: correct ref count on cloned cdevs

Reported and tested by: eugen
PR: 273418
Discussed with: jah, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42008


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 0ec220df 31-May-2023 Alexandre Snarskii <snar@snar.spb.ru>

tap(4): allow full-duplex and non-zero speed

tap(4) devices advertise themselves as just 'ethernet autoselect',
without duplex or speed capabilities.
This advertisement makes them unable to be aggregated into lacp-based
lagg(4):
- lacp code requires underlying interfaces to be full-duplex, else
interface will not participate in lacp at all
- lacp code requires underlying interface to have non-zero speed, else
this interface can not be selected as active aggregator

PR: 217374
Reported-by: Alexandre Snarskii <snar@snar.spb.ru>
Co-authored-by: Mina Galić <freebsd@igalic.co>
Reviewed-by: imp,karles
Pull-request: https://github.com/freebsd/freebsd-src/pull/745


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 2c2b37ad 13-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/API: Move struct ifnet definition to a <net/if_private.h>

Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046


# 91ebcbe0 21-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

if_clone: migrate some consumers to the new KPI.

Convert most of the cloner customers who require custom params
to the new if_clone KPI.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36636
MFC after: 2 weeks


# 497240de 19-Aug-2022 Mateusz Guzik <mjg@FreeBSD.org>

Retire clone_drain_lock

It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36268


# 62e1a437 22-Aug-2021 Zhenlei Huang <zlei.huang@gmail.com>

routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).

Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. This behavior is controlled by the
net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to exract packet family inside if_output() routines.
Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
It leverages recent lltable changes committed in c541bd368f86.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398
MFC after: 2 weeks


# 51221b68 21-Jul-2021 Kyle Evans <kevans@FreeBSD.org>

tuntap: clean up cc --analyze

One complaint of a dead-store, smack it with a __diagused.


# a6b76897 11-Jan-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Remove redundant rtinit() calls from tuntap.

Removed code iterates over if_addrhead and tries to remove
routes for each ifa.
This is exactly the thing that if_purgeaddrs() do, and
if_purgeaddr() is already called in the end.

Reviewed by: glebius
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D28106


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# cef5fc74 16-Jul-2020 Kyle Evans <kevans@FreeBSD.org>

tuntap: drop redundant if_mtu assignment in tuncreate

ether_ifattach will immediately clobber if_mtu with ETHERMTU anyways, just
let it happen.

MFC after: 1 week


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# af614b8e 22-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

tap(4) calls ether_input() in context of write(2). Enter network
epoch here.

The tun(4) side doesn't need this, as netisr code will take care.


# f7810883 22-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): Fix NOINET build after r353741

Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.

Reported by: cem


# 200abb43 21-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): properly declare if_tun and if_tap modules

Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.

This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.

MFC after: 3 days


# 3d501333 21-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): restrict scope of net.link.tap.user_open slightly

net.link.tap.user_open has historically allowed non-root users to do devfs
cloning and open /dev/tap* nodes based on permissions. Loosen this up to
make it only allow users to do devfs cloning -- we no longer check it in
tunopen.

This allows tap devices to be created that can actually be opened by a user,
rather than swiftly restricting them to root because the magic sysctl has
not been set.

The sysctl has not yet been completely deprecated, because more thought is
needed for how to handle the devfs cloning case. There is not an easy
suitable replacement for the sysctl there, and more care needs to be placed
in determining whether that's OK or not.

PR: 200185


# 60250777 20-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): use cdevpriv w/ dtor for last close instead of d_close

cdevpriv dtors will be called when the reference count on the associated
struct file drops to 0, while d_close can be unreliable for cleaning up
state at "last close" for a number of reasons. As far as tunclose/tundtor is
concerned the difference is minimal, so make the switch.


# 6869d530 20-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): Use make_dev_s to avoid si_drv1 race

This allows us to avoid some dance in tunopen for dealing with the
possibility of dev->si_drv1 being NULL as it's set prior to the devfs node
being created in all cases.

There's still the possibility that the tun device hasn't been fully
initialized, since that's done after the devfs node was created. Alleviate
this by returning ENXIO if we're not to that point of tuncreate yet.

This work is what sparked r353128, full initialization of cloned devices
w/ specified make_dev_args.


# 486c0b22 20-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): break out after setting TUN_DSTADDR

This is now the only flag we set in this loop, terminate early.


# 6041d76e 20-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): Drop TUN_IASET

This flag appears to have been effectively unused since introduction to
if_tun(4) -- drop it now.


# f8bc74e2 18-Oct-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

tap: add support for virtio-net offloads

This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D21263


# 73c96bbe 10-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Don't use if_maddr_rlock() in tuntap(4), use epoch(9) directly instead.


# b8a6e03f 07-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Widen NET_EPOCH coverage.

When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111


# 29128766 04-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap(4): loosen up tunclose restrictions

Realistically, this cannot work. We don't allow the tun to be opened twice,
so it must be done via fd passing, fork, dup, some mechanism like these.
Applications demonstrably do not enforce strict ordering when they're
handing off tun devices, so the parent closing before the child will easily
leave the tun/tap device in a bad state where it can't be destroyed and a
confused user because they did nothing wrong.

Concede that we can't leave the tun/tap device in this kind of state because
of software not playing the TUNSIFPID game, but it is still good to find and
fix this kind of thing to keep ifconfig(8) up-to-date and help ensure good
discipline in tun handling.

MFC after: 3 days


# 59997c3c 03-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

if_tuntap: create /dev aliases when a tuntap device gets renamed

Currently, if you do:

$ ifconfig tun0 create
$ ifconfig tun0 name wg0
$ ls -l /dev | egrep 'wg|tun'

You will see tun0, but no wg0. In fact, it's slightly more annoying to make
the association between the new name and the old name in order to open the
device (if it hadn't been opened during the rename).

Register an eventhandler for ifnet_arrival_events and catch interface
renames. We can determine if the ifnet is a tun easily enough from the
if_dname, which matches the cevsw.d_name from the associated tuntap_driver.

Some locking dance is required because renames don't require the device to
be opened, so it could go away in the middle of handling the ioctl, but as
soon as we've verified this isn't the case we can attempt to busy the tun
and either bail out if the tun device is dying, or we can proceed with the
rename.

We only create these aliases on a best-effort basis. Renaming a tun device
to "usbctl", which doesn't exist as an ifnet but does as a /dev, is clearly
not that disastrous, but we can't and won't create a /dev for that.


# c4cad154 03-Oct-2019 Kyle Evans <kevans@FreeBSD.org>

if_tuntap: add a busy/unbusy mechanism, replace destroy OPEN check

A future commit will create device aliases when a tuntap device is renamed
so that it's still easily found in /dev after the rename. Said mechanism
will want to keep the tun alive long enough to either realize that it's
about to go away or complete the alias creation, even if the alias is about
to get destroyed.

While we're introducing it, using it to prevent open devices from going away
makes plenty of sense and keeps the logic on waking up tun_destroy clean, so
we don't have multiple places trying to cv_broadcast unless it's still in
use elsewhere.


# 5c4eed86 19-Aug-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap: belatedly add MODULE_VERSION for if_tun and if_tap

When tun/tap were merged, appropriate MODULE_VERSION should have been added
for things like modfind(2) to continue to do the right thing with the old
names.

Reported by: jhb


# b5b83671 19-Aug-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

if_tuntap: minor improvements

Rewrite a loop to avoid duplicating the exit condition.
Simplify mask processing in tunpoll().
Fix minor typos.

Reviewed by: kevans, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21302


# 0dbac71f 25-Jul-2019 Kyle Evans <kevans@FreeBSD.org>

if_tuntap(4): Add TUNGIFNAME

This effectively just moves TAPGIFNAME into common ioctl territory.

MFC after: 3 days


# e2e050c8 19-May-2019 Conrad Meyer <cem@FreeBSD.org>

Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.


# db226f0d 14-May-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap: Defer clearing if_softc until after if_detach

r346670 added an sx to close a race between the ifioctl handler and
interface destruction. Unfortunately, it clears if_softc immediately after
the interface is closed, but before if_detach has been invoked.

Any time before detachment, an interface that's part of a bridge may still
receive traffic that's pushed through tunstart/tunstart_l2 and promptly
lead to a panic because if_softc is now NULL.

Fix it by deferring the clearing of if_softc until after the interface has
detached and thus been removed from the bridge. if_softc still gets cleared
in case another thread has already entered the ioctl handler before it's
replaced with ifdead_ioctl.

Reported by: markj
MFC after: 3 days


# 81b3b91e 10-May-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap: Improve style

No functional change.

tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.


# 16760d8e 09-May-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap: Don't down tap interfaces if LINK0 is set


# a6fa0495 09-May-2019 Kyle Evans <kevans@FreeBSD.org>

tuntap: Properly detach tap ifp


# 251a32b5 07-May-2019 Kyle Evans <kevans@FreeBSD.org>

tun/tap: merge and rename to `tuntap`

tun(4) and tap(4) share the same general management interface and have a lot
in common. Bugs exist in tap(4) that have been fixed in tun(4), and
vice-versa. Let's reduce the maintenance requirements by merging them
together and using flags to differentiate between the three interface types
(tun, tap, vmnet).

This fixes a couple of tap(4)/vmnet(4) issues right out of the gate:
- tap devices may no longer be destroyed while they're open [0]
- VIMAGE issues already addressed in tun by kp

[0] emaste had removed an easy-panic-button in r240938 due to devdrn
blocking. A naive glance over this leads me to believe that this isn't quite
complete -- destroy_devl will only block while executing d_* functions, but
doesn't block the device from being destroyed while a process has it open.
The latter is the intent of the condvar in tun, so this is "fixed" (for
certain definitions of the word -- it wasn't really broken in tap, it just
wasn't quite ideal).

ifconfig(8) also grew the ability to map an interface name to a kld, so
that `ifconfig {tun,tap}0` can continue to autoload the correct module, and
`ifconfig vmnet0 create` will now autoload the correct module. This is a
low overhead addition.

(MFC commentary)

This may get MFC'd if many bugs in tun(4)/tap(4) are discovered after this,
and how critical they are. Changes after this are likely easily MFC'd
without taking this merge, but the merge will be easier.

I have no plans to do this MFC as of now.

Reviewed by: bcr (manpages), tuexen (testing, syzkaller/packetdrill)
Input also from: melifaro
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20044