History log of /freebsd-current/sys/net/if_vxlan.c
Revision Date Author Comments
# 93fbfef0 20-May-2024 Zhenlei Huang <zlei@FreeBSD.org>

if_vxlan(4): Add checking for loops and nesting of tunnels

User misconfiguration, either tunnel loops, or a large number of
different nested tunnels, can overflow the kernel stack. Prevent that
by using if_tunnel_check_nesting().

PR: 278394
Diagnosed by: markj
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45197
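
A minimal userland sketch of the idea behind such a check, assuming each packet carries a record of the tunnel interfaces it has already traversed. The structure, the limit, and the return codes below are hypothetical illustrations, not the kernel's if_tunnel_check_nesting() implementation:

```c
#include <stddef.h>

#define MAX_NEST 3	/* hypothetical nesting limit */

/* Hypothetical stand-in for per-packet tunnel history. */
struct pkt_trace {
	int	visited[MAX_NEST];	/* ids of tunnels already traversed */
	size_t	count;
};

/*
 * Record that the packet is entering tunnel `ifid`.
 * Returns 0 on success, -1 if the tunnel was already traversed (a loop),
 * and -2 if entering it would exceed MAX_NEST levels of nesting.
 */
static int
tunnel_check_nesting(struct pkt_trace *t, int ifid)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->visited[i] == ifid)
			return (-1);
	if (t->count >= MAX_NEST)
		return (-2);
	t->visited[t->count++] = ifid;
	return (0);
}
```

Rejecting the packet at this point bounds the recursion depth of the output path, which is what keeps a misconfiguration from exhausting the stack.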


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 2c2b37ad 13-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

ifnet/API: Move struct ifnet definition to a <net/if_private.h>

Hide the ifnet structure definition, no user serviceable parts inside,
it's a netstack implementation detail. Include it temporarily in
<net/if_var.h> until all drivers are updated to use the accessors
exclusively.

Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38046


# de1ea2d5 07-Oct-2022 Zhenlei Huang <zlei.huang@gmail.com>

if_vxlan(4): Correct the statistic for output bytes

The vxlan interface encapsulates the Ethernet frame by prepending IP/UDP
and vxlan headers. For statistics, only the payload, i.e. the
encapsulated (inner) frame should be counted.

Event: Aberdeen Hackathon 2022
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36855
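
The accounting rule can be sketched as follows, assuming an IPv4 outer header without options (20 bytes), a UDP header (8 bytes), and a VXLAN header (8 bytes). The types and function names are hypothetical and only illustrate sampling the length before the outer headers are prepended:

```c
#include <stddef.h>
#include <stdint.h>

enum { IPV4_HDR_LEN = 20, UDP_HDR_LEN = 8, VXLAN_HDR_LEN = 8 };

struct frame { size_t len; };				/* hypothetical packet */
struct if_counters { uint64_t opackets, obytes; };	/* hypothetical stats */

/* Prepending the outer headers grows the packet. */
static void
encapsulate(struct frame *f)
{
	f->len += IPV4_HDR_LEN + UDP_HDR_LEN + VXLAN_HDR_LEN;
}

static void
vx_output(struct if_counters *c, struct frame *f)
{
	size_t inner_len = f->len;	/* sample before encapsulation */

	encapsulate(f);
	c->opackets++;
	c->obytes += inner_len;		/* count the inner frame only */
}
```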


# 1fc839f4 04-Oct-2022 Zhenlei Huang <zlei.huang@gmail.com>

if_vxlan(4): Add missing statistic for input packets

Event: Aberdeen hackathon 2022
Reviewed by: bryanv, kp
Differential Revision: https://reviews.freebsd.org/D36841


# 8707cb19 30-Sep-2022 Zhenlei Huang <zlei.huang@gmail.com>

if_vxlan(4): Check the size of data available in mbuf before using them

PR: 261711
Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36794
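
A hedged userland sketch of the general pattern: verify that enough bytes are present before interpreting them as a header. A flat byte buffer stands in for the mbuf, and the names are hypothetical rather than taken from the driver:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VXLAN_HDR_LEN 8

/* Both fields are kept in network byte order, as read from the wire. */
struct vxlan_hdr {
	uint32_t flags;
	uint32_t vni;
};

/*
 * Copy out the header only if the buffer holds all of it.
 * Returns 0 on success, -1 if the data is too short.
 */
static int
vxlan_hdr_read(const uint8_t *buf, size_t len, struct vxlan_hdr *out)
{
	if (buf == NULL || len < VXLAN_HDR_LEN)
		return (-1);
	memcpy(out, buf, VXLAN_HDR_LEN);
	return (0);
}
```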


# 91ebcbe0 21-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

if_clone: migrate some consumers to the new KPI.

Convert most of the cloner consumers that require custom params
to the new if_clone KPI.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36636
MFC after: 2 weeks


# 7f7a804a 08-Jul-2022 Zhenlei Huang <zlei.huang@gmail.com>

vxlan: Add support for socket ioctls SIOC[SG]TUNFIB

Submitted by: Luiz Amaral <email@luiz.eng.br>
PR: 244004
Differential Revision: https://reviews.freebsd.org/D32820
MFC after: 2 weeks


# 742e7210 11-Apr-2022 Kristof Provost <kp@FreeBSD.org>

udp: allow udp_tun_func_t() to indicate it did not eat the packet

Allow udp tunnel functions to indicate they have not taken ownership of
the packet, and that normal UDP processing should continue.

This is especially useful for scenarios where the kernel has taken
ownership of a socket that was originally created by userspace. It
allows the tunnel function to pass through certain packets for userspace
processing.

The primary user of this is if_ovpn, when it receives messages from
unknown peers (which might be a new client).

Reviewed by: tuexen
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34883
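
The contract can be sketched in userland as a callback that reports whether it consumed the packet. All names here are hypothetical; the real udp_tun_func_t takes kernel arguments that this sketch omits:

```c
#include <stdbool.h>
#include <stddef.h>

struct pkt { int peer_id; };			/* hypothetical packet */

/* Returns true if the tunnel took ownership of the packet. */
typedef bool (*tun_func_t)(struct pkt *);

enum verdict { HANDLED_BY_TUNNEL, DELIVERED_TO_SOCKET };

/* Hypothetical tunnel: only consumes packets from the known peer 7. */
static bool
known_peer_only(struct pkt *p)
{
	return (p->peer_id == 7);
}

static enum verdict
udp_deliver(struct pkt *p, tun_func_t tun)
{
	if (tun != NULL && tun(p))
		return (HANDLED_BY_TUNNEL);
	/* Not consumed: continue with normal UDP processing. */
	return (DELIVERED_TO_SOCKET);
}
```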


# bef80a72 07-Feb-2022 Gordon Bergling <gbe@FreeBSD.org>

vxlan(4): Fix two typos in sysctl descriptions

- s/fowarding/forwarding/

MFC after: 3 days


# ceaf442f 06-Feb-2022 Aleksandr Fedorov <afedorov@FreeBSD.org>

if_vxlan(4): Allow netmap_generic to intercept RX packets.

Netmap (generic) intercepts the if_input method to handle RX packets.

Call ifp->if_input() instead of netisr_dispatch().
Add stricter check for incoming packet length.

This change is very useful with bhyve + vale + if_vxlan.

Reviewed by: vmaffione (mentor), kib, np, donner
Approved by: vmaffione (mentor), kib, np, donner
MFC after: 2 weeks
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D30638


# a3c2c06b 06-Jun-2021 Bjoern A. Zeeb <bz@FreeBSD.org>

Make LINT NOINET and NOIP kernel builds warning free.

Apply #ifdef INET or #if defined(INET6) || defined(INET) to make
universe NOINET and NOIP LINT kernels warning free as well again.


# baacf701 28-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

vxlan: correct interface MTU when using hw offloads

Otherwise it breaks when offloads such as checksum or TSO are used,
because the second (encapsulated) ip_output() pass hands fragments of
the encapsulated packet down to the hardware interface.

Diagnosed by: hselasky
Reviewed by: np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29501


# e243367b 12-Feb-2021 Konstantin Belousov <kib@FreeBSD.org>

mbuf: add a way to mark flowid as calculated from the internal headers

In some settings offload might calculate hash from decapsulated packet.
Reserve a bit in packet header rsstype to indicate that.

Add m_adj_decap(), which acts like m_adj() but also either clears the
flowid if it is not marked as inner, or transfers it to the decapsulated
header and clears the inner indicator. It depends on the internals of
m_adj(), which reuses the argument packet header for the result.

Use m_adj_decap() for decapsulating vxlan(4) and gif(4) input packets.

Reviewed by: ae, hselasky, np
Sponsored by: Nvidia Networking / Mellanox Technologies
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28773
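
A rough userland sketch of the flowid handling described above. The structure and the HASH_INNER bit are hypothetical stand-ins for the mbuf packet header and the reserved rsstype bit, and the caller is assumed to pass hdr_len no larger than the packet length:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_INNER 0x80u	/* hypothetical "hash is from inner headers" bit */

struct pkt {
	uint32_t flowid;
	uint8_t  rsstype;
	size_t   len;
};

/* Strip hdr_len bytes of outer headers and fix up the flow hash. */
static void
adj_decap(struct pkt *p, size_t hdr_len)
{
	p->len -= hdr_len;
	if (p->rsstype & HASH_INNER) {
		/* Hash already describes the inner packet: keep it. */
		p->rsstype &= (uint8_t)~HASH_INNER;
	} else {
		/* Hash described the outer packet: it no longer applies. */
		p->flowid = 0;
		p->rsstype = 0;
	}
}
```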


# 994e4702 21-Dec-2020 Konstantin Belousov <kib@FreeBSD.org>

vxlan: stop checking CSUM_ENCAP_VXLAN when converting inner CSUM flags into normal, for decapsulation.

The packet, if processed at this point, was already parsed to be UDP
directed to a vxlan port.

Connect-X 4+ does not provide an easy way to infer which parser
processed the packet, so the driver cannot set the flag without a lot of
effort that would serve only to satisfy the formal requirement.

Reviewed by: bryanv, np
Sponsored by: Mellanox Technologies/NVidia Networking
Differential revision: https://reviews.freebsd.org/D27449
MFC after: 1 week


# 610d3459 22-Oct-2020 Navdeep Parhar <np@FreeBSD.org>

if_vxlan(4): csum_flags_to_inner_flags takes the tunnel protocol as a parameter.

No functional change.


# b092fd6c 17-Sep-2020 Navdeep Parhar <np@FreeBSD.org>

if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.

This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.

A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.

On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.

An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.

On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.

Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.

Rx: LRO. Depends on what happens with the previous item. There will have to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.

Reviewed by: kib@
Relnotes: Yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25873


# a5154bb2 15-Aug-2020 Qing Li <qingli@FreeBSD.org>

Correct the mask byte order when checking for reserved bits.

Reviewed by: gnn
Approved by: gnn
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26071
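
To illustrate the class of bug, here is a hedged sketch based on the RFC 7348 header layout (8 flag bits with the I bit at 0x08, 24 reserved bits, a 24-bit VNI, and 8 reserved bits). The macro and function names are hypothetical, and rejecting non-zero reserved bits is an assumption of this sketch. The point is that a host-order mask must be converted with htonl() before it is applied to a field still in network byte order:

```c
#include <arpa/inet.h>
#include <stdint.h>

#define VXLAN_FLAG_VALID_VNI	0x08000000u	/* I flag, host byte order */
#define VXLAN_VNI_RESERVED	0x000000ffu	/* low byte of the VNI word */

/*
 * flags_be and vni_be are the two header words exactly as read from the
 * wire (network byte order). Returns 1 if the header is acceptable.
 */
static int
vxlan_hdr_ok(uint32_t flags_be, uint32_t vni_be)
{
	if (flags_be != htonl(VXLAN_FLAG_VALID_VNI))
		return (0);
	/* Convert the mask, not the field, so both sides agree. */
	if ((vni_be & htonl(VXLAN_VNI_RESERVED)) != 0)
		return (0);
	return (1);
}
```

On a big-endian host htonl() is a no-op, so an unconverted mask would only misbehave on little-endian machines, which makes this kind of mistake easy to miss.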


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# b7592822 24-Jul-2019 Kirill Ponomarev <krion@FreeBSD.org>

Allow setting an MTU larger than 1500 bytes.

Submitted by: Alexandr Fedorov <aleksandr.fedorov_itglobal_dot_com>
Approved by: jhb, rgrimes
Sponsored by: ITGlobal.com
Differential Revision: https://reviews.freebsd.org/D19422


# 59854ecf 25-Jun-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.

The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.

To avoid this problem, put all multicast memberships on a STAILQ-based
list. The memory locations of the IPv4 and IPv6 multicast filters then
stay fixed during their lifetime, and use-after-free and memory leak
issues are easier to track, for example with: vmstat -m | grep multi

All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.

This patch has been tested by pho@.

Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by: markj@
MFC after: 1 week
Sponsored by: Mellanox Technologies
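
The address-stability property can be illustrated with the STAILQ macros from <sys/queue.h>. This is a userland sketch with a hypothetical membership structure, not the inpcb code: each element is allocated separately, so removing one element never moves the others.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical membership record; the real filters carry more state. */
struct membership {
	int group;
	STAILQ_ENTRY(membership) link;
};
STAILQ_HEAD(membership_list, membership);

/* Allocate a record and append it; returns NULL on allocation failure. */
static struct membership *
membership_add(struct membership_list *l, int group)
{
	struct membership *m = malloc(sizeof(*m));

	if (m == NULL)
		return (NULL);
	m->group = group;
	STAILQ_INSERT_TAIL(l, m, link);
	return (m);
}

/* Unlink and free one record; other records keep their addresses. */
static void
membership_del(struct membership_list *l, struct membership *m)
{
	STAILQ_REMOVE(l, m, membership, link);
	free(m);
}
```

With an array, deleting an entry typically shifts or reallocates its neighbours, so a pointer held across a dropped lock can end up referring to a different entry or to freed memory.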


# 3c3aa8c1 17-Apr-2019 Kyle Evans <kevans@FreeBSD.org>

net: adjust randomized address bits

Give devices that need a MAC a 16-bit allocation out of the FreeBSD
Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now
that we're dealing with real MAC addresses with a real OUI rather than random
locally-administered addresses.

Reviewed by: bz, rgrimes
Differential Revision: https://reviews.freebsd.org/D19587


# 6b7e0c1c 14-Mar-2019 Kyle Evans <kevans@FreeBSD.org>

ether: centralize fake hwaddr generation

We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.

Reviewed by: bryanv, kp, philip
Differential Revision: https://reviews.freebsd.org/D19573


# 46d0f824 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

net: fix set but not used


# ac2b436d 30-Dec-2017 Bryan Venteicher <bryanv@FreeBSD.org>

Add macro for vxlan list mutex lock and unlock

This will simplify some later VNET support.

Submitted by: hrs
MFC after: 2 weeks


# 6d7bc583 30-Dec-2017 Bryan Venteicher <bryanv@FreeBSD.org>

Advertise IFCAP_LINKSTAT after r326480 added link status support

MFC after: 2 weeks


# 33e0d8f0 29-Dec-2017 Bryan Venteicher <bryanv@FreeBSD.org>

Add support for IPv6 scoped addresses to vxlan

MFC after: 2 weeks


# 0e253fd1 16-Dec-2017 Andrey V. Elsukov <ae@FreeBSD.org>

Fix possible memory leak.

vxlan_ftable entries are sorted in ascending order. Because of a wrong
argument order, the search could stop before an existing element was found.
A new element would then be allocated in vxlan_ftable_update_locked() and
could be inserted into the list a second time, or trigger an MPASS()
assertion when INVARIANTS is enabled.

PR: 224371
MFC after: 1 week
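
A simplified sketch of an early-exit lookup over an ascending list, assuming entries are ordered by memcmp() of their MAC addresses. The types and names are hypothetical. Swapping the two memcmp() arguments inverts the sign and makes the loop give up at the first entry that sorts before the key, which is the kind of failure described above:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical forwarding entry in a list sorted by ascending MAC. */
struct fe {
	uint8_t    mac[MAC_LEN];
	struct fe *next;
};

static struct fe *
ftable_lookup(struct fe *head, const uint8_t *mac)
{
	for (struct fe *e = head; e != NULL; e = e->next) {
		/* Key first, entry second: the order matters. */
		int dir = memcmp(mac, e->mac, MAC_LEN);

		if (dir == 0)
			return (e);
		if (dir < 0)
			break;	/* key sorts before e: it cannot appear later */
	}
	return (NULL);
}
```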


# 93a5a3b0 02-Dec-2017 Bryan Venteicher <bryanv@FreeBSD.org>

Add if media and link status events to vxlan

PR: 214359
MFC after: 2 weeks


# 36ad8372 06-Jun-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties

Reviewed by: hps, erj, tuexen
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6688


# abb901c5 28-Apr-2016 Randall Stewart <rrs@FreeBSD.org>

Complete the UDP tunneling of ICMP msgs to those protocols
interested in having tunneled UDP and finding out about the
ICMP (tested by Michael Tuexen with SCTP.. soon to be using
this feature).

Differential Revision: http://reviews.freebsd.org/D5875


# c2529042 01-Dec-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but no longer has any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after: 1 month
Sponsored by: Mellanox Technologies
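
The transmit-side check can be sketched as below. The enum values and the structure are hypothetical userland stand-ins for the "M_HASHTYPE_XXX" macros and the mbuf packet header; the queue selection by modulo is likewise only an illustration of what a driver might do with a valid flowid:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hash type values. */
enum hashtype { HASHTYPE_NONE = 0, HASHTYPE_OPAQUE = 1 };

struct pkt {
	uint32_t      flowid;
	enum hashtype rsstype;
};

/*
 * Pick a transmit queue. The flowid is trusted only when the hash type
 * says it is valid; otherwise fall back to a default queue.
 */
static unsigned
select_txq(const struct pkt *p, unsigned nqueues, unsigned dflt)
{
	if (p->rsstype != HASHTYPE_NONE)
		return (p->flowid % nqueues);
	return (dflt);
}
```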


# 854f7e89 20-Oct-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Use the size of the Ethernet address, not the entire header, when
copying into forwarding entry.

Reported by: Coverity
CID: 1248849


# 007054f0 20-Oct-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Add vxlan interface

vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in
a UDP packet. This implementation is based on RFC7348.

Currently, the IPv6 support is not fully compliant with the specification:
we should be able to receive UDPv6 packets with a zero checksum, but we
need to support RFC6935 first. Patches for this should come soon.

Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
network stack to support batching, GRO, and GSO. Each frame has to make
two trips through the network stack, and each frame will be at most MTU
sized. Performance suffers accordingly.

Some latest generation NICs have begun to support vxlan HW offloads that
we should also take advantage of. VIMAGE support should also be added soon.

Differential Revision: https://reviews.freebsd.org/D384
Reviewed by: gnn
Relnotes: yes