History log of /freebsd-current/sys/dev/virtio/network/if_vtnet.c
Revision Date Author Comments
# a01c7081 18-Apr-2024 Gleb Smirnoff <glebius@FreeBSD.org>

vtnet: use CURVNET_SET() instead of CURVNET_SET_QUIET()

We don't expect the VNET context to be set for virtqueue neither
for taskqueue handlers.

Suggested by: zec
Fixes: 3f2b9607756d0f92ca29c844db0718b313a06634


# 3f2b9607 28-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

vtnet: set VNET context in RX handler

The context is required for NIC-level pfil(9) filtering.


# 0ea4b408 04-Feb-2024 Warner Losh <imp@FreeBSD.org>

vtnet: Avoid ifdefs based on __NO_STRICT_ALIGNMENT

Some platforms require an adjustment of the ethernet hearders. Rather
than make this be on __NO_STRICT_ALIGNMENT being defined, define
VTNET_ETHER_ALIGN to be either 0 or ETHER_ALIGN (aka 2). Add a test to
the if statements to only do them when != 0. This eliminates the #ifdef
sprinkled in the code, still communicates the intent and gives the same
compiled results.

Sponsored by: Netflix
Reviewed by: bz, bryanv
Differential Revision: https://reviews.freebsd.org/D43654


# d9e0e426 04-Feb-2024 Warner Losh <imp@FreeBSD.org>

vtnet: Account for the padding when selecting allocation size

While we account for the padding in the length of the mbuf we use, we do
not account for it when we 'guess' the size of the mbuf to allocate
based in the MTU of the device. This leads to a situation where we might
fail if the mtu is close to a bucket size (say 2018) such that the added
padding would push us over the edge for a full-sized packet. mtu of 2018
is super rare (2016 and 2020 would both work), but fix it none-the-less.
It's a shame we can't just set VTNET_RX_HEADER_PAD to 2 in this case. The 4
seems hard-coded somewhere I've not found documented (I think it's in the
protocol given the comments about VIRTIO_F_ANY_LAYOUT).

Sponsored by: Netflix
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D43656


# 3be59adb 28-Jan-2024 Warner Losh <imp@FreeBSD.org>

vtnet: Adjust for ethernet alignment.

If the header that we add to the packet's size is 0 % 4 and we're
strictly aligning, then we need to adjust where we store the header so
the packet that follows will have it's struct ip header properly
aligned. We do this on allocation (and when we check the length of the
mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
softc, because it's also used to allocate the mbufs and it needs to be
the proper size for that. Since we otherwise use the size of the mbuf
(or sometimes the smaller size of the received packet) to compute how
much we can buffer, this ensures no overflows. The 2 byte adjustment
also does not affect how many packets we can receive in the lro_nomrg
case.

PR: 271288
Sponsored by: Netflix
Reviewed by: bryanv
Differential Revision: https://reviews.freebsd.org/D43224


# 23699ff2 28-Dec-2023 Warner Losh <imp@FreeBSD.org>

Revert "vtnet: Adjust rx buffer so IP header 32-bit aligned"

This reverts commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4.

This wasn't right to start with...

Requested by: markj


# 8ee1cc4a 28-Dec-2023 Warner Losh <imp@FreeBSD.org>

Revert "vtnet: Better adjust for ethernet alignment."

This reverts commit e9da71cd35d46ca13da4396d99e0af1703290e68.

This was inadvertantly pushed and turns out ot be not quite right.

Requested by: markj


# e9da71cd 21-Dec-2023 Warner Losh <imp@FreeBSD.org>

vtnet: Better adjust for ethernet alignment.

Move adjustment of the mbuf from where we allocate it to where we are
about to queue it to the device. Do this only on those platforms that
require it. This allows us to receive an entire jumbo frame on other
platforms. It also doesn't make the adjustment on subsequent frames when
we queue mulitple mbufs for LRO operations.

For the normal use case on armv7, there's no difference because we only
ever allocate one mbuf. However, for the LRO cases it increases what's
available in LRO. It also ensure that we get enough mbufs in those cases
as well (though I have no ability to test this on a LRO scenario with
armv7).

This has the side effect of reverting 527b62e37e68.

Fixes: 527b62e37e68
Sponsored by: Netflix


# 9e6d11ce 20-Dec-2023 Warner Losh <imp@FreeBSD.org>

vtnet: Adjust rx buffer so IP header 32-bit aligned

Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating them.
This improves performance everywhere, and allows armv7 to work at all.

PR: 271288 (PR had a different fix than I wound up with)
MFC After: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D43136


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 180c0240 18-Sep-2023 Mina Galić <freebsd@igalic.co>

virtio: remove virtio_alloc_virtqueues' flags arg

Summary:
the flags argument is unused.
Its initial design idea has been superceded by the addition of
virtio_setup_intr and related APIs.

Sponsored by: The FreeBSD Foundation

Reviewers: bryanv

Reviewed By: bryanv

Subscribers: cognet, imp

Differential Revision: https://reviews.freebsd.org/D41850


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 580cadd6 08-Aug-2023 Kristof Provost <kp@FreeBSD.org>

vtnet: allow IFF_ALLMULTI to be set without VIRTIO_NET_F_CTRL_RX

If the host doesn't announce VIRTIO_NET_F_CTRL_RX we cannot disable all
multicast traffic. Previously we'd refuse to set the IFF_ALLMULTI flag,
which is the exact opposite of what is actually happening.

This broke things such as igmpproxy.

See also: https://redmine.pfsense.org/issues/14301
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41356


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# a6b55ee6 17-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH

Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6bf. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a4. However in e87c4940156 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.

This change not only reverts e87c4940156 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.

Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.

Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510


# a2256150 14-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

net: use pfil_mbuf_{in,out} where we always have an mbuf

This finalizes what has been started in 0b70e3e78b0.

Reviewed by: kp, mjg
Differential revision: https://reviews.freebsd.org/D37976


# 4ee96792 01-Mar-2022 Justin Hibbits <jhibbits@FreeBSD.org>

Mechanically convert if_vtnet(4) to IfAPI

Reviewed By: bryanv
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37799


# 5c4c96d3 06-May-2022 John Baldwin <jhb@FreeBSD.org>

virtio: Remove unused devclass arguments to DRIVER_MODULE.


# 53236f90 18-Apr-2022 Michael Tuexen <tuexen@FreeBSD.org>

if_vtnet: improve dumping a kernel

Disable software LRO during kernel dumping, because having it enabled
requires to be in a network epoch, which might or might not be the
case depending on the code path resulting in the panic.

Reviewed by: markj
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34787


# 127b40e7 13-Apr-2022 John Baldwin <jhb@FreeBSD.org>

vtnet: offset is only used for INET or INET6.


# fc035df8 05-Feb-2022 Aleksandr Fedorov <afedorov@FreeBSD.org>

if_vtnet(4): Restore the ability to set promisc mode.

PR: 254343, 255054
Reviewed by: vmaffione (mentor), donner
Approved by: vmaffione (mentor), donner
MFC after: 2 weeks
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D30639


# 526ddf17 20-Jan-2022 Mark Johnston <markj@FreeBSD.org>

vtnet: Mark MRG_RXBUF headers as initialized before loading fields

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 3f6ab549 04-Jan-2022 Gleb Smirnoff <glebius@FreeBSD.org>

vtnet: don't leak pfil(9) data on detach

PR: 260667
Submitted by: <ghuckriede blackberry.com>


# 0d3b2bd7 14-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

vtnet: plug set-but-not-used vars

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 710c0556 11-Aug-2021 Mark Johnston <markj@FreeBSD.org>

virtio: Add KMSAN hooks for network and block devices

This ensures that host-written data is marked as initialized.

Sponsored by: The FreeBSD Foundation


# 4044af03 19-Apr-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Fix vtnet TCP lro panic

Differential Revision: https://reviews.freebsd.org/D29900
Reviewed by: hps, kp


# d4697a6b 18-Mar-2021 Michael Tuexen <tuexen@FreeBSD.org>

vtnet: fix TSO for TCP/IPv6

The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criteria
was changed to checking if the CSUM_IP_TSO flag is set in the
csum-flags and then considering it to be TCP/IPv4.
However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
This fix will be MFC instantly as discussed with re(gjb).

MFC after: instantly
PR: 254366
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D29331


# c1b554c8 22-Feb-2021 Alex Richardson <arichardson@FreeBSD.org>

if_vtnet: Fix pointer-sign and used parameter warnings

Reviewed By: grehan
Differential Revision: https://reviews.freebsd.org/D28726


# 633218ee 20-Jan-2021 Jessica Clarke <jrtc27@FreeBSD.org>

virtio: Reduce boilerplate for device driver module definitions

Rather than have every device register itself for both virtio_pci and
virtio_mmio, provide a VIRTIO_DRIVER_MODULE wrapper to declare both,
merge VIRTIO_SIMPLE_PNPTABLE with VIRTIO_SIMPLE_PNPINFO and make the
latter register for both buses. This also has the benefit of abstracting
away the available transports and their names.

Reviewed by: bryanv
Differential Revision: https://reviews.freebsd.org/D28073


# e6cc42f1 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

virtio: Handle possible failure of virtio_finalize_features()

Try to standardize how drivers negotiate feature and the
function names

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27930


# 2bfab357 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Add counter for received host LRO

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27928


# 475a60ae 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Misc Tx path cleanup

- Add and fix a few error path counters
- Improve sysctl descriptions
- Use flags consistently to determine IPv4 vs IPv6

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27926


# 6b53aeed 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Set lro_nsegs for host LRO packets

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27933


# 33b5433f 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Remove unnecessary TUNABLE_INTs because of CTLFLAG_RDTUN

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27923


# 4f18e23f 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Schedule Rx task if pending items when enabling interrupt

Prior to V1, the driver would enable interrupts and then notify the
host that DRIVER_OK. Since for V1, DRIVER_OK needs to be set before
notifying the virtqueues, there may be items in the queues waiting
to be processed by the time interrupts are enabled.

This fixes a bug where the Rx queue would appear stuck, only being
usable after an interface down/up cycle.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27922


# c3187190 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Disable F_MTU feature if MTU is invalid

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27931


# bd8809df 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Limit allocations of unused virtqueues

For multiqueue, we may use fewer than the provided maximum number of
queues. Try to limit allocations of the unused queues: no interrupts,
no indirect descriptors, and no taskqueues.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27921


# b470419e 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Rework 4be723f63 max multiqueue pairs check

Verify the max_virtqueue_pairs is within the range allowed by
the spec.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27920


# 42343a63 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Add support for software LRO

This useful when running on hosts that support checksum offloading
but not the GUEST_TSO (LRO) feature. Or potentially, some GRO-like
support when doing forwarding.

Only enable SW LRO when the host LRO is not available since both
tends to be harmful, and difficult to enable/disable selectively
with only a single IFCAP_LRO flag.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27919


# 177761e4 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Set the interface max TSO values

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27917


# e36a6b1b 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Add support for CTRL_GUEST_OFFLOADS feature

This allows the Rx checksum and LRO to be modified without a full
reinit of the device.

Remove IFCAP_RXCSUM_IPV6 from the interface capabilities since in
VirtIO Rx checksums are just enabled or disabled for all protocols.

Properly update IFCAP_LRO if LRO is becomes disabled when Rx
checksums are disabled.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27916


# dc9029d8 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Move ioctl handlers into separate functions

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27914 https://reviews.freebsd.org/D27915


# 44559b26 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Cleanup the reinit process

In modern VirtIO, the virtqueues cannot be notified before setting
DRIVER_OK status.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27932


# 32e0493c 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Cleanup the interface setup methods

Defer the ether_ifattach until the interface capabilities
are configured

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27913


# 2520cd38 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Only set IFCAP_JUMBO_MTU when jumbo MTU is supported

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27912


# baa5234f 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Move the Tx interrupt threshold into the Txq structure

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27911


# 05041794 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Defer updating generated MAC address until attached

This improves spec compliance because the driver is not suppose
to notify the device prior to setting the DRIVER_OK status, which
could happen with the VIRTIO_NET_F_CTRL_MAC_ADDR.

The VIRTIO_NET_F_MAC feature should always be negotiated so would
be a rare situation.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27910


# 25dbc30e 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Remove at attach PROMISC handling

This may have been required in an early, early, early version of the
specification but I cannot find any reference to it, and a promiscuous
default seems very odd so remove this code.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27909


# 6a733393 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Support VIRTIO_NET_F_SPEED_DUPLEX

This features lets the guest driver know the speed and duplex of
the "link". Instead of trying to support many media types based
on the possible/likely speeds/duplexes, only use the speed to
set the interface baudrate.

Cleanup ifmedia code to match other drivers.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27908


# aabdf5b6 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Support VIRTIO_NET_F_MTU

This feature lets the guest driver know the maximum MTU size
supported by the host device. If set, use this to limit the
acceptable MTUs, and improve how the receive mbuf cluster size
then is selected.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27907


# fa7ca1e3 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Rx path cleanup

- Fix the NEEDS_CSUM and DATA_VALID checksum flags. The NEEDS_CSUM
checksum is incomplete (partial) so offer a fallback for the driver
to calculate the checksum. Simplify DATA_VALID because we know
the host has validated the checksum.

- Default 4K mbuf clusters for mergeable buffers. May need to
scale this down to 2K clusters in certain configurations such
many queue pairs, big queues (like 4096 in GCP), and low memory.

- Use the MTU when calculated the receive mbuf cluster size
when not doing TSO/LRO. This will need more adjustment once
the MTU feature is supported.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27906


# 5e220811 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

if_vtnet: Add initial modern (V1) support

Very basic support to get packets flowing on modern QEMU but still
several conformance issues remain that will be addressed in later
commits.

First of many passes at cleaning up various accumulated cruft

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27904


# 1cd1ed3f 18-Jan-2021 Bryan Venteicher <bryanv@FreeBSD.org>

Revert: virtio: Support non-legacy network device and queue

And subsequent fix 576b099a.

By adding the mergable header to the vtnet_rx_header structure, the size
was increased by 2 bytes, breaking the alignment of this structure as
described the in preceding comments.

Furthermore, the mergable header does not belong the structure. With the
mergable feature, the header is placed in line with the data, so there is
no need for a separate segment, and misleading to follow the mergable
header with any padding.

The V1 header is effectively identical to mergable header, and the driver
has long supported the mergable feature. Revert this so the later changes
that add V1 support can show how V1 is derived from the existing mergable
buffers support, and to facilitate a later MFC.

Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27855


# bb714db6 10-Jan-2021 Vincenzo Maffione <vmaffione@FreeBSD.org>

netmap: vtnet: enable/disable krings on any interface reinit

See 3d65fd97e85ab807f3b for a detailed explanation.

PR: 252453
MFC after: 1 week


# 068dbf36 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

virtio: clean up empty lines in .c and .h files


# ef6fdb33 15-Jun-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

if_vtnet: let vtnet_rx_vq_intr() and vtnet_rxq_tq_intr() share code

Since the two functions are similar, introduce a common function
(vtnet_rx_vq_process()) to share common code.
This also improves locking, by ensuring vrxs_rescheduled is accessed
under the RXQ lock, and taskqueue_enqueue() is not called under the
lock (therefore avoiding a spurious duplicate lock warning).

Reported by: jrtc27
MFC after: 2 weeks


# 576b099a 14-Jun-2020 Jessica Clarke <jrtc27@FreeBSD.org>

vtnet: Fix regression introduced in r361944

For legacy devices that don't support MrgRxBuf (such as bhyve pre-r358180),
r361944 failed to update the receive handler to account for the additional
padding introduced by the unused num_buffers field that is now always present
in struct vtnet_rx_header. Thus, calculate the padding dynamically based on
vtnet_hdr_size.

PR: 247242
Reported by: thj
Tested by: thj


# 16f224b5 14-Jun-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

netmap: vtnet: fix races in vtnet_netmap_reg()

The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.

MFC after: 1 week


# 66823237 11-Jun-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

netmap: introduce netmap_kring_on()

This function returns NULL if the ring identified by
queue id and direction is in netmap mode. Otherwise
return the corresponding kring.
Use this function to replace vtnet_netmap_queue_on().

MFC after: 1 week


# 8c3988df 08-Jun-2020 Jessica Clarke <jrtc27@FreeBSD.org>

virtio: Support non-legacy network device and queue

The non-legacy interface always defines num_buffers in the header,
regardless of whether VIRTIO_NET_F_MRG_RXBUF, just leaving it unused. We
also need to ensure our virtqueue doesn't filter out VIRTIO_F_VERSION_1
during negotiation, as it supports non-legacy transports just fine. This
fixes network packet transmission on TinyEMU.

Reviewed by: br, brooks (mentor), jhb (mentor)
Approved by: br, brooks (mentor), jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D25132


# f0d8d352 02-Jun-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

netmap: vtnet: call netmap_rx_irq() under VQ lock

The netmap_rx_irq() function normally wakes up user-space threads
waiting for more packets. In this case, it is not necessary to
call it under the driver queue lock. However, if the interface is
attached to a VALE switch, netmap_rx_irq() ends up calling rxsync
on the interface (see netmap_bwrap_intr_notify()). Although
concurrent rxsyncs are serialized through the kring lock
(see nm_kr_tryget()), the lock acquire operation is not blocking.
As a result, it may happen that netmap_rx_irq() is called on
an RX ring while another instance is running, causing the
second call to fail, and received packets stall in the receive VQ.
We fix this issue by calling netmap_irx_irq() under the VQ lock.

MFC after: 1 week


# 1b89d00b 02-Jun-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

netmap: vtnet: honor NM_IRQ_RESCHED

The netmap_rx_irq() function may return NM_IRQ_RESCHED to inform the
driver that more work is pending, and that netmap expects netmap_rx_irq()
to be called again as soon as possible.
This change implements this behaviour in the vtnet driver.

MFC after: 1 week


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# e87c4940 24-Feb-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674


# 6c3e93cb 11-Feb-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D23518


# 629667a1 11-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Pacify gcc.

Reported by: rlibby


# ed6cbf48 10-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Add pfil(9) hook to vtnet(4).

The patch could be simplier, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.

However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.


# 29bfe210 08-Jan-2020 Kristof Provost <kp@FreeBSD.org>

vtnet: Pre-allocate debugnet data immediately

Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.

Some systems showed this panic:

vtnet0: link state changed to UP
panic: keg vtnet_tx_hdr initialization after use.
cpuid = 5
time = 1578427700
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
panic() at panic+0x43/frame 0xfffffe004db428b0
uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100011 ]
Stopped at kdb_enter+0x37: movq $0,0x1084eb6(%rip)
db>

Reviewed by: cem, markj
Differential Revision: https://reviews.freebsd.org/D23073


# 7dce5659 21-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Convert to if_foreach_llmaddr() KPI.


# 7790c8c1 17-Oct-2019 Conrad Meyer <cem@FreeBSD.org>

Split out a more generic debugnet(4) from netdump(4)

Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421


# 0f6040f0 03-Jun-2019 Conrad Meyer <cem@FreeBSD.org>

virtio(4): Add PNP match metadata for virtio devices

Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit. Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type. The same
matching criteria are retained exactly, but is now also available to
devmatch(8).

Reviewed by: bryanv, markj; imp (earlier version)
Differential Revision: https://reviews.freebsd.org/D20407


# 132ea9f2 07-May-2019 Michael Tuexen <tuexen@FreeBSD.org>

Remove non-functional SCTP checksum offload support for virtio.

Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.

This patch removes SCTP checksum offloading support from the virtio
network interface.

Thanks to Felix Weinrank for making me aware of the issue.

Reviewed by: bryanv@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20147


# 93ef2969 29-Jan-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

vtnet: fix typo in vtnet_free_taskqueues

Because of a typo, the code was mistakenly resetting the
vtnrx_vq pointer rather than vtntx_tq.

Reviewed by: bryanv
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19015


# 2e42b74a 14-Nov-2018 Vincenzo Maffione <vmaffione@FreeBSD.org>

vtnet: fix netmap support

netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet in a functional state.

Changelist:
- handle errors returned by virtqueue_enqueue() properly (they were
previously ignored)
- make sure netmap XOR rest of the kernel access each virtqueue.
- compute the number of netmap slots for TX and RX separately, according to
whether indirect descriptors are used or not for a given virtqueue.
- make sure sglist are freed according to their type (mbufs or netmap
buffers)
- add support for mulitiqueue and netmap host (aka sw) rings.
- intercept VQ interrupts directly instead of intercepting them in txq_eof
and rxq_eof. This simplifies the code and makes it easier to make sure
taskqueues are not running for a VQ while it is in netmap mode.
- implement vntet_netmap_config() to cope with changes in the number of queues.

Reviewed by: bryanv
Approved by: gnn (mentor)
MFC after: 3 days
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D17916


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# c857c7d5 05-May-2018 Mark Johnston <markj@FreeBSD.org>

Add netdump support to vtnet(4).

Tested with bhyve.

Reviewed by: bryanv, julian
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15261


# ac2fffa4 21-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by: wosch
PR: 225197


# 26c1d774 13-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

dev: make some use of mallocarray(9).

Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.


# 718cf2cc 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# e414c660 13-Feb-2017 Philip Paeps <philip@FreeBSD.org>

vtnet: don't update VLAN filter when parent is not running

Submitted by: Gerrie Roos <groos -at- xiplink -dot- com>
Reviewed by: gnn
Sponsored by: XipLink, Inc.
Differential Revision: https://reviews.freebsd.org/D9573


# 4be723f6 11-Aug-2016 Steven Hartland <smh@FreeBSD.org>

Fix vtnet hang with max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS

Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.

Add new sysctl requested_vq_pairs which allow the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.

Also missing sysctls for the current tunables so their values can be seen.

PR: 207446
Reported by: Andy Carrel
MFC after: 3 days
Relnotes: Yes
Sponsored by: Multiplay


# 3fcb1aae 14-May-2016 Kristof Provost <kp@FreeBSD.org>

vtnet: fix panic on unload

Since r276367 added the virtio_mmio support vtnet_modevent() gets called twice.
This resulted in a memory leak during load and a panic on unload.

Count the loads so we only initialise once (just like cxgbe(4)), and only clean
up in the final unload.

PR: 209428
Submitted by: novel@FreeBSD.org
MFC after: 1 week


# 804fc8c8 03-Sep-2015 Marcelo Araujo <araujo@FreeBSD.org>

Lower the compiler warning: unused-but-set-variable.

Approved by: bapt (mentor)
Differential Revision: D3556


# 0fdeab7b 10-Jul-2015 Luigi Rizzo <luigi@FreeBSD.org>

add netmap dependency when compiled as a module


# 581e6970 13-Jun-2015 Kristof Provost <kp@FreeBSD.org>

Fix panic when adding vtnet interfaces to a bridge

vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR: 200210
Differential Revision: https://reviews.freebsd.org/D2804
Reviewed by: philip (mentor)


# cab10cc1 13-Jun-2015 Bryan Venteicher <bryanv@FreeBSD.org>

Fix typo when deregistering the VLAN unconfig event handler

Submitted by: Masao Uebayashi <uebayasi@tombiinc.com>
MFC after: 3 days


# 4dc78216 29-Apr-2015 John Baldwin <jhb@FreeBSD.org>

Don't free mbufs when stopping an interface in netmap mode.

Currently if you ifconfig down a vtnet interface while it is being used
via netmap, the kernel panics due to trying to treat the cookie values
in the virtio rings as mbufs to be freed. When netmap is enabled, these
cookie values are pointers to something else.

Note that other netmap-aware drivers don't seem to need this as they
store the mbuf pointers in the software rings that mirror the hardware
descriptor rings, and since netmap doesn't touch those, the software
state always has NULL mbuf pointers causing the loops to free mbufs to
not do anything. However, vtnet reuses the same state area for both
netmap and non-netmap mode, so it needs to explicitly avoid looking at
the rings and treating the cookie values as mbufs if netmap is
enabled.

Differential Revision: https://reviews.freebsd.org/D2348
Reviewed by: adrian, bryanv, luigi
MFC after: 1 week
Sponsored by: Norse Corp, Inc.


# ab4c2818 31-Dec-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Add softc flag for when the indirect descriptor feature was negotiated

MFC after: 2 weeks


# 5b32b2fa 31-Dec-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Use the appropriate IPv4 or IPv6 TSO HW assist flag

MFC after: 2 weeks


# e51f2e72 29-Dec-2014 Andrew Turner <andrew@FreeBSD.org>

Attach vtnet to virtio_mmio. Qemu provides this as an option with AArch64.

Sponsored by: The FreeBSD Foundation


# c2529042 01-Dec-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after: 1 month
Sponsored by: Mellanox Technologies


# 9a4dabdc 09-Nov-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Enable LRO by default when available on vtnet interfaces

The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic is not the
typical use case for VirtIO, and due to other issues, this often
requires checksum offloading to be disabled anyways.

PR: 185864
MFC after: 2 weeks


# 84047b19 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
and in vtnet_get_counter().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 1bffa951 30-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by: Nginx, Inc.


# 4bf50f18 16-Aug-2014 Luigi Rizzo <luigi@FreeBSD.org>

Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured gues-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after: 3 days.


# 32487a89 09-Jul-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Rework when the Tx queue completion interrupt is enabled

The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after: 1 month


# bae486f5 15-Jun-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Force two byte alignment for all control message headers

The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.

Obtained from: DragonFlyBSD
MFC after: 1 week


# fd5b3951 15-Jun-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Make the feature negotiation code easier to follow

MFC after: 1 week


# add526c6 15-Jun-2014 Bryan Venteicher <bryanv@FreeBSD.org>

- Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after: 1 week


# c26e5fc2 04-Jun-2014 Luigi Rizzo <luigi@FreeBSD.org>

make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassing if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.

Approved by: bryanv
MFC after: 3 days


# b245f96c 12-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.

This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 54fb8142 01-Feb-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Use m_defrag() instead of m_collapse() to compact a long mbuf chain

This should be an infrequent occurrence, so remove the per-queue
counters in favor of just global counters in the softc.


# 443c3d0b 01-Feb-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Do not place the sglist used for Rx/Tx on the stack

The sglist segment array has grown to a bit over 512 bytes (on
64-bit system) which is more than ideally should be put on the
stack. Instead allocate an appropriately sized sglist and hang
it off each Rx/Tx queue structure.

Bump the maximum number of Tx segments to 64 to make it unlikely
we'll have defragment an mbuf chain. Our previous count was
rounded up to this value since it is the next power of two, so
effective memory usage should not change.

Also only allocate the maximum number of Tx segments if TSO was
negotiated.


# 9ef6342f 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Check for a full virtqueue in the multiqueue transmit path

With most hosts, we'll negotiate indirect descriptors, so all we
need is one available descriptor to transmit a frame.


# dd6f83a0 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Avoid queue unlock followed by relock when the enable interrupt race is lost

This already happens infrequently, and the hold time is still bounded since
we defer to a taskqueue after a few tries.


# bddddcd5 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Move duplicated transmit start code into a single function


# 5591e479 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Remove stray space


# 94716584 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Also include the mbuf's csum_flags in an assert message


# 1dbb21dc 25-Jan-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Read and write the MAC address in the config space byte by byte


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# d797300b 05-Oct-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Do not hold the vtnet Rx queue lock when calling up into the stack

This matches other similar drivers and avoids various LOR warnings.

Approved by: re (marius)


# 6e03f319 02-Sep-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Complete any pending Tx frames before attempting the next transmit

Also complete pending frames in the watchdog function when the
EVENT_IDX feature was negotiated just in case the completion
interrupt was postponed.


# 72d9611d 01-Sep-2013 Eitan Adler <eadler@FreeBSD.org>

Fix build with gcc

Reported by: Michael Butler <imb@protected-networks.net>
Reviewed by: jilles


# 8f3600b1 31-Aug-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Import multiqueue VirtIO net driver from my user/bryanv/vtnetmq branch

This is a significant rewrite of much of the previous driver; lots of
misc. cleanup was also performed, and support for a few other minor
features was also added.


# abd6790c 04-Jul-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Merge virtio changes from projects/virtio

Contains projects/virtio commits:

r245738:
virtio: Minor man page tweaks
r246060:
virtio: Cleanup feature description printing
r246306:
virtio: Remove old debugging flag
r247238:
virtio: Remove PRIx64 macros from format strings
r247239:
virtio: Constify some fields
r247240:
virtio: Minor code simplifications
r249962:
virtio: Update to my freebsd.org email address

MFC after: 1 month


# 3dd8d840 04-Jul-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Merge vtnet changes from projects/virtio

Minor changes to the network driver. A multiqueue driver that is
a significant rewrite will be in merged shortly.

Contains projects/virtio commits:

r246058:
vtnet: Move an mbuf ASSERT to the calling function
r246059:
vtnet: Tweak ASSERT message

MFC after: 1 month


# 6632efe4 04-Jul-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Convert VirtIO to use ithreads instead of taskqueues

Contains projects/virtio commits:

r245709:
Each VirtIO device was scheduling its own taskqueue(9) to do the
off-level interrupt handling. ithreads(9) is the more nature way
to do this. The primary motivation for this work to better support
network multiqueue.
r245710:
virtio: Change virtqueue intr handlers to return void
r245711:
virtio_blk: Remove interrupt taskqueue
r245721:
vtnet: Remove interrupt taskqueue
r245722:
virtio_scsi: Remove interrupt taskqueue
r245747:
vtnet: Remove taskqueue fields missed in r245721

MFC after: 1 month


# b059b01e 14-Jun-2013 Bryan Venteicher <bryanv@FreeBSD.org>

Merge r250802 from bryanv/vtnetmq - Fix setting of the Rx filters

QEMU 1.4 made the descriptor requirement stricter - the size of buffer
descriptor must exactly match the number of MAC addresses provided.

PR: kern/178955
MFC after: 5 days


# ac4b6bcd 13-Dec-2012 Bryan Venteicher <bryanv@FreeBSD.org>

virtio: Start taskqueues threads after attach cannot fail

If virtio_setup_intr() failed during boot, we would hang in
taskqueue_free() -> taskqueue_terminate() for all the taskq
threads to terminate. This will never happen since the
scheduler is not running by this point.

Reported by: neel, grehan
Approved by: grehan (mentor)


# c6499ecc 04-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.


# 310dacd0 10-Jul-2012 Peter Grehan <grehan@FreeBSD.org>

Various VirtIO improvements

PCI:
- Properly handle interrupt fallback from MSIX to MSI to legacy.
The host may not have sufficient resources to support MSIX,
so we must be able to fallback to legacy interrupts.
- Add interface to get the (sub) vendor and device IDs.
- Rename flags to VTPCI_FLAG_* like other VirtIO drivers.
Block:
- No longer allocate vtblk_requests from separate UMA zone.
malloc(9) from M_DEVBUF is sufficient. Assert segment counts
at allocation.
- More verbose error and debug messages.
Network:
- Remove stray write once variable.
Virtqueue:
- Shuffle code around in preparation of converting the mb()s to
the appropriate atomic(9) operations.
- Only walk the descriptor chain when freeing if INVARIANTS is
defined since the result is only KASSERT()ed.

Submitted by: Bryan Venteicher (bryanv@daemoninthecloset.org)


# b8a58707 13-Apr-2012 Peter Grehan <grehan@FreeBSD.org>

Catch up with Bryan Venteicher's virtio git repo:

a8af6270bd96be6ccd86f70b60fa6512b710e4f0
virtio_blk: Include function name in panic string

cbdb03a694b76c5253d7ae3a59b9995b9afbb67a
virtio_balloon: Do the notify outside of the lock

By the time we return from virtqueue_notify(), the descriptor
will be in the used ring so we shouldn't have to sleep.

10ba392e60692529a5cbc1e9987e4064e0128447
virtio: Use DEVMETHOD_END

80cbcc4d6552cac758be67f0c99c36f23ce62110
virtqueue: Add support for VIRTIO_F_RING_EVENT_IDX

This can be used to reduce the number of guest/host and
host/guest interrupts by delaying the interrupt until a
certain index value is reached.

Actual use by the network driver will come along later.

8fc465969acc0c58477153e4c3530390db436c02
virtqueue: Simplify virtqueue_nused()

Since the values just wrap naturally at UINT16_MAX, we
can just subtract the two values directly, rather than
doing 2's complement math.

a8aa22f25959e2767d006cd621b69050e7ffb0ae
virtio_blk: Remove debugging crud from 75dd732a

There seems to be an issue with Qemu (or FreeBSD VirtIO) that sets
the PCI register space for the device config to bogus values. This
only seems to happen after unloading and reloading the module.

d404800661cb2a9769c033f8a50b2133934501aa
virtio_blk: Use better variable name

75dd732a97743d96e7c63f7ced3c2169696dadd3
virtio_blk: Partially revert 92ba40e65

Just use the virtqueue to determine if any requests are
still inflight.

06661ed66b7a9efaea240f99f414c368f1bbcdc7
virtio_blk: error if allowed too few segments

Should never happen unless the host provides use with a
bogus seg_max value.

4b33e5085bc87a818433d7e664a0a2c8f56a1a89
virtio_blk: Sort function declarations

426b9f5cac892c9c64cc7631966461514f7e08c6
virtio_blk: Cleanup whitespace

617c23e12c61e3c2233d942db713c6b8ff0bd112
virtio_blk: Call disk_err() on error'd completed requests

081a5712d4b2e0abf273be4d26affcf3870263a9
virtio_blk: ASSERT the ready and inflight request queues are empty

a9be2631a4f770a84145c18ee03a3f103bed4ca8
virtio_blk: Simplify check for too many segments

At the cost of a small style violation.

e00ec09da014f2e60cc75542d0ab78898672d521
virtio_blk: Add beginnings of suspend/resume

Still not sure if we need to virtio_stop()/virtio_reinit()
the device before/after a suspend.

Don't start additional IO when marked as suspending.

47c71dc6ce8c238aa59ce8afd4bda5aa294bc884
virtio_blk: Panic when dealt an unhandled BIO cmd

1055544f90fb8c0cc6a2395f5b6104039606aafe
virtio_blk: Add VQ enqueue/dequeue wrappers

Wrapper functions managed the added/removing to the in-flight
list of requests.

Normally biodone() any completed IO when draining the virtqueue.

92ba40e65b3bb5e4acb9300ece711f1ea8f3f7f4
virtio_blk: Add in-flight list of requests

74f6d260e075443544522c0833dc2712dd93f49b
virtio_blk: Rename VTBLK_FLAG_DETACHING to VTBLK_FLAG_DETACH

7aa549050f6fc6551c09c6362ed6b2a0728956ef
virtio_blk: Finish all BIOs through vtblk_finish_bio()

Also properly set bio_resid in the case of errors. Most geom_disk
providers seem to do the same.

9eef6d0e6f7e5dd362f71ba097f2e2e4c3744882
Added function to translate VirtIO status to error code

ef06adc337f31e1129d6d5f26de6d8d1be27bcd2
Reset dumping flag when given unexpected parameters

393b3e390c644193a2e392220dcc6a6c50b212d9
Added missing VTBLK_LOCK() in dump handler

Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org


# 336f459c 05-Dec-2011 Peter Grehan <grehan@FreeBSD.org>

Catch up with Bryan Venteicher's virtio Hg repo:

c162516
Remove vtblk_sector_size

c162515
Wrap long license lines

c162514
Remove vtblk_unit

c162513
Wrap long lines in the license.

c162512
Remove verbose messages when link goes up/down.

A similar message is printed elsewhere as a result of
if_link_state_change().

c162511
Explicity compare pointer to NULL

c162510
Allocate the mac filter table at attach time.

c162509
Add real BSD licenses to the header files copied from Linux.

The chases upstream changes made in Linux awhile ago.

c162508
Only notify if we actually dequeued something.

c162507
Change a couple of if () { KASSERT(...) } to just KASSERTs.

In non-debug kernels, the if() { } probably get optomized
away, but I guess this is clearer.

c162506
Remove VIRTIO_BLK_F_TOPOLOGY fields in the config.

TOPOLOGY has since been removed from the spec, and the FreeBSD
didn't really do anything with the fields anyways.

c162505
Move vtblk_enqueue_request() outside the locks when getting the ident.

c162504
Remove soon to be uneeded trylock during dump [1].
http://lists.freebsd.org/pipermail/freebsd-current/2011-November/029226.html

c162503
Remove emtpy line

c162502
Drop frame if cannot allocate a vtnet_tx_header.

If we don't, we set OACTIVE, but if there are no
other frames in flight, vtnet_txeof() will never
be called to unset OACTIVE. The interface would
have to be down/up'ed in order to become usable.

We could be cuter here and only do this if the
virtqueue is emtpy, but its probably not worth
the complication.

c162501
Start mbuf replacement loop at 1 for clarity

Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org


# 10b59a9b 17-Nov-2011 Peter Grehan <grehan@FreeBSD.org>

Import virtio base, PCI front-end, and net/block/balloon drivers.
Tested on Qemu/KVM, VirtualBox, and BHyVe.

Currently built as modules-only on i386/amd64. Man pages not yet hooked
up, pending review.

Submitted by: Bryan Venteicher bryanv at daemoninthecloset dot org
Reviewed by: bz
MFC after: 4 weeks or so