History log of /freebsd-current/usr.sbin/bhyve/pci_virtio_net.c
Revision Date Author Comments
# 4d65a7c6 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

usr.sbin: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 0dc159ce 01-Jun-2023 Elyes Haouas <ehaouas@noos.fr>

bhyve: Fix typos

Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/653


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 6a284cac 19-Jan-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096


# 3b5e5ce8 23-Oct-2022 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: Handle snapshots of unconfigured virtio-net devices

In case of device reset or not configured - features_negotiated is not
set, calling calling pci_vtnet_neg_features is wrong and resume gets
"Segmentation fault".

Reviewed by: markj
Sponsored by: vStack
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36244


# 98d920d9 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate unused function parameters

MFC after: 1 week


# 6cb26162 29-Sep-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Use designated initializers for virtio_consts tables

This is easier to read and addresses some compiler warnings.

One might expect these tables to be read-only but it seems that the
snapshot/restore code may modify them.

MFC after: 2 weeks


# 37045dfa 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Mark variables and functions as static where appropriate

Mark them const as well when it makes sense to do so. No functional
change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# b0139127 30-Mar-2021 Ka Ho Ng <khng@FreeBSD.org>

bhyve: change vq_getchain to return iovecs in both directions

The old prototype requires callers to inspect flags of each descriptors
to get the starting position of host-writable iovecs.

vq_getchain() is changed to return a virtio request with the number of
host-readable iovecs and host-writable iovecs instead. Callers can avoid
boilerplate code of getting the start offset of host-writable iovecs.

Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Reviewed by: afedorov
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29433


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 54ac6f72 16-Mar-2021 Ka Ho Ng <khng300@gmail.com>

bhyve: virtio shares definitions between sys/dev/virtio

Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.

This reduces code duplication.

Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084


# ea57b2d7 17-Dec-2020 Aleksandr Fedorov <afedorov@FreeBSD.org>

[bhyve] virtio-net: Do not allow receiving packets until features have been negotiated.

Enforce the requirement that the RX callback cannot be called after a reset until the features have been negotiated.
This fixes a race condition where the receive callback is called during a device reset.

Reviewed by: vmaffione, grehan
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D27381


# 5bebe923 08-May-2020 Aleksandr Fedorov <afedorov@FreeBSD.org>

bhyve: Pass the full string of options to the network backends.

Reviewed by: vmaffione
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D24735


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 1ff57e3a 07-Apr-2020 Aleksandr Fedorov <afedorov@FreeBSD.org>

Add VIRTIO_NET_F_MTU flag support for the bhyve virtio-net device.
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000

Reported by: vmaffione, jhb
Approved by: vmaffione (mentor)
Differential Revision: https://reviews.freebsd.org/D23971


# f92bb8c1 20-Feb-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: enable virtio-net mergeable rx buffers for tap(4)

This patch adds a new netbe_peek_recvlen() function to the net
backend API. The new function allows the virtio-net receive code
to know in advance how many virtio descriptors chains will be
needed to receive the next packet. As a result, the implementation
of the virtio-net mergeable rx buffers feature becomes efficient,
so that we can enable it also with the tap(4) backend. For the
tap(4) backend, a bounce buffer is introduced to implement the
peeck_recvlen() callback, which implies an additional packet copy
on the receive datapath. In the future, it should be possible to
remove the bounce buffer (and so the additional copy), by
obtaining the length of the next packet from kevent data.

Reviewed by: grehan, aleksandr.fedorov@itglobal.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23472


# 66c662b0 12-Feb-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: move virtio-net header processing to pci_virtio_net

This patch cleans up the API between the net frontends (e1000,
virtio-net) and the net backends (tap and netmap).
We move the virtio-net header stripping/prepending to the
virtio-net code, where this functionality belongs.
In this way, the netbe_send() and netbe_recv() signatures
can have const struct iov * rather than struct iov *.

Reviewed by: grehan, bcr, aleksandr.fedorov@itglobal.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23342


# 332eff95 08-Jan-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add wrapper for debug printf statements

Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657


# 79c1428e 02-Dec-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: uniform printf format string newlines

Some of the printf statements only use LF to get a newline. However, a CR character is also required for the serial console to print debug logs in a nice way.
Fix those code locations that only use LF, by adding a CR character.

Reviewed by: markj, aleksandr.fedorov@itglobal.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22552


# d70b2069 19-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: virtio-net: disable receive until features are negotiated

This patch fixes a race condition where the receive callback is called
while the device is being reset. Since the rx_merge variable may change
during reset, the receive callback may operate inconsistently with what
the guest expects.
Also, get rid of the unused rx_vhdrlen variable.

PR: 242023
Reported by: aleksandr.fedorov@itglobal.com
Reviewed by: markj, jhb
MFC with: r354552
Differential Revision: https://reviews.freebsd.org/D22440


# d55e0373 08-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add support for virtio-net mergeable rx buffers

Mergeable rx buffers is a virtio-net feature that allows the hypervisor
to use multiple RX descriptor chains to receive a single receive packet.
Without this feature, a TSO-enabled guest is compelled to publish only
64K (or 32K) long chains, and each of these large buffers is consumed
to receive a single packet, even a very short one. This is a waste of
memory, as a RX queue has room for 256 chains, which means up to 16MB
of buffer memory for each (single-queue) vtnet device.
With the feature on, the guest can publish 2K long chains, and the
hypervisor will merge them as needed.

This change also enables the feature in the netmap backend, which
supports virtio-net offloads. We plan to add support for the
tap backend too.
Note that differently from QEMU/KVM, here we implement one-copy receive,
while QEMU uses two copies.

Reviewed by: jhb
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D21007


# 3e11768e 03-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add backend rx backpressure to virtio-net

If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by: bryanv, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20987


# c7cb7db8 11-Jul-2019 Sean Chittenden <seanc@FreeBSD.org>

usr.sbin/bhyve: free resources when erroring out of pci_vtnet_init()

Coverity CID: 1402978
Approved by: vmaffione
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D20912


# 0ff7076b 06-Jul-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: abstraction for network backends

Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Reviewed by: jhb, bryanv
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20659


# 5c2b348a 18-Jun-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: vtnet: fix locking on receive

The vsc_rx_ready and the RX virtqueue is protected by the rx_mtx lock.
However, pci_vtnet_ping_rxq() (currently called only once after each
device reset) accesses those without acquiring the lock.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20609


# 4f7c3b7b 13-Jun-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: move common code to net_utils.c

Both virtio_net and e82545 network frontends have code to validate and
generate MAC addresses. These functionalities are replicated in the two
files, so we move them in a separate compilation unit.

Reviewed by: rgrimes, bryanv, imp, kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20626


# 17e9052c 11-Jun-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: virtio: introduce vq_kick_enable() and vq_kick_disable()

The VirtIO standard supports two schemes for notification suppression:
a notification enable bit and a more sophisticated one (event_idx) that
also supports delayed notifications. Currently bhyve fully supports
only the first scheme. This patch hides the notification suppression
internals by means of two inline routines, vq_kick_enable() and
vq_kick_disable(), and makes the code more readable.
Moreover, further improve readability by replacing the call to mb()
with a call to atomic_thread_fence_seq_cst(), which is already used
in virtio.c

Reviewed by: pmooney_pfmooney.com, bryanv
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20581


# f3b1307e 08-Jun-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: vtnet: simplify thread synchronization

On vtnet device reset it is necessary to wait for threads to stop TX and
RX processing. However, the rx_in_progress variable (used for to wait for
RX processing to stop) is actually useless, and can be removed. Acquiring
and releasing the RX lock is enough to synchronize correctly. Moreover,
it is possible to reset the device while holding both TX and RX locks, so
that the "resetting" variable becomes unnecessary for the RX thread, and
can be protected by the TX lock (instead of being volatile).

Reviewed by: jhb, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20543


# abfa3c39 15-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744


# f7224b70 13-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix style(9) space vs tab.

Reviewed by: jhb
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15768


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 00ef17be 14-Feb-2017 Bartek Rutkowski <robak@FreeBSD.org>

Capsicum support for bhyve(8).

Adds Capsicum sandboxing to bhyve.

Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, oshogbo
Approved by: emaste, grehan
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D8290


# 69ab3091 14-Mar-2016 George V. Neville-Neil <gnn@FreeBSD.org>

Fix typo: nmd->cur_tx_ring should be used in pci_vtnet_netmap_writev()
The buffer length should be checked to avoid overflow, but there
is no API to get the slot length, so the hardcoded value is used.
Return the currently-first request chain back to the available
queue if there are no more packets.
Report the link as up if we managed to open vale port.
Use consistent coding style.

Submitted by: btw
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5595


# 5ffa1d26 10-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Fix bhyve(1) operation on vmnet devices, broken in r293459.


# b6020475 08-Jan-2016 George V. Neville-Neil <gnn@FreeBSD.org>

Add netmap support for bhyve

Submitted by: btw
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4826


# 1e306308 01-Oct-2015 Peter Grehan <grehan@FreeBSD.org>

- Increase the max number of indirect descriptors to match
the largest that the Windows virtio driver can send down

- Always advertize indirect descriptors. The Illumos virtio
driver won't attach unless this capability is seen.

Reviewed by: neel


# 604b5210 13-May-2015 Peter Grehan <grehan@FreeBSD.org>

Set the subvendor field in config space to the vendor ID.
This is required by the Windows virtio drivers to correctly
match a device.

Submitted by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks


# 79f1cdb4 06-May-2015 Alexander Motin <mav@FreeBSD.org>

Add memory barrier to r281764.

While race at this point may cause only a single packet delay and so was
not really reproduced, it is better to not have it at all.

MFC after: 1 week


# 910280e5 20-Apr-2015 Alexander Motin <mav@FreeBSD.org>

Report link as up if tap device is not specified (black hole).

MFC after: 2 weeks


# f2c58daa 20-Apr-2015 Alexander Motin <mav@FreeBSD.org>

Report link as up only if we managed to open tap device.

It would be cool to report tap device status, but it has no such API.

MFC after: 2 weeks


# d9a66983 20-Apr-2015 Alexander Motin <mav@FreeBSD.org>

Disable RX/TX queues notifications when not needed.

This reduces CPU load and doubles iperf throughput, reaching 2-3Gbit/s.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# fed2d5ed 26-Mar-2015 Peter Grehan <grehan@FreeBSD.org>

Move legacy interrupt allocation for virtio devices to common code.
There are a number of assumptions about legacy interrupts always
being available in virtio so don't allow back-ends to make the
decision to support them.

This fixes the issue seen with virtio-rnd on OpenBSD. MSI-x vectors
were not being used, and the virtio-rnd backend wasn't allocating a
legacy interrupt resulting in a bhyve assert and guest exit.

Reported by: Julian Hsiao, madoka at nyanisore dot net
Reviewed by: neel
MFC after: 1 week


# 7315946b 15-Mar-2015 Alexander Motin <mav@FreeBSD.org>

Fix networking problem after r280026.

I've missed that network driver sometimes returns taken request back to
available queue without processing. Add new helper function for that case.

Reported by: flo
MFC after: 2 weeks


# fdb7e97f 15-Mar-2015 Alexander Motin <mav@FreeBSD.org>

Modify virtqueue helpers added in r253440 to allow queuing.

Original virtqueue design allows queued and out-of-order processing, but
helpers added in r253440 suppose only direct blocking in-order one.
It could be fine for network, etc., but it is a huge limitation for storage
devices.


# 82560f19 09-Sep-2014 Peter Grehan <grehan@FreeBSD.org>

Allow vtnet operation without merged rx buffers.

NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

Reviewed by: neel, tychon
Phabric: D745
MFC after: 3 days


# e18f344b 08-Sep-2014 Peter Grehan <grehan@FreeBSD.org>

Add a callback to be notified about negotiated features.

Submitted by: luigi
Obtained from: Vincenzo Maffione, Universita` di Pisa
MFC after: 3 days


# 994f858a 22-Apr-2014 Xin LI <delphij@FreeBSD.org>

Use calloc() in favor of malloc + memset.

Reviewed by: neel


# 3cbf3585 29-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks


# 7f5487ac 05-Nov-2013 Peter Grehan <grehan@FreeBSD.org>

Add the VM name to the process name with setproctitle().
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.

This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.

Requested by: Michael Dexter
Reviewed by: neel
MFC after: 4 weeks


# 062b878f 17-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Changes required for OpenBSD/amd64:

- Allow a hostbridge to be created with AMD as a vendor.
This passes the OpenBSD check to allow the use of MSI
on a PCI bus.
- Enable the i/o interrupt section of the mptable, and
populate it with unity ISA mappings. This allows the
'legacy' IRQ mappings of the PCI serial port to be
set up. Delete unused print routine that was obscuring code.
- Use the '-W' option to enable virtio single-vector MSI
rather than an environment variable. Update the virtio
net/block drivers to query this flag when setting up
interrupts.: bhyverun.c
- Fix the arithmetic used to derive the century byte in
RTC CMOS, as well as encoding it in BCD.

Reviewed by: neel
MFC after: 3 days


# ba41c3c1 17-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Major rework of the virtio code. Split out common parts, and modify
the net/block devices accordingly.

Submitted by: Chris Torek torek at torek dot net
Reviewed by: grehan


# a38e2a64 03-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Support an optional "mac=" parameter to virtio-net config, to allow
users to set the MAC address for a device.

Clean up some obsolete code in pci_virtio_net.c

Allow an error return from a PCI device emulation's init routine
to be propagated all the way back to the top-level and result in
the process exiting.

Submitted by: Dinakar Medavaram dinnu sun at gmail (original version)


# b1f31245 02-May-2013 Neel Natu <neel@FreeBSD.org>

Implement the NOTIFY_ON_EMPTY capability in the virtio-net device.

If this capability is negotiated by the guest then the device will
generate an interrupt when it runs out of available tx/rx descriptors.

Reviewed by: grehan
Obtained from: NetApp


# 3b207d1e 29-Apr-2013 Neel Natu <neel@FreeBSD.org>

Reset some more softc state when the guest resets the virtio network device.

Obtained from: NetApp


# 2a80be7b 29-Apr-2013 Neel Natu <neel@FreeBSD.org>

Use a separate mutex for the receive path instead of overloading the softc
mutex for this purpose.

Reviewed by: grehan


# 88d1272e 27-Apr-2013 Neel Natu <neel@FreeBSD.org>

Get rid of the 'vsc_rxpend' state - it doesn't serve any purpose because we
drop any frames that arrive while the device is starved for receive buffers.

This makes the receive path to only execute in context of the receive thread
and allows for further simplification.

Reviewed by: grehan


# 199fee4e 25-Apr-2013 Peter Grehan <grehan@FreeBSD.org>

Use a thread for the processing of virtio tx descriptors rather
than blocking the vCPU thread. This improves bulk data performance
by ~30-40% and doesn't harm req/resp time for stock netperf runs.

Future work will use a thread pool rather than a thread per tx queue.

Submitted by: Dinakar Medavaram
Reviewed by: neel, grehan
Obtained from: NetApp


# b060ba50 18-Mar-2013 Neel Natu <neel@FreeBSD.org>

Simplify the assignment of memory to virtual machines by requiring a single
command line option "-m <memsize in MB>" to specify the memory size.

Prior to this change the user needed to explicitly specify the amount of
memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).

The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.

The start of the PCI hole is fixed at 3GB and cannot be directly changed
using command line options. However it is still possible to change this in
special circumstances via the 'vm_set_lowmem_limit()' API provided by
libvmmapi.

Submitted by: Dinakar Medavaram (initial version)
Reviewed by: grehan
Obtained from: NetApp


# 1e7d750c 15-Mar-2013 Neel Natu <neel@FreeBSD.org>

Change the type of 'ndesc' from 'int' to 'uint16_t' so that descriptor index
wraparound is handled correctly.

The gory details are available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-March/001119.html

This fixes a regression introduced in r247871.

Pointed out by: Bruce Evans, Chris Torek


# 6be7c5e3 06-Mar-2013 Peter Grehan <grehan@FreeBSD.org>

Simplify virtio ring num-available calculation.

Submitted by: Chris Torek, torek at torek dot net


# 91039bb2 28-Feb-2013 Neel Natu <neel@FreeBSD.org>

Specify the length of the mapping requested from 'paddr_guest2host()'.

This seems prudent to do in its own right but it also opens up the possibility
of not having to mmap the entire guest address space in the 'bhyve' process
context.

Discussed with: grehan
Obtained from: NetApp


# aa12663f 31-Jan-2013 Neel Natu <neel@FreeBSD.org>

Fix a bug in the passthru implementation where it would assume that all
devices are MSI-X capable. This in turn would lead it to treat bar 0 as
the MSI-X table bar even if the underlying device did not support MSI-X.

Fix this by providing an API to query the MSI-X table index of the emulated
device. If the underlying device does not support MSI-X then this API will
return -1.

Obtained from: NetApp


# c9b4e987 29-Jan-2013 Neel Natu <neel@FreeBSD.org>

Add support for MSI-X interrupts in the virtio network device and make that
the default.

The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "true". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.

Submitted by: Gopakumar T
Obtained from: NetApp


# e285ef8d 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Rename fbsdrun.* -> bhyverun.*

bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.

(renaming internal routines will come later)

Reviewed by: neel
Obtained from: NetApp


# 31f78e49 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Properly reset the tx/rx rings when a guest requests a device reset.

Obtained from: NetApp


# 100ace48 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Create unique MAC addresses for virtio devices that are
created with non-zero PCI function numbers.

Remove obsolete reference to CFE.

Obtained from: NetApp


# 4d1e669c 19-Oct-2012 Peter Grehan <grehan@FreeBSD.org>

Rework how guest MMIO regions are dealt with.

- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.

- Support memory-mapped BARs in PCI emulation.

mem.c/h - memory region interface

instruction_emul.c/h - remove old region interface.
Use gpa from EPT exit to avoid a tablewalk to
determine operand address. Determine operand size
and use when calling through to region handler.

fbsdrun.c - call into region interface on paging
exit. Distinguish between instruction emul error
and region not found

pci_emul.c/h - implement new BAR callback api.
Split BAR alloc routine into routines that
require/don't require the BAR phys address.

ioapic.c
pci_passthru.c
pci_virtio_block.c
pci_virtio_net.c
pci_uart.c - update to new BAR callback i/f

Reviewed by: neel
Obtained from: NetApp


# 6214e48c 07-Jun-2011 Peter Grehan <grehan@FreeBSD.org>

Allow access to the device's config area with any size i/o access at any
offset. This is now spec-compliant.


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski