History log of /freebsd-current/usr.sbin/bhyve/pci_e82545.c
Revision Date Author Comments
# 4d65a7c6 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

usr.sbin: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 0f735657 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx member from struct vm_snapshot_meta.

This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel. For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument. As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38125


# 6a284cac 19-Jan-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096


# 78c2cd83 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove unused vcpu argument from PCI read/write methods.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37652


# e7cd5fff 28-Nov-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Fix sign compare warnings in the e1000 device model.

Adding a bare constant to a uint16_t promotes to a signed int which
triggers these warnings. Changing the constant to be explicitly
unsigned instead promotes the expression to unsigned int.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37485


# ed721684 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some signed/unsigned comparison warnings

MFC after: 1 week


# cea34d07 25-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address signed/unsigned comparison warnings in the e1000 model

No functional change intended.

MFC after: 1 week


# 03f7ccab 25-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Avoid arithmetic on void pointers

No functional change intended.

MFC after: 1 week


# eb805f4e 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate an unused function as such

No functional change intended.

MFC after: 1 week


# 63898728 22-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Avoid arithmetic on void pointers

No functional change intended.

MFC after: 1 week


# 98d920d9 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate unused function parameters

MFC after: 1 week


# 7afe342d 29-Aug-2022 John Baldwin <jhb@FreeBSD.org>

bhyve e1000: Sanitize transmit ring indices.

When preparing to transmit pending packets, ensure that the head (TDH)
and tail (TDT) indices are in bounds. Note that validating values
when they are written is not sufficient along as the transmit length
(TDLEN) could be changed turning a value that was valid when written
into an out of bounds value.

While here, add further restrictions to the head register (TDH). The
manual states that writing to this value while transmit is enabled can
cause unexpected behavior and that it should only be written after a
reset. As such, ignore attempts to write while transmit is active,
and also ignore writes of non-zero values. Later e1000 chipsets have
this register as read-only.

Also ignore any attempts to transmit packets if the transmit ring's
size is zero.

PR: 264567
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36269


# fa46f370 17-Aug-2022 John Baldwin <jhb@FreeBSD.org>

bhyve e1000: Skip packets with a small header.

Certain operations such as checksum insertion and VLAN insertion
require the device model to rewrite the packet header. The first step
in rewriting the packet header is to copy the existing packet header
from the source packet. This copy is done by copying data from an
iovec array that corresponds to the S/G entries described by transmit
descriptors. However, if the total packet length is smaller than the
headers that need to be copied as the initial template, this copy can
overflow the iovec array and use garbage values as the source pointer
to memcpy. The PR used a single descriptor with a length of 0 in its
PoC.

To fix, track the total packet length and drop requests to transmit
packets whose payload is smaller than the required header length.

While here, fix another issue where the final descriptor could have an
invalid length (too short) that could underflow 'len' when stripping
the checksum. Skip those requests instead, too.

PR: 264372
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: grehan, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36182


# 37045dfa 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Mark variables and functions as static where appropriate

Mark them const as well when it makes sense to do so. No functional
change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# b0aa20be 05-Apr-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: validate e82545 checksum offset field

Reported by: Mehdi Talbi, Synacktiv


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# 2616ee60 06-Dec-2021 Robert Wing <rew@FreeBSD.org>

bhyve: fix -Wunused-but-set-variable warning

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33306


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 60dc6bee 16-Oct-2020 Ryan Moeller <freqlabs@FreeBSD.org>

bhyve: Update TX descriptor base address and host mapping on change

bhyve sometimes segfaults when using an e1000 NIC with a Windows guest.

We are only updating our tdba and cached host mapping when the low address
register is written and when tx is set enabled, but not when the high address
or length registers are written. It is observed that Windows 10 is occasionally
enabling tx first then writing the registers in the order low, high, len. This
leaves us with a bogus base address and mapping, which causes a segfault later
when we try to copy from a descriptor that has unpredictable garbage in a
pointer.

Updating the address and mapping when any of those registers change seems to fix
that particular issue.

Reviewed by: mav, grehan (bhyve)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26798


# 5bebe923 08-May-2020 Aleksandr Fedorov <afedorov@FreeBSD.org>

bhyve: Pass the full string of options to the network backends.

Reviewed by: vmaffione
Approved by: vmaffione (mentor)
Sponsored by: vstack.com
Differential Revision: https://reviews.freebsd.org/D24735


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 66c662b0 12-Feb-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: move virtio-net header processing to pci_virtio_net

This patch cleans up the API between the net frontends (e1000,
virtio-net) and the net backends (tap and netmap).
We move the virtio-net header stripping/prepending to the
virtio-net code, where this functionality belongs.
In this way, the netbe_send() and netbe_recv() signatures
can have const struct iov * rather than struct iov *.

Reviewed by: grehan, bcr, aleksandr.fedorov@itglobal.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23342


# 332eff95 08-Jan-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add wrapper for debug printf statements

Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657


# 79c1428e 02-Dec-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: uniform printf format string newlines

Some of the printf statements only use LF to get a newline. However, a CR character is also required for the serial console to print debug logs in a nice way.
Fix those code locations that only use LF, by adding a CR character.

Reviewed by: markj, aleksandr.fedorov@itglobal.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22552


# 3e11768e 03-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add backend rx backpressure to virtio-net

If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by: bryanv, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20987


# ed9ffd2f 05-Aug-2019 John Baldwin <jhb@FreeBSD.org>

Validate guest-supplied length of headers for TSO transmit requests.

When transmitting a large TCP packet, the final transmit descriptor
includes the length of the protocol headers to be duplicated on each
segment. The device model was trusting the guest-supplied value
without validating it. A value of zero would result in the guest
being able to indirect a garbage pointer on the stack to overwrite
arbitrary memory in the bhyve process. A value that was non-zero but
too small for the requested parameters resulted in the device model
reading and writing values beyond the end of the on-stack buffer used
to hold the template header.

To fix, validate the supplied length and drop requests to transmit
packets that would overflow the header buffer. While here, initialize
the header pointer to NULL as a preventive measure so that any access
to an unallocated template header crashes they hypervisor
deterministically.

While here, only read the TCP sequence number if the packet being
split is a TCP packet. The e1000 logic supports a segmentation of UDP
frames, and while UDP segmentation requires this part of the header to
be valid (so there is no buffer overflow), only reading the field when
needed is cleaner.

admbugs: 918
Reported by: Reno Robert <renorobert@gmail.com>
Reviewed by: markj
Approved by: so
Security: CVE-2019-5609


# 0ff7076b 06-Jul-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: abstraction for network backends

Bhyve can currently emulate two virtual NICs, namely virtio-net and e1000,
and connect to the host network through two backends, namely tap and netmap.
However, there is no interface between virtual NIC functionalities and
backend functionalities. As a result, the backend code is duplicated between
the two virtual NIC implementations and also within the same virtual NIC.
Also, e1000 cannot currently use netmap as a backend.
This patch introduces a network backend API between virtio-net/e1000 and
tap/netmap, to improve code reuse and add missing functionalities.
Virtual NICs and backends can negotiate virtio-net features, such as checksum
offload and TSO. If the backend supports the features, it will propagate this
information to the guest, so that the latter can make use of them. Currently,
only netmap VALE ports support the features, but support should be added to
tap in the future.

Reviewed by: jhb, bryanv
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20659


# 4f7c3b7b 13-Jun-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: move common code to net_utils.c

Both virtio_net and e82545 network frontends have code to validate and
generate MAC addresses. These functionalities are replicated in the two
files, so we move them in a separate compilation unit.

Reviewed by: rgrimes, bryanv, imp, kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20626


# abfa3c39 15-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744


# 989e062b 10-Jul-2018 Marcelo Araujo <araujo@FreeBSD.org>

Improve bhyve exit(3) error code.

The bhyve(8) exit status indicates how the VM was terminated:

0 rebooted
1 powered off
2 halted
3 triple fault

The problem is when we have wrappers around bhyve that parses the exit
error code and gets an exit(1) for an error but interprets it as "powered off".
So to mitigate this issue and makes it less error prone for third part
applications, I have added a new exit code 4 that is "exited due to an error".

For now the bhyve(8) exit status are:
0 rebooted
1 powered off
2 halted
3 triple fault
4 exited due to an error

Reviewed by: @jhb
MFC after: 2 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D16161


# f7224b70 13-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix style(9) space vs tab.

Reviewed by: jhb
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15768


# ce80faa4 12-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Add SPDX tags to bhyve(8).

Discussed with: rgrimes, pfg and mav.
Obtained from: TrueOS
MFC after: 4 weeks.
Sponsored by: iXsystems Inc.


# 558e4950 28-Jul-2017 Ryan Libby <rlibby@FreeBSD.org>

bhyve/pci_e82545.c: squelch gcc warning for noreturn procedure

Gcc complained that e82545_tx_thread has a return type declared but
doesn't return anything. Annotate the procedure with _Noreturn.

Reviewed by: grehan
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11774


# 00ef17be 14-Feb-2017 Bartek Rutkowski <robak@FreeBSD.org>

Capsicum support for bhyve(8).

Adds Capsicum sandboxing to bhyve.

Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, oshogbo
Approved by: emaste, grehan
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D8290


# 9287c032 29-Aug-2016 Marcelo Araujo <araujo@FreeBSD.org>

Invert calloc(3) argument order.

Reviewed by: grehan, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D7613


# ee7230f4 16-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Increase I82545_MAX_TXSEGS from 20 to 64 and add checks for it.

There seems no hard limit on number of segments per packet in the chip,
and 20 appeared insufficient. Hope 64 will be enough, but if not -- add
check to report that and drop the packet instead of corrupting stack.


# e95b7573 12-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Make unknown register reads predictable.

Reported by: Coverity
CID: 1357525


# a88b19f9 12-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Add missing breaks in I/O BAR read/write.

This could be important if any guest actually used those registers.

Reported by: Coverity
CID: 1357519, 1357520


# 9e749f25 09-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Add emulation for Intel e1000 (e82545) network adapter.

The code was successfully tested with FreeBSD, Linux, Solaris and Windows
guests. This interface is predictably slower (about 2x) then virtio-net,
but it is very helpful for guests not supporting virtio-net by default.

Thanks to Jeremiah Lott and Peter Grehan for doing original heavy lifting.