History log of /freebsd-current/usr.sbin/bhyve/bhyverun.h
Revision Date Author Comments
# 390e4498 29-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix handling of -r

Just make "restore_file" a global variable so that it can be set by the
MD option handler.

Reviewed by: corvink
Reported by: bdrewery
Fixes: 981f9f7495bb ("bhyve: Push option parsing down into bhyverun_machdep.c")
Differential Revision: https://reviews.freebsd.org/D44974


# 981f9f74 03-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Push option parsing down into bhyverun_machdep.c

After a couple of attempts I think this is the cleanest approach despite
the expense of some code duplication. Quite a few of the single-letter
bhyve options are x86-specific.

I think that going forward we should strongly discourage the addition of
new options and instead configure guests using the more general
configuration file syntax.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41753


# f82af74c 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move most early initialization into an MD routine

Prior to initializing PCI devices, main() calls a number of
initialization routines, many of which are amd64-specific. Move this
list of calls to bhyverun_machdep.c. Similarly, add an MD function to
handle late initialization.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40989


# e20b74da 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move vcpu initialization into a MD source file

- Make handling of x86 config options, like x86.x2apic, conditional to
amd64.
- Move fbsdrun_set_capabilities() and spinup_vcpu() to a new file,
bhyverun_machdep.c. The moved code is all highly x86 specific.

I'm not sure how best to handle the namespace. I'm using "bhyve_" for
MD functions called from MI code. We also have "fbsdrun_" for some MI
routines that are typically called from MD code. The file name is
prefixed by "bhyverun_".

Reviewed by: corvink
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40987


# 72f9c9d8 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Split vmexit handling into a separate file

Put it in amd64, since most of it is MD and won't be used on arm64. Add
a bit of glue to bhyverun.h to make CPU startup and shutdown work
without having to export more global variables. AP startup will be
reworked further in a future revision.

This makes bhyverun.c much more machine-independent.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40556


# 4f2bd402 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Start moving machine-dependent code into subdirectories

In preparation for an arm64 port, make an easy change which puts some
machine-dependent code in its own directory.

Going forward, code which is only used on one platform should live in a
MD directory. We should strive to layer modules in such a way as to
avoid polluting shared code with lots of ifdefs. For some existing
files this will take some effort.

task_switch.c and fwctl.c are an easy place to start: the former is very
x86-specific, and the latter provides an I/O port interface which can't
be used on anything other than x86. (fwcfg as implemented has the same
problem, but QEMU also supports a MMIO fwcfg interface.) So I propose
that we start by simply making those files conditional.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40501


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# e17eca32 23-May-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Avoid embedding cpuset_t ioctl ABIs

Commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 7d9ef309 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

libvmmapi: Add a struct vcpu and use it in most APIs.

This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124


# 461663dd 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Simplify setting vCPU capabilities.

- Enable VM_CAP_IPI_EXIT in fbsdrun_set_capabilities along with other
capabilities enabled on all vCPUs.

- Don't call fbsdrun_set_capabilities a second time on the BSP in
spinup_vcpu.

- To preserve previous behavior, don't unconditionally enable
unrestricted guest mode on the BSP (this unbreaks single-vCPU guests
on Nehalem systems, though supporting such setups is of dubious
value). Other places that enbale UG on the BSP are careful to check
the result of the operation and fail if it is not available.

- Don't set any capabilities in spinup_ap(). These are now all
redundant with earlier settings from spinup_vcpu().

- While here, axe a stale comment from fbsdrun_addcpu(). This
function is now always called from the main thread for all vCPUs.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37642


# 3b6cb9b4 08-Sep-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Avoid shadowing global variables in bhyverun.c

- Rename the global cores/sockets/threads to cpu_cores/sockets/threads.
This way, num_vcpus_allowed() doesn't shadow them.
- The global maxcpus is unused, remove it for the same reason.

MFC after: 1 week


# f703dc0e 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Put the prototype for vmexit_task_switch() in a header

No functional change intended.

MFC after: 1 week


# 9cc9abf4 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: create all vcpus on startup

vcpus could be restarted by the guest by sending an INIT SIPI SIPI
sequence to a vcpu. That's not supported by bhyve yet but it will be
supported in a future commit. So, create the vcpu threads only once on
startup to make restarting a vcpu easier.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35621
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 3facfc75 25-Apr-2019 Rodney W. Grimes <rgrimes@FreeBSD.org>

Make bhyve SMBIOS table topology aware

When the CPU Topology was added to bhyve in r332298 the SMBIOS table was
missed, this table passes topology information to the system and was still
using the old concept of each vCPU is a socket with 1 core and 1 thread.
This code did not even try to use the old sysctl information to adjust
this data.

Correct that by building a proper SMBios table, mapping the > 254 cases to
0 per the SMBios 2.6 specification that is claimed by the structure.

Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: bde and/or phk (mentor), jhb (maintainer)
MFC: 3 days
Differential Revision: https://reviews.freebsd.org/D18998


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 7e12dfe5 06-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Fix CTASSERT issue in a more clean way

- Replace all CTASSERT macro instances with static_assert's.
- Remove the WRAPPED_CTASSERT macro; it's now an unnecessary obfuscation.
- Localize all static_assert's to the structures being tested.
- Sort some headers per-style(9).

Approved by: re (hrs)
Differential Revision: https://reviews.freebsd.org/D7130
MFC after: 1 week
X-MFC with: r302364
Reviewed by: ed, grehan (maintainer)
Submitted by: ed
Sponsored by: EMC / Isilon Storage Division


# d087a399 17-Jan-2015 Neel Natu <neel@FreeBSD.org>

Simplify instruction restart logic in bhyve.

Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks


# afd5e8ba 25-Jul-2014 Neel Natu <neel@FreeBSD.org>

Simplify the meaning of return values from the inout handlers. After this
change 0 means success and non-zero means failure.

This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.

CR: D480
Reviewed by: grehan, jhb


# 3d5444c8 16-Jul-2014 Neel Natu <neel@FreeBSD.org>

Add emulation for legacy x86 task switching mechanism.

FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by: glebius


# 0826d045 20-Mar-2014 Neel Natu <neel@FreeBSD.org>

Use 'cpuset_t' to represent the vcpus active in a virtual machine.


# af5bfc53 04-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Add SMBIOS support.

A new option, -U, can be used to set the UUID in the System
Information (Type 1) structure. Manpage fix to follow.

Approved by: grehan (co-mentor)


# 062b878f 17-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Changes required for OpenBSD/amd64:

- Allow a hostbridge to be created with AMD as a vendor.
This passes the OpenBSD check to allow the use of MSI
on a PCI bus.
- Enable the i/o interrupt section of the mptable, and
populate it with unity ISA mappings. This allows the
'legacy' IRQ mappings of the PCI serial port to be
set up. Delete unused print routine that was obscuring code.
- Use the '-W' option to enable virtio single-vector MSI
rather than an environment variable. Update the virtio
net/block drivers to query this flag when setting up
interrupts.: bhyverun.c
- Fix the arithmetic used to derive the century byte in
RTC CMOS, as well as encoding it in BCD.

Reviewed by: neel
MFC after: 3 days


# 49cc03da 16-Oct-2013 Neel Natu <neel@FreeBSD.org>

Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by: grehan
MFC after: 3 days


# 94c3b3bf 04-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Remove obsolete cmd-line options and code associated with
these.
The mux-vcpus option may return at some point, given it's utility
in finding bhyve (and FreeBSD) bugs.

Approved by: re@ (blanket)
Discussed with: neel@


# b060ba50 18-Mar-2013 Neel Natu <neel@FreeBSD.org>

Simplify the assignment of memory to virtual machines by requiring a single
command line option "-m <memsize in MB>" to specify the memory size.

Prior to this change the user needed to explicitly specify the amount of
memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).

The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.

The start of the PCI hole is fixed at 3GB and cannot be directly changed
using command line options. However it is still possible to change this in
special circumstances via the 'vm_set_lowmem_limit()' API provided by
libvmmapi.

Submitted by: Dinakar Medavaram (initial version)
Reviewed by: grehan
Obtained from: NetApp


# 91039bb2 28-Feb-2013 Neel Natu <neel@FreeBSD.org>

Specify the length of the mapping requested from 'paddr_guest2host()'.

This seems prudent to do in its own right but it also opens up the possibility
of not having to mmap the entire guest address space in the 'bhyve' process
context.

Discussed with: grehan
Obtained from: NetApp


# e285ef8d 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Rename fbsdrun.* -> bhyverun.*

bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.

(renaming internal routines will come later)

Reviewed by: neel
Obtained from: NetApp