History log of /freebsd-current/usr.sbin/bhyve/bhyverun.c
Revision Date Author Comments
# 1787871a 13-May-2024 Pierre Pronchery <pierre@freebsdfoundation.org>

bhyve: avoid resource leak

In bhyve_parse_config_option(), a string is allocated and passed to
nvlist_add_string() but not free'd afterwards.

Reported by: Coverity
CID: 1544049
Sponsored by: The FreeBSD Foundation

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1234


# 390e4498 29-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix handling of -r

Just make "restore_file" a global variable so that it can be set by the
MD option handler.

Reviewed by: corvink
Reported by: bdrewery
Fixes: 981f9f7495bb ("bhyve: Push option parsing down into bhyverun_machdep.c")
Differential Revision: https://reviews.freebsd.org/D44974


# 981f9f74 03-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Push option parsing down into bhyverun_machdep.c

After a couple of attempts I think this is the cleanest approach despite
the expense of some code duplication. Quite a few of the single-letter
bhyve options are x86-specific.

I think that going forward we should strongly discourage the addition of
new options and instead configure guests using the more general
configuration file syntax.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41753


# ff50e9d5 03-Apr-2024 Andrew Turner <andrew@freebsd.org>

bhyve: Add bhyverun and vmexit handlers for arm64

Reviewed by: corvink, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41006


# 4d65a7c6 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

usr.sbin: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 7de58287 17-Oct-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: Remove init_snapshot() and initialize static vars

vCPU threads are starting before init_snapshot() is called. That can lead
to corruption of vcpu_lock userspace mutex (snapshot.c) and then VM hangs
in acquiring that mutex.

init_snapshot() initializes only static variables (mutex, cv) and that
code can be optimized and removed.

Fixes: 9a9a248964696 ("bhyve: init checkput before caph_enter")
Reviewed by: markj
MFC after: 1 week
Sponsored by: vStack


# b0936440 16-Oct-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Replace many fprintf(stderr, ...) calls with EPRINTLN

EPRINTLN handles newlines appropriately when stdout/stderr have been
reused as the backend for a serial port.

For bhyverun.c itself, the rule this attempts to follow is to use
regular fprintf/perror/warn/err prior to init_pci() (which is when
serial ports are configured) and to switch to EPRINTLN afterwards.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D42182


# 7228ad8d 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move the vm_inject_fault() implementation to vmexit.c

This function isn't generic and has a different signature on arm64. No
functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40991


# f82af74c 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move most early initialization into an MD routine

Prior to initializing PCI devices, main() calls a number of
initialization routines, many of which are amd64-specific. Move this
list of calls to bhyverun_machdep.c. Similarly, add an MD function to
handle late initialization.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40989


# e20b74da 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move vcpu initialization into a MD source file

- Make handling of x86 config options, like x86.x2apic, conditional to
amd64.
- Move fbsdrun_set_capabilities() and spinup_vcpu() to a new file,
bhyverun_machdep.c. The moved code is all highly x86 specific.

I'm not sure how best to handle the namespace. I'm using "bhyve_" for
MD functions called from MI code. We also have "fbsdrun_" for some MI
routines that are typically called from MD code. The file name is
prefixed by "bhyverun_".

Reviewed by: corvink
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40987


# ca2cda98 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Make gdb support optional

Add a BHYVE_GDB_SUPPORT make variable that can be set by per-arch
makefiles. When set, BHYVE_GDB is defined and can be used as a
preprocessor predicate. Use it to guard gdb stub calls in MI code.

The arm64 bhyve port currently does not have a functional gdb stub, but
that's not critical to landing the port, so this mechanism slightly
reduces the friction of adding support for a new platform.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40986


# 31cf78c9 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Make most I/O port handling specific to amd64

- The qemu_fwcfg interface, as implemented, is I/O port-based, but QEMU
implements an MMIO interface that we'll eventually want to port for
arm64.
- Retain support for I/O space PCI BARs, simply treat them like MMIO
BARs for most purposes, similar to what the arm64 kernel does. Such
BARs are created by virtio devices.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40741


# 55c13f6e 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move legacy PCI interrupt handling under amd64/

Specifically, move IO-APIC, LPC and PIRQ routing code under amd64/.

Use ifdefs to conditionally compile related code in other files. In
particular, legacy PCI interrupt handling is now compiled only on amd64.
This is not too invasive, but suggestions for a more modular approach
would be appreciated.

I am not sure why qemu fwcfg handling is tied to LPC, and I suspect it
should be decoupled. In this commit I just apply an ifdef hammer, but
we will eventually want fwcfg on arm64 as well.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40739


# 75d1e855 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move power management code to amd64/

This implements various x86-specific interfaces. No functional change
intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40735


# a7f6c2ff 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move the RTC driver to amd64/

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40734


# 548b1122 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move MSR emulation into amd64/

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40733


# 72f9c9d8 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Split vmexit handling into a separate file

Put it in amd64, since most of it is MD and won't be used on arm64. Add
a bit of glue to bhyverun.h to make CPU startup and shutdown work
without having to export more global variables. AP startup will be
reworked further in a future revision.

This makes bhyverun.c much more machine-independent.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40556


# a1642451 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move kernemu to amd64/

This code handles instruction emulation for accesses to various
amd64-specific MMIO regions.

No functional change intended.

Reviewed by: corvink, jhb, emaste
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40554


# 4fe5b70c 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move more amd64-specific code under amd64/

mptable and the e820 are both rather amd64-specific and can be moved
easily.

In the case of e820, move the registration with qemu_fwcfg into e820.c,
as it simplifies bhyverun.c a bit and I can't see any downsides.

No functional change intended.

Reviewed by: corvink, jhb, emaste
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40552


# f927afc1 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move some more amd64-specific drivers to their own subdir

No functional change intended.

Reviewed by: corvink, jhb, emaste
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40551


# 4f2bd402 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Start moving machine-dependent code into subdirectories

In preparation for an arm64 port, make an easy change which puts some
machine-dependent code in its own directory.

Going forward, code which is only used on one platform should live in a
MD directory. We should strive to layer modules in such a way as to
avoid polluting shared code with lots of ifdefs. For some existing
files this will take some effort.

task_switch.c and fwctl.c are an easy place to start: the former is very
x86-specific, and the latter provides an I/O port interface which can't
be used on anything other than x86. (fwcfg as implemented has the same
problem, but QEMU also supports a MMIO fwcfg interface.) So I propose
that we start by simply making those files conditional.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40501


# 6a0e7f90 08-Sep-2023 Corvin Köhne <corvink@FreeBSD.org>

bhyve: always generate ACPI tables

Most systems don't work properly without sane ACPI tables. Therefore,
we're always generating them.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D41778


# 6f7e9779 27-Jul-2022 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add config option to load ACPI tables into memory

For backward compatibility, the ACPI tables are loaded into the guest
memory. Windows scans the memory, finds the ACPI tables and uses them.
It ignores the ACPI tables provided by the UEFI. We are patching the
ACPI tables in the guest memory, so that's mostly fine. However, Windows
will break when the ACPI tables become to large or when we add entries
which can't be patched by bhyve. One example of an unpatchable entry, is
a TPM log. The TPM log has to be allocated by the guest firmware. As the
address of the TPM log is unpredictable, bhyve can't assign it in the
memory version of the ACPI tables. Additionally, this makes it
impossible for bhyve to calculate a correct checksum of the table.

By default ACPI tables are still loaded into guest memory for backward
compatibility. The new acpi_tables_in_memory config value can be set to
false to avoid this behaviour.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39979


# 67c26eb2 07-Oct-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add cmdline option for TPM emulation

At the moment, only a TPM passthru is supported. The cmdline looks like:

-l tpm,passthru,/dev/tpm0

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D32961


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# af20d58e 05-Jul-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Remove an unneeded vm_get_register() call in main()

At one point the RIP value was passed to fbsdrun_addcpu(), but this is
no longer the case. No functional change intended.

Reviewed by: jhb, corvink
Sponsored by: Innovate UK
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D40988


# 0855749d 17-Jul-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix whitespace in bhyverun.c

No functional change intended.

MFC after: 1 week


# 1e8d0c6c 21-Jun-2023 Corvin Köhne <corvink@FreeBSD.org>

Revert "bhyve: add command line parameter and parsing for migration"

Unfortunately, this feature didn't receive much feedback in the past.
However, after committing this, some people came up and complain that
this feature requires some more discussion before upstreaming it.
Additionally, it wasn't a good idea to start this new feature by adding
a new command line parameter as it fixes the user interface.

This reverts commit c9fdd4f3cc18c03683de85318ba8d318f96b58c4.


# eb9fac0e 19-Jun-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Refactor vmexit_suspend() a bit

Move some of its logic into fbsdrun_deletecpu(). This makes it easier
to split vmexit handlers into a separate file, which in turn makes
landing arm64 support easier. Also increase the scope of the mutex and
use it to synchronize updates to the vcpu mask. No functional change
intended.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40573


# 15c1f0cc 19-Jun-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Register hlt and pause vmexit handlers unconditionally

These exit handlers might not be used if the corresponding VM
capabilities are not set, but there is no harm in putting them into the
handler table regardless. Doing so simplifies initialization code,
makes it easier to split vmexit handlers into a separate file, and lets
us declare the handler table as const.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40572


# c9fdd4f3 19-Jun-2023 Mihai Burcea <mihaiburcea15@gmail.com>

bhyve: add command line parameter and parsing for migration

This covers warm and live migration.

Reviewed by: corvink
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D34717


# b10d65a4 15-May-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: rename 'user_dev' with 'devices'

Bhyve don't use 'user' specifier for emulated devices. And
using 'user' adds duality.

Reviewed by: corvink, rew
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D40106


# e1390215 14-Jun-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Remove special no-op handling for I/O port 0x488

This appears to have been reserved for some kind of debug hook, but it's
not implemented and appears never to have been used.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40555


# bb177010 11-Jun-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vestigial support for setting max vCPUs.

The kernel part of the hypervisor is not going to support per-VM maxcpu
limits. The topology is only used to control the values returned by
CPUID leaves for which max vCPUs is not relevant.

Reviewed by: corvink, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37176


# 0dc159ce 01-Jun-2023 Elyes Haouas <ehaouas@noos.fr>

bhyve: Fix typos

Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/653


# 80764cdb 23-May-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Remove the exitcode stats structure

This structure isn't used for anything, and only counts a subset of
vmexit types. Moreover, it is not accurate since there is no
synchronization between vcpu threads. Simply remove it.

No functional change intended.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40245


# e17eca32 23-May-2023 Mark Johnston <markj@FreeBSD.org>

vmm: Avoid embedding cpuset_t ioctl ABIs

Commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# ca14781c 08-Sep-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add cmdline option for user defined fw_cfg items

Some guest allow to configure themself by fw_cfg. E.g. Fedora CoreOs can
be provisioned by adding a JSON file as fw_cfg item.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38338


# 16f23f75 09-Sep-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: pass E820 table to guest

E820 table will be used to report valid RAM ranges and reserve special
memory areas like graphics memory for GPU passthrough.

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39550


# 0f735657 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx member from struct vm_snapshot_meta.

This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel. For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument. As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38125


# 7d9ef309 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

libvmmapi: Add a struct vcpu and use it in most APIs.

This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124


# ef0ac973 22-Mar-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Sleep briefly in the VMEXIT_DEBUG handler

As of commit 0bda8d3e9f7a ("vmm: permit some IPIs to be handled by
userspace") and commit 9cc9abf409cc ("bhyve: create all vcpus on
startup"), we have a misbehaviour where AP vCPU threads spin until they
receive a SIPI. In particular, since they are "suspended", they simply
call the VMEXIT_DEBUG handler in a loop, but the handler is a no-op by
default.

This is tricky to fix since the gdb stub isn't aware of whether a given
vCPU is supposed to be running. For 13.2's sake, introduce a simple
workaround wherein the VMEXIT_DEBUG handler sleeps for a short period.
This ensures that host CPU usage remains sane when VMs are starting
without penalizing users of VMEXIT_DEBUG too much.

Reviewed by: corvink, jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39174


# d85147f3 18-Aug-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add cmdline option to enable qemu's fwcfg

Let the user decide if he wants to use bhyve's fwctl or qemu's fwcfg. He
can set the interface by adding a fwcfg option to bootrom:

-l bootrom,<path/to/rom>,fwcfg=bhyve
-l bootrom,<path/to/rom>,fwcfg=qemu

Reviewed by: markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38337


# 9a9a2489 06-Mar-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: init checkput before caph_enter

init_checkpoint_thread binds to a socket. Bhyve isn't allowed to do that
after caph_enter.

Reviewed by: corvink, markj
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D38857


# d213429e 06-Mar-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: exit with EX_OSERR if init checkpoint or restore time failed

Reviewed by: corvink, markj
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D38872


# 956171d5 01-Mar-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: remove redundant variable

Reviewed by: corvink,markj
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D38836


# 9ff3e8b7 28-Feb-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: fix resume for vms with guest_ncpus > 1

This error occurs because vm->vcpu[1] has not been allocated yet when
vm_snapshot_vm() is called.

To fix this, move spinup_vcpu() before restore code.

Reviewed by: corvink, markj
MFC after: 2 weeks
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D38477


# 6a284cac 19-Jan-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096


# 7224a96a 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Tidy vCPU pthread startup.

Set the thread affinity in fbsdrun_start_thread next to where the
thread name is set. This keeps all the pthread initialization
operations at the start of a thread in one place.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37646


# 84874437 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Don't access vcpumap[vcpu] directly in parse_cpuset().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37645


# a20c00c6 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Allocate struct vm_exit on the stack in vm_loop.

The global vmexit[] array is no longer needed to smuggle the rip
value from fbsdrun_addcpu() to vm_loop().

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37644


# ceb0d0b0 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove some no-op code for setting RIP.

fbsdrun_addcpu() read the current vCPU's RIP register from the kernel
via vm_get_register() to pass along through some layers to vm_loop()
which then set the register via vm_set_register(). However, this is
just always setting the value back to itself.

Reviewed by: corvink
Differential Revision: https://reviews.freebsd.org/D37643


# 461663dd 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Simplify setting vCPU capabilities.

- Enable VM_CAP_IPI_EXIT in fbsdrun_set_capabilities along with other
capabilities enabled on all vCPUs.

- Don't call fbsdrun_set_capabilities a second time on the BSP in
spinup_vcpu.

- To preserve previous behavior, don't unconditionally enable
unrestricted guest mode on the BSP (this unbreaks single-vCPU guests
on Nehalem systems, though supporting such setups is of dubious
value). Other places that enbale UG on the BSP are careful to check
the result of the operation and fail if it is not available.

- Don't set any capabilities in spinup_ap(). These are now all
redundant with earlier settings from spinup_vcpu().

- While here, axe a stale comment from fbsdrun_addcpu(). This
function is now always called from the main thread for all vCPUs.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37642


# 007d9ca5 21-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove handler for VM_EXITCODE_SPINUP_AP.

Since commit 0bda8d3e9f7a, bhyve always enables VM_EXITCODE_IPI exits
instead, so this handler is no longer used.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37640


# ed721684 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some signed/unsigned comparison warnings

MFC after: 1 week


# e008f5be 25-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix a typo in a function name

MFC after: 1 week


# 3b6cb9b4 08-Sep-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Avoid shadowing global variables in bhyverun.c

- Rename the global cores/sockets/threads to cpu_cores/sockets/threads.
This way, num_vcpus_allowed() doesn't shadow them.
- The global maxcpus is unused, remove it for the same reason.

MFC after: 1 week


# fb7ce0a9 24-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Use the new vm_limit_rights() interface

This addresses a compiler warning arising from the fact that bhyve
needs to cast away a const qualifier in order to call free().

No functional change intended.

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D37099


# f703dc0e 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Put the prototype for vmexit_task_switch() in a header

No functional change intended.

MFC after: 1 week


# 4a1c23a7 22-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some warnings in bhyverun.c

- Annotate unused parameters as such.
- Avoid shadowing the global "vmexit".

No functional change intended.

MFC after: 1 week


# 0bda8d3e 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

vmm: permit some IPIs to be handled by userspace

Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and STARTUP IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 65b8109b 08-Sep-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some warnings in bhyverun.c

- Add const and __unused qualifiers where appropriate.
- Localize some global variables.
- Consistently spell vmexit state as "vme" in vmexit handlers, to avoid
shadowing the global vm_exit state array.
- Similarly, avoid shadowing "optarg".

MFC after: 2 weeks


# 10c6af34 13-Sep-2022 Filipe da Silva Santos <contact@mail.shiori.com.br>

bhyve: Fix build when BHYVE_SNAPSHOT is set

Fixes: 9cc9abf409cc ("bhyve: create all vcpus on startup")
Sponsored by: Beckhoff Automation GmbH & Co. KG
X-MFC-With: 9cc9abf409cc


# 3fc17484 09-Sep-2022 Emmanuel Vadot <manu@FreeBSD.org>

Revert "vmm: permit some IPIs to be handled by userspace"

This reverts commit a5a918b7a906eaa88e0833eac70a15989d535b02.

This cause some problem with vm using bhyveload.

Reported by: pho, kp


# a5a918b7 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

vmm: permit some IPIs to be handled by userspace

Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35623
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 9cc9abf4 07-Sep-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: create all vcpus on startup

vcpus could be restarted by the guest by sending an INIT SIPI SIPI
sequence to a vcpu. That's not supported by bhyve yet but it will be
supported in a future commit. So, create the vcpu threads only once on
startup to make restarting a vcpu easier.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35621
Sponsored by: Beckhoff Automation GmbH & Co. KG


# e16b709e 16-Jun-2022 James Mintram <me@jamesrm.com>

bhyve: Report an error for invalid UUIDs.

Reviewed by: rgrimes, grehan, jhb
Differential Revision: https://reviews.freebsd.org/D30050


# afd4f7fa 10-Mar-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve/usage: memory size is not in MB

For backward compatibility, the memory size will be interpreted in MB if
it's smaller than1 MB and has no suffix. Nowadays, the -m switch accepts
more than just MB. Respect it in the usage message.

Differential Revision: https://reviews.freebsd.org/D34506
Reviewed by: grehan
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month


# c76e4b89 09-Mar-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Use vm_get_topology to query kernel's maximum vCPU count.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34493


# fd6f9294 09-Mar-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Don't force an upper bound on vCPUs when parsing pinning.

Even today it is possible to specify pinning for a vCPU higher than
the configured number of CPUs but lower than VM_MAXCPU without raising
an error.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34492


# 7261f821 09-Mar-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Allocate dynamic arrays to hold per-VCPU state.

This avoids hardcoding VM_MAXCPU in userspace.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34491


# 730510dc 09-Mar-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Allocate mmio_hint array based on number of guest CPUs.

This avoids an instance of hardcoding VM_MAXCPU in userspace.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D34489


# ad3da829 24-Feb-2022 Andy Fiddaman <andy@omniosce.org>

bhyve: plug memory leak in topology_parse()

Reviewed by: jhb, rew
Differential Revision: https://reviews.freebsd.org/D34301


# 19eaa01b 20-Jan-2022 Michael Reifenberger <mr@FreeBSD.org>

Append Keyboard Layout specified option for using VNC.
Part two: Append bhyve -K option for specified keyboard layout
with layout setting files every languages.
Since the cmd option '-k' was used in the meantime
it was changed to '-K'

PR: 246121
Submitted by: koinec@yahoo.co.jp
Reviewed by: grehan@
Differential Revision: https://reviews.freebsd.org/D29473

MFC after: 4 weeks


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# f656df58 12-Oct-2021 Mateusz Piotrowski <0mp@FreeBSD.org>

bhyve: Update usage and synopsis for the -k flag

Let's make it clear to users that -k is for configuration files.
Also, point to bhyve_config(5) in the paragraph describing the flag.

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32467


# 2cdff991 19-Aug-2021 Mariusz Zaborski <oshogbo@FreeBSD.org>

byhve: add option to specify IP address for gdb

Allow user to specify the IP address available for gdb debugger.

Reviewed by: jhb, grehan, rgrimes, bcr (man pages)
Differential Revision: https://reviews.freebsd.org/D29607


# fdbc86cf 15-May-2021 Robert Wing <rew@FreeBSD.org>

bhyve/snapshot: split up mutex/cond initialization from socket creation

Move initialization of the mutex/condition variables required by the
save/restore feature to their own function.

The unix domain socket that facilitates communication between bhyvectl
and bhyve doesn't rely on these variables in order to be functional.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D30281


# b6a572d0 18-Apr-2021 Mateusz Piotrowski <0mp@FreeBSD.org>

bhyve: Improve the option description in the usage message

- Sort options as suggested by style(9)
- Capitalize some words like CPU and HLT
- Add a missing description for the -G flag

MFC after: 2 weeks


# 03c3e5e4 18-Apr-2021 Mateusz Piotrowski <0mp@FreeBSD.org>

bhyve: Fix synopsis in the usage message

In particular:
- Sort short options to align with style(9)
- Add two missing flags: -G and -r
- Drop unnecessary angle brackets for consistency
- Rename the "vm" argument to vmname for consistency with the manual
page

MFC after: 2 weeks


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# c4df8cbf 23-Dec-2020 Robert Wing <rew@FreeBSD.org>

Remove bvmconsole and bvmdebug.

Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.

This also removes the '-b' and '-g' flag from bhyve(8). These two flags were
marked deprecated in r368519.

Reviewed by: grehan, kevans
Approved by: kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D27490


# 92f73099 10-Dec-2020 Robert Wing <rew@FreeBSD.org>

Add deprecation notice for bvmconsole and bvmdebug

Now that bhyve(8) supports UART, bvmconsole and bvmdebug are no longer needed.

Mark the '-b' and '-g' flag as deprecated for bhyve(8).

These will be removed in 13.

Reviewed by: jhb, grehan
Approved by: kevans (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27519


# 887d46ef 19-Nov-2020 Peter Grehan <grehan@FreeBSD.org>

Advance RIP after userspace instruction decode

Add update to RIP after a userspace instruction decode (as is done for
the in-kernel counterpart of this case).

Submitted by: adam_fenn.io
Reviewed by: cem, markj
Approved by: grehan (bhyve)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27243


# 0a1016f9 24-Jun-2020 Pawel Biernacki <kaktus@FreeBSD.org>

bhyve: allow for automatic destruction on power-off

Introduce -D flag that allows for the VM to be destroyed on guest initiated
power-off by the bhyve(8) process itself.
This is quality of life change that allows for simpler deployments without
the need for bhyvectl --destroy.

Requested by: swills
Reviewed by: 0mp (manpages), grehan, kib, swills
Approved by: kib (mentor)
MFC after: 2 weeks
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25414


# 4daa95f8 24-Jun-2020 Conrad Meyer <cem@FreeBSD.org>

bhyve(8): For prototyping, reattempt decode in userspace

If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464


# 8a68ae80 15-May-2020 Conrad Meyer <cem@FreeBSD.org>

vmm(4), bhyve(8): Expose kernel-emulated special devices to userspace

Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).

Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 52c39ee6 14-Apr-2020 Conrad Meyer <cem@FreeBSD.org>

bhyve(8): Minor cosmetic niceties in instemul failure

Print the failed instruction stream as a contiguous stream of hex. This
is closer to something you could throw at a disassembler than 0xHH 0xHH
0xHH.

Also, use the debug.h 'raw' stdio-aware printf helper to avoid the
cascading
line
effect.


# 9cb339cc 14-Apr-2020 Conrad Meyer <cem@FreeBSD.org>

bhyve(8): Add VM Generation Counter ACPI device

Add an implementatation of the 'Virtual Machine Generation ID' spec to
Bhyve. The spec provides a randomly generated GUID (at bhyve start) in
device memory, along with an ACPI device with _CID VM_Gen_Counter and ADDR
evaluating to a Package pointing at that GUID.

A GPE is defined which Notifies the ACPI Device when the generation changes
(such as when a snapshot is rolled back). At this time, Bhyve does not
support snapshotting, so the GPE is never actually raised.

Suggested by: rpokala
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D23165


# bb30b08e 14-Apr-2020 Conrad Meyer <cem@FreeBSD.org>

bhyve(8): Add bootrom allocation abstraction

To allow more general use of the bootrom region, separate initialization from
allocation, and allocation from loading a file.

The bootrom segment is the high 16MB of the low 4GB region.

Each allocation in the segment creates a new mapping with specified protection.
By default, allocation begins at the low end of the range. However, the
BOOTROM_ALLOC_TOP flag is provided to locate a provided bootrom in the high
region it is expected to be in.

The existing ROM-file loading code is refactored to use the new interface.

Reviewed by: grehan (earlier version)
Differential Revision: https://reviews.freebsd.org/D24422


# 332eff95 08-Jan-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add wrapper for debug printf statements

Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657


# cbd03a9d 13-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Support software breakpoints in the debug server on Intel CPUs.

- Allow the userland hypervisor to intercept breakpoint exceptions
(BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to
enable this feature. These exceptions are reported to userland via
a new VM_EXITCODE_BPT that includes the length of the original
breakpoint instruction. If userland wishes to pass the exception
through to the guest, it must be explicitly re-injected via
vm_inject_exception().

- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
pseudo-register. Injecting a BP# on Intel requires setting this to
the length of the breakpoint instruction. AMD SVM currently ignores
writes to this register (but reports success) and fails to read it.

- Rework the per-vCPU state tracked by the debug server. Rather than
a single 'stepping_vcpu' global, add a structure for each vCPU that
tracks state about that vCPU ('stepping', 'stepped', and
'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is
currently reporting an event. Event handlers for MTRAP and
breakpoint exits loop until the associated event is reported to the
debugger.

Breakpoint events are discarded if the breakpoint is not present
when a vCPU resumes in the breakpoint handler to retry submitting
the breakpoint event.

- Maintain a linked-list of active breakpoints in response to the GDB
'Z0' and 'z0' packets.

Reviewed by: markj (earlier version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D20309


# dd583143 12-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Fix a mismerge in r355683 and remove the local gdb_port from main.


# cd333f15 12-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Don't call into the debug server if it isn't configured.

Reviewed by: markj (as part of a larger diff)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D20309


# 4edc7f41 31-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Revert r343634:
Mostly a cosmetic change to replace strlen with strnlen.

Requested by: kib and imp


# 7722142b 31-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Mostly a cosmetic change to replace strlen with strnlen.

Obtained from: Project ACRN
MFC after: 2 weeks


# abfa3c39 15-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744


# 8d56c805 27-Oct-2018 Yuri Pankov <yuripv@FreeBSD.org>

Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).

Reviewed by: jhb, rgrimes, 0mp
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D17531


# 657d2158 22-Aug-2018 Marcelo Araujo <araujo@FreeBSD.org>

Add -s "help" and -l "help" to print a list of supported PCI and LPC devices.

For tools that uses bhyve such like libvirt, it is important to be able to
probe what features are supported by the given bhyve binary.

To give more context, libvirt probes bhyve's capabilities in a not very
effective way:
- Running 'bhyve -h' and parsing output.
- To detect devices, it runs 'bhyve -s 0,dev' for every each device and
parses error output to identify if the device is supported or not.

PR: 2101111
Submitted by: novel
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems Inc.


# dcbebe85 02-Aug-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

bhyve: set title before entering capability mode

PR: 230082
Submitted by: Yuichiro NAITO <naito.yuichiro@gmail.com>


# 989e062b 10-Jul-2018 Marcelo Araujo <araujo@FreeBSD.org>

Improve bhyve exit(3) error code.

The bhyve(8) exit status indicates how the VM was terminated:

0 rebooted
1 powered off
2 halted
3 triple fault

The problem is when we have wrappers around bhyve that parses the exit
error code and gets an exit(1) for an error but interprets it as "powered off".
So to mitigate this issue and makes it less error prone for third part
applications, I have added a new exit code 4 that is "exited due to an error".

For now the bhyve(8) exit status are:
0 rebooted
1 powered off
2 halted
3 triple fault
4 exited due to an error

Reviewed by: @jhb
MFC after: 2 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D16161


# 7672a014 19-Jun-2018 Mariusz Zaborski <oshogbo@FreeBSD.org>

Convert `cap_enter() < 0 && errno != ENOSYS` to `caph_enter() < 0`.

No functional change intended.


# f7224b70 13-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix style(9) space vs tab.

Reviewed by: jhb
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15768


# 13ee81be 25-May-2018 Marcelo Araujo <araujo@FreeBSD.org>

We don't need check if str is NULL as free(3) will handle NULL
argument.

Reported by: kib@


# 635a2c89 25-May-2018 Marcelo Araujo <araujo@FreeBSD.org>

After a long discussion about assert(3), we gonna use a HardenedBSD
approach to chek strdup(3) memory allocation.

Submitted by: Shaw Webb <shawn.webb@hardenedbsd.org>
Reported by: brooks
Obtained from: HardenedBSD


# ea089f8c 24-May-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix a memory leak on topology_parse().

strdup(3) allocates memory for a copy of the string, does the copy and
returns a pointer to it. If there is no sufficient memory NULL is returned
and the global errno is set to ENOMEM.
We do a sanity check to see if it was possible to allocate enough memory.

Also as we allocate memory, we need to free this memory used. Or it will
going out of scope leaks the storage it points to.

Reviewed by: rgrimes
MFC after: 3 weeks.
X-MFC: r332298
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15550


# d96ee3e0 16-May-2018 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add missing newline to end of -c usage string .

Pointy hat: me
Submitted by: novel
Approved by: bde(mentor), grehan (maintainer)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D15421


# cd377eb3 01-May-2018 John Baldwin <jhb@FreeBSD.org>

Initial debug server for bhyve.

This commit adds a new debug server to bhyve. Unlike the existing -g
option which provides an efficient connection to a debug server
running in the guest OS, this debug server permits inspection and
control of the guest from within the hypervisor itself without
requiring any cooperation from the guest. It is similar to the debug
server provided by qemu.

To avoid conflicting with the existing -g option, a new -G option has
been added that accepts a TCP port. An IPv4 socket is bound to this
port and listens for connections from debuggers. In addition, if the
port begins with the character 'w', the hypervisor will pause the
guest at the first instruction until a debugger attaches and
explicitly continues the guest. Note that only a single debugger can
attach to a guest at a time.

Virtual CPUs are exposed to the remote debugger as threads. General
purpose register values can be read for each virtual CPU. Other
registers cannot currently be read, and no register values can be
changed by the debugger.

The remote debugger can read guest memory but not write to guest
memory. To facilitate source-level debugging of the guest, memory
addresses from the debugger are treated as virtual addresses (rather
than physical addresses) and are resolved to a physical address using
the active virtual address translation of the current virtual CPU.
Memory reads should honor memory mapped I/O regions, though the debug
server does not attempt to honor any alignment or size constraints
when accessing MMIO.

The debug server provides limited support for controlling the guest.
The guest is suspended when a debugger is attached and resumes when a
debugger detaches. A debugger can suspend a guest by sending a Ctrl-C
request (e.g. via Ctrl-C in GDB). A debugger can also continue a
suspended guest while remaining attached. Breakpoints are not yet
supported. Single stepping is supported on Intel CPUs that support
MTRAP VM exits, but is not available on other systems.

While the current debug server has limited functionality, it should
at least be usable for basic debugging now. It is also a useful
checkpoint to serve as a base for adding additional features.

Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D15022


# 01d822d3 08-Apr-2018 Rodney W. Grimes <rgrimes@FreeBSD.org>

Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).

The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.

Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.

bhyvectl --get-cpu-topology option added.

Reviewed by: grehan (maintainer, earlier version),
Reviewed by: bcr (manpages)
Approved by: bde (mentor), phk (mentor)
Tested by: Oleg Ginzburg <olevole@olevole.ru> (cbsd)
MFC after: 1 week
Relnotes: Y
Differential Revision: https://reviews.freebsd.org/D9930


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 00ef17be 14-Feb-2017 Bartek Rutkowski <robak@FreeBSD.org>

Capsicum support for bhyve(8).

Adds Capsicum sandboxing to bhyve.

Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, oshogbo
Approved by: emaste, grehan
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D8290


# 28323add 08-Nov-2016 Bryan Drewery <bdrewery@FreeBSD.org>

Fix improper use of "its".

Sponsored by: Dell EMC Isilon


# 9c9eaf63 05-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Fix gcc warnings

- Remove -Wunused-but-set-variable (newcpu)
- Always return VMEXIT_CONTINUE as the code always set retval to that value.

Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7119
MFC after: 1 week
Reported by: Jenkins
Reviewed by: grehan (maintainer)
Sponsored by: EMC / Isilon Storage Division


# 6ee52c65 26-Jun-2016 Roman Bogorodskiy <novel@FreeBSD.org>

bhyve: improve memory size documentation

A couple of minor memory size option related nits:

- use common name 'memsize' (instead of 'max-size' or just 'size')
- bhyve: update usage with memsize unit suffix, drop legacy "MB"
unit
- bhyveload: update usage with memsize unit suffix
- bhyve(8): document default size
- bhyveload(8): use memsize formatting like it's done
in bhyve(8)

Reviewed by: wblock, grehan
Approved by: re (kib), wblock, grehan
Differential Revision: https://reviews.freebsd.org/D6952


# cc398e21 31-Dec-2015 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove unused variable after r292981 to unbreak the build.


# 3ec1cff5 31-Dec-2015 Marcelo Araujo <araujo@FreeBSD.org>

Clean up unused-but-set-variable spotted by gcc-4.9.

Reviewed by: grehan
Approved by: rodrigc (mentor)
Differential Revision: https://reviews.freebsd.org/D4734


# 68dd37f7 22-Oct-2015 Enji Cooper <ngie@FreeBSD.org>

Exit with a user-friendly message instead of tripping an assert
if vm_activate_cpu(..) fails when called from fbsdrun_addcpu(..)

MFC after: 1 week
PR: 203884
Reviewed by: grehan
Submitted by: William Orr <will@worrbase.com>


# 88ac6958 02-Oct-2015 Peter Grehan <grehan@FreeBSD.org>

Simple sysctl-like firmware query interface. Similar in operation
to the qemu one, and uses the same i/o ports but with different
messaging. Requires the 'bootrom' option to be enabled.

This is used by UEFI (and potentially other BIOSs/firmware) to
request information from bhyve. Currently, only the number of
vCPUs is made available, with more to follow.

A very large thankyou to Ben Perrault who helped out testing
an earlier version of this, and bhyve/Windows in general.

Reviewed by: tychon
Discussed with: neel
Sponsored by: Nahanni Systems


# 9b1aa8d6 18-Jun-2015 Neel Natu <neel@FreeBSD.org>

Restructure memory allocation in bhyve to support "devmem".

devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.

devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.

Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).

Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks


# 248e6799 28-May-2015 Neel Natu <neel@FreeBSD.org>

Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.

MFC after: 2 weeks


# 77afcadd 16-Apr-2015 Neel Natu <neel@FreeBSD.org>

If the number of guest vcpus is less than '1' then flag it as an error.

MFC after: 1 week


# 3b65fbe4 15-Apr-2015 Tycho Nightingale <tychon@FreeBSD.org>

Prior to aborting due to an ioport error, it is always interesting to
see what the guest's %rip is.

Reviewed by: grehan


# 703e4974 01-Apr-2015 Tycho Nightingale <tychon@FreeBSD.org>

Prior to aborting due to an instruction emulation error, it is always
interesting to see what the guest's %rip and instruction bytes are.

Reviewed by: grehan


# c9747678 23-Feb-2015 Neel Natu <neel@FreeBSD.org>

Add "-u" option to bhyve(8) to indicate that the RTC should maintain UTC time.

The default remains localtime for compatibility with the original device model
in bhyve(8). This is required for OpenBSD guests which assume that the RTC
keeps UTC time.

Reviewed by: grehan
Pointed out by: Jason Tubnor (jason@tubnor.net)
MFC after: 2 weeks


# d087a399 17-Jan-2015 Neel Natu <neel@FreeBSD.org>

Simplify instruction restart logic in bhyve.

Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
start execution.

Also, instruction restart happens implicitly via 'vm_inject_exception()' or
explicitly via 'vm_restart_instruction()'. The APIs behave identically in
both kernel and userspace contexts. The main beneficiary is the instruction
emulation code that executes in both contexts.

bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
as readonly:
- Restarting an instruction is now done by calling 'vm_restart_instruction()'
as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
- Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP
as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())

Differential Revision: https://reviews.freebsd.org/D1526
Reviewed by: grehan
MFC after: 2 weeks


# c3498942 19-Sep-2014 Neel Natu <neel@FreeBSD.org>

Restructure the MSR handling so it is entirely handled by processor-specific
code. There are only a handful of MSRs common between the two so there isn't
too much duplicate functionality.

The VT-x code has the following types of MSRs:

- MSRs that are unconditionally saved/restored on every guest/host context
switch (e.g., MSR_GSBASE).

- MSRs that are restored to guest values on entry to vmx_run() and saved
before returning. This is an optimization for MSRs that are not used in
host kernel context (e.g., MSR_KGSBASE).

- MSRs that are emulated and every access by the guest causes a trap into
the hypervisor (e.g., MSR_IA32_MISC_ENABLE).

Reviewed by: grehan


# bbadcde4 13-Sep-2014 Neel Natu <neel@FreeBSD.org>

Set the 'vmexit->inst_length' field properly depending on the type of the
VM-exit and ultimately on whether nRIP is valid. This allows us to update
the %rip after the emulation is finished so any exceptions triggered during
the emulation will point to the right instruction.

Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability
is available. The effective segment field in EXITINFO1 is not valid without
this capability.

Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the
VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.

Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2
fields in bhyve(8).

Reviewed by: Anish Gupta (akgupt3@gmail.com)
Reviewed by: grehan


# afd5e8ba 25-Jul-2014 Neel Natu <neel@FreeBSD.org>

Simplify the meaning of return values from the inout handlers. After this
change 0 means success and non-zero means failure.

This also helps to eliminate VMEXIT_POWEROFF and VMEXIT_RESET as return values
from VM-exit handlers.

CR: D480
Reviewed by: grehan, jhb


# d37f2adb 23-Jul-2014 Neel Natu <neel@FreeBSD.org>

Fix fault injection in bhyve.

The faulting instruction needs to be restarted when the exception handler
is done handling the fault. bhyve now does this correctly by setting
'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.

A minor complication is that the fault injection APIs are used by instruction
emulation code that is shared by vmm.ko and bhyve. Thus the argument that
refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to
be loosely typed as a 'void *'.


# d665d229 22-Jul-2014 Neel Natu <neel@FreeBSD.org>

Emulate instructions emitted by OpenBSD/i386 version 5.5:
- CMP REG, r/m
- MOV AX/EAX/RAX, moffset
- MOV moffset, AX/EAX/RAX
- PUSH r/m


# 091d4532 19-Jul-2014 Neel Natu <neel@FreeBSD.org>

Handle nested exceptions in bhyve.

A nested exception condition arises when a second exception is triggered while
delivering the first exception. Most nested exceptions can be handled serially
but some are converted into a double fault. If an exception is generated during
delivery of a double fault then the virtual machine shuts down as a result of
a triple fault.

vm_exit_intinfo() is used to record that a VM-exit happened while an event was
being delivered through the IDT. If an exception is triggered while handling
the VM-exit it will be treated like a nested exception.

vm_entry_intinfo() is used by processor-specific code to get the event to be
injected into the guest on the next VM-entry. This function is responsible for
deciding the disposition of nested exceptions.


# 3d5444c8 16-Jul-2014 Neel Natu <neel@FreeBSD.org>

Add emulation for legacy x86 task switching mechanism.

FreeBSD/i386 uses task switching to handle double fault exceptions and this
change enables that to work.

Reported by: glebius


# 64fe7235 27-Jun-2014 Neel Natu <neel@FreeBSD.org>

Add post-mortem debugging for "EPT Misconfiguration" VM-exit. This error
is hard to reproduce so try to collect all the breadcrumbs when it happens.

Reviewed by: grehan


# cde1f5b8 27-Jun-2014 John Baldwin <jhb@FreeBSD.org>

Sort command flags in usage output and the manpages.


# 5749449d 26-Jun-2014 John Baldwin <jhb@FreeBSD.org>

- Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
option).


# 95ebc360 31-May-2014 Neel Natu <neel@FreeBSD.org>

Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing
it implicitly in vmm.ko.

Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus
and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and
"--get-suspended-cpus" options.

This is in preparation for being able to reset virtual machine state without
having to destroy and recreate it.


# d17b5104 22-May-2014 Neel Natu <neel@FreeBSD.org>

Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by: grehan
Reviewed by: tychon (partially and a much earlier version)


# b3e9732a 15-May-2014 John Baldwin <jhb@FreeBSD.org>

Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.

Reviewed by: tycho


# 0dd10c00 13-May-2014 Neel Natu <neel@FreeBSD.org>

Don't include the guest memory segments in the bhyve(8) process core dump.
This has not added a lot of value when debugging bhyve issues while greatly
increasing the time and space required to store the core file.

Passing the "-C" option to bhyve(8) will change the default and dump guest
memory in the core dump.

Requested by: grehan
Reviewed by: grehan


# ee2dbd02 12-May-2014 Neel Natu <neel@FreeBSD.org>

abort(3) the process in response to a VMEXIT_ABORT. This usually happens in
response to an unhandled VM exit or an unexpected error so a core is useful.

Remove unused macro VMEXIT_SWITCH.

Reviewed by: grehan


# 9b6155a2 05-May-2014 Neel Natu <neel@FreeBSD.org>

Modify the "-p" option to be more flexible when associating a 'vcpu' with
a 'hostcpu'. The new format of the argument string is "vcpu:hostcpu".

This allows pinning a subset of the vcpus if desired.

It also allows pinning a vcpu to more than a single 'hostcpu'.

Submitted by: novel (initial version)


# 06782425 05-May-2014 Neel Natu <neel@FreeBSD.org>

Remove misleading "addcpu" in an error message emitted by fbsdrun_deletecpu().

Pointed out by: novel


# b100acf2 01-May-2014 Neel Natu <neel@FreeBSD.org>

Don't allow MPtable generation if there are multiple PCI hierarchies. This is
because there isn't a standard way to relay this information to the guest OS.

Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation.

If the virtual machine is using PCI devices on buses other than 0 then it can
still use ACPI tables to convey this information to the guest.

Discussed with: grehan@


# e50ce2aa 01-May-2014 Neel Natu <neel@FreeBSD.org>

Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with: grehan@
Reviewed by: jhb@, tychon@


# c6a0cc2e 29-Apr-2014 Neel Natu <neel@FreeBSD.org>

Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by: grehan@


# f0fdcfe2 28-Apr-2014 Neel Natu <neel@FreeBSD.org>

Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with: grehan@


# d42ea573 25-Apr-2014 Tycho Nightingale <tychon@FreeBSD.org>

Provide a very basic stub for the 8042 PS/2 keyboard controller.

Reviewed by: jhb
Approved by: neel (co-mentor)


# b15a09c0 26-Mar-2014 Neel Natu <neel@FreeBSD.org>

Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with: grehan


# 0826d045 20-Mar-2014 Neel Natu <neel@FreeBSD.org>

Use 'cpuset_t' to represent the vcpus active in a virtual machine.


# af5bfc53 04-Mar-2014 Tycho Nightingale <tychon@FreeBSD.org>

Add SMBIOS support.

A new option, -U, can be used to set the UUID in the System
Information (Type 1) structure. Manpage fix to follow.

Approved by: grehan (co-mentor)


# dc506506 25-Feb-2014 Neel Natu <neel@FreeBSD.org>

Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with: grehan, jhb
Reviewed by: jhb


# 52e5c8a2 19-Feb-2014 Neel Natu <neel@FreeBSD.org>

Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with: grehan@


# 3cbf3585 29-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks


# d2bc4816 27-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Remove support for legacy PCI devices. These haven't been needed since
support for LPC uart devices was added and it conflicts with upcoming
patches to add PCI INTx support.

Reviewed by: neel


# 0492757c 01-Jan-2014 Neel Natu <neel@FreeBSD.org>

Restructure the VMX code to enter and exit the guest. In large part this change
hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used
to enter guest context and vmx_exit_guest() is used to transition back into
host context.

Fix a longstanding race where a vcpu interrupt notification might be ignored
if it happens after vmx_inject_interrupts() but before host interrupts are
disabled in vmx_resume/vmx_launch. We now called vmx_inject_interrupts() with
host interrupts disabled to prevent this.

Suggested by: grehan@


# 6450da07 24-Dec-2013 John Baldwin <jhb@FreeBSD.org>

Support soft power-off via the ACPI S5 state for bhyve guests.
- Implement the PM1_EVT and PM1_CTL registers required by ACPI.
The PM1_EVT register is mostly a dummy as bhyve doesn't support any
of the hardware-initiated events. The only bit of PM1_CNT that is
implemented are the sleep request bits (SPL_EN and SLP_TYP) which
request a graceful power off for S5. In particular, for S5, bhyve
exits with a non-zero value which terminates the loop in vmrun.sh.
- Emulate the Reset Control register at I/O port 0xcf9 and advertise
it as the reset register via ACPI.
- Advertise an _S5 package.
- Extend the in/out interface to allow an in/out handler to request
that the hypervisor trigger a reset or power-off.
- While here, note that all vCPUs in a guest support C1 ("hlt").

Reviewed by: neel (earlier version)


# f80330a8 22-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add a parameter to 'vcpu_set_state()' to enforce that the vcpu is in the IDLE
state before the requested state transition. This guarantees that there is
exactly one ioctl() operating on a vcpu at any point in time and prevents
unintended state transitions.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-virtualization/2013-December/001825.html

Reviewed by: grehan
Reported by: Markiyan Kushnir (markiyan.kushnir at gmail.com)
MFC after: 3 days


# 851d84f1 19-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add an option to ignore accesses by the guest to unimplemented MSRs.

Also, ignore a couple of SandyBridge uncore PMC MSRs that Centos 6.4 writes
to during boot.

Reviewed by: grehan


# 1c052192 07-Dec-2013 Neel Natu <neel@FreeBSD.org>

If a vcpu disables its local apic and then executes a 'HLT' then spin down the
vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore
pending interrupts in the IRR if interrupts have been disabled by the guest.
The interrupt cannot be injected into the guest in any case so resuming it
is futile.

With this change "halt" from a Linux guest works correctly.

Reviewed by: grehan@
Tested by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 565bbb86 12-Nov-2013 Neel Natu <neel@FreeBSD.org>

Move the ioapic device model from userspace into vmm.ko. This is needed for
upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.

Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 7f5487ac 05-Nov-2013 Peter Grehan <grehan@FreeBSD.org>

Add the VM name to the process name with setproctitle().
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.

This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.

Requested by: Michael Dexter
Reviewed by: neel
MFC after: 4 weeks


# a1a4cbea 30-Oct-2013 Neel Natu <neel@FreeBSD.org>

Make the virtual ioapic available unconditionally in a bhyve virtual machine.

This is in preparation for moving the ioapic device model from userspace to
vmm.ko.

Reviewed by: grehan


# ea7f1c8c 28-Oct-2013 Neel Natu <neel@FreeBSD.org>

Add support for PCI-to-ISA LPC bridge emulation. If the LPC bus is attached
to a virtual machine then we implicitly create COM1 and COM2 ISA devices.

Prior to this change the only way of attaching a COM port to the virtual
machine was by presenting it as a PCI device that is mapped at the legacy
I/O address 0x3F8 or 0x2F8.

There were some issues with the original approach:
- It did not work at all with UEFI because UEFI will reprogram the PCI device
BARs and remap the COM1/COM2 ports at non-legacy addresses.
- OpenBSD GENERIC kernel does not create a /dev/console because it expects
the uart device at the legacy 0x3F8/0x2F8 address to be an ISA device.
- It was functional with a FreeBSD guest but caused the console to appear
on /dev/ttyu2 which was not intuitive.

The uart emulation is now independent of the bus on which it resides. Thus it
is possible to have uart devices on the PCI bus in addition to the legacy
COM1/COM2 devices behind the LPC bus.

The command line option to attach ISA COM1/COM2 ports to a virtual machine is
"-s <bus>,lpc -l com1,stdio".

The command line option to create a PCI-attached uart device is:
"-s <bus>,uart[,stdio]"

The command line option to create PCI-attached COM1/COM2 device is:
"-S <bus>,uart[,stdio]". This style of creating COM ports is deprecated.

Discussed with: grehan
Reviewed by: grehan
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)

M share/examples/bhyve/vmrun.sh
AM usr.sbin/bhyve/legacy_irq.c
AM usr.sbin/bhyve/legacy_irq.h
M usr.sbin/bhyve/Makefile
AM usr.sbin/bhyve/uart_emul.c
M usr.sbin/bhyve/bhyverun.c
AM usr.sbin/bhyve/uart_emul.h
M usr.sbin/bhyve/pci_uart.c
M usr.sbin/bhyve/pci_emul.c
M usr.sbin/bhyve/inout.c
M usr.sbin/bhyve/pci_emul.h
M usr.sbin/bhyve/inout.h
AM usr.sbin/bhyve/pci_lpc.c
AM usr.sbin/bhyve/pci_lpc.h


# b5331f4d 23-Oct-2013 Neel Natu <neel@FreeBSD.org>

Tidy usage messages for bhyve and bhyveload.

Submitted by: jhb


# 062b878f 17-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Changes required for OpenBSD/amd64:

- Allow a hostbridge to be created with AMD as a vendor.
This passes the OpenBSD check to allow the use of MSI
on a PCI bus.
- Enable the i/o interrupt section of the mptable, and
populate it with unity ISA mappings. This allows the
'legacy' IRQ mappings of the PCI serial port to be
set up. Delete unused print routine that was obscuring code.
- Use the '-W' option to enable virtio single-vector MSI
rather than an environment variable. Update the virtio
net/block drivers to query this flag when setting up
interrupts.: bhyverun.c
- Fix the arithmetic used to derive the century byte in
RTC CMOS, as well as encoding it in BCD.

Reviewed by: neel
MFC after: 3 days


# 49cc03da 16-Oct-2013 Neel Natu <neel@FreeBSD.org>

Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Reviewed by: grehan
MFC after: 3 days


# 200758f1 08-Oct-2013 Neel Natu <neel@FreeBSD.org>

Parse the memory size parameter using expand_number() to allow specifying
the memory size more intuitively (e.g. 512M, 4G etc).

Submitted by: rodrigc
Reviewed by: grehan
Approved by: re (blanket)


# e70cb911 08-Oct-2013 Dimitry Andric <dim@FreeBSD.org>

After r256062, the static function fbsdrun_get_next_cpu() in
usr.sbin/bhyve/bhyverun.c is no longer used, so remove it to silence a
gcc warning.

Approved by: re (glebius)


# 4a06a0fe 08-Oct-2013 Neel Natu <neel@FreeBSD.org>

Change the behavior of bhyve such that the gdb listening port is opt-in
rather than opt-out.

Prior to this change if the "-g" option was not specified then a listening
socket for tunneling gdb packets would be opened at port 6466. If a second
virtual machine is fired up, also without the "-g" option, then that would
fail because there is already a listener on port 6466.

After this change if a gdb tunnel port needs to be created it needs to be
explicitly specified with a "-g <portnum>" command line option.

Reviewed by: grehan@
Approved by: re@ (blanket)


# 318224bb 05-Oct-2013 Neel Natu <neel@FreeBSD.org>

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
Bit Position Interpreted By
PG_V 52 software (accessed bit emulation handler)
PG_RW 53 software (dirty bit emulation handler)
PG_A 0 hardware (aka EPT_PG_RD)
PG_M 1 hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


# 94c3b3bf 04-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Remove obsolete cmd-line options and code associated with
these.
The mux-vcpus option may return at some point, given it's utility
in finding bhyve (and FreeBSD) bugs.

Approved by: re@ (blanket)
Discussed with: neel@


# 8b271170 18-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Sanity-check the vm exitcode, and exit the process if it's out-of-bounds
or there is no registered handler.

Submitted by: Bela Lubkin bela dot lubkin at tidalscale dot com


# 9d6be09f 10-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Implement RTC CMOS nvram. Init some fields that are used
by FreeBSD and UEFI.
Tested with nvram(4).

Reviewed by: neel


# a38e2a64 03-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Support an optional "mac=" parameter to virtio-net config, to allow
users to set the MAC address for a device.

Clean up some obsolete code in pci_virtio_net.c

Allow an error return from a PCI device emulation's init routine
to be propagated all the way back to the top-level and result in
the process exiting.

Submitted by: Dinakar Medavaram dinnu sun at gmail (original version)


# b05c77ff 25-Apr-2013 Neel Natu <neel@FreeBSD.org>

Gripe if some <slot,function> tuple is specified more than once instead of
silently overwriting the previous assignment.

Gripe if the emulation is not recognized instead of silently ignoring the
emulated device.

If an error is detected by pci_parse_slot() then exit from the command line
parsing loop in main().

Submitted by (initial version): Chris Torek (chris.torek@gmail.com)


# 0e2ca4e6 10-Apr-2013 Neel Natu <neel@FreeBSD.org>

Need to call init_mem() to really initialize the MMIO range lookups.

This was working by accident because:
- the RB_HEADs were being initialized to zero as part of BSS
- the pthread_rwlock functions were implicitly initializing the lock object

Obtained from: NetApp


# b060ba50 18-Mar-2013 Neel Natu <neel@FreeBSD.org>

Simplify the assignment of memory to virtual machines by requiring a single
command line option "-m <memsize in MB>" to specify the memory size.

Prior to this change the user needed to explicitly specify the amount of
memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).

The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.

The start of the PCI hole is fixed at 3GB and cannot be directly changed
using command line options. However it is still possible to change this in
special circumstances via the 'vm_set_lowmem_limit()' API provided by
libvmmapi.

Submitted by: Dinakar Medavaram (initial version)
Reviewed by: grehan
Obtained from: NetApp


# 91039bb2 28-Feb-2013 Neel Natu <neel@FreeBSD.org>

Specify the length of the mapping requested from 'paddr_guest2host()'.

This seems prudent to do in its own right but it also opens up the possibility
of not having to mmap the entire guest address space in the 'bhyve' process
context.

Discussed with: grehan
Obtained from: NetApp


# 485b3300 11-Feb-2013 Neel Natu <neel@FreeBSD.org>

Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'.

Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING)
that called 'sched_bind()' on behalf of the user thread.

The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn
runs afoul of the assertion '(td_pinned == 0)' in userret().

Using the cpuset affinity to implement pinning of the vcpu threads works with
both 4BSD and ULE schedulers and has the happy side-effect of getting rid
of a bunch of code in vmm.ko.

Discussed with: grehan


# 24be8623 19-Jan-2013 Neel Natu <neel@FreeBSD.org>

Use <vmname> in a consistent manner in usage messages output by 'bhyve',
'bhyveload' and 'bhyvectl'.

Pointed out by: joel@


# 5f0677d3 03-Jan-2013 Neel Natu <neel@FreeBSD.org>

The "unrestricted guest" capability is a feature of Intel VT-x that allows
the guest to execute real or unpaged protected mode code - bhyve relies on
this feature to execute the AP bootstrap code.

Get rid of the hack that allowed bhyve to support SMP guests on processors
that do not have the "unrestricted guest" capability. This hack was entirely
FreeBSD-specific and would not work with any other guest OS.

Instead, limit the number of vcpus to 1 when executing on processors without
"unrestricted guest" capability.

Suggested by: grehan
Obtained from: NetApp


# e285ef8d 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Rename fbsdrun.* -> bhyverun.*

bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.

(renaming internal routines will come later)

Reviewed by: neel
Obtained from: NetApp