History log of /freebsd-current/usr.sbin/bhyve/pci_emul.c
Revision Date Author Comments
# ec8a394d 11-Apr-2024 Elyes Haouas <ehaouas@noos.fr>

usr.sbin: Remove repeated words

Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/887


# 95b948c1 19-Feb-2024 Jessica Clarke <jrtc27@jrtc27.com>

bhyve: Fix arm64 PCI I/O range to match FDT

This is supposed to combine with the memory range to make one contiguous
block, as is laid out in the FDT, so make this match what the OS is told
and thus actually configures.

Also drop the confusing leading zero from all three of these constants
that is making these 9 rather than 8 hex digits long (as one would
expect for a 32-bit address).

Reviewed by: jhb
MFC after: 2 weeks
Obtained from: CheriBSD


# 0efad4ac 16-Feb-2024 Jessica Clarke <jrtc27@jrtc27.com>

bhyve: Support legacy PCI interrupts on arm64

This allows us to remove various #ifdef hacks and enable building more
PCI devices.

Note that a hole is left in the interrupt mapping for the RTC rather
than having the two core devices straddle the PCIe interrupts. QEMU's
virt machine also takes this approach.

Reviewed by: jhb
MFC after: 2 weeks
Obtained from: CheriBSD


# dc6a00f2 03-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Use vm_raise_msi() instead of vm_lapic_msi()

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41740


# f286f746 03-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Add PCI mappings for arm64

- The extended config space and BAR ranges are listed in the FDT.
- Avoid referencing I/O ports in ACPI tables. Currently the arm64 port
does not support ACPI in any case.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41739


# fc98569f 03-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Do not compile PCI passthrough support on arm64

Some required kernel functionality is not yet implemented.

For now this means that one cannot specify host PCI register values, but
that functionality is only used by amd64-specific device models for now.
Note that this limitation is rather artificial; it arises only because
pci_host_read_config() lives in pci_passthru.c.

Reviewed by: corvink, andrew, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41738


# e497fe86 02-Apr-2024 Mark Johnston <markj@FreeBSD.org>

bhyve: Use vm_get_highmem_base() instead of hard-coding the value

This reduces the coupling between libvmmapi (which creates the highmem
segment) and bhyve, in preparation for the arm64 port.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40992


# 4d65a7c6 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

usr.sbin: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 31cf78c9 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Make most I/O port handling specific to amd64

- The qemu_fwcfg interface, as implemented, is I/O port-based, but QEMU
implements an MMIO interface that we'll eventually want to port for
arm64.
- Retain support for I/O space PCI BARs, simply treat them like MMIO
BARs for most purposes, similar to what the arm64 kernel does. Such
BARs are created by virtio devices.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40741


# 55c13f6e 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Move legacy PCI interrupt handling under amd64/

Specifically, move IO-APIC, LPC and PIRQ routing code under amd64/.

Use ifdefs to conditionally compile related code in other files. In
particular, legacy PCI interrupt handling is now compiled only on amd64.
This is not too invasive, but suggestions for a more modular approach
would be appreciated.

I am not sure why qemu fwcfg handling is tied to LPC, and I suspect it
should be decoupled. In this commit I just apply an ifdef hammer, but
we will eventually want fwcfg on arm64 as well.

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40739


# 01d53c34 03-Oct-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Improve pcifd function naming

read_config() and write_config() are externally visible, so give them
more descriptive names. No functional change intended.

MFC after: 1 week
Sponsored by: Innovate UK


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 0dea4f06 11-Jul-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Deduplicate some code in modify_bar_registration()

No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40877


# f4841d8a 28-Jun-2023 Mark Johnston <markj@FreeBSD.org>

bhyve: Rename a pci_cfgrw() parameter

pci_cfgrw() may be called via a write to the extended config space,
which is memory-mapped. In this case, the name "eax" is misleading.
Give it a more generic name. No functional change intended.

Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D40732


# 6632a0a4 16-Aug-2021 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add helper to create a bootorder

Qemu's fwcfg allows to define a bootorder. Therefore, the hypervisor has
to create a fwcfg item named bootorder, which has a newline seperated
list of boot entries. Qemu's OVMF will pick up the bootorder and applies
it.

Add the moment, bhyve's OVMF doesn't support a custom bootorder by
qemu's fwcfg. However, in the future bhyve will gain support for qemu's
OVMF. Additonally, we can port relevant parts from qemu's to bhyve's
OVMF implementation.

Reviewed by: jhb, markj
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39284


# 381ef27d 15-May-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: use pci_next() to save/restore pci devices

Current snapshot implementation doesn't support multiple devices with
similar type. For example, two virtio-blk or two CD-ROM-s, etc.

So the following configuration cannot be restored.

bhyve \
-s 3,virtio-blk,disk.img \
-s 4,virtio-blk,disk2.img

In some cases it is restored silently, but doesn't work. In some cases
it fails during restore stage.

This commit fixes that issue.

Reviewed by: corvink, rew
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D40109


# 14c80457 15-May-2023 Vitaliy Gusev <gusev.vitaliy@gmail.com>

bhyve: add bus, slot and func to device name

Each device needs a unique identifier to store and restore snapshots
properly. Adding the pci bsf information to the device name creates a
unique identifier as a bsf can't be occupied twice.

Reviewed by: corvink
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D40107


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# ffaed739 06-Feb-2023 Corvin Köhne <corvink@FreeBSD.org>

bhyve: add helper to read PCI IDs from bhyve config

Changing the PCI IDs is valuable in some situations. The Intel GOP
driver requires that some PCI IDs of the LPC bridge are aligned with the
physical values of the host LPC bridge. Another use case are oracles
virtio driver. They require different subvendor ID than the default one.
For that reason, create a helper which makes it easy to read PCI IDs
from bhyve config. Additionally, this helper ensures that all emulation
devices are using the same config keys.

Reviewed by: jhb
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38402


# 7d9ef309 24-Mar-2023 John Baldwin <jhb@FreeBSD.org>

libvmmapi: Add a struct vcpu and use it in most APIs.

This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi. For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server. The underyling ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38124


# 6a284cac 19-Jan-2023 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove vmctx argument from PCI device model methods.

Most of these arguments were unused. Device models which do need
access to the vmctx in one of these methods can obtain it from the
pi_vmctx member of the pci_devinst argument instead.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D38096


# 08b05de1 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove the unused vcpu argument from all of the I/O port handlers.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37653


# 78c2cd83 09-Dec-2022 John Baldwin <jhb@FreeBSD.org>

bhyve: Remove unused vcpu argument from PCI read/write methods.

Reviewed by: corvink, markj
Differential Revision: https://reviews.freebsd.org/D37652


# ed721684 23-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Address some signed/unsigned comparison warnings

MFC after: 1 week


# c9faf698 22-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix some warnings in the snapshot code

- Qualify unexported symbols with "static".
- Drop some unnecessary and incorrect casts.
- Avoid arithmetic on void pointers.
- Avoid signed/unsigned comparisons in loops which use nitems() as a
bound.

No functional change intended.

MFC after: 1 week


# 07d82562 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Make pci_bars local to pci_emul.c

MFC after: 1 week


# 98d920d9 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate unused function parameters

MFC after: 1 week


# 37045dfa 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Mark variables and functions as static where appropriate

Mark them const as well when it makes sense to do so. No functional
change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 75ce327a 16-Aug-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Use "void" instead of empty parameter lists

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 8ac8adda 01-Apr-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: avoid uninitialized variable

Reviewed by: markj
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reported-by: Andy Fiddaman <andy@omniosce.org>
Differential Revision: https://reviews.freebsd.org/D34688


# 45ddbf21 01-Apr-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: avoid overflow of BAR index

At the moment, writes to BAR registers that aren't 4 byte aligned are
ignored. So, there's no overflow yet. Nevertheless, if this behaviour
changes in the future, it could unintentionally, introduce a buffer
overflow. Additionally, some compiler or tools will detect this
potential overflow and complain about it.

Reviewed by: markj
Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com>
Reported-by: Andy Fiddaman <andy@omniosce.org>
Differential Revision: https://reviews.freebsd.org/D34689


# e47fe318 10-Mar-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: add ROM emulation

Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).

It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.

Differential Revision: https://reviews.freebsd.org/D33129
Sponsored by: Beckhoff Automation GmbH & Co. KG
MFC after: 1 month


# 7d55d295 03-Jan-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: add more slop to 64 bit BARs

Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables
based on this allocation. If the guest decides to relocate those BARs
above 4 GB, it could lead to mismatching ACPI tables. Especially
when using OVMF with enabled bus enumeration it could cause
issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS
may be unable to recover from this situation and disables some PCI
devices because their BARs are located outside of the MMIO space
reported by ACPI. Avoid this situation by giving the guest more
space for relocating BARs.

Let's be paranoid. The available space for BARs below 4 GB is 512 MB
large. Use a slop of 512 MB. It'll allow the guest to relocate all
BARs below 4 GB to an address above 4 GB. We could run into issues
when we exceeding the memlimit above 4 GB. However, this space has
a size of 32 GB. Even when using many PCI device with large BARs
like framebuffer or when using multiple PCI busses, it's very
unlikely that we run out of space due to the large slop.
Additionally, this situation will occur on startup and not at runtime
which is much better.

Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33118


# 01f9362e 03-Jan-2022 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: enumerate BARs by size

E.g. Framebuffers can require large space and BARs need to be aligned
by their size. If BARs aren't allocated by size, it'll cause much
fragmentation of the MMIO space. Reduce fragmentation by ordering
the BAR allocation on their size to reduce the risk of
OUT_OF_MMIO_SPACE issues.

Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D28278


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# fc7207c8 22-Nov-2021 Emmanuel Vadot <manu@FreeBSD.org>

bhyve: Fix compile

We need err.h

Fixes: 5cf21e48ccf11 ("bhyve: use a fixed 32 bit BAR base address")
Sponsored by: Bechoff Automation GmbH & Co. KG


# 5cf21e48 22-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: use a fixed 32 bit BAR base address

OVMF always uses 0xC0000000 as base address for 32 bit PCI MMIO space.
For that reason, we should use that address too.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D31051
Sponsored by: Beckhoff Automation GmbH & Co. KG


# 4a4053e1 22-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: move 64 bit BAR location to match OVMF assumptions

OVMF will fail, if large 64 bit BARs are used. GCD-Map doesn't cover
64 bit addresses of BARs.
OVMF assumes that 64 bit addresses of BARS are located on next 32 GB
boundary behind Top of High RAM.

This patch moves 64 bit BARs on next 32 GB boundary behind Top of High
RAM to match OVMF assumptions.

Differential Revision: https://reviews.freebsd.org/D27970
Sponsored by: Beckhoff Automation GmbH & Co. KG


# e87a6f3e 18-Nov-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: use physical lobits for BARs of passthru devices

Tell the guest whether a BAR uses prefetched memory or not for
passthru devices by using the same lobits as the physical device.

Reviewed by: grehan
Sponsored by: Beckhoff Autmation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D32685


# 77bc75c7 16-Oct-2021 Mark Johnston <markj@FreeBSD.org>

bhyve: Fix the WITH_BHYVE_SNAPSHOT build

Note, this breaks compatibility with snapshots generated by older builds
of bhyve(8).

Fixes: 7fa233534736 ("bhyve: Map the MSI-X table unconditionally for passthrough")
Reported by: Greg V <greg@unrelenting.technology>
Reviewed by: grehan, bz
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32523


# 1b0e2f0b 15-Oct-2021 Corvin Köhne <CorvinK@beckhoff.com>

bhyve: ignore low bits of CFGADR

Bhyve could emulate wrong PCI registers.
In the best case, the guest reads wrong registers and the device driver would
report some errors.
In the worst case, the guest writes to wrong PCI registers and could brick
hardware when using PCI passthrough.

According to Intels specification, low bits of CFGADR should be
ignored. Some OS like linux may rely on it. Otherwise, bhyve could
emulate a wrong PCI register.

E.g.
If linux would like to read 2 bytes from offset 0x02, following would
happen.
linux:
outl 0x80000002 at CFGADR
inw at CFGDAT + 2
bhyve:
cfgoff = 0x80000002 & 0xFF = 0x02
coff = cfgoff + (port - CFGDAT) = 0x02 + 0x02 = 0x04
Bhyve would emulate the register at offset 0x04 not 0x02.

Reviewed By: #bhyve, grehan
Differential Revision: https://reviews.freebsd.org/D31819
Sponsored by: Beckhoff Automation GmbH & Co. KG


# f8a6ec2d 18-Mar-2021 D Scott Phillips <scottph@FreeBSD.org>

bhyve: support relocating fbuf and passthru data BARs

We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.

We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.

[1]: https://github.com/freebsd/uefi-edk2/pull/9/

Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 038f5c7b 11-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: remove a hack to map all 8G BARs 1:1

Suggested and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27186


# 670b364b 11-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: increase allowed size for 64bit BAR allocation below 4G from 32 to 128 MB.

Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27095


# 9922872b 11-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

bhyve: avoid allocating BARs above the end of supported physical addresses.

Read CPUID leaf 0x8000008 to determine max supported phys address and
create BAR region right below it, reserving 1/4 of the supported guest
physical address space to the 64bit BARs mappings.

PR: 250802 (although the issue from PR is not fixed by the change)
Noted and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27095


# 21368498 25-May-2020 Peter Grehan <grehan@FreeBSD.org>

Fix pci-passthru MSI issues with OpenBSD guests

- Return 2 x 16-bit registers in the correct byte order
for a 4-byte read that spans the CMD/STATUS register.
This reversal was hiding the capabilities-list, which prevented
the MSI capability from being found for XHCI passthru.

- Reorganize MSI/MSI-x config writes so that a 4-byte write at the
capability offset would have the read-only portion skipped.
This prevented MSI interrupts from being enabled.

Reported and extensively tested by Anatoli (me at anatoli dot ws)

PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 7840d1c4 27-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Update the cached MSI state when any MSI capability register is written.

bhyve uses cached copies of the MSI capability registers to generate
MSI interrupts for device models. Previously, these cached fields
were only set when the MSI capability control register was updated.
The Linux kernel recently adopted a change to deal with races in MSI
interrupt delivery that writes to the MSI capability address and data
registers to alter the destination of MSI interrupts without writing
to the MSI capability control register. bhyve was not updating its
cached registers for these writes and continued to send interrupts
with the old data value to the old address. Fix this by recomputing
the cached values for every write to any MSI capability register.

Reported by: Jason Tubnor, Ryan Moeller
Reported by: Marc Dionne (bisected the Linux kernel commit)
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24593


# 332eff95 08-Jan-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add wrapper for debug printf statements

Add printf() wrapper to use CR/CRLF terminators depending on whether
stdio is mapped to a tty open in raw mode.
Try to use the wrapper everywhere.
For now we leave the custom DPRINTF/WPRINTF defined by device
models, but we may remove them in the future.

Reviewed by: grehan, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22657


# 412d13d5 24-Oct-2019 Jung-uk Kim <jkim@FreeBSD.org>

Catch up with ACPICA 20191018.

PR: 241467
XMFC with: r353764


# 0026d8cc 12-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Remove a spurious break when setting up a 64-bit memory BAR.

This was causing 'enbit' to not be initialized in this case.

CID: 1401924
Reported by: Coverity
MFC after: 1 week


# 129f93c5 07-Jun-2019 Chuck Tuffli <chuck@FreeBSD.org>

bhyve: Add PCIe Integrated Endpoint capability

The NVMe CAM driver reports the PCIe Link Capability and Status for
devices. For emulated bhyve NVMe devices, this looks like:

nda0: nvme version 1.3 x63 (max x63) lanes PCIe Gen15 (max Gen15) link

The driver outputs this because the emulated device doesn't include the
PCIe Capability structure. The NVMe specification requires these
registers, so the fix is to add this set of capability registers to the
emulated device.

Note that PCI Express devices that are integrated into the Root Complex
(i.e. Bus 0x0) do not have to support the Link Capability or Status
registers. Windows will fail to start (i.e. Code 10) devices that appear
to be part of the Root Complex but report being a PCI Express Endpoint.
So also add a check to pci_emul_add_pciecap() to check if the device is
integrated and change the device type.

Reviewed by: imp, ken, araujo, jhb, rgrimes
Approved by: imp (mentor), ken (mentor), jhb (maintainer)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19904


# 56282675 07-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Keep the shadow PCIR_COMMAND synced with the real one for pass through.

This ensures that bhyve properly recognizes when decoding is disabled
for BARs on passthru devices. To properly handle writes to the
register, export a pci_emul_cmd_changed function from pci_emul.c that
the pass through device model invokes for config writes that change
PCIR_COMMAND.

Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20531


# 2729c9bb 07-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Enable memory and I/O decoding in PCI devices on demand.

Rather than uncoditionally setting the MEMEN and PORTEN bits in
PCIR_COMMAND for PCI devices, set the respective bit when the first
BAR of a given type is added to the device. This more closely matches
what firmware does on bare metal.

BUSMASTEREN is still set unconditionally. Eventually this bit should
move into the device models as not all device models need this set.

Reviewed by: rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20530


# df61066e 31-May-2019 John Baldwin <jhb@FreeBSD.org>

Whitespace cleanups, no functional change.


# f0dfbccc 13-Apr-2019 Chuck Tuffli <chuck@FreeBSD.org>

Revert r345171 pending review

Backing out commit pending further discussion on the PCIe version
supported by pseudo (i.e. emulated) devices. See Differential for
details.

Reviewed by: imp
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D19580


# 2ba64075 14-Mar-2019 Chuck Tuffli <chuck@FreeBSD.org>

Fix bhyve PCIe capability emulation

PCIe devices starting with version 1.1 must set the Role-Based Error
Reporting bit.

And while we're in the neighborhood, generalize the code assigning the
device type.

Reviewed by: imp, araujo, rgrimes
Approved by: imp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19580


# 657d2158 22-Aug-2018 Marcelo Araujo <araujo@FreeBSD.org>

Add -s "help" and -l "help" to print a list of supported PCI and LPC devices.

For tools that uses bhyve such like libvirt, it is important to be able to
probe what features are supported by the given bhyve binary.

To give more context, libvirt probes bhyve's capabilities in a not very
effective way:
- Running 'bhyve -h' and parsing output.
- To detect devices, it runs 'bhyve -s 0,dev' for every each device and
parses error output to identify if the device is supported or not.

PR: 2101111
Submitted by: novel
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems Inc.


# f7224b70 13-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix style(9) space vs tab.

Reviewed by: jhb
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15768


# 92046bf1 22-May-2018 Marcelo Araujo <araujo@FreeBSD.org>

Revert: r334016
Revert for now this change, it in somehow breaks init_pci.


# b5e3928d 21-May-2018 Marcelo Araujo <araujo@FreeBSD.org>

We must free the variable str.

Spotted by: clang's static analyzer
Submitted by: Tom Rix <trix_juniper.net>
Reviewed by: grehan
MFC after: 4 weeks
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D10009


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 1b4496d0 14-Jul-2016 Alexander Motin <mav@FreeBSD.org>

Make PCI interupts allocation static when using bootrom (UEFI).

This makes factual interrupt routing match one shipped with UEFI firmware.
With old firmware this make legacy interrupts work reliable for functions 0
of PCI slots 3-6. Updated UEFI image fixes problem completely.


# 7e12dfe5 06-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Fix CTASSERT issue in a more clean way

- Replace all CTASSERT macro instances with static_assert's.
- Remove the WRAPPED_CTASSERT macro; it's now an unnecessary obfuscation.
- Localize all static_assert's to the structures being tested.
- Sort some headers per-style(9).

Approved by: re (hrs)
Differential Revision: https://reviews.freebsd.org/D7130
MFC after: 1 week
X-MFC with: r302364
Reviewed by: ed, grehan (maintainer)
Submitted by: ed
Sponsored by: EMC / Isilon Storage Division


# edb60334 05-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Fix gcc warnings

Add `WRAPPED_CTASSERT` macro by annotating CTASSERTs with __unused
to deal with -Wunused-local-typedefs warnings from gcc 4.8+.
All other compilers (clang, etc) use CTASSERT as-is. A more generic
solution for this issue will be proposed after ^/stable/11 is forked.

Consolidate all CTASSERTs under one block instead of inlining them in
functions.

Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7119
MFC after: 1 week
Reported by: Jenkins
Reviewed by: grehan (maintainer)
Sponsored by: EMC / Isilon Storage Division


# 9f3dba68 13-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

bhyve: consider the bogus case of a negative bar idx.

This is a followup to r297472 to squelch Coverity.

CID: 1194319


# 6e43f3ed 31-Mar-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

pci_emul_dior(): fix uninitialized scalar variable.

Prevent from returning an unitialized value in case the
ior size is unknown.

CID: 1194319
Reviewed by: grehan


# d74fdc6a 30-Dec-2015 Marcelo Araujo <araujo@FreeBSD.org>

Clean up unused-but-set-variable spotted by gcc-4.9.

Reviewed by: grehan
Approved by: bapt (mentor)
Differential Revision: https://reviews.freebsd.org/D4735


# 463a577b 20-Oct-2015 Eitan Adler <eadler@FreeBSD.org>

Fix a ton of speelling errors

arc lint is helpful

Reviewed By: allanjude, wblock, #manpages, chris@bsdjunk.com
Differential Revision: https://reviews.freebsd.org/D3337


# fd4e0d4c 01-May-2015 Neel Natu <neel@FreeBSD.org>

Advertise an additional memory BAR in the "dummy" device emulation.

This is useful for testing the MOVS emulation when both the source and
destination addresses are in the MMIO space.

MFC after: 1 week


# 54335630 24-Apr-2015 Neel Natu <neel@FreeBSD.org>

Don't allow guest to modify readonly bits in the PCI config 'status' register.

Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks


# 12a6eb99 07-Aug-2014 Neel Natu <neel@FreeBSD.org>

Support PCI extended config space in bhyve.

Add the ACPI MCFG table to advertise the extended config memory window.

Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted
or moved in the guest's address space. The PCI extended config space is an
example of an immutable memory range.

Add emulation for the "movzw" instruction. This instruction is used by FreeBSD
to read a 16-bit extended config space register.

CR: https://phabric.freebsd.org/D505
Reviewed by: jhb, grehan
Requested by: tychon


# be679db4 23-Jun-2014 Neel Natu <neel@FreeBSD.org>

Provide APIs to directly get 'lowmem' and 'highmem' size directly.

Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).


# 67b6ffaa 09-Jun-2014 Tycho Nightingale <tychon@FreeBSD.org>

r267169 should apply to 64-bit BARs as well.

Reviewed by: neel


# b6ae8b05 06-Jun-2014 Tycho Nightingale <tychon@FreeBSD.org>

Some devices (e.g. Intel AHCI and NICs) support quad-word access to
register pairs where two 32-bit registers make up a larger logical
size. Support those access by splitting the quad-word into two
double-words.

Reviewed by: grehan


# b3e9732a 15-May-2014 John Baldwin <jhb@FreeBSD.org>

Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
8 steerable pins configured via config space access to byte-wide
registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
pin and a PCI interrupt router pin. When a PCI INTx interrupt is
asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
8259A pins (ISA IRQs) and initialize the interrupt line config register
for the corresponding PCI function with the ISA IRQ as this matches
existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
PRT tables and return the appropriate table based on the configured
routing configuration. Note that if the lpc device is not configured, no
routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
the DSDT. iasl(8) will fill in the actual number of elements, and
this makes it simpler to generate a Package with a variable number of
elements.

Reviewed by: tycho


# b100acf2 01-May-2014 Neel Natu <neel@FreeBSD.org>

Don't allow MPtable generation if there are multiple PCI hierarchies. This is
because there isn't a standard way to relay this information to the guest OS.

Add a command line option "-Y" to bhyve(8) to inhibit MPtable generation.

If the virtual machine is using PCI devices on buses other than 0 then it can
still use ACPI tables to convey this information to the guest.

Discussed with: grehan@


# fcbec691 25-Apr-2014 Peter Grehan <grehan@FreeBSD.org>

Respect and track the enable bit in the PCI configuration address word.
Ignore writes, and return 0xff's, on config accesses when not set.
Behaviour now matches that seen on h/w.

Found with a NetBSD/amd64 guest.

Reviewed by: tychon
MFC after: 3 weeks


# 994f858a 22-Apr-2014 Xin LI <delphij@FreeBSD.org>

Use calloc() in favor of malloc + memset.

Reviewed by: neel


# 7a902ec0 18-Feb-2014 Neel Natu <neel@FreeBSD.org>

Add a check to validate that memory BARs of passthru devices are 4KB aligned.

Also, the MSI-x table offset is not required to be 4KB aligned so take this
into account when computing the pages occupied by the MSI-x tables.


# a96b8b80 17-Feb-2014 John Baldwin <jhb@FreeBSD.org>

Tweak the handling of PCI capabilities in emulated devices to remove
the non-standard zero capability list terminator. Instead, track
the start and end of the most recently added capability and use that
to adjust the previous capability's next pointer when a capability is
added and to determine the range of config registers belonging to
PCI capability registers.

Reviewed by: neel


# d84882ca 14-Feb-2014 Neel Natu <neel@FreeBSD.org>

Allow PCI devices to be configured on all valid bus numbers from 0 to 255.

This is done by representing each bus as root PCI device in ACPI. The device
implements the _BBN method to return the PCI bus number to the guest OS.

Each PCI bus keeps track of the resources that is decodes for devices
configured on the bus: i/o, mmio (32-bit) and mmio (64-bit). These windows
are advertised to the guest via the _CRS object of the root device.

Bus 0 is treated specially since it consumes the I/O ports to access the
PCI config space [0xcf8-0xcff]. It also decodes the legacy I/O ports that
are consumed by devices on the LPC bus. For this reason the LPC bridge can
be configured only on bus 0.

The bus number can be specified using the following command line option
to bhyve(8): "-s <bus>:<slot>:<func>,<emul>[,<config>]"

Discussed with: grehan@
Reviewed by: jhb@


# 3cbf3585 29-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Enhance the support for PCI legacy INTx interrupts and enable them in
the virtio backends.
- Add a new ioctl to export the count of pins on the I/O APIC from vmm
to the hypervisor.
- Use pins on the I/O APIC >= 16 for PCI interrupts leaving 0-15 for
ISA interrupts.
- Populate the MP Table with I/O interrupt entries for any PCI INTx
interrupts.
- Create a _PRT table under the PCI root bridge in ACPI to route any
PCI INTx interrupts appropriately.
- Track which INTx interrupts are in use per-slot so that functions
that share a slot attempt to distribute their INTx interrupts across
the four available pins.
- Implicitly mask INTx interrupts if either MSI or MSI-X is enabled
and when the INTx DIS bit is set in a function's PCI command register.
Either assert or deassert the associated I/O APIC pin when the
state of one of those conditions changes.
- Add INTx support to the virtio backends.
- Always advertise the MSI capability in the virtio backends.

Submitted by: neel (7)
Reviewed by: neel
MFC after: 2 weeks


# d2bc4816 27-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Remove support for legacy PCI devices. These haven't been needed since
support for LPC uart devices was added and it conflicts with upcoming
patches to add PCI INTx support.

Reviewed by: neel


# e6c8bc29 02-Jan-2014 John Baldwin <jhb@FreeBSD.org>

Rework the DSDT generation code a bit to generate more accurate info about
LPC devices. Among other things, the LPC serial ports now appear as
ACPI devices.
- Move the info for the top-level PCI bus into the PCI emulation code and
add ResourceProducer entries for the memory ranges decoded by the bus
for memory BARs.
- Add a framework to allow each PCI emulation driver to optionally write
an entry into the DSDT under the \_SB_.PCI0 namespace. The LPC driver
uses this to write a node for the LPC bus (\_SB_.PCI0.ISA).
- Add a linker set to allow any LPC devices to write entries into the
DSDT below the LPC node.
- Move the existing DSDT block for the RTC to the RTC driver.
- Add DSDT nodes for the AT PIC, the 8254 ISA timer, and the LPC UART
devices.
- Add a "SuperIO" device under the LPC node to claim "system resources"
aling with a linker set to allow various drivers to add IO or memory
ranges that should be claimed as a system resource.
- Add system resource entries for the extended RTC IO range, the registers
used for ACPI power management, the ELCR, PCI interrupt routing register,
and post data register.
- Add various helper routines for generating DSDT entries.

Reviewed by: neel (earlier version)


# 4f8be175 16-Dec-2013 Neel Natu <neel@FreeBSD.org>

Add an API to deliver message signalled interrupts to vcpus. This allows
callers treat the MSI 'addr' and 'data' fields as opaque and also lets
bhyve implement multiple destination modes: physical, flat and clustered.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)
Reviewed by: grehan@


# ac7304a7 22-Nov-2013 Neel Natu <neel@FreeBSD.org>

Add an ioctl to assert and deassert an ioapic pin atomically. This will be used
to inject edge triggered legacy interrupts into the guest.

Start using the new API in device models that use edge triggered interrupts:
viz. the 8254 timer and the LPC/uart device emulation.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# 565bbb86 12-Nov-2013 Neel Natu <neel@FreeBSD.org>

Move the ioapic device model from userspace into vmm.ko. This is needed for
upcoming in-kernel device emulations like the HPET.

The ioctls VM_IOAPIC_ASSERT_IRQ and VM_IOAPIC_DEASSERT_IRQ are used to
manipulate the ioapic pin state.

Discussed with: grehan@
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# c8afb9bc 06-Nov-2013 Neel Natu <neel@FreeBSD.org>

Fix an off-by-one error when iterating over the emulated PCI BARs.

Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)


# ea7f1c8c 28-Oct-2013 Neel Natu <neel@FreeBSD.org>

Add support for PCI-to-ISA LPC bridge emulation. If the LPC bus is attached
to a virtual machine then we implicitly create COM1 and COM2 ISA devices.

Prior to this change the only way of attaching a COM port to the virtual
machine was by presenting it as a PCI device that is mapped at the legacy
I/O address 0x3F8 or 0x2F8.

There were some issues with the original approach:
- It did not work at all with UEFI because UEFI will reprogram the PCI device
BARs and remap the COM1/COM2 ports at non-legacy addresses.
- OpenBSD GENERIC kernel does not create a /dev/console because it expects
the uart device at the legacy 0x3F8/0x2F8 address to be an ISA device.
- It was functional with a FreeBSD guest but caused the console to appear
on /dev/ttyu2 which was not intuitive.

The uart emulation is now independent of the bus on which it resides. Thus it
is possible to have uart devices on the PCI bus in addition to the legacy
COM1/COM2 devices behind the LPC bus.

The command line option to attach ISA COM1/COM2 ports to a virtual machine is
"-s <bus>,lpc -l com1,stdio".

The command line option to create a PCI-attached uart device is:
"-s <bus>,uart[,stdio]"

The command line option to create PCI-attached COM1/COM2 device is:
"-S <bus>,uart[,stdio]". This style of creating COM ports is deprecated.

Discussed with: grehan
Reviewed by: grehan
Submitted by: Tycho Nightingale (tycho.nightingale@pluribusnetworks.com)

M share/examples/bhyve/vmrun.sh
AM usr.sbin/bhyve/legacy_irq.c
AM usr.sbin/bhyve/legacy_irq.h
M usr.sbin/bhyve/Makefile
AM usr.sbin/bhyve/uart_emul.c
M usr.sbin/bhyve/bhyverun.c
AM usr.sbin/bhyve/uart_emul.h
M usr.sbin/bhyve/pci_uart.c
M usr.sbin/bhyve/pci_emul.c
M usr.sbin/bhyve/inout.c
M usr.sbin/bhyve/pci_emul.h
M usr.sbin/bhyve/inout.h
AM usr.sbin/bhyve/pci_lpc.c
AM usr.sbin/bhyve/pci_lpc.h


# 2a8d400a 09-Oct-2013 Peter Grehan <grehan@FreeBSD.org>

Allow a 4-byte write to PCI config space to overlap
the 2 read-only bytes at the start of a PCI capability.
This is the sequence that OpenBSD uses when enabling
MSI interrupts, and works fine on real h/w.

In bhyve, convert the 4 byte write to a 2-byte write to
the r/w area past the first 2 r/o bytes of a capability.

Reviewed by: neel
Approved by: re@ (blanket)


# 318224bb 05-Oct-2013 Neel Natu <neel@FreeBSD.org>

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
Bit Position Interpreted By
PG_V 52 software (accessed bit emulation handler)
PG_RW 53 software (dirty bit emulation handler)
PG_A 0 hardware (aka EPT_PG_RD)
PG_M 1 hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


# 6a52209f 27-Aug-2013 Neel Natu <neel@FreeBSD.org>

Allow single byte reads of the emulated MSI-X tables. This is not required
by the PCI specification but needed to dump MMIO space from "ddb" in the
guest.


# 50dc0db3 15-Aug-2013 Peter Grehan <grehan@FreeBSD.org>

Fix ordering of legacy IRQ reservations.

Submitted by: Jeremiah Lott jlott at averesystems dot com


# a38e2a64 03-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Support an optional "mac=" parameter to virtio-net config, to allow
users to set the MAC address for a device.

Clean up some obsolete code in pci_virtio_net.c

Allow an error return from a PCI device emulation's init routine
to be propagated all the way back to the top-level and result in
the process exiting.

Submitted by: Dinakar Medavaram dinnu sun at gmail (original version)


# 34d244ed 01-Jul-2013 Peter Grehan <grehan@FreeBSD.org>

Fix up option parsing to allow a colon in the config section.
Clean up some other unnecessary code.

Submitted by: Dinakar Medavaram dinnu sun at gmail
Reviewed by: neel


# 75543036 27-Jun-2013 Peter Grehan <grehan@FreeBSD.org>

Allow the PCI config address register to be read. The Linux
kernel does this. Also remove an unused header file.

Submitted by: tycho nightingale at pluribusnetworks com
Reviewed by: neel


# b05c77ff 25-Apr-2013 Neel Natu <neel@FreeBSD.org>

Gripe if some <slot,function> tuple is specified more than once instead of
silently overwriting the previous assignment.

Gripe if the emulation is not recognized instead of silently ignoring the
emulated device.

If an error is detected by pci_parse_slot() then exit from the command line
parsing loop in main().

Submitted by (initial version): Chris Torek (chris.torek@gmail.com)


# 9f08548d 16-Apr-2013 Neel Natu <neel@FreeBSD.org>

Setup accesses to the memory hole below 4GB to return all 1's on read and
consume all writes without any side effects.

Obtained from: NetApp


# 028d9311 09-Apr-2013 Neel Natu <neel@FreeBSD.org>

Improve PCI BAR emulation:
- Respect the MEMEN and PORTEN bits in the command register
- Allow the guest to reprogram the address decoded by the BAR

Submitted by: Gopakumar T
Obtained from: NetApp


# b060ba50 18-Mar-2013 Neel Natu <neel@FreeBSD.org>

Simplify the assignment of memory to virtual machines by requiring a single
command line option "-m <memsize in MB>" to specify the memory size.

Prior to this change the user needed to explicitly specify the amount of
memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).

The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.

The start of the PCI hole is fixed at 3GB and cannot be directly changed
using command line options. However it is still possible to change this in
special circumstances via the 'vm_set_lowmem_limit()' API provided by
libvmmapi.

Submitted by: Dinakar Medavaram (initial version)
Reviewed by: grehan
Obtained from: NetApp


# 0ab13648 21-Feb-2013 Peter Grehan <grehan@FreeBSD.org>

Add the ability to have a 'fallback' search for memory ranges.
These set of ranges will be looked at if a standard memory
range isn't found, and won't be installed in the cache.
Use this to implement the memory behaviour of the PCI hole on
x86 systems, where writes are ignored and reads always return -1.
This allows breakpoints to be set when issuing a 'boot -d', which
has the side effect of accessing the PCI hole when changing the
PTE protection on kernel code, since the pmap layer hasn't been
initialized (a bug, but present in existing FreeBSD releases so
has to be handled).

Reviewed by: neel
Obtained from: NetApp


# 74f80b23 15-Feb-2013 Neel Natu <neel@FreeBSD.org>

Advertise PCI-E capability in the hostbridge device presented to the guest.

FreeBSD wants to see this capability in at least one device in the PCI
hierarchy before it allows use of MSI or MSI-X.

Obtained from: NetApp


# aa12663f 31-Jan-2013 Neel Natu <neel@FreeBSD.org>

Fix a bug in the passthru implementation where it would assume that all
devices are MSI-X capable. This in turn would lead it to treat bar 0 as
the MSI-X table bar even if the underlying device did not support MSI-X.

Fix this by providing an API to query the MSI-X table index of the emulated
device. If the underlying device does not support MSI-X then this API will
return -1.

Obtained from: NetApp


# c9b4e987 29-Jan-2013 Neel Natu <neel@FreeBSD.org>

Add support for MSI-X interrupts in the virtio network device and make that
the default.

The current behavior of advertising a single MSI vector can be requested by
setting the environment variable "BHYVE_USE_MSI" to "true". The use of MSI
is not compliant with the virtio specification and will be eventually phased
out.

Submitted by: Gopakumar T
Obtained from: NetApp


# e285ef8d 12-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Rename fbsdrun.* -> bhyverun.*

bhyve is intended to be a generic hypervisor, and not FreeBSD-specific.

(renaming internal routines will come later)

Reviewed by: neel
Obtained from: NetApp


# 90415e0b 26-Oct-2012 Neel Natu <neel@FreeBSD.org>

Ignore PCI configuration accesses to all bus numbers other than PCI bus 0.

Obtained from: NetApp


# fbfc1c76 26-Oct-2012 Peter Grehan <grehan@FreeBSD.org>

Remove mptable generation code from libvmmapi and move it to bhyve.
Firmware tables require too much knowledge of system configuration,
and it's difficult to pass that information in general terms to a library.
The upcoming ACPI work exposed this - it will also livein bhyve.

Also, remove code specific to NetApp from the mptable name, and remove
the -n option from bhyve.

Reviewed by: neel
Obtained from: NetApp


# 4d1e669c 19-Oct-2012 Peter Grehan <grehan@FreeBSD.org>

Rework how guest MMIO regions are dealt with.

- New memory region interface. An RB tree holds the regions,
with a last-found per-vCPU cache to deal with the common case
of repeated guest accesses to MMIO registers in the same page.

- Support memory-mapped BARs in PCI emulation.

mem.c/h - memory region interface

instruction_emul.c/h - remove old region interface.
Use gpa from EPT exit to avoid a tablewalk to
determine operand address. Determine operand size
and use when calling through to region handler.

fbsdrun.c - call into region interface on paging
exit. Distinguish between instruction emul error
and region not found

pci_emul.c/h - implement new BAR callback api.
Split BAR alloc routine into routines that
require/don't require the BAR phys address.

ioapic.c
pci_passthru.c
pci_virtio_block.c
pci_virtio_net.c
pci_uart.c - update to new BAR callback i/f

Reviewed by: neel
Obtained from: NetApp


# 25d4944e 06-Aug-2012 Neel Natu <neel@FreeBSD.org>

Fix a bug in how a 64-bit bar in a pci passthru device would be presented to
the guest. Prior to the fix it was possible for such a bar to appear as a
32-bit bar as long as it was allocated from the region below 4GB.

This had the potential to confuse some drivers that were particular about
the size of the bars.

Obtained from: NetApp


# 99d65389 06-Aug-2012 Neel Natu <neel@FreeBSD.org>

Add support for emulating PCI multi-function devices.

These function number is specified by an optional [:<func>] after the slot
number: -s 1:0,virtio-net,tap0

Ditto for the mptable naming: -n 1:0,e0a

Obtained from: NetApp


# b0b53d3a 04-Aug-2012 Neel Natu <neel@FreeBSD.org>

Device model for ioapic emulation.

With this change the uart emulation is entirely interrupt driven.

Obtained from: NetApp


# 308f9077 03-Aug-2012 Neel Natu <neel@FreeBSD.org>

Use the correct variable to index into the 'lirq[]' array to check the legacy
IRQ ownership.


# 0038ee98 02-May-2012 Peter Grehan <grehan@FreeBSD.org>

Add 16550 uart emulation as a PCI device. This allows it to
be activated as part of the slot config options.
The syntax is:

-s <slotnum>,uart[,stdio]

The stdio parameter instructs the code to perform i/o using
stdin/stdout. It can only be used for one instance.
To allow legacy i/o ports/irqs to be used, a new variant of
the slot command, -S, is introduced. When used to specify a
slot, the device will use legacy resources if it supports
them; otherwise it will be treated the same as the '-s' option.
Specifying the -S option with the uart will first use the 0x3f8/irq 4
config, and the second -S will use 0x2F8/irq 3.

Interrupt delivery is awaiting the arrival of the i/o apic code,
but this works fine in uart(4)'s polled mode.

This code was written by Cynthia Lu @ MIT while an intern at NetApp,
with further work from neel@ and grehan@.

Obtained from: NetApp


# cd942e0f 28-Apr-2012 Peter Grehan <grehan@FreeBSD.org>

MSI-x interrupt support for PCI pass-thru devices.

Includes instruction emulation for memory r/w access. This
opens the door for io-apic, local apic, hpet timer, and
legacy device emulation.

Submitted by: ryan dot berryhill at sandvine dot com
Reviewed by: grehan
Obtained from: Sandvine


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski