History log of /freebsd-current/usr.sbin/bhyve/mevent.c
Revision Date Author Comments
# 4dfa329f 21-Feb-2024 Jessica Clarke <jrtc27@jrtc27.com>

bhyve: Extend mevent to support updating timers

This will be used by a new PL031 implementation to provide an RTC for
arm64 guests.

Reviewed by: jhb
MFC after: 2 weeks
Obtained from: CheriBSD


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b3e76948 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 98d920d9 08-Oct-2022 Mark Johnston <markj@FreeBSD.org>

bhyve: Annotate unused function parameters

MFC after: 1 week


# c2fa905c 26-Dec-2021 Toomas Soome <tsoome@FreeBSD.org>

bhyve: clean up trailing whitespaces

Clean up trailing whitespaces. No functional changes.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33681


# 0b29683b 12-Dec-2021 Robert Wing <rew@FreeBSD.org>

bhyve: set EV_CLEAR for EVFILT_VNODE mevents

When an EVFILT_VNODE filter event is triggered, reset it.

This fixes the issue where a virtio-blk resize event would cause the
mevent thread to consume 100% of the cpu.

Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D33326


# 7ecdfc82 25-Sep-2021 John Baldwin <jhb@FreeBSD.org>

bhyve: Add an empty case for event types in mevent_kq_fflags().

This fixes a -Wswitch error raised by GCC 9.

Differential Revision: https://reviews.freebsd.org/D31938


# 67d60dcc 11-Jun-2021 John Baldwin <jhb@FreeBSD.org>

bhyve: Add support for EVFILT_VNODE mevents.

This allows registering an event to watch for changes to a file's
attributes. This is a bit imperfect as it would be nice to have a way
to determine if an fd can use EVFILT_VNODE successfully. mevent's
current structure does not permit that and a failure to register a
single kevent impacts several other kevents.

Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30503


# e8424e29 11-Jun-2021 John Baldwin <jhb@FreeBSD.org>

bhyve: Register new kevents synchronously.

Change mevent_add*() to synchronously add the new kevent. This
permits reporting event registration failures to the caller and avoids
failing the registration of other, unrelated events queued up in the
same batch.

Reviewed by: grehan, markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D30502


# 621b5090 26-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Refactor configuration management in bhyve.

Replace the existing ad-hoc configuration via various global variables
with a small database of key-value pairs. The database supports
heirarchical keys using a MIB-like syntax to name the path to a given
key. Values are always stored as strings. The API used to manage
configuation values does include wrappers to handling boolean values.
Other values use non-string types require parsing by consumers.

The configuration values are stored in a tree using nvlists. Leaf
nodes hold string values. Configuration values are permitted to
reference other configuration values using '%(name)'. This permits
constructing template configurations.

All existing command line arguments now set configuration values. For
devices, the "-s" option parses its option argument to generate a list
of key-value pairs for the given device.

A new '-o' command line option permits setting an individual
configuration variable. The key name is always given as a full path
of dot-separated components.

A new '-k' command line option parses a simple configuration file.
This configuration file holds a flat list of 'key=value' lines where
the 'key' is the full path of a configuration variable. Lines
starting with a '#' are comments.

In general, bhyve starts by parsing command line options in sequence
and applying those settings to configuration values. Once this is
complete, bhyve then begins initializing its state based on the
configuration values. This means that subsequent configuration
options or files may override or supplement previously given settings.

A special 'config.dump' configuration value can be set to true to help
debug configuration issues. When this value is set, bhyve will print
out the configuration variables as a flat list of 'key=value' lines.

Most command line argments map to a single configuration variable,
e.g. '-w' sets the 'x86.strictmsr' value to false. A few command
line arguments have less obvious effects:

- Multiple '-p' options append their values (as a comma-seperated
list) to "vcpu.N.cpuset" values (where N is a decimal vcpu number).

- For '-s' options, a pci.<bus>.<slot>.<function> node is created.
The first argument to '-s' (the device type) is used as the value of
a "device" variable. Additional comma-separated arguments are then
parsed into 'key=value' pairs and used to set additional variables
under the device node. A PCI device emulation driver can provide
its own hook to override the parsing of the additonal '-s' arguments
after the device type.

After the configuration phase as completed, the init_pci hook
then walks the "pci.<bus>.<slot>.<func>" nodes. It uses the
"device" value to find the device model to use. The device
model's init routine is passed a reference to its nvlist node
in the configuration tree which it can query for specific
variables.

The result is that a lot of the string parsing is removed from
the device models and centralized. In addition, adding a new
variable just requires teaching the model to look for the new
variable.

- For '-l' options, a similar model is used where the string is
parsed into values that are later read during initialization.
One key note here is that the serial ports use the commonly
used lowercase names from existing documentation and examples
(e.g. "lpc.com1") instead of the uppercase names previously
used internally in bhyve.

Reviewed by: grehan
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D26035


# 483d953a 04-May-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for bhyve save and restore.

Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.

To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.

While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.

Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495


# 07b35f77 12-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: rework mevent processing to fix a race condition

At the end of both mevent_add() and mevent_update(), mevent_notify()
is called to wakeup the I/O thread, that will call kevent(changelist)
to update the kernel.
A race condition is possible where the client calls mevent_add() and
mevent_update(EV_ENABLE) before the I/O thread has the chance to wake
up and call mevent_build()+kevent(changelist) in response to mevent_add().
The mevent_add() is therefore ignored by the I/O thread, and
kevent(fd, EV_ENABLE) is called before kevent(fd, EV_ADD), resuliting
in a failure of the kevent(fd, EV_ENABLE) call.

PR: 241808
Reviewed by: jhb, markj
MFC with: r354288
Differential Revision: https://reviews.freebsd.org/D22286


# 3e11768e 03-Nov-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: add backend rx backpressure to virtio-net

If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens to
DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().

Reviewed by: bryanv, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20987


# d12c5ef6 27-Sep-2019 Vincenzo Maffione <vmaffione@FreeBSD.org>

bhyve: support for enabling/disabling the net backend

Extend the net backend interface with two functions, namely netbe_rx_disable()
and netbe_rx_enable(), which can be used by the net device emulators to stop
the backend from invoking the receive callback. This is useful for device
emulators, i.e., on hardware resets or to implement receive backpressure.
The mevent module has been extendede to support the addition of a disabled
event. To prevent race conditions, the net backends will start with receive
operation disabled. A follow-up patch will use the new functionalities in
the virtio-net device.

Reviewed by: jhb, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20973


# fe1329e4 11-Jul-2019 Sean Chittenden <seanc@FreeBSD.org>

usr.sbin/bhyve: send an initialized value to wake up blocking kqueue

This is a no-op initialization because nothing reads this value. "This
wasn't wrong previously, but this is more correct now." -imp

Coverity CID: 1194307
Approved by: markj, imp, scottl
Differential Revision: https://reviews.freebsd.org/D20921


# abfa3c39 15-Jan-2019 Marcelo Araujo <araujo@FreeBSD.org>

Use capsicum_helpers(3) that allow us to simplify the code and its functions
will return success when the kernel is built without support of
the capability mode.

It is important to note, that I'm taking a more conservative approach
with these changes and it will be done in small steps.

Reviewed by: jhb
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D18744


# f7224b70 13-Jun-2018 Marcelo Araujo <araujo@FreeBSD.org>

Fix style(9) space vs tab.

Reviewed by: jhb
MFC after: 3 weeks.
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D15768


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 00ef17be 14-Feb-2017 Bartek Rutkowski <robak@FreeBSD.org>

Capsicum support for bhyve(8).

Adds Capsicum sandboxing to bhyve.

Submitted by: Pawel Biernacki <pawel.biernacki@gmail.com>
Reviewed by: grehan, oshogbo
Approved by: emaste, grehan
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D8290


# 09fd42cb 05-May-2014 Neel Natu <neel@FreeBSD.org>

Re-adding an event to a kqueue modifies the parameters of the original event.
However, if the original knote had been disabled then it is not automatically
re-enabled.

Fix this by using EV_ADD to create an mevent and EV_ENABLE to enable it.

Adding a kevent for the first time implicitly enables it so existing callers
of mevent_add() don't need to change.

Reviewed by: grehan


# 994f858a 22-Apr-2014 Xin LI <delphij@FreeBSD.org>

Use calloc() in favor of malloc + memset.

Reviewed by: neel


# 058e24d3 27-Dec-2013 John Baldwin <jhb@FreeBSD.org>

Extend the ACPI power management support to wire a virtual power button up
to SIGTERM when ACPI is enabled. Sending SIGTERM to the hypervisor when an
ACPI-aware OS is running will now trigger a soft-off allowing for a graceful
shutdown of the guest.
- Move constants for ACPI-related registers to acpi.h.
- Implement an SMI_CMD register with commands to enable and disable ACPI.
Currently the only change when ACPI is enabled is to enable the virtual
power button via SIGTERM.
- Implement a fixed-feature power button when ACPI is enabled by asserting
PWRBTN_STS in PM1_EVT when SIGTERM is received.
- Add support for EVFILT_SIGNAL events to mevent.
- Implement support for the ACPI system command interrupt (SCI) and assert
it when needed based on the values in PM1_EVT. Mark the SCI as active-low
and level triggered in the MADT and MP Table.
- Mark PCI interrupts in the MP Table as active-low in addition to level
triggered.

Reviewed by: neel


# 7f5487ac 05-Nov-2013 Peter Grehan <grehan@FreeBSD.org>

Add the VM name to the process name with setproctitle().
Remove the VM name from some of the thread-naming calls
since it is now in the proc title.
Slightly modify the thread-naming for the net and block
threads.

This improves readability when using top/ps with the -a
and -H options on a system with a large number of bhyve VMs.

Requested by: Michael Dexter
Reviewed by: neel
MFC after: 4 weeks


# 151dba4a 18-Sep-2013 Peter Grehan <grehan@FreeBSD.org>

Add simplistic periodic timer support to mevent using kqueue's
timer support. This should be enough for the emulation of
h/w periodic timers (and no more) e.g. some of the 8254's
more esoteric modes that happen to be used by non-FreeBSD o/s's.

Approved by: re@ (blanket)


# 4e8c7465 20-Dec-2012 Peter Grehan <grehan@FreeBSD.org>

Change thread name for the main kqueue event loop to "<vmname> mevent" so
it can be easily distinguished from other non-vCPU threads in forthcoming
changes.

Obtained from: NetApp


# 366f6083 12-May-2011 Peter Grehan <grehan@FreeBSD.org>

Import of bhyve hypervisor and utilities, part 1.
vmm.ko - kernel module for VT-x, VT-d and hypervisor control
bhyve - user-space sequencer and i/o emulation
vmmctl - dump of hypervisor register state
libvmm - front-end to vmm.ko chardev interface

bhyve was designed and implemented by Neel Natu.

Thanks to the following folk from NetApp who helped to make this available:
Joe CaraDonna
Peter Snyder
Jeff Heller
Sandeep Mann
Steve Miller
Brian Pawlowski