323739 |
19-Sep-2017 |
avg |
MFV r320195: bhyveload: correctly query size of disks
On FreeBSD fstat(2) works fine for querying sizes of plain files, but not so much for character devices. So, use DIOCGMEDIASIZE to try to get the correct size for disks and disk-like devices (e.g. zvols).
PR: 220186 |
295124 |
01-Feb-2016 |
grehan |
MFC r284539, r284630, r284688, r284877, r285217, r285218, r286837, r286838, r288470, r288522, r288524, r288826, r289001
Pull in bhyve bug fixes and changes to allow UEFI booting. This provides Windows support.
Tested on Intel and AMD with: - Arch Linux i386+amd64 (kernel 4.3.3) - Ubuntu 15.10 server 64-bit - FreeBSD-CURRENT/amd64 20160127 snap - FreeBSD 10.2 i386+amd64 - OpenBSD 5.8 i386+amd64 - SmartOS latest - Windows 10 build 1511'
Huge thanks to Yamagi Burmeister who submitted the patch and did the majority of the testing.
r284539 - bootrom mem allocation support r284630 - Add SO_REUSEADDR when starting debug port r284688 - Fix a regression in "movs" emulation r284877 - verify_gla() non-zero segment base fix r285217 - Always assert DCD and DSR in the uart r285218 - devmem nodes moved to /dev/vmm.io/ r286837 - Add define for SATA Check-Power-Mode r286838 - Add simple (no-op) SATA cmd emulations r288470 - Increase virtio-blk indirect descs r288522 - Firmware guest query interface r288524 - Fix post-test typo r288826 - Clean up SATA unimplemented cmd msg r289001 - Add -l option to specify userboot path
Submitted by: Yamagi Burmeister Approved by: re (kib) |
284900 |
28-Jun-2015 |
neel |
MFC r282209: Emulate the 'bit test' instruction.
MFC r282259: Re-implement RTC current time calculation to eliminate the possibility of losing time.
MFC r282281: Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.
MFC r282284: When an instruction cannot be decoded just return to userspace so bhyve(8) can dump the instruction bytes.
MFC r282287: Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.
MFC r282296: Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are enabled.
MFC r282301: Relax limits when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI.
MFC r282335: Advertise an additional memory BAR in the "dummy" device emulation.
MFC r282336: Emulate machine check related MSRs to allow guest OSes like Windows to boot.
MFC r282351: Don't advertise the Intel SMX capability to the guest.
MFC r282407: Emulate the 'CMP r/m8, imm8' instruction.
MFC r282519: Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.
MFC r282520: Emulate guest writes to EFER_MSR properly.
MFC r282558: Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().
MFC r282571: Check 'td_owepreempt' and yield the vcpu thread if it is set.
MFC r282595: Allow byte reads of AHCI registers.
MFC r282784: Handling indirect descriptors is a capability of the host and not one that needs to be negotiated. Use the host capabilities field and not the negotiated field when verifying that indirect descriptors are supported.
MFC r282788: Allow configuration of the sector size advertised to the guest.
MFC r282865: Set the subvendor field in config space to the vendor ID. This is required by the Windows virtio drivers to correctly match a device.
MFC r282922: Bump the size of the blockif scatter-gather list to 67.
MFC r283075: Fix off-by-one in array index bounds check. bhyveload would allow you to create 33 entries on an array that only has 32 slots
MFC r283168: Temporarily revert r282922 which bumped the max descriptors.
MFC r283255: Emulate the "CMP r/m, reg" instruction (opcode 39H).
MFC r283256: Add an option "--get-vmcs-exit-inst-length" to display the instruction length of the instruction that caused the VM-exit.
MFC r283264: Change the header type of the emulated host-bridge from type 1 to type 0.
MFC r283293: Don't rely on the 'VM-exit instruction length' field in the VMCS to always have an accurate length on an EPT violation.
MFC r283299: Remove bogus verification of instruction length after instruction decode.
MFC r283308: Exceptions don't deliver an error code in real mode.
MFC r283657: Fix non-deterministic delays when accessing a vcpu that was in "running" or "sleeping" state.
MFC r283973: Use tunable 'hw.vmm.svm.features' to disable specific SVM features even though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor.
MFC r284046: Fix regression in 'verify_gla()' with the RIP-relative addressing mode.
MFC r284174: Support guest writes to the TSC by enabling the "use TSC offsetting" execution control. |
270159 |
19-Aug-2014 |
grehan |
MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276, r268427, r268428, r268521, r268638, r268639, r268701, r268777, r268889, r268922, r269008, r269042, r269043, r269080, r269094, r269108, r269109, r269281, r269317, r269700, r269896, r269962, r269989.
Catch bhyve up to CURRENT.
Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and OpenBSD/amd64. Still resolving an issue with OpenBSD/i386.
Many thanks to jhb@ for all the hard work on the prior MFCs !
r267921 - support the "mov r/m8, imm8" instruction r267934 - document options r267949 - set DMI vers/date to fixed values r267959 - doc: sort cmd flags r267966 - EPT misconf post-mortem info r268202 - use correct flag for event index r268276 - 64-bit virtio capability api r268427 - invalidate guest TLB when cr3 is updated, needed for TSS r268428 - identify vcpu's operating mode r268521 - use correct offset in guest logical-to-linear translation r268638 - chs value r268639 - chs fake values r268701 - instr emul operand/address size override prefix support r268777 - emulation for legacy x86 task switching r268889 - nested exception support r268922 - fix INVARIANTS build r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel r269042 - fix fault injection r269043 - Reduce VMEXIT_RESTARTs in task_switch.c r269080 - fix issues in PUSH emulation r269094 - simplify return values from the inout handlers r269108 - don't return -1 from the push emulation handler r269109 - avoid permanent sleep in vm_handle_hlt() r269281 - list VT-x features in base kernel dmesg r269317 - Mark AHCI fatal errors as not completed r269700 - Support PCI extended config space in bhyve r269896 - Minor cleanup r269962 - use max guest memory when creating IOMMU domain r269989 - fix interrupt mode names |
270074 |
17-Aug-2014 |
grehan |
MFC r267311, r267330, r267811, r267884
Turn on interrupt window exiting unconditionally when an ExtINT is being injected into the guest.
Add helper functions to populate VM exit information for rendezvous and astpending exits.
Provide APIs to directly get 'lowmem' and 'highmem' size directly.
Expose the amount of resident and wired memory from the guest's vmspace |
270071 |
17-Aug-2014 |
grehan |
MFC r267216 Add ioctl(VM_REINIT) to reinitialize the virtual machine state maintained by vmm.ko. This allows the virtual machine to be restarted without having to destroy it first. |
268932 |
20-Jul-2014 |
jhb |
MFC 262331,262487,262495,262523: ZFS boot support for bhyveload. |
267399 |
12-Jun-2014 |
jhb |
MFC 261504: Add support for FreeBSD/i386 guests under bhyve. |
267394 |
12-Jun-2014 |
jhb |
MFC 261229: o Fix typo, sort .Xrs.
PR: docs/186191 |
259301 |
13-Dec-2013 |
grehan |
MFC r256657,r257018,r257347,r257423,r257729,r257767, r257933,r258609,r258614,r258668,r258673,r258855
Pull in some minor bugfixes and functionality enhancements from CURRENT. These are candidates to be moved to 10.0-release.
r258855 mdoc: quote string properly.
r258673 Don't create an initial value for the host filesystem of "/".
r258668 Allow bhyve and bhyveload to attach to tty devices.
r258614 The 22-bit Data Byte Count (DBC) field of a Physical Region Descriptor was being read as a 32-bit quantity by the bhyve AHCI driver.
r258609 Fix discrepancy between the IOAPIC ID advertised by firmware tables and the actual value read by the guest.
r257933 Route the legacy timer interrupt (IRQ0) to pin 2 of the IOAPIC.
r257767 Fix an off-by-one error when iterating over the emulated PCI BARs.
r257729 Add the VM name to the process name with setproctitle().
r257423 Make the virtual ioapic available unconditionally in a bhyve virtual machine.
r257347 Update copyright to include the author of the LPC bridge emulation code.
hand-merge r257018 Tidy usage messages for bhyve and bhyveload.
r256657 Add an option to bhyveload(8) that allows setting a loader environment variable from the command line.
Discussed with: neel |
259073 |
07-Dec-2013 |
peter |
Hoist all the mergeinfo up to the root in preparation for enforcing merges to the root only. All MFC's were rerecorded to the root.
Going forward, if an MFC includes mergeinfo, it will need to be made to the root and committed from the root. Merges with --ignore-ancestry or diff | patch can go anywhere.
The mergeinfo in HEAD is in a bad state from years of neglect and manual tampering and this was branched into 10.x. This confuses the coalescing code and prevents it from doing its job.
Approved by: re (gjb, implicit) |
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256237 |
09-Oct-2013 |
joel |
Fix missing .
Approved by: re (blanket)
|
256176 |
09-Oct-2013 |
neel |
Parse the memory size parameter using expand_number() to allow specifying the memory size more intuitively (e.g. 512M, 4G etc).
Submitted by: rodrigc Reviewed by: grehan Approved by: re (blanket)
|
256072 |
05-Oct-2013 |
neel |
Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory.
Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT.
pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.
The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest.
Page Table Entries: The EPT page table entries as mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For e.g. the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'.
An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem. This is achieved by using the following mapping for EPT entries that need emulation of A/D bits: Bit Position Interpreted By PG_V 52 software (accessed bit emulation handler) PG_RW 53 software (dirty bit emulation handler) PG_A 0 hardware (aka EPT_PG_RD) PG_M 1 hardware (aka EPT_PG_WR)
The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@).
The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap.
TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB.
Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.
PCI passthru: Guest's with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that have one or more PCI passthru devices attached to them.
Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable.
Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement.
Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions.
Special thanks to Peter Holm for testing the patch on short notice.
Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho
|
248492 |
19-Mar-2013 |
joel |
mdoc: remove superfluous paragraph macro.
|
248477 |
18-Mar-2013 |
neel |
Simplify the assignment of memory to virtual machines by requiring a single command line option "-m <memsize in MB>" to specify the memory size.
Prior to this change the user needed to explicitly specify the amount of memory allocated below 4G (-m <lowmem>) and the amount above 4G (-M <highmem>).
The "-M" option is no longer supported by 'bhyveload' and 'bhyve'.
The start of the PCI hole is fixed at 3GB and cannot be directly changed using command line options. However it is still possible to change this in special circumstances via the 'vm_set_lowmem_limit()' API provided by libvmmapi.
Submitted by: Dinakar Medavaram (initial version) Reviewed by: grehan Obtained from: NetApp
|
245667 |
19-Jan-2013 |
joel |
Remove EOL whitespace.
|
245666 |
19-Jan-2013 |
joel |
Minor mdoc fixes.
|
245652 |
19-Jan-2013 |
neel |
Merge projects/bhyve to head.
'bhyve' was developed by grehan@ and myself at NetApp (thanks!).
Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their support and encouragement.
Obtained from: NetApp
|
245155 |
08-Jan-2013 |
neel |
Add the 'bhyveload(8)' man page.
Obtained from: NetApp Reviewed by: grehan
|
245144 |
08-Jan-2013 |
neel |
Reduce the default memory allocation for a VM from 768MB to 128MB.
Obtained from: NetApp
|
242882 |
11-Nov-2012 |
neel |
IFC @ r242684
|
242676 |
06-Nov-2012 |
neel |
Use the new userboot 'getenv' callback to set a couple of environment variables in the guest.
The variables are: smbios.bios.vendor=BHYVE and boot_serial=1
The FreeBSD guest uses the "smbios.bios.vendor" environment variable to detect whether or not it is running as a guest inside a hypervisor.
The "boot_serial=1" is temporary and will be dropped when bhyve can do VGA emulation.
Obtained from: NetApp
|
234695 |
26-Apr-2012 |
grehan |
IFC @ r234692
sys/amd64/include/cpufunc.h sys/amd64/include/fpu.h sys/amd64/amd64/fpu.c sys/amd64/vmm/vmm.c
- Add API to allow vmm FPU state init/save/restore.
FP stuff discussed with: kib
|
223828 |
06-Jul-2011 |
neel |
'bhyveload' is a userspace FreeBSD loader that can load the kernel + metadata inside a BHyVe-based virtual machine.
It is a thin wrapper on top of userboot.so which is a variant of the FreeBSD loader packaged as a shared library. 'bhyveload' provides callbacks that are utilized by userboot.so to do things like console i/o, disk i/o, set virtual machine registers etc.
Thanks for Doug Rabson (dfr@) for making this happen.
|