History log of /freebsd-10-stable/usr.sbin/bhyve/pci_virtio_net.c
Revision Date Author Comments
# 307183 13-Oct-2016 np

bhyve(8): Fix typo from r294294 that prevented bhyve from working with
vmnet devices. This is a direct commit to stable/10.


# 295124 01-Feb-2016 grehan

MFC r284539, r284630, r284688, r284877, r285217, r285218,
r286837, r286838, r288470, r288522, r288524, r288826,
r289001

Pull in bhyve bug fixes and changes to allow UEFI booting.
This provides Windows support.

Tested on Intel and AMD with:
- Arch Linux i386+amd64 (kernel 4.3.3)
- Ubuntu 15.10 server 64-bit
- FreeBSD-CURRENT/amd64 20160127 snap
- FreeBSD 10.2 i386+amd64
- OpenBSD 5.8 i386+amd64
- SmartOS latest
- Windows 10 build 1511'

Huge thanks to Yamagi Burmeister who submitted the patch
and did the majority of the testing.

r284539 - bootrom mem allocation support
r284630 - Add SO_REUSEADDR when starting debug port
r284688 - Fix a regression in "movs" emulation
r284877 - verify_gla() non-zero segment base fix
r285217 - Always assert DCD and DSR in the uart
r285218 - devmem nodes moved to /dev/vmm.io/
r286837 - Add define for SATA Check-Power-Mode
r286838 - Add simple (no-op) SATA cmd emulations
r288470 - Increase virtio-blk indirect descs
r288522 - Firmware guest query interface
r288524 - Fix post-test typo
r288826 - Clean up SATA unimplemented cmd msg
r289001 - Add -l option to specify userboot path

Submitted by: Yamagi Burmeister
Approved by: re (kib)


# 294294 18-Jan-2016 gnn

MFC: 293459,293643

Add netmap support for bhyve


# 284900 28-Jun-2015 neel

MFC r282209:
Emulate the 'bit test' instruction.

MFC r282259:
Re-implement RTC current time calculation to eliminate the possibility of
losing time.

MFC r282281:
Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.

MFC r282284:
When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

MFC r282287:
Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

MFC r282296:
Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC r282301:
Relax limits when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

MFC r282335:
Advertise an additional memory BAR in the "dummy" device emulation.

MFC r282336:
Emulate machine check related MSRs to allow guest OSes like Windows to boot.

MFC r282351:
Don't advertise the Intel SMX capability to the guest.

MFC r282407:
Emulate the 'CMP r/m8, imm8' instruction.

MFC r282519:
Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.

MFC r282520:
Emulate guest writes to EFER_MSR properly.

MFC r282558:
Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

MFC r282571:
Check 'td_owepreempt' and yield the vcpu thread if it is set.

MFC r282595:
Allow byte reads of AHCI registers.

MFC r282784:
Handling indirect descriptors is a capability of the host and not one that
needs to be negotiated. Use the host capabilities field and not the negotiated
field when verifying that indirect descriptors are supported.

MFC r282788:
Allow configuration of the sector size advertised to the guest.

MFC r282865:
Set the subvendor field in config space to the vendor ID. This is required
by the Windows virtio drivers to correctly match a device.

MFC r282922:
Bump the size of the blockif scatter-gather list to 67.

MFC r283075:
Fix off-by-one in array index bounds check. bhyveload would allow you to
create 33 entries on an array that only has 32 slots

MFC r283168:
Temporarily revert r282922 which bumped the max descriptors.

MFC r283255:
Emulate the "CMP r/m, reg" instruction (opcode 39H).

MFC r283256:
Add an option "--get-vmcs-exit-inst-length" to display the instruction length
of the instruction that caused the VM-exit.

MFC r283264:
Change the header type of the emulated host-bridge from type 1 to type 0.

MFC r283293:
Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation.

MFC r283299:
Remove bogus verification of instruction length after instruction decode.

MFC r283308:
Exceptions don't deliver an error code in real mode.

MFC r283657:
Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state.

MFC r283973:
Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids'
to limit the number of ASIDs used by the hypervisor.

MFC r284046:
Fix regression in 'verify_gla()' with the RIP-relative addressing mode.

MFC r284174:
Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control.


# 284899 27-Jun-2015 neel

MFC r279444:
Allow passthrough devices to be hinted.

MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.

MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.

MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.

MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.

MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.

MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.

MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.

MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.

MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)

MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.

MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.

MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

MFC r281879:
Missing break in switch case (Coverity ID 1292499)

MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.

MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.

MFC r282206:
Implement the century byte in the RTC.


# 282840 13-May-2015 mav

MFC r281766, r281767:
Report link as up only if we managed to open tap device.

It would be cool to report tap device status, but it has no such API.


# 282839 13-May-2015 mav

MFC r281764, r282563: Disable RX/TX queues notifications when not needed.

This reduces CPU load and doubles iperf throughput, reaching 2-3Gbit/s.

Sponsored by: iXsystems, Inc.


# 280743 27-Mar-2015 mav

MFC r280026, r280041:
Modify virtqueue helpers added in r253440 to allow queuing.

Original virtqueue design allows queued and out-of-order processing, but
helpers added in r253440 suppose only direct blocking in-order one.
It could be fine for network, etc., but it is a huge limitation for storage
devices.


# 271685 16-Sep-2014 grehan

MFC virtio-net changes.

Re-tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

r271299:
Add a callback to be notified about negotiated features.

r271338:
Allow vtnet operation without merged rx buffers.

NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Approved by: re (glebius)
Obtained from: Vincenzo Maffione, Universita` di Pisa (r271299)


# 268953 21-Jul-2014 jhb

MFC 264353,264509,264768,264770,264825,264846,264988,265114,265165,265365,
265941,265951,266390,266550,266910:
Various bhyve fixes:
- Don't save host's return address in 'struct vmxctx'.
- Permit non-32-bit accesses to local APIC registers.
- Factor out common ioport handler code.
- Use calloc() in favor of malloc + memset.
- Change the vlapic timer frequency to be in the ballpark of contemporary
hardware.
- Allow the guest to read the TSC via MSR 0x10.
- A VMCS is always inactive when it exits the vmx_run() loop. Remove
redundant code and the misleading comment that suggest otherwise.
- Ignore writes to microcode update MSR. This MSR is accessed by RHEL7
guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
- Provide an alias for the userboot console and name it 'comconsole'.
- Use EV_ADD to create an mevent and EV_ENABLE to enable it.
- abort(3) the process in response to a VMEXIT_ABORT.
- Don't include the guest memory segments in the bhyve(8) process core dump.
- Make the vmx asm code dtrace-fbt-friendly.
- Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in
the VCPU_RUNNING state.
- Enable VMX in the IA32_FEATURE_CONTROL MSR if it not enabled and the MSR
isn't locked.


# 267393 12-Jun-2014 jhb

MFC 260239,261268,265058:
Expand the support for PCI INTx interrupts including providing interrupt
routing information for INTx interrupts to I/O APIC pins and enabling
INTx interrupts in the virtio and AHCI backends.


# 259301 13-Dec-2013 grehan

MFC r256657,r257018,r257347,r257423,r257729,r257767,
r257933,r258609,r258614,r258668,r258673,r258855

Pull in some minor bugfixes and functionality enhancements
from CURRENT. These are candidates to be moved to 10.0-release.

r258855
mdoc: quote string properly.

r258673
Don't create an initial value for the host filesystem of "/".

r258668
Allow bhyve and bhyveload to attach to tty devices.

r258614
The 22-bit Data Byte Count (DBC) field of a Physical Region Descriptor was
being read as a 32-bit quantity by the bhyve AHCI driver.

r258609
Fix discrepancy between the IOAPIC ID advertised by firmware tables and the
actual value read by the guest.

r257933
Route the legacy timer interrupt (IRQ0) to pin 2 of the IOAPIC.

r257767
Fix an off-by-one error when iterating over the emulated PCI BARs.

r257729
Add the VM name to the process name with setproctitle().

r257423
Make the virtual ioapic available unconditionally in a bhyve virtual machine.

r257347
Update copyright to include the author of the LPC bridge emulation code.

hand-merge r257018
Tidy usage messages for bhyve and bhyveload.

r256657
Add an option to bhyveload(8) that allows setting a loader environment variable
from the command line.

Discussed with: neel


# 256755 18-Oct-2013 grehan

MFC r256709:

Eliminate unconditional debug printfs.

Linux writes to these nominally read-only registers,
so avoid having bhyve write warning messages to stdout
when the reg writes can be safely ignored. Change the
WPRINTF to DPRINTF which is conditional.

Approved by: re (delphij)


# 284900 28-Jun-2015 neel

MFC r282209:
Emulate the 'bit test' instruction.

MFC r282259:
Re-implement RTC current time calculation to eliminate the possibility of
losing time.

MFC r282281:
Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs.

MFC r282284:
When an instruction cannot be decoded just return to userspace so bhyve(8)
can dump the instruction bytes.

MFC r282287:
Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.

MFC r282296:
Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are
enabled.

MFC r282301:
Relax limits when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.

MFC r282335:
Advertise an additional memory BAR in the "dummy" device emulation.

MFC r282336:
Emulate machine check related MSRs to allow guest OSes like Windows to boot.

MFC r282351:
Don't advertise the Intel SMX capability to the guest.

MFC r282407:
Emulate the 'CMP r/m8, imm8' instruction.

MFC r282519:
Add macros for AMD-specific bits in MSR_EFER: LMSLE, FFXSR and TCE.

MFC r282520:
Emulate guest writes to EFER_MSR properly.

MFC r282558:
Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().

MFC r282571:
Check 'td_owepreempt' and yield the vcpu thread if it is set.

MFC r282595:
Allow byte reads of AHCI registers.

MFC r282784:
Handling indirect descriptors is a capability of the host and not one that
needs to be negotiated. Use the host capabilities field and not the negotiated
field when verifying that indirect descriptors are supported.

MFC r282788:
Allow configuration of the sector size advertised to the guest.

MFC r282865:
Set the subvendor field in config space to the vendor ID. This is required
by the Windows virtio drivers to correctly match a device.

MFC r282922:
Bump the size of the blockif scatter-gather list to 67.

MFC r283075:
Fix off-by-one in array index bounds check. bhyveload would allow you to
create 33 entries on an array that only has 32 slots

MFC r283168:
Temporarily revert r282922 which bumped the max descriptors.

MFC r283255:
Emulate the "CMP r/m, reg" instruction (opcode 39H).

MFC r283256:
Add an option "--get-vmcs-exit-inst-length" to display the instruction length
of the instruction that caused the VM-exit.

MFC r283264:
Change the header type of the emulated host-bridge from type 1 to type 0.

MFC r283293:
Don't rely on the 'VM-exit instruction length' field in the VMCS to always
have an accurate length on an EPT violation.

MFC r283299:
Remove bogus verification of instruction length after instruction decode.

MFC r283308:
Exceptions don't deliver an error code in real mode.

MFC r283657:
Fix non-deterministic delays when accessing a vcpu that was in "running" or
"sleeping" state.

MFC r283973:
Use tunable 'hw.vmm.svm.features' to disable specific SVM features even
though they might be available in hardware. Use tunable 'hw.vmm.svm.num_asids'
to limit the number of ASIDs used by the hypervisor.

MFC r284046:
Fix regression in 'verify_gla()' with the RIP-relative addressing mode.

MFC r284174:
Support guest writes to the TSC by enabling the "use TSC offsetting"
execution control.


# 284899 27-Jun-2015 neel

MFC r279444:
Allow passthrough devices to be hinted.

MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.

MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.

MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.

MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.

MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.

MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.

MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.

MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.

MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)

MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.

MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.

MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

MFC r281879:
Missing break in switch case (Coverity ID 1292499)

MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.

MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.

MFC r282206:
Implement the century byte in the RTC.


# 282840 13-May-2015 mav

MFC r281766, r281767:
Report link as up only if we managed to open tap device.

It would be cool to report tap device status, but it has no such API.


# 282839 13-May-2015 mav

MFC r281764, r282563: Disable RX/TX queues notifications when not needed.

This reduces CPU load and doubles iperf throughput, reaching 2-3Gbit/s.

Sponsored by: iXsystems, Inc.


# 280743 27-Mar-2015 mav

MFC r280026, r280041:
Modify virtqueue helpers added in r253440 to allow queuing.

Original virtqueue design allows queued and out-of-order processing, but
helpers added in r253440 suppose only direct blocking in-order one.
It could be fine for network, etc., but it is a huge limitation for storage
devices.


# 271685 16-Sep-2014 grehan

MFC virtio-net changes.

Re-tested with NetBSD/amd64 5.2.2, 6.1.4 and 7-beta.

r271299:
Add a callback to be notified about negotiated features.

r271338:
Allow vtnet operation without merged rx buffers.

NetBSD's virtio-net implementation doesn't negotiate
the merged rx-buffers feature. To support this, check
to see if the feature was negotiated, and then adjust
the operation of the receive path accordingly by using
a larger iovec, and a smaller rx header.
In addition, ignore writes to the (read-only) status byte.

Approved by: re (glebius)
Obtained from: Vincenzo Maffione, Universita` di Pisa (r271299)


# 268953 21-Jul-2014 jhb

MFC 264353,264509,264768,264770,264825,264846,264988,265114,265165,265365,
265941,265951,266390,266550,266910:
Various bhyve fixes:
- Don't save host's return address in 'struct vmxctx'.
- Permit non-32-bit accesses to local APIC registers.
- Factor out common ioport handler code.
- Use calloc() in favor of malloc + memset.
- Change the vlapic timer frequency to be in the ballpark of contemporary
hardware.
- Allow the guest to read the TSC via MSR 0x10.
- A VMCS is always inactive when it exits the vmx_run() loop. Remove
redundant code and the misleading comment that suggest otherwise.
- Ignore writes to microcode update MSR. This MSR is accessed by RHEL7
guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
- Provide an alias for the userboot console and name it 'comconsole'.
- Use EV_ADD to create an mevent and EV_ENABLE to enable it.
- abort(3) the process in response to a VMEXIT_ABORT.
- Don't include the guest memory segments in the bhyve(8) process core dump.
- Make the vmx asm code dtrace-fbt-friendly.
- Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in
the VCPU_RUNNING state.
- Enable VMX in the IA32_FEATURE_CONTROL MSR if it not enabled and the MSR
isn't locked.


# 267393 12-Jun-2014 jhb

MFC 260239,261268,265058:
Expand the support for PCI INTx interrupts including providing interrupt
routing information for INTx interrupts to I/O APIC pins and enabling
INTx interrupts in the virtio and AHCI backends.


# 259301 13-Dec-2013 grehan

MFC r256657,r257018,r257347,r257423,r257729,r257767,
r257933,r258609,r258614,r258668,r258673,r258855

Pull in some minor bugfixes and functionality enhancements
from CURRENT. These are candidates to be moved to 10.0-release.

r258855
mdoc: quote string properly.

r258673
Don't create an initial value for the host filesystem of "/".

r258668
Allow bhyve and bhyveload to attach to tty devices.

r258614
The 22-bit Data Byte Count (DBC) field of a Physical Region Descriptor was
being read as a 32-bit quantity by the bhyve AHCI driver.

r258609
Fix discrepancy between the IOAPIC ID advertised by firmware tables and the
actual value read by the guest.

r257933
Route the legacy timer interrupt (IRQ0) to pin 2 of the IOAPIC.

r257767
Fix an off-by-one error when iterating over the emulated PCI BARs.

r257729
Add the VM name to the process name with setproctitle().

r257423
Make the virtual ioapic available unconditionally in a bhyve virtual machine.

r257347
Update copyright to include the author of the LPC bridge emulation code.

hand-merge r257018
Tidy usage messages for bhyve and bhyveload.

r256657
Add an option to bhyveload(8) that allows setting a loader environment variable
from the command line.

Discussed with: neel


# 256755 18-Oct-2013 grehan

MFC r256709:

Eliminate unconditional debug printfs.

Linux writes to these nominally read-only registers,
so avoid having bhyve write warning messages to stdout
when the reg writes can be safely ignored. Change the
WPRINTF to DPRINTF which is conditional.

Approved by: re (delphij)