Cross Reference: /linux-master/drivers/hv/channel

History log of /linux-master/drivers/hv/channel_mgmt.c
Revision	Date	Author	Comments
# e8c4bd6c	30-Mar-2024	Saurabh Sengar <ssengar@linux.microsoft.com>	Drivers: hv: vmbus: Add utility function for querying ring size Add a function to query for the preferred ring buffer size of VMBus device. This will allow the drivers (eg. UIO) to allocate the most optimized ring buffer size for devices. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1711788723-8593-2-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 320805ab	18-May-2023	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs vmbus_wait_for_unload() may be called in the panic path after other CPUs are stopped. vmbus_wait_for_unload() currently loops through online CPUs looking for the UNLOAD response message. But the values of CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used to stop the other CPUs, and in one of the paths the stopped CPUs are removed from cpu_online_mask. This removal happens in both x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU, and misses the UNLOAD response message except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload() eventually times out, but only after waiting 100 seconds. Fix this by looping through present CPUs in vmbus_wait_for_unload(). The cpu_present_mask is not modified by stopping the other CPUs in the panic path, nor should it be. Also, in a CoCo VM the synic_message_page is not allocated in hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs() and hv_synic_disable_regs() such that it is set only when the CPU is online. If not all present CPUs are online when vmbus_wait_for_unload() is called, the synic_message_page might be NULL. Add a check for this. Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios") Cc: stable@vger.kernel.org Reported-by: John Starks <jostarks@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 2c6ba421	26-Mar-2023	Michael Kelley <mikelley@microsoft.com>	PCI: hv: Enable PCI pass-thru devices in Confidential VMs For PCI pass-thru devices in a Confidential VM, Hyper-V requires that PCI config space be accessed via hypercalls. In normal VMs, config space accesses are trapped to the Hyper-V host and emulated. But in a confidential VM, the host can't access guest memory to decode the instruction for emulation, so an explicit hypercall must be used. Add functions to make the new MMIO read and MMIO write hypercalls. Update the PCI config space access functions to use the hypercalls when such use is indicated by Hyper-V flags. Also, set the flag to allow the Hyper-V PCI driver to be loaded and used in a Confidential VM (a.k.a., "Isolation VM"). The driver has previously been hardened against a malicious Hyper-V host[1]. [1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/ Co-developed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# f92a4b50	19-Nov-2022	Yang Yingliang <yangyingliang@huawei.com>	Drivers: hv: vmbus: fix double free in the error path of vmbus_add_channel_work() In the error path of vmbus_device_register(), device_unregister() is called, which calls vmbus_device_release(). The latter frees the struct hv_device that was passed in to vmbus_device_register(). So remove the kfree() in vmbus_add_channel_work() to avoid a double free. Fixes: c2e5df616e1a ("vmbus: add per-channel sysfs info") Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221119081135.1564691-2-yangyingliang@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 656c5ba5	09-Jun-2022	Saurabh Sengar <ssengar@linux.microsoft.com>	Drivers: hv: vmbus: Release cpu lock in error case In case of invalid sub channel, release cpu lock before returning. Fixes: a949e86c0d780 ("Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug") Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1654794996-13244-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 6640b5df	27-May-2022	Saurabh Sengar <ssengar@linux.microsoft.com>	Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs When initially assigning a VMbus channel interrupt to a CPU, don’t choose a managed IRQ isolated CPU (as specified on the kernel boot line with parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to change the CPU that a VMbus channel will interrupt, don't allow changing to a managed IRQ isolated CPU. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1653637439-23060-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 1940f9f8	21-May-2022	Julia Lawall <Julia.Lawall@inria.fr>	Drivers: hv: vmbus: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220521111145.81697-60-Julia.Lawall@inria.fr Signed-off-by: Wei Liu <wei.liu@kernel.org>
# a6b94c6b	02-May-2022	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7 The VMbus driver has special case code for running on the first released versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now out of support (except for extended security updates) and lack the performance features needed for effective production usage of Linux guests. Simplify the code by removing the negotiation of the VMbus protocol versions required for these releases of Hyper-V, and by removing the special case code for handling these VMbus protocol versions. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/1651509391-2058-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# da795eb2	28-Apr-2022	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Accept hv_sock offers in isolated guests So that isolated guests can communicate with the host via hv_sock channels. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220428145107.7878-5-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 66200bbc	12-Apr-2022	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Add VMbus IMC device to unsupported list Hyper-V may offer an Initial Machine Configuration (IMC) synthetic device to guest VMs. The device may be used by Windows guests to get specialization information, such as the hostname. But the device is not used in Linux and there is no Linux driver, so it is unsupported. Currently, the IMC device GUID is not recognized by the VMbus driver, which results in an "Unknown GUID" error message during boot. Add the GUID to the list of known but unsupported devices so that the error message is not generated. Other than avoiding the error message, there is no change in guest behavior. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1649818140-100953-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# eaa03d34	28-Mar-2022	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() Following the recommendation in Documentation/memory-barriers.txt for virtual machine guests. Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels") Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 4ee52458	28-Jan-2022	Vitaly Kuznetsov <vkuznets@redhat.com>	Drivers: hv: Compare cpumasks and not their weights in init_vp_index() The condition is supposed to check whether 'allocated_mask' got fully exhausted, i.e. there's no free CPU on the NUMA node left so we have to use one of the already used CPUs. As only bits which correspond to CPUs from 'cpumask_of_node(numa_node)' get set in 'allocated_mask', checking for the equal weights is technically correct but not obvious. Let's compare cpumasks directly. No functional change intended. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220128103412.3033736-3-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# de96e8a0	28-Jan-2022	Vitaly Kuznetsov <vkuznets@redhat.com>	Drivers: hv: Rename 'alloced' to 'allocated' 'Alloced' is not a real word and only saves us two letters, let's use 'allocated' instead. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220128103412.3033736-2-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 6a27e396	05-Jan-2022	Juan Vazquez <juvazq@linux.microsoft.com>	Drivers: hv: vmbus: Initialize request offers message for Isolation VM Initialize memory of request offers message to be sent to the host so padding or uninitialized fields do not leak guest memory contents. Signed-off-by: Juan Vazquez <juvazq@linux.microsoft.com> Link: https://lore.kernel.org/r/20220105192746.23046-1-juvazq@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 20cf6616	25-Oct-2021	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Remove unused code to check for subchannels The last caller of vmbus_are_subchannels_present() was removed in commit c967590457ca ("scsi: storvsc: Fix a race in sub-channel creation that can cause panic"). Remove this dead code, and the utility function invoke_sc_cb() that it is the only caller of. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1635191674-34407-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 7c9ff3de	16-Jul-2021	Haiyang Zhang <haiyangz@microsoft.com>	Drivers: hv: vmbus: Fix duplicate CPU assignments within a device The vmbus module uses a rotational algorithm to assign target CPUs to a device's channels. Depending on the timing of different device's channel offers, different channels of a device may be assigned to the same CPU. For example on a VM with 2 CPUs, if NIC A and B's channels are offered in the following order, NIC A will have both channels on CPU0, and NIC B will have both channels on CPU1 -- see below. This kind of assignment causes RSS load that is spreading across different channels to end up on the same CPU. Timing of channel offers: NIC A channel 0 NIC B channel 0 NIC A channel 1 NIC B channel 1 VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd} Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd Rel_ID=14, target_cpu=0 Rel_ID=17, target_cpu=0 VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea} Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea Rel_ID=16, target_cpu=1 Rel_ID=18, target_cpu=1 Update the vmbus CPU assignment algorithm to avoid duplicate CPU assignments within a device. The new algorithm iterates num_online_cpus + 1 times. The existing rotational algorithm to find "next NUMA & CPU" is still here. But if the resulting CPU is already used by the same device, it will try the next CPU. In the last iteration, it assigns the channel to the next available CPU like the existing algorithm. This is not normally expected, because during device probe, we limit the number of channels of a device to be <= number of online CPUs. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 77db0ec8	19-Apr-2021	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Increase wait time for VMbus unload When running in Azure, disks may be connected to a Linux VM with read/write caching enabled. If a VM panics and issues a VMbus UNLOAD request to Hyper-V, the response is delayed until all dirty data in the disk cache is flushed. In extreme cases, this flushing can take 10's of seconds, depending on the disk speed and the amount of dirty data. If kdump is configured for the VM, the current 10 second timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD complete message may arrive well after the kdump kernel is already running, causing problems. Note that no problem occurs if kdump is not enabled because Hyper-V waits for the cache flush before doing a reboot through the BIOS/UEFI code. Fix this problem by increasing the timeout in vmbus_wait_for_unload() to 100 seconds. Also output periodic messages so that if anyone is watching the serial console, they won't think the VM is completely hung. Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/1618894089-126662-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 8c2d5e06	19-Apr-2021	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Initialize unload_event statically If a malicious or compromised Hyper-V sends a spurious message of type CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will call complete() on an uninitialized event, and cause an oops. Reported-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210420014350.2002-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 870ced05	16-Apr-2021	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Drivers: hv: vmbus: Introduce CHANNELMSG_MODIFYCHANNEL_RESPONSE Introduce the CHANNELMSG_MODIFYCHANNEL_RESPONSE message type, and code to receive and process such a message. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210416143449.16185-3-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 05e48d89	09-Mar-2021	Vasanth <vasanth3g@gmail.com>	drivers: hv: Fix EXPORT_SYMBOL and tab spaces issue 1.Fixed EXPORT_SYMBOL should be follow immediately function/variable. 2.Fixed code tab spaces issue. Signed-off-by: Vasanth M <vasanth3g@gmail.com> Link: https://lore.kernel.org/r/20210310052155.39460-1-vasanth3g@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 21a4e356	01-Feb-2021	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Restrict vmbus_devices on isolated guests Only the VSCs or ICs that have been hardened and that are critical for the successful adoption of Confidential VMs should be allowed if the guest is running isolated. This change reduces the footprint of the code that will be exercised by Confidential VMs and hence the exposure to bugs and vulnerabilities. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210201144814.2701-3-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# e4d221b4	09-Dec-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Resolve race condition in vmbus_onoffer_rescind() An erroneous or malicious host could send multiple rescind messages for a same channel. In vmbus_onoffer_rescind(), the guest maps the channel ID to obtain a pointer to the channel object and it eventually releases such object and associated data. The host could time rescind messages and lead to an use-after-free. Add a new flag to the channel structure to make sure that only one instance of vmbus_onoffer_rescind() can get the reference to the channel object. Reported-by: Juan Vazquez <juvazq@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20201209070827.29335-6-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# e3fa4b74	09-Dec-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind() When channel->device_obj is non-NULL, vmbus_onoffer_rescind() could invoke put_device(), that will eventually release the device and free the channel object (cf. vmbus_device_release()). However, a pointer to the object is dereferenced again later to load the primary_channel. The use-after-free can be avoided by noticing that this load/check is redundant if device_obj is non-NULL: primary_channel must be NULL if device_obj is non-NULL, cf. vmbus_add_channel_work(). Fixes: 54a66265d6754b ("Drivers: hv: vmbus: Fix rescind handling") Reported-by: Juan Vazquez <juvazq@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20201209070827.29335-5-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 06caa778	09-Nov-2020	Andres Beltran <lkmlabelt@gmail.com>	hv_utils: Add validation for untrusted Hyper-V values For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause indexing off the end of the icversion_data array in vmbus_prep_negotiate_resp(). Signed-off-by: Andres Beltran <lkmlabelt@gmail.com> Co-developed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20201109100704.9152-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 911e1987	13-Sep-2020	Michael Kelley <mikelley@microsoft.com>	Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload vmbus_wait_for_unload() looks for a CHANNELMSG_UNLOAD_RESPONSE message coming from Hyper-V. But if the message isn't found for some reason, the panic path gets hung forever. Add a timeout of 10 seconds to prevent this. Fixes: 415719160de3 ("Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload()") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/1600026449-23651-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 775f43fa	17-Jun-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Remove the lock field from the vmbus_channel struct The spinlock is (now) *not used to protect test-and-set accesses to attributes of the structure or sc_list operations. There is, AFAICT, a distinct lack of {WRITE,READ}_ONCE()s in the handling of channel->state, but the changes below do not seem to make things "worse". ;-) Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200617164642.37393-9-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 8a99e501	17-Jun-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Remove unnecessary channel->lock critical sections (sc_list updaters) None of the readers/updaters of sc_list rely on channel->lock for synchronization. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200617164642.37393-7-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 458d090f	17-Jun-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Remove the numa_node field from the vmbus_channel struct The field is read only in numa_node_show() and it is already stored twice (after a call to cpu_to_node()) in target_cpu_store() and init_vp_index(); there is no need to "cache" its value in the channel data structure. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200617164642.37393-3-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 5bf74682	17-Jun-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Remove the target_vp field from the vmbus_channel struct The field is read only in __vmbus_open() and it is already stored twice (after a call to hv_cpu_number_to_vp_number()) in target_cpu_store() and init_vp_index(); there is no need to "cache" its value in the channel data structure. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200617164642.37393-2-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# afaa33da	22-May-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Resolve more races involving init_vp_index() init_vp_index() uses the (per-node) hv_numa_map[] masks to record the CPUs allocated for channel interrupts at a given time, and distribute the performance-critical channels across the available CPUs: in part., the mask of "candidate" target CPUs in a given NUMA node, for a newly offered channel, is determined by XOR-ing the node's CPU mask and the node's hv_numa_map. This operation/mechanism assumes that no offline CPUs is set in the hv_numa_map mask, an assumption that does not hold since such mask is currently not updated when a channel is removed or assigned to a different CPU. To address the issues described above, this adds hooks in the channel removal path (hv_process_channel_removal()) and in target_cpu_store() in order to clear, resp. to update, the hv_numa_map[] masks as needed. This also adds a (missed) update of the masks in init_vp_index() (cf., e.g., the memory-allocation failure path in this function). Like in the case of init_vp_index(), such hooks require to determine if the given channel is performance critical. init_vp_index() does this by parsing the channel's offer, it can not rely on the device data structure (device_obj) to retrieve such information because the device data structure has not been allocated/linked with the channel by the time that init_vp_index() executes. A similar situation may hold in hv_is_alloced_cpu() (defined below); the adopted approach is to "cache" the device type of the channel, as computed by parsing the channel's offer, in the channel structure itself. Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type") Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# a949e86c	22-May-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug vmbus_process_offer() does two things (among others): 1) first, it sets the channel's target CPU with cpu_hotplug_lock; 2) it then adds the channel to the channel list(s) with channel_mutex. Since cpu_hotplug_lock is released before (2), the channel's target CPU (as designated in (1)) can be deemed "free" by hv_synic_cleanup() and go offline before the channel is added to the list. Fix the race condition by "extending" the cpu_hotplug_lock critical section to include (2) (and (1)), nesting the channel_mutex critical section within the cpu_hotplug_lock critical section as done elsewhere (hv_synic_cleanup(), target_cpu_store()) in the hyperv drivers code. Move even further by extending the channel_mutex critical section to include (1) (and (2)): this change allows to remove (the now redundant) bind_channel_to_cpu_lock, and generally simplifies the handling of the target CPUs (that are now always modified with channel_mutex held). Fixes: d570aec0f2154e ("Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug") Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200522171901.204127-2-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 677b0ce5	14-Apr-2020	Colin Ian King <colin.king@canonical.com>	drivers: hv: remove redundant assignment to pointer primary_channel The pointer primary_channel is being assigned with a value that is never used. The assignment is redundant and can be removed. Move the definition of primary_channel to a narrower scope. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200414152343.243166-1-colin.king@canonical.com [ wei: move primary_channel and update commit message ] Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 75278105	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type VMBus version 4.1 and later support the CHANNELMSG_MODIFYCHANNEL(22) message type which can be used to request Hyper-V to change the vCPU that a channel will interrupt. Introduce the CHANNELMSG_MODIFYCHANNEL message type, and define the vmbus_send_modifychannel() function to send CHANNELMSG_MODIFYCHANNEL requests to the host via a hypercall. The function is then used to define a sysfs "store" operation, which allows to change the (v)CPU the channel will interrupt by using the sysfs interface. The feature can be used for load balancing or other purposes. One interesting catch here is that Hyper-V can not currently ACK CHANNELMSG_MODIFYCHANNEL messages with the promise that (after the ACK is sent) the channel won't send any more interrupts to the "old" CPU. The peculiarity of the CHANNELMSG_MODIFYCHANNEL messages is problematic if the user want to take a CPU offline, since we don't want to take a CPU offline (and, potentially, "lose" channel interrupts on such CPU) if the host is still processing a CHANNELMSG_MODIFYCHANNEL message associated to that CPU. It is worth mentioning, however, that we have been unable to observe the above mentioned "race": in all our tests, CHANNELMSG_MODIFYCHANNEL requests appeared as if they were processed synchronously by the host. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-11-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> [ wei: fix conflict in channel_mgmt.c ] Signed-off-by: Wei Liu <wei.liu@kernel.org>
# d570aec0	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug init_vp_index() may access the cpu_online_mask mask via its calls of cpumask_of_node(). Make sure to protect these accesses with a cpus_read_lock() critical section. Also, remove some (hardcoded) instances of CPU(0) from init_vp_index() and replace them with VMBUS_CONNECT_CPU. The connect CPU can not go offline, since Hyper-V does not provide a way to change it. Finally, order the accesses of target_cpu from init_vp_index() and hv_synic_cleanup() by relying on the channel_mutex; this is achieved by moving the call of init_vp_index() into vmbus_process_offer(). Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-10-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 8ef4c4ab	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic The logic is unused since commit 509879bdb30b8 ("Drivers: hv: Introduce a policy for controlling channel affinity"). This logic assumes that a channel target_cpu doesn't change during the lifetime of a channel, but this assumption is incompatible with the new functionality that allows changing the vCPU a channel will interrupt. Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-9-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 9403b66e	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Since vmbus_chan_sched() dereferences the ring buffer pointer, we have to make sure that the ring buffer data structures don't get freed while such dereferencing is happening. Current code does this by sending an IPI to the CPU that is allowed to access that ring buffer from interrupt level, cf., vmbus_reset_channel_cb(). But with the new functionality to allow changing the CPU that a channel will interrupt, we can't be sure what CPU will be running the vmbus_chan_sched() function for a particular channel, so the current IPI mechanism is infeasible. Instead synchronize vmbus_chan_sched() and vmbus_reset_channel_cb() by using the (newly introduced) per-channel spin lock "sched_lock". Move the test for onchannel_callback being NULL before the "switch" control statement in vmbus_chan_sched(), in order to not access the ring buffer if the vmbus_reset_channel_cb() has been completed on the channel. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-7-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 8b6a877c	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels When Hyper-V sends an interrupt to the guest, the guest has to figure out which channel the interrupt is associated with. Hyper-V sets a bit in a memory page that is shared with the guest, indicating a particular "relid" that the interrupt is associated with. The current Linux code then uses a set of per-CPU linked lists to map a given "relid" to a pointer to a channel structure. This design introduces a synchronization problem if the CPU that Hyper-V will interrupt for a certain channel is changed. If the interrupt comes on the "old CPU" and the channel was already moved to the per-CPU list of the "new CPU", then the relid -> channel mapping will fail and the interrupt is dropped. Similarly, if the interrupt comes on the new CPU but the channel was not moved to the per-CPU list of the new CPU, then the mapping will fail and the interrupt is dropped. Relids are integers ranging from 0 to 2047. The mapping from relids to channel structures can be done by setting up an array with 2048 entries, each entry being a pointer to a channel structure (hence total size ~16K bytes, which is not a problem). The array is global, so there are no per-CPU linked lists to update. The array can be searched and updated by loading from/storing to the array at the specified index. With no per-CPU data structures, the above mentioned synchronization problem is avoided and the relid2channel() function gets simpler. Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-4-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# b9fa1b87	05-Apr-2020	Andrea Parri (Microsoft) <parri.andrea@gmail.com>	Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU The offer and rescind works are currently scheduled on the so called "connect CPU". However, this is not really needed: we can synchronize the works by relying on the usage of the offer_in_progress counter and of the channel_mutex mutex. This synchronization is already in place. So, remove this unnecessary "bind to the connect CPU" constraint and update the inline comments accordingly. Suggested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20200406001514.19876-3-parri.andrea@gmail.com Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 52c7803f	05-Apr-2020	Vitaly Kuznetsov <vkuznets@redhat.com>	Drivers: hv: check VMBus messages lengths VMBus message handlers (channel_message_table) receive a pointer to 'struct vmbus_channel_message_header' and cast it to a structure of their choice, which is sometimes longer than the header. We, however, don't check that the message is long enough so in case hypervisor screws up we'll be accessing memory beyond what was allocated for temporary buffer. Previously, we used to always allocate and copy 256 bytes from message page to temporary buffer but this is hardly better: in case the message is shorter than we expect we'll be trying to consume garbage as some real data and no memory guarding technique will be able to identify an issue. Introduce 'min_payload_len' to 'struct vmbus_channel_message_table_entry' and check against it in vmbus_on_msg_dpc(). Note, we can't require the exact length as new hypervisor versions may add extra fields to messages, we only check that the message is not shorter than we expect. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200406104326.45361-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
# 5cc41500	05-Apr-2020	Vitaly Kuznetsov <vkuznets@redhat.com>	Drivers: hv: avoid passing opaque pointer to vmbus_onmessage() vmbus_onmessage() doesn't need the header of the message, it only uses it to get to the payload, we can pass the pointer to the payload directly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20200406104154.45010-4-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>