#
2125c003 |
|
04-Apr-2024 |
Waiman Long <longman@redhat.com> |
cgroup/cpuset: Make cpuset hotplug processing synchronous

Since commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside get_online_cpus()"), cpuset hotplug was done asynchronously via a work function. This is to avoid recursive locking of cgroup_mutex.

Since then, the cgroup locking scheme has changed quite a bit. A cpuset_mutex was introduced to protect cpuset specific operations. The cpuset_mutex is then replaced by a cpuset_rwsem. With commit d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), cpu_hotplug_lock is acquired before cpuset_rwsem. Later on, cpuset_rwsem is reverted back to cpuset_mutex. All these locking changes allow the hotplug code to call into cpuset core directly.

The following commits were also merged due to the asynchronous nature of cpuset hotplug processing.

- commit b22afcdf04c9 ("cpu/hotplug: Cure the cpusets trainwreck")
- commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
- commit 28b89b9e6f7b ("cpuset: handle race between CPU hotplug and cpuset_hotplug_work")

Clean up all these bandages by making cpuset hotplug processing synchronous again with the exception that the call to cgroup_transfer_tasks() to transfer tasks out of an empty cgroup v1 cpuset, if necessary, will still be done via a work function due to the existing cgroup_mutex -> cpu_hotplug_lock dependency. It is possible to reverse that dependency, but that will require updating a number of different cgroup controllers. This special hotplug code path should be rarely taken anyway.

As all the cpuset states will be updated by the end of the hotplug operation, we can revert most of the above commits except commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs") which is partially reverted. Also remove some cpus_read_lock trylock attempts in the cpuset partition code as they are no longer necessary since the cpu_hotplug_lock is now held for the whole duration of the cpuset hotplug code path.
Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
ce0abef6 |
|
19-Apr-2024 |
Sean Christopherson <seanjc@google.com> |
cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n Explicitly disallow enabling mitigations at runtime for kernels that were built with CONFIG_CPU_MITIGATIONS=n, as some architectures may omit code entirely if mitigations are disabled at compile time. E.g. on x86, a large pile of Kconfigs are buried behind CPU_MITIGATIONS, and trying to provide sane behavior for retroactively enabling mitigations is extremely difficult, bordering on impossible. E.g. page table isolation and call depth tracking require build-time support, BHI mitigations will still be off without additional kernel parameters, etc. [ bp: Touchups. ] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240420000556.2645001-3-seanjc@google.com
|
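The behaviour described in ce0abef6 can be sketched in a few lines of C. This is only an illustration, not the kernel's code: the function name `parse_mitigations()`, the enum names, and the `built_in` flag (standing in for CONFIG_CPU_MITIGATIONS) are hypothetical, and the accepted strings `off`, `auto` and `auto,nosmt` are the values the "mitigations" parameter is documented to take.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the mitigation modes selectable at boot. */
enum mitigations_mode {
	MITIGATIONS_OFF,
	MITIGATIONS_AUTO,
	MITIGATIONS_AUTO_NOSMT,
};

/*
 * built_in models CONFIG_CPU_MITIGATIONS (assumption: a runtime flag is
 * used here only so that both cases can be exercised in one program).
 */
static enum mitigations_mode parse_mitigations(bool built_in, const char *arg)
{
	assert(arg != NULL);

	/* Built without mitigations: the parameter is ignored entirely. */
	if (!built_in)
		return MITIGATIONS_OFF;

	if (strcmp(arg, "off") == 0)
		return MITIGATIONS_OFF;
	if (strcmp(arg, "auto,nosmt") == 0)
		return MITIGATIONS_AUTO_NOSMT;

	/* "auto" and, in this sketch, any unknown value keep the default. */
	return MITIGATIONS_AUTO;
}
```

With `built_in` false, every argument, including `auto`, yields `MITIGATIONS_OFF`, which mirrors the commit's point that mitigations cannot be enabled retroactively at runtime.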
#
fe42754b |
|
19-Apr-2024 |
Sean Christopherson <seanjc@google.com> |
cpu: Re-enable CPU mitigations by default for !X86 architectures Rename x86's SPECULATION_MITIGATIONS to CPU_MITIGATIONS, define it in generic code, and force it on for all architectures except x86. A recent commit to turn mitigations off by default if SPECULATION_MITIGATIONS=n kinda sorta missed that "cpu_mitigations" is completely generic, whereas SPECULATION_MITIGATIONS is x86-specific. Rename x86's SPECULATION_MITIGATIONS instead of keeping both and have it select CPU_MITIGATIONS, as having two configs for the same thing is unnecessary and confusing. This will also allow x86 to use the knob to manage mitigations that aren't strictly related to speculative execution. Use another Kconfig to communicate to common code that CPU_MITIGATIONS is already defined instead of having x86's menu depend on the common CPU_MITIGATIONS. This allows keeping a single point of contact for all of x86's mitigations, and it's not clear that other architectures *want* to allow disabling mitigations at compile-time. Fixes: f337a6a21e2f ("x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n") Closes: https://lkml.kernel.org/r/20240413115324.53303a68%40canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240420000556.2645001-2-seanjc@google.com
|
#
f337a6a2 |
|
09-Apr-2024 |
Sean Christopherson <seanjc@google.com> |
x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly states that disabling SPECULATION_MITIGATIONS is supposed to turn off all mitigations by default.

  │ If you say N, all mitigations will be disabled. You really
  │ should know what you are doing to say so.

As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.

Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com
|
#
4c8a4985 |
|
27-Feb-2024 |
Ingo Molnar <mingo@kernel.org> |
smp: Avoid 'setup_max_cpus' namespace collision/shadowing bringup_nonboot_cpus() gets passed the 'setup_max_cpus' variable in init/main.c - which is also the name of the parameter, shadowing the name. To reduce confusion and to allow the 'setup_max_cpus' value to be #defined in the <linux/smp.h> header, use the 'max_cpus' name for the function parameter name. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org
|
#
500f8f9b |
|
25-Feb-2024 |
Frederic Weisbecker <frederic@kernel.org> |
tick: Assume timekeeping is correctly handed over upon last offline idle call The timekeeping duty is handed over from the outgoing CPU on stop machine, then the oneshot tick is stopped right after. Therefore it's guaranteed that the current CPU isn't the timekeeper upon its last call to idle. Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes into idle suggests that the tick is going to be stopped while it is actually stopped already from the appropriate CPU hotplug state. Remove the confusing call and the obsolete case handling and convert it to a sanity check that verifies the above assumption. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240225225508.11587-16-frederic@kernel.org
|
#
ef8969bb |
|
25-Feb-2024 |
Frederic Weisbecker <frederic@kernel.org> |
tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING The broadcast shutdown code is executed through a random explicit call within stop machine from the outgoing CPU. However the tick broadcast is a midware between the tick callback and the clocksource, therefore it makes more sense to shut it down after the tick callback and before the clocksource drivers. Move it instead to the common tick shutdown CPU hotplug state where related operations can be ordered from highest to lowest level. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240225225508.11587-10-frederic@kernel.org
|
#
3ad6eb06 |
|
25-Feb-2024 |
Frederic Weisbecker <frederic@kernel.org> |
tick: Start centralizing tick related CPU hotplug operations

During the CPU offlining process, the various timer tick features are shut down from scattered places, sometimes from teardown callbacks on stop machine, sometimes through explicit calls, sometimes from the control CPU after the CPU died. The reason why these shutdown operations are spread around is not always clear and it makes the tick lifecycle hard to follow.

The tick should be shut down in order from highest to lowest level:

On stop machine from the dying CPU (high-level):

 1) Hand-over the timekeeping duty (tick_handover_do_timer())
 2) Cancel the tick implementation called by the clockevent callback (tick_cancel_sched_timer())
 3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())

On stop machine from the dying CPU (low-level):

 4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)

From the control CPU after the CPU died (low-level):

 5) Shutdown/unregister/cleanup clockevents for the dead CPU (tick_cleanup_dead_cpu())

Instead the current order is 2, 4 (both from CPU hotplug states), then 1 and 3 through direct calls. This layout and order don't make much sense. The operations 1, 2, 3 should be gathered together and in order.

Sort this situation with creating a new TICK shut-down CPU hotplug state and start with introducing the timekeeping duty hand-over there. The state must precede hrtimers migration because the tick hrtimer will be stopped from it in a further patch.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240225225508.11587-8-frederic@kernel.org
|
#
266e9578 |
|
21-Feb-2024 |
Max Kellermann <max.kellermann@ionos.com> |
cpu: Remove stray semicolon This syntax error was introduced by commit da92df490eea ("cpu: Mark cpu_possible_mask as __ro_after_init"). Fixes: da92df490eea ("cpu: Mark cpu_possible_mask as __ro_after_init") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240222114727.1144588-1-max.kellermann@ionos.com
|
#
da92df49 |
|
05-Feb-2024 |
Alexey Dobriyan <adobriyan@gmail.com> |
cpu: Mark cpu_possible_mask as __ro_after_init cpu_possible_mask is by definition "cpus which could be hotplugged without reboot". It's a property which is fixed after kernel enumerates the hardware configuration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/41cd78af-92a3-4f23-8c7a-4316a04a66d8@p183
|
#
effe6d27 |
|
15-Jan-2024 |
Li Zhijian <lizhijian@fujitsu.com> |
kernel/cpu: Convert snprintf() to sysfs_emit() Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). No functional change intended. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240116045151.3940401-40-lizhijian@fujitsu.com
|
#
ef7e585b |
|
13-Jan-2024 |
Randy Dunlap <rdunlap@infradead.org> |
cpu/hotplug: Delete an extraneous kernel-doc description struct cpuhp_cpu_state has an extraneous kernel-doc comment for @cpu. There is no struct member by that name, so remove the comment to prevent the kernel-doc warning: kernel/cpu.c:85: warning: Excess struct member 'cpu' description in 'cpuhp_cpu_state' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240114030615.30441-1-rdunlap@infradead.org
|
#
70da1d01 |
|
02-Oct-2023 |
Vlastimil Babka <vbabka@suse.cz> |
cpu/hotplug: remove CPUHP_SLAB_PREPARE hooks The CPUHP_SLAB_PREPARE hooks are only used by SLAB which is removed. SLUB defines them as NULL, so we can remove those altogether. Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
5c0930cc |
|
07-Nov-2023 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimers: Push pending hrtimers away from outgoing CPU earlier 2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") solved the straight forward CPU hotplug deadlock vs. the scheduler bandwidth timer. Yu discovered a more involved variant where a task which has a bandwidth timer started on the outgoing CPU holds a lock and then gets throttled. If the lock is required by one of the CPU hotplug callbacks, the hotplug operation deadlocks because the unthrottling timer event is not handled on the dying CPU and can only be recovered once the control CPU reaches the hotplug state which pulls the pending hrtimers from the dead CPU. Solve this by pushing the hrtimers away from the dying CPU in the dying callbacks. Nothing can queue a hrtimer on the dying CPU at that point because all other CPUs spin in stop_machine() with interrupts disabled and once the operation is finished the CPU is marked offline. Reported-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liu Tie <liutie4@huawei.com> Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
|
#
cf8e8658 |
|
20-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
arch: Remove Itanium (IA-64) architecture The Itanium architecture is obsolete, and an informal survey [0] reveals that any residual use of Itanium hardware in production is mostly HP-UX or OpenVMS based. The use of Linux on Itanium appears to be limited to enthusiasts that occasionally boot a fresh Linux kernel to see whether things are still working as intended, and perhaps to churn out some distro packages that are rarely used in practice. None of the original companies behind Itanium still produce or support any hardware or software for the architecture, and it is listed as 'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers that contributed on behalf of those companies (nor anyone else, for that matter) have been willing to support or maintain the architecture upstream or even be responsible for applying the odd fix. The Intel firmware team removed all IA-64 support from the Tianocore/EDK2 reference implementation of EFI in 2018. (Itanium is the original architecture for which EFI was developed, and the way Linux supports it deviates significantly from other architectures.) Some distros, such as Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have dropped support years ago. While the argument is being made [1] that there is a 'for the common good' angle to being able to build and run existing projects such as the Grid Community Toolkit [2] on Itanium for interoperability testing, the fact remains that none of those projects are known to be deployed on Linux/ia64, and very few people actually have access to such a system in the first place. Even if there were ways imaginable in which Linux/ia64 could be put to good use today, what matters is whether anyone is actually doing that, and this does not appear to be the case. There are no emulators widely available, and so boot testing Itanium is generally infeasible for ordinary contributors. GCC still supports IA-64 but its compile farm [3] no longer has any IA-64 machines. 
GLIBC would like to get rid of IA-64 [4] too because it would permit some overdue code cleanups. In summary, the benefits to the ecosystem of having IA-64 be part of it are mostly theoretical, whereas the maintenance overhead of keeping it supported is real. So let's rip off the band aid, and remove the IA-64 arch code entirely. This follows the timeline proposed by the Debian/ia64 maintainer [5], which removes support in a controlled manner, leaving IA-64 in a known good state in the most recent LTS release. Other projects will follow once the kernel support is removed. [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/ [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/ [2] https://gridcf.org/gct-docs/latest/index.html [3] https://cfarm.tetaneutral.net/machines/list/ [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/ [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/ Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
38685e2a |
|
17-Oct-2023 |
Ran Xiaokai <ran.xiaokai@zte.com.cn> |
cpu/hotplug: Don't offline the last non-isolated CPU

If a system has isolated CPUs via the "isolcpus=" command line parameter, then an attempt to offline the last housekeeping CPU will result in a WARN_ON() when rebuilding the scheduler domains and a subsequent panic due to an unhandled empty CPU mask in partition_sched_domains_locked().

  cpuset_hotplug_workfn()
    rebuild_sched_domains_locked()
      ndoms = generate_sched_domains(&doms, &attr);
        cpumask_and(doms[0], top_cpuset.effective_cpus, housekeeping_cpumask(HK_FLAG_DOMAIN));

This results in an empty CPU mask which triggers the warning and then the subsequent crash:

  WARNING: CPU: 4 PID: 80 at kernel/sched/topology.c:2366 build_sched_domains+0x120c/0x1408
  Call trace:
   build_sched_domains+0x120c/0x1408
   partition_sched_domains_locked+0x234/0x880
   rebuild_sched_domains_locked+0x37c/0x798
   rebuild_sched_domains+0x30/0x58
   cpuset_hotplug_workfn+0x2a8/0x930

  Unable to handle kernel paging request at virtual address fffe80027ab37080
   partition_sched_domains_locked+0x318/0x880
   rebuild_sched_domains_locked+0x37c/0x798

Aside from the resulting crash, it does not make any sense to offline the last housekeeping CPU.

Prevent this by masking out the non-housekeeping CPUs when selecting a target CPU for initiating the CPU unplug operation via the work queue.

Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/202310171709530660462@zte.com.cn
|
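The selection rule from 38685e2a can be illustrated with plain bitmasks. This is a minimal sketch under stated assumptions, not the kernel implementation: `pick_control_cpu()` and `NO_CPU` are hypothetical names, and 64-bit integers stand in for the kernel's cpumask type. The CPU that runs the unplug work must be online, must be a housekeeping CPU, and must not be the CPU being offlined.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPU (-1)

/*
 * Pick the CPU that will drive the offlining of 'target'.
 * Bit N of each mask represents CPU N (hypothetical 64-CPU model).
 */
static int pick_control_cpu(uint64_t online, uint64_t housekeeping, int target)
{
	uint64_t candidates;
	int cpu;

	assert(target >= 0 && target < 64);

	/* Mask out isolated (non-housekeeping) CPUs and the target itself. */
	candidates = online & housekeeping & ~(UINT64_C(1) << target);

	for (cpu = 0; cpu < 64; cpu++) {
		if (candidates & (UINT64_C(1) << cpu))
			return cpu;
	}

	/* No housekeeping CPU would remain: the request must be refused. */
	return NO_CPU;
}
```

When the target is the only online housekeeping CPU, the candidate mask is empty and the sketch returns `NO_CPU`, which corresponds to rejecting the offline request instead of letting the scheduler domains be rebuilt with an empty mask.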
#
d91bdd96 |
|
14-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/SMT: Make SMT control more robust against enumeration failures The SMT control mechanism got added as speculation attack vector mitigation. The implemented logic relies on the primary thread mask to be set up properly. This turns out to be an issue with XEN/PV guests because their CPU hotplug mechanics do not enumerate APICs and therefore the mask is never correctly populated. This went unnoticed so far because by chance XEN/PV ends up with smp_num_siblings == 2. So smt_hotplug_control stays at its default value CPU_SMT_ENABLED and the primary thread mask is never evaluated in the context of CPU hotplug. This stopped "working" with the upcoming overhaul of the topology evaluation which legitimately provides a fake topology for XEN/PV. That sets smp_num_siblings to 1, which causes the CPU hot-plug core to refuse to bring up the APs. This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED which causes cpu_smt_allowed() to evaluate the unpopulated primary thread mask with the conclusion that all non-boot CPUs are not valid to be plugged. Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and CPU_SMT_NOT_IMPLEMENTED into account. Rename it to cpu_bootable() while at it as that makes it more clear what the function is about. The primary mask issue on x86 XEN/PV needs to be addressed separately as there are users outside of the CPU hotplug code too. Fixes: 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT") Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230814085112.149440843@linutronix.de
|
#
a28ab03b |
|
08-Sep-2023 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP The callbacks migration is performed through an explicit call from the hotplug control CPU right after the death of the target CPU and before proceeding with the CPUHP_ teardown functions. This is unusual but necessary and yet uncommented. Summarize the reason as explained in the changelog of: a58163d8ca2c (rcu: Migrate callbacks earlier in the CPU-offline timeline) Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
448e9f34 |
|
08-Sep-2023 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Standardize explicit CPU-hotplug calls rcu_report_dead() and rcutree_migrate_callbacks() have their headers in rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug functions. Also rcu_cpu_starting() and rcu_report_dead() have different naming conventions while they mirror each other's effects. Fix the headers and propose a naming that relates both functions and aligns with the prefix of other rcutree CPU-hotplug functions. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
c964c1f5 |
|
08-Sep-2023 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Assume rcu_report_dead() is always called locally rcu_report_dead() has to be called locally by the CPU that is going to exit the RCU state machine. Passing a cpu argument here is error-prone and leaves the possibility for a racy remote call. Use local access instead. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
2b8272ff |
|
23-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Prevent self deadlock on CPU hot-unplug

Xiongfeng reported and debugged a self deadlock of the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.

  CPU1                                                 CPU2

  T1 sets cfs_quota
     starts hrtimer cfs_bandwidth 'period_timer'
  T1 is migrated to CPU2
                                                       T1 initiates offlining of CPU1
  Hotplug operation starts
    ...
  'period_timer' expires and is re-enqueued on CPU1
    ...
  take_cpu_down()
    CPU1 shuts down and does not handle timers
    anymore. They have to be migrated in the
    post dead hotplug steps by the control task.

                                                       T1 runs the post dead offline operation
                                                       T1 is scheduled out
                                                       T1 waits for 'period_timer' to expire

T1 waits there forever if it is scheduled out before it can execute the hrtimer offline callback hrtimers_dead_cpu().

Cure this by delegating the hotplug control operation to a worker thread on an online CPU. This takes the initiating user space task, which might be affected by the bandwidth timer, completely out of the picture.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Yu Liao <liaoyu15@huawei.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
|
#
52b38b7a |
|
28-Jul-2023 |
Zhang Rui <rui.zhang@intel.com> |
cpu/SMT: Fix cpu_smt_possible() comment Commit e1572f1d08be ("cpu/SMT: create and export cpu_smt_possible()") introduces cpu_smt_possible() to represent if SMT is theoretically possible. It returns true when SMT is supported and not forcefully disabled ('nosmt=force'). But the comment of it says "Returns true if SMT is not supported of forcefully (irreversibly) disabled", which is wrong. Fix that comment accordingly. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230728155313.44170-1-rui.zhang@intel.com
|
#
7f48405c |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
cpu/SMT: Allow enabling partial SMT states via sysfs Add support to the /sys/devices/system/cpu/smt/control interface for enabling a specified number of SMT threads per core, including partial SMT states where not all threads are brought online. The current interface accepts "on" and "off", to enable either 1 or all SMT threads per core. This commit allows writing an integer, between 1 and the number of SMT threads supported by the machine. Writing 1 is a synonym for "off", 2 or more enables SMT with the specified number of threads. When reading the file, if all threads are online "on" is returned, to avoid changing behaviour for existing users. If some other number of threads is online then the integer value is returned. Architectures like x86, which only support 1 thread or all threads, should not define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architectures supporting partial SMT states, like PowerPC, should define it. [ ldufour: Slightly reword the commit's description ] [ ldufour: Remove switch() in __store_smt_control() ] [ ldufour: Fix build issue in control_show() ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-8-ldufour@linux.ibm.com
|
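The value mapping that 7f48405c describes for writes to smt/control can be sketched as a small parser. This is an illustration under assumptions, not the kernel's __store_smt_control(): the helper name `smt_control_to_threads()` is hypothetical, the input is assumed to arrive without a trailing newline, and other keywords the real interface may accept are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map a string written to smt/control to a number of threads per core.
 * Returns the thread count on success or -EINVAL for a rejected value.
 */
static int smt_control_to_threads(const char *buf, int max_threads)
{
	char *end;
	long val;

	assert(buf != NULL && max_threads >= 1);

	if (strcmp(buf, "on") == 0)
		return max_threads;	/* all threads */
	if (strcmp(buf, "off") == 0)
		return 1;		/* one thread per core */

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf || *end != '\0')
		return -EINVAL;		/* not a plain integer */

	/* Only 1..max_threads is meaningful; 1 is a synonym for "off". */
	if (val < 1 || val > max_threads)
		return -EINVAL;

	return (int)val;
}
```

On a machine with eight threads per core, writing `4` would select a partial SMT state in this model, while `9` or `0` would be rejected.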
#
38253464 |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
cpu/SMT: Create topology_smt_thread_allowed() Some architectures allow partial SMT states, i.e. when not all SMT threads are brought online. To support that, add an architecture helper which checks whether a given CPU is allowed to be brought online depending on how many SMT threads are currently enabled. Since this is only applicable to architectures supporting partial SMT, only these architectures should select the new configuration variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not supporting the partial SMT states, there is no need to define topology_smt_thread_allowed(), as the generic code assumes that either all the threads are allowed or only the primary ones. Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is enabled, to check if the particular thread should be onlined. Notably, also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow offlining some threads to move from a higher to lower number of threads online. [ ldufour: Slightly reword the commit's description ] [ ldufour: Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-7-ldufour@linux.ibm.com
|
#
91b4a7db |
|
05-Jul-2023 |
Laurent Dufour <ldufour@linux.ibm.com> |
cpu/SMT: Remove topology_smt_supported() Since the maximum number of threads is now passed to cpu_smt_set_num_threads(), checking that value is enough to know whether SMT is supported. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-6-ldufour@linux.ibm.com
|
#
447ae4ac |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
cpu/SMT: Store the current/max number of threads Some architectures allow partial SMT states at boot time, i.e. when not all SMT threads are brought online. To support that the SMT code needs to know the maximum number of SMT threads, and also the currently configured number. The architecture code knows the max number of threads, so have the architecture code pass that value to cpu_smt_set_num_threads(). Note that although topology_max_smt_threads() exists, it is not configured early enough to be used here. As some architectures, like PowerPC, allow the number of threads to be set through the kernel command line, also pass that value. [ ldufour: Slightly reword the commit message ] [ ldufour: Rename cpu_smt_check_topology and add a num_threads argument ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-5-ldufour@linux.ibm.com
|
#
c53361ce |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
cpu/SMT: Move smt/control simple exit cases earlier Move the simple exit cases, i.e. those which don't depend on the value written, earlier in the function. That makes it clearer that regardless of the input those states cannot be transitioned out of. That does have a user-visible effect, in that the error returned will now always be EPERM/ENODEV for those states, regardless of the value written. Previously writing an invalid value would return EINVAL even when in those states. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-4-ldufour@linux.ibm.com
|
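The user-visible effect noted in c53361ce, that the state decides the error before the written value is even looked at, can be sketched as follows. The enum and function names are hypothetical, and the assignment of EPERM to the force-disabled state and ENODEV to the not-supported state is an assumption of this sketch; the commit message itself only says EPERM/ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model of the SMT control states. */
enum smt_state {
	SMT_ENABLED,
	SMT_DISABLED,
	SMT_FORCE_DISABLED,
	SMT_NOT_SUPPORTED,
};

/* Returns 0 if the write would be accepted, a negative errno otherwise. */
static int smt_control_store(enum smt_state state, const char *buf)
{
	assert(buf != NULL);

	/* Exit cases that do not depend on the value written come first. */
	if (state == SMT_FORCE_DISABLED)
		return -EPERM;
	if (state == SMT_NOT_SUPPORTED)
		return -ENODEV;

	/* Only now is the input validated. */
	if (strcmp(buf, "on") != 0 && strcmp(buf, "off") != 0)
		return -EINVAL;

	return 0;
}
```

An invalid string written while SMT is force-disabled therefore yields the state error and not EINVAL, which is the behaviour change the commit calls out.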
#
3f916919 |
|
05-Jul-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
cpu/SMT: Move SMT prototypes into cpu_smt.h In order to export the cpuhp_smt_control enum as part of the interface between generic and architecture code, the architecture code needs to include asm/topology.h. But that leads to circular header dependencies. So split the enum and related declarations into a separate header. [ ldufour: Reworded the commit's description ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-3-ldufour@linux.ibm.com
|
#
7a4dcb4a |
|
05-Jul-2023 |
Laurent Dufour <ldufour@linux.ibm.com> |
cpu/hotplug: Remove dependancy against cpu_primary_thread_mask The commit 18415f33e2ac ("cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE") introduced a dependency on a global variable cpu_primary_thread_mask exported by the X86 code. This variable is only used when CONFIG_HOTPLUG_PARALLEL is set. Since cpuhp_get_primary_thread_mask() and cpuhp_smt_aware() are only used when CONFIG_HOTPLUG_PARALLEL is set, don't define them when it is not set. No functional change. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20230705145143.40545-2-ldufour@linux.ibm.com
|
#
06c6796e |
|
22-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Fix off by one in cpuhp_bringup_mask() cpuhp_bringup_mask() iterates over a cpumask and starts all present CPUs up to a caller provided upper limit. The limit variable is decremented and checked for 0 before invoking cpu_up(), which is obviously off by one and prevents the bringup of the last CPU when the limit is equal to the number of present CPUs. Move the decrement and check after the cpu_up() invocation. Fixes: 18415f33e2ac ("cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE") Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/87wn10ufj9.ffs@tglx
|
#
18415f33 |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and then waiting for it to respond before moving on to the next. Allow a platform to enable parallel setup which brings all to be onlined CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the control CPU (BP) is single-threaded the important part is the last state CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up. This allows the CPUs to run up to the first synchronization point cpuhp_ap_sync_alive() where they wait for the control CPU to release them one by one for the full onlining procedure. This parallelism depends on the CPU hotplug core sync mechanism which ensures that the parallel brought up CPUs wait for release before touching any state which would make the CPU visible to anything outside the hotplug control mechanism. To handle the SMT constraints of X86 correctly the bringup happens in two iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up the primary SMT threads of each core first, which can load the microcode without the need to rendezvous with the thread siblings. Once that's completed it brings up the secondary SMT threads. Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
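The two-iteration SMT ordering described above can be pictured with a minimal user-space sketch. Everything in it is an assumption made for illustration: `sketch_bringup_order()` is a hypothetical helper, and the `present` and `primary` arrays stand in for the present-CPU mask and the primary-thread mask. The real code walks cpumasks and drives hotplug states rather than filling an array.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Fill 'order' with the CPU numbers in the order a two-pass bringup would
 * kick them: all primary SMT threads first, then the secondary threads.
 * Returns the number of entries written.
 */
static int sketch_bringup_order(const bool *present, const bool *primary,
				int nr_cpus, int *order)
{
	int n = 0;

	assert(nr_cpus > 0);

	/* Pass 1: primary threads, which need no sibling rendezvous. */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (present[cpu] && primary[cpu])
			order[n++] = cpu;

	/* Pass 2: the remaining (secondary) threads. */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (present[cpu] && !primary[cpu])
			order[n++] = cpu;

	return n;
}
```

For two cores with two threads each, where CPUs 0 and 2 are the primary threads, the sketch yields the order 0, 2, 1, 3.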
|
#
a631be92 |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism The bring up logic of a to be onlined CPU consists of several parts, which are considered to be a single hotplug state:
1) Control CPU issues the wake-up
2) To be onlined CPU starts up, does the minimal initialization, reports to be alive and waits for release into the complete bring-up.
3) Control CPU waits for the alive report and releases the upcoming CPU for the complete bring-up.
Allow to split this into two states:
1) Control CPU issues the wake-up. After that the to be onlined CPU starts up, does the minimal initialization, reports to be alive and waits for release into the full bring-up. As this can run after the control CPU dropped the hotplug locks the code which is executed on the AP before it reports alive has to be carefully audited to not violate any of the hotplug constraints, especially not modifying any of the various cpumasks. This is really only meant to avoid waiting for the AP to react on the wake-up. Of course an architecture can move strict CPU related setup functionality, e.g. microcode loading, with care before the synchronization point to save further pointless waiting time.
2) Control CPU waits for the alive report and releases the upcoming CPU for the complete bring-up.
This allows that the two states can be split up to run all to be onlined CPUs up to state #1 on the control CPU and then at a later point run state #2. This spares some of the latencies of the full serialized per CPU bringup by avoiding the per CPU wakeup/wait serialization. The assumption is that the first AP already waits when the last AP has been woken up. This obviously depends on the hardware latencies and depending on the timings this might still not completely eliminate all wait scenarios. This split is just a preparatory step for enabling the parallel bringup later. The boot time bringup is still fully serialized.
It has a separate config switch so that architectures which want to support parallel bringup can test the split of the CPUHP_BRINGUP step separately. To enable this the architecture must support the CPU hotplug core sync mechanism and has to be audited that there are no implicit hotplug state dependencies which require a fully serialized bringup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.080801387@linutronix.de
|
#
6d712b9b |
|
12-May-2023 |
David Woodhouse <dwmw@amazon.co.uk> |
cpu/hotplug: Reset task stack state in _cpu_up() Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()") ensured that the shadow call stack and KASAN poisoning were removed from a CPU's stack each time that CPU is brought up, not just once. This is not incorrect. However, with parallel bringup the idle thread setup will happen at a different step. As a consequence the cleanup in bringup_cpu() would be too late. Move the SCS/KASAN cleanup to the generic _cpu_up() function instead, which already ensures that the new CPU's stack is available, purely to allow for early failure. This occurs when the CPU to be brought up is in the CPUHP_OFFLINE state, which should correctly do the cleanup any time the CPU has been taken down to the point where such is needed. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.027075560@linutronix.de
|
#
6f062123 |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Add CPU state tracking and synchronization The CPU state tracking and synchronization mechanism in smpboot.c is completely independent of the hotplug code and all logic around it is implemented in architecture specific code. Except for the state reporting of the AP there is absolutely nothing architecture specific and the synchronization and decision functions can be moved into the generic hotplug core code. Provide an integrated variant and add the core synchronization and decision points. This comes in two flavours: 1) DEAD state synchronization Updated by the architecture code once the AP reaches the point where it is ready to be torn down by the control CPU, e.g. by removing power or clocks or tear down via the hypervisor. The control CPU waits for this state to be reached with a timeout. If the state is reached an architecture specific cleanup function is invoked. 2) Full state synchronization This extends #1 with AP alive synchronization. This is new functionality, which allows to replace architecture specific wait mechanisms, e.g. cpumasks, completely. It also prevents that an AP which is in a limbo state can be brought up again. This can happen when an AP failed to report dead state during a previous off-line operation. The dead synchronization is what most architectures use. Only x86 makes a bringup decision based on that state at the moment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205256.476305035@linutronix.de
|
#
22b612e2 |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Rework sparse_irq locking in bringup_cpu() There is no harm to hold sparse_irq lock until the upcoming CPU completes in cpuhp_online_idle(). This allows to remove cpu_online() synchronization from architecture code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205256.263722880@linutronix.de
|
#
ba831b7b |
|
12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init No point in keeping them around. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de
|
#
aa464ba9 |
|
03-Feb-2023 |
Nicholas Piggin <npiggin@gmail.com> |
lazy tlb: introduce lazy tlb mm refcount helper functions Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting. This makes the lazy tlb mm references more obvious, and allows the refcounting scheme to be modified in later changes. There is no functional change with this patch. Link: https://lkml.kernel.org/r/20230203071837.1136453-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
db281d59 |
|
13-Mar-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
cpu/hotplug: move to use bus_get_dev_root() Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Phil Auld <pauld@redhat.com> Cc: Steven Price <steven.price@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230313182918.1312597-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
6f855b39 |
|
27-Sep-2022 |
Vincent Donnefort <vdonnefort@google.com> |
cpu/hotplug: Do not bail-out in DYING/STARTING sections The DYING/STARTING callbacks are not expected to fail. However, as reported by Derek, buggy drivers such as tboot are still free to return errors within those sections, which halts the hot(un)plug and leaves the CPU in an unrecoverable state. As there is no rollback possible, only log the failures and proceed with the following steps. This restores the hotplug behaviour prior to commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()") Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()") Reported-by: Derek Dolney <z23@posteo.net> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Derek Dolney <z23@posteo.net> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867 Link: https://lore.kernel.org/r/20220927101259.1149636-1-vdonnefort@google.com
|
#
d385febc |
|
17-Nov-2022 |
Phil Auld <pauld@redhat.com> |
cpu/hotplug: Set cpuhp target for boot cpu Since the boot cpu does not go through the hotplug process it ends up with state == CPUHP_ONLINE but target == CPUHP_OFFLINE. So set the target to match in boot_cpu_hotplug_init(). Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20221117162329.3164999-3-pauld@redhat.com
|
#
64ea6e44 |
|
17-Nov-2022 |
Phil Auld <pauld@redhat.com> |
cpu/hotplug: Make target_store() a nop when target == state Writing the current state back in hotplug/target calls cpu_down() which will set cpu dying even when it isn't and then nothing will ever clear it. A stress test that reads values and writes them back for all cpu device files in sysfs will trigger the BUG() in select_fallback_rq once all cpus are marked as dying.
kernel/cpu.c::target_store()
    ...
    if (st->state < target)
        ret = cpu_up(dev->id, target);
    else
        ret = cpu_down(dev->id, target);
cpu_down() -> cpu_set_state()
    bool bringup = st->state < target;
    ...
    if (cpu_dying(cpu) != !bringup)
        set_cpu_dying(cpu, !bringup);
Fix this by letting state==target fall through in the target_store() conditional. Also make sure st->target == target in that case. Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable") Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20221117162329.3164999-2-pauld@redhat.com
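The effect of the fix can be modelled in user space. This is a rough sketch, not the kernel code: `struct sketch_cpu`, `sketch_set_state()`, `target_store_old()` and `target_store_new()` are hypothetical names, and the model only tracks the dying flag and the target rather than running any hotplug callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the per-CPU hotplug state. */
struct sketch_cpu {
	int state;
	int target;
	bool dying;
};

/* Models the quoted logic: the direction of travel decides the dying flag. */
static void sketch_set_state(struct sketch_cpu *c, int target)
{
	bool bringup = c->state < target;

	c->target = target;
	c->dying = !bringup;
}

/* Old behaviour: state == target takes the "down" branch and marks dying. */
static void target_store_old(struct sketch_cpu *c, int target)
{
	assert(c);
	/* cpu_up() if state < target, otherwise cpu_down(): both end up here. */
	sketch_set_state(c, target);
}

/* New behaviour: state == target is a nop apart from syncing the target. */
static void target_store_new(struct sketch_cpu *c, int target)
{
	assert(c);
	if (c->state != target)
		sketch_set_state(c, target);
	else
		c->target = target;
}
```

Writing the current state back with the old variant leaves the model CPU marked as dying, while the new variant leaves the flag untouched.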
|
#
d308077e |
|
11-Apr-2022 |
Steven Price <steven.price@arm.com> |
cpu/hotplug: Initialise all cpuhp_cpu_state structs earlier Rather than waiting until a CPU is first brought online, do the initialisation of the cpuhp_cpu_state structure for each CPU during the __init phase. This saves a (small) amount of non-__init memory and avoids potential confusion about when the cpuhp_cpu_state struct is valid. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220411152233.474129-3-steven.price@arm.com
|
#
bae1a962 |
|
05-Apr-2022 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
x86/topology: Disable CPU online/offline control for TDX guests Unlike regular VMs, TDX guests use the firmware hand-off wakeup method to wake up the APs during the boot process. This wakeup model uses a mailbox to communicate with firmware to bring up the APs. As per the design, this mailbox can only be used once for the given AP, which means after the APs are booted, the same mailbox cannot be used to offline/online the given AP. More details about this requirement can be found in Intel TDX Virtual Firmware Design Guide, sec titled "AP initialization in OS" and in sec titled "Hotplug Device". Since the architecture does not support any method of offlining the CPUs, disable CPU hotplug support in the kernel. Since this hotplug disable feature can be re-used by other VM guests, add a new CC attribute CC_ATTR_HOTPLUG_DISABLED and use it to disable the hotplug support. Attempt to offline CPU will fail with -EOPNOTSUPP. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-25-kirill.shutemov@linux.intel.com
|
#
b7ba6d8d |
|
11-Apr-2022 |
Steven Price <steven.price@arm.com> |
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in cpuhp_create() is too late as it is used earlier in _cpu_up(). If kzalloc_node() in __smpboot_create_thread() fails then the rollback will be done with st->cpu==0 causing CPU0 to be erroneously set to be dying, causing the scheduler to get mightily confused and throw its toys out of the pram. However the cpu number is actually available directly, so simply remove the 'cpu' member and avoid the problem in the first place. Fixes: 2ea46c6fc945 ("cpumask/hotplug: Fix cpu_dying() state tracking") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220411152233.474129-2-steven.price@arm.com
|
#
04d4e665 |
|
07-Feb-2022 |
Frederic Weisbecker <frederic@kernel.org> |
sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation feature will have its own cpumask. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
|
#
3191dd5a |
|
13-Feb-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random: clear fast pool, crng, and batches in cpuhp bring up For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire irq handler, simply take care of the extreme edge case of resetting count to zero in the cpuhp online handler, just after workqueues have been reenabled. This simplifies the code a bit and lets us use vanilla variables rather than atomics, and performance should be improved. As well, very early on when the CPU comes up, while interrupts are still disabled, we clear out the per-cpu crng and its batches, so that it always starts with fresh randomness. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
dce1ca05 |
|
23-Nov-2021 |
Mark Rutland <mark.rutland@arm.com> |
sched/scs: Reset task stack state in bringup_cpu() To hot unplug a CPU, the idle task on that CPU calls a few layers of C code before finally leaving the kernel. When KASAN is in use, poisoned shadow is left around for each of the active stack frames. When shadow call stacks (SCS) are in use, the task's saved SCS SP is left pointing at an arbitrary point within the task's shadow call stack. When a CPU is offlined then onlined back into the kernel, this stale state can adversely affect execution. Stale KASAN shadow can alias new stackframes and result in bogus KASAN warnings. A stale SCS SP is effectively a memory leak, and prevents a portion of the shadow call stack being used. Across a number of hotplug cycles the idle task's entire shadow call stack can become unusable. We previously fixed the KASAN issue in commit: e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug") ... by removing any stale KASAN stack poison immediately prior to onlining a CPU. Subsequently in commit: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled") ... the refactoring left the KASAN and SCS cleanup in one-time idle thread initialization code rather than something invoked prior to each CPU being onlined, breaking both as above. We fixed SCS (but not KASAN) in commit: 63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit") ... but as this runs in the context of the idle task being offlined it's potentially fragile. To fix these consistently and more robustly, reset the SCS SP and KASAN shadow of a CPU's idle task immediately before we online that CPU in bringup_cpu(). This ensures the idle task always has a consistent state when it is running, and removes the need to do so when exiting an idle task.
Whenever any thread is created, dup_task_struct() will give the task a stack which is free of KASAN shadow, and initialize the task's SCS SP, so there's no need to specially initialize either for idle thread within init_idle(), as this was only necessary to handle hotplug cycles. I've tested this on arm64 with: * gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK * clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK ... offlining and onlining CPUs with:
| while true; do
|   for C in /sys/devices/system/cpu/cpu*/online; do
|     echo 0 > $C;
|     echo 1 > $C;
|   done
| done
Fixes: f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Qian Cai <quic_qiancai@quicinc.com> Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
|
#
ebca71a8 |
|
08-Apr-2021 |
Dongli Zhang <dongli.zhang@oracle.com> |
cpu/hotplug: Add debug printks for hotplug callback failures CPU hotplug callbacks can fail and cause a rollback to the previous state. These failures are silent and therefore hard to debug. Add pr_debug() to the up and down paths which provide information about the error code, the CPU and the failed state. The debug printks can be enabled via kernel command line or sysfs. [ tglx: Adapt to current mainline, massage printk and changelog ] Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Qais Yousef <qais.yousef@arm.com> Link: https://lore.kernel.org/r/20210409055316.1709-1-dongli.zhang@oracle.com
|
#
1782dc87 |
|
27-May-2021 |
YueHaibing <yuehaibing@huawei.com> |
cpu/hotplug: Use DEVICE_ATTR_*() macro Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR, which makes the code a bit shorter and easier to read. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210527141105.2312-1-yuehaibing@huawei.com
|
#
11bc021d |
|
09-Aug-2021 |
Randy Dunlap <rdunlap@infradead.org> |
cpu/hotplug: Eliminate all kernel-doc warnings kernel/cpu.c:57: warning: cannot understand function prototype: 'struct cpuhp_cpu_state ' kernel/cpu.c:115: warning: cannot understand function prototype: 'struct cpuhp_step ' kernel/cpu.c:146: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * cpuhp_invoke_callback _ Invoke the callbacks for a given state kernel/cpu.c:75: warning: Function parameter or member 'fail' not described in 'cpuhp_cpu_state' kernel/cpu.c:75: warning: Function parameter or member 'cpu' not described in 'cpuhp_cpu_state' kernel/cpu.c:75: warning: Function parameter or member 'node' not described in 'cpuhp_cpu_state' kernel/cpu.c:75: warning: Function parameter or member 'last' not described in 'cpuhp_cpu_state' kernel/cpu.c:130: warning: Function parameter or member 'list' not described in 'cpuhp_step' kernel/cpu.c:130: warning: Function parameter or member 'multi_instance' not described in 'cpuhp_step' kernel/cpu.c:158: warning: No description found for return value of 'cpuhp_invoke_callback' kernel/cpu.c:1188: warning: No description found for return value of 'cpu_device_down' kernel/cpu.c:1400: warning: No description found for return value of 'cpu_device_up' kernel/cpu.c:1425: warning: No description found for return value of 'bringup_hibernate_cpu' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210809223825.24512-1-rdunlap@infradead.org
|
#
ed3cd1da |
|
05-Jun-2021 |
Baokun Li <libaokun1@huawei.com> |
cpu/hotplug: Fix kernel doc warnings for __cpuhp_setup_state_cpuslocked() Fixes the following W=1 kernel build warning(s): kernel/cpu.c:1949: warning: Function parameter or member 'name' not described in '__cpuhp_setup_state_cpuslocked' Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210605063003.681049-1-libaokun1@huawei.com
|
#
b22afcdf |
|
27-Mar-2021 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Cure the cpusets trainwreck Alexey and Joshua tried to solve a cpusets related hotplug problem which is user space visible and results in unexpected behaviour for some time after a CPU has been plugged in and the corresponding uevent was delivered. cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a workqueue. This is done because the cpusets code has already a lock nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or waiting for the work to finish with cpu_hotplug_lock held can and will deadlock because that results in the reverse lock order. As a consequence the uevent can be delivered before cpusets have consistent state which means that a user space invocation of sched_setaffinity() to move a task to the plugged CPU fails up to the point where the scheduled work has been processed. The same is true for CPU unplug, but that does not create user observable failure (yet). It's still inconsistent to claim that an operation is finished before it actually is and that's the real issue at hand. uevents just make it reliably observable. Obviously the problem should be fixed in cpusets/cgroups, but untangling that is pretty much impossible because according to the changelog of the commit which introduced this 8 years ago: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()") the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and the whole code is built around that. So bite the bullet and invoke the relevant cpuset function, which waits for the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and only when tasks are not frozen by suspend/hibernate because that would obviously wait forever. Waiting there with cpu_add_remove_lock, which is protecting the present and possible CPU maps, held is not a problem at all because neither work queues nor cpusets/cgroups have any lockchains related to that lock. 
Waiting in the hotplug machinery is not problematic either because there are already state callbacks which wait for hardware queues to drain. It makes the operations slightly slower, but hotplug is slow anyway. This ensures that state is consistent before returning from a hotplug up/down operation. It's still inconsistent during the operation, but that's a different story. Add a large comment which explains why this is done and why this is not a dump ground for the hack of the day to work around half thought out locking schemes. Document also the implications vs. hotplug operations and serialization or the lack of it. Thanks to Alexey and Joshua for analyzing why this temporary sched_setaffinity() failure happened. Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()") Reported-by: Alexey Klimov <aklimov@redhat.com> Reported-by: Joshua Baker <jobaker@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexey Klimov <aklimov@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
|
#
13070833 |
|
23-May-2021 |
Yuan ZhaoXiong <yuanzhaoxiong@baidu.com> |
cpu/hotplug: Simplify access to percpu cpuhp_state It is unnecessary to invoke per_cpu_ptr() every time to access cpuhp_state. Use the available pointer instead. Signed-off-by: Yuan ZhaoXiong <yuanzhaoxiong@baidu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/1621776690-13264-1-git-send-email-yuanzhaoxiong@baidu.com
|
#
2ea46c6f |
|
20-Apr-2021 |
Peter Zijlstra <peterz@infradead.org> |
cpumask/hotplug: Fix cpu_dying() state tracking Vincent reported that for states with a NULL startup/teardown function we do not call cpuhp_invoke_callback() (because there is none) and as such we'll not update the cpu_dying() state. The stale cpu_dying() can eventually lead to triggering BUG(). Rectify this by updating cpu_dying() in the exact same places the hotplug machinery tracks its directional state, namely cpuhp_set_state() and cpuhp_reset_state(). Reported-by: Vincent Donnefort <vincent.donnefort@arm.com> Suggested-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
|
#
e40f74c5 |
|
19-Jan-2021 |
Peter Zijlstra <peterz@infradead.org> |
cpumask: Introduce DYING mask Introduce a cpumask that indicates (for each CPU) what direction the CPU hotplug is currently going. Notably, it tracks rollbacks. Eg. when an up fails and we do a roll-back down, it will accurately reflect the direction. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
|
#
453e4108 |
|
16-Feb-2021 |
Vincent Donnefort <vincent.donnefort@arm.com> |
cpu/hotplug: Add cpuhp_invoke_callback_range() Factorize and unify cpuhp callback range invocations, especially for the hotunplug path, where two different ways of decrementing were used. The first one decrements before the callback is called:
cpuhp_thread_fun()
    state = st->state;
    st->state--;
    cpuhp_invoke_callback(state);
The second one decrements after:
take_down_cpu()|cpuhp_down_callbacks()
    cpuhp_invoke_callback(st->state);
    st->state--;
This is problematic for rolling back the steps in case of error, as depending on the decrement, the rollback will start from N or N-1. It also makes tracing inconsistent, between steps run in the cpuhp thread and the others. Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty steps. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210216103506.416286-4-vincent.donnefort@arm.com
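The N versus N-1 rollback ambiguity can be shown with a small user-space model of the two decrement styles. This is a sketch under stated assumptions: `struct sketch_hp`, `sketch_callback()`, `down_dec_before()` and `down_dec_after()` are hypothetical names, and the callback is reduced to "fail at one chosen state".

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the per-CPU hotplug state. */
struct sketch_hp {
	int state;
};

/* Stand-in for a teardown callback that fails at exactly one state. */
static int sketch_callback(int state, int fail_at)
{
	return state == fail_at ? -1 : 0;
}

/* Style 1: decrement first, then invoke the callback for the old state. */
static int down_dec_before(struct sketch_hp *st, int target, int fail_at)
{
	assert(st);
	while (st->state > target) {
		int state = st->state;

		st->state--;
		if (sketch_callback(state, fail_at))
			return -1;	/* st->state is already N-1 */
	}
	return 0;
}

/* Style 2: invoke the callback for the current state, then decrement. */
static int down_dec_after(struct sketch_hp *st, int target, int fail_at)
{
	assert(st);
	while (st->state > target) {
		if (sketch_callback(st->state, fail_at))
			return -1;	/* st->state is still N */
		st->state--;
	}
	return 0;
}
```

Starting from state 5 with a failure injected at state 3, the first style stops with the recorded state at 2 and the second with it at 3, so a rollback that starts from the recorded state begins at a different step depending on which path failed.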
|
#
62f25069 |
|
16-Feb-2021 |
Vincent Donnefort <vincent.donnefort@arm.com> |
cpu/hotplug: CPUHP_BRINGUP_CPU failure exception The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are triggered by the CPUHP_BRINGUP_CPU step. If the latter fails, no atomic state can be rolled back. DEAD callbacks too can't fail and disallow recovery. As a consequence, during hotunplug, the fail injection interface should prohibit all states from CPUHP_BRINGUP_CPU to CPUHP_ONLINE. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210216103506.416286-3-vincent.donnefort@arm.com
|
#
3ae70c25 |
|
16-Feb-2021 |
Vincent Donnefort <vincent.donnefort@arm.com> |
cpu/hotplug: Allowing to reset fail injection Currently, the only way of resetting the fail injection is to trigger a hotplug, hotunplug or both. This is rather annoying for testing and, as the default value for this file is -1, it seems pretty natural to let a user write it. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210216103506.416286-2-vincent.donnefort@arm.com
|
#
43759fe5 |
|
11-Nov-2020 |
Frederic Weisbecker <frederic@kernel.org> |
cpu/hotplug: Add lockdep_is_cpus_held() This commit adds a lockdep_is_cpus_held() function to verify that the proper locks are held and that various operations are running in the correct context. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8ff00399 |
|
26-Nov-2020 |
Nicholas Piggin <npiggin@gmail.com> |
kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling powerpc/64s keeps a counter in the mm which counts bits set in mm_cpumask as well as other things. This means it can't use generic code to clear bits out of the mask and doesn't adjust the arch specific counter. Add an arch override that allows powerpc/64s to use clear_tasks_mm_cpumask(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-4-npiggin@gmail.com
|
#
1cf12e08 |
|
16-Sep-2020 |
Thomas Gleixner <tglx@linutronix.de> |
sched/hotplug: Consolidate task migration on CPU unplug With the new mechanism which kicks tasks off the outgoing CPU at the end of schedule() the situation on an outgoing CPU right before the stopper thread brings it down completely is: - All user tasks and all unbound kernel threads have either been migrated away or are not running and the next wakeup will move them to an online CPU. - All per CPU kernel threads, except cpu hotplug thread and the stopper thread have either been unbound or parked by the responsible CPU hotplug callback. That means that at the last step before the stopper thread is invoked the cpu hotplug thread is the last legitimate running task on the outgoing CPU. Add a final wait step right before the stopper thread is kicked which ensures that any still running tasks on the way to park or on the way to kick themselves off the CPU are either sleeping or gone. This allows removing the migrate_tasks() crutch in sched_cpu_dying(). If sched_cpu_dying() detects that there is still another running task aside from the stopper thread then it will explode with the appropriate fireworks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
|
#
fb7fb84a |
|
29-Apr-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Remove __freeze_secondary_cpus() The refactored function is no longer required as the codepaths that call freeze_secondary_cpus() are all suspend/resume related now. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lkml.kernel.org/r/20200430114004.17477-2-qais.yousef@arm.com
|
#
56555855 |
|
29-Apr-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Remove disable_nonboot_cpus() The single user could have called freeze_secondary_cpus() directly. Since this function was a source of confusion, remove it as it's just a pointless wrapper. While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to preserve the naming symmetry. Done automatically via: git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g' Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
|
#
bf2c59fc |
|
01-Apr-2020 |
Peter Zijlstra <peterz@infradead.org> |
sched/core: Fix illegal RCU from offline CPUs In the CPU-offline process, it calls mmdrop() after idle entry and the subsequent call to cpuhp_report_idle_dead(). Once execution passes the call to rcu_report_dead(), RCU is ignoring the CPU, which results in lockdep complaining when mmdrop() uses RCU from either memcg or debugobjects below. Fix it by cleaning up the active_mm state from BP instead. Every arch which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit() from AP. The only exception is parisc because it switches them to &init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()), but the patch will still work there because it calls mmgrab(&init_mm) in smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu(). WARNING: suspicious RCU usage ----------------------------- kernel/workqueue.c:710 RCU or wq_pool_mutex should be held! other info that might help us debug this: RCU used illegally from offline CPU! Call Trace: dump_stack+0xf4/0x164 (unreliable) lockdep_rcu_suspicious+0x140/0x164 get_work_pool+0x110/0x150 __queue_work+0x1bc/0xca0 queue_work_on+0x114/0x120 css_release+0x9c/0xc0 percpu_ref_put_many+0x204/0x230 free_pcp_prepare+0x264/0x570 free_unref_page+0x38/0xf0 __mmdrop+0x21c/0x2c0 idle_task_exit+0x170/0x1b0 pnv_smp_cpu_kill_self+0x38/0x2e0 cpu_die+0x48/0x64 arch_cpu_idle_dead+0x30/0x50 do_idle+0x2f4/0x470 cpu_startup_entry+0x38/0x40 start_secondary+0x7a8/0xa80 start_secondary_resume+0x10/0x14 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
|
#
182e073f |
|
17-Apr-2020 |
Ethon Paul <ethp@qq.com> |
cpu/hotplug: Fix a typo in comment "broadacasted"->"broadcasted" Signed-off-by: Ethon Paul <ethp@qq.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200417164008.6541-1-ethp@qq.com
|
#
e98eac6f |
|
26-Mar-2020 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus() A recent change to freeze_secondary_cpus() which added an early abort if a wakeup is pending missed the fact that the function is also invoked for shutdown, reboot and kexec via disable_nonboot_cpus(). In case of disable_nonboot_cpus() the wakeup event needs to be ignored as the purpose is to terminate the currently running kernel. Add a 'suspend' argument which is only set when the freeze is in context of a suspend operation. If not set then an eventually pending wakeup event is ignored. Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending") Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavankumar Kondeti <pkondeti@codeaurora.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
|
#
33c3736e |
|
23-Mar-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Hide cpu_up/down() Use separate functions for the device core to bring a CPU up and down. Users outside the device core must use add/remove_cpu() which will take care of extra housekeeping work like keeping sysfs in sync. Make cpu_up/down() static and replace the extra layer of indirection. [ tglx: Removed the extra wrapper functions and adjusted function names ] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
|
#
b99a2659 |
|
23-Mar-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Move bringup of secondary CPUs out of smp_init() This is the last direct user of cpu_up() before it can become an internal implementation detail of the cpu subsystem. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323135110.30522-17-qais.yousef@arm.com
|
#
d720f986 |
|
23-Mar-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Provide bringup_hibernate_cpu() arm64 uses cpu_up() in the resume from hibernation code to ensure that the CPU on which the system hibernated is online. Provide a core function for this. [ tglx: Split out from the combo arm64 patch ] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20200323135110.30522-9-qais.yousef@arm.com
|
#
0441a559 |
|
23-Mar-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Create a new function to shutdown nonboot cpus This function will be used later in machine_shutdown() for some architectures. disable_nonboot_cpus() is not safe to use when doing machine_down(), because it relies on freeze_secondary_cpus() which in turn is a suspend/resume related freeze and could abort if the logic detects any pending activities that can prevent finishing the offlining process. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323135110.30522-3-qais.yousef@arm.com
|
#
93ef1429 |
|
23-Mar-2020 |
Qais Yousef <qais.yousef@arm.com> |
cpu/hotplug: Add new {add,remove}_cpu() functions The new functions use device_{online,offline}() which are userspace safe. This is in preparation to move cpu_{up, down} kernel users to use a safer interface that is not racy with userspace. Suggested-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20200323135110.30522-2-qais.yousef@arm.com
|
#
1751060e |
|
30-Oct-2019 |
Peter Zijlstra <peterz@infradead.org> |
locking/percpu-rwsem, lockdep: Make percpu-rwsem use its own lockdep_map As preparation for replacing the embedded rwsem, give percpu-rwsem its own lockdep_map. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200131151539.927625541@infradead.org
|
#
dc8d37ed |
|
10-Dec-2019 |
Arnd Bergmann <arnd@arndb.de> |
cpu/SMT: Fix x86 link error without CONFIG_SYSFS When CONFIG_SYSFS is disabled, but CONFIG_HOTPLUG_SMT is enabled, the kernel fails to link: arch/x86/power/cpu.o: In function `hibernate_resume_nonboot_cpu_disable': (.text+0x38d): undefined reference to `cpuhp_smt_enable' arch/x86/power/hibernate.o: In function `arch_resume_nosmt': hibernate.c:(.text+0x291): undefined reference to `cpuhp_smt_enable' hibernate.c:(.text+0x29c): undefined reference to `cpuhp_smt_disable' Move the exported functions out of the #ifdef section into its own with the correct conditions. The patch that caused this is marked for stable backports, so this one may need to be backported as well. Fixes: ec527c318036 ("x86/power: Fix 'nosmt' vs hibernation triple fault during resume") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191210195614.786555-1-arnd@arndb.de
|
#
45178ac0 |
|
10-Dec-2019 |
Peter Zijlstra <peterz@infradead.org> |
cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order Paul reported a very sporadic, rcutorture induced, workqueue failure. When the planets align, the workqueue rescuer's self-migrate fails and then triggers a WARN for running a work on the wrong CPU. Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call could be ignored! When stopper->enabled is false, stop_machine will insta complete the work, without actually doing the work. Worse, it will not WARN about this (we really should fix this). It turns out there is a small window where a freshly online'ed CPU is marked 'online' but doesn't yet have the stopper task running: BP AP bringup_cpu() __cpu_up(cpu, idle) --> start_secondary() ... cpu_startup_entry() bringup_wait_for_ap() wait_for_ap_thread() <-- cpuhp_online_idle() while (1) do_idle() ... available to run kthreads ... stop_machine_unpark() stopper->enable = true; Close this by moving the stop_machine_unpark() into cpuhp_online_idle(), such that the stopper thread is ready before we start the idle loop and schedule. Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Debugged-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
|
#
731dc9df |
|
03-Nov-2019 |
Tyler Hicks <tyhicks@canonical.com> |
cpu/speculation: Uninline and export CPU mitigations helpers A kernel module may need to check the value of the "mitigations=" kernel command line parameter as part of its setup when the module needs to perform software mitigations for a CPU flaw. Uninline and export the helper functions surrounding the cpu_mitigations enum to allow for their usage from a module. Lastly, privatize the enum and cpu_mitigations variable since the value of cpu_mitigations can be checked with the exported helper functions. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5facae4f |
|
18-Sep-2019 |
Qian Cai <cai@lca.pw> |
locking/lockdep: Remove unused @nested argument from lock_release() Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e1572f1d |
|
16-Sep-2019 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
cpu/SMT: create and export cpu_smt_possible() KVM needs to know if SMT is theoretically possible, this means it is supported and not forcefully disabled ('nosmt=force'). Create and export cpu_smt_possible() answering this question. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0c09ab96 |
|
09-Jul-2019 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Cache number of online CPUs Re-evaluating the bitmap weight of the online cpus bitmap in every invocation of num_online_cpus() over and over is a pretty useless exercise. Especially when num_online_cpus() is used in code paths like the IPI delivery of x86 or the membarrier code. Cache the number of online CPUs in the core and just return the cached variable. The accessor function provides only a snapshot when used without protection against concurrent CPU hotplug. The storage needs to use an atomic_t because the kexec and reboot code (ab)use set_cpu_online() in their 'shutdown' handlers without any form of serialization as pointed out by Mathieu. Regular CPU hotplug usage is properly serialized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de
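A rough user-space sketch of the caching idea follows, using C11 atomics in place of the kernel's atomic_t. The names `online_mask`, `nr_online`, `set_cpu_online_sketch` and `num_online_sketch` are hypothetical, and the mask is limited to 64 CPUs for brevity. The counter changes only when a bit actually flips, so repeated calls for the same CPU do not skew the cached value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 64

/* Hypothetical stand-ins for the online mask and the cached count. */
static _Atomic unsigned long long online_mask;
static atomic_int nr_online;

/*
 * Set or clear a CPU in the mask and adjust the cached count only when
 * the bit really changed. Atomic read-modify-write operations keep the
 * count consistent with the mask even for unserialized callers.
 */
static void set_cpu_online_sketch(unsigned int cpu, bool online)
{
    assert(cpu < MAX_CPUS);

    unsigned long long bit = 1ULL << cpu;

    if (online) {
        unsigned long long old = atomic_fetch_or(&online_mask, bit);
        if (!(old & bit))
            atomic_fetch_add(&nr_online, 1);
    } else {
        unsigned long long old = atomic_fetch_and(&online_mask, ~bit);
        if (old & bit)
            atomic_fetch_sub(&nr_online, 1);
    }
}

/* Returns a snapshot; it may be stale unless hotplug is excluded. */
static int num_online_sketch(void)
{
    return atomic_load(&nr_online);
}
```

Reading the count is then a single atomic load instead of a population count over the whole mask.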
|
#
e797bda3 |
|
22-Jul-2019 |
Thomas Gleixner <tglx@linutronix.de> |
smp/hotplug: Track booted once CPUs in a cpumask The booted once information which is required to deal with the MCE broadcast issue on X86 correctly is stored in the per cpu hotplug state, which is perfectly fine for the intended purpose. X86 needs that information for supporting NMI broadcasting via shortcuts, but retrieving it from per cpu data is cumbersome. Move it to a cpumask so the information can be checked against the cpu_present_mask quickly. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.818822855@linutronix.de
|
#
33d4a5a7 |
|
26-Jun-2019 |
Eiichi Tsukata <devel@etsukata.com> |
cpu/hotplug: Fix out-of-bounds read when setting fail state Setting invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail can control `struct cpuhp_step *sp` address, results in the following global-out-of-bounds read. Reproducer: # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail KASAN report: BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0 Read of size 8 at addr ffffffff89734438 by task bash/1941 CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31 Call Trace: write_cpuhp_fail+0x2cd/0x2e0 dev_attr_store+0x58/0x80 sysfs_kf_write+0x13d/0x1a0 kernfs_fop_write+0x2bc/0x460 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f05e4f4c970 The buggy address belongs to the variable: cpu_hotplug_lock+0x98/0xa0 Memory state around the buggy address: ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa ^ ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Add a sanity check for the value written from user space. Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com
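The kind of sanity check described here can be sketched in user-space C. This is only an illustration under assumptions: `parse_fail_state` and `NR_STATES_SKETCH` are hypothetical names, and the bound of 200 is a placeholder rather than the kernel's real state count. The point is that the value is range-checked before it is ever used as an index into a step table.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical number of hotplug states; not the kernel's real value. */
#define NR_STATES_SKETCH 200

/*
 * Parse a decimal state written by user space and reject anything that
 * could not index a table of NR_STATES_SKETCH entries.
 * Returns 0 and stores the value in *out, or -EINVAL on bad input.
 */
static int parse_fail_state(const char *buf, int *out)
{
    char *end;
    long v;

    assert(buf != NULL && out != NULL);

    errno = 0;
    v = strtol(buf, &end, 10);
    if (errno || end == buf)
        return -EINVAL;

    if (*end == '\n')           /* tolerate the newline added by echo */
        end++;
    if (*end != '\0')
        return -EINVAL;

    if (v < 0 || v >= NR_STATES_SKETCH)
        return -EINVAL;

    *out = (int)v;
    return 0;
}
```

With such a check, the reproducer input "-2" is refused instead of being turned into an address.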
|
#
1bf72720 |
|
16-May-2019 |
Geert Uytterhoeven <geert@linux-m68k.org> |
cpu/speculation: Warn on unsupported mitigations= parameter Currently, if the user specifies an unsupported mitigation strategy on the kernel command line, it will be ignored silently. The code will fall back to the default strategy, possibly leaving the system more vulnerable than expected. This may happen due to e.g. a simple typo, or, for a stable kernel release, because not all mitigation strategies have been backported. Inform the user by printing a message. Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.org
|
#
a66d955e |
|
02-Jun-2019 |
Pavankumar Kondeti <pkondeti@codeaurora.org> |
cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending When "deep" suspend is enabled, all CPUs except the primary CPU are frozen via CPU hotplug one by one. After all secondary CPUs are unplugged the wakeup pending condition is evaluated and if pending the suspend operation is aborted and the secondary CPUs are brought up again. CPU hotplug is a slow operation, so it makes sense to check for wakeup pending in the freezer loop before bringing down the next CPU. This improves the system suspend abort latency significantly. [ tglx: Massaged changelog and improved printk message ] Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: iri Kosina <jkosina@suse.cz> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: linux-pm@vger.kernel.org Link: https://lkml.kernel.org/r/1559536263-16472-1-git-send-email-pkondeti@codeaurora.org
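The early-abort loop can be pictured with the following simplified sketch. It is not the kernel code: `cpu_is_online`, `offlined`, `wakeup_after`, `wakeup_pending_sketch` and `freeze_secondary_sketch` are hypothetical, the wakeup condition is simulated with a counter, and the -EBUSY return value is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NCPUS 8

static bool cpu_is_online[NCPUS];
static int offlined;            /* CPUs taken down so far */
static int wakeup_after = -1;   /* simulate a wakeup after N offlines, -1 = never */

/* Hypothetical stand-in for the wakeup-pending check. */
static bool wakeup_pending_sketch(void)
{
    return wakeup_after >= 0 && offlined >= wakeup_after;
}

/*
 * Take down every online CPU except the primary one, but check for a
 * pending wakeup before each (slow) offline operation instead of only
 * once after all CPUs are down.
 */
static int freeze_secondary_sketch(int primary)
{
    assert(primary >= 0 && primary < NCPUS);

    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == primary || !cpu_is_online[cpu])
            continue;

        if (wakeup_pending_sketch())
            return -EBUSY;      /* abort early; remaining CPUs stay online */

        cpu_is_online[cpu] = false;
        offlined++;
    }
    return 0;
}
```

Checking inside the loop bounds the abort latency by one offline operation rather than by the time needed to unplug all secondary CPUs.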
|
#
ec527c31 |
|
29-May-2019 |
Jiri Kosina <jkosina@suse.cz> |
x86/power: Fix 'nosmt' vs hibernation triple fault during resume As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpuidle after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid. That results in triple fault shortly after cr3 is switched, and machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetrically, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: 4.19+ <stable@vger.kernel.org> # v4.19+ Debugged-by: Thomas Gleixner <tglx@linutronix.de> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
f5602011 |
|
28-May-2019 |
Jiri Kosina <jkosina@suse.cz> |
cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap() bringup_wait_for_ap() comment references cpu_notify_starting(), but the function is actually called notify_cpu_starting(). Fix that. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1905282128100.1962@cbobk.fhfr.pm
|
#
9ca12ac0 |
|
10-Apr-2019 |
Nicholas Piggin <npiggin@gmail.com> |
kernel/cpu: Allow non-zero CPU to be primary for suspend / kexec freeze This patch provides an arch option, ARCH_SUSPEND_NONZERO_CPU, to opt-in to allowing suspend to occur on one of the housekeeping CPUs rather than hardcoded CPU0. This will allow CPU0 to be a nohz_full CPU with a later change. It may be possible for platforms with hardware/firmware restrictions on suspend/wake effectively support this by handing off the final stage to CPU0 when kernel housekeeping is no longer required. Another option is to make housekeeping / nohz_full mask dynamic at runtime, but the complexity could not be justified at this time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lkml.kernel.org/r/20190411033448.20842-4-npiggin@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
98af8452 |
|
12-Apr-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
cpu/speculation: Add 'mitigations=' cmdline option Keeping track of the number of mitigations for all the CPU speculation bugs has become overwhelming for many users. It's getting more and more complicated to decide which mitigations are needed for a given architecture. Complicating matters is the fact that each arch tends to have its own custom way to mitigate the same vulnerability. Most users fall into a few basic categories: a) they want all mitigations off; b) they want all reasonable mitigations on, with SMT enabled even if it's vulnerable; or c) they want all reasonable mitigations on, with SMT disabled if vulnerable. Define a set of curated, arch-independent options, each of which is an aggregation of existing options: - mitigations=off: Disable all mitigations. - mitigations=auto: [default] Enable all the default mitigations, but leave SMT enabled, even if it's vulnerable. - mitigations=auto,nosmt: Enable all the default mitigations, disabling SMT if needed by a mitigation. Currently, these options are placeholders which don't actually do anything. They will be fleshed out in upcoming patches. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> (on x86) Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Jon Masters <jcm@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Price <steven.price@arm.com> Cc: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoimboe@redhat.com
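As a minimal illustration of the three curated values listed above, the option string could be mapped to an enum as sketched below. The enum and the function `parse_mitigations_sketch` are hypothetical and written for user space; the `recognized` flag is an addition for the example so that a caller could report an unsupported value, and falling back to the auto default for unknown input is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum mirroring the three documented option values. */
enum mitigations_sketch {
    MITIGATIONS_OFF,
    MITIGATIONS_AUTO,
    MITIGATIONS_AUTO_NOSMT,
};

/*
 * Map the text after "mitigations=" to a mode. Unknown strings fall
 * back to the default (auto) and clear *recognized so the caller can
 * print a message instead of ignoring the input silently.
 */
static enum mitigations_sketch
parse_mitigations_sketch(const char *arg, bool *recognized)
{
    assert(arg != NULL && recognized != NULL);

    *recognized = true;
    if (!strcmp(arg, "off"))
        return MITIGATIONS_OFF;
    if (!strcmp(arg, "auto"))
        return MITIGATIONS_AUTO;
    if (!strcmp(arg, "auto,nosmt"))
        return MITIGATIONS_AUTO_NOSMT;

    *recognized = false;
    return MITIGATIONS_AUTO;
}
```

Exact string comparison keeps "auto,nosmt" a single option value rather than a comma-separated list to be split.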
|
#
de7b77e5 |
|
27-Mar-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
cpu/hotplug: Create SMT sysfs interface for all arches Make the /sys/devices/system/cpu/smt/* files available on all arches, so user space has a consistent way to detect whether SMT is enabled. The 'control' file now shows 'notimplemented' for architectures which don't yet have CONFIG_HOTPLUG_SMT. [ tglx: Make notimplemented a real state ] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Link: https://lkml.kernel.org/r/469c2b98055f2c41e75748e06447d592a64080c9.1553635520.git.jpoimboe@redhat.com
|
#
206b9235 |
|
26-Mar-2019 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Tianyu reported a crash in a CPU hotplug teardown callback when booting a kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot parameter. It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken forever in case that a bringup callback fails. Unfortunately this issue was not recognized when the CPU hotplug code was reworked, so the shortcoming just stayed in place. When a bringup callback fails, the CPU hotplug code rolls back the operation and takes the CPU offline. The 'nosmt' command line argument uses a bringup failure to abort the bringup of SMT sibling CPUs. This partial bringup is required due to the MCE misdesign on Intel CPUs. With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level teardown of a CPU including the synchronizations in various facilities like RCU, NOHZ and others. As a consequence the teardown callbacks which must be executed on the outgoing CPU within stop machine with interrupts disabled are executed on the control CPU in interrupt enabled and preemptible context causing the kernel to crash and burn. The pre state machine code has a different failure mode which is more subtle and resulting in a less obvious use after free crash because the control side frees resources which are still in use by the undead CPU. But this is not a x86 only problem. Any architecture which supports the SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less likely to be triggered because in 99.99999% of the cases all bringup callbacks succeed. The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on all architectures as the following architectures have either no hotplug support at all or not all subarchitectures support it: alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial). 
Crashing the kernel in such a situation is not an acceptable state either. Implement a minimal rollback variant by limiting the teardown to the point where all regular teardown callbacks have been invoked and leave the CPU in the 'dead' idle state. This has the following consequences: - the CPU is brought down to the point where the stop_machine takedown would happen. - the CPU stays there forever and is idle - The CPU is cleared in the CPU active mask, but not in the CPU online mask which is a legit state. - Interrupts are not forced away from the CPU - All facilities which only look at online mask would still see it, but that is the case during normal hotplug/unplug operations as well. It's just a (way) longer time frame. This will expose issues, which haven't been exposed before or only seldom, because now the normally transient state of being non active but online is a permanent state. In testing this exposed already an issue vs. work queues where the vmstat code schedules work on the almost dead CPU which ends up in an unbound workqueue and triggers 'preemptible context' warnings. This is not a problem of this change, it merely exposes an already existing issue. Still this is better than crashing fully without a chance to debug it. This is mainly intended as a workaround for those architectures which do not support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP. 
Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions") Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Micheal Kelley <michael.h.kelley@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
|
#
1b72d432 |
|
21-Mar-2019 |
Thomas Gleixner <tglx@linutronix.de> |
tick: Remove outgoing CPU from broadcast masks Valentin reported that unplugging a CPU occasionally results in a warning in the tick broadcast code which is triggered when an offline CPU is in the broadcast mask. This happens because the outgoing CPU is not removing itself from the broadcast masks, especially not from the broadcast_force_mask. The removal happens on the control CPU after the outgoing CPU is dead. It's a long standing issue, but the warning is harmless. Rework the hotplug mechanism so that the outgoing CPU removes itself from the broadcast masks after disabling interrupts and removing itself from the online mask. Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de
|
#
b284909a |
|
30-Jan-2019 |
Josh Poimboeuf <jpoimboe@redhat.com> |
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM With the following commit: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems which were attempted to be solved by that commit, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. 
This also requires the 'sched_smt_present' variable to be exported, instead of 'cpu_smt_control'. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
|
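The fix described in b284909a can be illustrated with a small userspace sketch: the L1TF warning decision looks at whether any sibling thread is online right now, not at the configured SMT control value. `struct cpu_info`, `smt_currently_active()` and `need_l1tf_smt_warning()` are hypothetical names invented for this illustration; the kernel itself relies on the 'sched_smt_present' variable mentioned in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record for illustration; not a kernel structure. */
struct cpu_info {
	bool online;          /* CPU is currently online */
	bool primary_thread;  /* first hardware thread of its core */
};

/* True if at least one non-primary sibling thread is online right now. */
static bool smt_currently_active(const struct cpu_info *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (cpus[i].online && !cpus[i].primary_thread)
			return true;
	return false;
}

/* Warn only when the bug is present and siblings are actually running. */
static bool need_l1tf_smt_warning(bool l1tf_bug,
				  const struct cpu_info *cpus, size_t n)
{
	return l1tf_bug && smt_currently_active(cpus, n);
}
```

In the virt scenario from the commit, a guest booted with only primary threads yields no warning, and the answer changes once a sibling is hotplugged later, without any permanent "not supported" state being recorded.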
#
34d66caf |
|
17-Jan-2019 |
Zhenzhong Duan <zhenzhong.duan@oracle.com> |
x86/speculation: Remove redundant arch_smt_update() invocation With commit a74cfffb03b7 ("x86/speculation: Rework SMT state change"), arch_smt_update() is invoked from each individual CPU hotplug function. Therefore the extra arch_smt_update() call in the sysfs SMT control is redundant. Fixes: a74cfffb03b7 ("x86/speculation: Rework SMT state change") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <konrad.wilk@oracle.com> Cc: <dwmw@amazon.co.uk> Cc: <bp@suse.de> Cc: <srinivas.eeda@oracle.com> Cc: <peterz@infradead.org> Cc: <hpa@zytor.com> Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
|
#
ce48c457 |
|
19-Dec-2018 |
Valentin Schneider <valentin.schneider@arm.com> |
cpu/hotplug: Mute hotplug lockdep during init Since we've had: commit cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations") we've been getting some lockdep warnings during init, such as on HiKey960: [ 0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48 [ 0.820498] Modules linked in: [ 0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S 4.20.0-rc5-00051-g4cae42a #34 [ 0.820511] Hardware name: HiKey960 (DT) [ 0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO) [ 0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48 [ 0.820523] lr : lockdep_assert_cpus_held+0x38/0x48 [ 0.820526] sp : ffff00000a9cbe50 [ 0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000 [ 0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0 [ 0.820537] x25: ffff000008ba69e0 x24: 0000000000000001 [ 0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8 [ 0.820545] x21: 0000000000000001 x20: 0000000000000003 [ 0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff [ 0.820552] x17: 0000000000000000 x16: 0000000000000000 [ 0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34 [ 0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8 [ 0.820564] x11: 0000000000000000 x10: ffff00000958f848 [ 0.820567] x9 : ffff000009592000 x8 : ffff00000958f848 [ 0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000 [ 0.820574] x5 : 0000000000000000 x4 : 0000000000000001 [ 0.820578] x3 : 0000000000000000 x2 : 0000000000000001 [ 0.820582] x1 : 00000000ffffffff x0 : 0000000000000000 [ 0.820587] Call trace: [ 0.820591] lockdep_assert_cpus_held+0x3c/0x48 [ 0.820598] static_key_enable_cpuslocked+0x28/0xd0 [ 0.820606] arch_timer_check_ool_workaround+0xe8/0x228 [ 0.820610] arch_timer_starting_cpu+0xe4/0x2d8 [ 0.820615] cpuhp_invoke_callback+0xe8/0xd08 [ 0.820619] notify_cpu_starting+0x80/0xb8 [ 0.820625] secondary_start_kernel+0x118/0x1d0 We've also had a similar warning in sched_init_smp() for every asymmetric system that would enable the 
sched_asym_cpucapacity static key, although that was singled out in: commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()") Those warnings are actually harmless, since we cannot have hotplug operations at the time they appear. Instead of starting to sprinkle useless hotplug lock operations in the init codepaths, mute the warnings until they start warning about real problems. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: cai@gmx.us Cc: daniel.lezcano@linaro.org Cc: dietmar.eggemann@arm.com Cc: linux-arm-kernel@lists.infradead.org Cc: longman@redhat.com Cc: marc.zyngier@arm.com Cc: mark.rutland@arm.com Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a74cfffb |
|
25-Nov-2018 |
Thomas Gleixner <tglx@linutronix.de> |
x86/speculation: Rework SMT state change arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
|
#
d0e7d144 |
|
04-Oct-2018 |
Borislav Petkov <bp@suse.de> |
cpu/SMT: State SMT is disabled even with nosmt and without "=force" When booting with "nosmt=force" a message is issued into dmesg to confirm that SMT has been force-disabled but such a message is not issued when only "nosmt" is on the kernel command line. Fix that. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20181004172227.10094-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
53c613fe |
|
25-Sep-2018 |
Jiri Kosina <jkosina@suse.cz> |
x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors. Enable this feature if - the CPU is vulnerable to spectre v2 - the CPU supports SMT and has SMT siblings online - spectre_v2 mitigation autoselection is enabled (default) After some previous discussion, this leaves STIBP on all the time, as wrmsr on crossing kernel boundary is a no-no. This could perhaps later be a bit more optimized (like disabling it in NOHZ, experiment with disabling it in idle, etc) if needed. Note that the synchronization of the mask manipulation via newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already being serialized by cpu_add_remove_lock, but let's make this a little bit more future-proof. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
|
#
cb92173d |
|
11-Sep-2018 |
Peter Zijlstra <peterz@infradead.org> |
locking/lockdep, cpu/hotplug: Annotate AP thread Anybody trying to assert the cpu_hotplug_lock is held (lockdep_assert_cpus_held()) from AP callbacks will fail, because the lock is held by the BP. Stick in an explicit annotation in cpuhp_thread_fun() to make this work. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-tip-commits@vger.kernel.org Fixes: cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations") Link: http://lkml.kernel.org/r/20180911095127.GT24082@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
69fa6eb7 |
|
06-Sep-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Prevent state corruption on error rollback When a teardown callback fails, the CPU hotplug code brings the CPU back to the previous state. The previous state becomes the new target state. The rollback happens in undo_cpu_down() which increments the state unconditionally even if the state is already the same as the target. As a consequence the next CPU hotplug operation will start at the wrong state. This is easy to observe when __cpu_disable() fails. Prevent the unconditional undo by checking the state vs. target before incrementing state and fix up the consequently wrong conditional in the unplug code which handles the failure of the final CPU take down on the control CPU side. Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core") Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: josh@joshtriplett.org Cc: peterz@infradead.org Cc: jiangshanlai@gmail.com Cc: dzickus@redhat.com Cc: brendan.jackman@arm.com Cc: malat@debian.org Cc: sramana@codeaurora.org Cc: linux-arm-msm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1809051419580.1416@nanos.tec.linutronix.de
|
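The state corruption described in 69fa6eb7 can be reproduced in a few lines of userspace C. This is a simplified sketch, assuming a hypothetical `struct hp_state` with only the two fields that matter here; it is not the kernel's undo_cpu_down(), and the callback invocation is reduced to a comment.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the per-CPU hotplug state. */
struct hp_state {
	int state;   /* step the CPU is currently at */
	int target;  /* step the rollback should end at */
};

/* Unchecked variant: increments state before comparing it to target. */
static void undo_down_unchecked(struct hp_state *st)
{
	assert(st != 0);

	for (st->state++; st->state < st->target; st->state++)
		; /* the startup callback for st->state would run here */
}

/* Checked variant: only undo when there is something to undo. */
static void undo_down_checked(struct hp_state *st)
{
	assert(st != 0);

	if (st->state < st->target)
		undo_down_unchecked(st);
}
```

With `state == target`, the unchecked variant leaves `state` one step past `target`, so the next operation would start from the wrong step. The checked variant leaves the state untouched in that case and still walks up to `target` otherwise.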
#
f8b7530a |
|
04-Sep-2018 |
Neeraj Upadhyay <neeraju@codeaurora.org> |
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun() The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the load of st->should_run to prevent reordering of the later load/stores w.r.t. the load of st->should_run. Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core") Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org> Cc: josh@joshtriplett.org Cc: peterz@infradead.org Cc: jiangshanlai@gmail.com Cc: dzickus@redhat.com Cc: brendan.jackman@arm.com Cc: malat@debian.org Cc: mojha@codeaurora.org Cc: sramana@codeaurora.org Cc: linux-arm-msm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neeraju@codeaurora.org
|
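The ordering requirement in f8b7530a (barrier after the load of `should_run`, so later loads and stores cannot be reordered before it) has a rough userspace analogue in C11 atomics. The sketch below is an assumption-laden illustration: `struct hp_thread`, `hp_publish()` and `hp_consume()` are made-up names, and `atomic_thread_fence(memory_order_seq_cst)` merely stands in for the kernel's smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical hand-off record between a control CPU and a worker thread. */
struct hp_thread {
	atomic_bool should_run;  /* flag checked by the worker */
	int payload;             /* data that must be read after the flag */
};

/* Producer: write the payload, then raise the flag. */
static void hp_publish(struct hp_thread *t, int value)
{
	assert(t != 0);
	t->payload = value;
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&t->should_run, true, memory_order_relaxed);
}

/* Consumer: the fence sits AFTER the flag load, before touching payload. */
static bool hp_consume(struct hp_thread *t, int *out)
{
	assert(t != 0 && out != 0);
	if (!atomic_load_explicit(&t->should_run, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_seq_cst);  /* stand-in for smp_mb() */
	*out = t->payload;
	atomic_store_explicit(&t->should_run, false, memory_order_relaxed);
	return true;
}
```

Placing the fence before the flag load instead would leave the payload read free to be ordered ahead of the flag check, which is the class of misplacement the commit corrects.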
#
6fb86d97 |
|
27-Aug-2018 |
Mukesh Ojha <mojha@codeaurora.org> |
cpu/hotplug: Remove skip_onerr field from cpuhp_step structure When notifiers were there, `skip_onerr` was used to avoid calling particular step startup/teardown callbacks in the CPU up/down rollback path, which made the hotplug asymmetric. As notifiers are gone now after the full state machine conversion, the `skip_onerr` field is no longer required. Remove it from the structure and its usage. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org
|
#
269777aa |
|
14-Aug-2018 |
Abel Vesa <abelvesa@linux.com> |
cpu/hotplug: Non-SMP machines do not make use of booted_once Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") breaks non-SMP builds. [ I suspect the 'bool' fields should just be made to be bitfields and be exposed regardless of configuration, but that's a separate cleanup that I'll leave to the owners of this file for later. - Linus ] Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Abel Vesa <abelvesa@linux.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b5b1404d |
|
12-Aug-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
init: rename and re-order boot_cpu_state_init() This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bc2d8d26 |
|
07-Aug-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Fix SMT supported evaluation Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitly right after the initial SMP bringup has finished. SMT evaluation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d018031f |
|
24-Jul-2018 |
Mukesh Ojha <mojha@codeaurora.org> |
cpu/hotplug: Clarify CPU hotplug step name for timers After commit 249d4a9b3246 ("timers: Reinitialize per cpu bases on hotplug") i.e. the introduction of state CPUHP_TIMERS_PREPARE instead of CPUHP_TIMERS_DEAD the step name "timers:dead" is no longer accurate. Rename it to "timers:prepare". [ tglx: Massaged changelog ] Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: gkohli@codeaurora.org Cc: neeraju@codeaurora.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Brendan Jackman <brendan.jackman@arm.com> Cc: Mathieu Malaterre <malat@debian.org> Link: https://lkml.kernel.org/r/1532443668-26810-1-git-send-email-mojha@codeaurora.org
|
#
6f4ceee9 |
|
24-Jul-2018 |
Waiman Long <longman@redhat.com> |
cpu/hotplug: Add a cpus_read_trylock() function There are use cases where it can be useful to have a cpus_read_trylock() function to work around circular lock dependency problem involving the cpu_hotplug_lock. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
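The trylock pattern from 6f4ceee9 (take the read side only if that is possible without blocking, so a caller can back off instead of closing a lock cycle) can be sketched with a toy reader/writer lock. Everything here is hypothetical illustration code built on C11 atomics; it does not reflect how cpus_read_trylock() or cpu_hotplug_lock are implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock word: >0 = number of readers, 0 = free, -1 = writer holds it. */
struct toy_rwlock {
	atomic_int v;
};

/* Take a read reference unless a writer holds the lock; never blocks. */
static bool toy_read_trylock(struct toy_rwlock *l)
{
	assert(l != 0);
	int cur = atomic_load(&l->v);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->v, &cur, cur + 1))
			return true;
	}
	return false;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	assert(l != 0);
	atomic_fetch_sub(&l->v, 1);
}

/* Writer succeeds only when there are no readers and no writer. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
	assert(l != 0);
	int expected = 0;

	return atomic_compare_exchange_strong(&l->v, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	assert(l != 0);
	atomic_store(&l->v, 0);
}
```

A caller that already holds another lock can test `toy_read_trylock()` and defer its work on failure, which is the kind of use the commit message has in mind for working around circular dependencies.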
#
73d5e2b4 |
|
24-Jul-2018 |
Josh Poimboeuf <jpoimboe@redhat.com> |
cpu/hotplug: detect SMT disabled by BIOS If SMT is disabled in BIOS, the CPU code doesn't properly detect it. The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf' vulnerabilities file shows SMT as vulnerable. Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a case. Unfortunately the detection can only be done after bringing all the CPUs online, so we have to overwrite any previous writes to the variable. Reported-by: Joe Mario <jmario@redhat.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()") Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
#
fee0aede |
|
13-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support SMT) when the sysfs SMT control file is initialized. That was fine so far as this was only required to make the output of the control file correct and to prevent writes in that case. With the upcoming l1tf command line parameter, this needs to be set up before the L1TF mitigation selection and command line parsing happens. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
|
#
8e1b706b |
|
13-Jul-2018 |
Jiri Kosina <jkosina@suse.cz> |
cpu/hotplug: Expose SMT control init function The L1TF mitigation will gain a command line parameter which allows to set a combination of hypervisor mitigation and SMT control. Expose cpu_smt_disable() so the command line parser can tweak SMT settings. [ tglx: Split out of larger patch and made it preserve an already existing force off state ] Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de
|
#
215af549 |
|
07-Jul-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Online siblings when SMT control is turned on Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT siblings. Writing 'on' merely enables the ability to online them, but does not online them automatically. Make 'on' more useful by onlining all offline siblings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
26acfb66 |
|
20-Jun-2018 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present If the L1TF CPU bug is present we allow the KVM module to be loaded as the majority of users that use Linux and KVM have trusted guests and do not want a broken setup. Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as such they are the ones that should set nosmt to one. Setting 'nosmt' means that the system administrator also needs to disable SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line parameter, or via the /sys/devices/system/cpu/smt/control. See commit 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT"). Other mitigations are to use task affinity, cpu sets, interrupt binding, etc - anything to make sure that _only_ the same guests vCPUs are running on sibling threads. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9cf57731 |
|
07-Jun-2018 |
Peter Zijlstra <peterz@infradead.org> |
watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work Oleg suggested to replace the "watchdog/%u" threads with cpu_stop_work. That removes one thread per CPU while at the same time fixes softlockup vs SCHED_DEADLINE. But more importantly, it does away with the single smpboot_update_cpumask_percpu_thread() user, which allows cleanups/shrinkage of the smpboot interface. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0cc3cd21 |
|
29-Jun-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Boot HT siblings at least once Due to the way Machine Check Exceptions work on X86 hyperthreads it's required to boot up _all_ logical cores at least once in order to set the CR4.MCE bit. So instead of ignoring the sibling threads right away, let them boot up once so they can configure themselves. After they came out of the initial boot stage check whether it's a "secondary" sibling and cancel the operation which puts the CPU back into offline state. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tony Luck <tony.luck@intel.com>
|
#
05736e4a |
|
29-May-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide knobs to control SMT Provide a command line and a sysfs knob to control SMT. The command line options are: 'nosmt': Enumerate secondary threads, but do not online them 'nosmt=force': Ignore secondary threads completely during enumeration via MP table and ACPI/MADT. The sysfs control file has the following states (read/write): 'on': SMT is enabled. Secondary threads can be freely onlined 'off': SMT is disabled. Secondary threads, even if enumerated cannot be onlined 'forceoff': SMT is permanently disabled. Writes to the control file are rejected. 'notsupported': SMT is not supported by the CPU The command line option 'nosmt' sets the sysfs control to 'off'. This can be changed to 'on' to reenable SMT during runtime. The command line option 'nosmt=force' sets the sysfs control to 'forceoff'. This cannot be changed during runtime. When SMT is 'on' and the control file is changed to 'off' then all online secondary threads are offlined and attempts to online a secondary thread later on are rejected. When SMT is 'off' and the control file is changed to 'on' then secondary threads can be onlined again. The 'off' -> 'on' transition does not automatically online the secondary threads. When the control file is set to 'forceoff', the behaviour is the same as setting it to 'off', but the operation is irreversible and later writes to the control file are rejected. When the control status is 'notsupported' then writes to the control file are rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
|
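The control-file rules listed in 05736e4a form a small state machine, which the following userspace sketch models. The enum and function names are hypothetical, and the plain `-1` return is an assumption made for brevity; the sketch only encodes what the commit message states: 'on' and 'off' can be toggled, 'forceoff' is irreversible, and 'notsupported' rejects all writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the four control states named in the commit. */
enum smt_control {
	SMT_ON,
	SMT_OFF,
	SMT_FORCE_OFF,
	SMT_NOT_SUPPORTED,
};

/* Model a write to the control file: 0 on success, -1 if rejected. */
static int smt_control_write(enum smt_control *ctrl, const char *val)
{
	enum smt_control req;

	assert(ctrl != 0 && val != 0);

	if (!strcmp(val, "on"))
		req = SMT_ON;
	else if (!strcmp(val, "off"))
		req = SMT_OFF;
	else if (!strcmp(val, "forceoff"))
		req = SMT_FORCE_OFF;
	else
		return -1;  /* unknown keyword */

	/* 'forceoff' and 'notsupported' are terminal: reject every write. */
	if (*ctrl == SMT_FORCE_OFF || *ctrl == SMT_NOT_SUPPORTED)
		return -1;

	*ctrl = req;
	return 0;
}

/* Secondary threads may only be onlined while the control is 'on'. */
static bool smt_may_online_secondary(enum smt_control ctrl)
{
	return ctrl == SMT_ON;
}
```

Starting from `SMT_OFF` (the 'nosmt' boot case), a write of "on" succeeds and re-enables onlining; starting from `SMT_FORCE_OFF` (the 'nosmt=force' case), the same write is rejected and the state stays unchanged.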
#
cc1fe215 |
|
29-May-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Split do_cpu_down() Split out the inner workings of do_cpu_down() to allow reuse of that function for the upcoming SMT disabling mechanism. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
|
#
c4de6569 |
|
29-May-2018 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Make bringup/teardown of smp threads symmetric The asymmetry caused a warning to trigger if the bootup was stopped in state CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can now be invoked on already or still parked threads. But there is still no reason to have this be asymmetric. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org>
|
#
fcb3029a |
|
15-Mar-2018 |
Arnd Bergmann <arnd@arndb.de> |
cpu/hotplug: Fix unused function warning The cpuhp_is_ap_state() function is no longer called outside of the CONFIG_SMP #ifdef section, causing a harmless warning: kernel/cpu.c:129:13: error: 'cpuhp_is_ap_state' defined but not used [-Werror=unused-function] This moves the function into the #ifdef to get a clean build again. Fixes: 17a2f1ced028 ("cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20180315153829.3819606-1-arnd@arndb.de
|
#
17a2f1ce |
|
01-Dec-2017 |
Lai Jiangshan <jiangshanlai@gmail.com> |
cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states cpuhp_bp_states and cpuhp_ap_states have different set of steps without any conflicting steps, so that they can be merged. The original `[CPUHP_BRINGUP_CPU] = { },` is removed, because the new cpuhp_hp_states has CPUHP_ONLINE index which is larger than CPUHP_BRINGUP_CPU. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20171201135008.21633-1-jiangshanlai@gmail.com
|
#
26456f87 |
|
27-Dec-2017 |
Thomas Gleixner <tglx@linutronix.de> |
timers: Reinitialize per cpu bases on hotplug The timer wheel bases are not (re)initialized on CPU hotplug. That leaves them with a potentially stale clk and next_expiry value, which can cause trouble when the CPU is plugged. Add a prepare callback which forwards the clock, sets next_expiry to far in the future and reset the control flags to a known state. Set base->must_forward_clk so the first timer which is queued will try to forward the clock to current jiffies. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
|
#
76dc6c09 |
|
26-Dec-2017 |
Mathieu Malaterre <malat@debian.org> |
cpu/hotplug: Move inline keyword at the beginning of declaration Fix non-fatal warnings such as: kernel/cpu.c:95:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] static void inline cpuhp_lock_release(bool bringup) { } ^~~~~~ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20171226140855.16583-1-malat@debian.org
|
#
5b1ead68 |
|
06-Dec-2017 |
Brendan Jackman <brendan.jackman@arm.com> |
cpu/hotplug: Fix state name in takedown_cpu() comment CPUHP_AP_SCHED_MIGRATE_DYING doesn't exist, it looks like this was supposed to refer to CPUHP_AP_SCHED_STARTING's teardown callback, i.e. sched_cpu_dying(). Signed-off-by: Brendan Jackman <brendan.jackman@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171206105911.28093-1-brendan.jackman@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
46febd37 |
|
28-Nov-2017 |
Lai Jiangshan <jiangshanlai@gmail.com> |
smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place Commit 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine") accidentally put this step in the wrong place. The step should be at the cpuhp_ap_states[] rather than the cpuhp_bp_states[]. grep smpcfd /sys/devices/system/cpu/hotplug/states 40: smpcfd:prepare 129: smpcfd:dying "smpcfd:dying" was missing before. So was the invocation of the function smpcfd_dying_cpu(). Fixes: 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine") Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com
|
#
1f7c70d6 |
|
21-Oct-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Reset node state after operation The recent rework of the cpu hotplug internals changed the usage of the per cpu state->node field, but failed to clean it up after use. So subsequent hotplug operations use the stale pointer from a previous operation and hand it into the callback functions. The callbacks then dereference a pointer which either belongs to a different facility or points to freed and potentially reused memory. In either case data corruption and crashes are the obvious consequence. Reset the node and the last pointers in the per cpu state to NULL after the operation which set them has completed. Fixes: 96abb968549c ("smp/hotplug: Allow external multi-instance rollback") Reported-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos
|
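The stale-pointer problem described in the entry above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_cpu_state`, `model_invoke()`, `model_instance_op()` and `model_plain_op()` are hypothetical names invented for illustration and are not the kernel's data structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU hotplug state; not the kernel struct. */
struct model_cpu_state {
    const int *node;            /* instance handed to multi-instance callbacks */
    const int *last;            /* last instance handled, kept for rollback */
    int        calls_with_node; /* how often a callback saw a non-NULL node */
};

/* Model of a callback invocation: records whether an instance pointer was passed. */
static void model_invoke(struct model_cpu_state *st)
{
    if (st->node)
        st->calls_with_node++;
}

/* Run one multi-instance operation, then clear both pointers once it is done. */
static void model_instance_op(struct model_cpu_state *st, const int *instance)
{
    st->node = instance;
    model_invoke(st);
    st->node = NULL;   /* without this reset the next operation sees stale data */
    st->last = NULL;
}

/* A later operation that has no instance must not observe the old pointer. */
static void model_plain_op(struct model_cpu_state *st)
{
    model_invoke(st);
}
```

In this model, calling `model_instance_op()` and then `model_plain_op()` results in exactly one callback invocation that sees an instance pointer; dropping the two resets would make the second invocation see the leftover pointer as well.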
#
1db49484 |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Hotplug state fail injection

Add a sysfs file to one-time fail a specific state. This can be used to test the state rollback code paths. Something like this (hotplug-up.sh):

    #!/bin/bash

    echo 0 > /debug/sched_debug
    echo 1 > /debug/tracing/events/cpuhp/enable

    ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
    STATES=${1:-$ALL_STATES}

    for state in $STATES
    do
        echo 0 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /debug/tracing/trace
        echo Fail state: $state
        echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
        cat /sys/devices/system/cpu/cpu1/hotplug/fail
        echo 1 > /sys/devices/system/cpu/cpu1/online

        cat /debug/tracing/trace > hotfail-${state}.trace

        sleep 1
    done

Can be used to test for all possible rollback (barring multi-instance) scenarios on CPU-up, CPU-down is a trivial modification of the above.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
|
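The idea of a one-time injected failure that exercises the rollback path can be sketched in user space. The following is a simplified model, not the kernel implementation: `struct model_cpu`, `MODEL_NR_STATES` and the `model_*()` functions are hypothetical names, and the real state machine distinguishes many more cases (AP versus BP states, multi-instance callbacks).

```c
#define MODEL_NR_STATES 5   /* hypothetical number of states, for illustration only */

/* Hypothetical per-CPU model: current state plus a one-shot state to fail. */
struct model_cpu {
    int state;                      /* highest state whose startup has run */
    int fail;                       /* state to fail once, or -1 for none */
    int startups[MODEL_NR_STATES];  /* net startup count per state */
};

/* Startup callback model: fails exactly once if this state was armed. */
static int model_startup(struct model_cpu *cpu, int state)
{
    if (cpu->fail == state) {
        cpu->fail = -1;             /* one-time: disarm after triggering */
        return -1;
    }
    cpu->startups[state]++;
    return 0;
}

/* Teardown callback model: undoes one successful startup. */
static void model_teardown(struct model_cpu *cpu, int state)
{
    cpu->startups[state]--;
}

/* Bring the model CPU up to 'target'; on failure roll back to the start state. */
static int model_cpu_up(struct model_cpu *cpu, int target)
{
    int start = cpu->state;

    while (cpu->state < target) {
        int next = cpu->state + 1;
        int ret = model_startup(cpu, next);

        if (ret) {
            /* Rollback: tear down every state set up during this attempt. */
            while (cpu->state > start) {
                model_teardown(cpu, cpu->state);
                cpu->state--;
            }
            return ret;
        }
        cpu->state = next;
    }
    return 0;
}
```

Arming `fail = 3` and calling `model_cpu_up(&cpu, 4)` makes the first attempt fail and leaves the model back in state 0 with no startups outstanding; because the injection is one-shot, a second attempt reaches state 4.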
#
5ebe7742 |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Differentiate the AP completion between up and down

With lockdep-crossrelease we get deadlock reports that span cpu-up and cpu-down chains. Such deadlocks cannot possibly happen because cpu-up and cpu-down are globally serialized.

    takedown_cpu()
      irq_lock_sparse()
      wait_for_completion(&st->done)

                                cpuhp_thread_fun
                                  cpuhp_up_callback
                                    cpuhp_invoke_callback
                                      irq_affinity_online_cpu
                                        irq_lock_sparse()
                                        irq_unlock_sparse()
                                  complete(&st->done)

Now that we have consistent AP state, we can trivially separate the AP completion between up and down using st->bringup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: max.byungchul.park@gmail.com Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Link: https://lkml.kernel.org/r/20170920170546.872472799@infradead.org
|
#
5f4b55e1 |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Differentiate the AP-work lockdep class between up and down

With lockdep-crossrelease we get deadlock reports that span cpu-up and cpu-down chains. Such deadlocks cannot possibly happen because cpu-up and cpu-down are globally serialized.

    CPU0                    CPU1                    CPU2
    cpuhp_up_callbacks:     takedown_cpu:           cpuhp_thread_fun:

    cpuhp_state
                            irq_lock_sparse()
      irq_lock_sparse()
                            wait_for_completion()
                                                    cpuhp_state
                                                    complete()

Now that we have consistent AP state, we can trivially separate the AP-work class between up and down using st->bringup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: max.byungchul.park@gmail.com Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Link: https://lkml.kernel.org/r/20170920170546.922524234@infradead.org
|
#
724a8688 |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Callback vs state-machine consistency While the generic callback functions have an 'int' return and thus appear to be allowed to return an error, this is not true for all states. Specifically, what used to be STARTING/DYING are run with IRQs disabled from critical parts of CPU bringup/teardown and are not allowed to fail. Add WARNs to enforce this rule. But since some callbacks are indeed allowed to fail, a state-machine rollback can itself encounter a failure. In that case we're stuck: we can't go forward and we can't go back. Also add a WARN for that case. AFAICT this is a fundamental 'problem' with no real obvious solution. We want the 'prepare' callbacks to allow failure on either up or down. Typically on prepare-up this would be things like -ENOMEM from resource allocations, and the typical usage in prepare-down would be something like -EBUSY to avoid CPUs being taken away. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.819539119@infradead.org
|
#
4dddfb5f |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Rewrite AP state machine core There is currently no explicit state change on rollback. That is, st->bringup, st->rollback and st->target are not consistent when doing the rollback. Rework the AP state handling to be more coherent. This does mean we have to do a second AP kick-and-wait for rollback, but since rollback is the slow path of a slowpath, this really should not matter. Take this opportunity to simplify the AP thread function to only run a single callback per invocation. This unifies the three single/up/down modes it supports. The looping it used to do for up/down is achieved by retaining should_run and relying on the main smpboot_thread_fn() loop. (I have most of a patch that does the same for the BP state handling, but that's not critical and gets a little complicated because CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets recursive @st usage, I still have to de-fugly that.) [ tglx: Move cpuhp_down_callbacks() et al. into the HOTPLUG_CPU section to avoid gcc complaining about unused functions. Make the HOTPLUG_CPU one piece instead of having two consecutive ifdef sections of the same type. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.769658088@infradead.org
|
#
96abb968 |
|
20-Sep-2017 |
Peter Zijlstra <peterz@infradead.org> |
smp/hotplug: Allow external multi-instance rollback Currently the rollback of multi-instance states is handled inside cpuhp_invoke_callback(). The problem is that when we want to allow an explicit state change for rollback, we need to return from the function without doing the rollback. Change cpuhp_invoke_callback() to optionally return the multi-instance state, such that rollback can be done from a subsequent call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.720361181@infradead.org
|
#
941154bd |
|
12-Sep-2017 |
Thomas Gleixner <tglx@linutronix.de> |
watchdog/hardlockup/perf: Prevent CPU hotplug deadlock

The following deadlock is possible in the watchdog hotplug code:

    cpus_write_lock()
      ...
        takedown_cpu()
          smpboot_park_threads()
            smpboot_park_thread()
              kthread_park()
                ->park() := watchdog_disable()
                  watchdog_nmi_disable()
                    perf_event_release_kernel();
                      put_event()
                        _free_event()
                          ->destroy() := hw_perf_event_destroy()
                            x86_release_hardware()
                              release_ds_buffers()
                                get_online_cpus()

when a per cpu watchdog perf event is destroyed which drops the last reference to the PMU hardware. The cleanup code there invokes get_online_cpus() which instantly deadlocks because the hotplug percpu rwsem is write locked.

To solve this add a deferring mechanism:

    cpus_write_lock()
                             kthread_park()
                              watchdog_nmi_disable(deferred)
                                perf_event_disable(event);
                                move_event_to_deferred(event);
                             ....
    cpus_write_unlock()
    cleanup_deferred_events()
      perf_event_release_kernel()

This is still properly serialized against concurrent hotplug via the cpu_add_remove_lock, which is held by the task which initiated the hotplug event. This is also used to handle event destruction when the watchdog threads are parked via other mechanisms than CPU hotplug.

Analyzed-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
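The deferring mechanism in the entry above follows a general pattern: while the write lock is held, work that would need the same lock is only queued, and it is carried out after the lock has been dropped. A minimal user-space sketch of that pattern follows. All names (`model_write_locked`, `model_disable_event()`, `model_cleanup_deferred()` and so on) are hypothetical, and a plain flag stands in for the hotplug lock; this is not the watchdog or perf code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_EVENTS 4          /* hypothetical capacity for this sketch */

/* Hypothetical model of the hotplug write lock and a list of deferred events. */
static bool   model_write_locked;
static int    model_deferred[MODEL_MAX_EVENTS];
static size_t model_nr_deferred;
static int    model_released;

/* Releasing needs the lock for reading, so it must not run while write locked. */
static int model_release_event(int event)
{
    (void)event;
    if (model_write_locked)
        return -1;                  /* stands in for the deadlock */
    model_released++;
    return 0;
}

/* Disable path: under the write lock, only queue the event for later release. */
static int model_disable_event(int event)
{
    if (model_write_locked) {
        if (model_nr_deferred == MODEL_MAX_EVENTS)
            return -1;
        model_deferred[model_nr_deferred++] = event;
        return 0;
    }
    return model_release_event(event);
}

/* After the lock has been dropped, release everything that was queued. */
static int model_cleanup_deferred(void)
{
    int ret = 0;

    for (size_t i = 0; i < model_nr_deferred; i++)
        if (model_release_event(model_deferred[i]))
            ret = -1;
    model_nr_deferred = 0;
    return ret;
}
```

The ordering is the point of the sketch: `model_disable_event()` never calls the release function while the flag is set, and `model_cleanup_deferred()` is only expected to be called once the flag has been cleared.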
#
a58163d8 |
|
20-Jun-2017 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Migrate callbacks earlier in the CPU-offline timeline RCU callbacks must be migrated away from an outgoing CPU, and this is done near the end of the CPU-hotplug operation, after the outgoing CPU is long gone. Unfortunately, this means that other CPU-hotplug callbacks can execute while the outgoing CPU's callbacks are still immobilized on the long-gone CPU's callback lists. If any of these CPU-hotplug callbacks must wait, either directly or indirectly, for the invocation of any of the immobilized RCU callbacks, the system will hang. This commit avoids such hangs by migrating the callbacks away from the outgoing CPU immediately upon its departure, shortly after the return from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these callbacks and invoke them, which allows all the after-the-fact CPU-hotplug callbacks to wait on these RCU callbacks without risk of a hang. While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including dead code on the one hand and to avoid define-without-use warnings on the other hand. Reported-by: Jeffrey Hugo <jhugo@codeaurora.org> Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Richard Weinberger <richard@nod.at>
|
#
0c96b273 |
|
19-Jul-2017 |
Ethan Barnes <Ethan.Barnes@wdc.com> |
smp/hotplug: Handle removal correctly in cpuhp_store_callbacks() If cpuhp_store_callbacks() is called for CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN, which are the indicators for dynamically allocated states, then cpuhp_store_callbacks() allocates a new dynamic state. The first allocation in each range returns CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN. If cpuhp_remove_state() is invoked for one of these states, then there is no protection against the allocation mechanism. So the removal, which should clear the callbacks and the name, gets a new state assigned and clears that one. As a consequence the state which should be cleared stays initialized. A consecutive CPU hotplug operation dereferences the state callbacks and accesses either freed or reused memory, resulting in crashes. Add a protection against this by checking the name argument for NULL. If it's NULL it's a removal. If not, it's an allocation. [ tglx: Added a comment and massaged changelog ] Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface") Signed-off-by: Ethan Barnes <ethan.barnes@sandisk.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.or> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Sebastian Siewior <bigeasy@linutronix.d> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/DM2PR04MB398242FC7776D603D9F99C894A60@DM2PR04MB398.namprd04.prod.outlook.com
|
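The allocation-versus-removal ambiguity fixed in the entry above can be shown with a tiny state table. This is a sketch under stated assumptions: the table size, the `MODEL_DYN_*` constants and the `model_*()` functions are hypothetical and much simpler than the kernel's `cpuhp_store_callbacks()`.

```c
#include <stddef.h>

/* Hypothetical, tiny state table; the real kernel table and constants differ. */
enum { MODEL_DYN_FIRST = 2, MODEL_DYN_LAST = 4, MODEL_NR = 5 };

struct model_step {
    const char *name;               /* NULL means the slot is free */
};

static struct model_step model_steps[MODEL_NR];

/* Find a free slot in the dynamic range, or return -1 if none is left. */
static int model_reserve_dyn(void)
{
    for (int i = MODEL_DYN_FIRST; i <= MODEL_DYN_LAST; i++)
        if (!model_steps[i].name)
            return i;
    return -1;
}

/*
 * Store or clear a state. MODEL_DYN_FIRST with a non-NULL name requests a
 * dynamic allocation. A NULL name is a removal of exactly the state passed
 * in, so no new slot may be allocated in that case.
 */
static int model_store(int state, const char *name)
{
    if (name && state == MODEL_DYN_FIRST) {
        state = model_reserve_dyn();
        if (state < 0)
            return -1;
    }
    model_steps[state].name = name;
    return state;
}
```

Without the `name &&` check, removing the first dynamically allocated state (whose number equals `MODEL_DYN_FIRST`) would allocate a fresh slot and clear that one instead, leaving the original entry populated, which is the failure mode the commit message describes.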
#
dea1d0f5 |
|
11-Jul-2017 |
Thomas Gleixner <tglx@linutronix.de> |
smp/hotplug: Replace BUG_ON and react useful The move of the unpark functions to the control thread moved the BUG_ON() there as well. While it made some sense in the idle thread of the upcoming CPU, it's bogus to crash the control thread on the already online CPU, especially as the function has a return value and the callsite is prepared to handle an error return. Replace it with a WARN_ON_ONCE() and return a proper error code. Fixes: 9cd4f1a4e7a8 ("smp/hotplug: Move unparking of percpu threads to the control CPU") Rightfully-ranted-at-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9cd4f1a4 |
|
04-Jul-2017 |
Thomas Gleixner <tglx@linutronix.de> |
smp/hotplug: Move unparking of percpu threads to the control CPU Vikram reported the following backtrace: BUG: scheduling while atomic: swapper/7/0/0x00000002 CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680 schedule schedule_hrtimeout_range_clock schedule_hrtimeout wait_task_inactive __kthread_bind_mask __kthread_bind __kthread_unpark kthread_unpark cpuhp_online_idle cpu_startup_entry secondary_start_kernel He analyzed correctly that a parked cpu hotplug thread of an offlined CPU was still on the runqueue when the CPU came back online and tried to unpark it. This causes the thread which invoked kthread_unpark() to call wait_task_inactive() and subsequently schedule() with preemption disabled. His proposed workaround was to "make sure" that a parked thread has scheduled out when the CPU goes offline, so the situation cannot happen. But that's still wrong because the root cause is not the fact that the percpu thread is still on the runqueue and neither that preemption is disabled, which could be simply solved by enabling preemption before calling kthread_unpark(). The real issue is that the calling thread is the idle task of the upcoming CPU, which is not supposed to call anything which might sleep. The moron, who wrote that code, missed completely that kthread_unpark() might end up in schedule(). The solution is simpler than expected. The thread which controls the hotplug operation is waiting for the CPU to call complete() on the hotplug state completion. So the idle task of the upcoming CPU can set its state to CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control task on a different CPU, which then can safely do the unpark and kick the now unparked hotplug thread of the upcoming CPU to complete the bringup to the final target state. 

    Control CPU                     AP

    bringup_cpu();
      __cpu_up()  ------------>
                                    bringup_ap();
      bringup_wait_for_ap()
        wait_for_completion();
                                    cpuhp_online_idle();
                    <------------    complete();
        unpark(AP->stopper);
        unpark(AP->hotplugthread);
                                    while(1)
                                      do_idle();
        kick(AP->hotplugthread);
        wait_for_completion();
                                    hotplug_thread()
                                      run_online_callbacks();
                                      complete();

Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up") Reported-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
993647a2 |
|
29-Jun-2017 |
Arvind Yadav <arvind.yadav.cs@gmail.com> |
cpu/hotplug: Constify attribute_group structures

attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const:

File size before:

    text    data    bss   dec     hex    filename
    12582   15361   20    27963   6d3b   kernel/cpu.o

File size After adding 'const':

    text    data    bss   dec     hex    filename
    12710   15265   20    27995   6d5b   kernel/cpu.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: anna-maria@linutronix.de Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: rcochran@linutronix.de Link: http://lkml.kernel.org/r/f9079e94e12b36d245e7adbf67d312bc5d0250c6.1498737970.git.arvind.yadav.cs@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c5cb83bb |
|
19-Jun-2017 |
Thomas Gleixner <tglx@linutronix.de> |
genirq/cpuhotplug: Handle managed IRQs on CPU hotplug If a CPU goes offline, interrupts affine to the CPU are moved away. If the outgoing CPU is the last CPU in the affinity mask the migration code breaks the affinity and sets it to all online CPUs. This is a problem for affinity managed interrupts as CPU hotplug is often used for power management purposes. If the affinity is broken, the interrupt is no longer affine to the CPUs to which it was allocated. The affinity spreading allows multi queue devices to be laid out in a way that they are assigned to a single CPU or a group of CPUs. If the last CPU goes offline, then the queue is no longer used, so the interrupt can be shut down gracefully and parked until one of the assigned CPUs comes online again. Add a graceful shutdown mechanism into the irq affinity breaking code path, mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified. In the online path, scan the active interrupts for managed interrupts and if the interrupt is functional and the newly online CPU is part of the affinity mask, restart the interrupt if it is marked MANAGED_SHUTDOWN or if the interrupt is started up, try to add the CPU back to the effective affinity mask. Originally-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170619235447.273417334@linutronix.de
|
#
57de7212 |
|
08-Jun-2017 |
Arnd Bergmann <arnd@arndb.de> |
cpu/hotplug: Remove unused check_for_tasks() function clang -Wunused-function found one remaining function that was apparently meant to be removed in a recent code cleanup: kernel/cpu.c:565:20: warning: unused function 'check_for_tasks' [-Wunused-function] Sebastian explained: The function became unused unintentionally, but there is already a failure check, when a task cannot be removed from the outgoing cpu in the scheduler code, so bringing it back is not really giving any extra value. Fixes: 530e9b76ae8f ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: http://lkml.kernel.org/r/20170608085544.2257132-1-arnd@arndb.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
40da1b11 |
|
02-Jun-2017 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Drop the device lock on error If a custom CPU target is specified and that one is not available _or_ can't be interrupted then the code returns to userland without dropping a lock, as noticed by lockdep: |echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target | ================================================ | [ BUG: lock held when returning to user space! ] | ------------------------------------------------ | bash/503 is leaving the kernel with locks still held! | 1 lock held by bash/503: | #0: (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40 So release the lock then. Fixes: 757c989b9994 ("cpu/hotplug: Make target state writeable") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
|
#
49dfe2a6 |
|
24-May-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpuhotplug: Link lock stacks for hotplug callbacks

The CPU hotplug callbacks are not covered by lockdep versus the cpu hotplug rwsem.

    CPU0                                            CPU1
    cpuhp_setup_state(STATE, startup, teardown);
     cpus_read_lock();
      invoke_callback_on_ap();
        kick_hotplug_thread(ap);
        wait_for_completion();                      hotplug_thread_fn()
                                                      lock(m);
                                                      do_stuff();
                                                      unlock(m);

Lockdep does not know about this dependency and will not trigger on the following code sequence:

    lock(m);
    cpus_read_lock();

Add a lockdep map and connect the initiator's lock chain with the hotplug thread lock chain, so potential deadlocks can be detected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081549.709375845@linutronix.de
|
#
fc8dffd3 |
|
24-May-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Convert hotplug locking to percpu rwsem There are no more (known) nested calls to get_online_cpus() and all observed lock ordering problems have been addressed. Replace the magic nested 'rwsem' hackery with a percpu-rwsem. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081549.447014063@linutronix.de
|
#
210e2133 |
|
24-May-2017 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Use stop_machine_cpuslocked() in takedown_cpu() takedown_cpu() is a cpu hotplug function invoking stop_machine(). The cpu hotplug machinery holds the hotplug lock for write. stop_machine() invokes get_online_cpus() as well. This is correct, but prevents the conversion of the hotplug locking to a percpu rwsem. Use stop_machine_cpuslocked() to avoid the nested call. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081548.423292433@linutronix.de
|
#
9805c673 |
|
24-May-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Add __cpuhp_state_add_instance_cpuslocked() Add cpuslocked() variants for the multi instance registration so this can be called from a cpus_read_lock() protected region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.321782217@linutronix.de
|
#
71def423 |
|
24-May-2017 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Provide cpuhp_setup/remove_state[_nocalls]_cpuslocked() Some call sites of cpuhp_setup/remove_state[_nocalls]() are within a cpus_read locked region. cpuhp_setup/remove_state[_nocalls]() call cpus_read_lock() as well, which is possible in the current implementation but prevents converting the hotplug locking to a percpu rwsem. Provide locked versions of the interfaces to avoid nested calls to cpus_read_lock(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.239600868@linutronix.de
|
#
8f553c49 |
|
24-May-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide cpus_read|write_[un]lock() The counting 'rwsem' hackery of get|put_online_cpus() is going to be replaced by percpu rwsem. Rename the functions to make it clear that it's locking and not some refcount style interface. These new functions will be used for the preparatory patches which make the code ready for the percpu rwsem conversion. Rename all instances in the cpu hotplug code while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.080397752@linutronix.de
|
#
8ce371f9 |
|
19-Mar-2017 |
Peter Zijlstra <peterz@infradead.org> |
lockdep: Fix per-cpu static objects Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") we try to collapse per-cpu locks into a single class by giving them all the same key. For this key we choose the canonical address of the per-cpu object, which would be the offset into the per-cpu area. This has two problems: - there is a case where we run !0 lock->key through static_obj() and expect this to pass; it doesn't for canonical pointers. - 0 is a valid canonical address. Cure both issues by redefining the canonical address as the address of the per-cpu variable on the boot CPU. Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at all, track the boot CPU in a variable. Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Borislav Petkov <bp@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-mm@kvack.org Cc: wfg@linux.intel.com Cc: kernel test robot <fengguang.wu@intel.com> Cc: LKP <lkp@01.org> Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
dc434e05 |
|
14-Mar-2017 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Serialize callback invocations proper The setup/remove_state/instance() functions in the hotplug core code are serialized against concurrent CPU hotplug, but unfortunately not serialized against themselves. As a consequence a concurrent invocation of these functions results in corruption of the callback machinery because two instances try to invoke callbacks on remote cpus at the same time. This results in missing callback invocations and initiator threads waiting forever on the completion. The obvious solution to replace get_online_cpus() with cpu_hotplug_begin() is not possible because at least one callsite calls into these functions from a get_online_cpus() locked region. Extend the protection scope of the cpuhp_state_mutex from solely protecting the state arrays to cover the callback invocation machinery as well. Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface") Reported-and-tested-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: hpa@zytor.com Cc: mingo@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20170314150645.g4tdyoszlcbajmna@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
29930025 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ef8bd77f |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/hotplug.h> We are going to split <linux/sched/hotplug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/hotplug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3f07c014 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
0fec9557 |
|
17-Jan-2017 |
Tobias Klauser <tklauser@distanz.ch> |
cpu/hotplug: Remove unused but set variable in _cpu_down() After the recent removal of the hotplug notifiers the variable 'hasdied' in _cpu_down() is set but no longer read, leading to the following GCC warning when building with 'make W=1': kernel/cpu.c:767:7: warning: variable ‘hasdied’ set but not used [-Wunused-but-set-variable] Fix it by removing the variable. Fixes: 530e9b76ae8f ("cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20170117143501.20893-1-tklauser@distanz.ch Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4205e478 |
|
10-Jan-2017 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide dynamic range for prepare stage Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to the removal of the cpu hotplug notifiers. Usually I don't care much about out of tree modules, but LTTNG is widely used in distros. There are two ways to solve that: 1) Reserve a hotplug state for LTTNG 2) Add a dynamic range for the prepare states. While #1 is the simplest solution, #2 is the proper one as we can convert in tree users, which do not care about ordering, to the dynamic range as well. Add a dynamic range which allows LTTNG to request states in the prepare stage. Reported-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
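The dynamic range described in the entry above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every `sketch_`/`SK_` name is hypothetical, the range is four slots wide for illustration, and the -EBUSY overwrite check and the positive return value for dynamic requests are modelled on the "Prevent overwriting of callbacks" and "Clarify description of __cpuhp_setup_state() return value" entries in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified state space; not the kernel's enum cpuhp_state. */
enum sketch_state {
    SK_OFFLINE = 0,
    SK_PREPARE_FIXED,                        /* statically assigned state */
    SK_PREPARE_DYN,                          /* first slot of the dynamic range */
    SK_PREPARE_DYN_END = SK_PREPARE_DYN + 3, /* last slot of the dynamic range */
    SK_BRINGUP,
    SK_NR_STATES
};

struct sketch_step {
    const char *name;                 /* NULL means the slot is free */
    int (*startup)(unsigned int cpu);
    int (*teardown)(unsigned int cpu);
};

static struct sketch_step sketch_steps[SK_NR_STATES];

/* Find the first free slot in the dynamic range, or fail with -ENOSPC. */
static int sketch_reserve_dyn(void)
{
    for (int i = SK_PREPARE_DYN; i <= SK_PREPARE_DYN_END; i++)
        if (!sketch_steps[i].name)
            return i;
    return -ENOSPC;
}

/*
 * Returns 0 for a fixed state, the allocated (positive) state number when
 * SK_PREPARE_DYN was requested, or a negative errno value on failure.
 * Reservation and store happen in one step, so a slot is never left
 * half-claimed (the kernel does this under a lock; omitted here).
 */
static int sketch_setup_state(int state, const char *name,
                              int (*startup)(unsigned int),
                              int (*teardown)(unsigned int))
{
    int dynamic = (state == SK_PREPARE_DYN);

    if (!name || state <= SK_OFFLINE || state >= SK_NR_STATES)
        return -EINVAL;

    if (dynamic) {
        state = sketch_reserve_dyn();
        if (state < 0)
            return state;
    } else if (sketch_steps[state].name) {
        return -EBUSY;                /* refuse to overwrite an active state */
    }

    assert(state > SK_OFFLINE && state < SK_NR_STATES);
    sketch_steps[state].name = name;
    sketch_steps[state].startup = startup;
    sketch_steps[state].teardown = teardown;
    return dynamic ? state : 0;
}
```

A caller that does not care about ordering would pass `SK_PREPARE_DYN` and keep the returned state number for later removal; a caller with a fixed slot gets 0 on success.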
#
b9d9d691 |
|
26-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
smp/hotplug: Undo tglxs brainfart The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
530e9b76 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(), register_hotcpu_notifier(), register_cpu_notifier(), __register_hotcpu_notifier(), __register_cpu_notifier(), unregister_hotcpu_notifier(), unregister_cpu_notifier(), __unregister_hotcpu_notifier(), __unregister_cpu_notifier() are unused now. Remove them and all related code. Remove also the now pointless cpu notifier error injection mechanism. The states can be executed step by step and error rollback is the same as cpu down, so any state transition can be tested w/o requiring the notifier error injection. Some CPU hotplug states are kept as they are (ab)used for hotplug state tracking. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
dc280d93 |
|
21-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Prevent overwriting of callbacks Developers manage to overwrite states blindly without thought. That's fatal and hard to debug. Add sanity checks to make it fail. This requires restructuring the code so that the dynamic state allocation happens in the same lock protected section as the actual store. Otherwise the previous assignment of 'Reserved' to the name field would trigger the overwrite check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1358e038 |
|
15-Dec-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
CPU/hotplug: Clarify description of __cpuhp_setup_state() return value When invoked with CPUHP_AP_ONLINE_DYN state __cpuhp_setup_state() is expected to return positive value which is the hotplug state that the routine assigns. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
512f0980 |
|
15-Dec-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
cpu/hotplug: Clarify description of __cpuhp_setup_state() return value When invoked with CPUHP_AP_ONLINE_DYN state __cpuhp_setup_state() is expected to return positive value which is the hotplug state that the routine assigns. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: linux-pm@vger.kernel.org Cc: viresh.kumar@linaro.org Cc: bigeasy@linutronix.de Cc: rjw@rjwysocki.net Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1481814058-4799-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
777c6e0d |
|
07-Dec-2016 |
Michal Hocko <mhocko@suse.com> |
hotplug: Make register and unregister notifier API symmetric Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its notifiers when HOTPLUG_CPU=y while the registration might succeed even when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap might keep a stale notifier on the list on the manual clean up during the pool tear down and thus corrupt the list. Resulting in the following [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78 [ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40 <snipped> [ 145.122628] Call Trace: [ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20 [ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400 [ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300 [ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10 [ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30 [ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0 [ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20 [ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0 [ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30 [ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70 [ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180 [ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40 [ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0 [ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0 [ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17 This can be even triggered manually by changing /sys/module/zswap/parameters/compressor multiple times. Fix this issue by making unregister APIs symmetric to the register so there are no surprises. 
Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU") Reported-and-tested-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a705e07b |
|
12-Oct-2016 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
cpu/hotplug: Use distinct name for cpu_hotplug.dep_map Use distinctive name for cpu_hotplug.dep_map to avoid the actual cpu_hotplug.lock appearing as cpu_hotplug.lock#2 in lockdep splats. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Gautham R . Shenoy <ego@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: trivial@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6731d4f1 |
|
23-Aug-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
slab: Convert to hotplug state machine Install the callbacks via the state machine. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: linux-mm@kvack.org Cc: rt@linutronix.de Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
e6d4989a |
|
18-Aug-2016 |
Richard Weinberger <richard@nod.at> |
relayfs: Convert to hotplug state machine Install the callbacks via the state machine. They are installed at run time but relay_prepare_cpu() does not need to be invoked by the boot CPU because relay_open() was not yet invoked and there are no pools that need to be created. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20160818125731.27256-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ee1e714b |
|
18-Aug-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Remove CPU_STARTING and CPU_DYING notifier All users are converted to state machine, remove CPU_STARTING and the corresponding CPU_DYING. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
677f6646 |
|
06-Sep-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Make state names consistent We should have all names in the scheme "[subsys/]facility:state". Fix the core to comply. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3c1627e9 |
|
05-Sep-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Replace anon union Some compilers are unhappy with the anon union in the state array. Replace it with a named union. While at it align the state array initializers proper and add the missing name tags. Fixes: cf392d10b69e "cpu/hotplug: Add multi instance support" Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Fenguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de
|
#
01b41159 |
|
10-Jun-2016 |
Lianwei Wang <lianwei.wang@gmail.com> |
cpu/hotplug: Handle unbalanced hotplug enable/disable When cpu_hotplug_enable() is called unbalanced w/o a preceding cpu_hotplug_disable() the code emits a warning, but happily decrements the disabled counter. This causes the next operations to malfunction. Prevent the decrement and just emit a warning. Signed-off-by: Lianwei Wang <lianwei.wang@gmail.com> Cc: peterz@infradead.org Cc: linux-pm@vger.kernel.org Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/1465541008-12476-1-git-send-email-lianwei.wang@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
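The counter handling described in the entry above can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: the `sketch_` names are hypothetical, locking is omitted, and a return code stands in for the kernel's warning-only behaviour so the outcome is observable.

```c
#include <assert.h>
#include <stdio.h>

/* Number of outstanding disable requests; hotplug is allowed only at 0. */
static int sketch_hotplug_disabled;

static void sketch_hotplug_disable(void)
{
    sketch_hotplug_disabled++;
}

/*
 * Returns 0 on a balanced enable. On an unbalanced enable it warns and
 * returns -1 WITHOUT decrementing, so the counter can never go negative
 * and later disable/enable pairs keep working.
 */
static int sketch_hotplug_enable(void)
{
    if (sketch_hotplug_disabled == 0) {
        fprintf(stderr, "Unbalanced hotplug enable\n");
        return -1;
    }
    sketch_hotplug_disabled--;
    assert(sketch_hotplug_disabled >= 0);
    return 0;
}
```

Before the fix described above, the equivalent of the decrement ran even in the unbalanced case, which is what left the counter in a state that broke subsequent operations.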
#
cf392d10 |
|
12-Aug-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Add multi instance support This patch adds the ability for a given state to have multiple instances. Until now all states have a single instance and the startup / teardown callback use global variables. A few drivers need to perform the same callbacks on multiple "instances". Currently we have three drivers in tree which all have a global list which they iterate over. With multi instance support they don't need their private list and the functionality has been moved into core code. Plus we hold the hotplug lock in core so no CPU comes/goes while instances are registered and we do rollback in error case :) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/1471024183-12666-3-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
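The multi-instance idea in the entry above can be illustrated with a userspace sketch. All `sk_` names are hypothetical and the model makes simplifying assumptions: online CPUs are numbered 0..nr_online-1, there is no locking, and the driver embeds a list node in its own structure so the core can keep the instance list on the driver's behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node that a driver embeds in its per-instance struct. */
struct sk_node {
    struct sk_node *next;
};

/* One hotplug state that can carry many instances (illustrative only). */
struct sk_multi_state {
    int  (*startup)(unsigned int cpu, struct sk_node *node);
    void (*teardown)(unsigned int cpu, struct sk_node *node);
    struct sk_node *instances;        /* list kept by the core, not the driver */
};

/* Example driver instance: counts the CPUs it is currently set up on. */
struct sk_dev {
    struct sk_node node;
    int cpus_ready;
};

static struct sk_dev *sk_to_dev(struct sk_node *node)
{
    return (struct sk_dev *)((char *)node - offsetof(struct sk_dev, node));
}

static int sk_dev_startup(unsigned int cpu, struct sk_node *node)
{
    (void)cpu;
    sk_to_dev(node)->cpus_ready++;
    return 0;
}

static void sk_dev_teardown(unsigned int cpu, struct sk_node *node)
{
    (void)cpu;
    sk_to_dev(node)->cpus_ready--;
}

/*
 * Run the startup callback for the new instance on every online CPU.
 * If one CPU fails, undo the CPUs already done (in reverse) and report
 * the error; only a fully set up instance is linked into the list.
 */
static int sk_add_instance(struct sk_multi_state *st, struct sk_node *node,
                           unsigned int nr_online)
{
    assert(st && node && st->startup);

    for (unsigned int cpu = 0; cpu < nr_online; cpu++) {
        int ret = st->startup(cpu, node);

        if (ret) {
            while (cpu-- > 0)
                if (st->teardown)
                    st->teardown(cpu, node);
            return ret;
        }
    }
    node->next = st->instances;
    st->instances = node;
    return 0;
}
```

The point of the design is that the per-driver global list disappears: the driver only supplies the callbacks and the embedded node, and rollback on failure is handled in one place.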
#
a724632c |
|
12-Aug-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Rework callback invocation logic This is preparation for the following patch. This rework here changes the arguments of cpuhp_invoke_callback(). It passes now `state' and whether `startup' or `teardown' callback should be invoked. The callback then is looked up by the function. The following is a cleanup of callers: - cpuhp_issue_call() has one argument less - struct cpuhp_cpu_state (which is used by the hotplug thread) gets also its callback removed. The decision if it is a single callback invocation moved to the `single' variable. Also a `bringup' variable has been added to distinguish between startup and teardown callback. - take_cpu_down() needs to start one step earlier. We always get here via CPUHP_TEARDOWN_CPU callback. Before that change cpuhp_ap_states + CPUHP_TEARDOWN_CPU pointed to an empty entry because TEARDOWN is saved in bp_states for this reason. Now that we use cpuhp_get_step() to lookup the state we must explicitly skip it in order not to invoke it twice. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/1471024183-12666-2-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d391e552 |
|
17-Aug-2016 |
James Morse <james.morse@arm.com> |
cpu/hotplug: Allow suspend/resume CPU to be specified disable_nonboot_cpus() assumes that the lowest numbered online CPU is the boot CPU, and that this is the correct CPU to run any power management code on. On x86 this is always correct, as CPU0 cannot (easily) be taken offline. On arm64 CPU0 can be taken offline. For hibernate/resume this means we may hibernate on a CPU other than CPU0. If the system is rebooted with kexec 'CPU0' will be assigned to a different physical CPU. This complicates hibernate/resume as now we can't trust the CPU numbers. Arch code can find the correct physical CPU, and ensure it is online before resume from hibernate begins, but also needs to influence disable_nonboot_cpus()'s choice of CPU. Rename disable_nonboot_cpus() as freeze_secondary_cpus() and add an argument indicating which CPU should be left standing. Follow the logic in migrate_to_reboot_cpu() to use the lowest numbered online CPU if the requested CPU is not online. Add disable_nonboot_cpus() as an inline function that has the existing behaviour. Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
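The CPU selection rule from the entry above (keep the requested CPU, or fall back to the lowest numbered online CPU when the requested one is offline) can be sketched in userspace. The `sketch_` helpers and the 64-bit mask standing in for the online CPU mask are assumptions made for illustration; the kernel uses its own cpumask API.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means CPU n is online (hypothetical stand-in for a cpumask). */
static int sketch_cpu_online(uint64_t online, int cpu)
{
    return cpu >= 0 && cpu < 64 && ((online >> cpu) & 1);
}

/* Lowest numbered online CPU, or -1 if the mask is empty. */
static int sketch_first_online(uint64_t online)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if ((online >> cpu) & 1)
            return cpu;
    return -1;
}

/* Pick the CPU that is left standing. */
static int sketch_pick_primary(uint64_t online, int requested)
{
    if (sketch_cpu_online(online, requested))
        return requested;
    return sketch_first_online(online);
}

/* Mask of the secondary CPUs that would be taken down. */
static uint64_t sketch_secondary_mask(uint64_t online, int requested)
{
    int primary;

    assert(online != 0);              /* at least one CPU is always online */
    primary = sketch_pick_primary(online, requested);
    return online & ~(UINT64_C(1) << primary);
}
```

With CPUs 1 to 3 online (mask 0xE), requesting CPU 0 falls back to CPU 1, whereas requesting CPU 2 keeps CPU 2 and marks CPUs 1 and 3 for shutdown. The old behaviour corresponds to always requesting CPU 0.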
#
e1c105a9 |
|
03-Aug-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
hotplug: Prevent alloc/free of irq descriptors during cpu up/down (again) Now that Xen no longer allocates irqs in _cpu_up() we can restore commit a89941816726 ("hotplug: Prevent alloc/free of irq descriptors during cpu up/down") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> CC: x86@kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
0c6d4576 |
|
17-Aug-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Get rid of CPU_STARTING reference CPU_STARTING is scheduled for removal. There is no use of it in drivers and core code uses it only for compatibility with old-style CPU-hotplug notifiers. This patch therefore removes CPU_STARTING from an RCU-related comment. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
7ec99de3 |
|
30-Jun-2016 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Provide exact CPU-online tracking for RCU Up to now, RCU has assumed that the CPU-online process makes it from CPU_UP_PREPARE to set_cpu_online() within one jiffy. Given the recent rise of virtualized environments, this assumption is very clearly obsolete. Failing to meet this deadline can result in RCU paying attention to an incoming CPU for one jiffy, then ignoring it until the grace period following the one in which that CPU sets itself online. This situation might prove to be fatally disappointing to any RCU read-side critical sections that had the misfortune to execute during the time in which RCU was ignoring the slow-to-come-online CPU. This commit therefore updates RCU's internal CPU state-tracking information at notify_cpu_starting() time, thus providing RCU with an exact transition of the CPU's state from offline to online. Note that this means that incoming CPUs must not use RCU read-side critical section (other than those of SRCU) until notify_cpu_starting() time. Note also that the CPU_STARTING notifiers -are- allowed to use RCU read-side critical sections. (Of course, CPU-hotplug notifiers are rapidly becoming obsolete, so you need to act fast!) If a given architecture or CPU family needs to use RCU read-side critical sections earlier, the call to rcu_cpu_starting() from notify_cpu_starting() will need to be architecture-specific, with architectures that need early use being required to hand-place the call to rcu_cpu_starting() at some point preceding the call to notify_cpu_starting(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
aa877175 |
|
03-Aug-2016 |
Boris Ostrovsky <boris.ostrovsky@oracle.com> |
cpu/hotplug: Prevent alloc/free of irq descriptors during CPU up/down (again) Now that Xen no longer allocates irqs in _cpu_up() we can restore commit: a89941816726 ("hotplug: Prevent alloc/free of irq descriptors during cpu up/down") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: david.vrabel@citrix.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1470244948-17674-3-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4fae16df |
|
27-Jul-2016 |
Richard Cochran <rcochran@linutronix.de> |
timers/core: Correct callback order during CPU hot plug On the tear-down path, the dead CPU callback for the timers was misplaced within the 'cpuhp_state' enumeration. There is a hidden dependency between the timers and block multiqueue. The timers callback must happen before the block multiqueue callback otherwise a RCU stall occurs. Move the timers callback to the proper place in the state machine. Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: kbuild test robot <lkp@intel.com> Fixes: 24f73b99716a ("timers/core: Convert to hotplug state machine") Signed-off-by: Richard Cochran <rcochran@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: rt@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1469610498-25914-1-git-send-email-rcochran@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4df83742 |
|
13-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
rcu: Convert rcutree to hotplug state machine Straight forward conversion to the state machine. Though the question arises whether this needs really all these state transitions to work. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.982013161@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
31487f83 |
|
13-Jul-2016 |
Richard Weinberger <richard@nod.at> |
smp/cfd: Convert core to hotplug state machine Install the callbacks via the state machine. They are installed at runtime so smpcfd_prepare_cpu() needs to be invoked by the boot-CPU. Signed-off-by: Richard Weinberger <richard@nod.at> [ Added the dropped CPU dying case back in. ] Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.818376366@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
24f73b99 |
|
13-Jul-2016 |
Richard Cochran <rcochran@linutronix.de> |
timers/core: Convert to hotplug state machine When tearing down, call timers_dead_cpu() before notify_dead(). There is a hidden dependency between: - timers - block multiqueue - rcutree If timers_dead_cpu() comes later than blk_mq_queue_reinit_notify() that latter function causes a RCU stall. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.566790058@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
27590dc1 |
|
15-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimer: Convert to hotplug state machine Split out the clockevents callbacks instead of piggybacking them on hrtimers. This gets rid of a POST_DEAD user. See commit: 54e88fad223c ("sched: Make sure timers have migrated before killing the migration_thread") We just move the callback state to the proper place in the state machine. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.485419196@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7ee681b2 |
|
13-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
workqueue: Convert to state machine callbacks Get rid of the prio ordering of the separate notifiers and use a proper state callback pair. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
00e16c3d |
|
13-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
perf/core: Convert to hotplug state machine Actually a nice symmetric startup/teardown pair which fits properly into the state machine concept. In the long run we should be able to invoke the startup callback for the boot CPU via the state machine and get rid of the init function which invokes it on the boot CPU. Note: This comes actually before the perf hardware callbacks. In the notifier model the hardware callbacks have a higher priority than the core callback. But that's solely for CPU offline so that hardware migration of events happens before the core is notified about the outgoing CPU. With the symmetric state array model we have the following ordering: UP: core -> hardware DOWN: hardware -> core Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Siewior <bigeasy@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153333.587514098@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
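The ordering stated in the entry above (UP: core -> hardware, DOWN: hardware -> core) falls out of walking one state array in opposite directions. The following userspace sketch shows this, along with rollback on a failed bring-up reusing the teardown walk. All `sk_` names, the two example states, and the trace buffer are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SK_OFFLINE = 0, SK_CORE, SK_HARDWARE, SK_NR_STATES };

static char sk_trace[64];             /* records the callback order */
static int sk_hw_should_fail;         /* set to 1 to simulate a failing startup */

static void sk_note(const char *what)
{
    strncat(sk_trace, what, sizeof(sk_trace) - strlen(sk_trace) - 1);
}

static int sk_core_up(unsigned int cpu)   { (void)cpu; sk_note("core+ "); return 0; }
static int sk_core_down(unsigned int cpu) { (void)cpu; sk_note("core- "); return 0; }
static int sk_hw_down(unsigned int cpu)   { (void)cpu; sk_note("hw- ");   return 0; }

static int sk_hw_up(unsigned int cpu)
{
    (void)cpu;
    if (sk_hw_should_fail)
        return -1;
    sk_note("hw+ ");
    return 0;
}

struct sk_step {
    int (*startup)(unsigned int cpu);
    int (*teardown)(unsigned int cpu);
};

/* Array position alone defines the ordering; there are no priorities. */
static const struct sk_step sk_steps[SK_NR_STATES] = {
    [SK_CORE]     = { sk_core_up, sk_core_down },
    [SK_HARDWARE] = { sk_hw_up,   sk_hw_down   },
};

/* Walk downwards from 'from', invoking teardown callbacks. */
static void sk_cpu_down(unsigned int cpu, int from)
{
    assert(from < SK_NR_STATES);
    for (int st = from; st > SK_OFFLINE; st--)
        if (sk_steps[st].teardown)
            sk_steps[st].teardown(cpu);
}

/* Walk upwards to 'target'; on failure undo the completed states. */
static int sk_cpu_up(unsigned int cpu, int target)
{
    assert(target < SK_NR_STATES);
    for (int st = SK_OFFLINE + 1; st <= target; st++) {
        int ret = sk_steps[st].startup ? sk_steps[st].startup(cpu) : 0;

        if (ret) {
            sk_cpu_down(cpu, st - 1);  /* rollback is the same walk as cpu down */
            return ret;
        }
    }
    return 0;
}
```

Because the teardown walk is just the startup walk reversed, a failed bring-up needs no separate error path, which is why the notifier-era error injection mechanism could be dropped once all users were converted.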
#
6a4e2451 |
|
13-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Handle early registration gracefully We switched the hotplug machinery to smpboot threads. Early registration of hotplug callbacks, i.e. from do_pre_smp_initcalls(), happens before the threads are initialized. Instead of moving the thread init, we simply handle it in the hotplug code itself and invoke the function directly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153332.896450738@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a7c73414 |
|
12-Jul-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble Xiaolong Ye reported lock debug warnings triggered by the following commit: 8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine") The bug is the following: the cpuhp_bp_states[] array is cut short when CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless and happily scribble outside of the array bounds... We need to store them in case that the state is unregistered so we can invoke the teardown function. That's independent of CONFIG_SMP. Make sure the array is large enough. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: lkp@01.org Cc: stable@vger.kernel.org Cc: tipbuild@zytor.com Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
aaddd7d1 |
|
09-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
sched/hotplug: Make activate() the last hotplug step The scheduler can handle per cpu threads before the cpu is set to active and it does not allow user space threads on the cpu before active is set. Attaching to the scheduling domains is also not required before user space threads can be handled. Move the activation to the end of the hotplug state space. That also means that deactivation is the first action when a cpu is shut down. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160310120025.597477199@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
f2785ddb |
|
09-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Remove the hotplug notifier and make it an explicit state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160310120025.502222097@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
b2454caa |
|
09-Mar-2016 |
Peter Zijlstra <peterz@infradead.org> |
sched/hotplug: Move sync_rcu to be with set_cpu_active(false) The sync_rcu stuff is specifically for clearing bits in the active mask, such that everybody will observe the bit cleared and will not consider the cleared CPU for load-balancing etc. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160310120025.169219710@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
40190a78 |
|
09-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
sched/hotplug: Convert cpu_[in]active notifiers to state machine Now that we reduced everything into single notifiers, it's simple to move them into the hotplug state machine space. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9cf7243d |
|
09-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
sched: Make set_cpu_rq_start_time() a built in hotplug state Start disentangling the maze of hotplug notifiers in the scheduler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3b9d6da6 |
|
08-Apr-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
cpu/hotplug: Fix rollback during error-out in __cpu_disable() The recent introduction of the hotplug thread which invokes the callbacks on the plugged cpu, caused the following regression: If takedown_cpu() fails, then we run into several issues: 1) The rollback of the target cpu states is not invoked. That leaves the smp threads and the hotplug thread in disabled state. 2) notify_online() is executed due to a missing skip_onerr flag. That causes that both CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked which confuses quite some notifiers. 3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's not an issue per se, but it is inconsistent and in consequence blocks the patches which rely on these states being invoked on the target CPU and not on the controlling cpu. It also does not preserve the strict call order on rollback which is problematic for the ongoing state machine conversion as well. To fix this we add a rollback flag to the remote callback machinery and invoke the rollback including the CPU_DOWN_FAILED notification on the remote cpu. Further mark the notify online state with 'skip_onerr' so we don't get a double invocation. This workaround will go away once we moved the unplug invocation to the target cpu itself. [ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notification to the target cpu ] Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-s390@vger.kernel.org Cc: rt@linutronix.de Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
d10ef6f9 |
|
08-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Document states better Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2a58c527 |
|
10-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Fix smpboot thread ordering Commit 931ef163309e moved the smpboot thread park/unpark invocation to the state machine. The move of the unpark invocation was premature as it depends on work in progress patches. As a result cpu down can fail, because rcu synchronization in takedown_cpu() eventually requires a functional softirq thread. I never encountered the problem in testing, but 0day testing managed to provide a reliable reproducer. Remove the smpboot_threads_park() call from the state machine for now and put it back into the original place after the rcu synchronization. I'm embarrassed as I knew about the dependency and still managed to get it wrong. Hotplug induced brain melt seems to be the only sensible explanation for that. Fixes: 931ef163309e "cpu/hotplug: Unpark smpboot threads from the state machine" Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
5d8eb842 |
|
02-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Remove redundant state check The check for the AP range in cpuhp_is_ap_state() is redundant after commit 8df3e07e7f21 "cpu/hotplug: Let upcoming cpu bring itself fully up" because all states above CPUHP_BRINGUP_CPU are invoked on the hotplugged cpu. Remove it. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
71f87b2f |
|
03-Mar-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Plug death reporting race Paul noticed that the conversion of the death reporting introduced a race where the outgoing cpu might be delayed after waking the control processor, so it might not be able to call rcu_report_dead() before being physically removed, leading to RCU stalls. We can't call complete after rcu_report_dead(), so instead of going back to busy polling, simply issue a function call to do the completion. Fixes: 27d50c7eeb0f "rcu: Make CPU_DYING_IDLE an explicit call" Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20160302201127.GA23440@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org>
|
#
27d50c7e |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
rcu: Make CPU_DYING_IDLE an explicit call Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets invoked at the proper place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
e69aab13 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Make wait for dead cpu completion based Kill the busy spinning on the control side and just wait for the hotplugged cpu to tell that it reached the dead state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.776157858@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
8df3e07e |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Let upcoming cpu bring itself fully up Let the upcoming cpu kick the hotplug thread and let it complete the bringup itself. That way the control side can just wait for the completion or, once the hotplug machinery has been made async, not care at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.697655464@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
1cf4f629 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Move online calls to hotplugged cpu Let the hotplugged cpu invoke the setup/teardown callbacks (CPU_ONLINE/CPU_DOWN_PREPARE) itself. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.536364371@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4cb28ced |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Create hotplug threads In order to let the hotplugged cpu take care of the setup/teardown, we need a separate hotplug thread. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.454541272@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
2e1a3483 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Split out the state walk into functions We need that for running callbacks on the AP and the BP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.374946234@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
931ef163 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Unpark smpboot threads from the state machine Handle the smpboot threads in the state machine. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.295777684@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
949338e3 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Move scheduler cpu_online notifier to hotplug core Move the scheduler cpu online notifier part to the hotplug core. This is anyway the highest priority callback and we need that functionality right now for the next changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.200791046@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
5b7aa87e |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Implement setup/removal interface Implement functions which allow setting up and removing hotplug state callbacks. The default behaviour for setup is to call the startup function for this state for (or on) all cpus which have a hotplug state >= the installed state. The default behaviour for removal is to call the teardown function for this state for (or on) all cpus which have a hotplug state >= the installed state. This includes rollback to the previous state in case of failure. A special state is CPUHP_ONLINE_DYN. It is for dynamically registering a hotplug callback pair. This is for drivers which have no dependencies, so that we do not need to allocate CPUHP states for each of them. For both setup and remove, helper functions are provided which prevent the core from issuing the callbacks. This simplifies the conversion of existing hotplug notifiers. [ Dynamic registering implemented by Sebastian Siewior ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.103464877@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
757c989b |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Make target state writeable Make it possible to write a target state to the per cpu state file, so we can switch between states. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.022814799@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
98f8cdce |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Add sysfs state interface Add a sysfs interface so we can actually see which state the cpus are in. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.942257522@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af1f4045 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Hand in target state to _cpu_up/down We want to be able to bringup/teardown the cpu to a particular state. Add a target argument to _cpu_up/down. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.862113133@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
4baa0afc |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Convert the hotplugged cpu work to a state machine Move the functions which need to run on the hotplugged processor into a state machine array and let the code iterate through these functions. In a later state, this will grow synchronization points between the control processor and the hotplugged processor, so we can move the various architecture implementations of the synchronizations to the core. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.770651526@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
cff7d378 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Convert to a state machine for the control processor Move the split out steps into a callback array and let the cpu_up/down code iterate through the array functions. For now most of the callbacks are asymmetric to resemble the current hotplug maze. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.671816690@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
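The callback-array walk described in entry cff7d378 above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's implementation: `struct model_step`, `model_cpu_up()`, `model_cpu_down()`, the trace buffer and the sample callbacks are all hypothetical names invented for this illustration. It shows bringup walking the array forwards, teardown walking it backwards, and a failed bringup undoing the already completed steps in reverse order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step descriptor: one setup and one teardown callback. */
struct model_step {
	int (*startup)(unsigned int cpu);
	int (*teardown)(unsigned int cpu);
};

/*
 * Run the startup callbacks in array order. If one fails, call the
 * teardown callbacks of the steps that already succeeded, last first,
 * and return the error. Returns 0 when every step succeeded.
 */
static int model_cpu_up(const struct model_step *steps, size_t nsteps,
			unsigned int cpu)
{
	assert(steps != NULL);

	for (size_t i = 0; i < nsteps; i++) {
		int ret = steps[i].startup ? steps[i].startup(cpu) : 0;

		if (ret) {
			/* Roll back steps i-1 .. 0. */
			while (i-- > 0) {
				if (steps[i].teardown)
					steps[i].teardown(cpu);
			}
			return ret;
		}
	}
	return 0;
}

/* Run the teardown callbacks in reverse array order. */
static void model_cpu_down(const struct model_step *steps, size_t nsteps,
			   unsigned int cpu)
{
	assert(steps != NULL);

	for (size_t i = nsteps; i-- > 0; ) {
		if (steps[i].teardown)
			steps[i].teardown(cpu);
	}
}

/* Sample callbacks (illustration only) that record the call order. */
static char model_trace[32];
static size_t model_trace_len;

static void model_record(char c)
{
	if (model_trace_len < sizeof(model_trace) - 1)
		model_trace[model_trace_len++] = c;
}

static int up_a(unsigned int cpu)    { (void)cpu; model_record('A'); return 0; }
static int down_a(unsigned int cpu)  { (void)cpu; model_record('a'); return 0; }
static int up_b(unsigned int cpu)    { (void)cpu; model_record('B'); return 0; }
static int down_b(unsigned int cpu)  { (void)cpu; model_record('b'); return 0; }
static int up_fail(unsigned int cpu) { (void)cpu; model_record('X'); return -1; }

/* The third step always fails, which triggers the rollback path. */
static const struct model_step model_steps[] = {
	{ up_a, down_a },
	{ up_b, down_b },
	{ up_fail, NULL },
};
```

Calling `model_cpu_up(model_steps, 3, 0)` in this model runs A, B and the failing step, then tears down B and A before returning the error. The real state machine additionally tracks a per-cpu current and target state, which this sketch leaves out.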
#
98458172 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Split out cpu down functions Split cpu_down in separate functions in preparation for state machine conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.511796562@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
ba997462 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Restructure cpu_up code Split out into separate functions, so we can convert it to a state machine. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.429389195@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
090e77c3 |
|
26-Feb-2016 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Restructure FROZEN state handling There are only a few callbacks which really care about FROZEN vs. !FROZEN. No need to have extra states for this. Publish the frozen state in an extra variable which is updated under the hotplug lock and let the users interested deal with it w/o imposing that extra state checks on everyone. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182340.334912357@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9425676a |
|
20-Jan-2016 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
kernel/cpu.c: make set_cpu_* static inlines Almost all callers of the set_cpu_* functions pass an explicit true or false. Making them static inline thus replaces the function calls with a simple set_bit/clear_bit, saving some .text. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
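As a rough illustration of entry 9425676a above, the following userspace sketch models a dedicated mask type with an inline setter. All names (`struct model_cpumask`, `model_online_mask`, `model_set_cpu_online()`, `model_cpu_online()`, `MODEL_NR_CPUS`) are hypothetical, and the plain read-modify-write used here is an assumption made for brevity; it does not reproduce the kernel's bit operations.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_NR_CPUS		64
#define MODEL_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_LONGS \
	((MODEL_NR_CPUS + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* A dedicated mask type instead of a bare array that happens to fit. */
struct model_cpumask {
	unsigned long bits[MODEL_LONGS];
};

static struct model_cpumask model_online_mask;

/*
 * With a constant 'online' argument the compiler can reduce this inline
 * function to a single set or clear of one bit at the call site.
 * Not atomic: this model is for single-threaded illustration only.
 */
static inline void model_set_cpu_online(unsigned int cpu, bool online)
{
	assert(cpu < MODEL_NR_CPUS);

	unsigned long bit = 1UL << (cpu % MODEL_BITS_PER_LONG);
	unsigned long *word = &model_online_mask.bits[cpu / MODEL_BITS_PER_LONG];

	if (online)
		*word |= bit;
	else
		*word &= ~bit;
}

/* Test whether the bit for 'cpu' is set in the mask. */
static inline bool model_cpu_online(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	return (model_online_mask.bits[cpu / MODEL_BITS_PER_LONG] >>
		(cpu % MODEL_BITS_PER_LONG)) & 1UL;
}
```

Because almost all callers pass a literal true or false, inlining removes both the call and the branch, which is where the .text saving mentioned in the commit message comes from.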
#
5aec01b8 |
|
20-Jan-2016 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
kernel/cpu.c: eliminate cpu_*_mask Replace the variables cpu_possible_mask, cpu_online_mask, cpu_present_mask and cpu_active_mask with macros expanding to expressions of the same type and value, eliminating some indirection. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4b804c85 |
|
20-Jan-2016 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
kernel/cpu.c: export __cpu_*_mask Exporting the cpumasks __cpu_possible_mask and friends will allow us to remove the extra indirection through the cpu_*_mask variables. It will also allow the set_cpu_* functions to become static inlines, which will give a .text reduction. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c4c54dd1 |
|
20-Jan-2016 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
kernel/cpu.c: change type of cpu_possible_bits and friends Change cpu_possible_bits and friends (online, present, active) from being bitmaps that happen to have the right size to actually being struct cpumasks. Also rename them to __cpu_xyz_mask. This is mostly a small cleanup in preparation for exporting them and, eventually, eliminating the extra indirection through the cpu_xyz_mask variables. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
07f06cb3 |
|
09-Oct-2015 |
Peter Zijlstra <peterz@infradead.org> |
sched: Start stopper early Ensure the stopper thread is active 'early', because the load balancer pretty much assumes that it's available. And when 'online && active' the load-balancer is fully available. Not only the numa balancing stop_two_cpus() caller relies on it, but also the self migration stuff does, and at CPU_ONLINE time the cpu really is 'free' to run anything. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160054.GA10176@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
c00166d8 |
|
09-Oct-2015 |
Oleg Nesterov <oleg@redhat.com> |
stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 1. Change smpboot_unpark_thread() to check ->selfparking, just like smpboot_park_thread() does. 2. Introduce stop_machine_unpark() which sets ->enabled and calls kthread_unpark(). 3. Change smpboot_thread_call() and cpu_stop_init() to call stop_machine_unpark() by hand. This way: - IMO the ->selfparking logic becomes more consistent. - We can kill the smp_hotplug_thread->pre_unpark() method. - We can easily unpark the stopper thread earlier. Say, we can move stop_machine_unpark() from smpboot_thread_call() to sched_cpu_active() as Peter suggests. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
233e7f26 |
|
08-Oct-2015 |
Oleg Nesterov <oleg@redhat.com> |
stop_machine: Ensure that a queued callback will be called before cpu_stop_park() cpu_stop_queue_work() checks stopper->enabled before it queues the work, but ->enabled == T can only guarantee cpu_stop_signal_done() if we race with cpu_down(). This is not enough for stop_two_cpus() or stop_machine(), they will deadlock if multi_cpu_stop() won't be called by one of the target CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex. But stop_two_cpus() has to check cpu_active() to avoid the same race with hotplug, and this check is very unobvious and probably not even correct if we race with cpu_up(). Change cpu_down() pass to clear ->enabled before cpu_stopper_thread() flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set. Note also that smpboot_thread_call() calls cpu_stop_unpark() which sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until cpu_stopper_thread() is called at least once. This all means that if cpu_stop_queue_work() succeeds, we know that work->fn() will be called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02ef3c4a |
|
07-Aug-2015 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Remove try_get_online_cpus() Now that synchronize_sched_expedited() no longer uses it, there are no users of try_get_online_cpus() in mainline. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
a75a6068 |
|
10-Sep-2015 |
Oleg Nesterov <oleg@redhat.com> |
cpu/hotplug: Read_lock(tasklist_lock) doesn't need to disable irqs check_for_tasks() doesn't need to disable irqs, recursive read_lock() from interrupt is fine. While at it, s/do_each_thread/for_each_process_thread/. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kirill Tkhai <ktkhai@odin.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150910130750.GA20055@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
71cf5aee |
|
19-Jul-2015 |
Mathias Krause <minipli@googlemail.com> |
kernel, cpu: Remove bogus __ref annotations cpu_chain lost its __cpuinitdata annotation long ago in commit 5c113fbeed7a ("fix cpu_chain section mismatch..."). This and the global __cpuinit annotation drop in v3.11 removed the need to mark all users, including transitive ones, with the __ref annotation. Just get rid of it to not wrongly hide section mismatches. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
32145c46 |
|
05-Aug-2015 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
cpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disable Hyper-V module needs to disable cpu hotplug (offlining) as there is no support from hypervisor side to reassign already opened event channels to a different CPU. Currently this is done by altering smp_ops.cpu_disable, but that is hackish. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
89af7ba5 |
|
05-Aug-2015 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
cpu-hotplug: convert cpu_hotplug_disabled to a counter As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable functions to modules we need to convert cpu_hotplug_disabled to a counter to properly support disable -> disable -> enable call sequences. E.g. after Hyper-V vmbus module (which is supposed to be the first user of exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable() hibernate path calls disable_nonboot_cpus() and if we hit an error in _cpu_down() enable_nonboot_cpus() will be called on the failure path (thus making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled' state). Same problem is possible if more than 1 module use cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When one of these modules is unloaded it is logical to leave cpu hotplug in 'disabled' state. To support the change we need to increase cpu_hotplug_disabled counter in disable_nonboot_cpus() unconditionally as all users of disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case an error was returned. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
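The disable -> disable -> enable problem described in entry 89af7ba5 above can be sketched with a small userspace model. The names `hotplug_disabled_count`, `model_hotplug_disable()`, `model_hotplug_enable()` and `model_hotplug_allowed()` are hypothetical and the locking that the kernel needs around the counter is deliberately omitted; the sketch only shows why a counter keeps hotplug disabled until every user has re-enabled it, where a boolean would be cleared by the first enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: 0 means cpu hotplug is allowed. */
static int hotplug_disabled_count;

/* Each user that needs hotplug blocked takes one reference. */
static void model_hotplug_disable(void)
{
	hotplug_disabled_count++;
}

/* Drop one reference; hotplug is allowed again only at zero. */
static void model_hotplug_enable(void)
{
	/* An enable without a matching disable is a caller bug. */
	assert(hotplug_disabled_count > 0);
	hotplug_disabled_count--;
}

static bool model_hotplug_allowed(void)
{
	return hotplug_disabled_count == 0;
}
```

With two users (for example a module and the hibernate path in the commit message), the first `model_hotplug_enable()` call leaves the count at one, so hotplug stays disabled until the second user also calls it.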
#
7eeb088e |
|
29-Jun-2015 |
Oleg Nesterov <oleg@redhat.com> |
stop_machine: Unexport __stop_machine() The only caller outside of stop_machine.c is _cpu_down(), it can use stop_machine(). get_online_cpus() is fine under cpu_hotplug_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012951.GA23934@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
779de6ce |
|
10-Jun-2015 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Wait for RCU grace periods concurrently In kernels built with CONFIG_PREEMPT, _cpu_down() waits for RCU and RCU-sched grace periods back-to-back, incurring quite a bit more latency than required. This commit therefore uses the new synchronize_rcu_mult() to allow waiting for both grace periods concurrently. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
ce0d3c0a |
|
14-Jul-2015 |
Thomas Gleixner <tglx@linutronix.de> |
genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now Boris reported that the sparse_irq protection around __cpu_up() in the generic code causes a regression on Xen. Xen allocates interrupts and some more in the xen_cpu_up() function, so it deadlocks on the sparse_irq_lock. There is no simple fix for this and we really should have the protection for all architectures, but for now the only solution is to move it to x86 where actual wreckage due to the lack of protection has been observed. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down' Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: xen-devel <xen-devel@lists.xenproject.org>
|
#
a8994181 |
|
05-Jul-2015 |
Thomas Gleixner <tglx@linutronix.de> |
hotplug: Prevent alloc/free of irq descriptors during cpu up/down When a cpu goes up some architectures (e.g. x86) have to walk the irq space to set up the vector space for the cpu. While this needs extra protection at the architecture level we can avoid a few race conditions by preventing the concurrent allocation/free of irq descriptors and the associated data. When a cpu goes down it moves the interrupts which are targeted to this cpu away by reassigning the affinities. While this happens interrupts can be allocated and freed, which opens a can of race conditions in the code which reassigns the affinities because interrupt descriptors might be freed underneath. Example:
    CPU1                                CPU2
    cpu_up/down
     irq_desc = irq_to_desc(irq);
                                        remove_from_radix_tree(desc);
     raw_spin_lock(&desc->lock);
                                        free(desc);
We could protect the irq descriptors with RCU, but that would require a full tree change of all accesses to interrupt descriptors. But fortunately this kind of race condition is rather limited to a few things like cpu hotplug. The normal setup/teardown is very well serialized. So the simpler and obvious solution is: Prevent allocation and freeing of interrupt descriptors across cpu hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
|
#
5ffa70f0 |
|
18-Apr-2015 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
kernel/cpu.c: remove new instance of __cpuinit that crept back in We removed __cpuinit support (leaving no-op stubs) quite some time ago. However a new instance was added in commit 00df35f991914db6b8bde8cf0980 ("cpu: Defer smpboot kthread unparking until CPU known to scheduler") Since we want to clobber the stubs soon, get this removed now. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
927da9df |
|
27-Apr-2015 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
cpu: Remove new instance of __cpuinit that crept back in We removed __cpuinit support (leaving no-op stubs) quite some time ago. However a new instance was added in commit 00df35f991914db6b8bde8cf0980 ("cpu: Defer smpboot kthread unparking until CPU known to scheduler") Since we want to clobber the stubs soon, get this removed now. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
64eaf974 |
|
15-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Handle smpboot_unpark_threads() uniformly Commit 00df35f99191 (cpu: Defer smpboot kthread unparking until CPU known to scheduler) put the online path's call to smpboot_unpark_threads() into a CPU-hotplug notifier. This commit places the offline-failure paths call into the same notifier for the sake of uniformity. Note that it is not currently possible to place the offline path's call to smpboot_park_threads() into an existing notifier because the CPU_DYING notifiers run in a restricted environment, and the CPU_UP_PREPARE notifiers run too soon. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
00df35f9 |
|
12-Apr-2015 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Defer smpboot kthread unparking until CPU known to scheduler Currently, smpboot_unpark_threads() is invoked before the incoming CPU has been added to the scheduler's runqueue structures. This might potentially cause the unparked kthread to run on the wrong CPU, since the correct CPU isn't fully set up yet. That causes a sporadic, hard to debug boot crash triggering on some systems, reported by Borislav Petkov, and bisected down to: 2a442c9c6453 ("x86: Use common outgoing-CPU-notification code") This patch places smpboot_unpark_threads() in a CPU hotplug notifier with priority set so that these kthreads are unparked just after the CPU has been added to the runqueues. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
a49b116d |
|
02-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
clockevents: Cleanup dead cpu explicitely clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the cleanup function for a dead cpu and invoke it directly from the cpu down code. Make it conditional on CPU_HOTPLUG as well. Temporary change, will be refined in the future. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased, added clockevents_notify() removal ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
52c063d1 |
|
02-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
clockevents: Make tick handover explicit clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the tick_handover call and invoke it explicitly from the hotplug code. Temporary solution will be cleaned up in later patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
345527b1 |
|
30-Mar-2015 |
Preeti U Murthy <preeti@linux.vnet.ibm.com> |
clockevents: Fix cpu_down() race for hrtimer based broadcasting It was found when doing a hotplug stress test on POWER, that the machine either hit softlockups or rcu_sched stall warnings. The issue was traced to commit: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management") which exposed the cpu_down() race with hrtimer based broadcast mode: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast") The race is the following: Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before it is taken down.
    CPU0                                 CPU1

    cpu_down()                           take_cpu_down()
                                         disable_interrupts()

    cpu_die()

    while (CPU1 != CPU_DEAD) {
      msleep(100);
       switch_to_idle();
        stop_cpu_timer();
         schedule_broadcast();
    }

    tick_cleanup_cpu_dead()
        take_over_broadcast()
So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer anymore, so CPU0 will be stuck forever. Fix this by explicitly taking over broadcast duty before cpu_die(). This is a temporary workaround. What we really want is a callback in the clockevent device which allows us to do that from the dying CPU by pushing the hrtimer onto a different cpu. That might involve an IPI and is definitely more complex than this immediate fix. Changelog was picked up from: https://lkml.org/lkml/2015/2/16/213 Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: nicolas.pitre@linaro.org Cc: peterz@infradead.org Cc: rjw@rjwysocki.net Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
528a25b0 |
|
28-Jan-2015 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Make CPU-offline idle-loop transition point more precise This commit uses a per-CPU variable to make the CPU-offline code path through the idle loop more precise, so that the outgoing CPU is guaranteed to make it into the idle loop before it is powered off. This commit is in preparation for putting the RCU offline-handling code on this code path, which will eliminate the magic one-jiffy wait that RCU uses as the maximum time for an outgoing CPU to get all the way through the scheduler. The magic one-jiffy wait for incoming CPUs remains a separate issue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
87af9e7f |
|
12-Dec-2014 |
David Hildenbrand <dahi@linux.vnet.ibm.com> |
hotplugcpu: Avoid deadlocks by waking active_writer Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited grace periods") introduced another problem that can easily be reproduced by starting/stopping cpus in a loop. E.g.:
    for i in `seq 5000`; do
        echo 1 > /sys/devices/system/cpu/cpu1/online
        echo 0 > /sys/devices/system/cpu/cpu1/online
    done
Will result in:
    INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
    Call Trace:
    ([<00000000006a028e>] __schedule+0x406/0x91c)
    [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
    [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
    [<0000000000131232>] cpu_up+0xb6/0xd4
    [<00000000004a5720>] device_online+0x80/0xc0
    [<00000000004a57f0>] online_store+0x90/0xb0
    ...
And a deadlock. Problem is that if the last ref in put_online_cpus() can't get the cpu_hotplug.lock the puts_pending count is incremented, but a sleeping active_writer might never be woken up, therefore never exiting the loop in cpu_hotplug_begin(). This fix removes puts_pending and turns refcount into an atomic variable. We also introduce a wait queue for the active_writer, to avoid possible races and use-after-free. There is no need to take the lock in put_online_cpus() anymore. Can't reproduce it with this fix. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
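The wakeup rule behind this fix can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the class `HotplugLock` and the helper `writer_finishes_after_last_put()` are hypothetical names, and the model uses a Python condition variable rather than the atomic refcount and kernel wait queue named in the commit message. It shows just the property that the last put always wakes a sleeping writer.

```python
import threading


class HotplugLock:
    """Toy model: readers hold a refcount, one writer waits for it to hit zero."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refcount = 0
        self._active_writer = False

    def get_online_cpus(self):
        with self._cond:
            # New readers wait while a hotplug operation is in progress.
            while self._active_writer:
                self._cond.wait()
            self._refcount += 1

    def put_online_cpus(self):
        with self._cond:
            self._refcount -= 1
            if self._refcount == 0:
                # The last reader always wakes a sleeping writer.
                self._cond.notify_all()

    def cpu_hotplug_begin(self):
        with self._cond:
            while self._active_writer:
                self._cond.wait()
            self._active_writer = True
            # Sleep until every reader has dropped its reference.
            while self._refcount > 0:
                self._cond.wait()

    def cpu_hotplug_done(self):
        with self._cond:
            self._active_writer = False
            self._cond.notify_all()


def writer_finishes_after_last_put(timeout=2.0):
    """Return (writer_blocked_while_reference_held, writer_finished_after_put)."""
    lock = HotplugLock()
    finished = threading.Event()

    def writer():
        lock.cpu_hotplug_begin()
        finished.set()
        lock.cpu_hotplug_done()

    lock.get_online_cpus()             # a reader holds a reference
    t = threading.Thread(target=writer)
    t.start()
    blocked = not finished.wait(0.1)   # the writer must still be waiting
    lock.put_online_cpus()             # the last put wakes the writer
    done = finished.wait(timeout)
    t.join(timeout)
    return blocked, done
```

Calling `writer_finishes_after_last_put()` runs one reader and one writer; the writer stays blocked while the reference is held and completes once it is dropped.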
|
#
62db99f4 |
|
22-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Avoid puts_pending overflow A long string of get_online_cpus() with each followed by a put_online_cpus() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbable, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
b2c4623d |
|
22-Oct-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: More on deadlock between CPU hotplug and expedited grace periods Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods) was incomplete. Although it did eliminate deadlocks involving synchronize_sched_expedited()'s acquisition of cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar deadlock involving acquisition of this same lock via put_online_cpus(). This deadlock became apparent with testing involving hibernation. This commit therefore changes put_online_cpus() acquisition of this lock to be conditional, and increments a new cpu_hotplug.puts_pending field in case of acquisition failure. Then cpu_hotplug_begin() checks for this new field being non-zero, and applies any changes to cpu_hotplug.refcount. Reported-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Tested-by: Borislav Petkov <bp@suse.de>
|
#
dd56af42 |
|
25-Aug-2014 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Eliminate deadlock between CPU hotplug and expedited grace periods Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>
|
#
b728ca06 |
|
24-Jun-2014 |
Kirill Tkhai <ktkhai@parallels.com> |
sched: Rework check_for_tasks()
1) Iterate through all threads in the system. Check all threads, not only group leaders.
2) Check p->on_rq instead of p->state and cputime. A preempted task in !TASK_RUNNING state or a just-created task may be queued, and we want those to be reported too.
3) Use read_lock() instead of write_lock(). This function does not change any structures, so read_lock() is enough.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Segall <bsegall@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Konstantin Khorenko <khorenko@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Todd E Brandt <todd.e.brandt@linux.intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
bb3632c6 |
|
06-Jun-2014 |
Todd E Brandt <todd.e.brandt@linux.intel.com> |
PM / sleep: trace events for suspend/resume Adds trace events that give finer resolution into suspend/resume. These events are graphed in the timelines generated by the analyze_suspend.py script. They represent large areas of time consumed that are typical to suspend and resume. The event is triggered by calling the function "trace_suspend_resume" with three arguments: a string (the name of the event to be displayed in the timeline), an integer (case specific number, such as the power state or cpu number), and a boolean (where true is used to denote the start of the timeline event, and false to denote the end). The suspend_resume trace event reproduces the data that the machine_suspend trace event did, so the latter has been removed. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
84117da5 |
|
04-Jun-2014 |
Fabian Frederick <fabf@skynet.be> |
kernel/cpu.c: convert printk to pr_foo()
- no level printk converted to pr_warn (if err)
- no level printk converted to pr_info (disabling non-boot cpus)
- Other printk converted to respective level.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6acbfb96 |
|
15-May-2014 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
sched: Fix hotplug vs. set_cpus_allowed_ptr() Lai found that: WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b() ... migration_cpu_stop+0x1d/0x22 was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is always a sub-set of cpu_online_mask. This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness"). So set active and online at the same time to avoid this particular problem. Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness") Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
93ae4f97 |
|
10-Mar-2014 |
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> |
CPU hotplug: Provide lockless versions of callback registration functions The following method of CPU hotplug callback registration is not safe due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock and the cpu_hotplug.lock.
    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();
The deadlock is shown below:
    CPU 0                                CPU 1
    -----                                -----

    Acquire cpu_hotplug.lock
    [via get_online_cpus()]

                                         CPU online/offline operation
                                         takes cpu_add_remove_lock
                                         [via cpu_maps_update_begin()]

    Try to acquire
    cpu_add_remove_lock
    [via register_cpu_notifier()]

                                         CPU online/offline operation
                                         tries to acquire cpu_hotplug.lock
                                         [via cpu_hotplug_begin()]

                        *** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order whereas the CPU hotplug operations take the same locks in the opposite order. To avoid this issue and to provide a race-free method to register CPU hotplug callbacks (along with initialization of already online CPUs), introduce new variants of the callback registration APIs that simply register the callbacks without holding the cpu_add_remove_lock during the registration. That way, we can avoid the ABBA scenario. However, we will need to hold the cpu_add_remove_lock throughout the entire critical section, to protect updates to the callback/notifier chain. This can be achieved by writing the callback registration code as follows:
    cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* This doesn't take the cpu_add_remove_lock */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin() because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD notifiers, and hence get_online_cpus() cannot provide the necessary synchronization to protect the callback/notifier chains against concurrent reads and writes. On the other hand, since the cpu_add_remove_lock protects the entire hotplug operation (including CPU_POST_DEAD), we can use cpu_maps_update_begin/done() to guarantee proper synchronization. Also, since cpu_maps_update_begin/done() is like a super-set of get/put_online_cpus(), the former naturally protects the critical sections from concurrent hotplug operations. Since the names cpu_maps_update_begin/done() don't make much sense in CPU hotplug callback registration scenarios, we'll introduce new APIs named cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done(). In summary, introduce the lockless variants of un/register_cpu_notifier() and also export the cpu_notifier_register_begin/done() APIs for use by modules. This way, we provide a race-free way to register hotplug callbacks as well as perform initialization for the CPUs that are already online. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
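The lock-order argument in this message can be checked mechanically with a tiny order tracker, similar in spirit to what a lock validator records. The sketch below is an assumption-laden toy: `LockOrderTracker`, the three path functions, and `inversions()` are hypothetical names, the paths run one after another in a single thread, and get_online_cpus() is modeled as taking cpu_hotplug.lock for the whole section, following the commit message's description.

```python
class LockOrderTracker:
    """Record (held, acquired) lock pairs and flag the reverse order."""

    def __init__(self):
        self.edges = set()       # (held, acquired) pairs seen so far
        self.held = []           # locks held by the path being simulated
        self.inversions = []     # (held, acquired) pairs that invert an edge

    def acquire(self, name):
        for h in self.held:
            if (name, h) in self.edges:
                self.inversions.append((h, name))
            self.edges.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def hotplug_operation(t):
    t.acquire("cpu_add_remove_lock")   # cpu_maps_update_begin()
    t.acquire("cpu_hotplug.lock")      # cpu_hotplug_begin()
    t.release("cpu_hotplug.lock")
    t.release("cpu_add_remove_lock")


def unsafe_registration(t):
    t.acquire("cpu_hotplug.lock")      # get_online_cpus()
    t.acquire("cpu_add_remove_lock")   # register_cpu_notifier()
    t.release("cpu_add_remove_lock")
    t.release("cpu_hotplug.lock")      # put_online_cpus()


def safe_registration(t):
    t.acquire("cpu_add_remove_lock")   # cpu_notifier_register_begin()
    # __register_cpu_notifier() takes no lock of its own.
    t.release("cpu_add_remove_lock")   # cpu_notifier_register_done()


def inversions(*paths):
    """Simulate the given paths in order and return any order inversions."""
    tracker = LockOrderTracker()
    for path in paths:
        path(tracker)
    return tracker.inversions
```

Running `inversions(hotplug_operation, unsafe_registration)` reports the ABBA pair, while `inversions(hotplug_operation, safe_registration)` reports none, because the safe path only ever takes the outer lock.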
|
#
a19423b9 |
|
10-Mar-2014 |
Gautham R. Shenoy <ego@linux.vnet.ibm.com> |
CPU hotplug: Add lockdep annotations to get/put_online_cpus() Add lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin()/cpu_hotplug_end(). Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
106dd5af |
|
12-Nov-2013 |
Michael wang <wangyun@linux.vnet.ibm.com> |
sched: Fix endless sync_sched/rcu() loop inside _cpu_down() Commit 6acce3ef8: sched: Remove get_online_cpus() usage tries to do sync_sched/rcu() inside _cpu_down() but triggers:
    INFO: task swapper/0:1 blocked for more than 120 seconds.
    ...
    [<ffffffff811263dc>] synchronize_rcu+0x2c/0x30
    [<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340
    ...
This happens because, in the rcu boost case, we rely on an smpboot thread to finish the rcu callback, but that thread has already been parked before the sync here, which leads to the endless sync_sched/rcu(). This patch exchanges the sequence of smpboot_park_threads() and sync_sched/rcu() to fix the bug. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
01b0f197 |
|
12-Nov-2013 |
Toshi Kani <toshi.kani@hp.com> |
cpu/mem hotplug: add try_online_node() for cpu_up() cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the whole steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs the whole steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purpose. There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6acce3ef |
|
11-Oct-2013 |
Peter Zijlstra <peterz@infradead.org> |
sched: Remove get_online_cpus() usage Remove get_online_cpus() usage from the scheduler; there's 4 sites that use it:
 - sched_init_smp(); where it's completely superfluous since we're in 'early' boot and there simply cannot be any hotplugging.
 - sched_getaffinity(); we already take a raw spinlock to protect the task cpus_allowed mask, this disables preemption and therefore also stabilizes cpu_online_mask as that's modified using stop_machine. However switch to active mask for symmetry with sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask stability by inserting sync_rcu/sched() into _cpu_down.
 - sched_setaffinity(); we don't appear to need get_online_cpus() either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask, for the cpuset case we hold task_lock, which is a spinlock and thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has preemption properly disabled; also it already deals with hotplug races explicitly where it releases them.
 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for us with a little trickery. By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for cpu_active_mask. Use these to validate that both our cpus are active when queueing the stop work before we queue the stop_machine works for take_cpu_down().
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b9d10be7 |
|
12-Aug-2013 |
Toshi Kani <toshi.kani@hp.com> |
ACPI / processor: Acquire writer lock to update CPU maps CPU system maps are protected with reader/writer locks. The reader lock, get_online_cpus(), assures that the maps are not updated while holding the lock. The writer lock, cpu_hotplug_begin(), is used to update the cpu maps along with cpu_maps_update_begin(). However, the ACPI processor handler updates the cpu maps without holding the writer lock. acpi_map_lsapic() is called from acpi_processor_hotadd_init() to update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic() is called from acpi_processor_remove() to update cpu_possible_mask. Currently, they are either unprotected or protected with the reader lock, which is not correct. For example, the get_online_cpus() below is supposed to assure that cpu_possible_mask is not changed while the code is iterating with for_each_possible_cpu().
    get_online_cpus();
    for_each_possible_cpu(cpu) {
        :
    }
    put_online_cpus();
However, this lock provides no protection against CPU hotplug since the ACPI processor handler does not use the writer lock when it updates cpu_possible_mask. The reader lock does not serialize within the readers. This patch protects them with the writer lock with cpu_hotplug_begin() along with cpu_maps_update_begin(), which must be held before calling cpu_hotplug_begin(). It also protects arch_register_cpu() / arch_unregister_cpu(), which creates / deletes a sysfs cpu device interface. For this purpose it changes cpu_hotplug_begin() and cpu_hotplug_done() to global and exports them in cpu.h. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
0db0628d |
|
19-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
kernel: delete __cpuinit usage from all core kernel files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
16e53dbf |
|
12-Jun-2013 |
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> |
CPU hotplug: provide a generic helper to disable/enable CPU hotplug There are instances in the kernel where we would like to disable CPU hotplug (from sysfs) during some important operation. Today the freezer code depends on this and the code to do it was kinda tailor-made for that. Restructure the code and make it generic enough to be useful for other usecases too. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
14e568e7 |
|
30-Jan-2013 |
Thomas Gleixner <tglx@linutronix.de> |
stop_machine: Use smpboot threads Use the smpboot thread infrastructure. Mark the stopper thread selfparking and park it after it has finished the take_cpu_down() work. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
6fac4829 |
|
13-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Use accessors to read task cputime stats This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
5e5041f3 |
|
22-Oct-2012 |
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> |
ACPI / processor: prevent cpu from becoming online Even if acpi_processor_handle_eject() offlines cpu, there is a chance to online the cpu after that. So the patch closes the window by using get/put_online_cpus(). Why does the patch change _cpu_up() logic? The patch addresses the race between hot-removing a cpu and _cpu_up(). If the patch does not change it, there is the following race.
    hot-remove cpu                         |  _cpu_up()
    ------------------------------------- ------------------------------------
    call acpi_processor_handle_eject()     |
         call cpu_down()                   |
         call get_online_cpus()            |
                                           | call cpu_hotplug_begin() and stop here
         call arch_unregister_cpu()        |
         call acpi_unmap_lsapic()          |
         call put_online_cpus()            |
                                           | start and continue _cpu_up()
         return acpi_processor_remove()    |
    continue hot-remove the cpu            |
So _cpu_up() can run to completion, and the hot-remove of the cpu can also run to completion. If the patch changes _cpu_up() logic, the race disappears as below:
    hot-remove cpu                         |  _cpu_up()
    -----------------------------------------------------------------------
    call acpi_processor_handle_eject()     |
         call cpu_down()                   |
         call get_online_cpus()            |
                                           | call cpu_hotplug_begin() and stop here
         call arch_unregister_cpu()        |
         call acpi_unmap_lsapic()          |
              cpu's cpu_present is set     |
              to false by set_cpu_present()|
         call put_online_cpus()            |
                                           | start _cpu_up()
                                           | check cpu_present() and return -EINVAL
         return acpi_processor_remove()    |
    continue hot-remove the cpu            |
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
6e32d479 |
|
13-Nov-2012 |
Fenghua Yu <fenghua.yu@intel.com> |
kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback cpu_hotplug_pm_callback should have higher priority than bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug to avoid race during bsp online checking. This is to highlight the priorities between the two callbacks in case people overlook the order. Ideally the priorities should be defined in a macro/enum instead of fixed values. To do that, a separate patchset may be pushed which will touch several other generic files and is out of scope of this patchset. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
075663d1 |
|
08-Oct-2012 |
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> |
CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() The synchronization between CPU hotplug readers and writers is achieved by means of refcounting, safeguarded by the cpu_hotplug.lock. get_online_cpus() increments the refcount, whereas put_online_cpus() decrements it. If we ever hit an imbalance between the two, we end up compromising the guarantees of the hotplug synchronization i.e, for example, an extra call to put_online_cpus() can end up allowing a hotplug reader to execute concurrently with a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases where the refcount can go negative, and also attempt to fix it up, so that we can continue to run. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
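The imbalance check described in this entry can be illustrated in user space. This is only a minimal sketch, not the kernel code: `hotplug_state`, `get_online_cpus_sketch()` and `put_online_cpus_sketch()` are hypothetical names, the lock that guards the counter is omitted, and the fix-up shown (never let the count drop below zero) is an assumption about what "attempt to fix it up" amounts to.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the hotplug reader refcount; not the kernel structure. */
struct hotplug_state {
	int refcount;  /* readers currently between get and put */
	int warnings;  /* number of imbalances detected so far */
};

/* Reader entry: take a reference (locking omitted in this sketch). */
static void get_online_cpus_sketch(struct hotplug_state *s)
{
	assert(s != NULL);
	s->refcount++;
}

/* Reader exit: warn on an unmatched put and keep the count from going negative. */
static void put_online_cpus_sketch(struct hotplug_state *s)
{
	assert(s != NULL);
	if (s->refcount <= 0) {
		fprintf(stderr, "unbalanced put (refcount=%d)\n", s->refcount);
		s->warnings++;
		s->refcount = 0;  /* assumed fix-up so execution can continue */
		return;
	}
	s->refcount--;
}
```

A balanced get/put pair leaves `warnings` at zero; one extra put increments it while the count stays at zero, so a writer waiting for the count to reach zero is not misled by a negative value.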
|
#
816afe4f |
|
06-Aug-2012 |
Rusty Russell <rusty@rustcorp.com.au> |
x86/smp: Don't ever patch back to UP if we unplug cpus We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
f97f8f06 |
|
16-Jul-2012 |
Thomas Gleixner <tglx@linutronix.de> |
smpboot: Provide infrastructure for percpu hotplug threads Provide a generic interface for setting up and tearing down percpu threads. On registration the threads for already online cpus are created and started. On deregistration (modules) the threads are stopped. During hotplug operations the threads are created, started, parked and unparked. The datastructure for registration provides a pointer to percpu storage space and optional setup, cleanup, park, unpark functions. These functions are called when the thread state changes. Each implementation has to provide a function which is queried and returns whether the thread should run and the thread function itself. The core code handles all state transitions and avoids duplicated code in the call sites. [ paulmck: Preemption leak fix ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
9adb62a5 |
|
31-Jul-2012 |
Jiang Liu <jiang.liu@huawei.com> |
mm/hotplug: correctly setup fallback zonelists when creating new pgdat When hotadd_new_pgdat() is called to create new pgdat for a new node, a fallback zonelist should be created for the new node. There's code to try to achieve that in hotadd_new_pgdat() as below:

	/*
	 * The node we allocated has no zone fallback lists. For avoiding
	 * to access not-initialized zonelist, build here.
	 */
	mutex_lock(&zonelists_mutex);
	build_all_zonelists(pgdat, NULL);
	mutex_unlock(&zonelists_mutex);

But it doesn't work as expected. When hotadd_new_pgdat() is called, the new node is still in offline state because node_set_online(nid) hasn't been called yet. And build_all_zonelists() only builds zonelists for online nodes as:

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		build_zonelists(pgdat);
		build_zonelist_cache(pgdat);
	}

So although we intend to create a zonelist for the new pgdat, none is built. Add a new parameter "pgdat" to build_all_zonelists() so that zonelists are built for the new pgdat too. Signed-off-by: Jiang Liu <liuj97@gmail.com> Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Keping Chen <chenkeping@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
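The problem and the fix in this entry can be modelled with a small user-space sketch. Everything here is hypothetical and simplified: `node_sketch` and `build_all_zonelists_sketch()` are illustrative names, a boolean stands in for the real zonelist data, and no locking is shown. The point is only that a loop over online nodes skips a node that has not been marked online yet, unless that node is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a NUMA node; not the kernel's pg_data_t. */
struct node_sketch {
	bool online;          /* set once the node has been marked online */
	bool zonelist_built;  /* stands in for an initialised fallback zonelist */
};

/*
 * Build zonelists for every online node and, if new_node is non-NULL,
 * also for that node even though it is still offline.
 */
static void build_all_zonelists_sketch(struct node_sketch *nodes, int nr_nodes,
				       struct node_sketch *new_node)
{
	assert(nodes != NULL && nr_nodes >= 0);

	if (new_node)
		new_node->zonelist_built = true;

	for (int nid = 0; nid < nr_nodes; nid++) {
		if (nodes[nid].online)
			nodes[nid].zonelist_built = true;
	}
}
```

Calling the function with `new_node == NULL` reproduces the behaviour the entry complains about: the freshly allocated, still offline node is left without a zonelist.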
|
#
e4cc2f87 |
|
31-May-2012 |
Anton Vorontsov <anton.vorontsov@linaro.org> |
kernel/cpu.c: document clear_tasks_mm_cpumask() Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check: the function is only suitable for offlined CPUs, and if called inappropriately, the kernel should scream aloud. [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols] Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cb79295e |
|
31-May-2012 |
Anton Vorontsov <anton.vorontsov@linaro.org> |
cpu: introduce clear_tasks_mm_cpumask() helper Many architectures clear tasks' mm_cpumask like this:

	read_lock(&tasklist_lock);
	for_each_process(p) {
		if (p->mm)
			cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
	}
	read_unlock(&tasklist_lock);

Depending on the context, the code above may have several problems, such as:
1. Working with task->mm w/o getting mm or grabbing the task lock is dangerous as ->mm might disappear (exit_mm() assigns NULL under task_lock(), so tasklist lock is not enough).
2. Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm.

This patch implements a small helper function that does things correctly, i.e.:
1. We take the task's lock while we handle its mm (we can't use get_task_mm()/mmput() pair as mmput() might sleep);
2. To catch exited main thread case, we use find_lock_task_mm(), which walks up all threads and returns an appropriate task (with task lock held).

Also, per Peter Zijlstra's idea, now we don't grab tasklist_lock in the new helper, instead we take the rcu read lock. We can do this because the function is called after the cpu is taken down and marked offline, so no new tasks will get this cpu set in their mm mask. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3bb5d2ee |
|
20-Apr-2012 |
Suresh Siddha <suresh.b.siddha@intel.com> |
smp, idle: Allocate idle thread for each possible cpu during boot percpu areas are already allocated during boot for each possible cpu. percpu idle threads can be considered as an extension of the percpu areas, and allocate them for each possible cpu during boot. This will eliminate the need for workqueue based idle thread allocation. In future we can move the idle thread area into the percpu area too. [ tglx: Moved the loop into smpboot.c and added an error check when the init code failed to allocate an idle thread for a cpu which should be onlined ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: venki@google.com Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
29d5e047 |
|
20-Apr-2012 |
Thomas Gleixner <tglx@linutronix.de> |
smp: Provide generic idle thread allocation All SMP architectures have magic to fork the idle task and to store it for reusage when cpu hotplug is enabled. Provide a generic infrastructure for it. Create/reinit the idle thread for the cpu which is brought up in the generic code and hand the thread pointer to the architecture code via __cpu_up(). Note, that fork_idle() is called via a workqueue, because this guarantees that the idle thread does not get a reference to a user space VM. This can happen when the boot process did not bring up all possible cpus and a later cpu_up() is initiated via the sysfs interface. In that case fork_idle() would be called in the context of the user space task and take a reference on the user space VM. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Cc: x86@kernel.org Acked-by: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de
|
#
38498a67 |
|
20-Apr-2012 |
Thomas Gleixner <tglx@linutronix.de> |
smp: Add generic smpboot facility Start a new file, which will hold SMP and CPU hotplug related generic infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124557.035417523@linutronix.de
|
#
8239c25f |
|
20-Apr-2012 |
Thomas Gleixner <tglx@linutronix.de> |
smp: Add task_struct argument to __cpu_up() Preparatory patch to make the idle thread allocation for secondary cpus generic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
|
#
64861634 |
|
15-Dec-2011 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[S390] cputime: add sparse checking and cleanup Make cputime_t and cputime64_t nocast to enable sparse checking to detect incorrect use of cputime. Drop the cputime macros for simple scalar operations. The conversion macros are still needed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
a513f6ba |
|
11-Dec-2011 |
Paul E. McKenney <paulmck@kernel.org> |
cpu: Export cpu_up() Building rcutorture as a module requires cpu_up() as well as cpu_down() exported, so apply EXPORT_SYMBOL_GPL(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
d7268a31 |
|
15-Nov-2011 |
Fenghua Yu <fenghua.yu@intel.com> |
CPU: Add right qualifiers for alloc_frozen_cpus() and cpu_hotplug_pm_sync_init() Add __init for functions alloc_frozen_cpus() and cpu_hotplug_pm_sync_init() because they are only called during boot time. Add static for function cpu_hotplug_pm_sync_init() because its scope is limited in this file only. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
#
79cfbdfa |
|
02-Nov-2011 |
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> |
PM / Sleep: Fix race between CPU hotplug and freezer The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down() functions depend on the value of the 'tasks_frozen' argument passed to them (which indicates whether tasks have been frozen or not). (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN, CPU_DEAD, CPU_DEAD_FROZEN). Thus, it is essential that while the callbacks for those notifications are running, the state of the system with respect to the tasks being frozen or not remains unchanged, *throughout that duration*. Hence there is a need for synchronizing the CPU hotplug code with the freezer subsystem. Since the freezer is involved only in the Suspend/Hibernate call paths, this patch hooks the CPU hotplug code to the suspend/hibernate notifiers PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent the race between CPU hotplug and freezer, thus ensuring that CPU hotplug notifications will always be run with the state of the system really being what the notifications indicate, _throughout_ their execution time. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
#
9984de1a |
|
23-May-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
kernel: Map most files to use export.h instead of module.h The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
4d51985e |
|
22-Mar-2011 |
Michael Rodriguez <dkingston02@gmail.com> |
kernel/cpu.c: fix many errors related to style. Change the printk() calls to have the KERN_INFO/KERN_ERROR stuff, and fixes other coding style errors. Not _all_ of them are gone, though. [akpm@linux-foundation.org: revert the bits I disagree with] Signed-off-by: Michael Rodriguez <dkingston02@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3fb82d56 |
|
23-Nov-2010 |
Suresh Siddha <suresh.b.siddha@intel.com> |
x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume During suspend, we disable all the non boot cpus. And during resume we bring them all back again. So no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15msec and the resume path by 5 msec (suspend/resume speed up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
51a96c77 |
|
19-Nov-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
cpu: Remove incorrect BUG_ON Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
2e01f474 |
|
18-Nov-2010 |
Dhaval Giani <dhaval.giani@gmail.com> |
cpu: Remove unused variable GCC warns us about: kernel/cpu.c: In function ‘take_cpu_down’: kernel/cpu.c:200:15: warning: unused variable ‘cpu’ This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani <dhaval_giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290091494.1145.5.camel@gondor.retis> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
48c5ccae |
|
13-Nov-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
sched: Simplify cpu-hot-unplug task migration While discussing the need for sched_idle_next(), Oleg remarked that since try_to_wake_up() ensures sleeping tasks will end up running on a sane cpu, we can do away with migrate_live_tasks(). If we then extend the existing hack of migrating current from CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the need for the sched_idle_next() abomination disappears as well, since idle will be the only possible thread left after the migration thread stops. This greatly simplifies the hot-unplug task migration path, as can be seen from the resulting code reduction (and about half the new lines are comments). Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289851597.2109.547.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3a101d05 |
|
08-Jun-2010 |
Tejun Heo <tj@kernel.org> |
sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining Currently, when a cpu goes down, cpu_active is cleared before CPU_DOWN_PREPARE starts and cpuset configuration is updated from a default priority cpu notifier. When a cpu is coming up, it's set before CPU_ONLINE but cpuset configuration again is updated from the same cpu notifier. For cpu notifiers, this presents an inconsistent state. Threads which a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be migrated to other cpus because the cpu is no longer active. Fix it by updating cpu_active in the highest priority cpu notifier and cpuset configuration in the second highest when a cpu is coming up. Down path is updated similarly. This guarantees that all other cpu notifiers see consistent cpu_active and cpuset configuration. cpuset_track_online_cpus() notifier is converted to cpuset_update_active_cpus() which just updates the configuration and is now called from cpuset_cpu_[in]active() notifiers registered from sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus() degenerates into partition_sched_domains() making separate notifier for !CONFIG_CPUSETS unnecessary. This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug callback creates a kthread and kthread_bind()s it to the target cpu, and the thread is expected to run on that cpu. * Ingo's test discovered __cpuinit/exit markups were incorrect. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Menage <menage@google.com>
|
#
5c113fbe |
|
31-May-2010 |
Daniel J Blueman <daniel.blueman@gmail.com> |
fix cpu_chain section mismatch... In commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail()") the new helper functions access cpu_chain. As a result, it shouldn't be marked __cpuinitdata (via section mismatch warning). Alternatively, the helper functions should be forced inline, or marked __ref or __cpuinit. In the meantime, this patch silences the warning the trivial way. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e9a5f426 |
|
27-May-2010 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
CPU: Avoid using uninitialized error variable in disable_nonboot_cpus() If there's only one CPU online when disable_nonboot_cpus() is called, the error variable will not be initialized and that may lead to erroneous behavior. Fix this issue by initializing error in disable_nonboot_cpus() as appropriate. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
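The bug pattern in this entry is easy to reproduce outside the kernel. The sketch below is a hypothetical user-space model, not the kernel function: `disable_nonboot_cpus_sketch()`, `cpu_down_fn` and `always_ok()` are illustrative names. It shows why the result variable needs an initial value: when only the boot CPU is online the loop body never runs, so nothing else ever assigns it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that takes one CPU down; returns 0 or a negative error. */
typedef int (*cpu_down_fn)(int cpu);

/*
 * Take down every online CPU except first_cpu and stop at the first failure.
 * error starts at 0 so that the single-CPU case, where the loop body never
 * executes, still returns a defined value.
 */
static int disable_nonboot_cpus_sketch(const int *online, int nr_cpus,
				       int first_cpu, cpu_down_fn down)
{
	int error = 0;

	assert(online != NULL && down != NULL);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == first_cpu || !online[cpu])
			continue;
		error = down(cpu);
		if (error)
			break;
	}
	return error;
}

/* Trivial callback used for illustration: every offline request succeeds. */
static int always_ok(int cpu)
{
	(void)cpu;
	return 0;
}
```

Without the `= 0` initialiser, the single-CPU call would return whatever happened to be on the stack, which is the erroneous behavior the entry refers to.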
|
#
00b9b0af |
|
27-May-2010 |
Linus Torvalds <torvalds@linux-foundation.org> |
Avoid warning when CPU hotplug isn't enabled Commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail()") also introduced this annoying warning: kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used when CONFIG_HOTPLUG_CPU wasn't set. So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and simplify it while at it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
79a6cdeb |
|
26-May-2010 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n When CONFIG_HOTPLUG_CPU=n, get_online_cpus() does nothing, so we don't need cpu_hotplug_begin() either. This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code block of CONFIG_HOTPLUG_CPU=y. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e6bde73b |
|
26-May-2010 |
Akinobu Mita <akinobu.mita@gmail.com> |
cpu-hotplug: return better errno on cpu hotplug failure Currently, when onlining or offlining a CPU fails because one of the cpu notifiers returned an error, the result is always -EINVAL (i.e. writing 0 or 1 to /sys/devices/system/cpu/cpuX/online gets EINVAL). To get better error reporting rather than always getting -EINVAL, this changes cpu_notify() to return a -errno value with notifier_to_errno() and fixes the callers, so that cpu notifiers can now return an encapsulated errno value. Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or NOTIFY_DONE, so for now cpu_notify() can return 0 or -EPERM with this change (notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0, notifier_to_errno(NOTIFY_BAD) == -EPERM). Forthcoming patches convert several cpu notifiers to return an encapsulated errno value with notifier_from_errno(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
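The mapping quoted in this entry can be checked with a user-space sketch of the encoding. The constant values and the two helpers below are written from memory of the kernel's notifier header and should be treated as assumptions, which is why the helpers carry a `_sketch` suffix; only the three results stated in the commit message are taken from the source.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, mirroring the kernel's notifier return codes. */
#define NOTIFY_DONE       0x0000
#define NOTIFY_OK         0x0001
#define NOTIFY_STOP_MASK  0x8000
#define NOTIFY_BAD        (NOTIFY_STOP_MASK | 0x0002)

/* Encode a negative errno (or 0) as a notifier return value. */
static int notifier_from_errno_sketch(int err)
{
	assert(err <= 0);  /* errno values are passed as negatives */
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno_sketch(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

Under these assumptions NOTIFY_BAD decodes to -EPERM because its low bits are 2 and `NOTIFY_OK - 2` is -1, and any errno encoded with the first helper is returned unchanged by the second.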
|
#
e9fb7631 |
|
26-May-2010 |
Akinobu Mita <akinobu.mita@gmail.com> |
cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail() No functional change. These are just wrappers of raw_cpu_notifier_call_chain. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4eaf3f64 |
|
24-May-2010 |
Haicheng Li <haicheng.li@linux.intel.com> |
mem-hotplug: fix potential race while building zonelist for new populated zone Add global mutex zonelists_mutex to fix the possible race: CPU0 CPU1 CPU2 (1) zone->present_pages += online_pages; (2) build_all_zonelists(); (3) alloc_page(); (4) free_page(); (5) build_all_zonelists(); (6) __build_all_zonelists(); (7) zone->pageset = alloc_percpu(); In step (3,4), zone->pageset still points to boot_pageset, so bad things may happen if 2+ nodes are in this state. Even if only 1 node is accessing the boot_pageset, (3) may still consume too much memory to fail the memory allocations in step (7). Besides, atomic operation ensures alloc_percpu() in step (7) will never fail since there is a new fresh memory block added in step(6). [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1f522509 |
|
24-May-2010 |
Haicheng Li <haicheng.li@linux.intel.com> |
mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset For each new populated zone of hotadded node, need to update its pagesets with dynamically allocated per_cpu_pageset struct for all possible CPUs:
1) Detach zone->pageset from the shared boot_pageset at end of __build_all_zonelists().
2) Use mutex to protect zone->pageset when it's still shared in onlined_pages()
Otherwise, multiple zones of different nodes would share same boot strapping boot_pageset for same CPU, which will finally cause below kernel panic:

------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:1239!
invalid opcode: 0000 [#1] SMP
...
Call Trace:
 [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
 [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
 [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
 [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
 [<ffffffff81132751>] ra_submit+0x21/0x30
 [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
 [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
 [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
 [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
 [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
 [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
 [<ffffffff8117c151>] sys_read+0x51/0x80
 [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
RSP <ffff88000d1e78a8>
---[ end trace 4bda28328b9990db ]

[akpm@linux-foundation.org: merge fix] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cf23422b |
|
24-May-2010 |
minskey guo <chaohong_guo@linux.intel.com> |
cpu/mem hotplug: enable CPUs online before local memory online Enable users to online CPUs even if the CPUs belong to a numa node which doesn't have onlined local memory. The zonelists (pg_data_t.node_zonelists[]) of a numa node are created either in system boot/init period, or at the time of local memory online. For a numa node without onlined local memory, its zonelists are not initialized at present. As a result, any memory allocation operations executed by CPUs within this node will fail. In fact, an out-of-memory error is triggered when attempting to online CPUs before memory comes online. This patch tries to create zonelists for such numa nodes, so that memory allocation for this node can fall back to other nodes. [akpm@linux-foundation.org: remove unneeded export] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: minskey guo<chaohong.guo@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3fc1f1e2 |
|
06-May-2010 |
Tejun Heo <tj@kernel.org> |
stop_machine: reimplement using cpu_stop Reimplement stop_machine using cpu_stop. As cpu stoppers are guaranteed to be available for all online cpus, stop_machine_create/destroy() are no longer necessary and removed. With resource management and synchronization handled by cpu_stop, the new implementation is much simpler. Asking the cpu_stop to execute the stop_cpu() state machine on all online cpus with cpu hotplug disabled is enough. stop_machine itself doesn't need to manage any global resources anymore, so all per-instance information is rolled into struct stop_machine_data and the mutex and all static data variables are removed. The previous implementation created and destroyed RT workqueues as necessary which made stop_machine() calls highly expensive on very large machines. According to Dimitri Sivanich, preventing the dynamic creation/destruction makes booting more than twice as fast on very large machines. cpu_stop resources are preallocated for all online cpus and should have the same effect. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>
|
#
6a1bdc1b |
|
15-Mar-2010 |
Oleg Nesterov <oleg@redhat.com> |
sched: _cpu_down(): Don't play with current->cpus_allowed _cpu_down() changes the current task's affinity and then recovers it at the end. The problems are well known: we can't restore old_allowed if it was bound to the now-dead-cpu, and we can race with the userspace which can change cpu-affinity during unplug. _cpu_down() should not play with current->cpus_allowed at all. Instead, take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable() removes the dying cpu from cpu_online_mask. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100315091023.GA9148@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
87d5e023 |
|
05-Mar-2010 |
Chen Gong <gong.chen@linux.intel.com> |
kernel/cpu.c: delete deprecated definition in cpu_up() Additional_cpus is only supported for IA64 now. X86_64 should not be included. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9d3cfc4c |
|
25-Jan-2010 |
Frans Pop <elendil@planet.nl> |
sched: Correct printk whitespace in warning from cpu down task check

Due to an incorrect line break the output currently contains tabs. Also remove trailing space.

The actual output that logcheck sent me looked like this:

    Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040)

After this patch it becomes:

    Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040)

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <201001251456.34996.elendil@planet.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
11854247 |
|
21-Jan-2010 |
Peter Zijlstra <peterz@infradead.org> |
sched: Fix incorrect sanity check We moved to migrate on wakeup, which means that sleeping tasks could still be present on offline cpus. Amend the check to only test running tasks. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
9ee349ad |
|
16-Dec-2009 |
Xiaotian Feng <dfeng@redhat.com> |
sched: Fix set_cpu_active() in cpu_down() Sachin found cpu hotplug test failures on powerpc, which made the kernel hang on his POWER box. The problem is that we fail to re-activate a cpu when a hot-unplug fails. Fix this by moving the de-activation into _cpu_down after doing the initial checks. Remove the synchronize_sched() calls and rely on those implied by rebuilding the sched domains using the new mask. Reported-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091216170517.500272612@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
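The fix in 9ee349ad amounts to an ordering rule: clear the active bit only after the initial checks have passed, and set it again if the unplug fails afterwards. The following is a toy userspace model of that rule, not the kernel's code; `model_cpu_down()`, `try_take_down()` and the two boolean arrays are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4u

/* Toy stand-ins for the active and online masks. */
static bool cpu_active[NR_CPUS] = { true, true, true, true };
static bool cpu_online[NR_CPUS] = { true, true, true, true };

/* Hypothetical stand-in for the part of an unplug that can fail. */
static int try_take_down(unsigned int cpu, bool inject_failure)
{
    if (inject_failure)
        return -1;
    cpu_online[cpu] = false;
    return 0;
}

static int model_cpu_down(unsigned int cpu, bool inject_failure)
{
    int err;

    /* Initial checks first: a rejected request never touches the active bit. */
    if (cpu >= NR_CPUS || !cpu_online[cpu])
        return -1;

    /* De-activate only once we are committed to trying the unplug. */
    cpu_active[cpu] = false;

    err = try_take_down(cpu, inject_failure);
    if (err)
        cpu_active[cpu] = true;   /* re-activate when the unplug fails */

    return err;
}
```

With this ordering a failed attempt leaves the cpu both online and active, which is the state the commit message says was previously lost.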
#
6ad4c188 |
|
25-Nov-2009 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
sched: Fix balance vs hotplug race Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment) we have cpu_active_mask which is supposed to rule scheduler migration and load-balancing, except it never (fully) did. The particular problem being solved here is a crash in try_to_wake_up() where select_task_rq() ends up selecting an offline cpu because select_task_rq_fair() trusts the sched_domain tree to reflect the current state of affairs; similarly, select_task_rq_rt() trusts the root_domain. However, the sched_domains are updated from CPU_DEAD, which is after the cpu is taken offline and after stop_machine is done. Therefore it can race perfectly well with code assuming the domains are right. Cure this by building the domains from cpu_active_mask on CPU_DOWN_PREPARE. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
feae3203 |
|
17-Nov-2009 |
Mike Travis <travis@sgi.com> |
timers, init: Limit the number of per cpu calibration bootup messages Limit the number of per cpu calibration messages by only printing out results for the first cpu to boot. Also, don't print "CPUx is down" as this is expected, and we don't need 4096 reminders... ;-) Signed-off-by: Mike Travis <travis@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
69575d38 |
|
01-Sep-2009 |
Shane Wang <shane.wang@intel.com> |
x86, intel_txt: clean up the impact on generic code, unbreak non-x86 Move tboot.h from asm to linux to fix the build errors of intel_txt patch on non-X86 platforms. Remove the tboot code from generic code init/main.c and kernel/cpu.c. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
d0af9eed |
|
19-Aug-2009 |
Suresh Siddha <suresh.b.siddha@intel.com> |
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init SDM Vol 3a section titled "MTRR considerations in MP systems" specifies the need for synchronizing the logical cpu's while initializing/updating MTRR. Currently Linux kernel does the synchronization of all cpu's only when a single MTRR register is programmed/updated. During an AP online (during boot/cpu-online/resume) where we initialize all the MTRR/PAT registers, we don't follow this synchronization algorithm. This can lead to scenarios where during a dynamic cpu online, that logical cpu is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical HT sibling continue to run (also with cache disabled because of cr0.cd=1 on its sibling). Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly (because of some VMX performance optimizations) and the above scenario (with one logical cpu doing VMX activity and another logical cpu coming online) can result in system crash. Fix the MTRR initialization by doing rendezvous of all the cpus. During boot and resume, we delay the MTRR/PAT init for APs till all the logical cpu's come online and the rendezvous process at the end of AP's bringup, will initialize the MTRR/PAT for all AP's. For dynamic single cpu online, we synchronize all the logical cpus and do the MTRR/PAT init on the AP that is coming online. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
86886e55 |
|
30-Jun-2009 |
Joseph Cihula <joseph.cihula@intel.com> |
x86, intel_txt: Intel TXT Sx shutdown support

Support for graceful handling of sleep states (S3/S4/S5) after an Intel(R) TXT launch. Without this patch, attempting to place the system in one of the ACPI sleep states (S3/S4/S5) will cause the TXT hardware to treat this as an attack and will cause a system reset, with memory locked. Not only may the subsequent memory scrub take some time, but the platform will be unable to enter the requested power state.

This patch calls back into the tboot so that it may properly and securely clean up system state and clear the secrets-in-memory flag, after which it will place the system into the requested sleep state using ACPI information passed by the kernel.

    arch/x86/kernel/smpboot.c     | 2 ++
    drivers/acpi/acpica/hwsleep.c | 3 +++
    kernel/cpu.c                  | 7 ++++++-
    3 files changed, 11 insertions(+), 1 deletion(-)

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
#
31950eb6 |
|
22-Jun-2009 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm/init: cpu_hotplug_init() must be initialized before SLAB SLAB uses get/put_online_cpus() which use a mutex which is itself only initialized when cpu_hotplug_init() is called. Currently we hang during boot in SLAB due to doing that too late. Reported by James Bottomley and Sachin Sant (and possibly others). Debugged by Benjamin Herrenschmidt. This just removes the dynamic initialization of the data structures, and replaces it with a static one, avoiding this dependency entirely, and removing one unnecessary special initcall. Tested-by: Sachin Sant <sachinp@in.ibm.com> Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2b17fa50 |
|
30-Mar-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: use set_cpu_active in init/main.c cpu_active_map is deprecated in favor of cpu_active_mask, which is const for safety: we use accessors now (set_cpu_active) if we really want to make a change. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
a0e280e0 |
|
07-Jan-2009 |
Heiko Carstens <hca@linux.ibm.com> |
stop_machine/cpu hotplug: fix disable_nonboot_cpus

disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the caller already created the stop_machine workqueue (like cpu_down does). Otherwise a call to stop_machine will lead to accesses to random memory regions.

When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85 "stop_machine: introduce stop_machine_create/destroy") I missed the second call site of _cpu_down. So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus as well.

Fixes suspend-to-ram/disk and also this bug:

    [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
    [ 286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
    [ 286.550598] Oops: 0002 [#1] SMP
    [ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
    [ 286.560580] EIP: is at __stop_machine+0x88/0xe3
    [ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
    [ 286.560580] Call Trace:
    [ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234
    [ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
    [ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39
    [ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c
    [ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191
    [ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1
    [ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
    [ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61
    [ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a
    [ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a
    [ 286.560580] [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
    [ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105
    [ 286.560580] [<c018ef87>] ? __fput+0x150/0x158
    [ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159
    [ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34

Reported-and-tested-by: "Justin P. Mattock" <justinmattock@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9ea09af3 |
|
21-Dec-2008 |
Heiko Carstens <hca@linux.ibm.com> |
stop_machine: introduce stop_machine_create/destroy. Introduce stop_machine_create/destroy. With this interface subsystems that need a non-failing stop_machine environment can create the stop_machine machine threads before actually calling stop_machine. When the threads aren't needed anymore they can be killed with stop_machine_destroy again. When stop_machine gets called and the threads aren't present they will be created and destroyed automatically. This restores the old behaviour of stop_machine. This patch also converts cpu hotplug to the new interface since it is special: cpu_down calls __stop_machine instead of stop_machine. However the kstop threads will only be created when stop_machine gets called. Changing the code so that the threads would be created automatically on __stop_machine is currently not possible: when __stop_machine gets called we hold cpu_add_remove_lock, which is the same lock that create_rt_workqueue would take. So the workqueue needs to be created before the cpu hotplug code locks cpu_add_remove_lock. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
e0b582ec |
|
31-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: convert kernel/cpu.c Impact: Reduce kernel stack and memory usage, use new cpumask API. Use cpumask_var_t for take_cpu_down() stack var, and frozen_cpus. Note that notify_cpu_starting() can be called before core_initcall allocates frozen_cpus, but the NULL check is optimized out by gcc for the CONFIG_CPUMASK_OFFSTACK=n case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
3fa41520 |
|
29-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: make set_cpu_*/init_cpu_* out-of-line

They're only for use in boot/cpu hotplug code anyway, and this avoids the use of deprecated cpu_*_map.

Stephen Rothwell points out that gcc 4.2.4 (on powerpc at least) didn't like the cast away of const anyway:

    include/linux/cpumask.h: In function 'set_cpu_possible':
    include/linux/cpumask.h:1052: warning: passing argument 2 of 'cpumask_set_cpu' discards qualifiers from pointer target type

So this kills two birds with one stone.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
b3199c02 |
|
29-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: switch over to cpu_online/possible/active/present_mask: core Impact: cleanup This implements the obsolescent cpu_online_map in terms of cpu_online_mask, rather than the other way around. Same for the other maps. The documentation comments are also updated to refer to _mask rather than _map. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
|
#
98a79d6a |
|
13-Dec-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: centralize cpu_online_map and cpu_possible_map

Impact: cleanup

Each SMP arch defines these themselves. Move them to a central location.

Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'. Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map' declarations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
|
#
84196414 |
|
22-Nov-2008 |
Al Viro <viro@ftp.linux.org.uk> |
cpuinit fixes in kernel/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2d3854a3 |
|
04-Nov-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
cpumask: introduce new API, without changing anything

Impact: introduce new APIs

We want to deprecate cpumasks on the stack, as we are headed for gynormous numbers of CPUs. Eventually, we want to head towards an undefined 'struct cpumask' so they can never be declared on stack.

1) New cpumask functions which take pointers instead of copies. (cpus_* -> cpumask_*)
2) Several new helpers to reduce requirements for temporary cpumasks (cpumask_first_and, cpumask_next_and, cpumask_any_and)
3) Helpers for declaring cpumasks on or offstack for large NR_CPUS (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant), not NR_CPUS for time efficiency and for smaller dynamic allocations in future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask' definition eventually.
7) work_on_cpu() helper for doing task on a CPU, rather than saving old cpumask for current thread and manipulating it.
8) smp_call_function_many() which is smp_call_function_mask() except taking a cpumask pointer.

Note that this patch simply introduces the new functions and leaves the obsolescent ones in place. This is to simplify the transition patches.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
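Points 1, 2 and 5 of 2d3854a3 can be illustrated with a small userspace sketch: masks are passed by pointer, an "and" search runs without building a temporary mask, and iteration stops at a runtime limit rather than at the compile-time maximum. This is a simplified model under assumptions, not the kernel implementation; `struct mask`, `mask_set()`, `mask_test()`, `mask_first_and()` and `nr_ids` are hypothetical names standing in for the kernel's `struct cpumask` helpers and `nr_cpu_ids`.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS        128u  /* compile-time maximum, illustrative value */
#define BITS_PER_LONG  ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define MASK_WORDS     ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct mask {
    unsigned long bits[MASK_WORDS];
};

/* Runtime number of cpu ids; plays the role nr_cpu_ids has in the message. */
static unsigned int nr_ids = 8;

static void mask_set(struct mask *m, unsigned int cpu)
{
    m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test(const struct mask *m, unsigned int cpu)
{
    return (int)((m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}

/*
 * First cpu set in both masks, or nr_ids if there is none.
 * Both masks are passed by pointer and no temporary mask is built,
 * so nothing of size NR_CPUS bits lands on the stack.
 */
static unsigned int mask_first_and(const struct mask *a, const struct mask *b)
{
    unsigned int cpu;

    for (cpu = 0; cpu < nr_ids; cpu++)
        if (mask_test(a, cpu) && mask_test(b, cpu))
            return cpu;
    return nr_ids;
}
```

The bit-by-bit loop is chosen for readability; a word-at-a-time scan would be the natural optimization.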
#
e545a614 |
|
07-Sep-2008 |
Manfred Spraul <manfred@colorfullife.com> |
kernel/cpu.c: create a CPU_STARTING cpu_chain notifier Right now, there is no notifier that is called on a new cpu, before the new cpu begins processing interrupts/softirqs. Various kernel functions would need that notification, e.g. kvm works around it by calling smp_call_function_single(), rcu polls cpu_online_map. The patch adds a CPU_STARTING notification. It also adds a helper function that sends the message to all cpu_chain handlers. Tested on x86-64. All other archs are untested. Especially on sparc, I'm not sure if I got it right. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
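The idea behind e545a614, one more event delivered to every registered handler at a specific point of bring-up, can be modelled with a plain callback chain. This is a deliberately simplified userspace sketch: the event names, `chain_register()`, `chain_notify()` and `model_cpu_up()` are hypothetical and do not reproduce the kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Model event names; these are not the kernel's constants. */
enum cpu_event { EV_UP_PREPARE, EV_STARTING, EV_ONLINE };

typedef void (*cpu_cb)(enum cpu_event ev, unsigned int cpu);

#define MAX_CBS 8u

static cpu_cb chain[MAX_CBS];
static size_t chain_len;

static int chain_register(cpu_cb cb)
{
    if (chain_len == MAX_CBS)
        return -1;
    chain[chain_len++] = cb;
    return 0;
}

/* Helper that sends one event to every handler on the chain. */
static void chain_notify(enum cpu_event ev, unsigned int cpu)
{
    size_t i;

    for (i = 0; i < chain_len; i++)
        chain[i](ev, cpu);
}

/* Example handler: records the order in which events arrive. */
static enum cpu_event seen[3];
static size_t nseen;

static void record(enum cpu_event ev, unsigned int cpu)
{
    (void)cpu;
    if (nseen < 3)
        seen[nseen++] = ev;
}

static void model_cpu_up(unsigned int cpu)
{
    chain_notify(EV_UP_PREPARE, cpu); /* before the new cpu runs */
    chain_notify(EV_STARTING, cpu);   /* on the new cpu, before it handles interrupts */
    chain_notify(EV_ONLINE, cpu);     /* once the cpu is up */
}
```

In the model everything runs on one thread; the point is only the position of the new event between the two existing ones.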
#
3ba35573 |
|
31-Aug-2008 |
Manfred Spraul <manfred@colorfullife.com> |
kernel/cpu.c: Move the CPU_DYING notifiers

When a cpu is taken offline, the CPU_DYING notifiers are called on the dying cpu. According to <linux/notifiers.h>, the cpu should be "not running any task, not handling interrupts, soon dead".

For the current implementation, this is not true:
- __cpu_disable can fail. If it fails, then the cpu will remain alive and happy.
- At least on x86, __cpu_disable() briefly enables the local interrupts to handle any outstanding interrupts.

What about moving CPU_DYING down a few lines, behind the __cpu_disable() line?

There are only two CPU_DYING handlers in the kernel right now: one in kvm, one in the scheduler. Both should work with the patch applied [and: I'm not sure if either one handles a failing __cpu_disable()]

The patch survives simple offlining a cpu. kvm untested due to lack of a test setup.

Signed-off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
3ee1062b |
|
12-Aug-2008 |
Heiko Carstens <hca@linux.ibm.com> |
cpu hotplug: s390 doesn't support additional_cpus anymore. s390 has not supported the additional_cpus kernel parameter for a long time. So we had better update the code and documentation to reflect that. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
279ef6bb |
|
29-Jul-2008 |
Dmitry Adamushko <dmitry.adamushko@gmail.com> |
sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks

Mark Langsdorf reported:

> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following commands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor. In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime. The code works during the boot
> sequence.
>
> cpumask_t = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> if (smp_processor_id() != cpu)
> return -ENODEV;

So set the CPU active before calling the CPU_ONLINE notifier chain, there are a handful of notifiers that use set_cpus_allowed().

This fix also solves the problem with x86-microcode. I've sent alternative patches for microcode, but as this "rely on set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)" assumption seems to be more broad than what we thought, perhaps this fix should be applied.

With this patch we define that by the moment CPU_ONLINE is being sent, a 'cpu' is online and ready for tasks to be migrated onto it.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e56b3bc7 |
|
28-Jul-2008 |
Linus Torvalds <torvalds@linux-foundation.org> |
cpu masks: optimize and clean up cpumask_of_cpu()

Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.

Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns creating a huge array of constant bitmasks, realize that the zero words can be shared.

In other words, on a 64-bit architecture, we only ever need 64 of these arrays - with a different bit set in one single word (with enough zero words around it so that we can create any bitmask by just offsetting in that big array). And then we just put enough zeroes around it that we can point every single cpumask to be one of those things.

So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each, with one bit set in each array - 2MB memory total), we have exactly 64 arrays instead, each 8k bits in size (64kB total).

And then we just point cpumask(n) to the right position (which we can calculate dynamically). Once we have the right arrays, getting "cpumask(n)" ends up being:

    static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
    {
            const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
            p -= cpu / BITS_PER_LONG;
            return (const cpumask_t *)p;
    }

This brings other advantages and simplifications as well:

- we are not wasting memory that is just filled with a single bit in various different places
- we don't need all those games to re-create the arrays in some dense format, because they're already going to be dense enough.

if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory is a non-issue (especially since by doing this "overlapping" trick we probably get better cache behaviour anyway).

[ mingo@elte.hu: Converted Linus's mails into a commit. See: http://lkml.org/lkml/2008/7/27/156 http://lkml.org/lkml/2008/7/28/320 Also applied a family filter - which also has the side-effect of leaving out the bits where Linus calls me an idio... Oh, never mind ;-) ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
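The offsetting trick in e56b3bc7 is easier to follow in a runnable form. The sketch below is a userspace approximation under stated assumptions: it uses one flat array instead of a two-dimensional `cpu_bit_bitmap` (so that stepping backwards stays inside a single object), fills the table at runtime in a hypothetical `cpu_bit_init()` rather than statically, and picks an arbitrary `NR_CPUS` of 256. The helpers `mask_test_cpu()` and `mask_weight()` are likewise invented for the demonstration.

```c
#include <assert.h>
#include <limits.h>

#define NR_CPUS        256u  /* illustrative value only */
#define BITS_PER_LONG  ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define MASK_WORDS     ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * BITS_PER_LONG + 1 rows of MASK_WORDS words each, stored back to back.
 * Row 0 is all zeroes; row 1 + b has bit b set in its first word and
 * zeroes everywhere else.
 */
static unsigned long cpu_bit_flat[(BITS_PER_LONG + 1) * MASK_WORDS];

static void cpu_bit_init(void)
{
    unsigned int bit;

    for (bit = 0; bit < BITS_PER_LONG; bit++)
        cpu_bit_flat[(1 + bit) * MASK_WORDS] = 1UL << bit;
}

/*
 * Start at the row whose first word has the wanted bit, then step back
 * cpu / BITS_PER_LONG words. The words stepped over are the zero tail of
 * the previous row, so the set word ends up at the right index of the
 * returned MASK_WORDS-word mask.
 */
static const unsigned long *get_cpu_mask(unsigned int cpu)
{
    const unsigned long *p =
        &cpu_bit_flat[(1 + cpu % BITS_PER_LONG) * MASK_WORDS];

    return p - cpu / BITS_PER_LONG;
}

static int mask_test_cpu(const unsigned long *mask, unsigned int cpu)
{
    return (int)((mask[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL);
}

/* Number of bits set among the first NR_CPUS bits of the mask. */
static unsigned int mask_weight(const unsigned long *mask)
{
    unsigned int cpu, n = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        n += (unsigned int)mask_test_cpu(mask, cpu);
    return n;
}
```

Every mask returned this way has exactly one bit set, although the table holds only `BITS_PER_LONG + 1` rows rather than one row per cpu.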
#
eeec4fad |
|
27-Jul-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
stop_machine(): stop_machine_run() changed to use cpu mask Instead of a "cpu" arg with magic values NR_CPUS (any cpu) and ~0 (all cpus), pass a cpumask_t. Allow NULL for the common case (where we don't care which CPU the function is run on): temporary cpumask_t's are usually considered bad for stack space. This deprecates stop_machine_run, to be removed soon when all the callers are dead. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
04321587 |
|
27-Jul-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
Hotplug CPU: don't check cpu_online after take_cpu_down Akinobu points out that if take_cpu_down() succeeds, the cpu must be offline. Remove the cpu_online() check, and put a BUG_ON(). Quoting Akinobu Mita: Actually the cpu_online() check was necessary before applying this stop_machine: simplify patch. With old __stop_machine_run(), __stop_machine_run() could succeed (return !IS_ERR(p) value) even if take_cpu_down() returned non-zero value. The return value of take_cpu_down() was obtained through kthread_stop(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: "Akinobu Mita" <akinobu.mita@gmail.com>
|
#
ffdb5976 |
|
27-Jul-2008 |
Rusty Russell <rusty@rustcorp.com.au> |
Simplify stop_machine stop_machine creates a kthread which creates kernel threads. We can create those threads directly and simplify things a little. Some care must be taken with CPU hotunplug, which has special needs, but that code seems more robust than it was in the past. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
#
5a7a201c |
|
26-Jul-2008 |
Ingo Molnar <mingo@elte.hu> |
cpumask: export cpumask_of_cpu_map fix: ERROR: "cpumask_of_cpu_map" [drivers/acpi/processor.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/microcode.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/cpu/cpufreq/speedstep-ich.ko] undefined! ERROR: "cpumask_of_cpu_map" [arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
6524d938 |
|
24-Jul-2008 |
Mike Travis <travis@sgi.com> |
cpumask: put cpumask_of_cpu_map in the initdata section * Create the cpumask_of_cpu_map statically in the init data section using NR_CPUS but replace it during boot up with one sized by nr_cpu_ids (num possible cpus). Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b8d317d1 |
|
24-Jul-2008 |
Mike Travis <travis@sgi.com> |
cpumask: make cpumask_of_cpu_map generic If an arch doesn't define cpumask_of_cpu_map, create a generic statically-initialized one for them. This allows removal of the buggy cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of out-of-scope var). An arch with NR_CPUS of 4096 probably wants to allocate this itself based on the actual number of CPUs, since otherwise they're using 2MB of rodata (1024 cpus means 128k). That's what CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the moment). In future as we support more CPUs, we'll need to resort to a get_cpu_map()/put_cpu_map() allocation scheme. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
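The sizes quoted in b8d317d1 follow from simple arithmetic: a table with one full-width mask per cpu needs nr_cpus masks of nr_cpus bits each. A minimal check of the two figures, with `dense_table_bytes()` as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes used by a table holding one mask per cpu, where each mask
 * is nr_cpus bits (nr_cpus / 8 bytes) wide.
 */
static size_t dense_table_bytes(size_t nr_cpus)
{
    return nr_cpus * (nr_cpus / 8);
}
```

For 4096 cpus this gives 4096 x 512 bytes = 2MB, and for 1024 cpus 1024 x 128 bytes = 128k, matching the message. The quadratic growth is why the message suggests large configurations allocate the table from the actual cpu count instead.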
#
3da1c84c |
|
25-Jul-2008 |
Oleg Nesterov <oleg@tv-sign.ru> |
workqueues: make get_online_cpus() useable for work->func() workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under cpu_maps_update_begin(). This means that the multithreaded workqueues can't use get_online_cpus() due to the possible deadlock, very bad and very old problem. Introduce the new state, CPU_POST_DEAD, which is called after cpu_hotplug_done() but before cpu_maps_update_done(). Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD. This means that create/destroy functions can't rely on get_online_cpus() any longer and should take cpu_add_remove_lock instead. [akpm@linux-foundation.org: fix CONFIG_SMP=n] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
39b0fad7 |
|
15-Jul-2008 |
Max Krasnyansky <maxk@qualcomm.com> |
cpu hotplug: Make cpu_active_map synchronization dependency clear

This goes on top of the cpu_active_map (take 2) patch.

Currently we depend on the stop_machine to provide necessary synchronization for the cpu_active_map updates. As Dmitry Adamushko pointed out, this is fragile and is not much clearer than the previous scheme. In other words we do not want to depend on the internal stop machine operation here. So make the synchronization rules clear by doing synchronize_sched() after clearing out cpu active bit.

Tested on quad-Core2 with:

    while true; do
        for i in 1 2 3; do
            echo 0 > /sys/devices/system/cpu/cpu$i/online
        done
        for i in 1 2 3; do
            echo 1 > /sys/devices/system/cpu/cpu$i/online
        done
    done

and

    stress -c 200

No lockdep, preempt or other complaints.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
e761b772 |
|
15-Jul-2008 |
Max Krasnyansky <maxk@qualcomm.com> |
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain management (take 2) This is based on Linus' idea of creating cpu_active_map that prevents the scheduler load balancer from migrating tasks to the cpu that is going down. It allows us to simplify domain management code and avoid unnecessary domain rebuilds during cpu hotplug event handling. Please ignore the cpusets part for now. It needs some more work in order to avoid crazy lock nesting. Although I did simplify and unify domain reinitialization logic. We now simply call partition_sched_domains() in all the cases. This means that we're using the exact same code paths as in the cpusets case and hence the tests below cover cpusets too. Cpuset changes to make rebuild_sched_domains() callable from various contexts are in the separate patch (right next after this one). This not only boots but also easily handles while true; do make clean; make -j 8; done and while true; do on-off-cpu 1; done at the same time. (on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing). Surprisingly the box (dual-core Core2) is quite usable. In fact I'm typing this on it right now in gnome-terminal and things are moving just fine. Also this is running with most of the debug features enabled (lockdep, mutex, etc) with no BUG_ONs or lockdep complaints so far. I believe I addressed all of Dmitry's comments on the original version from Linus. I changed both the fair and rt balancers to mask out non-active cpus. And replaced cpu_is_offline() with !cpu_active() in the main scheduler code where it made sense (to me). Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: dmitry.adamushko@gmail.com Cc: pj@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
b62b8ef9 |
|
29-Apr-2008 |
Zhang Rui <rui.zhang@intel.com> |
force offline the processor during hot-removal The ACPI device node for the cpu has already been unregistered when acpi_processor_handle_eject is called. Thus we should offline the cpu and continue, rather than a failure here. http://bugzilla.kernel.org/show_bug.cgi?id=9772 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
#
68f4f1ec |
|
29-May-2008 |
Max Krasnyansky <maxk@qualcomm.com> |
sched: Move cpu masks from kernel/sched.c into kernel/cpu.c kernel/cpu.c seems a more logical place for those maps since they do not really have much to do with the scheduler these days. kernel/cpu.c is now built for the UP kernel too, but it does not affect the size of the kernel sections.
$ size vmlinux
before
   text    data     bss     dec     hex filename
3313797  307060  310352 3931209  3bfc49 vmlinux
after
   text    data     bss     dec     hex filename
3313797  307060  310352 3931209  3bfc49 vmlinux
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Cc: pj@sgi.com Cc: menage@google.com Cc: rostedt@goodmis.org Cc: mingo@elte.hu Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
363ab6f1 |
|
12-May-2008 |
Mike Travis <travis@sgi.com> |
core: use performance variant for_each_cpu_mask_nr Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
af1f16d0 |
|
30-Apr-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
kernel: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d2ba7e2a |
|
29-Apr-2008 |
Oleg Nesterov <oleg@tv-sign.ru> |
simplify cpu_hotplug_begin()/put_online_cpus() cpu_hotplug_begin() must be always called under cpu_add_remove_lock, this means that only one process can be cpu_hotplug.active_writer. So we don't need the cpu_hotplug.writer_queue, we can wake up the ->active_writer directly. Also, fix the comment. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dipankar Sarma <dipankar@in.ibm.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f7b16c10 |
|
29-Apr-2008 |
Sam Ravnborg <sam@ravnborg.org> |
cpu: fix section mismatch warning in reference to register_cpu_notifier Fix following warnings: WARNING: vmlinux.o(.text+0xc60): Section mismatch in reference from the function kvm_init() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0x33869a): Section mismatch in reference from the function xfs_icsb_init_counters() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0x5556a1): Section mismatch in reference from the function acpi_processor_install_hotplug_notify() to the function .cpuinit.text:register_cpu_notifier() WARNING: vmlinux.o(.text+0xfe6b28): Section mismatch in reference from the function cpufreq_register_driver() to the function .cpuinit.text:register_cpu_notifier() register_cpu_notifier() are only really defined when HOTPLUG_CPU is enabled. So references to the function are OK. Annotate it with __ref so we do not get warnings from callers and do not get warnings for the functions/data used by register_cpu_notifier(). Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
514a20a5 |
|
29-Apr-2008 |
Sam Ravnborg <sam@ravnborg.org> |
cpu: fix section mismatch warnings in *cpu_down Fix following warnings: WARNING: vmlinux.o(.text+0x75c8d): Section mismatch in reference from the function take_cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75d2a): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75d4d): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75de4): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain WARNING: vmlinux.o(.text+0x75e33): Section mismatch in reference from the function _cpu_down() to the variable .cpuinit.data:cpu_chain cpu_down is only used from code surrounded by HOTPLUG_CPU so any references to __cpuinit are OK. Add a few __ref to teach modpost to ignore the references. This is just papering over the fact that the cpu hotplug code is fragile with respect to use of HOTPLUG_CPU and in many cases relies on __cpuinit to get rid of code when HOTPLUG_CPU is not enabled. For now this is the least invasive change. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9647155f |
|
29-Apr-2008 |
Sam Ravnborg <sam@ravnborg.org> |
cpu: fix section mismatch warning in unregister_cpu_notifier Fix following warning: WARNING: vmlinux.o(.text+0x75f4e): Section mismatch in reference from the function unregister_cpu_notifier() to the variable .cpuinit.data:cpu_chain We know that unregister_cpu_notifier is using HOTPLUG_CPU stuff - so ignore these references. Annotating unregister_cpu_notifier had been another option but this caused far more warnings since not all callers were annotated __cpuinit. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f70316da |
|
04-Apr-2008 |
Mike Travis <travis@sgi.com> |
generic: use new set_cpus_allowed_ptr function
* Use the new set_cpus_allowed_ptr() function added by the previous patch, which passes the "newly allowed cpus" cpumask_t argument by pointer instead of by value:
    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Modify CPU_MASK_ALL
Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
fa7303e2 |
|
08-Feb-2008 |
Sam Ravnborg <sam@ravnborg.org> |
cpu: fix section mismatch warnings for enable_nonboot_cpus Fix following warning: WARNING: o-x86_64/kernel/built-in.o(.text+0x36d8b): Section mismatch in reference from the function enable_nonboot_cpus() to the function .cpuinit.text:_cpu_up() enable_nonboot_cpus() is used solely with CONFIG_PM_SLEEP_SMP=y, and PM_SLEEP_SMP implies HOTPLUG_CPU, therefore the reference to _cpu_up() is valid. Annotate enable_nonboot_cpus() with __ref to silence modpost. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
95402b38 |
|
25-Jan-2008 |
Gautham R Shenoy <ego@in.ibm.com> |
cpu-hotplug: replace per-subsystem mutexes with get_online_cpus() This patch converts the known per-subsystem mutexes to get_online_cpus put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
86ef5c9a |
|
25-Jan-2008 |
Gautham R Shenoy <ego@in.ibm.com> |
cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus() Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes need to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
d221938c |
|
25-Jan-2008 |
Gautham R Shenoy <ego@in.ibm.com> |
cpu-hotplug: refcount based cpu hotplug This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
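The refcount plus waitqueue model described in this message can be illustrated with a small userspace analogue. This is only a sketch under stated assumptions: it uses POSIX threads rather than kernel primitives, every name in it (`hp`, `hp_get_online_cpus`, `hp_hotplug_begin`, and so on) is hypothetical, and it is not the kernel implementation. Readers bump a global count; a writer blocks until the count drops to zero; readers arriving during a hotplug operation block until it finishes.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the global hotplug state. */
static struct {
    pthread_mutex_t lock;      /* protects refcount and writer_active */
    pthread_cond_t  changed;   /* signalled when either field changes */
    int             refcount;  /* number of readers pinning the CPU set */
    bool            writer_active;
} hp = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .changed = PTHREAD_COND_INITIALIZER,
    .refcount = 0,
    .writer_active = false,
};

/* Reader: wait out any ongoing hotplug operation, then take a reference. */
void hp_get_online_cpus(void)
{
    pthread_mutex_lock(&hp.lock);
    while (hp.writer_active)
        pthread_cond_wait(&hp.changed, &hp.lock);
    hp.refcount++;
    pthread_mutex_unlock(&hp.lock);
}

/* Reader: drop the reference and wake a waiting writer on the last put. */
void hp_put_online_cpus(void)
{
    pthread_mutex_lock(&hp.lock);
    if (--hp.refcount == 0)
        pthread_cond_broadcast(&hp.changed);
    pthread_mutex_unlock(&hp.lock);
}

/* Writer: block until no reader holds a reference, then claim the state. */
void hp_hotplug_begin(void)
{
    pthread_mutex_lock(&hp.lock);
    while (hp.refcount > 0 || hp.writer_active)
        pthread_cond_wait(&hp.changed, &hp.lock);
    hp.writer_active = true;
    pthread_mutex_unlock(&hp.lock);
}

/* Writer: finish the operation and release any blocked readers. */
void hp_hotplug_done(void)
{
    pthread_mutex_lock(&hp.lock);
    hp.writer_active = false;
    pthread_cond_broadcast(&hp.changed);
    pthread_mutex_unlock(&hp.lock);
}

/* Accessors used only to observe the state in examples. */
int hp_refcount(void)
{
    pthread_mutex_lock(&hp.lock);
    int n = hp.refcount;
    pthread_mutex_unlock(&hp.lock);
    return n;
}

bool hp_writer_active(void)
{
    pthread_mutex_lock(&hp.lock);
    bool w = hp.writer_active;
    pthread_mutex_unlock(&hp.lock);
    return w;
}
```

The sketch makes no attempt at fairness: a steady stream of readers can delay the writer indefinitely. It also does not let the active writer take a reader reference recursively, which a real implementation may need to allow.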
|
#
73e753a5 |
|
19-Oct-2007 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified CPU hot-add should fail if the cpu is not set in cpu_possible_map. If it goes ahead, the system will panic soon. In particular, architectures that require the additional_cpus= parameter should handle this. Tested on ia64. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ba25f9dc |
|
19-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
Use helpers to obtain task pid in printks The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a0d8cdb6 |
|
18-Oct-2007 |
Akinobu Mita <akinobu.mita@gmail.com> |
cpu hotplug: cpu: deliver CPU_UP_CANCELED only to NOTIFY_OKed callbacks with CPU_UP_PREPARE The functions in a CPU notifier chain are called with the CPU_UP_PREPARE event before making the CPU online. If one of the callbacks returns NOTIFY_BAD, delivery of the CPU_UP_PREPARE event stops and the CPU online operation is canceled. Then the CPU_UP_CANCELED event is delivered to the functions in the CPU notifier chain again. This CPU_UP_CANCELED event is delivered to the functions which have been called with CPU_UP_PREPARE, and not to the functions which haven't been called with CPU_UP_PREPARE. The problem that makes existing cpu hotplug error handling complex is that the CPU_UP_CANCELED event is delivered to the function that has returned NOTIFY_BAD, too. Usually we don't expect to call a destructor function against an object that has failed to initialize. It is like:
    err = register_something();
    if (err) {
        unregister_something();
        return err;
    }
So it is natural to deliver the CPU_UP_CANCELED event only to the functions that have returned NOTIFY_OK for the CPU_UP_PREPARE event and not to call the function that has returned NOTIFY_BAD. This is what this patch is doing. Otherwise, every cpu hotplug notifier has to track whether the notifier event has failed or not for each cpu. (drivers/base/topology.c is doing this with topology_dev_map) Similarly this patch does the same thing with the CPU_DOWN_PREPARE and CPU_DOWN_FAILED events. Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
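The rollback rule in this message (cancel only the callbacks that accepted the prepare event) can be sketched in plain C. This is an illustration under assumptions, not the kernel code: the chain is a simple array, and all names (`sk_call_chain`, `sk_cpu_up`, the `SK_` constants, the demo callbacks) are hypothetical. The key step is counting how many callbacks ran during prepare and then delivering the cancel event to one fewer, which skips the callback that failed.

```c
#include <stddef.h>

enum { SK_NOTIFY_OK = 0, SK_NOTIFY_BAD = 1 };
enum sk_event { SK_UP_PREPARE, SK_UP_CANCELED };

typedef int (*sk_callback)(enum sk_event ev, void *ctx);

/*
 * Invoke up to nr_to_call callbacks in order (a negative value means "all")
 * and stop at the first one returning SK_NOTIFY_BAD. *nr_calls receives the
 * number of callbacks invoked, including the failing one.
 */
int sk_call_chain(sk_callback *chain, size_t len, enum sk_event ev,
                  void *ctx, int nr_to_call, int *nr_calls)
{
    int ret = SK_NOTIFY_OK;

    *nr_calls = 0;
    for (size_t i = 0; i < len && nr_to_call != 0; i++, nr_to_call--) {
        ret = chain[i](ev, ctx);
        (*nr_calls)++;
        if (ret == SK_NOTIFY_BAD)
            break;
    }
    return ret;
}

/* Returns 0 on success, -1 if a callback vetoed the prepare event. */
int sk_cpu_up(sk_callback *chain, size_t len, void *ctx)
{
    int nr_calls = 0;
    int ret = sk_call_chain(chain, len, SK_UP_PREPARE, ctx, -1, &nr_calls);

    if (ret == SK_NOTIFY_BAD) {
        int ignored = 0;
        /* nr_calls - 1 skips the callback that returned SK_NOTIFY_BAD. */
        sk_call_chain(chain, len, SK_UP_CANCELED, ctx, nr_calls - 1, &ignored);
        return -1;
    }
    return 0;
}

/* Demo callbacks that record which events each chain slot received. */
struct sk_log { int prepared[3]; int canceled[3]; };

static void sk_record(struct sk_log *log, int idx, enum sk_event ev)
{
    if (ev == SK_UP_PREPARE)
        log->prepared[idx]++;
    else
        log->canceled[idx]++;
}

int sk_cb_first(enum sk_event ev, void *ctx)
{
    sk_record(ctx, 0, ev);
    return SK_NOTIFY_OK;
}

int sk_cb_failing(enum sk_event ev, void *ctx)
{
    sk_record(ctx, 1, ev);
    return ev == SK_UP_PREPARE ? SK_NOTIFY_BAD : SK_NOTIFY_OK;
}

int sk_cb_not_reached(enum sk_event ev, void *ctx)
{
    sk_record(ctx, 2, ev);
    return SK_NOTIFY_OK;
}
```

With a chain of `sk_cb_first`, `sk_cb_failing`, `sk_cb_not_reached`, only the first callback sees the cancel event: the second failed its own prepare, and the third was never prepared.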
|
#
f3de4be9 |
|
31-Aug-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit 296699de6bdc717189a331ab6bbe90e05c94db06 "Introduce CONFIG_SUSPEND for suspend-to-Ram and standby" are incorrect, as they don't cover the facts that (1) not all architectures support suspend and (2) SMP hibernation is only possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db912f96 |
|
23-May-2007 |
Avi Kivity <avi@qumranet.com> |
HOTPLUG: Add CPU_DYING notifier KVM wants a notification when a cpu is about to die, so it can disable hardware extensions, but at a time when user processes cannot be scheduled on the cpu, so it doesn't try to use virtualization extensions after they have been disabled. This adds a CPU_DYING notification. The notification is called in atomic context on the doomed cpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
|
#
455c017a |
|
09-May-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
microcode: use suspend-related CPU hotplug notifications Make the microcode driver use the suspend-related CPU hotplug notifications to handle the CPU hotplug events occurring during system-wide suspend and resume transitions. Remove the global variable suspend_cpu_hotplug previously used for this purpose. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
8bb78442 |
|
09-May-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7b0834c2 |
|
09-May-2007 |
Gautham R Shenoy <ego@in.ibm.com> |
Remove kthread_bind() call from _cpu_down() We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e7407dcc |
|
09-May-2007 |
Heiko Carstens <hca@linux.ibm.com> |
call cpu_chain with CPU_DOWN_FAILED if CPU_DOWN_PREPARE failed This makes cpu hotplug symmetrical: if CPU_UP_PREPARE fails we get CPU_UP_CANCELED, so we can undo whatever happened on PREPARE. The same should happen for CPU_DOWN_PREPARE. [akpm@linux-foundation.org: fix for reduce-size-of-task_struct-on-64-bit-machines] Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
baaca49f |
|
09-May-2007 |
Gautham R Shenoy <ego@in.ibm.com> |
Define and use new events, CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE This is an attempt to provide an alternate mechanism for postponing a hotplug event instead of using a global mechanism like lock_cpu_hotplug. The proposal is to add two new events, namely CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. The notifications for these two events would be sent out before and after a cpu_hotplug event respectively. During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is supposed to acquire any per-subsystem hotcpu mutex (e.g. workqueue_mutex in kernel/workqueue.c). During the CPU_LOCK_RELEASE event the cpu-hotplug-aware subsystem is supposed to release the per-subsystem hotcpu mutex. The reasons for defining new events as opposed to reusing the existing events like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of per-subsystem hotcpu mutexes are as follows:
- CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE etc, thus ensuring a clean handling of these events.
- CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after all subsystems have handled post-hotplug events like CPU_DOWN_FAILED, CPU_DEAD, CPU_ONLINE etc, thereby ensuring that there are no subsequent clashes amongst the interdependent subsystems after a cpu hotplugs.
This patch also uses __raw_notifier_call_chain in _cpu_up to take care of the dependency between the two consecutive calls to raw_notifier_call_chain. [akpm@linux-foundation.org: fix a bug] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1d64b9cb |
|
02-Apr-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
[PATCH] Fix microcode-related suspend problem Fix the regression resulting from the recent change of suspend code ordering that causes systems based on Intel x86 CPUs using the microcode driver to hang during the resume. The problem occurs since the microcode driver uses request_firmware() in its CPU hotplug notifier, which is called after tasks have been frozen and hangs. It can be fixed by telling the microcode driver to use the microcode stored in memory during the resume instead of trying to load it from disk. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Adrian Bunk <bunk@stusta.de> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Pavel Machek <pavel@ucw.cz> Cc: Maxim <maximlevitsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ed746e3b |
|
10-Feb-2007 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
[PATCH] swsusp: Change code ordering in disk.c Change the ordering of code in kernel/power/disk.c so that device_suspend() is called before disable_nonboot_cpus() and platform_finish() is called after enable_nonboot_cpus() and before device_resume(), as indicated by the recent discussion on Linux-PM (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). The changes here only affect the built-in swsusp. [alexey.y.starikovskiy@linux.intel.com: fix LED blinking during image load] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Greg KH <greg@kroah.com> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Patrick Mochel <mochel@digitalimplant.org> Cc: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b282b6f8 |
|
11-Jan-2007 |
Gautham R Shenoy <ego@in.ibm.com> |
[PATCH] Change cpu_up and co from __devinit to __cpuinit Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n with CONFIG_RELOCATABLE = y generates the following modpost warnings WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up' WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up' This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are defined as __devinit AND __cpu_up calls some __cpuinit functions. Since __cpuinit would map to __init with this kind of a configuration, we get a ".text referring to .init.data" warning. This patch solves the problem by converting all of __cpu_up, _cpu_up and cpu_up from __devinit to __cpuinit. The approach is justified since the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or are of __init type. Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would land up in .init section. Tested on a i386 SMP machine running linux-2.6.20-rc3-mm1. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e1d9fd2e |
|
23-Dec-2006 |
Ingo Molnar <mingo@elte.hu> |
[PATCH] suspend: fix suspend on single-CPU systems Clark Williams reported that suspend doesn't work on his laptop on 2.6.20-rc1-rt kernels. The bug was introduced by the following cleanup commit: commit 112cecb2cc0e7341db92281ba04b26c41bb8146d Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Wed Dec 6 20:34:31 2006 -0800 [PATCH] suspend: don't change cpus_allowed for task initiating the suspend because with this change 'error' is not initialized to 0 anymore if there are no other online CPUs (i.e. if the system is single-CPU). The fix is to initialize it to 0. The really weird thing is that my version of gcc does not warn about this non-initialized variable situation ... (also fix the kernel printk in the error branch, it was missing a newline) Reported-by: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
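The class of bug described here is easy to reproduce in miniature. The following is a hedged sketch, not the code from kernel/cpu.c: the function names and the `online` array are hypothetical. It shows a loop over the non-boot CPUs whose body never runs on a single-CPU system, so the return value is whatever `error` was initialized to.

```c
#include <stdbool.h>

/* Hypothetical stand-in for taking one CPU offline; always succeeds here. */
int sk_take_cpu_down(int cpu)
{
    (void)cpu;
    return 0;
}

/*
 * Take every online CPU except boot_cpu offline.
 * Returns 0 on success or the first error encountered.
 */
int sk_disable_nonboot_cpus(const bool *online, int nr_cpus, int boot_cpu)
{
    /*
     * The fix: start from 0. Without the initializer, a single-CPU system
     * skips the loop body entirely and returns an indeterminate value.
     */
    int error = 0;

    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu == boot_cpu || !online[cpu])
            continue;
        error = sk_take_cpu_down(cpu);
        if (error)
            break;
    }
    return error;
}
```

Compilers do not reliably diagnose this pattern, which matches the observation in the message that gcc stayed silent.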
|
#
112cecb2 |
|
06-Dec-2006 |
Siddha, Suresh B <suresh.b.siddha@intel.com> |
[PATCH] suspend: don't change cpus_allowed for task initiating the suspend Don't modify the cpus_allowed of the task initiating the suspend. _cpu_down() already makes sure that the task doing the suspend doesn't run on dying cpu. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@suspend2.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4b96b1a1 |
|
06-Nov-2006 |
Gautham R Shenoy <ego@in.ibm.com> |
[PATCH] Fix the spurious unlock_cpu_hotplug false warnings Cpu-hotplug locking has a minor race caused by setting the variable "recursive" to NULL *after* releasing the cpu_bitmask_lock in the function unlock_cpu_hotplug, instead of doing so before releasing the cpu_bitmask_lock. This was the cause of most of the recent spurious lock_cpu_unlock warnings. This should fix the problem reported by Martin Lorenz in http://lkml.org/lkml/2006/10/29/127. Thanks to Srinivasa DS for pointing it out. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8fa1d7d3 |
|
28-Oct-2006 |
Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> |
[PATCH] cpu-hotplug: release `workqueue_mutex' properly on CPU hot-remove _cpu_down() acquires `workqueue_mutex' during its processing, but doesn't release it if __cpu_disable() fails. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
bd5349cf |
|
17-Oct-2006 |
Neil Brown <neilb@suse.de> |
[PATCH] Convert cpu hotplug notifiers to use raw_notifier instead of blocking_notifier The use of blocking notifier by _cpu_up and _cpu_down in cpu.c has two problems. 1/ An interaction with the workqueue notifier causes lockdep to spit a warning. 2/ A notifier could conceivably be added or removed while _cpu_up or _cpu_down are in progress. As each notifier is called twice (prepare then commit/abort) this could be unhealthy. To fix this we simply take cpu_add_remove_lock while adding or removing notifiers to/from the list. This makes the 'blocking' usage unnecessary as all accesses to cpu_chain are now protected by cpu_add_remove_lock. So change "blocking" to "raw" in all relevant places. This fixes 1. Credit: Andrew Morton Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> (reporter) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e3920fb4 |
|
26-Sep-2006 |
Rafael J. Wysocki <rjw@rjwysocki.net> |
[PATCH] Disable CPU hotplug during suspend The current suspend code has to be run on one CPU, so we use the CPU hotplug to take the non-boot CPUs offline on SMP machines. However, we should also make sure that these CPUs will not be enabled by someone else after we have disabled them. The functions disable_nonboot_cpus() and enable_nonboot_cpus() are moved to kernel/cpu.c, because they now refer to some stuff in there that should better be static. Also it's better if disable_nonboot_cpus() returns an error instead of panicking if something goes wrong, and enable_nonboot_cpus() has no reason to panic(), because the CPUs may have been enabled by the userland before it tries to take them online. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
aa953877 |
|
23-Jul-2006 |
Linus Torvalds <torvalds@macmini.osdl.org> |
cpu hotplug: simplify and hopefully fix locking The CPU hotplug locking was quite messy, with a recursive lock to handle the fact that both the actual up/down sequence wanted to protect itself from being re-entered, but the callbacks that it called also tended to want to protect themselves from CPU events. This splits the lock into two (one to serialize the whole hotplug sequence, the other to protect against the CPU present bitmaps changing). The latter still allows recursive usage because some subsystems (ondemand policy for cpufreq at least) had already gotten too used to the lax locking, but the locking mistakes are hopefully now less fundamental, and we now warn about recursive lock usage when we see it, in the hope that it can be fixed. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
65edc68c |
|
27-Jun-2006 |
Chandra Seetharaman <sekharan@us.ibm.com> |
[PATCH] cpu hotplug: make [un]register_cpu_notifier init time only CPUs come online only at init time (unless CONFIG_HOTPLUG_CPU is defined). So, cpu_notifier functionality needs to be available only at init time. This patch makes register_cpu_notifier() available only at init time, unless CONFIG_HOTPLUG_CPU is defined. This patch exports register_cpu_notifier() and unregister_cpu_notifier() only if CONFIG_HOTPLUG_CPU is defined. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
81615b62 |
|
26-Jun-2006 |
Ingo Molnar <mingo@elte.hu> |
[PATCH] Convert kernel/cpu.c to mutexes Convert kernel/cpu.c from semaphore to mutex. I've reviewed all lock_cpu_hotplug() critical sections, and they all seem to fit mutex semantics. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e041c683 |
|
27-Mar-2006 |
Alan Stern <stern@rowland.harvard.edu> |
[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. 
(This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent than calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c: idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c: cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c: cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c: sleep_notifier_list
drivers/macintosh/via-pmu68k.c: sleep_notifier_list
drivers/macintosh/windfarm_core.c: wf_client_list
drivers/usb/core/notify.c: usb_notifier_list
drivers/video/fbmem.c: fb_notifier_list
kernel/cpu.c: cpu_chain
kernel/module.c: module_notify_list
kernel/profile.c: munmap_notifier
kernel/profile.c: task_exit_notifier
kernel/sys.c: reboot_notifier_list
net/core/dev.c: netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6978c705 |
|
24-Mar-2006 |
Eric Sesterhenn <snakebyte@gmx.de> |
BUG_ON() Conversion in kernel/cpu.c This changes if() BUG(); constructs to BUG_ON(), which is cleaner, contains unlikely(), and can be better optimized away. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
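
The shape of this conversion can be sketched in userspace C. The macros below are stand-ins, not the kernel's definitions: `BUG()` here prints and calls `abort()`, `unlikely()` relies on the GCC/Clang `__builtin_expect` builtin, and the two `checked_index_*` functions are hypothetical examples.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins; the real kernel BUG() traps rather than calling abort(). */
#define unlikely(x) __builtin_expect(!!(x), 0)
#define BUG() \
    do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)
#define BUG_ON(cond) \
    do { \
        if (unlikely(cond)) \
            BUG(); \
    } while (0)

/* Before: open-coded check with no branch hint. */
int checked_index_old(int cpu, int nr_cpus)
{
    if (cpu < 0 || cpu >= nr_cpus)
        BUG();
    return cpu;
}

/* After: one line, and the failure branch is marked unlikely. */
int checked_index_new(int cpu, int nr_cpus)
{
    BUG_ON(cpu < 0 || cpu >= nr_cpus);
    return cpu;
}
```

Both functions behave identically for valid input; the second form simply folds the test and the hint into a single macro invocation.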
|
#
a9d9baa1 |
|
28-Nov-2005 |
Ashok Raj <ashok.raj@intel.com> |
[PATCH] clean up lock_cpu_hotplug() in cpufreq In some call paths of the cpufreq hotplug notify code, the lowest-level function calls lock_cpu_hotplug(). The lock is already held during cpu_up() and cpu_down() calls when the notify calls are broadcast to registered clients. Ideally, we would disable_preempt() at the highest caller and make sure we don't sleep on the way down to the cpufreq->driver_target() calls, but the calls are too intertwined and cumbersome to clean up. Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in all places.
- Removed export of cpucontrol semaphore and made it static.
- Replaced explicit uses of up/down with lock_cpu_hotplug() so we can keep track of the callers in the same thread context and just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses.
- Removed PF_HOTPLUG_CPU in sched.h introduced for the current_in_hotplug() temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline to make sure we don't have any hang situations. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
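
The refcounting idea can be modelled in userspace C roughly as follows. This is a sketch, not the kernel code: the `*_model` names are hypothetical, a pthread mutex stands in for the cpucontrol semaphore, and a thread-local depth counter is an assumption chosen here so that only the thread holding the lock ever sees a non-zero depth.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the cpucontrol semaphore. */
static pthread_mutex_t cpucontrol_model = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread nesting depth: non-zero means this thread already holds the lock. */
static _Thread_local int hotplug_depth_count;

/* Take the lock only on the outermost call; nested calls just bump the count. */
void lock_cpu_hotplug_model(void)
{
    if (hotplug_depth_count == 0)
        pthread_mutex_lock(&cpucontrol_model);
    hotplug_depth_count++;
}

/* Drop the lock only when the outermost caller unlocks. */
void unlock_cpu_hotplug_model(void)
{
    assert(hotplug_depth_count > 0);
    if (--hotplug_depth_count == 0)
        pthread_mutex_unlock(&cpucontrol_model);
}

/* Current nesting depth for the calling thread. */
int hotplug_depth_model(void)
{
    return hotplug_depth_count;
}
```

A notifier callback that calls `lock_cpu_hotplug_model()` while its caller already holds the lock only increments the counter, so it never blocks on a lock its own thread owns.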
|
#
90d45d17 |
|
08-Nov-2005 |
Ashok Raj <ashok.raj@intel.com> |
[PATCH] cpu hotplug: fix locking in cpufreq drivers When calling target drivers to set frequency, we take the cpucontrol lock. When we modified the code to accommodate CPU hotplug, there was an attempt to take a double lock of cpucontrol, leading to a deadlock. Since the current thread context is already holding the cpucontrol lock, we don't need to make another attempt to acquire it. Now we leave a trace in current->flags indicating that the current thread already holds the cpucontrol lock, so we don't attempt to take it a second time. Thanks to Andrew Morton for the beating:-) From: Brice Goglin <Brice.Goglin@ens-lyon.org> Build fix (akpm: this patch is still unpleasant. Ashok continues to look for a cleaner solution, doesn't he? ;)) Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
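
The "trace in current->flags" approach can be modelled in userspace C along these lines. All names are hypothetical, the flag value is made up, a thread-local variable stands in for current->flags, and a pthread mutex stands in for cpucontrol; this is a sketch of the idea rather than the patch itself.

```c
#include <assert.h>
#include <pthread.h>

#define PF_HOTPLUG_CPU_MODEL 0x1u       /* hypothetical flag bit */

static pthread_mutex_t cpucontrol_model = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local unsigned int task_flags_model;  /* stands in for current->flags */
static int current_khz_model;           /* state protected by cpucontrol_model */

/* True when this thread is already inside a hotplug operation. */
int current_in_hotplug_model(void)
{
    return (task_flags_model & PF_HOTPLUG_CPU_MODEL) != 0;
}

/* Hotplug path: take the lock and leave a trace in the per-thread flags. */
void hotplug_begin_model(void)
{
    pthread_mutex_lock(&cpucontrol_model);
    task_flags_model |= PF_HOTPLUG_CPU_MODEL;
}

void hotplug_end_model(void)
{
    task_flags_model &= ~PF_HOTPLUG_CPU_MODEL;
    pthread_mutex_unlock(&cpucontrol_model);
}

/*
 * Frequency-setting path: skip the lock when the flag says this thread
 * already holds it. Returns 1 if the lock was taken here, 0 otherwise.
 */
int set_target_khz_model(int khz)
{
    int need_lock = !current_in_hotplug_model();

    if (need_lock)
        pthread_mutex_lock(&cpucontrol_model);
    current_khz_model = khz;
    if (need_lock)
        pthread_mutex_unlock(&cpucontrol_model);
    return need_lock;
}
```

Without the flag check, calling `set_target_khz_model()` between `hotplug_begin_model()` and `hotplug_end_model()` would try to lock a non-recursive mutex twice from the same thread, which is the deadlock the commit message describes.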
|
#
c32b6b8e |
|
30-Oct-2005 |
Ashok Raj <ashok.raj@intel.com> |
[PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers cpufreq entries in sysfs should only be populated when the CPU is online. When we boot with maxcpus=x and then bring up the other cpus by echoing to the sysfs online file, these entries should be created, and they should be destroyed when CPU_DEAD is notified. This is the same treatment as the cache entries under sysfs. We place the processor in the lowest frequency, so hw managed P-State transitions can still work on the other threads to save power. The primary goal was just to make these directories appear/disappear dynamically. There is one thing in this patch I had to do that I really don't like myself; it would probably be best if someone handling the cpufreq infrastructure could give this code the right treatment if this is not acceptable. I guess it's probably good for the first cut.
- Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt. The locking was smack in the middle of the notification path, when the hotplug is already holding the lock. I tried another solution to avoid this, namely to avoid taking locks if we know we are called from the notification path. The solution was getting very ugly and I decided this was probably good for this iteration until someone who understands cpufreq could do a better job than me.
(akpm: export cpucontrol to GPL modules: drivers/cpufreq/cpufreq_stats.c now does lock_cpu_hotplug()) Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Zwane Mwaikambo <zwane@holomorphy.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
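
A userspace model of the notifier-driven lifecycle might look like this. It is a sketch under assumptions: the `MODEL_*` event values are placeholders, the boolean array stands in for the presence of a per-CPU sysfs directory, and the minimum frequency is a made-up number.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Event names follow the commit text; the numeric values are placeholders. */
enum model_cpu_event { MODEL_CPU_ONLINE = 1, MODEL_CPU_DEAD = 2 };

static bool cpufreq_dir_present[MODEL_NR_CPUS];  /* stands in for the sysfs entry */
static unsigned int cpu_khz_model[MODEL_NR_CPUS];
static const unsigned int model_min_khz = 600000; /* made-up lowest frequency */

/* Hotplug callback: create the entry on online, tear it down on dead. */
int cpufreq_cpu_callback_model(enum model_cpu_event event, unsigned int cpu)
{
    if (cpu >= MODEL_NR_CPUS)
        return -1;

    switch (event) {
    case MODEL_CPU_ONLINE:
        cpufreq_dir_present[cpu] = true;
        break;
    case MODEL_CPU_DEAD:
        /* Park the CPU at its lowest frequency before removing the entry. */
        cpu_khz_model[cpu] = model_min_khz;
        cpufreq_dir_present[cpu] = false;
        break;
    }
    return 0;
}

/* True when the modelled cpufreq entry for this CPU currently exists. */
bool cpufreq_dir_exists_model(unsigned int cpu)
{
    return cpu < MODEL_NR_CPUS && cpufreq_dir_present[cpu];
}
```

With this structure a CPU that was never brought online (for example one held back by maxcpus=x) has no entry until its online event arrives.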
|
#
f3705136 |
|
25-Jun-2005 |
Zwane Mwaikambo <zwane@linuxpower.ca> |
[PATCH] i386 CPU hotplug (The i386 CPU hotplug patch provides infrastructure for some work which Pavel is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua <shaohua.li@intel.com> is doing) The following provides i386 architecture support for safely unregistering and registering processors during runtime, updated for the current -mm tree. In order to avoid dumping cpu hotplug code into kernel/irq/* I dropped the cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference is that on cpu offline, fixup_irqs() is called (with a long delay) before we clear the cpu from cpu_online_map, in order to ensure that we never have any queued external interrupts on the APICs. There are additional changes to s390 and ppc64 to account for this change.
1) Add CONFIG_HOTPLUG_CPU
2) Disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
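
The ordering in the offline path (redirect interrupts first, clear the online bit afterwards) can be sketched with a toy userspace model. Everything here is an assumption made for illustration: the `*_model` names, the one-target-CPU-per-IRQ table, and the policy of picking the lowest-numbered remaining CPU. The delay mentioned in the commit message is not modelled.

```c
#include <assert.h>

#define MODEL_NR_IRQS 4

static unsigned int cpu_online_mask_model = 0xFu;             /* CPUs 0-3 online */
static unsigned int irq_target_cpu[MODEL_NR_IRQS] = { 0, 1, 2, 3 };

/* Move every IRQ aimed at dying_cpu to the lowest-numbered other online CPU. */
static int fixup_irqs_model(unsigned int dying_cpu)
{
    unsigned int allowed = cpu_online_mask_model & ~(1u << dying_cpu);
    unsigned int target = 0;
    int irq;

    if (allowed == 0)
        return -1;                      /* nobody left to take the interrupts */
    while (!(allowed & (1u << target)))
        target++;
    for (irq = 0; irq < MODEL_NR_IRQS; irq++)
        if (irq_target_cpu[irq] == dying_cpu)
            irq_target_cpu[irq] = target;
    return 0;
}

/* Offline a CPU: IRQs are redirected while the CPU is still marked online. */
int cpu_disable_model(unsigned int cpu)
{
    if (cpu >= 32 || !(cpu_online_mask_model & (1u << cpu)))
        return -1;
    if (fixup_irqs_model(cpu) != 0)
        return -1;
    cpu_online_mask_model &= ~(1u << cpu);  /* only now leave the online map */
    return 0;
}
```

Because the online bit is cleared last, no IRQ in the table ever points at a CPU that is already absent from the mask.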
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|