History log of /freebsd-current/sys/kern/kern_switch.c
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
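
In practice this is a one-line identifier swap in each license header;
a sketch of before and after (not an actual hunk from the commit):

        /* before */
        /* SPDX-License-Identifier: BSD-2-Clause-FreeBSD */
        /* after */
        /* SPDX-License-Identifier: BSD-2-Clause */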


# 1029dab6 09-Feb-2023 Mitchell Horne <mhorne@FreeBSD.org>

mi_switch(): clean up switch types and their usage

Overall, this is a non-functional change, except for kernels built with
SCHED_STATS. However, the switch types are useful for communicating the
intent of the caller.

1. Ensure that every caller provides a type. In most cases, we upgrade
the basic yield to sched_relinquish() aka SWT_RELINQUISH.
2. The case of sched_bind() is distinct, so add a new switch type SWT_BIND.
3. Remove the two unused types, SWT_PREEMPT and SWT_SLEEPQTIMO.
4. Remove SWT_NONE altogether and assert that callers always provide
a type flag.
5. Reference the mi_switch(9) man page in the comments, as these flags
will be documented there.

Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38184
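
A minimal sketch of the convention this enforces, assuming the present-day
mi_switch() signature that takes only a flags word (illustrative, not code
from the commit):

        /* kernel context: every switch now names its reason via a SWT_* type */
        thread_lock(td);
        mi_switch(SW_VOL | SWT_RELINQUISH);     /* voluntary yield, typed */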


# 6622e299 26-Sep-2022 Alan Somers <asomers@FreeBSD.org>

Fix the build with SCHED_STATS after d3f96f661050

MFC with: d3f96f661050e9bd21fe29931992a8b9e67ff189
Sponsored by: Axcient


# c6c52d8e 26-Dec-2021 Alexander Motin <mav@FreeBSD.org>

kern: Remove CTLFLAG_NEEDGIANT from some more sysctls.

MFC after: 2 weeks


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
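
A sketch of the annotation pattern, with hypothetical node and handler names
(the real commit annotates existing nodes throughout the tree):

        /* known MP-safe: marked so explicitly */
        SYSCTL_NODE(_kern, OID_AUTO, example, CTLFLAG_RD | CTLFLAG_MPSAFE, 0,
            "hypothetical MP-safe node");

        /* not yet reviewed: conservatively marked as needing Giant */
        SYSCTL_PROC(_kern, OID_AUTO, legacy,
            CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_NEEDGIANT, NULL, 0,
            sysctl_legacy_handler, "I", "hypothetical handler still under Giant");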


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# 879e0604 11-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Add KERNEL_PANICKED macro for use in place of direct panicstr tests
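
As introduced, the macro is a thin wrapper in sys/systm.h (quoted from
memory; verify against the tree):

        #define KERNEL_PANICKED()       __predict_false(panicstr != NULL)

        /* so open-coded tests like */
        if (panicstr != NULL)
                return;
        /* become */
        if (KERNEL_PANICKED())
                return;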


# 686bcb5c 15-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

schedlock 4/4

Don't hold the scheduler lock while doing context switches. Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency. This means that mi_switch() is entered with the thread
locked but returns without. This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on a
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with: markj
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22778
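
The resulting caller contract, sketched (illustrative, not a hunk from the
commit):

        thread_lock(td);                /* enter mi_switch() with the thread locked */
        mi_switch(SW_VOL | SWT_SLEEPQ); /* the lock is dropped inside the switch,   */
                                        /* within a spinlock section that keeps     */
                                        /* interrupts and preemption disabled       */
        /* returns with the thread unlocked */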


# 6443773d 02-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

make critical_{enter, exit} inline

Avoid pulling in all of the <sys/proc.h> dependencies by
automatically generating a stripped down thread_lite exporting
only the fields of interest. The field declarations are type checked
against the original, and the offsets of the generated result are
automatically checked.

kib has expressed disagreement and would have preferred to simply
use genassym style offsets (which loses type check enforcement).
jhb has expressed dislike of it due to header pollution and a
duplicate structure. He would have preferred to just have defined
thread in _thread.h. Nonetheless, he admits that this is the only
viable solution at the moment.

The impetus for this came from mjg's D15331:
"Inline critical_enter/exit for amd64"

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D16078


# 6fee84e3 23-May-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove incorrect owepreempt assertion added in r334062

Yet another preemption request hitting between the counter being 0
and the check being reached will result in the flag no longer being
set.

Note the situation was already present prior to r334062 and is harmless.

Reported by: pho
Reviewed by: kib


# 748b15fc 22-May-2018 Mateusz Guzik <mjg@FreeBSD.org>

Move preemption handling out of critical_exit.

In preparation for making the enter/exit pair inline.

Reviewed by: kib


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use the BSD 2-Clause license; however, the tool
I was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# 32aef9ff 16-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sched: move panic handling code out of choosethread

This avoids jumps in the common case of the kernel not being panicked.


# 3467f88c 22-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Add comments explaining unobvious td_critnest adjustments in
critical_exit().

Based on the discussion with: jhb
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential revision: D9276
MFC after: 1 week


# a115fb62 22-Jan-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Revert r277213:

FreeBSD developers need more time to review patches in the surrounding
areas like the TCP stack which are using MPSAFE callouts to restore
distribution of callouts on multiple CPUs.

Bump the __FreeBSD_version instead of reverting it.

Suggested by: kmacy, adrian, glebius and kib
Differential Revision: https://reviews.freebsd.org/D1438


# 1a26c3c0 15-Jan-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Major callout subsystem cleanup and rewrite:
- Close a migration race where callout_reset() failed to set the
CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
spinlocks.
- Switching the callout CPU number cannot always be done on a
per-callout basis. See the updated timeout(9) manual page for more
information.
- The timeout(9) manual page has been updated to reflect how all the
functions inside the callout API work. The manual page has
been made function-oriented to make it easier to deduce how each of
the functions making up the callout API works without having
to first read the whole manual page. Group all functions into a
handful of sections which should give a quick top-level overview of
when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality have been removed
to reduce the complexity in the callout code and to avoid problems
with atomically stopping callouts via callout_stop(). If someone
needs it, it can be re-added. From my quick grep there are no
CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
added. See the updated timeout(9) manual page for a complete
description.
- Update the callout clients in the "kern/" folder to use the callout
API properly, like cv_timedwait(). Previously there was some custom
sleepqueue code in the callout subsystem, which has been removed,
because we now allow callouts to be protected by spinlocks. This
allows us to tear down the callout like done with regular mutexes,
and a "td_slpmutex" has been added to "struct thread" to atomically
teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
"SWT_SLEEPQTIMO" states can now be completely removed. Currently
they are marked as available and will be cleaned up in a follow up
commit.
- Bump the __FreeBSD_version to indicate kernel modules need
recompilation.
- There have been several reports that this patch "seems to squash a
serious bug leading to a callout timeout and panic".

Kernel build testing: all architectures were built
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D1438
Sponsored by: Mellanox Technologies
Reviewed by: jhb, adrian, sbruno and emaste


# e68ccbe8 08-Dec-2012 Attilio Rao <attilio@FreeBSD.org>

Add a comment on why inlining critical_enter() may not be a good idea
for the general case.

Reviewed by: bde
MFC after: 1 week


# 5e27a603 04-Dec-2011 Andriy Gapon <avg@FreeBSD.org>

critical_exit: ignore td_owepreempt if kdb_active is set

calling mi_switch in such a context results in a recursion via
kdb_switch

Suggested by: jhb
Reviewed by: jhb
MFC after: 5 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 3aa6d94e 11-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Update several places that iterate over CPUs to use CPU_FOREACH().
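
CPU_FOREACH() (sys/smp.h) packages the open-coded loop it replaces; roughly:

        #define CPU_FOREACH(i)                                  \
                for ((i) = 0; (i) <= mp_maxid; (i)++)           \
                        if (!CPU_ABSENT((i)))

        /* so the old pattern */
        for (i = 0; i <= mp_maxid; i++) {
                if (CPU_ABSENT(i))
                        continue;
                /* ... per-CPU work ... */
        }
        /* becomes */
        CPU_FOREACH(i) {
                /* ... per-CPU work ... */
        }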


# 791d9a6e 24-Jun-2009 Jeff Roberson <jeff@FreeBSD.org>

- Use DPCPU for SCHED_STATS. This is somewhat awkward because the
offset of the stat is not known until link time so we must emit a
function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC
directly.
- Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu
variables. Sched stats are always incremented while we're holding
a spinlock so no further protection is required.

Reviewed by: sam
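
A sketch of the shape this takes with the DPCPU KPI from sys/pcpu.h; the
counter and macro names here are illustrative, not the real SCHED_STATS
symbols:

        DPCPU_DEFINE(unsigned long, example_stat);      /* one copy per CPU */

        /* no atomic needed: the counter is per-CPU and callers hold a spinlock */
        #define EXAMPLE_STAT_INC()      (DPCPU_GET(example_stat)++)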


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 681e4062 12-May-2008 Julian Elischer <julian@FreeBSD.org>

fix typo in runq_fuzz
Noticed by: Elijah Buck


# 8df78c41 16-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
- In reset, walk the children of kern_sched_stats and reset the counters
via the oid_arg1 pointer. This allows us to add arbitrary counters to
the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch().
These types are named SWT_*. These types correspond to SCHED_STATS
counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats.
There are now stats for idle switches, remote idle wakeups, remote
preemption ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include
how much migration there is, how often affinity was successful, how
often threads were migrated to the local cpu on wakeup, etc.

Sponsored by: Nokia


# 9727e637 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Restore runq to manipulating threads directly by putting runq links and
rqindex back in struct thread.
- Compile kern_switch.c independently again and stop #include'ing it from
schedulers.
- Remove the ts_thread backpointers and convert most code to go from
struct thread to struct td_sched.
- Cleanup the ts_flags #define garbage that was causing us to sometimes
do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
- Export the kern.sched sysctl node in sysctl.h


# 52e95411 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Remove the unused and redundant sched_newproc() function.
- Remove the unused and redundant sched_newthread(), which peeks into
scheduler-private structures.


# a90f3f25 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Move maybe_preempt() from kern_switch.c to sched_4bsd.c. This function
is only used by 4BSD.
- Create a new runq_choose_fuzz() function rather than polluting runq_choose()
with 4BSD specific code.
- Move the fuzz sysctl into sched_4bsd.c
- Remove some dead code from kern_switch.c


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink
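
That is, each invocation gains a trailing semicolon (illustrative names):

        /* before */
        SYSINIT(example, SI_SUB_EXEC, SI_ORDER_ANY, example_init, NULL)
        /* after */
        SYSINIT(example, SI_SUB_EXEC, SI_ORDER_ANY, example_init, NULL);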


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 431f8906 13-Nov-2007 Julian Elischer <julian@FreeBSD.org>

generally we are interested in what thread did something as
opposed to what process. Since threads by default have the name of the
process unless overwritten with more useful information, just print the
thread name instead.


# 5bce4ae3 08-Oct-2007 Jeff Roberson <jeff@FreeBSD.org>

- Fix ULE in kernels without PREEMPTION compiled in by always enabling the
critical_exit() owepreempt check. ULE will always use owepreempt to
preempt the idle thread. This change does not affect 4BSD since it will
never set owepreempt without PREEMPTION enabled.
- Remove some unused code from choosethread().

Discussed with: jhb
Approved by: re


# c8790f5d 20-Sep-2007 Attilio Rao <attilio@FreeBSD.org>

Fix some entries in the locks static table of witness.
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
the table, however, has been the source of false positive LOR reports
involving the dt_lock. However, under the older locking the smp rendezvous
lock would have had sched_lock ahead of it, so it wasn't a leaf lock then
either.
- allpmaps is only used in ia32 architecture, so it is inserted in the
appropriate stub.

Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let it be
declared as static.

Tested by: kris
Approved by: jeff (mentor)
Approved by: re


# b61ce5b0 16-Sep-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


# 67e20930 20-Aug-2007 Jeff Roberson <jeff@FreeBSD.org>

- Improve runq_findbit_from(), which is used by ULE's circular queue. Mask
off the bits we want to ignore on the first pass rather than doing a
linear scan. This puts us within a few instructions of the cost of
runq_findbit() and removes this function from the top of profiling output
for context switch heavy workloads.

Approved by: re
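
A standalone model of the masking trick (my sketch, not the kernel's runq
code, which operates on the runq status bit array):

        #include <stdint.h>

        /*
         * Find the lowest set bit at or above idx (0 <= idx < 64), wrapping
         * to the start if there is none; -1 if mask is empty.  One ffs per
         * pass replaces the old linear scan.
         */
        static int
        findbit_from(uint64_t mask, int idx)
        {
                uint64_t m;

                m = mask & ~((1ULL << idx) - 1);  /* mask off bits below idx */
                if (m != 0)
                        return (__builtin_ffsll(m) - 1);
                return (mask != 0 ? __builtin_ffsll(mask) - 1 : -1);
        }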


# 413ea6f5 03-Aug-2007 Jeff Roberson <jeff@FreeBSD.org>

- Set SW_PREEMPT when we preempt in critical_exit().

Approved by: re


# 56696bd1 19-Jul-2007 Jeff Roberson <jeff@FreeBSD.org>

- Remove explicit references to sched_lock. A simpler assert will do.

Approved by: re


# 671f2709 12-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

- Garbage collect unused concurrency functions.


# 7b20fb19 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 1/14 of sched_lock decomposition.
- Move all scheduler locking into the schedulers utilizing a technique
similar to solaris's container locking.
- A per-process spinlock is now used to protect the queue of threads,
thread count, suspension count, p_sflags, and other process
related scheduling fields.
- The new thread lock is actually a pointer to a spinlock for the
container that the thread is currently owned by. The container may
be a turnstile, sleepqueue, or run queue.
- thread_lock() is now used to protect access to thread related scheduling
fields. thread_unlock() unlocks the lock and thread_set_lock()
implements the transition from one lock to another.
- A new "blocked_lock" is used in cases where it is not safe to hold the
actual thread's lock yet we must prevent access to the thread.
- sched_throw() and sched_fork_exit() are introduced to allow the
schedulers to fix-up locking at these points.
- Add some minor infrastructure for optionally exporting scheduler
statistics that were invaluable in solving performance problems with
this patch. Generally these statistics allow you to differentiate
between different causes of context switches.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# ed0e8f2f 07-Feb-2007 Jeff Roberson <jeff@FreeBSD.org>

- Change types for recent runq additions to u_char rather than int.
- Fix these types in ULE as well. This fixes bugs in priority index
calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.

Reported by: kkenn using pho's stress2.
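
The edge case is plain C arithmetic; a standalone demo (mine, not from the
commit):

        #include <stdio.h>

        int
        main(void)
        {
                /* C truncates division toward zero, so the remainder of a
                   negative value is negative: a bogus queue index. */
                printf("%d\n", (int)-1 % 64);           /* -1 */
                printf("%u\n", (unsigned int)-1 % 64);  /* 63 */
                printf("%d\n", (unsigned char)-1 % 64); /* 63 (255 % 64) */
                return (0);
        }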


# f0393f06 23-Jan-2007 Jeff Roberson <jeff@FreeBSD.org>

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


# cd49bb70 03-Jan-2007 Jeff Roberson <jeff@FreeBSD.org>

- Don't pass a pointer into runq_choose_from(). The caller can adjust the
index if it chooses to.


# 3fed7d23 04-Jan-2007 Jeff Roberson <jeff@FreeBSD.org>

- Add three new functions to support circular run queues.
- runq_add_pri allows the caller to position the thread at any rqindex
regardless of priority.
- runq_choose_from() chooses the lowest priority thread starting from a given
index. The index is updated with the rqindex of the chosen thread. This
routine is used to pick the lowest priority relative to a given index.
- runq_remove_idx() updates the index if the run queue that held the removed
thread is now empty.


# 2da78e38 31-Dec-2006 Robert Watson <rwatson@FreeBSD.org>

Prefer a more traditional spelling of inhibited in comments and panic
messages.


# ad1e7d28 05-Dec-2006 Julian Elischer <julian@FreeBSD.org>

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning, and some functions that
now do ALMOST nothing will go away, but I thought I'd do that as a separate
commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# b41f1452 13-Jun-2006 David Xu <davidxu@FreeBSD.org>

Add scheduler CORE, work I did half a year ago; recently
I picked it up again. The scheduler is forked from ULE, but the
algorithm to detect an interactive process is almost completely
different from ULE's; it comes from the Linux paper "Understanding the
Linux 2.6.8.1 CPU Scheduler", although I still use the same word
"score" as a priority boost as in the ULE scheduler.

Briefly, the scheduler has the following characteristics:
1. A timesharing process's nice value is seriously respected;
the timeslice and interaction-detection algorithms are based
on the nice value.
2. per-cpu scheduling queue and load balancing.
3. O(1) scheduling.
4. Some cpu affinity code in wakeup path.
5. Support POSIX SCHED_FIFO and SCHED_RR.
Unlike the 4BSD and ULE schedulers, which use the fuzzy RQ_PPQ, this
scheduler uses 256 priority queues. Unlike ULE, which uses both pull and
push, this scheduler uses the pull method only; the main reason is to let
relatively idle CPUs do the work. Currently, however, the whole scheduler is
protected by the big sched_lock, so the benefit is not visible; it can
really be worse than nothing, because all other CPUs are locked out while
we are doing balancing work, a problem the 4BSD scheduler does not have.
The scheduler does not support hyperthreading very well; in fact,
the scheduler does not distinguish between physical CPUs and
logical CPUs. This should be improved in the future. The scheduler has
a priority inversion problem on MP machines; it is not good for
realtime scheduling, and it can cause realtime processes to starve.
That said, it seems MySQL super-smack runs better on my
Pentium-D machine when using libthr, whether on a UP or SMP kernel.


# 4bb0f51d 01-Jun-2006 Olivier Houchard <cognet@FreeBSD.org>

sched_rem() already sets ke->ke_state to KES_THREAD, so there's no need
to redo it.


# 3f349776 28-Dec-2005 Alexander Kabaev <kan@FreeBSD.org>

Trim trailing whitespace.


# 1335c4df 18-Dec-2005 Nate Lawson <njl@FreeBSD.org>

Restore KTR_CRITICAL but conditionally compile it in as KTR_SCHED.

Requested by: scottl, jhb


# 8615fd86 16-Dec-2005 Nate Lawson <njl@FreeBSD.org>

Clean up unused or poorly utilized KTR values. Remove KTR_FS, KTR_KGDB,
and KTR_IO as they were never used. Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead. Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.


# 3c424d14 02-Aug-2005 David Xu <davidxu@FreeBSD.org>

In adjustrunqueue(), add code to handle the thread-migrating case for
the ULE scheduler. In the original code, the local run queue of a threaded
ksegrp is corrupted if adjustrunqueue() is called while a thread is migrating.


# 3ea6bbc5 09-Jun-2005 Stephan Uphoff <ups@FreeBSD.org>

Restore preemption of idle threads.

Submitted by: jhb


# a3f2d842 09-Jun-2005 Stephan Uphoff <ups@FreeBSD.org>

Lots of whitespace cleanup.
Fix for broken if condition.

Submitted by: nate@


# f3a0f873 09-Jun-2005 Stephan Uphoff <ups@FreeBSD.org>

Fix some race conditions for pinned threads that may cause them to run
on the wrong CPU.

Add IPI support for preempting a thread on another CPU.

MFC after: 3 weeks


# d13ec713 23-May-2005 Stephan Uphoff <ups@FreeBSD.org>

Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.


# 503c2ea3 18-May-2005 Stephan Uphoff <ups@FreeBSD.org>

Fix a bug that caused preemption to happen for a thread in the same
ksegrp with the same priority as the currently running thread.
This can cause propagate_priority() to panic.

Pointy hat to: ups


# 77918643 07-Apr-2005 Stephan Uphoff <ups@FreeBSD.org>

Sprinkle some volatile magic and rearrange things a bit to avoid race
conditions in critical_exit now that it no longer blocks interrupts.

Reviewed by: jhb


# c6a37e84 04-Apr-2005 John Baldwin <jhb@FreeBSD.org>

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
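
Usage of the divorced KPIs, side by side (a usage sketch; the spinlock_*
implementations are MD):

        critical_enter();   /* now just an integer increment: defers preemption */
        /* ... work that must not be preempted, e.g. per-CPU data ... */
        critical_exit();

        spinlock_enter();   /* MD: also blocks interrupts, as spin locks require */
        /* ... code that must exclude interrupt handlers ... */
        spinlock_exit();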


# 6220dcba 20-Mar-2005 Robert Watson <rwatson@FreeBSD.org>

Add a read-only kern.sched.preemption sysctl so that user space can tell
if "options PREEMPTION" is compiled into the kernel.


# bc608306 17-Mar-2005 Robert Watson <rwatson@FreeBSD.org>

A further step on the journey of making panics and debugging more reliable:
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts. If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure. The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.

It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.

Glanced at by: jhb


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 85da7a56 25-Dec-2004 Jeff Roberson <jeff@FreeBSD.org>

- Define KTR points for KTR_SCHED.


# 7842f65e 14-Dec-2004 Jeff Roberson <jeff@FreeBSD.org>

- Garbage collect several unused members of struct kse and struct ksegrp.
As best as I can tell, some of these were never used.


# 6db36923 20-Nov-2004 David Schultz <das@FreeBSD.org>

Remove local definitions of RANGEOF() and use __rangeof() instead.
Also remove a few bogus casts.


# f42a43fa 07-Nov-2004 Robert Watson <rwatson@FreeBSD.org>

Add basic critical section tracing to KTR using event type KTR_CRITICAL.
This generates a KTR event for each critical section entered and exited.

It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.


# b96741f4 16-Oct-2004 Scott Long <scottl@FreeBSD.org>

If a process needs to be swapped in, wakeup the swapper from within
critical_exit as the process is getting scheduled to run. This is suboptimal
but for now avoids the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.

Submitted by: davidxu
MFC After: 3 days


# 7c71b645 13-Oct-2004 Stephan Uphoff <ups@FreeBSD.org>

Fix maybe_preempt_in_ksegrp for !SMP.

Tested by: tegge
Reviewed by: julian
Approved by: sam (mentor)
MFC after: 3 days


# 13e7430f 12-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make !SMP kernels compile, and as far as I can tell, work again.


# 84f9d4b1 12-Oct-2004 Stephan Uphoff <ups@FreeBSD.org>

Prevent preemption in slot_fill.
Implement preemption between threads in the same ksegrp in out-of-slot
situations to prevent priority inversion.

Tested by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP


# 042b7b1a 09-Oct-2004 Julian Elischer <julian@FreeBSD.org>

Don't release the slot twice.. sched_rem() has already done it.

Submitted by: stephan uphoff (ups at tree dot com)
MFC after: 3 days


# c20c691b 05-Oct-2004 Julian Elischer <julian@FreeBSD.org>

When preempting a thread, put it back on the HEAD of its run queue.
(Only really implemented in 4bsd)

MFC after: 4 days


# d39063f2 05-Oct-2004 Julian Elischer <julian@FreeBSD.org>

Use some macros to track available scheduler slots to allow
easier debugging.

MFC after: 4 days


# 8daa8c60 19-Sep-2004 David Schultz <das@FreeBSD.org>

The zone from which proc structures are allocated is marked
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.


# 14f0e2e9 16-Sep-2004 Julian Elischer <julian@FreeBSD.org>

clean up thread runq accounting a bit.

MFC after: 3 days


# 9da3e923 15-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Use specific code to revert a partial add to the run queue, not
remrunqueue(), which can't handle a partially added thread.

MFC after: 1 week


# e8807f22 14-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Oops, accidentally removed #ifdef SCHED_4BSD
as part of another commit.
This function is not yet used in ULE.


# 1f9f5df6 13-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Commit a fix for some panics we've been seeing with preemption.

MFC after: 2 days


# b2578c6c 13-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Add some kasserts


# 3389af30 10-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Add some code to allow threads to nominate a sibling to run if they are going to sleep.

MFC after: 1 week


# 54983505 07-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Make debug printf less threatening and make it only print out once.

MFC after: 2 days


# 6a574b2a 06-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Don't do IPIs on behalf of interrupt threads.
just punt straight on through to the preemption code.

Make a KASSERT out of a condition that can no longer occur.
MFC after: 1 week


# ed062c8d 04-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies the scheduler that no more than N threads from that ksegrp
should be allowed to be concurrently scheduled. This is also
used to enforce 'fairness' at this time, so that a ksegrp with
10000 threads cannot swamp the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventually develop
its own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
libkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


# 44692526 02-Sep-2004 Julian Elischer <julian@FreeBSD.org>

remove unused code

MFC after: 2 days


# 9923b511 02-Sep-2004 Scott Long <scottl@FreeBSD.org>

Turn PREEMPTION into a kernel option. Make sure that it's defined if
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.


# 6804a3ab 01-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after: 5 days


# 2630e4c9 31-Aug-2004 Julian Elischer <julian@FreeBSD.org>

Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after: 2 days


# 6f96710c 27-Aug-2004 Peter Wemm <peter@FreeBSD.org>

Backout the previous backout (with scott's ok). sched_ule.c:1.122 is
believed to fix the problem with ULE that this change triggered.


# 2384290c 19-Aug-2004 Scott Long <scottl@FreeBSD.org>

Revert the previous change. It works great for 4BSD but causes major
problems for ULE. The reason is quite unknown and worrisome.


# 2c86298c 19-Aug-2004 Scott Long <scottl@FreeBSD.org>

In maybe_preempt(), ignore threads that are in an inconsistent state. This
is an effective band-aid for at least some of the scheduler corruption seen
recently. The real fix will involve protecting threads while they are
inconsistent, and will come later.

Submitted by: julian


# 0f4ad918 09-Aug-2004 Scott Long <scottl@FreeBSD.org>

Add a temporary debugging hack to detect a deadlock in setrunqueue(). This
is here so that we can gather stats on the nature of the recent rash of
hard lockups, and in this particular case panic the machine instead of
letting it deadlock forever.


# 1a5cd27b 09-Aug-2004 Julian Elischer <julian@FreeBSD.org>

Make kg->kg_runnable actually count runnable threads in the ksegrp run queue
instead of only doing it sometimes. This is not used outside of debugging code
in the current code, but that will probably change.


# 732d9528 09-Aug-2004 Julian Elischer <julian@FreeBSD.org>

Increase the amount of data exported by KTR in the KTR_RUNQ setting.
This extra data is needed to really follow what is going on in the
threaded case.


# 44fe3c1f 06-Aug-2004 John Baldwin <jhb@FreeBSD.org>

Don't scare users with a warning about preemption being off when it isn't
yet safe to have on by default.


# 1a8cfbc4 27-Jul-2004 Robert Watson <rwatson@FreeBSD.org>

Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding: jhb, bmilekic


# 18f480f8 23-Jul-2004 Scott Long <scottl@FreeBSD.org>

Remove the previous hack since it doesn't make a difference and is getting
in the way of debugging.


# 9493183e 22-Jul-2004 Scott Long <scottl@FreeBSD.org>

Disable the PREEMPTION-enabled code in critical_exit() that encourages
switching to a different thread. This is just a hack to try to improve
stability some more, but likely points closer to the real culprit.


# 52eb8464 16-Jul-2004 John Baldwin <jhb@FreeBSD.org>

- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags
since they are only accessed by curthread and thus do not need any
locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
and directly into struct thread as td_profil_addr and td_profil_ticks
as these variables are really per-thread. (They are used to defer an
addupc_intr() that was too "hard" until ast()).


# 2d50560a 10-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().

kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make
them unconditional.

kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger
knows which thread to select as the current when debugging the
kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.

kern_synch.c:
o Call kdb_reenter() instead of db_error().


# 8b44a2e2 02-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Unbreak build for the !PREEMPTION case: don't define variables
that aren't used in that case.


# 0c0b25ae 02-Jul-2004 John Baldwin <jhb@FreeBSD.org>

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switched to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)
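
A toy model of the deferred-preemption bookkeeping described above (a
userspace sketch of the idea only; it glosses over the races that later
commits in this log wrestle with):

        static int critnest;            /* stands in for td_critnest */
        static int owepreempt;          /* stands in for TDF_OWEPREEMPT */

        static void
        do_switch(void)
        {
                /* the kernel would call mi_switch() here */
        }

        void
        critical_enter(void)
        {
                critnest++;
        }

        void
        critical_exit(void)
        {
                /* leaving the outermost section: pay any deferred preemption */
                if (critnest == 1 && owepreempt) {
                        owepreempt = 0;
                        do_switch();
                }
                critnest--;
        }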


# b209e5e3 02-Feb-2004 Jeff Roberson <jeff@FreeBSD.org>

- style fixes to the critical_exit() KASSERT().

Submitted by: bde


# fca542bc 31-Jan-2004 Robert Watson <rwatson@FreeBSD.org>

Move KASSERT regarding td_critnest to after the value of td is set to
curthread, to avoid warning and incorrect behavior.

Hoped not to mind: jeff


# 6767c654 31-Jan-2004 Jeff Roberson <jeff@FreeBSD.org>

- Assert that td_critnest > 0 in critical_exit() to catch cases of
unbalanced uses of the critical_* api.


# 09a4a69c 12-Dec-2003 Robert Watson <rwatson@FreeBSD.org>

Although sometimes to the uninitiated, it may seem like goup, KSEGOUP
is actually spelt KSEGROUP. Go figure.

Reported by: samy@kerneled.com


# 0d2a2989 17-Nov-2003 Peter Wemm <peter@FreeBSD.org>

Initial landing of SMP support for FreeBSD/amd64.

- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though it's in the same silicon! ARGH!
This directly violates the ACPI spec.


# 94816f6d 17-Oct-2003 Jeff Roberson <jeff@FreeBSD.org>

- Remove the correct thread from the run queue in setrunqueue(). This
fixes ULE + KSE.


# 7cf90fb3 16-Oct-2003 Jeff Roberson <jeff@FreeBSD.org>

- Update the sched api. sched_{add,rem,clock,pctcpu} now all accept a td
argument rather than a kse.


# 0e2a4d3a 14-Jun-2003 David Xu <davidxu@FreeBSD.org>

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# faaa20f6 21-May-2003 Julian Elischer <julian@FreeBSD.org>

When we are spilling threads out of the run queue during panic, make sure we
keep the thread state variable consistent with its real state.
i.e. Don't say it's on the run queue when it isn't.

Also clarify the associated comment.

Turns a double panic back to a single panic :-/

Approved by: re@ (jhb)


# cc66ebe2 02-Apr-2003 Peter Wemm <peter@FreeBSD.org>

Commit a partial lazy thread switch mechanism for i386. It isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


# 6ce75196 18-Mar-2003 David Xu <davidxu@FreeBSD.org>

Adjust code for userland preemption. Userland can set a quantum in
kse_mailbox to schedule an upcall, this is useful for userland timeout
routine, for example pthread_cond_timedwait().

Also extract upcall scheduling code from kse_reassign and create
a new function called thread_switchout to include these code.

Reviewed by: julian


# d03c79ee 08-Mar-2003 David Xu <davidxu@FreeBSD.org>

Cosmetic change, make it QUEUE_MACRO_DEBUG friendly


# ac2e4153 26-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


# 4f6cfa45 19-Feb-2003 David Xu <davidxu@FreeBSD.org>

Update comments to reflect new KSE code.


# 02bbffaf 17-Feb-2003 David Xu <davidxu@FreeBSD.org>

Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall,
I think it is a better place to handle it.


# 4a338afd 17-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listened to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# 5215b187 16-Feb-2003 Jeff Roberson <jeff@FreeBSD.org>

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much of the
KSE code.

Submitted by: davidxu


# 6f8132a8 31-Jan-2003 Julian Elischer <julian@FreeBSD.org>

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 0dbb100b 26-Jan-2003 David Xu <davidxu@FreeBSD.org>

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread that owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and take those contexts back
to userland. Any thread without an upcall structure has to export its
contexts and exit at the user boundary.

Any thread running in user mode owns an upcall structure. When it enters
the kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in the kernel a new UPCALL thread is created and
the upcall structure is transferred to the new UPCALL thread. If the kse
mailbox's current thread pointer is NULL, then no UPCALL thread will be
created when a thread is blocked in the kernel.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit; when all upcalls in a ksegrp are removed, the group is
automatically shut down. An upcall owner thread also exits when the process
is in the exiting state. When an owner thread exits, the upcall it owns is
also removed.

KSE is a pure scheduler entity; it represents a virtual CPU. When a thread
is running, it always has a KSE associated with it. The scheduler is free to
assign a KSE to a thread according to thread priority; if the thread priority
changes, the KSE can be moved from one thread to another.

When a ksegrp is created, N KSEs are always created in the group, where
N is the number of physical CPUs in the current system. This makes it
possible that, even if a userland UTS is only single-CPU safe, threads in
the kernel can still execute on different CPUs in parallel. Userland calls
kse_create to add more upcall structures to a ksegrp to increase concurrency
in userland itself; the kernel is not restricted by the number of upcalls
userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 67f7c1bb 19-Jan-2003 Julian Elischer <julian@FreeBSD.org>

Remove a KASSERT that can now happen and add a missing setrunnable.


# 93a7aa79 27-Dec-2002 Julian Elischer <julian@FreeBSD.org>

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads share KSEs. Every KSE now has a thread, which is
considered its "owner"; however, a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
In this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creation of upcalls etc. is now done from
kse_reassign(), which in turn is called from mi_switch() or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into the KSE's owner, and sets it up
for doing an upcall. It is just inhibited from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no longer needed.
"Idle" KSEs are now on the loanable queue.


# 24c5baae 14-Oct-2002 Julian Elischer <julian@FreeBSD.org>

Did you ever notice how stupid bugs show up much clearer
when you see them in a commit message?


# 1f955e2d 14-Oct-2002 Julian Elischer <julian@FreeBSD.org>

Tidy up the scheduler's code for changing the priority of a thread.
Logically pretty much a NOP.


# b43179fb 11-Oct-2002 Jeff Roberson <jeff@FreeBSD.org>

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


# 48bfcddd 08-Oct-2002 Julian Elischer <julian@FreeBSD.org>

Round out the facility for a 'bound' thread to loan out its KSE
in specific situations. The owner thread must be blocked, and the
borrower cannot proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
the owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is conceivable that the
borrower should inherit the priority of the owner too;
that's another discussion and would be simple to do.

Also, as part of this, the "preallocated spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".

DDB now shows a lot more info for threaded processes, though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)

Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.


# 5da2b58a 02-Oct-2002 David Xu <davidxu@FreeBSD.org>

set ke_bound to NULL when kse owner thread becomes runnable.

Reviewed by: julian (mentor)


# 9eb1fdea 29-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Implement basic KSE loaning. This stops a thread that is blocked in BOUND mode
from stopping another thread from completing a syscall, and this allows it to
release its resources etc. Probably more related commits to follow (at least
one I know of)

Initial concept by: julian, dillon
Submitted by: davidxu


# 33c06e1d 22-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Indentation does not define a block.. you need braces {} as well..
also add a mutex assert. (threaded path only)

Submitted by: davidxu


# 4f0db5e0 15-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Allocate KSEs and KSEGRPs separately and remove them from the proc structure.
The next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


# 71fad9fd 11-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 472be958 29-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Rejig the code to figure out estcpu and work out how long a KSEGRP has been
idle. What was there before was surprisingly ALMOST correct.

Peter and I fried our brains on this for a couple of hours figuring out
what this actually means in the context of multiple threads.

Reviewed by: peter@freebsd.org


# 9eb881f8 30-Jul-2002 Seigo Tanimura <tanimura@FreeBSD.org>

- Optimize wakeup() and its friends; if a thread woken up is being
swapped in, we do not have to ask the scheduler thread to do
that.

- Assert that a process is not swapped out in runq functions and
swapout().

- Introduce thread_safetoswapout() for readability.

- In swapout_procs(), perform a test that may block (check of a
thread working on its vm map) first. This lets us call swapout()
with the sched_lock held, providing a better atomicity.


# fe799533 16-Jul-2002 Andrew Gallatin <gallatin@FreeBSD.org>

Allow alphas to do crashdumps: Refuse to run anything in choosethread()
after a panic which is not an interrupt thread, or the thread which
caused the panic. Also, remove panicstr checks from msleep() and from
cv_wait() in order to allow threads to go to sleep and yield the cpu
to the panicking thread, or to an interrupt thread which might
be doing the crashdump.

Reviewed by: jhb (and it was mostly his idea too)


# c3b98db0 13-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Thinking about it I came to the conclusion that the KSE states were incorrectly
formulated. The correct states should be:
IDLE: On the idle KSE list for that KSEG
RUNQ: Linked onto the system run queue.
THREAD: Attached to a thread and slaved to whatever state the thread is in.

This means that most places where we were adjusting kse state can go away
as it is just moving around because the thread is..
The only places we need to adjust the KSE state is in transition to and from
the idle and run queues.

Reviewed by: jhb@freebsd.org


# 40e55026 12-Jul-2002 Julian Elischer <julian@FreeBSD.org>

also set the KSE state for the idle KSE/thread case.


# 33d7ad1a 12-Jul-2002 John Baldwin <jhb@FreeBSD.org>

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


# 5e3da64e 11-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Remove debugging code that I originally only wanted to be there for a couple of days after merge.

Reminded with pointy stick by: jhb


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 2f9267ec 20-Jun-2002 Peter Wemm <peter@FreeBSD.org>

Move the "- 1" into the RQB_FFS(mask) macro itself so that
implementations can provide a base zero ffs function if they wish.
This changes

        #define RQB_FFS(mask) (ffs64(mask))
        foo = RQB_FFS(mask) - 1;

to

        #define RQB_FFS(mask) (ffs64(mask) - 1)
        foo = RQB_FFS(mask);

On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().

Reviewed by: jake (in principle)


# d2ac2316 24-May-2002 Jake Burkholder <jake@FreeBSD.org>

Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by: peter


# 0cce52f8 07-May-2002 Jake Burkholder <jake@FreeBSD.org>

Remove runq_findproc. This never worked right in the first place and can
be prohibitively expensive.


# 182da820 01-Apr-2002 Matthew Dillon <dillon@FreeBSD.org>

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly committed syntactical cleanups made to files
that were still under active development. Backout improperly committed program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather than in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


# d74ac681 26-Mar-2002 Matthew Dillon <dillon@FreeBSD.org>

Compromise for critical*()/cpu_critical*() recommit. Clean up the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Clean up the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


# e97c3e3d 06-Mar-2002 Dag-Erling Smørgrav <des@FreeBSD.org>

Rename runq_find() to runq_findproc(), and hide it behind #ifdef DIAGNOSTIC,
as it can have a severe impact on performance under high load, and the bug
it was meant to catch was fixed ages ago.


# 181df8c9 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

revert last commit temporarily due to whining on the lists.


# f96ad4c2 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, are strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected. (A simplified sketch of this
behavior appears after this list.)

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
regarding hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather than hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with this fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code with regard to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Because these are per-cpu
variables, it is not currently necessary to use locked bus cycles
when modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly, making the effective performance savings of the
new critical section code very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather than being deferred.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API, which currently has serious cross-pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


# 2c100766 11-Feb-2002 Julian Elischer <julian@FreeBSD.org>

In a threaded world, different priorities become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 7e1f6dfe 17-Dec-2001 John Baldwin <jhb@FreeBSD.org>

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha
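
The wrapper behavior described above might look roughly like this
sketch (simplified; critical_t and the cpu_critical_*() primitives are
the MD pieces the commit describes):

    void
    critical_enter(void)
    {
            struct thread *td = curthread;

            if (td->td_critnest == 0)
                    td->td_savecrit = cpu_critical_enter();
            td->td_critnest++;
    }

    void
    critical_exit(void)
    {
            struct thread *td = curthread;

            if (td->td_critnest == 1) {
                    td->td_critnest = 0;
                    cpu_critical_exit(td->td_savecrit);
            } else
                    td->td_critnest--;
    }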


# 6a494eeb 17-Sep-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Change p into ke->ke_proc; this was hidden behind INVARIANTS.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the
process (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doozy!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# f583b1d9 04-Jul-2001 John Baldwin <jhb@FreeBSD.org>

Spelling fix in a KASSERT: runq_chose -> runq_choose.


# f34fa851 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# 6fe01250 14-Mar-2001 Peter Wemm <peter@FreeBSD.org>

Jake essentially rewrote this. It is not by any stretch of the
imagination a derivative of what I did before.


# 9cbd0393 11-Mar-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Assert that the process we're trying to enqueue isn't already there.


# 3a3f6082 08-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Add a new informative KASSERT to ensure that a process is in the SRUN state
before we return it to cpu_switch().
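
Such an assertion might look like the following; the exact wording of
the message is hypothetical, but KASSERT(expr, (fmt, ...)) is the
kernel's standard form:

    KASSERT(p->p_stat == SRUN,
        ("runq_choose: process %p not in SRUN", p));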


# f32ded2f 24-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

- Assert that the proc to return is not NULL in runq_choose the
same as runq_remove.
- bzero the whole struct runq in runq_init just in case it's not
statically allocated.


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propagate_priority(). It should now have the desired
effect without needing to also propagate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propagate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. The spare
fields were deliberately adjusted; the struct remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
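
A hedged sketch of the unified indexing: all classes share one array of
64 queues, each queue covering a small band of priority levels. The
constants mirror the idea; they are not necessarily the exact values
the commit used:

    #define RQ_NQS  64      /* number of run queues */
    #define RQ_PPQ  4       /* priority levels per queue */

    /* Map a 0..255 priority to its queue index. */
    static inline int
    runq_index(int pri)
    {
            return (pri / RQ_PPQ);
    }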


# ef73ae4b 09-Jan-2001 Jake Burkholder <jake@FreeBSD.org>

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other than curproc.
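
Typical usage of these accessors, with illustrative field names:

    static void
    example(void)
    {
            int cpu;

            cpu = PCPU_GET(cpuid);          /* read a per-cpu field */
            PCPU_SET(switchticks, ticks);   /* write a per-cpu field */
            /* PCPU_PTR(name) yields a pointer to the field instead. */
    }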


# 35e0e5b3 20-Oct-2000 John Baldwin <jhb@FreeBSD.org>

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# f6a0af80 15-Sep-2000 John Baldwin <jhb@FreeBSD.org>

Idle processes are always runnable, so let them stay at SRUN.


# 4a6404df 11-Sep-2000 John Baldwin <jhb@FreeBSD.org>

Fix some printf format string warnings due to sizeof(int) != sizeof(long) on
the alpha.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 36e9f877 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various dead-ended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# 42cef09b 19-Aug-1999 Peter Wemm <peter@FreeBSD.org>

Fix a typo and a bug.
- One RTP_PRIO_REALTIME was meant to be RTP_PRIO_IDLE.
- RTP_PRIO_FIFO was not handled.
- Move the usual case first for setrunqueue() etc.


# dba6c5a6 18-Aug-1999 Peter Wemm <peter@FreeBSD.org>

Extract the next runnable process selection out of cpu_switch() into a
fairly machine-independent C routine. gcc actually does a pretty good
job of this.

Reviewed by: msmith (in principle)
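
The extracted routine might be sketched like this, scanning queue heads
from highest priority down; names and fields are illustrative rather
than the 1999 sources:

    #include <sys/queue.h>

    /* Pick the first process from the best non-empty queue. */
    struct proc *
    chooseproc(void)
    {
            struct proc *p;
            int i;

            for (i = 0; i < NQS; i++) {     /* lower index = higher pri */
                    p = TAILQ_FIRST(&queues[i]);
                    if (p != NULL) {
                            TAILQ_REMOVE(&queues[i], p, p_procq);
                            return (p);
                    }
            }
            return (NULL);  /* caller falls back to the idle process */
    }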