Cross Reference: /freebsd-current/sys/kern/kern

History log of /freebsd-current/sys/kern/kern_condvar.c
Revision	Date	Author	Comments
# a58813fd	08-Mar-2024	Mark Johnston <markj@FreeBSD.org>	ktrace: Fix the build when options KTRACE is not configured MFC after: 1 week Reported by: John Nielsen <lists@jnielsen.net>
# 6b353101	18-Jan-2024	Olivier Certner <olce@FreeBSD.org>	SCHEDULER_STOPPED(): Rely on a global variable A commit from 2012 (5d7380f8e34f0083, r228424) introduced 'td_stopsched', on the ground that a global variable would cause all CPUs to have a copy of it in their cache, and consequently of all other variables sharing the same cache line. This is really a problem only if that cache line sees relatively frequent modifications. This was unlikely to be the case back then because nearby variables are almost never modified as well. In any case, today we have a new tool at our disposal to ensure that this variable goes into a read-mostly section containing frequently-accessed variables ('__read_frequently'). Most of the cache lines covering this section are likely to always be in every CPU cache. This makes the second reason stated in the commit message (ensuring the field is in the same cache line as some lock-related fields, since these are accessed in close proximity) moot, as well as the second order effect of requiring an additional line to be present in the cache (the one containing the new 'scheduler_stopped' boolean, see below). From a pure logical point of view, whether the scheduler is stopped is a global state and is certainly not a per-thread quality. Consequently, remove 'td_stopsched', which immediately frees a byte in 'struct thread'. Currently, the latter's size (and layout) stays unchanged, but some of the later re-orderings will probably benefit from this removal. Available bytes at the original position for 'td_stopsched' have been made explicit with the addition of the '_td_pad0' member. Store the global state in the new 'scheduler_stopped' boolean, which is annotated with '__read_frequently'. Replace uses of SCHEDULER_STOPPED_TD() with SCHEDULER_STOPPER() and remove the former as it is now unnecessary. Reviewed by: markj, kib Approved by: markj (mentor) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D43572
# a5ef95cd	14-Jan-2024	Mark Johnston <markj@FreeBSD.org>	condvar: Fix a user-after-free in _cv_wait() when ktrace is enabled When a thread wakes up after sleeping on a CV, it must not dereference the CV structure, as it may already have been freed. At least ZFS relies on this invariant, see commit c636f94bd2ff15be5b904939872b4bce31456c18 for example. Thus, when logging context-switch events, copy the wmesg into a stack buffer while it is still safe to do so, and log that after waking up. While here, move the initial ktrcsw() call later, after assertions and the SCHEDULER_STOPPED_TD() condition are checked. Reported by: syzkaller Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D43450
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# 63ca9ea4	09-Jul-2021	Alexander Motin <mav@FreeBSD.org>	Use sleepq_signal(SLEEPQ_DROP) in cv_signal(). Same as wakeup_one()/wakeup_any() commit before it reduces the lock hold time and so contention. MFC after: 1 week
# 8a36da99	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
# 91fa4707	16-Feb-2017	Mateusz Guzik <mjg@FreeBSD.org>	Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read Sprinkle in few places.
# 5b7d9ae2	06-Sep-2016	Mateusz Guzik <mjg@FreeBSD.org>	cv: do a lockless check for no waiters in cv_signal and cv_broadcastpri In case of some consumers like zfs there are no waiters vast majority of the time Reviewed by: jhb MFC after: 1 week
# e3043798	29-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/kern: spelling fixes in comments. No functional change.
# b4f1d267	31-Mar-2016	John Baldwin <jhb@FreeBSD.org>	Rework handling of thread sleeps before timers are working. Previously, calls to sleep() and cv_wait*() immediately returned during early boot. Instead, permit threads that request a sleep without a timeout to sleep as wakeup() works during early boot. Sleeps with timeouts are harder to emulate without working timers, so just punt and panic explicitly if any thread tries to use those before timers are working. Any threads that depend on timeouts should either wait until SI_SUB_KICK_SCHEDULER to start or they should use DELAY() until timers are available. Until APs are started earlier this should be a no-op as other kthreads shouldn't get a chance to start running until after timers are working regardless of when they were created. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5724
# 7b8cfe26	01-Mar-2016	John Baldwin <jhb@FreeBSD.org>	Use SCHEDULER_STOPPED() in cv_wait() instead of checking panicstr. Reviewed by: kib MFC after: 1 month Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5516
# 11f9ca69	08-Jan-2016	Mark Johnston <markj@FreeBSD.org>	Prevent cv_waiters wraparound. r282971 attempted to fix this problem by decrementing cv_waiters after waking up from sleeping on a condition variable, but this can result in a use-after-free if the CV is freed before all woken threads have had a chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and have cv_signal() explicitly check for sleeping threads once cv_waiters has reached this bound. Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4822
# c636f94b	21-May-2015	John Baldwin <jhb@FreeBSD.org>	Revert r282971. It depends on condvar consumers not destroying condvars until all threads sleeping on a condvar have resumed execution after being awakened. However, there are cases where that guarantee is very hard to provide.
# 5c894ee2	15-May-2015	John Baldwin <jhb@FreeBSD.org>	Previously, cv_waiters was only updated by cv_signal or cv_wait. If a thread awakened due to a time out, then cv_waiters was not decremented. If INT_MAX threads timed out on a cv without an intervening cv_broadcast, then cv_waiters could overflow. To fix this, have each sleeping thread decrement cv_waiters when it resumes. Note that previously cv_waiters was protected by the sleepq chain lock. However, that lock is not held when threads resume from sleep. In addition, the interlock is also not always reacquired after resuming (cv_wait_unlock), nor is it always held by callers of cv_signal() or cv_broadcast(). Instead, use atomic ops to update cv_waiters. Since the sleepq chain lock is still held on every increment, it should still be safe to compare cv_waiters against zero while holding the lock in the wakeup routines as the only way the race should be lost would result in extra calls to sleepq_signal() or sleepq_broadcast(). Differential Revision: https://reviews.freebsd.org/D2427 Reviewed by: benno Reported by: benno (wrap of cv_waiters in the field) MFC after: 2 weeks
# a115fb62	22-Jan-2015	Hans Petter Selasky <hselasky@FreeBSD.org>	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438
# 1a26c3c0	15-Jan-2015	Hans Petter Selasky <hselasky@FreeBSD.org>	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There has been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste
# 7faf4d90	20-Sep-2013	Davide Italiano <davide@FreeBSD.org>	Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With current lock classes KPI it was really difficult because there was no way to pass an rmtracker object to the lock/unlock routines. In order to accomplish the task, modify the aforementioned functions so that they can return (or pass as argument) an uinptr_t, which is in the rm case used to hold a pointer to struct rm_priotracker for current thread. As an added bonus, this fixes rm_sleep() in the rm shared case, which right now can communicate priotracker structure between lc_unlock()/lc_lock(). Suggested by: jhb Reviewed by: jhb Approved by: re (delphij)
# 46153735	03-Mar-2013	Davide Italiano <davide@FreeBSD.org>	MFcalloutng: Extend condvar(9) KPI introducing sbt variant of cv_timedwait. This rely on the previously committed sleepq_set_timeout_sbt(). Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil
# 0a15e5d3	13-Sep-2012	Attilio Rao <attilio@FreeBSD.org>	Remove all the checks on curthread != NULL with the exception of some MD trap checks (eg. printtrap()). Generally this check is not needed anymore, as there is not a legitimate case where curthread != NULL, after pcpu 0 area has been properly initialized. Reviewed by: bde, jhb MFC after: 1 week
# 88bf5036	20-Apr-2012	John Baldwin <jhb@FreeBSD.org>	Include the associated wait channel message for context switch ktrace records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 52255936	26-Feb-2009	Ed Schouten <ed@FreeBSD.org>	Remove unused variables `p' and unneeded assignments of `rval'. Found by: LLVM's scan-build
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 7d43ca69	25-Sep-2008	John Baldwin <jhb@FreeBSD.org>	- Don't do a WITNESS_SAVE() on the interlock if it is Giant in the condition variable wait routines. DROP_GIANT() already manages that state in the Giant interlock case. - Assert that Giant is held when it is passed as a sleep interlock.
# 414e7679	07-Aug-2008	John Baldwin <jhb@FreeBSD.org>	Permit Giant to be passed as the explicit interlock either to msleep/mtx_sleep or the various cv_wait() routines. Currently, the "unlock" behavior of PDROP and cv_wait_unlock() with Giant is not permitted as it is will be confusing since Giant is fully unrecursed and unlocked during a thread sleep. This is handy for subsystems which wish to allow unlocked drivers to continue to use Giant such as CAM, the new TTY layer, and the new USB stack. CAM currently uses a hack that I told Scott to use because I really didn't want to permit this behavior, and the TTY and USB patches both have various patches to permit this. MFC after: 2 weeks
# da7bbd2c	05-Aug-2008	John Baldwin <jhb@FreeBSD.org>	If a thread that is swapped out is made runnable, then the setrunnable() routine wakes up proc0 so that proc0 can swap the thread back in. Historically, this has been done by waking up proc0 directly from setrunnable() itself via a wakeup(). When waking up a sleeping thread that was swapped out (the usual case when waking proc0 since only sleeping threads are eligible to be swapped out), this resulted in a bit of recursion (e.g. wakeup() -> setrunnable() -> wakeup()). With sleep queues having separate locks in 6.x and later, this caused a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock). An attempt was made to fix this in 7.0 by making the proc0 wakeup use the ithread mechanism for doing the wakeup. However, this required grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated into the same LOR since the thread lock would be some other sleepq lock. Fix this by deferring the wakeup of the swapper until after the sleepq lock held by the upper layer has been locked. The setrunnable() routine now returns a boolean value to indicate whether or not proc0 needs to be woken up. The end result is that consumers of the sleepq API such as *sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup proc0 if they get a non-zero return value from sleepq_abort(), sleepq_broadcast(), or sleepq_signal(). Discussed with: jeff Glanced at by: sam Tested by: Jurgen Weber jurgen - ish com au MFC after: 2 weeks
# c5aa6b58	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Pass the priority argument from sleep() into sleepq and down into sched_sleep(). This removes extra thread_lock() acquisition and allows the scheduler to decide what to do with the static boost. - Change the priority arguments to cv_ to match sleepq/msleep/etc. where 0 means no priority change. Catch -1 in cv_broadcastpri() and convert it to 0 for now. - Set a flag when sleeping in a way that is compatible with swapping since direct priority comparisons are meaningless now. - Add a sysctl to ule, kern.sched.static_boost, that defaults to on which controls the boost behavior. Turning it off gives better performance in some workloads but needs more investigation. - While we're modifying sleepq, change signal and broadcast to both return with the lock held as the lock was held on enter. Reviewed by: jhb, peter
# d72e80f0	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	Commit 2/14 of sched_lock decomposition. - Adapt sleepqueues to the new thread_lock() mechanism. - Delay assigning the sleep queue spinlock as the thread lock until after we've checked for signals. It is illegal for a thread to return in mi_switch() with any lock assigned to td_lock other than the scheduler locks. - Change sleepq_catch_signals() to do the switch if necessary to simplify the callers. - Simplify timeout handling now that locking a sleeping thread has the side-effect of locking the sleepqueue. Some previous races are no longer possible. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 9fa7ce0f	08-May-2007	John Baldwin <jhb@FreeBSD.org>	Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by 1) adding the thread to the sleepq via sleepq_add() before dropping the lock, and 2) dropping the sleepq lock around calls to lc_unlock() for sleepable locks (i.e. locks that use sleepq's in their implementation).
# 8f27b08e	21-Mar-2007	John Baldwin <jhb@FreeBSD.org>	Rename the cv_wait() functions to _cv_wait() and change their second argument from a mutex to a lock_object. Add cv_wait() wrapper macros that accept either a mutex, rwlock, or sx lock as the second argument and convert it to a lock_object and then call _cv_wait(). Basically, the visible difference is that you can now use rwlocks and sx locks with condition variables using the same API as with mutexes.
# aa89d8cd	21-Mar-2007	John Baldwin <jhb@FreeBSD.org>	Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes, rwlocks, and sx locks to 'lock_object'.
# 503916a7	21-Mar-2007	John Baldwin <jhb@FreeBSD.org>	Don't use cv_wait_unlock() to implement cv_wait(). Instead, implement cv_wait() fully and add missing KTRACE context switch traces.
# 6cbb70e2	15-Dec-2006	Kip Macy <kmacy@FreeBSD.org>	Add second sleep queue so that sx and lockmgr can have separate sleep queues for shared and exclusive acquisitions Submitted by: Attilio Rao Approved by: jhb
# 7ee07175	15-Nov-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Change sleepq_add(9) argument from 'struct mtx ' to 'struct lock_object ', which allows to use it with different kinds of locks. For example it allows to implement Solaris conditions variables which will be used in ZFS port on top of sx(9) locks. Reviewed by: jhb
# c008d517	22-Feb-2006	David Xu <davidxu@FreeBSD.org>	Fix a sleep queue race for KSE thread. Reviewed by: jhb
# 94f0972b	15-Feb-2006	David Xu <davidxu@FreeBSD.org>	Fix a long standing race between sleep queue and thread suspension code. When a thread A is going to sleep, it calls sleepq_catch_signals() to detect any pending signals or thread suspension request, if nothing happens, it returns without holding process lock or scheduler lock, this opens a race window which allows thread B to come in and do process suspension work, however since A is still at running state, thread B can do nothing to A, thread A continues, and puts itself into actually sleeping state, but B has never seen it, and it sits there forever until B is woken up by other threads sometimes later(this can be very long delay or never happen). Fix this bug by forcing sleepq_catch_signals to return with scheduler lock held. Fix sleepq_abort() by passing it an interrupted code, previously, it worked as wakeup_one(), and the interruption can not be identified correctly by sleep queue code when the sleeping thread is resumed. Let thread_suspend_check() returns EINTR or ERESTART, so sleep queue no longer has to use SIGSTOP as a hack to build a return value. Reviewed by: jhb MFC after: 1 week
# 92f44a3f	11-Dec-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	Contributions from XFS for FreeBSD project: - Implement cv_wait_unlock() method which has semantics compatible with the sv_wait() method in IRIX. For cv_wait_unlock(), the lock must be held before entering the function, but is not held when the function is exited. - Implement the existing cv_wait() function in terms of cv_wait_unlock(). Submitted by: kan Feedback from: jhb, trhodes, Christoph Hellwig <hch at infradead dot org>
# 2ff0e645	12-Oct-2004	John Baldwin <jhb@FreeBSD.org>	Refine the turnstile and sleep queue interfaces just a bit: - Add a new _lock() call to each API that locks the associated chain lock for a lock_object pointer or wait channel. The _lookup() functions now require that the chain lock be locked via _lock() when they are called. - Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup the associated queue structure internally via _lookup() rather than accepting a pointer from the caller. For turnstiles, this means that the actual lookup of the turnstile in the hash table is only done when the thread actually blocks rather than being done on each loop iteration in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup() is no longer used outside of the sleep queue code except to implement an assertion in cv_destroy(). - Change sleepq_broadcast() and sleepq_signal() to require that the chain lock is already required. For condition variables, this lets the cv_broadcast() and cv_signal() functions lock the sleep queue chain lock while testing the waiters count. This means that the waiters count internal to condition variables is no longer protected by the interlock mutex and cv_broadcast() and cv_signal() now no longer require that the interlock be held when they are called. This lets consumers of condition variables drop the lock before waking other threads which can result in fewer context switches. MFC after: 1 month
# 007ddf7e	19-Aug-2004	John Baldwin <jhb@FreeBSD.org>	Now that the return value semantics of cv's for multithreaded processes have been unified with that of msleep(9), further refine the sleepq interface and consolidate some duplicated code: - Move the pre-sleep checks for theaded processes into a thread_sleep_check() function in kern_thread.c. - Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c. Specifically, if a thread is awakened by something other than a signal while checking for signals before going to sleep, clear TDF_SINTR in sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in that edge case during an interruptible sleep. Also, fix sleepq_check_signals() to properly handle the condition if TDF_SINTR is clear rather than requiring the callers of the sleepq API to notice this edge case and call a non-_sig variant of sleepq_wait(). - Clarify the flags arguments to sleepq_add(), sleepq_signal() and sleepq_broadcast() by creating an explicit submask for sleepq types. Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of 0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and move the setting of TDF_SINTR to sleepq_add() if this flag is set rather than sleepq_catch_signals(). Note that it is the caller's responsibility to ensure that sleepq_catch_signals() is called if and only if this flag is passed to the preceeding sleepq_add(). Note that this also removes a sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures that for an interruptible sleep, TDF_SINTR is always set when TD_ON_SLEEPQ() is true.
# 274f8f48	10-Aug-2004	John Baldwin <jhb@FreeBSD.org>	Synchronize the extra SA threading checks and return value handling of condition variables with that of msleep(). Reviewed by: davidxu
# a5471e4e	28-Jun-2004	John Baldwin <jhb@FreeBSD.org>	Remove the signal_caught argument from sleepq_timedwait() as it was effectively always zero.
# 9000d57d	06-Apr-2004	John Baldwin <jhb@FreeBSD.org>	Associate a simple count of waiters with each condition variable. The count is protected by the mutex that protects the condition, so the count does not require any extra locking or atomic operations. It serves as an optimization to avoid calling into the sleepqueue code at all if there are no waiters. Note that the count can get temporarily out of sync when threads sleeping on a condition variable time out or are aborted. However, it doesn't hurt to call the sleepqueue code for either a signal or a broadcast when there are no waiters, and the count is never out of sync in the opposite direction unless we have more than INT_MAX sleeping threads.
# 1ed3e44f	12-Mar-2004	John Baldwin <jhb@FreeBSD.org>	- Remove old sleep queues. - Remove sleepqueue argument from sleepq_set_timeout() since it is not used.
# 44f3b092	27-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Switch the sleep/wakeup and condition variable implementations to use the sleep queue interface: - Sleep queues attempt to merge some of the benefits of both sleep queues and condition variables. Having sleep qeueus in a hash table avoids having to allocate a queue head for each wait channel. Thus, struct cv has shrunk down to just a single char * pointer now. However, the hash table does not hold threads directly, but queue heads. This means that once you have located a queue in the hash bucket, you no longer have to walk the rest of the hash chain looking for threads. Instead, you have a list of all the threads sleeping on that wait channel. - Outside of the sleepq code and the sleep/cv code the kernel no longer differentiates between cv's and sleep/wakeup. For example, calls to abortsleep() and cv_abort() are replaced with a call to sleepq_abort(). Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and cv_waitq_remove() have been replaced with calls to sleepq_remove(). - The sched_sleep() function no longer accepts a priority argument as sleep's no longer inherently bump the priority. Instead, this is soley a propery of msleep() which explicitly calls sched_prio() before blocking. - The TDF_ONSLEEPQ flag has been dropped as it was never used. The associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been dropped and replaced with a single explicit clearing of td_wchan. TD_SET_ONSLEEPQ() would really have only made sense if it had taken the wait channel and message as arguments anyway. Now that that only happens in one place, a macro would be overkill.
# 29bcc451	24-Jan-2004	Jeff Roberson <jeff@FreeBSD.org>	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.
# 512824f8	09-Nov-2003	Seigo Tanimura <tanimura@FreeBSD.org>	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current
# 34178711	01-Jul-2003	David Xu <davidxu@FreeBSD.org>	Allow SA process unblocks a thread blocked in condition variable. Reviewed by: deischen
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 90af4afa	13-May-2003	John Baldwin <jhb@FreeBSD.org>	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
# 53862173	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Test the P_WEXIT flag while already hold the proc lock instead of right after dropping it.
# 0d49bb4b	31-Mar-2003	Julian Elischer <julian@FreeBSD.org>	Do NOT return from an non-interruptable cv_wait, falsely claiming to have timed out. I don't know what I was thinking..
# 26306795	04-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
# b89bc9e6	27-Feb-2003	Hartmut Brandt <harti@FreeBSD.org>	When a process has been waiting on a condition variable or mutex the td_wmesg field in the thread structure points to the description string of the condition variable or mutex. If the condvar or the mutex had been initialized from a loadable module that was unloaded in the meantime, td_wmesg may now point to invalid memory. Retrieving the process table now may panic the kernel (or access junk). Setting the td_wmesg field to NULL after unblocking on the condvar/mutex prevents this panic. PR: kern/47408 Approved by: jake (mentor)
# 4e997f4b	25-Jan-2003	Jeff Roberson <jeff@FreeBSD.org>	- Call sched_sleep() instead of rolling our own in cv_waitq_add().
# 93a7aa79	27-Dec-2002	Julian Elischer <julian@FreeBSD.org>	Add code to ddb to allow backtracing an arbitrary thread. (show thread {address}) Remove the IDLE kse state and replace it with a change in the way threads sahre KSEs. Every KSE now has a thread, which is considered its "owner" however a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. n this case the owner remains the same and the KSE will revert to the owner when the other work has been completed. All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait(). kse_release() does not leave a KSE with no thread any more but converts the existing thread into teh KSE's owner, and sets it up for doing an upcall. It is just inhibitted from being scheduled until there is some reason to do an upcall. Remove all trace of the kse_idle queue since it is no-longer needed. "Idle" KSEs are now on the loanable queue.
# 9d102777	25-Oct-2002	Julian Elischer <julian@FreeBSD.org>	More work on the interaction between suspending and sleeping threads. Also clean up some code used with 'single-threading'. Reviewed by: davidxu
# 48bfcddd	08-Oct-2002	Julian Elischer <julian@FreeBSD.org>	Round out the facilty for a 'bound' thread to loan out its KSE in specific situations. The owner thread must be blocked, and the borrower can not proceed back to user space with the borrowed KSE. The borrower will return the KSE on the next context switch where teh owner wants it back. This removes a lot of possible race conditions and deadlocks. It is consceivable that the borrower should inherit the priority of the owner too. that's another discussion and would be simple to do. Also, as part of this, the "preallocatd spare thread" is attached to the thread doing a syscall rather than the KSE. This removes the need to lock the scheduler when we want to access it, as it's now "at hand". DDB now shows a lot mor info for threaded proceses though it may need some optimisation to squeeze it all back into 80 chars again. (possible JKH project) Upcalls are now "bound" threads, but "KSE Lending" now means that other completing syscalls can be completed using that KSE before the upcall finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is one of the highest priorities in the KSE system.) The upcall when it happens will present all the completed syscalls to the KSE for selection.
# 71fad9fd	11-Sep-2002	Julian Elischer <julian@FreeBSD.org>	Completely redo thread states. Reviewed by: davidxu@freebsd.org
# 67bdda97	02-Sep-2002	David Xu <davidxu@FreeBSD.org>	fix bogus CTR3 message. Reviewed by: julian@freebsd.org (mentor)
# d13947c3	28-Aug-2002	Peter Wemm <peter@FreeBSD.org>	updatepri() works on a ksegrp (where the scheduling parameters are), so directly give it the ksegrp instead of the thread. The only thing it used to use in the thread was the ksegrp. Reviewed by: julian
# b8e45df7	30-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Remove code that removes thread from sleep queue before adding it to a condvar wait. We do not have asleep() any more so this can not happen.
# 13326777	30-Jul-2002	Seigo Tanimura <tanimura@FreeBSD.org>	In endtsleep() and cv_timedwait_end(), a thread marked TDF_TIMEOUT may be swapped out. Do not put such the thread directly back to the run queue. Spotted by: David Xu <davidx@viasoft.com.cn> While I am here, s/PS_TIMEOUT/TDF_TIMEOUT/.
# 9eb881f8	30-Jul-2002	Seigo Tanimura <tanimura@FreeBSD.org>	- Optimize wakeup() and its friends; if a thread waken up is being swapped in, we do not have to ask for the scheduler thread to do that. - Assert that a process is not swapped out in runq functions and swapout(). - Introduce thread_safetoswapout() for readability. - In swapout_procs(), perform a test that may block (check of a thread working on its vm map) first. This lets us call swapout() with the sched_lock held, providing a better atomicity.
# 1d7b9ed2	29-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Create a new thread state to describe threads that would be ready to run except for the fact tha they are presently swapped out. Also add a process flag to indicate that the process has started the struggle to swap back in. This will be needed for the case where multiple threads start the swapin action top a collision. Also add code to stop a process fropm being swapped out if one of the threads in this process is actually off running on another CPU.. that might hurt... Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
# fe799533	16-Jul-2002	Andrew Gallatin <gallatin@FreeBSD.org>	Allow alphas to do crashdumps: Refuse to run anything in choosethread() after a panic which is not an interrupt thread, or the thread which caused the panic. Also, remove panicstr checks from msleep() and from cv_wait() in order to allow threads to go to sleep and yeild the cpu to the panicing thread, or to an interrupt thread which might be doing the crashdump. Reviewed by: jhb (and it was mostly his idea too)
# d5cb7e14	01-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Fix failure to correctly transition back to sleep mode.
# e602ba25	29-Jun-2002	Julian Elischer <julian@FreeBSD.org>	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 9ba7fe1b	06-Jun-2002	John Baldwin <jhb@FreeBSD.org>	- Catch up to new ktrace API. - ktrace trace points in msleep() and cv_wait() no longer need Giant.
# 628855e7	29-May-2002	Julian Elischer <julian@FreeBSD.org>	CURSIG() is not a macro so rename it cursig(). Obtained from: KSE tree
# 4bc37205	23-Apr-2002	Jeffrey Hsu <hsu@FreeBSD.org>	The cold and panicstr variables do not need to be protected by sched_lock. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Reviewed by: jake & jhb in principle
# e7876c09	29-Mar-2002	Dan Moschuk <dan@FreeBSD.org>	Nuke CV_DEBUG in favour of INVARIANTS. Approved by: jhb
# 2c100766	11-Feb-2002	Julian Elischer <julian@FreeBSD.org>	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# c86b6ff5	05-Jan-2002	John Baldwin <jhb@FreeBSD.org>	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha
# a48740b6	09-Dec-2001	David E. O'Brien <obrien@FreeBSD.org>	Update to C99, s/__FUNCTION__/__func__/.
# 66f769fe	18-Sep-2001	Peter Wemm <peter@FreeBSD.org>	Add missing ; in last commit Pointy-hat-to: jhb
# 9ef3a985	18-Sep-2001	John Baldwin <jhb@FreeBSD.org>	Use a 'p' variable instead of repetitively indirecting td->td_proc for signal things that are still per-process and won't be per-thread.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 91a4536f	21-Aug-2001	John Baldwin <jhb@FreeBSD.org>	- Fix a bug in the previous workaround for the tsleep/endtsleep race. callout_stop() would fail in two cases: 1) The timeout was currently executing, and 2) The timeout had already executed. We only needed to work around the race for 1). We caught some instances of 2) via the PS_TIMEOUT flag, however, if endtsleep() fired after the process had been woken up but before it had resumed execution, PS_TIMEOUT would not be set, but callout_stop() would fail, so we would block the process until endtsleep() resumed it. Except that endtsleep() had already run and couldn't resume it. This adds a new flag PS_TIMOFAIL to indicate the case of 2) when PS_TIMEOUT isn't set. - Implement this race fix for condition variables as well. Tested by: sos
# d652b3d9	05-Jul-2001	Jake Burkholder <jake@FreeBSD.org>	Backout mwakeup, etc.
# 9316aed2	03-Jul-2001	Jake Burkholder <jake@FreeBSD.org>	Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop. These take an additional mutex argument, which is dropped before any processes are made runnable. This can avoid contention on the mutex if the processes would immediately acquire it, and is done in such a way that wakeups will not be lost. Reviewed by: jhb
# 87f9ffb8	22-Jun-2001	John Baldwin <jhb@FreeBSD.org>	- Lock CURSIG with the proc lock and don't release the proc lock until after grabbing the sched lock to close a race. - Lock ktrace points with Giant.
# fb919e4d	01-May-2001	Mark Murray <markm@FreeBSD.org>	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# c739adbf	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Pass in a pointer to the mutex's lock_object as the second argument to WITNESS_SLEEP() rather than the mutex itself.
# 19284646	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Rework the witness code to work with sx locks as well as mutexes. - Introduce lock classes and lock objects. Each lock class specifies a name and set of flags (or properties) shared by all locks of a given type. Currently there are three lock classes: spin mutexes, sleep mutexes, and sx locks. A lock object specifies properties of an additional lock along with a lock name and all of the extra stuff needed to make witness work with a given lock. This abstract lock stuff is defined in sys/lock.h. The lockmgr constants, types, and prototypes have been moved to sys/lockmgr.h. For temporary backwards compatability, sys/lock.h includes sys/lockmgr.h. - Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin locks held. By making this per-cpu, we do not have to jump through magic hoops to deal with sched_lock changing ownership during context switches. - Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with proc->p_sleeplocks, which is a list of held sleep locks including sleep mutexes and sx locks. - Add helper macros for logging lock events via the KTR_LOCK KTR logging level so that the log messages are consistent. - Add some new flags that can be passed to mtx_init(): - MTX_NOWITNESS - specifies that this lock should be ignored by witness. This is used for the mutex that blocks a sx lock for example. - MTX_QUIET - this is not new, but you can pass this to mtx_init() now and no events will be logged for this lock, so that one doesn't have to change all the individual mtx_lock/unlock() operations. - All lock objects maintain an initialized flag. Use this flag to export a mtx_initialized() macro that can be safely called from drivers. Also, we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness performs the corresponding checks using the initialized flag. - The lock order reversal messages have been improved to output slightly more accurate file and line numbers.
# 28aa95b6	06-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Use the proc lock to protect access to p_sigacts->ps_sigintr.
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 981808d1	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to P_FOO -> PS_FOO changes in proc flags.
# 238510fc	15-Jan-2001	Jason Evans <jasone@FreeBSD.org>	Implement condition variables.