History log of /freebsd-current/sys/kern/kern_sx.c
Revision Date Author Comments
# 6b353101 18-Jan-2024 Olivier Certner <olce@FreeBSD.org>

SCHEDULER_STOPPED(): Rely on a global variable

A commit from 2012 (5d7380f8e34f0083, r228424) introduced
'td_stopsched', on the grounds that a global variable would cause all
CPUs to have a copy of it in their cache, and consequently of all other
variables sharing the same cache line.

This is really a problem only if that cache line sees relatively
frequent modifications. This was unlikely to be the case back then
because nearby variables are almost never modified as well. In any
case, today we have a new tool at our disposal to ensure that this
variable goes into a read-mostly section containing frequently-accessed
variables ('__read_frequently'). Most of the cache lines covering this
section are likely to always be in every CPU cache. This makes the
second reason stated in the commit message (ensuring the field is in the
same cache line as some lock-related fields, since these are accessed in
close proximity) moot, as well as the second order effect of requiring
an additional line to be present in the cache (the one containing the
new 'scheduler_stopped' boolean, see below).

From a pure logical point of view, whether the scheduler is stopped is
a global state and is certainly not a per-thread quality.

Consequently, remove 'td_stopsched', which immediately frees a byte in
'struct thread'. Currently, the latter's size (and layout) stays
unchanged, but some of the later re-orderings will probably benefit from
this removal. Available bytes at the original position for
'td_stopsched' have been made explicit with the addition of the
'_td_pad0' member.

Store the global state in the new 'scheduler_stopped' boolean, which is
annotated with '__read_frequently'.

Replace uses of SCHEDULER_STOPPED_TD() with SCHEDULER_STOPPED() and
remove the former as it is now unnecessary.
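
For illustration, a minimal sketch of the shape of this change (per the
description above; not the exact committed code):

	/* Global, read often, written essentially once. */
	bool __read_frequently scheduler_stopped;

	#define	SCHEDULER_STOPPED()	__predict_false(scheduler_stopped)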

Reviewed by: markj, kib
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43572


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 8bd79453 23-Oct-2023 Mateusz Guzik <mjg@FreeBSD.org>

sx: fixup copy pasto in previous

Spotted by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")


# c35f527e 23-Oct-2023 Mateusz Guzik <mjg@FreeBSD.org>

sx: unset td_wantedlock around going to sleep

Otherwise it can crash in sleepq_wait_sig -> sleepq_catch_signals ->
sig_ast_checksusp -> thread_suspend_check due to a mutex acquire.

Reported by: pho
Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7530de77 22-Oct-2023 Mateusz Guzik <mjg@FreeBSD.org>

thread: add td_wantedlock

This enables obtaining lock information threads are actively waiting for
while sampling. Without the change one would only see a bunch of calls
to lock_delay(), where the stacktrace often does not reveal what the
lock might be.

Note this is not the same as lock profiling, which only produces data
for cases which wait for locks.

struct thread already has a td_lockname field, but I did not use it
because it has different semantics -- denotes when the thread is off
cpu. At the same time it could not be converted to hold a lock_object
pointer because non-curthread access would no longer be guaranteed to be
safe -- by the time it reads the pointer the lock might have been taken,
released and the object containing it freed.

Sample usage with dtrace:
rm /tmp/out.kern_stacks ; dtrace -x stackframes=100 -n 'profile-997 { @[curthread->td_wantedlock != NULL ? stringof(curthread->td_wantedlock->lo_name) : stringof("\n"), stack()] = count(); }' -o /tmp/out.kern_stacks

This also facilitates addition of lock information to traces produced by
hwpmc.

Note: spinlocks are not supported at the moment.
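
A rough sketch of the intended use in a lock slow path (simplified and
illustrative, not the actual diff; 'lda' is the usual lock_delay state):

	curthread->td_wantedlock = &sx->lock_object;	/* visible to samplers */
	while (!atomic_fcmpset_acq_ptr(&sx->sx_lock, &x, tid))
		lock_delay(&lda);	/* stacks now show which lock is wanted */
	curthread->td_wantedlock = NULL;	/* clear once no longer waiting */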

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# de709b14 21-Feb-2023 Mateusz Guzik <mjg@FreeBSD.org>

sx: whack set-but-not-used warn in _sx_slock_hard

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 6a467cc5 23-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

lockprof: pass lock type as an argument instead of reading the spin flag


# f1f98706 18-Apr-2021 Warner Losh <imp@FreeBSD.org>

Minor style cleanup

We prefer 'while (0)' to 'while(0)' according to grep and style(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After: 3 days
Sponsored by: Netflix


# f90d57b8 23-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

locks: push lock_delay_arg_init calls down

Minor cleanup to skip doing them when recursing on locks and so that
they can act on found lock value if need be.


# 094c148b 23-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

sx: drop spurious volatile keyword


# c795344f 23-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix a long standing bug for primitives with kdtrace but without spinning

In such a case the second argument to lock_delay_arg_init was NULL which was
immediately causing a null pointer deref.

Since the structure is only used for the spin count, provide a dedicated
routine initializing it.

Reported by: andrew


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 2e77cad1 04-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

locks: add default delay struct

Use it for all primitives. This makes everything fit in 8 bytes.


# 6b8dd26e 04-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

locks: convert delay times to u_short

int is just a waste of space for this purpose.


# fea73412 24-Dec-2019 Conrad Meyer <cem@FreeBSD.org>

sleep(9), sleepqueue(9): const'ify wchan pointers

_sleep(9), wakeup(9), sleepqueue(9), et al do not dereference or modify the
channel pointers provided in any way; they are merely used as intptrs into a
dictionary structure to match waiters with wakers. Correctly annotate this
such that _sleep() and wakeup() may be used on const pointers without
invoking ugly patterns like __DECONST(). Plumb const through all of the
underlying sleepqueue bits.

No functional change.
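
A small consumer-side sketch (the 'cfg' names are made up):

	static const struct cfg *cfgp;	/* const object used as a wait channel */

	tsleep(cfgp, PWAIT, "cfgwt", hz);	/* no __DECONST() dance needed */
	wakeup(cfgp);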

Reviewed by: rlibby
Discussed with: kib, markj
Differential Revision: https://reviews.freebsd.org/D22914


# befd3e35 05-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

sx: check for SX_LOCK_SHARED | SX_LOCK_WRITE_SPINNER when exclusive-locking

First, this removes a spurious difference compared to rw locks.
More importantly though this avoids a trip through sleepq code if the lock
happens to be caught in this state.


# f26db694 05-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: retire SX_NOADAPTIVE

The flag has not been used by anything for years and supporting it requires an
explicit read from the lock when entering the slow path.

Flag value is left unused on purpose.

Sponsored by: The FreeBSD Foundation


# d54474e6 13-Nov-2018 Eric van Gyzen <vangyzen@FreeBSD.org>

Make no assertions about lock state when the scheduler is stopped.

Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths. I did this for mutexes in r306346.

Reported by: Travis Lane <tlane@isilon.com>
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon


# ee252fc9 22-May-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: fixup a braino in r334024

If a thread waiting on sx dropped Giant it would not be properly
reacquired on exit from the routine, later resulting in panics
indicating Giant is not held (when it should be).

The bug was not present in the original patch sent to pho, I wittingly
added it just prior to the commit and only smoke-tested it.

Reported by: pho


# 2466d12b 22-May-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: port over writer starvation prevention measures from rwlock

A constant stream of readers could completely starve writers and this is not
a hypothetical scenario.

The 'poll2_threads' test from the will-it-scale suite reliably starves writers
even with concurrency < 10 threads.

The problem was run into and diagnosed by dillon@backplane.com

There was next to no change in lock contention profile during -j 128 pkg build,
despite an sx lock being at the top.

Tested by: pho


# 1dce110f 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

fix uninitialized variable warning in reader locks


# e0e259a8 10-Apr-2018 Mateusz Guzik <mjg@FreeBSD.org>

locks: extend speculative spin waiting for readers to drain

Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.

Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.

Nonetheless, in the majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.

The current approach is pretty simple: snatch the number of current readers
and perform that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.
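
A rough sketch of that loop (simplified; 'sx_spin_budget' stands in for the
10k total-pause budget described above):

	spins = 0;
	while ((n = SX_SHARERS(sx->sx_lock)) > 0) {
		for (i = 0; i < n; i++)
			cpu_spinwait();		/* one pause per reader observed */
		spins += n;
		if (spins > sx_spin_budget)
			break;			/* still read-locked; go to sleep */
	}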

Given the previously noted problem of not knowing whether spinning makes
any sense to begin with, the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in-between re-reads of the lock is typically 16384 or 32768. It was
selected as the limit of total spins. The lower bound is set to the
already-present 10000 so as not to change it for smaller machines.

Bumping the limit reduces system time by a few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).

Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.


# 09bdec20 17-Mar-2018 Mateusz Guzik <mjg@FreeBSD.org>

locks: slightly depessimize lockstat

The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.

Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
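
A sketch of the kind of gating described above (probe name illustrative):

	#ifdef KDTRACE_HOOKS
	int64_t all_time = 0;

	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(sx__acquire)))
		all_time -= lockstat_nsecs(&sx->lock_object);
	#endif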

This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total

So note this is still a significant hit.

LOCK_PROFILING results are not affected.


# a8e747c5 04-Mar-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: don't do an atomic op in upgrade if it cannot succeed

The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.

This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.


# d94df98c 04-Mar-2018 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix a corner case in r327399

If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.

The bug sometimes manifested itself in stalls during -j 128 package builds.

Refactor the code to fix the bug, and while here remove some of the gratuitous
differences between rw and sx locks.


# c505b599 02-Mar-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: fix adaptive spinning broken in r327397

The condition was flipped.

In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.

For instance make -s -j 128 buildkernel:

before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total

ps.
.-'---`-. .-'---`-.
,' `. ,' `.
| \ | \
| \ | \
\ _ \ \ _ \
,\ _ ,'-,/-)\ ,\ _ ,'-,/-)\
( * \ \,' ,' ,'-) ( * \ \,' ,' ,'-)
`._,) -',-') `._,) -',-')
\/ ''/ \/ ''/
) / / ) / /
/ ,'-' / ,'-'


# e4ccf57f 16-Feb-2018 Mateusz Guzik <mjg@FreeBSD.org>

Undo LOCK_PROFILING pessimisation after r313454 and r313455

With the option used to compile the kernel both sx and rw shared ops would
always go to the slow path which added avoidable overhead even when the
facility is disabled.

Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.

Restore old behaviour of going there only when profiling is enabled.

This change is a no-op for kernels without LOCK_PROFILING (which is the
default).


# 1b54ffc8 13-Jan-2018 Mateusz Guzik <mjg@FreeBSD.org>

sx: retry hard shared unlock just like in r327905 for rwlocks


# efa9f177 30-Dec-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: adjust loop limit check when waiting for readers

The check was for the exact value, but since the counter started being
incremented by the number of readers it could have jumped over the limit.


# cde25ed4 30-Dec-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: fix up non-smp compilation after r327397


# 28f1a9e3 30-Dec-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: re-check the reason to go to sleep after locking sleepq/turnstile

In both rw and sx locks we always go to sleep if the lock owner is not
running.

We do spin for some time if the lock is read-locked.

However, if we decide to go to sleep due to the lock owner being off cpu
and after sleepq/turnstile gets acquired the lock is read-locked, we should
fall back to the aforementioned wait.


# fb106123 30-Dec-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: read the SX_NOADAPTIVE flag and Giant ownership only once

These used to be read multiple times when waiting for the lock to become
free, which had the potential to issue completely avoidable traffic.


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# cec17473 25-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: change sunlock to wake waiters up if it locked sleepq

sleepq is only locked if curthread is the last reader. By the time
the lock gets acquired new ones could have arrived. The previous code
would unlock and loop back. This results in spurious relocking of sleepq.

This is a step towards xadd-based unlock routine.


# 93118b62 25-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: retry turnstile/sleepq loops on failed cmpset

In order to go to sleep threads set waiter flags, but that can spuriously
fail e.g. when a new reader arrives. Instead of unlocking everything and
looping back, re-evaluate the new state while still holding the lock necessary
to go to sleep.


# dbe4541d 24-Nov-2017 Mark Johnston <markj@FreeBSD.org>

Have lockstat:::sx-release fire only after the lock state has changed.

MFC after: 1 week


# 26d94f99 24-Nov-2017 Mark Johnston <markj@FreeBSD.org>

Add a missing lockstat:::sx-downgrade probe.

We were returning without firing the probe when the lock had no shared
waiters.

MFC after: 1 week


# 2d96bd88 22-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: unbreak debug after r326107

An assertion was modified to use the found value, but it was not updated to
handle a race where blocked threads appear after the entrance to the function.

Move the assertion down to the area protected with sleepq lock where the
lock is read anyway. This does not affect coverage of the assertion and
is consistent with what rw locks are doing.

Reported by: Shawn Webb


# b584eb2e 22-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: pass the found lock value to unlock slow path

This avoids an explicit read later.

While here whack the cheaply obtainable 'tid' argument.


# 013c0b49 22-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: remove the file + line argument from internal primitives when not used

The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
for many locks even in production kernels.

While here whack the tid argument from wlock hard and xlock hard.

There is no kbi change of any sort - "external" primitives still accept the
pair.


# 284194f1 17-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix compilation issues without SMP or KDTRACE_HOOKS


# bc24577c 16-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: perform a minor cleanup of the unlock slowpath

No functional changes.


# ae7d25a4 16-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: pull up PMC_SOFT_CALLs out of slow path loops


# e41d6166 16-Nov-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: avoid branches in the slow path if lockstat is disabled


# d07e22cd 05-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: take the number of readers into account when waiting

Previous code would always spin once before checking the lock. But a lock
with e.g. 6 readers is not going to become free in the duration of one spin
even if they start draining immediately.

Conservatively perform one for each reader.

Note that the total number of allowed spins is still extremely small and is
subject to change later.

MFC after: 1 week


# 20a15d17 05-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: partially tidy up waiting on readers

Spin first instead of instantly re-reading and don't re-read after
spinning is finished - the state is already known.

Note the code is subject to significant changes later.

MFC after: 1 week


# 574adb65 06-Sep-2017 Mateusz Guzik <mjg@FreeBSD.org>

Sprinkle __read_frequently on a few obvious places.

Note that some of the annotated variables should probably change their types
to something smaller, preferably bit-sized.


# 704cb42f 19-Jun-2017 Mark Johnston <markj@FreeBSD.org>

Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.

Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.

MFC after: 3 days
Sponsored by: Dell EMC Isilon


# a2101806 28-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: ensure proper barriers are used with atomic ops when necessary

Unclear how, but the locking routine for mutexes was using the *release*
barrier instead of acquire. This must have been either a copy-pasto or bad
completion.

Going through other uses of atomics shows no barriers in:
- upgrade routines (addressed in this patch)
- sections protected with turnstile locks - this should be fine as necessary
barriers are in the worst case provided by turnstile unlock

I would like to thank Mark Millard and andreast@ for reporting the problem and
testing previous patches before the issue got identified.

ps.
.-'---`-.
,' `.
| \
| \
\ _ \
,\ _ ,'-,/-)\
( * \ \,' ,' ,'-)
`._,) -',-')
\/ ''/
) / /
/ ,'-'

Hardware provided by: IBM LTC


# b247fd39 19-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: make trylock routines check for 'unowned' value

Since fcmpset can fail without lock contention e.g. on arm, it was possible
to get spurious failures when the caller was expecting the primitive to succeed.

Reported by: mmel


# 5c5df0d9 18-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: clean up trylock primitives

In particular this reduces accesses of the lock itself.


# 0108a980 17-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: fix compilation on UP kernels after r313855

sx primitives use inlines as opposed to macros. Change the tested condition
to LOCK_DEBUG which covers the case, but is slightly overzealous.

Reported by: kib


# ffd5c94c 16-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: let primitives for modules unlock without always going to the slow path

It is only needed if LOCK_PROFILING is enabled. It has to always check if
the lock is about to be released, which requires an avoidable read if the
option is not specified.


# afa39f7a 16-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: remove SCHEDULER_STOPPED checks from primitives for modules

They all fall back to the slow path if necessary and the check is there.

This means a panicked kernel executing code from modules will be able to
succeed doing actual lock/unlock, but this was already the case for core code
which has said primitives inlined.


# 3b3cf014 09-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: tidy up unlock fallback paths

Update comments to note these functions are reachable if lockstat is
enabled.

Check if the lock has any bits set before attempting unlock, which saves
an unnecessary atomic operation.


# 834f70f3 08-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: implement slock/sunlock fast path

See r313454.


# 8e5a3e9a 07-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: change backoff to exponential

Previous implementation would use a random factor to spread readers and
reduce chances of starvation. This visibly reduces effectiveness of the
mechanism.

Switch to the more traditional exponential variant. Try to limit starvation
by imposing an upper limit of spins after which spinning is half of what
other threads get. Note the mechanism is turned off by default.
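
A sketch of the exponential variant ('base', 'cap' and owner_is_running() are
placeholders, not the committed tunables):

	delay = base;
	while (owner_is_running(sx)) {
		for (i = 0; i < delay; i++)
			cpu_spinwait();
		if (delay < cap)
			delay <<= 1;		/* double the backoff, up to the cap */
	}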

Reviewed by: kib (previous version)


# c1aaf63c 06-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix recursion support after recent changes

When a relevant lockstat probe is enabled the fallback primitive is called with
a constant signifying a free lock. This works fine for typical cases but breaks
with recursion, since it checks if the passed value is that of the executing
thread.

Read the value if necessary.


# 6ebb77b6 05-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: move lockstat handling out of inline primitives

See r313275 for details.


# 3ae56ce9 04-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: add witness support missed in r313272


# 9d2e4290 04-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: uninline slock/sunlock

Shared locking routines explicitly read the value and test it. If the
change attempt fails, they fall back to a regular function which would
retry in a loop.

The problem is that with many concurrent readers the risk of failure is pretty
high and even the value returned by fcmpset is very likely going to be stale
by the time the loop in the fallback routine is reached.

Uninline said primitives. It gives a throughput increase when doing concurrent
slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s.

Interestingly, rwlock primitives are already not inlined.


# fa474043 04-Feb-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: switch to fcmpset

Discussed with: jhb
Tested by: pho (previous version)


# 29051116 27-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

Sprinkle __read_mostly on backoff and lock profiling code.

MFC after: 1 month


# c5f61e6f 18-Jan-2017 Mateusz Guzik <mjg@FreeBSD.org>

sx: reduce lock accesses similarly to r311172

Discussed with: jhb
Tested by: pho (previous version)


# c365a293 09-Dec-2016 Mark Johnston <markj@FreeBSD.org>

Return a non-NULL owner only if the lock is exclusively held in owner_sx().

Fix some whitespace bugs while here.

MFC after: 2 weeks


# 0453ade5 03-Aug-2016 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix sx compilation on mips after r303643

The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.

Reported by: kib


# fa5000a4 01-Aug-2016 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case

Reported by: Michael Butler <imb protected-networks.net>


# 04126895 01-Aug-2016 Mateusz Guzik <mjg@FreeBSD.org>

locks: fix up ifdef guards introduced in r303643

Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct
define.

MFC after: 1 week


# 1ada9041 01-Aug-2016 Mateusz Guzik <mjg@FreeBSD.org>

Implement trivial backoff for locking primitives.

All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.

One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.

For simplicity, this first-touch implementation only modifies spinning
loops where the lock owner is running. Spin mutexes and thread lock were
not modified.

Current parameters are autotuned on boot based on mp_cpus.

Autotune factors are very conservative and are subject to change later.

Reviewed by: kib, jhb
Tested by: pho
MFC after: 1 week


# 61852185 30-Jul-2016 Mateusz Guzik <mjg@FreeBSD.org>

locks: change sleep_cnt and spin_cnt types to u_int

Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably fit in the 32-bit range.

Suggested by: kib
MFC after: 1 week


# e0c45af9 30-Jul-2016 Mateusz Guzik <mjg@FreeBSD.org>

sx: increment spin_cnt before cpu_spinwait in xlock

The change is a no-op only done for consistency with the rest of the file.


# fc4f686d 01-Jun-2016 Mateusz Guzik <mjg@FreeBSD.org>

Microoptimize locking primitives by avoiding unnecessary atomic ops.

Inline versions of the primitives do an atomic op and if it fails they fall
back to the actual primitives, which immediately retry the atomic op.

The obvious optimisation is to check if the lock is free and only then proceed
to do an atomic op.
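
In sketch form (simplified relative to the real inlines):

	if (sx->sx_lock == SX_LOCK_UNLOCKED &&
	    atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED,
	    (uintptr_t)curthread))
		return;		/* fast path: lock was observed free and taken */
	/* otherwise fall back to the existing slow path, as before */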

Reviewed by: jhb, vangyzen


# ce1c953e 01-Aug-2015 Mark Johnston <markj@FreeBSD.org>

Don't modify curthread->td_locks unless INVARIANTS is enabled.

This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3205


# de2c95cc 19-Jul-2015 Mark Johnston <markj@FreeBSD.org>

Consistently use a reader/writer flag for lockstat probes in rwlock(9) and
sx(9), rather than using the probe function name to determine whether a
given lock is a read lock or a write lock. Update lockstat(1) accordingly.


# 32cd0147 19-Jul-2015 Mark Johnston <markj@FreeBSD.org>

Implement the lockstat provider using SDT(9) instead of the custom provider
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.

Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993


# e2b25737 17-Jul-2015 Mark Johnston <markj@FreeBSD.org>

Pass the lock object to lockstat_nsecs() and return immediately if
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.

PR: 201642
Reviewed by: avg
MFC after: 3 days


# 076dd8eb 12-Jun-2015 Andriy Gapon <avg@FreeBSD.org>

several lockstat improvements

0. For spin events report time spent spinning, not a loop count.
While loop count is much easier and cheaper to obtain it is hard
to reason about the reported numbers, especially for adaptive locks
where both spinning and sleeping can happen.
So, it's better to compare apples and apples.

1. Teach lockstat about FreeBSD rw locks.
This is done in part by changing the corresponding probes
and in part by changing what probes lockstat should expect.

2. Teach lockstat that rw locks are adaptive and can spin on FreeBSD.

3. Report lock acquisition events for successful rw try-lock operations.

4. Teach lockstat about FreeBSD sx locks.
Reporting of events for those locks completely mirrors
rw locks.

5. Report spin and block events before acquisition event.
This is behavior documented for the upstream, so it makes sense to stick
to it. Note that because of FreeBSD adaptive lock implementations
both the spin and block events may be reported for the same acquisition
while the upstream reports only one of them.

Differential Revision: https://reviews.freebsd.org/D2727
Reviewed by: markj
MFC after: 17 days
Relnotes: yes
Sponsored by: ClusterHQ


# fd07ddcf 13-Dec-2014 Dmitry Chagin <dchagin@FreeBSD.org>

Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9).
A _NEW flag is passed to _init_flags() to avoid the check for double-init.
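
Usage sketch for the sx flavor (the 'foo' names are made up):

	struct sx foo_lock;	/* may live in memory that was never zeroed */

	sx_init_flags(&foo_lock, "foo lock", SX_NEW);	/* skip double-init check */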

Differential Revision: https://reviews.freebsd.org/D1208
Reviewed by: jhb, wblock
MFC after: 1 Month


# 2cba8dd3 04-Nov-2014 John Baldwin <jhb@FreeBSD.org>

Add a new thread state "spinning" to schedgraph and add tracepoints at the
start and stop of spinning waits in lock primitives.


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invocation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
correctly including opt_kdtrace.h before, it was never enabled, so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# cf6b879f 22-Sep-2013 Davide Italiano <davide@FreeBSD.org>

Consistently use the same value to indicate exclusively-held and
shared-held locks for all the primitives in lc_lock/lc_unlock routines.
This fixes the problems introduced in r255747, which indeed introduced an
inversion in the logic.

Reported by: many
Tested by: bdrewery, pho, lme, Adam McDougall, O. Hartmann
Approved by: re (glebius)


# 7faf4d90 20-Sep-2013 Davide Italiano <davide@FreeBSD.org>

Fix lc_lock/lc_unlock() support for rmlocks held in shared mode. With
current lock classes KPI it was really difficult because there was no
way to pass an rmtracker object to the lock/unlock routines. In order
to accomplish the task, modify the aforementioned functions so that
they can return (or pass as argument) an uintptr_t, which is in the rm
case used to hold a pointer to struct rm_priotracker for current
thread. As an added bonus, this fixes rm_sleep() in the rm shared
case, which right now can communicate priotracker structure between
lc_unlock()/lc_lock().

Suggested by: jhb
Reviewed by: jhb
Approved by: re (delphij)


# b5fb43e5 25-Jun-2013 John Baldwin <jhb@FreeBSD.org>

A few mostly cosmetic nits to aid in debugging:
- Call lock_init() first before setting any lock_object fields in
lock init routines. This way if the machine panics due to a duplicate
init the lock's original state is preserved.
- Somewhat similarly, don't decrement td_locks and td_slocks until after
an unlock operation has completed successfully.


# cd2fe4e6 22-Dec-2012 Attilio Rao <attilio@FreeBSD.org>

Fixup r240424: On entering KDB backends, the hijacked thread to run
interrupt context can still be idlethread. At that point, without the
panic condition, it can still happen that idlethread then will try to
acquire some locks to carry on some operations.

Skip the idlethread check on block/sleep lock operations when KDB is
active.

Reported by: jh
Tested by: jh
MFC after: 1 week


# 0a15e5d3 13-Sep-2012 Attilio Rao <attilio@FreeBSD.org>

Remove all the checks on curthread != NULL with the exception of some MD
trap checks (eg. printtrap()).

Generally this check is not needed anymore, as there is not a legitimate
case where curthread == NULL, after the pcpu 0 area has been properly
initialized.

Reviewed by: bde, jhb
MFC after: 1 week


# e3ae0dfe 12-Sep-2012 Attilio Rao <attilio@FreeBSD.org>

Improve check coverage about idle threads.

Idle threads are not allowed to acquire any lock but spinlocks.
Deny any attempt to do so by panicking at the locking operation
when INVARIANTS is on. Then, remove the check on blocking on a
turnstile.
The check in sleepqueues is left because idle threads are not allowed to
use tsleep() either, which could still happen.

Reviewed by: bde, jhb, kib
MFC after: 1 week


# f5f9340b 28-Mar-2012 Fabien Thomas <fabient@FreeBSD.org>

Add software PMC support.

New kernel events can be added at various locations for sampling or counting.
This will for example allow easy system profiling, whatever the processor,
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at lock acquire failures or page faults while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


# 7a7ce668 12-Dec-2011 Andriy Gapon <avg@FreeBSD.org>

put sys/systm.h at its proper place or add it if missing

Reported by: lstewart, tinderbox
Pointyhat to: avg, attilio
MFC after: 1 week
MFC with: r228430


# 35370593 11-Dec-2011 Andriy Gapon <avg@FreeBSD.org>

panic: add a switch and infrastructure for stopping other CPUs in SMP case

Historical behavior of letting other CPUs merrily go on is the default for
the time being. The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set. That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts. Those locks might be held by the stopped threads and would never
be released. To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times. Other changes are heavily based on the ideas
and patches submitted by jhb and mdf. bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console. This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection. Dumping to USB-connected disks may also be affected.

PR: amd64/139614 (at least)
In cooperation with: attilio, jhb, kib, mdf
Discussed with: arch@, bde
Tested by: Eugene Grosbein <eugen@grosbein.net>,
gnn,
Steven Hartland <killing@multiplay.co.uk>,
glebius,
Andrew Boyer <aboyer@averesystems.com>
(various versions of the patch)
MFC after: 3 months (or never)


# 9fde98bb 20-Nov-2011 Attilio Rao <attilio@FreeBSD.org>

Introduce the same mutex-wise fix in r227758 for sx locks.

The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_

Now vm_map locking is fully converted and can avoid knowing specifics
about locking procedures.
Reviewed by: kib
MFC after: 1 month


# d576deed 16-Nov-2011 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Constify arguments for locking KPIs where possible.

This enables locking consumers to pass their own structures around as const and
be able to assert locks embedded into those structures.

Reviewed by: ed, kib, jhb


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# fbbb13f9 12-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the kernel changes.


# 58ccf5b4 11-Jan-2011 John Baldwin <jhb@FreeBSD.org>

Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by: bde


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# a2e21392 10-Jun-2010 John Baldwin <jhb@FreeBSD.org>

MFC 208912:
Fix a sign bug that caused adaptive spinning in sx_xlock() to not work
properly.

Approved by: re (bz)


# 8545538b 08-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Fix a sign bug that caused adaptive spinning in sx_xlock() to not work
properly. Among other things it did not drop Giant while spinning
leading to livelocks.

Reviewed by: rookie, kib, jmallett
MFC after: 3 days


# 702748e9 18-Jan-2010 Attilio Rao <attilio@FreeBSD.org>

MFC r200447,201703,201709-201710:
In current code, threads performing an interruptible sleep
will leave the waiters flag on, forcing the owner to do a wakeup even
when the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" queue while the other queue has real waiters on it,
because nobody is going to wakeup the 2nd queue waiters and they will
sleep indefinitely.
A similar bug is present for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on.

Add a sleepqueue interface which does report the actual number of waiters
on a specified queue of a waitchannel and track if at least one sleepfail
waiter is present or not. In the presence of this or an empty "preferred"
queue, wake up both waiter queues.

Discussed with: kib
Tested by: Pete French <petefrench at ticketswitch dot com>,
Justin Head <justin at encarnate dot com>


# 2028867d 12-Dec-2009 Attilio Rao <attilio@FreeBSD.org>

In current code, threads performing an interruptible sleep (on both
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on, forcing the owner to do a wakeup even if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitely.

A similar bug is present for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
is not empty, the waiters won't progress after being woken up but they will
just fail, still not taking care of the 2nd queue waiters (as the lock
owner doing the wakeup would instead expect).

In order to fix this bug in a cheap way (without adding too much locking
and complicating the semantics too much) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.

The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.

Reported by: avg, kib, pho
Reviewed by: kib
Tested by: pho


# 3f4609ac 12-Oct-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r197643, r197735:
When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which don't have strongly ordered
writes, CPU instruction reordering.

Approved by: re (kib)


# ddce63ca 30-Sep-2009 Attilio Rao <attilio@FreeBSD.org>

When releasing a read/shared lock we need to use a write memory barrier
in order to avoid, on architectures which don't have strongly ordered
writes, CPU instruction reordering.

Diagnosed by: fabio
Reviewed by: jhb
Tested by: Giovanni Trematerra
<giovanni dot trematerra at gmail dot com>


# c90c9ddd 09-Sep-2009 Attilio Rao <attilio@FreeBSD.org>

Adaptive spinning for locking primitives, in read-mode, has some tuning
SYSCTLs which are inappropriate for daily use of the machine (mostly
useful only to a developer who wants to run benchmarks on it).
Remove them before the release, as we do not want to ship with
them in.

Now that the SYSCTLs are gone, instead of using static storage for some
constants, use real numeric constants in order to avoid potential compiler
dumbness and the risk of sharing storage (and then a cache line) among
CPUs when doing adaptive spinning together.

Please note that the sys/linker_set.h inclusion in lockmgr and sx lock
support could have been removed, but re@ preferred it to stay in order to
minimize the risk of problems on future merging.

Please note that this patch is not an MFC, but an 'edge case' committed
directly to stable/8, which creates a divergence from HEAD.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Approved by: re (kib)


# db0c92ce 09-Sep-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r196772:
fix adaptive spinning in lockmgr by using correctly GIANT_RESTORE and
continue statement and improve adaptive spinning for sx lock by just
doing once GIANT_SAVE.

Approved by: re (kib)


# 8d3635c4 02-Sep-2009 Attilio Rao <attilio@FreeBSD.org>

Fix some bugs related to adaptive spinning:

In the lockmgr support:
- GIANT_RESTORE() is just called when the sleep finishes, so the current
code can end up in a giant unlock problem. Fix it by appropriately
calling GIANT_RESTORE() when needed. Note that this is not exactly ideal
because for any iteration of the adaptive spinning we drop and restore
Giant, but the overhead should not be a factor.
- In the lock held in exclusive mode case, after the adaptive spinning is
brought to completion, we should just retry to acquire the lock
instead of falling through. Fix that.
- Fix a style nit

In the sx support:
- Call GIANT_SAVE() before looping. This saves some overhead because
in the current code GIANT_SAVE() is called several times.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 7e2d0af9 17-Aug-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r196334:

* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by directly checking
alignment on the word boundary, for all the given specific
architectures. That's a bit too strict for some common cases, but it
assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)


# 353998ac 17-Aug-2009 Attilio Rao <attilio@FreeBSD.org>

* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by directly checking
alignment on the word boundary, for all the given specific
architectures. That's a bit too strict for some common cases, but it
assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)


# 21845eba 14-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196226:

Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.

Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
alignment -- however, the function of ALIGN() is to guarantee
alignment, and therefore may lead to stronger alignment
enforcement than necessary for types that are smaller than
sizeof(uintptr_t).

Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.

In collaboration with: rwatson
Reviewed by: rwatson

Approved by: re (kib)


# 8d518523 14-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.

Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
alignment -- however, the function of ALIGN() is to guarantee
alignment, and therefore may lead to stronger alignment
enforcement than necessary for types that are smaller than
sizeof(uintptr_t).

Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.
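
The approximate shape of the macro, per the description above (see
sys/systm.h for the authoritative definition):

	#define	ASSERT_ATOMIC_LOAD(var, msg)				\
		KASSERT(sizeof(var) <= sizeof(uintptr_t) &&		\
		    ALIGN(&(var)) == (uintptr_t)&(var), msg)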

In collaboration with: rwatson
Reviewed by: rwatson
Approved by: re (kib)


# f0830182 02-Jun-2009 Attilio Rao <attilio@FreeBSD.org>

Handle lock recursion differently by always checking against LO_RECURSABLE
instead of the lock's own flag itself.

Tested by: pho


# e31d0833 29-May-2009 Attilio Rao <attilio@FreeBSD.org>

The patch for r193011 was partially rejected when applied, complete it.


# 1ae1c2a3 28-May-2009 Attilio Rao <attilio@FreeBSD.org>

Reverse the logic for ADAPTIVE_SX option and enable it by default.
Introduce for this operation the reverse NO_ADAPTIVE_SX option.
The flag SX_ADAPTIVESPIN to be passed to sx_init_flags(9) gets suppressed
and the new flag, offering the reversed logic, SX_NOADAPTIVE is added.

Additionally, implement adaptive spinning for sx held in shared mode.
The spinning limit can be handled through sysctls in order to be tuned
until the code reaches release, after which time they should probably
be dropped.

This change has been made necessary by recent benchmarks where it does
improve concurrency of workloads in the presence of high contention
(ie. ZFS).

KPI breakage is documented by __FreeBSD_version bumping, manpage and
UPDATING updates.

Requested by: jeff, kmacy
Reviewed by: jeff
Tested by: pho


# a5aedd68 26-May-2009 Stacey Son <sson@FreeBSD.org>

Add the OpenSolaris dtrace lockstat provider. The lockstat provider
adds probes for mutexes, reader/writer and shared/exclusive locks to
gather contention statistics and other locking information for
dtrace scripts, the lockstat(1M) command and other potential
consumers.

Reviewed by: attilio jhb jb
Approved by: gnn (mentor)


# 1723a064 15-Mar-2009 Jeff Roberson <jeff@FreeBSD.org>

- Wrap lock profiling state variables in #ifdef LOCK_PROFILING blocks.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 41313430 10-Sep-2008 John Baldwin <jhb@FreeBSD.org>

Teach WITNESS about the interlocks used with lockmgr. This removes a bunch
of spurious witness warnings since lockmgr grew witness support. Before
this, every time you passed an interlock to a lockmgr lock WITNESS treated
it as a LOR.

Reviewed by: attilio


# da7bbd2c 05-Aug-2008 John Baldwin <jhb@FreeBSD.org>

If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wakeup
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks


# 90356491 15-May-2008 Attilio Rao <attilio@FreeBSD.org>

- Embed the recursion counter for any locking primitive directly in the
lock_object, using a unified field called lo_data.
- Replace lo_type usage with the w_name usage and at init time pass the
lock "type" directly to witness_init() from the parent lock init
function. Handle delayed initialization before
witness_initialize() is called, through the witness_pendhelp structure.
- Axe out LO_ENROLLPEND as it is not really needed. The case where a
delayed-init mutex wants to be destroyed can't happen because
witness_destroy() checks for witness_cold and panics in that case.
- In enroll(), if we cannot allocate a new object from the freelist,
notify that to userspace through a printf().
- Modify the depart function in order to return nothing as in the current
CVS version it always returns true and adjust callers accordingly.
- Fix the witness_addgraph() argument name prototype.
- Remove unnecessary code from itismychild().

This commit leads to a shrunken struct lock_object and so smaller locks,
in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are
gained.

Reviewed by: jhb


# c5aa6b58 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Pass the priority argument from *sleep() into sleepq and down into
sched_sleep(). This removes extra thread_lock() acquisition and
allows the scheduler to decide what to do with the static boost.
- Change the priority arguments to cv_* to match sleepq/msleep/etc.
where 0 means no priority change. Catch -1 in cv_broadcastpri() and
convert it to 0 for now.
- Set a flag when sleeping in a way that is compatible with swapping
since direct priority comparisons are meaningless now.
- Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
controls the boost behavior. Turning it off gives better performance
in some workloads but needs more investigation.
- While we're modifying sleepq, change signal and broadcast to both
return with the lock held as the lock was held on enter.

Reviewed by: jhb, peter


# eea4f254 15-Dec-2007 Jeff Roberson <jeff@FreeBSD.org>

- Re-implement lock profiling in such a way that it no longer breaks
the ABI when enabled. There is no longer an embedded lock_profile_object
in each lock. Instead a list of lock_profile_objects is kept per-thread
for each lock it may own. The cnt_hold statistic is now always 0 to
facilitate this.
- Support shared locking by tracking individual lock instances and
statistics in the per-thread per-instance lock_profile_object.
- Make the lock profiling hash table a per-cpu singly linked list with a
per-cpu static lock_prof allocator. This removes the need for an array
of spinlocks and reduces cache contention between cores.
- Use a separate hash for spinlocks and other locks so that only a
critical_enter() is required and not a spinlock_enter() to modify the
per-cpu tables.
- Count time spent spinning in the lock statistics.
- Remove the LOCK_PROFILE_SHARED option as it is always supported now.
- Specifically drop and release the scheduler locks in both schedulers
since we track owners now.

In collaboration with: Kip Macy
Sponsored by: Nokia


# f9721b43 18-Nov-2007 Attilio Rao <attilio@FreeBSD.org>

Expand lock class with the "virtual" function lc_assert which will offer
an unified way for all the lock primitives to express lock assertions.
Currently, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.

Ok'ed by: jhb, jeff


# 431f8906 13-Nov-2007 Julian Elischer <julian@FreeBSD.org>

Generally we are interested in what thread did something as
opposed to what process. Since threads by default have the name of the
process unless overwritten with more useful information, just print the
thread name instead.


# 764a938b 02-Oct-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Fix sx_try_slock(), so it only fails when there is an exclusive owner.
Before that fix, it was possible for the function to fail if the number
of sharers changed between the 'x = sx->sx_lock' step and the
atomic_cmpset_acq_ptr() call.

This fixes a ZFS problem where ZFS returns strange EIO errors under load.
In ZFS there is code that depends on the fact that sx_try_slock() can
only fail if there is an exclusive owner.
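
The shape of the fix, roughly (simplified):

	for (;;) {
		x = sx->sx_lock;
		if ((x & SX_LOCK_SHARED) == 0)
			return (0);	/* genuine failure: exclusive owner */
		if (atomic_cmpset_acq_ptr(&sx->sx_lock, x, x + SX_ONE_SHARER))
			return (1);
		/* sharer count changed underneath us; reload and retry */
	}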

Discussed with: attilio
Reviewed by: jhb
Approved by: re (kensmith)


# c1a6d9fa 06-Jul-2007 Attilio Rao <attilio@FreeBSD.org>

Fix some problems with lock_profiling in sx locks:
- Adjust lock_profiling stubs' semantics in the hard functions in order to be
more accurate and trustable
- Disable shared paths for lock_profiling. Actually, lock_profiling has a
subtle race which makes results coming from shared paths not completely
trustable. A macro stub (LOCK_PROFILING_SHARED) can actually be used for
re-enabling these paths, but is currently intended for development use only.
- Use homogeneous names for automatic variables in hard functions regarding
lock_profiling
- Style fixes
- Add a CTASSERT for some flags building

Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re


# f9819486 31-May-2007 Attilio Rao <attilio@FreeBSD.org>

Add functions sx_xlock_sig() and sx_slock_sig().
These functions are intended to do the same thing as sx_xlock() and
sx_slock(), with the difference that they perform an interruptible sleep,
so that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is needed,
but the external API won't be affected at all.

Note: use a "void" cast for "int"-returning functions in order to keep
tools like Coverity from whining.

Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
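
For illustration only (not part of the commit): a hedged sketch of how the
interruptible variants might be used. The exact error values returned on
interruption (e.g. EINTR/ERESTART) are an assumption here, and
"example_lock" is hypothetical.

    int error;

    error = sx_xlock_sig(&example_lock);
    if (error != 0)
        return (error);        /* sleep was interrupted by a signal */
    /* ... exclusive critical section ... */
    sx_xunlock(&example_lock);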


# 2c7289cb 29-May-2007 Attilio Rao <attilio@FreeBSD.org>

style(9) fixes for sx locks.

Approved by: jeff (mentor)


# acf840c4 29-May-2007 Attilio Rao <attilio@FreeBSD.org>

Add a small fix for lock profiling in sx locks.
"0" cannot be a correct value: when the function is entered at least
one shared holder must be present, and since we want the last one, "1" is
the correct value.
Note that lock_profiling for sx locks is far from being perfect.
Expect further fixes for that.

Approved by: jeff (mentor)


# 7ec137e5 19-May-2007 John Baldwin <jhb@FreeBSD.org>

Rename the macros for assertion flags passed to sx_assert() from SX_* to
SA_* to match mutexes and rwlocks. The old flags still exist for
backwards compatibility.

Requested by: attilio
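
For illustration only (not part of the commit): the renamed assertion flags
as they would appear at call sites; "example_lock" is hypothetical.

    sx_assert(&example_lock, SA_XLOCKED);   /* held exclusively by curthread */
    sx_assert(&example_lock, SA_SLOCKED);   /* held shared */
    sx_assert(&example_lock, SA_LOCKED);    /* held shared or exclusive */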


# 71a95881 19-May-2007 John Baldwin <jhb@FreeBSD.org>

Expose sx_xholder() as a public macro. It returns a pointer to the thread
that holds the current exclusive lock, or NULL if no thread holds an
exclusive lock.

Requested by: pjd
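
For illustration only (not part of the commit): a minimal sketch of the kind
of check this macro enables; "example_lock" is hypothetical.

    if (sx_xholder(&example_lock) == curthread)
        panic("example: recursing on example_lock");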


# 46a8b9cb 19-May-2007 John Baldwin <jhb@FreeBSD.org>

Oops, didn't include SX_ADAPTIVESPIN in the list of valid flags for the
assert in sx_init_flags().

Submitted by: attilio


# b0d67325 19-May-2007 John Baldwin <jhb@FreeBSD.org>

Add a new SX_RECURSE flag to make support for recursive exclusive locks
conditional. By default, sx(9) locks are back to not supporting recursive
exclusive locks.

Submitted by: attilio
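
For illustration only (not part of the commit): a sketch of opting back in to
recursion with the new flag; "example_lock" is hypothetical.

    sx_init_flags(&example_lock, "example", SX_RECURSE);
    sx_xlock(&example_lock);
    sx_xlock(&example_lock);    /* allowed only because of SX_RECURSE */
    sx_xunlock(&example_lock);
    sx_xunlock(&example_lock);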


# da7d0d1e 18-May-2007 John Baldwin <jhb@FreeBSD.org>

Fix a comment.


# c91fcee7 18-May-2007 John Baldwin <jhb@FreeBSD.org>

Move lock_profile_object_{init,destroy}() into lock_{init,destroy}().


# 0026c92c 08-May-2007 John Baldwin <jhb@FreeBSD.org>

Add destroyed cookie values for sx locks and rwlocks as well as extra
KASSERTs so that any lock operations on a destroyed lock will panic or
hang.


# 59a31e6a 03-Apr-2007 Kip Macy <kmacy@FreeBSD.org>

fix typo


# e2bc1066 03-Apr-2007 Kip Macy <kmacy@FreeBSD.org>

Style fixes, and make sure that the lock is treated as released in the
sharers == 0 case. Note that this is somewhat racy because a new sharer
can come in while we're updating stats.


# afc0bfbd 03-Apr-2007 Kip Macy <kmacy@FreeBSD.org>

Fixes to sx for newsx - fix the recursed case and move it out of the inline path

Submitted by: Attilio Rao <attilio@freebsd.org>


# 4e7f640d 31-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to those of rwlocks. This
patch also adds support for exclusive locks using the same algorithm as
mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET, which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
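
For illustration only (not part of the commit): a sketch of the new include
ordering requirement and of initializing a lock with optional flags via
sx_init_flags(); the names used here are hypothetical.

    #include <sys/param.h>
    #include <sys/lock.h>   /* must now come before <sys/sx.h> */
    #include <sys/sx.h>

    static struct sx example_lock;

    static void
    example_init(void)
    {

        sx_init_flags(&example_lock, "example", SX_DUPOK | SX_NOPROFILE);
    }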


# aa89d8cd 21-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 6e21afd4 09-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Add two new function pointers 'lc_lock' and 'lc_unlock' to lock classes.
These functions are intended to be used to drop a lock and then reacquire
it when doing a sleep such as msleep(9). Both functions accept a
'struct lock_object *' as their first parameter. The 'lc_unlock' function
returns an integer that is then passed as the second parameter to the
subsequent 'lc_lock' function. This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.

Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
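
For illustration only (not part of the commit): a hedged sketch of how a
sleep primitive might use the new class methods to drop an arbitrary lock
and later reacquire it in the same mode. This follows the description above
(lc_unlock returning an integer passed back to lc_lock) rather than any
particular version of the code.

    struct lock_class *class;
    int how;

    class = LOCK_CLASS(lock);
    how = class->lc_unlock(lock);   /* drop, remembering how it was held */
    /* ... block in msleep(9)-style sleep machinery ... */
    class->lc_lock(lock, how);      /* reacquire in the original mode */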


# ae8dde30 09-Mar-2007 John Baldwin <jhb@FreeBSD.org>

Use C99-style struct member initialization for lock classes.


# c66d7606 02-Mar-2007 Kip Macy <kmacy@FreeBSD.org>

lock stats updates need to be protected by the lock


# a5bceb77 01-Mar-2007 Kip Macy <kmacy@FreeBSD.org>

Evidently I've overestimated gcc's ability to peek inside inline functions
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently have a measurable performance impact on certain workloads.


# f183910b 26-Feb-2007 Kip Macy <kmacy@FreeBSD.org>

Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to a 20%-25% increase in system time, which on
"make -j8 kernel-toolchain" on a dual Woodcrest is unmeasurable in terms
of wall-clock time. Contrast this with enabling lock profiling without
LOCK_PROFILE_FAST, where I see a 5x-6x slowdown in wall-clock time.


# fe68a916 26-Feb-2007 Kip Macy <kmacy@FreeBSD.org>

general LOCK_PROFILING cleanup

- only collect timestamps when a lock is contested - this reduces the overhead
of collecting profiles from 20x to 5x

- remove unused function from subr_lock.c

- generalize cnt_hold and cnt_lock statistics to be kept for all locks

- NOTE: rwlock profiling generates invalid statistics (and most likely always
has); someone familiar with that should review it


# 61bd5e21 12-Nov-2006 Kip Macy <kmacy@FreeBSD.org>

track lock class name in a way that doesn't break WITNESS


# 7c0435b9 10-Nov-2006 Kip Macy <kmacy@FreeBSD.org>

MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile
wait (time waited to acquire) and hold times for *all* kernel locks. If
the architecture has a system-synchronized TSC, the profiling code will
use that - thereby minimizing profiling overhead. Large chunks of profiling
code have been moved out of line; the overhead measured on the T1 when
it is compiled in but not enabled is < 1%.

Approved by: scottl (standing in for mentor rwatson)
Reviewed by: des and jhb


# 462a7add 15-Aug-2006 John Baldwin <jhb@FreeBSD.org>

Add a new 'show sleepchain' ddb command similar to 'show lockchain' except
that it operates on lockmgr and sx locks. This can be useful for tracking
down vnode deadlocks in VFS for example. Note that this command is a bit
more fragile than 'show lockchain' as we have to poke around at the
wait channel of a thread to see if it points to either a struct lock or
a condition variable inside of a struct sx. If td_wchan points to
something unmapped, then this command will terminate early due to a fault,
but no harm will be done.


# 764e4d54 27-Jul-2006 John Baldwin <jhb@FreeBSD.org>

Adjust td_locks for non-spin mutexes, rwlocks, and sx locks so that it is
a count of all non-spin locks, not just lockmgr locks. This can give us a
much cheaper way to see if we have any locks held (such as when returning
to userland via userret()) without requiring WITNESS.

MFC after: 1 week


# 83a81bcb 17-Jan-2006 John Baldwin <jhb@FreeBSD.org>

Add a new file (kern/subr_lock.c) for holding code related to struct
lock_object objects:
- Add new lock_init() and lock_destroy() functions to setup and teardown
lock_object objects including KTR logging and registering with WITNESS.
- Move all the handling of LO_INITIALIZED out of witness and the various
lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
and change the code outside of subr_lock.c to use LOCK_CLASS to compare
against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
kern_mutex.c over to subr_lock.c.


# 3c6decc3 06-Jan-2006 John Baldwin <jhb@FreeBSD.org>

Trim another pointer from struct lock_object (and thus from struct mtx and
struct sx). Instead of storing a direct pointer to our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels. It might add some slight overhead to
kernels using those debug options however.

As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.


# d272fe53 13-Dec-2005 John Baldwin <jhb@FreeBSD.org>

Add a new 'show lock' command to ddb. If the argument has a valid lock
class, then it displays various information about the lock and calls a
new function pointer in lock_class (lc_ddb_show) to dump class-specific
information about the lock as well (such as the owner of a mutex or
xlock'ed sx lock). This is easier than staring at hex dumps of locks to
figure out who owns the lock, etc. Note that extending lock_class doesn't
affect the ABI for any kernel modules as the only code that deals with
lock_class structures directly is kern_mutex.c, kern_sx.c, and witness.

MFC after: 1 week


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 03129ba9 27-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Fix _sx_assert() to panic() rather than printf() when an assertion fails
and ignore assertions if we have already paniced.


# f6739b1d 19-Feb-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Simplify the check. We are only able to check the exclusive lock, and if
the 2nd condition is true, the first one is certainly true as well.

Approved by: jhb, scottl (mentor)


# 19b0efd3 04-Feb-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Allow assert that the current thread does not hold the sx(9) lock.

Reviewed by: jhb
In cooperation with: juli, jhb
Approved by: jhb, scottl (mentor)
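
For illustration only (not part of the commit): asserting that curthread does
not hold the lock, shown with the later SA_* flag spelling; "example_lock" is
hypothetical.

    sx_assert(&example_lock, SA_UNLOCKED);  /* panics if curthread holds it */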


# 8d768e76 28-Jan-2004 John Baldwin <jhb@FreeBSD.org>

Rework witness_lock() to make it slightly more useful and flexible.
- witness_lock() is split into two pieces: witness_checkorder() and
witness_lock(). Witness_checkorder() determines if acquiring a specified
lock at the time it is called would result in a lock order violation. It
optionally adds a new lock order relationship as well. witness_lock()
updates witness's data structures to assume that a lock has been acquired
by sticking a new lock instance in the appropriate lock instance list.
- The mutex and sx lock functions now call checkorder() prior to trying to
acquire a lock and continue to call witness_lock() after the acquire is
completed. This will let witness catch a deadlock before it happens
rather than trying to do so after the threads have deadlocked (i.e. never
actually report it).
- A new function witness_defineorder() has been added that adds a lock
order between two locks at runtime without having to acquire the locks.
If the lock order cannot be added it will return an error. This function
is available to programmers via the WITNESS_DEFINEORDER() macro which
accepts either two mutexes or two sx locks as its arguments.
- A few simple wrapper macros were added to allow developers to call
witness_checkorder() anywhere as a way of enforcing locking assertions
in code that might acquire a certain lock in some situations. The
macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
appropriate lock as the sole argument.
- The code to remove a lock instance from a lock list in witness_unlock()
was unnested by using a goto to vastly improve the readability of this
function.
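
For illustration only (not part of the commit): a hedged sketch of defining a
lock order at runtime with the new macro; "example_a" and "example_b" are
hypothetical sx locks.

    int error;

    /* Teach witness that example_a is always acquired before example_b. */
    error = WITNESS_DEFINEORDER(&example_a, &example_b);
    if (error != 0)
        printf("could not define example_a -> example_b order\n");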


# 857d9c60 12-Jul-2003 Don Lewis <truckman@FreeBSD.org>

Extend the mutex pool implementation to permit the creation and use of
multiple mutex pools with different options and sizes. Mutex pools can
be created with either the default sleep mutexes or with spin mutexes.
A dynamically created mutex pool can now be destroyed if it is no longer
needed.

Create two pools by default, one that matches the existing pool that
uses the MTX_NOWITNESS option that should be used for building higher
level locks, and a new pool with witness checking enabled.

Modify the users of the existing mutex pool to use the appropriate pool
in the new implementation.

Reviewed by: jhb
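
For illustration only (not part of the commit): a hedged sketch of the
extended pool API as described above; the pool name, size, and "obj" pointer
are hypothetical.

    struct mtx_pool *pool;
    void *obj;                  /* any pointer; its address selects a pool mutex */

    pool = mtx_pool_create("example pool", 128, MTX_DEF);
    mtx_pool_lock(pool, obj);   /* lock the pool mutex hashed from obj */
    /* ... short leaf critical section ... */
    mtx_pool_unlock(pool, obj);
    mtx_pool_destroy(&pool);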


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 9939f0f1 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Set the lock type equal to the lock name for now, since none of the current
sx locks use very specific lock names.


# c27b5699 02-Apr-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Add MTX_SYSINIT and SX_SYSINIT as macro glue for allowing sx and mtx
locks to be able to setup a SYSINIT call. This helps in places where
a lock is needed to protect some data, but the data is not truly
associated with a subsystem that can properly initialize its lock.
The macros use the mtx_sysinit() and sx_sysinit() functions,
respectively, as the handler argument to SYSINIT().

Reviewed by: alfred, jhb, smp@
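
For illustration only (not part of the commit): a sketch of the new glue
macro; "example_lock" is hypothetical.

    static struct sx example_lock;
    SX_SYSINIT(example_lock_init, &example_lock, "example lock");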


# 98bf25aa 18-Jan-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Invert the test of sx_xholder for SX_LOCKED. We need to warn if a
thread other than the curthread holds an sx.

While I am here, break the line at the end of the warning.


# a48740b6 09-Dec-2001 David E. O'Brien <obrien@FreeBSD.org>

Update to C99, s/__FUNCTION__/__func__/.


# f2860039 13-Nov-2001 Matthew Dillon <dillon@FreeBSD.org>

Create a mutex pool API for short term leaf mutexes.
Replace the manual mutex pool in kern_lock.c (lockmgr locks) with the new API.
Replace the mutexes embedded in sxlocks with the new API.


# 781a35df 24-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Fix this to actually compile in the !INVARIANTS case.

Reported by: Maxime Henrion <mux@qualys.com>


# 4e5e677b 23-Oct-2001 John Baldwin <jhb@FreeBSD.org>

Change the sx(9) assertion API to use a sx_assert() function similar to
mtx_assert(9) rather than several SX_ASSERT_* macros.


# 7ada5876 19-Oct-2001 John Baldwin <jhb@FreeBSD.org>

The mtx_init() and sx_init() functions bzero'd locks before handing them
off to witness_init(), making the check for double-initializing a lock by
testing the LO_INITIALIZED flag moot. Work around this by checking the
LO_INITIALIZED flag ourselves before we bzero the lock structure.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# b0b7cb50 23-Aug-2001 John Baldwin <jhb@FreeBSD.org>

Use witness_upgrade/downgrade for sx_try_upgrade/downgrade.


# e2870579 23-Aug-2001 John Baldwin <jhb@FreeBSD.org>

Clear the sx_xholder pointer when downgrading an exclusive lock.


# d55229b7 13-Aug-2001 Jason Evans <jasone@FreeBSD.org>

Add sx_try_upgrade() and sx_downgrade().

Submitted by: Alexander Kabaev <ak03@gte.com>
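
For illustration only (not part of the commit): a sketch of the new
upgrade/downgrade calls; "example_lock" is hypothetical.

    sx_slock(&example_lock);
    if (sx_try_upgrade(&example_lock) != 0) {
        /* Now held exclusively; safe to modify the protected data. */
        sx_downgrade(&example_lock);    /* back to a shared hold */
    }
    sx_sunlock(&example_lock);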


# 5f36700a 27-Jun-2001 John Baldwin <jhb@FreeBSD.org>

- Add trylock variants of shared and exclusive locks.
- The sx assertions don't actually need the internal sx mutex lock, so
don't bother doing so.
- Add a new assertion SX_ASSERT_LOCKED() that asserts that either a
shared or exclusive lock should be held. This assertion should be used
instead of SX_ASSERT_SLOCKED() in almost all cases.
- Adjust some KASSERT()'s to include file and line information.
- Use the new witness_assert() function in the WITNESS case for sx slock
asserts to verify that the current thread actually owns a slock.


# 2d96f0b1 04-May-2001 John Baldwin <jhb@FreeBSD.org>

- Move state about lock objects out of struct lock_object and into a new
struct lock_instance that is stored in the per-process and per-CPU lock
lists. Previously, the lock lists just kept a pointer to each lock held.
That pointer is now replaced by a lock instance which contains a pointer
to the lock object, the file and line of the last acquisition of a lock,
and various flags about a lock including its recursion count.
- If we sleep while holding a sleepable lock, then mark that lock instance
as having slept and ignore any lock order violations that occur while
acquiring Giant when we wake up with slept locks. This is ok because of
Giant's special nature.
- Allow witness to differentiate between shared and exclusive locks and
unlocks of a lock. Witness will now detect the case when a lock is
acquired first in one mode and then in another. Mutexes are always
locked and unlocked exclusively. Witness will also now detect the case
where a process attempts to unlock a shared lock while holding an
exclusive lock and vice versa.
- Fix a bug in the lock list implementation where we used the wrong
constant to detect the case where a lock list entry was full.


# 19284646 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
individual lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatibility,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that backs an sx lock, for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we no longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.
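
For illustration only (not part of the commit): a hedged sketch of the new
mtx_init() flags, written with the present-day four-argument mtx_init()
signature; "backing_mtx" is hypothetical.

    struct mtx backing_mtx;

    /*
     * A mutex used internally by a higher-level primitive: hide it from
     * witness and suppress KTR event logging for it.
     */
    mtx_init(&backing_mtx, "example backing", NULL,
        MTX_DEF | MTX_NOWITNESS | MTX_QUIET);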


# 7331c2a2 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

In order to avoid recursing on the backing mutex for sx locks in the
INVARIANTS case, define the actual KASSERT() in _SX_ASSERT_[SX]LOCKED
macros that are used in the sx code itself and convert the
SX_ASSERT_[SX]LOCKED macros to simple wrappers that grab the mutex for the
duration of the check.


# af761449 05-Mar-2001 Bosko Milekic <bmilekic@FreeBSD.org>

- Add sx_descr description member to sx lock structure
- Add sx_xholder member to sx struct which is used for INVARIANTS-enabled
assertions. It indicates the thread that presently owns the xlock.
- Add some assertions to the sx lock code that will detect the fatal
API abuse:
xlock --> xlock
xlock --> slock
which now works thanks to sx_xholder.
Notice that the remaining two problematic cases:
slock --> xlock
slock --> slock (a little less problematic, but still recursion)
will need to be handled by witness eventually, as they are more
involved.

Reviewed by: jhb, jake, jasone


# 6281b30a 05-Mar-2001 Jason Evans <jasone@FreeBSD.org>

Implement shared/exclusive locks.

Reviewed by: bmilekic, jake, jhb
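
For illustration only (not part of the commit): a minimal sketch of the
shared/exclusive API introduced here, using present-day spellings;
"example_lock" is hypothetical.

    static struct sx example_lock;

    sx_init(&example_lock, "example");

    /* Readers may hold the lock concurrently. */
    sx_slock(&example_lock);
    /* ... read the protected data ... */
    sx_sunlock(&example_lock);

    /* Writers get exclusive access. */
    sx_xlock(&example_lock);
    /* ... modify the protected data ... */
    sx_xunlock(&example_lock);

    sx_destroy(&example_lock);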