History log of /freebsd-current/sys/kern/kern_umtx.c
Revision Date Author Comments
# 6bb132ba 15-Apr-2024 Brooks Davis <brooks@FreeBSD.org>

Reduce reliance on sys/sysproto.h pollution

Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.

sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465


# 39e4665c 22-Feb-2024 Olivier Certner <olce@FreeBSD.org>

PP mutexes: lock: Reduce 'umtx_lock' holding before taking the user lock

There is no need to have it for the priority check (that the thread
doesn't have a higher priority than the mutex's ceiling), and there's
also no need to take it if the thread doesn't have privileges to set its
priority to the mutex's ceiling.

While here, turn 'su' into a 'bool' and compute the internal priority
corresponding to the mutex's ceiling once and for all, putting it in new
'new_pri'.

Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44045


# 9ac3ac9e 22-Feb-2024 Olivier Certner <olce@FreeBSD.org>

PP mutexes: lock: Check if priority is too high against base one

Doing this instead of using the current (user) priority, which includes
current lendings, prevents gratuitous failures for threads involved in
multiple locking groups, where each group is defined as the threads that
can lock a particular PP or PI mutex. No deadlock can occur in this
case. Indeed, if a thread holds such a lock A giving it a higher
priority than the ceiling of some other lock B that is PP, and B is
acquired by another thread, effectively the latter may not be able to
run but this situation can only last until the first thread releases A,
which it will do eventually.
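The difference between the two checks can be sketched with plain integers. This is a minimal model, not the kernel code. `struct thr_prio`, its fields, the function names and the sample values are hypothetical. The sketch assumes FreeBSD's convention that a numerically lower value means a higher priority.

```c
#include <errno.h>

/* Hypothetical per-thread priorities; lower value = higher priority. */
struct thr_prio {
	int	base_pri;	/* priority without any lending */
	int	cur_pri;	/* priority including lendings from held locks */
};

/*
 * Old behaviour (sketch): the current priority, possibly boosted by
 * another PI/PP lock the thread holds, is compared to the ceiling.
 */
static int
pp_check_current(const struct thr_prio *t, int ceiling_pri)
{
	return (t->cur_pri < ceiling_pri ? EINVAL : 0);
}

/* New behaviour (sketch): only the base priority is compared. */
static int
pp_check_base(const struct thr_prio *t, int ceiling_pri)
{
	return (t->base_pri < ceiling_pri ? EINVAL : 0);
}
```

Take a base priority of 150 that is boosted to 50 by another lock. The current-priority check rejects a ceiling of 100, while the base-priority check accepts it.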

Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44044


# 1df8700a 20-Feb-2024 Olivier Certner <olce@FreeBSD.org>

PP mutexes: unlock: Reset inherited prio regardless of privileges

'uq_inherited_pri' contains the current priority inherited from Priority
Protection mutexes. If -1 is passed through 'm_ceilings[1]', meaning
that there are no such mutexes held anymore, this must be reflected into
it by setting it to PRI_MAX, regardless of whether the thread has
privilege to set realtime priorities (PRI_MAX is also obviously not
a realtime priority level). By contrast, it shall not be updated and
the computed 'new_inherited_pri' shall stay unused if the thread doesn't
have the ability to set a realtime priority, possibly keeping an older
such priority acquired previously.
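The update rule for 'uq_inherited_pri' on unlock can be modeled as a pure function. The function name and its parameters are hypothetical. `SKETCH_PRI_MAX` is set to 255 on the assumption that it mirrors PRI_MAX. Only the branch structure follows the description above.

```c
#include <stdbool.h>

#define	SKETCH_PRI_MAX	255	/* assumed stand-in for PRI_MAX */

/*
 * 'ceiling' models the value passed through m_ceilings[1]; -1 means no
 * PP mutex is held any more.  'new_inherited_pri' is assumed to have
 * been computed from it already.
 */
static int
pp_unlock_inherited(int old_inherited, int ceiling, int new_inherited_pri,
    bool may_set_rtprio)
{
	if (ceiling == -1)
		return (SKETCH_PRI_MAX);	/* reset, privileged or not */
	if (may_set_rtprio)
		return (new_inherited_pri);
	return (old_inherited);			/* keep the older value */
}
```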

Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43984


# 3379d9b5 25-Dec-2023 Mark Johnston <markj@FreeBSD.org>

umtx: Check for errors from suword32()

This is in preparation for annotating copyin() and related functions
with __result_use_check.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43144


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 93ca6ff2 15-Apr-2023 Konstantin Belousov <kib@FreeBSD.org>

umtx: allow to configure minimal timeout (in nanoseconds)

PR: 270785
Reviewed by: markj, mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39584


# 5657f49e 20-Jan-2023 Konstantin Belousov <kib@FreeBSD.org>

kern_umtx.c do_wait(): correct confusing indent

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 0def80f1 03-Oct-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

time(3): Align fast clock times to avoid firing multiple timers.

In non-periodic mode absolute timers fire at exactly the time given.
When specifying a fast clock, align the firing time so that fewer
timer interrupt events are needed.

Reviewed by: rrs @
Differential Revision: https://reviews.freebsd.org/D36858
MFC after: 1 week
Sponsored by: NVIDIA Networking


# 768f6373 26-Aug-2022 firk <firk@cantconnect.ru>

Fix compat10 semaphore interface race

An incorrect has-waiters flag and a missing unconditional _count == 0 check
could cause infinite waiting even though the count was already non-zero:
1) properly clear _has_waiters flag when waiting failed to start
2) always check _count before start waiting

PR: 265997
Reviewed by: kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36272


# 630f633f 14-Jun-2022 Mark Johnston <markj@FreeBSD.org>

vm_object: Use the vm_object_(set|clear)_flag() helpers

... rather than setting and clearing flags inline. No functional change
intended.

Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35469


# 31d1b816 28-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

sysent: Get rid of bogus sys/sysent.h include.

Where appropriate hide sysent.h under proper condition.

MFC after: 2 weeks


# 11a6ecd4 09-May-2022 Andrew Turner <andrew@FreeBSD.org>

Handle cas failure when the compare succeeds

When locking a priority inherit mutex we perform a compare and swap
operation to try and acquire the mutex. This may fail even when the
compare succeeds.

Check and handle this case.

PR: 263825
Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35150


# 91e7bdcd 25-Apr-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Add timespecvalid_interval macro and use it.

Reviewed by: jhb, imp (early rev)
Differential revision: https://reviews.freebsd.org/D34848
MFC after: 2 weeks


# fd6ca665 21-Mar-2022 Alexander Motin <mav@FreeBSD.org>

Fix umtxq_sleep() regression caused by 56070dd2e4d.

umtxq_requeue() moves the queue to a different hash chain and different
lock, so we can't rely on msleep_sbt() reacquiring the same old lock.
We have to use PDROP and update the queue chain pointer, and with it the
lock pointer.

PR: 262587
MFC after: 2 weeks


# 56070dd2 03-Mar-2022 Alexander Motin <mav@FreeBSD.org>

Improve timeout precision of pthread_cond_timedwait().

This code was not touched when all other user-space sleep functions were
switched to sbintime_t and decoupled from hardclock. When it is possible,
convert supplied times into sbinuptime to supply directly to msleep_sbt()
with C_ABSOLUTE. This provides a timeout resolution of a few microseconds
instead of 2 milliseconds, and avoids a few clock reads and conversions.
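sbintime_t is a signed 64-bit fixed-point count of seconds with 32 fractional bits. That is why it can express microsecond deadlines directly. The sketch below converts a normalized, non-negative timespec to that format. The kernel has its own conversion helpers in sys/time.h. `ts_to_sbt()` and `sketch_sbintime_t` are illustrative names only.

```c
#include <stdint.h>
#include <time.h>

typedef int64_t	sketch_sbintime_t;	/* 32.32 fixed-point seconds */

/* Convert a normalized, non-negative timespec to 32.32 fixed point. */
static sketch_sbintime_t
ts_to_sbt(struct timespec ts)
{
	/* tv_nsec < 1e9, so tv_nsec * 2^32 fits easily in 64 bits. */
	uint64_t frac = ((uint64_t)ts.tv_nsec << 32) / 1000000000u;

	return (((sketch_sbintime_t)ts.tv_sec << 32) +
	    (sketch_sbintime_t)frac);
}
```

One unit is about 0.23 ns, so a 1 us timeout maps to 4294 units rather than being rounded up to a whole tick.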

Reviewed by: vangyzen
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D34163


# 151ddfec 22-Nov-2021 Brooks Davis <brooks@FreeBSD.org>

freebsd32: add _'s to _umtx_(un)lock

This aligns with the default ABI's configuration.

Reviewed by: kib


# d35a7716 17-Nov-2021 Brooks Davis <brooks@FreeBSD.org>

freebsd32: sync _umtx_op args with default ABI

Reviewed by: kevans


# bded8fa3 21-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

umtxq_requeue: remove write-only variable uh2

umtxq_queue_lookup() does not change state. It is redone inside
umtxq_insert() later, anyway.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 9e32efa7 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Split do_unlock_pi into two counterparts.

umtx_pi_drop() will be used by the Linux emulation layer.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31238
MFC after: 2 weeks


# 09f55e60 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Expose some of the pi umtx structures and API to the rest of the kernel.

Differential Revision: https://reviews.freebsd.org/D31237
MFC after: 2 weeks


# 8e4d22c0 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Add umtxq_requeue Linux emulation layer extension.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31235
MFC after: 2 weeks


# 7caa2911 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Add bitset conditional wakeup functionality.

The bitset is a Linux emulation layer extension. This 32-bit mask, in which at
least one bit must be set, is used to select which threads should be woken up.

The bitset is stored in the umtx_q structure, which is used to enqueue the waiter
into the umtx waitqueue. Put the bitset into the hole that appears on LP64 due
to data alignment, to prevent struct umtx_q from growing.
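The selection rule reduces to a bitwise AND between each waiter's stored mask and the mask passed to the wakeup. The waiter array, `struct sketch_waiter` and `wake_bitset()` are hypothetical stand-ins for the umtx queue. Rejecting a zero mask mirrors the requirement that at least one bit be set. The sketch signals that case with -1 rather than an errno.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_waiter {
	uint32_t	bitset;	/* mask the waiter slept with; never 0 */
	int		woken;
};

/*
 * Wake up to 'nr_wake' waiters whose bitset shares at least one bit
 * with 'wake_mask'.  Returns the number woken, or -1 for a zero mask.
 */
static int
wake_bitset(struct sketch_waiter *w, size_t n, uint32_t wake_mask,
    int nr_wake)
{
	int count = 0;

	if (wake_mask == 0)
		return (-1);
	for (size_t i = 0; i < n && count < nr_wake; i++) {
		if (!w[i].woken && (w[i].bitset & wake_mask) != 0) {
			w[i].woken = 1;
			count++;
		}
	}
	return (count);
}
```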

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31234
MFC after: 2 weeks


# 1fdcc87c 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Expose some of the umtx structures and API to the rest of the kernel.

Differential Revision: https://reviews.freebsd.org/D31233
MFC after: 2 weeks


# 307a3dd3 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Expose struct abs_timeout to the rest of the kernel.

Add umtx_ prefix to all abs_timeout facility and add declaration for it.
For consistency with the other abs_timeout functions, mark abs_timeout_init2
inline.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31249
MFC after: 2 weeks


# af29f399 28-Jul-2021 Dmitry Chagin <dchagin@FreeBSD.org>

umtx: Split umtx.h into two counterparts.

To prevent future changes from polluting umtx.h, split it into two headers:
umtx.h - ABI header for userspace;
umtxvar.h - the kernel stuff.

While here fix umtx_key_match style.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31248
MFC after: 2 weeks


# 9b6b793b 19-Jul-2021 Konstantin Belousov <kib@FreeBSD.org>

Revert most of ce42e793100b460f597e4c85ec0da12e274f9394

to restore ABI compatibility for pre-10.x binaries.

It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.

PR: 218571
Reviewed by: brooks (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31220


# 138f78e9 22-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

umtx: convert umtxq_lock to a macro

This way LOCK_PROFILING reports the callers instead of the inline function.


# 60e60e73 22-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

freebsd32: take the _umtx_op struct definitions back

Providing these in freebsd32.h facilitates local testing/measuring of the
structs rather than forcing one to locally recreate them. Sanity checking
offsets/sizes remains in kern_umtx.c where these are typically used.


# e0cb5b2a 21-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

[2/2] _umtx_op: introduce 32-bit/i386 flags for operations

This patch takes advantage of the consolidation that happened to provide two
flags that can be used with the native _umtx_op(2): UMTX_OP__32BIT and
UMTX_OP__I386.

UMTX_OP__32BIT indicates that we are being provided with 32-bit structures.
Note that this flag alone indicates a 64-bit time_t, since this is the
majority case.

UMTX_OP__I386 has been provided so that we can emulate i386 as well,
regardless of whether the host is amd64 or not.

Both imply a different set of copyops in sysumtx_op. freebsd32__umtx_op
simply ignores the flags, since it's already doing a 32-bit operation and
it's unlikely we'll be running an emulator under compat32. Future work
could consider it, but the author sees little benefit.

This will be used by qemu-bsd-user to pass on all _umtx_op calls to the
native interface as long as the host/target endianness matches, effectively
eliminating most, if not all, of the remaining unresolved deadlocks.

This version changed a fair amount from what was under review, mostly in
response to refactoring of the prereq reorganization and battle-testing
it with qemu-bsd-user. The main changes are as follows:

1.) The i386 flag got renamed to omit '32BIT' since this is redundant.
2.) The flags are now properly handled on 32-bit platforms to emulate other
32-bit platforms.
3.) Robust list handling was fixed, and the 32-bit functionality that was
previously gated by COMPAT_FREEBSD32 is now unconditional.
4.) Robust list handling was also improved, including the error reported
when a process has already registered 32-bit ABI lists and also
detecting if native robust lists have already been registered. Both
scenarios now return EBUSY rather than EINVAL, because the input is
technically valid but we're too busy with another ABI's lists.

libsysdecode/kdump/truss support will go into review soon-ish, along with
the associated manpage update.

Reviewed by: kib (earlier version)
MFC after: 3 weeks


# 15eaec6a 21-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

_umtx_op: move compat32 definitions back in

These are reasonably compact, and a future commit will blur the compat32
lines by supporting 32-bit operations with the native _umtx_op.


# 5a2a4551 21-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Remove unused prototype.

Missed part of r367918.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 74a093eb 21-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Stop using eventhandler to invoke umtx_exec hook.

There is no point in dynamic registration, umtx hook is there always.

Reviewed by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27303


# 27a9392d 17-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

_umtx_op: fix robust lists after r367744

A copy-pasto left us copying in 24 bytes at the address of the rb pointer
instead of the intended target.

Reported by: sigsys@gmail.com
Sighing: kevans


# bd4bcd14 16-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

Fix !COMPAT_FREEBSD32 kernel build

One of the last shifts inadvertently moved these static assertions out of a
COMPAT_FREEBSD32 block, which the relevant definitions are limited to.

Fix it.

Pointy hat: kevans


# 63ecb272 16-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

umtx_op: reduce redundancy required for compat32

All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.

umtx_copyops carries the bare minimum needed: the sizes of timespec and
_umtx_time are used to determine whether copyout is needed in the sem2_wait
case.

Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27222


# 4be0a1b5 16-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

_umtx_op: fix a compat32 bug in UMTX_OP_NWAKE_PRIVATE

Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).

Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
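The stride problem can be shown with a small batch loop. This sketch assumes the user array is already accessible as plain memory, so memcpy() stands in for copyin(9). `gather_compat32()` and `SKETCH_BATCH_SIZE` are hypothetical names. The batch size is taken as 128 to match the threshold mentioned in this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_BATCH_SIZE	128

/*
 * 'uaddrs' models the user array of 32-bit (compat32) pointers.  With a
 * uint32_t * cursor, 'upp + pos' advances pos * 4 bytes, matching the
 * 32-bit layout.  A uint32_t ** cursor would advance pos * sizeof(void *)
 * bytes, i.e. 8 per element on an LP64 host.
 */
static size_t
gather_compat32(const uint32_t *uaddrs, size_t count, uint32_t *out)
{
	const uint32_t *upp = uaddrs;
	size_t pos = 0;

	while (pos < count) {
		size_t n = count - pos;

		if (n > SKETCH_BATCH_SIZE)
			n = SKETCH_BATCH_SIZE;
		/* memcpy() stands in for copyin(9) of one batch. */
		memcpy(out + pos, upp + pos, n * sizeof(*upp));
		pos += n;
	}
	return (pos);
}
```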

Add a test case that demonstrates the problem, placed with the libthr tests
since one messing with _umtx_op should be running these tests. Running under
compat32, the new test case will hang as threads after the first 128 get
missed in the wake. It's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around 50 -- I did not spend much time digging into it.

The uintptr_t change makes no functional difference, but I've tossed it in
since it's more accurate (semantically).

Reported by: Andrew Gierth (andrew_tao173.riddles.org.uk, inspection)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27231


# 38033780 11-Nov-2020 Kyle Evans <kevans@FreeBSD.org>

umtx: drop incorrect timespec32 definition

This works for amd64, but none others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.

MFC after: 1 week


# d301b358 09-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Support for userspace non-transparent superpages (largepages).

Created with shm_open2(SHM_LARGEPAGE) and then configured with
FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee
that all userspace mappings of it are served by superpage non-managed
mappings.

Only amd64 for now. Both 2M and 1G superpages can be requested; the
latter requires a CPU feature.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652


# b05ca429 02-Mar-2020 Pawel Biernacki <kaktus@FreeBSD.org>

sys/: Document few more sysctls.

Submitted by: Antranig Vartanian <antranigv@freebsd.am>
Reviewed by: kaktus
Commented by: jhb
Approved by: kib (mentor)
Sponsored by: illuria security
Differential Revision: https://reviews.freebsd.org/D23759


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# 478ca4b0 02-Jan-2020 Konstantin Belousov <kib@FreeBSD.org>

Rename umtxq_check_susp() to thread_check_susp()

and make it usable outside of kern_umtx.c. To be used in several
future changes.

Discussed with: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 8f4d74eb 02-Jan-2020 Konstantin Belousov <kib@FreeBSD.org>

Style: remove trailing spaces/tabs.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 5cbdd18f 01-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Make umtxq_check_susp() correctly handle thread exit requests.

The check for P_SINGLE_EXIT was shadowed by the (P_SHOULDSTOP || traced) check.

Reported by: bdrewery (might be)
Reviewed by: markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21124


# fd336e2a 31-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix handling of transient casueword(9) failures in do_sem_wait().

In particular, restart should be only done when the failure is
transient. For this, recheck the count1 value after the operation.

Note that do_sem_wait() is the older usem interface.

Reported and tested by: bdrewery
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 9f7b7da5 15-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

In do_sem2_wait(), balance umtx_key_get() with umtx_key_release() on retry.

Reported by: ler
Bisected and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 12 days


# ad038195 15-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

In do_lock_pi(), do not return prematurely.

If umtxq_check_susp() indicates an exit, we should clean the resources
before returning. Do it by breaking out of the loop and relying on
post-loop cleanup.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D20949


# 40bd868b 15-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

Correctly check for casueword(9) success in do_set_ceiling().

After r349951, the return code must be checked instead of the old == new
comparison.

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Differential revision: https://reviews.freebsd.org/D20949


# 30b3018d 12-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

Provide protection against starvation of the ll/sc loops when accessing userspace.

Casueword(9) on ll/sc architectures must be prepared for userspace
constantly modifying the cache line containing the CAS word,
and not loop infinitely. Otherwise, rogue userspace livelocks the
kernel.

To fix the issue, change casueword(9) interface to return new value 1
indicating that either the comparison or the store failed, instead of relying
on the oldval == *oldvalp comparison. The primitive no longer retries
the operation if it failed spuriously. Modify callers of
casueword(9), all in kern_umtx.c, to handle retries, and react to
stops and requests to terminate between retries.

On x86, although cmpxchg should not return spurious failures, we can
take advantage of the new interface and just return PSL.ZF.
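The new casueword(9) contract and the caller-side retry can be modeled in userspace with a C11 weak compare-and-exchange. Like an ll/sc pair, it is allowed to fail spuriously. Everything here is hypothetical. `sim_casueword()` only imitates the 0/1 return convention; the real primitive also reports faults with -1. `should_stop()` stands in for the kernel's suspension and termination checks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns 0 on success and 1 if the compare or the store failed;
 * *oldp receives the value that was observed.
 */
static int
sim_casueword(_Atomic uint32_t *p, uint32_t oldval, uint32_t *oldp,
    uint32_t newval)
{
	uint32_t expected = oldval;
	bool ok = atomic_compare_exchange_weak(p, &expected, newval);

	*oldp = expected;	/* unchanged on success, current value on failure */
	return (ok ? 0 : 1);
}

/* Hypothetical: "a stop or termination request is pending". */
static bool
should_stop(void)
{
	return (false);
}

/*
 * Returns 0 on success, 1 if the word held another value (contested),
 * -1 if asked to stop.  A failure with the value unchanged is spurious
 * and is retried, but only after checking for stop requests so that a
 * rogue writer cannot livelock the loop.
 */
static int
try_update(_Atomic uint32_t *p, uint32_t expect, uint32_t newval,
    uint32_t *seen)
{
	for (;;) {
		if (sim_casueword(p, expect, seen, newval) == 0)
			return (0);
		if (*seen != expect)
			return (1);
		if (should_stop())
			return (-1);
	}
}
```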

Reviewed by: andrew (arm64, previous version), markj
Tested by: pho
Reported by: https://xenbits.xen.org/xsa/advisory-295.txt
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D20772


# 7fde3c6b 02-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

More style.

Re-wrap long lines, reformat comments, remove excessive blank line.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 4b8b28e1 02-Jul-2019 Konstantin Belousov <kib@FreeBSD.org>

Style.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 2d7a5552 28-Jun-2019 Konstantin Belousov <kib@FreeBSD.org>

Style.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# ac97da9a 08-May-2019 Mateusz Guzik <mjg@FreeBSD.org>

Reduce umtx-related work on exec and exit

- there is no need to take the process lock to iterate the thread
list after single-threading is enforced
- typically there are no mutexes to clean up (testable without taking
the global umtx lock)
- typically there is no need to adjust the priority (testable without
taking thread lock)

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20160


# 60178276 08-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

umtx: avoid umtxshm locking on object termination if possible

Sample build world result on tmpfs:
kern.ipc.umtx_terminate_notempty: 0
kern.ipc.umtx_terminate_empty: 2891815

Sponsored by: The FreeBSD Foundation


# b34f4419 08-Nov-2018 Brooks Davis <brooks@FreeBSD.org>

Make freebsd32_umtx_op follow the freebsd32_foo convention.

Sponsored by: DARPA, AFRL


# 6040822c 30-Jul-2018 Alan Somers <asomers@FreeBSD.org>

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
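The three-argument form writes the sum into a separate result instead of updating the first operand. What follows is a minimal sketch written as a function. It is named `ts_add3()` to avoid clashing with any system macro, and it assumes both inputs are normalized (0 <= tv_nsec < 1000000000).

```c
#include <time.h>

/* vsp = tsp + usp, with tv_nsec kept in [0, 1000000000). */
static void
ts_add3(const struct timespec *tsp, const struct timespec *usp,
    struct timespec *vsp)
{
	vsp->tv_sec = tsp->tv_sec + usp->tv_sec;
	vsp->tv_nsec = tsp->tv_nsec + usp->tv_nsec;
	if (vsp->tv_nsec >= 1000000000L) {
		vsp->tv_sec++;
		vsp->tv_nsec -= 1000000000L;
	}
}
```

Passing the result pointer separately lets callers keep both operands intact. Passing the same pointer as `tsp` and `vsp` still works here, because each input field is read before the output field of the same name is written.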

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725


# e1a92f05 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

umtx: don't call umtxq_getchain unless the value is needed


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 91a74300 05-Mar-2018 Brooks Davis <brooks@FreeBSD.org>

Use umtx_copyin_umtx_time32() in __umtx_op_lock_umutex_compat32().

Non-NULL timeouts were copied in improperly and could produce failures
due to incompatible data structures.

Reviewed by: kib
MFC after: 3 days
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14587


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# 30c43872 04-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Convert explicit panic() call to assert.

Based on github pull request: #113
Submitted by: pmarillo@github
MFC after: 1 week


# 9dbdf2a1 14-Mar-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

When the RTC is adjusted, reevaluate absolute sleep times based on the RTC

POSIX 2008 says this about clock_settime(2):

If the value of the CLOCK_REALTIME clock is set via clock_settime(),
the new value of the clock shall be used to determine the time
of expiration for absolute time services based upon the
CLOCK_REALTIME clock. This applies to the time at which armed
absolute timers expire. If the absolute time requested at the
invocation of such a time service is before the new value of
the clock, the time service shall expire immediately as if the
clock had reached the requested time normally.

Setting the value of the CLOCK_REALTIME clock via clock_settime()
shall have no effect on threads that are blocked waiting for
a relative time service based upon this clock, including the
nanosleep() function; nor on the expiration of relative timers
based upon this clock. Consequently, these time services shall
expire when the requested relative interval elapses, independently
of the new or old value of the clock.

When the real-time clock is adjusted, such as by clock_settime(3),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.

At present, this affects the following functions:

pthread_cond_timedwait(3)
pthread_mutex_timedlock(3)
pthread_rwlock_timedrdlock(3)
pthread_rwlock_timedwrlock(3)
sem_timedwait(3)
sem_clockwait_np(3)

I'm working on adding clock_nanosleep(2), which will also be affected.

Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by: jhb, kib
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791


# b215ceaa 23-Feb-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

Add sem_clockwait_np()

This function allows the caller to specify the reference clock
and choose between absolute and relative mode. In relative mode,
the remaining time can be returned.

The API is similar to clock_nanosleep(3). Thanks to Ed Schouten
for that suggestion.

While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime. Also add a reasonable timeout.

Reviewed by: ed (userland)
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9656


# 0046bef8 14-Nov-2016 Adrian Chadd <adrian@FreeBSD.org>

[mips] make UMTX_CHAINS configurable at compile time.

The default (512) wastes quite a bit of space which doesn't really buy
us much on highly embedded systems which don't take a lot of locks in
parallel.

This makes it at least build time configurable so people can experiment.
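A compile-time tunable of this kind is usually a guarded default that a kernel configuration or compiler flag can override. The sketch assumes a simple modulo hash and uses the hypothetical names `SKETCH_UMTX_CHAINS` and `chain_index()`. The real chain hash in kern_umtx.c may be computed differently.

```c
#include <stdint.h>

/* Default unless overridden at build time, e.g. -DSKETCH_UMTX_CHAINS=64. */
#ifndef SKETCH_UMTX_CHAINS
#define	SKETCH_UMTX_CHAINS	512
#endif

/* Hypothetical hash: map a lock key to one of the chains. */
static unsigned int
chain_index(uintptr_t key)
{
	return ((unsigned int)(key % SKETCH_UMTX_CHAINS));
}
```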


# 0f2d9783 25-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

In both do_rw_wrlock() and do_rw_rdlock() after r304808, do not
obliterate possible error from sleep with errors from
umtxq_check_susp(), when looping to clear URWLOCK_{READ,WRITE}_WAITERS.

Noted and reviewed by: vangyzen
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 28e21133 25-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Prevent leak of URWLOCK_READ_WAITERS flag for urwlocks.

If there was some error, e.g. the sleep was interrupted, as in the
referenced PR, do_rw_rdlock() did not clear URWLOCK_READ_WAITERS.
Since unlock only wakes up write waiters when there are no read
waiters, for URWLOCK_PREFER_READER kind of locks, the result was
missed wakeups for writers.
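The unlock-side choice of which queue to wake can be sketched as below. The flag names and values are hypothetical placeholders for the URWLOCK_* constants. The function models only the order of preference described here, not the kernel's actual unlock path.

```c
#include <stdint.h>

/* Hypothetical flag values; the real constants live in sys/umtx.h. */
#define	SK_PREFER_READER	0x0002u
#define	SK_WRITE_WAITERS	0x40000000u
#define	SK_READ_WAITERS		0x20000000u

enum sk_wake_q { SK_WAKE_NONE, SK_WAKE_READERS, SK_WAKE_WRITER };

static enum sk_wake_q
unlock_wake_choice(uint32_t flags, uint32_t state)
{
	if (flags & SK_PREFER_READER) {
		/* Readers first: a leaked READ_WAITERS bit starves writers. */
		if (state & SK_READ_WAITERS)
			return (SK_WAKE_READERS);
		if (state & SK_WRITE_WAITERS)
			return (SK_WAKE_WRITER);
	} else {
		if (state & SK_WRITE_WAITERS)
			return (SK_WAKE_WRITER);
		if (state & SK_READ_WAITERS)
			return (SK_WAKE_READERS);
	}
	return (SK_WAKE_NONE);
}
```

With a leaked read-waiters bit, a prefer-reader lock keeps choosing the (empty) reader queue, so a waiting writer is never selected.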

In particular, the most visible victims are ld-elf.so locks in
processes which loaded libthr, because rtld locks are urwlocks in
prefer-reader mode. Normal rwlocks fall into prefer-reader mode only
if the thread already owns the rwlock in read mode, which is not typical and
correspondingly less visible. In the PR, unowned rtld bind lock was
waited for in the process where only one thread was left alive.

Note that do_rw_wrlock() correctly clears URWLOCK_WRITE_WAITERS in
case of errors.

Reported and tested by: longwitz@incore.de
PR: 211947
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# b0f2185b 15-Aug-2016 Eric Badger <badger@FreeBSD.org>

sem_post(): wake up the sleeper only after adjusting has_waiters

If the caller of sem_post() wakes up a thread sleeping via sem_wait()
before it clears the has_waiters flag, the caller of sem_wait() has no way of
knowing when it is safe to destroy the semaphore and reuse the memory. This is
because the caller of sem_post() may be interrupted between the wake step and
the clearing of has_waiters. It will then write into the has_waiters flag in
userspace after being preempted for some unknown amount of time.

Reviewed by: jhb, kib, vangyzen
Approved by: kib (mentor), vangyzen (mentor)
MFC after: 2 weeks
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D7505


# 2a339d9e 17-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Add implementation of robust mutexes, hopefully close enough to the
intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

A robust mutex is guaranteed to be cleared by the system upon either
thread or process owner termination while the mutex is held. The next
mutex locker is then notified about inconsistent mutex state and can
execute (or abandon) corrective actions.

The patch mostly consists of small changes here and there, adding
necessary checks for the inconsistent and abandoned conditions into
existing paths. Additionally, the thread exit handler was extended to
iterate over the userspace-maintained list of owned robust mutexes,
unlocking and marking as terminated each of them.

The list of owned robust mutexes cannot be maintained atomically
synchronous with the mutex lock state (it is possible in kernel, but
is too expensive). Instead, for the duration of lock or unlock
operation, the current mutex is remembered in a special slot that is
also checked by the kernel at thread termination.

The kernel must be aware of the per-thread location of the heads of
robust mutex lists and the current active mutex slot. When a thread
touches a robust mutex for the first time, a new umtx op syscall is
issued to inform the kernel of the location of the list heads.

The umtx sleep queues for PP and PI mutexes are split between
non-robust and robust.

Somewhat unrelated changes in the patch:
1. Style.
2. A fix for the proper use of tdfind() in umtxq_sleep_pi() for shared
pi mutexes.
3. Removal of the userspace struct pthread_mutex m_owner field.
4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls
the lifetime of the shared mutex associated with a vnode's page.

Reviewed by: jilles (previous version, supposedly the objection was fixed)
Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects)
Tested by: pho
Sponsored by: The FreeBSD Foundation
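
The exit-time walk described above (owned list plus the in-flight slot) can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel's code: struct robust_mtx, struct robust_state, the OWNER_DEAD marker and robust_thread_exit() are all hypothetical, and a plain singly linked list stands in for the userspace-maintained list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "owner died" marker; not the real umutex encoding. */
#define OWNER_DEAD 0x80000000u

/* Hypothetical robust mutex: owner id plus a link in the owner's list. */
struct robust_mtx {
	uint32_t owner;            /* owning thread id, 0 if unlocked */
	struct robust_mtx *next;   /* next mutex in the owner's robust list */
};

/* Hypothetical per-thread state registered with the kernel. */
struct robust_state {
	struct robust_mtx *head;     /* list of owned robust mutexes */
	struct robust_mtx *inflight; /* mutex in the middle of lock/unlock */
};

/*
 * Thread-exit walk: mark every mutex still owned by `tid` as abandoned,
 * including one recorded only in the in-flight slot because the list
 * was not yet (or no longer) updated. Returns the number marked.
 */
static int
robust_thread_exit(struct robust_state *rs, uint32_t tid)
{
	int marked = 0;

	for (struct robust_mtx *m = rs->head; m != NULL; m = m->next) {
		if (m->owner == tid) {
			m->owner = OWNER_DEAD;
			marked++;
		}
	}
	/* Already-marked entries no longer match `tid`: no double count. */
	if (rs->inflight != NULL && rs->inflight->owner == tid) {
		rs->inflight->owner = OWNER_DEAD;
		marked++;
	}
	rs->head = NULL;
	rs->inflight = NULL;
	return (marked);
}
```

A real kernel walk would also have to wake waiters and cope with a corrupted or cyclic list in user memory; both are omitted here.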


# 2cfddaa6 19-Apr-2016 Konstantin Belousov <kib@FreeBSD.org>

Fix umtx lock/trylock for compat32.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 1bdbd705 28-Feb-2016 Konstantin Belousov <kib@FreeBSD.org>

Implement process-shared locks support for libthr.so.3, without
breaking the ABI. A special value is stored in the lock pointer to
indicate a shared lock, and an offline page in the shared memory is
allocated to store the actual lock.

Reviewed by: vangyzen (previous version)
Discussed with: deischen, emaste, jhb, rwatson,
Martin Simmons <martin@lispworks.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 50657fd3 30-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Minor (and incomplete) style cleanup.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2936e001 30-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Also mark compat32 umtx op table as constant.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# c539e870 30-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Use C99 array initialization, which also makes the code
self-documenting, and eases addition of new ops.

For similar reasons, eliminate UMTX_OP_MAX. nitems() handles the
only use of the symbol.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
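
A sketch of the pattern, assuming hypothetical op codes and stub handlers (OP_WAIT, op_table, dispatch() and the handlers are illustrative, not the kernel's table): designated initializers name each slot, and the bound comes from nitems() rather than a separate *_MAX constant. The unsigned cast is the same single-compare range check described in the 7c24ae41 entry of this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local fallback; FreeBSD's <sys/param.h> already provides nitems(). */
#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif

/* Hypothetical op codes; slot 2 is deliberately left unused. */
enum { OP_WAIT = 0, OP_WAKE = 1, OP_LOCK = 3 };

typedef int (*op_handler_t)(long arg);

/* Hypothetical stub handlers that just tag their argument. */
static int op_wait(long arg) { return ((int)arg + 100); }
static int op_wake(long arg) { return ((int)arg + 200); }
static int op_lock(long arg) { return ((int)arg + 300); }

/* Designated initializers document which slot holds which op. */
static const op_handler_t op_table[] = {
	[OP_WAIT] = op_wait,
	[OP_WAKE] = op_wake,
	[OP_LOCK] = op_lock,
};

static int
dispatch(int op, long arg)
{
	/* One unsigned compare also rejects negative op codes. */
	if ((unsigned int)op >= nitems(op_table) || op_table[op] == NULL)
		return (EINVAL);
	return (op_table[op](arg));
}
```

Adding an op then means adding one named line to the table; the bound updates itself.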


# dc4b5324 04-Aug-2015 Ed Schouten <ed@FreeBSD.org>

Fix bad arithmetic in umtx_key_get() to compute object offset.

It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3287
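
The arithmetic at stake can be illustrated with a small sketch, assuming a mapping is described by its starting virtual address and the object offset that address maps to (struct map_entry and object_offset() are hypothetical names, not the kernel's). Two processes mapping the same object at different addresses must compute the same object offset for the same byte, or their keys will not match.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a VM map entry covering part of an object. */
struct map_entry {
	uintptr_t start;   /* first virtual address of the mapping */
	uint64_t offset;   /* object offset that `start` maps to */
};

/*
 * Object offset of a user address: distance into the mapping plus the
 * mapping's own starting offset within the object.
 */
static uint64_t
object_offset(const struct map_entry *e, uintptr_t addr)
{
	return ((uint64_t)(addr - e->start) + e->offset);
}
```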


# 52942c1e 03-Aug-2015 Ed Schouten <ed@FreeBSD.org>

Add missing const keyword to function parameter.

The umtx_key_get() function does not dereference the address of the
userspace object. The pointer can safely be const.


# e858b027 28-Mar-2015 Eric van Gyzen <vangyzen@FreeBSD.org>

Clean up some cosmetic nits in kern_umtx.c, found during recent work
in this area and by the Clang static analyzer.

Remove some dead assignments.

Fix a typo in a panic string.

Use umtx_pi_disown() instead of duplicate code.

Use an existing variable instead of curthread.

Approved by: kib (mentor)
MFC after: 3 days
Sponsored by: Dell Inc


# 13dad108 27-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

The umtx_lock mutex is used by the top half of the kernel, but is
currently a spin lock. Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which puts the
requirement on the umtx_lock. Note that the witness static order list
is wrong for the umtx_lock: umtx_lock is explicitly before any thread
lock, so it is also before sleepq locks.

Change umtx_lock to be a sleepable mutex. For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.

Discussed with: jhb
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 84b736b2 25-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

When failing to claim ownership of a umtx_pi, restore the umutex owner
to its previous, unowned state. This avoids compounding an existing
problem of inconsistent ownership.

Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from: Dell Inc.
PR: 198914
MFC after: 1 week


# cc876d2c 25-Feb-2015 Konstantin Belousov <kib@FreeBSD.org>

When unlocking a contested PI pthread mutex, if the queue of waiters
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi. The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.

Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from: Dell Inc.
PR: 198914


# 6690381e 30-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

The dependency chain for priority-inheritance mutexes could be
subverted by userspace into a cycle. Both umtx_propagate_priority() and
umtx_repropagate_priority() would then loop infinitely, owning the
spinlock.

Check for the cycle using the standard Floyd's algorithm before doing the
pass in the affected functions. Add a simple check for the condition of
tricking the thread into a wait for itself, which could be easily
simulated by usermode without a race.

Found by: Eric van Gyzen <eric@vangyzen.net>
In collaboration with: Eric van Gyzen <eric@vangyzen.net>
Tested by: pho
MFC after: 1 week
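
A minimal sketch of the cycle check, assuming the chain can be modelled as a singly linked list from each blocked thread to the owner of the mutex it waits on (struct pi_thread and pi_chain_has_cycle() are hypothetical, not the kernel's structures). Floyd's tortoise-and-hare needs constant memory and no allocation, which suits a walk done while holding a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a PI dependency chain: each thread points to
 * the owner of the mutex it is blocked on (NULL if not blocked).
 */
struct pi_thread {
	struct pi_thread *blocked_on_owner;
};

/* Floyd's tortoise-and-hare: true if following the chain ever loops. */
static bool
pi_chain_has_cycle(const struct pi_thread *td)
{
	const struct pi_thread *slow = td, *fast = td;

	while (fast != NULL && fast->blocked_on_owner != NULL) {
		slow = slow->blocked_on_owner;
		fast = fast->blocked_on_owner->blocked_on_owner;
		if (slow == fast)
			return (true);
	}
	return (false);
}
```

A thread waiting on itself is just a cycle of length one, which this check also reports.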


# 416be7a1 13-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

Fix assertion, &uc->uc_busy is never zero, the intent is to test the
uc_busy value, and not its address [1].

Remove the single use of the macro, write KASSERT() explicitly in the
code of umtxq_sleep_pi().

Submitted by: Eric van Gyzen <eric@vangyzen.net> [1]
MFC after: 1 week


# 2361c6d1 31-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9). This makes the functions type-compatible
with volatile objects and removes the need to forcibly cast away
volatile, e.g. in kern_umtx.c.

Requested by: bde
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# f7e91c28 28-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Convert kern_umtx.c to use fueword() and casueword().

Also fix some mishandling of suword(9) errors as errno, which resulted
in spurious ERESTART.

Sponsored by: The FreeBSD Foundation
Tested by: pho
MFC after: 3 weeks


# 1bc9ea1c 25-Oct-2014 John Baldwin <jhb@FreeBSD.org>

Use correct type in __DEVOLATILE().


# 6362e06b 24-Oct-2014 Xin LI <delphij@FreeBSD.org>

Fix build.


# 53e1ffbb 24-Oct-2014 John Baldwin <jhb@FreeBSD.org>

The current POSIX semaphore implementation stores the _has_waiters flag
in a separate word from the _count. This does not permit both items to
be updated atomically in a portable manner. As a result, sem_post()
must always perform a system call to safely clear _has_waiters.

This change removes the _has_waiters field and instead uses the high bit
of _count as the _has_waiters flag. A new umtx object type (_usem2) and
two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement
these semantics. The older operations are still supported under the
COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has
been updated to use the new implementation. Note that the new
implementation is not compatible with the previous implementation.
However, this only affects static binaries (which cannot be helped by
symbol versioning). Binaries using a dynamic libc will continue to work
fine. SEM_MAGIC has been bumped so that mismatched binaries will error
rather than corrupting a shared semaphore. In addition, a padding field
has been added to sem_t so that it remains the same size.

Differential Revision: https://reviews.freebsd.org/D961
Reported by: adrian
Reviewed by: kib, jilles (earlier version)
Sponsored by: Norse
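
A userspace sketch of the packed word, using C11 atomics. The names SEM_HAS_WAITERS, SEM_MAX_COUNT, struct sketch_sem and the helpers are illustrative assumptions and may not match the real _usem2 definitions. Because the count and the flag share one word, a single compare-and-swap both increments the count and tells sem_post() whether a wakeup system call is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high bit = waiters flag, low 31 bits = count. */
#define SEM_HAS_WAITERS 0x80000000u
#define SEM_MAX_COUNT   0x7fffffffu
#define SEM_COUNT(c)    ((c) & SEM_MAX_COUNT)

struct sketch_sem {
	_Atomic uint32_t count;
};

/*
 * Post: increment the count and report, from the same atomic word,
 * whether a wakeup is needed. Returns 0 or EOVERFLOW.
 */
static int
sketch_sem_post(struct sketch_sem *s, bool *need_wake)
{
	uint32_t old = atomic_load(&s->count);

	do {
		if (SEM_COUNT(old) == SEM_MAX_COUNT)
			return (EOVERFLOW);
		/* On failure the CAS reloads `old`; re-check and retry. */
	} while (!atomic_compare_exchange_weak(&s->count, &old, old + 1));
	*need_wake = (old & SEM_HAS_WAITERS) != 0;
	return (0);
}

/* Try-wait: decrement if the count is positive, preserving the flag. */
static bool
sketch_sem_trywait(struct sketch_sem *s)
{
	uint32_t old = atomic_load(&s->count);

	while (SEM_COUNT(old) > 0) {
		if (atomic_compare_exchange_weak(&s->count, &old, old - 1))
			return (true);
	}
	return (false);
}
```

Clearing the waiters bit once no sleepers remain is left to the wake path and is not shown.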


# fbb6eca6 22-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

In do_lock_pi(), do not override error from umtxq_sleep_pi() when
doing suspend check. This restores the pre-r251684 behaviour, to
retry once after the signal is detected.

PR: kern/192918
Submitted by: Elliott Rabe, Dell Inc., Eric van Gyzen <eric@vangyzen.net>
Obtained from: Dell Inc.
MFC after: 1 week


# 3198603e 18-Mar-2014 Attilio Rao <attilio@FreeBSD.org>

Fix comments.

Sponsored by: EMC / Isilon Storage Division


# ce42e793 18-Mar-2014 Attilio Rao <attilio@FreeBSD.org>

Remove dead code from umtx support:
- Retire the long-unused (basically never used) sys__umtx_lock()
and sys__umtx_unlock() syscalls
- struct umtx and its supporting definitions
- UMUTEX_ERROR_CHECK flag
- Retire UMTX_OP_LOCK/UMTX_OP_UNLOCK from _umtx_op() syscall

__FreeBSD_version is not bumped yet because it is expected that further
breakages to the umtx interface will follow up in the next days.
However there will be a final bump when necessary.

Sponsored by: EMC / Isilon storage division
Reviewed by: jhb


# 1d7466bc 13-Jun-2013 Konstantin Belousov <kib@FreeBSD.org>

Fix two issues with the spin loops in the umtx(2) implementation.

- When looping, check for a pending suspension. Otherwise, another
usermode thread racing with the looping one could try to
prevent the process from stopping or exiting.

- Add missing checks for faults from casuword*(). The code is
structured in a way which makes the loops exit if the specified
address is invalid, since both fuword() and casuword() return -1 on
a fault. But if the address is mapped read-only, the value typically
read by fuword() is different from -1, while casuword() returns -1.
Absent the checks for casuword() faults, this is interpreted as a
race with another thread and causes non-interruptible spinning in the
kernel.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 1e367efa 19-Apr-2013 Jilles Tjoelker <jilles@FreeBSD.org>

sem: Restart the POSIX sem_* calls after signals with SA_RESTART set.

Programs often do not expect an [EINTR] return from sem_wait() and POSIX
only allows it if the signal was installed without SA_RESTART. The timeout
in sem_timedwait() is absolute so it can be restarted normally.

The umtx call can be invoked with a relative timeout and in that case
[ERESTART] must be changed to [EINTR]. However, libc does not do this.

The old POSIX semaphore implementation did this correctly (before r249566),
unlike the new umtx one.

It may be desirable to avoid [EINTR] completely, which matches the pthread
functions and is explicitly permitted by POSIX. However, the kernel must
return [EINTR] at least for signals with SA_RESTART clear, otherwise pthread
cancellation will not abort a semaphore wait. In this commit, only restore
the 8.x behaviour which is also permitted by POSIX.

Discussed with: jhb
MFC after: 1 week


# d52d7aa8 21-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Fix a bug in UMTX_PROFILING:
UMTX_PROFILING should really analyze the distribution of locks as they
index entries in the umtxq_chains hash table.
However, the current implementation increments/decrements the length
counters on *every* thread insertion/removal, effectively measuring
userland contention rather than the hash distribution.

Fix this by adjusting the length counters only at the points where
it is really needed.

Please note that this bug has in the past led us to question the quality
of the umtx hash table distribution.
To date, with all the benchmarks I could try, I was not able to reproduce
any issue with the hash distribution on umtx.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, davide
MFC after: 2 weeks


# 1fc8c346 09-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Improve UMTX_PROFILING:
- Use u_int values for length and max_length values
- Add a way to reset the max_length heuristic in order to have the
possibility to reuse the mechanism consecutively without rebooting
the machine
- Add a way to quickly display the top 5 contended buckets in the system
for the max_length value.
This should give a quick overview of the quality of the hash table
distribution.

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff, davide


# ba4be211 27-Oct-2012 Davide Italiano <davide@FreeBSD.org>

The fields of struct timespec32 should be int32_t and not uint32_t.
Make this change.

Reviewed by: bde, davidxu
Tested by: pho
MFC after: 1 week
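
Why the signedness matters: when compat code widens a 32-bit tv_sec to 64 bits, an unsigned field turns -1 into a huge positive value, so an invalid negative timeout looks like a very long one and slips past a tv_sec < 0 check. A minimal illustration; both struct layouts and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit-ABI timespec layouts, before and after the fix. */
struct timespec32_unsigned { uint32_t tv_sec; uint32_t tv_nsec; };
struct timespec32_signed   { int32_t  tv_sec; int32_t  tv_nsec; };

/* Widen the seconds field to 64 bits, as 64-bit compat code would. */
static int64_t
sec_from_unsigned(const struct timespec32_unsigned *ts)
{
	return ((int64_t)ts->tv_sec);   /* zero-extends */
}

static int64_t
sec_from_signed(const struct timespec32_signed *ts)
{
	return ((int64_t)ts->tv_sec);   /* sign-extends */
}
```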


# d7f97db7 11-Aug-2012 David Xu <davidxu@FreeBSD.org>

Some style fixes inspired by @bde.


# e8afbca2 10-Aug-2012 David Xu <davidxu@FreeBSD.org>

tvtohz() prints an error message if it is given a negative value;
avoid this problem by detecting the timeout earlier.

Reported by: pho


# 331805a5 14-Apr-2012 Davide Italiano <davide@FreeBSD.org>

Fix some style bugs introduced in a previous commit (r233045)

Reported by: glebius, jmallet
Reviewed by: jmallet
Approved by: gnn (mentor)
MFC after: 2 days


# 8931e524 04-Apr-2012 David Xu <davidxu@FreeBSD.org>

In sem_post, the field _has_waiters is no longer used, because some
applications destroy the semaphore after sem_wait returns. Just enter
the kernel to wake up sleeping threads, and only update _has_waiters
when it is safe. While here, check whether the value exceeds
SEM_VALUE_MAX and return EOVERFLOW if it does.


# 17ce6063 04-Apr-2012 David Xu <davidxu@FreeBSD.org>

The umtx operation UMTX_OP_MUTEX_WAKE has a side effect: it accesses
a mutex after a thread has unlocked it, and even writes data to the
mutex memory to clear the contention bit. This is racy, since other
threads can lock it, unlock it, and then destroy it, so it should not
write data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 tries to fix the problem. It
requires the thread library to clear the lock word entirely, then
call the WAKE2 operation to check whether there is any waiter in the
kernel and try to wake up a thread; if necessary, the contention bit is
set again by the operation. This also mitigates the chance that other
threads find the contention bit and enter the kernel to compete with
each other to wake up the sleeping thread, which is unnecessary. With
this change, the mutex owner no longer holds the mutex until it reaches
the point where the kernel umtx queue is locked; it releases the mutex
as soon as possible.
Performance is improved when the mutex is heavily contested. On an Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, which is even better than UMTX_OP_MUTEX_WAKE, now
deprecated. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
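
The unlock side of the protocol described above can be sketched as a single-threaded model. Everything here is hypothetical: MTX_CONTESTED, struct sketch_mutex, sketch_unlock() and sketch_wake2() are illustrative names, and the kernel sleep queue is modelled as a plain counter, so this shows the ordering of steps, not a thread-safe implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-word layout: owner id in low bits, contested flag. */
#define MTX_CONTESTED 0x80000000u

struct sketch_mutex {
	_Atomic uint32_t owner;   /* 0 when unlocked */
	int kernel_waiters;       /* stands in for kernel-side queue state */
};

/* Stand-in for WAKE2: wake one sleeper, re-mark if more remain. */
static void
sketch_wake2(struct sketch_mutex *m)
{
	if (m->kernel_waiters == 0)
		return;
	m->kernel_waiters--;
	if (m->kernel_waiters > 0)
		atomic_fetch_or(&m->owner, MTX_CONTESTED);
}

/*
 * Unlock: clear the whole word first, so the mutex is free at once,
 * then enter the "kernel" only if the old word said it was contested.
 * Returns true if the slow path was taken.
 */
static bool
sketch_unlock(struct sketch_mutex *m)
{
	uint32_t old = atomic_exchange(&m->owner, 0);

	if ((old & MTX_CONTESTED) == 0)
		return (false);   /* fast path: no system call */
	sketch_wake2(m);
	return (true);
}
```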


# 8b1eafa7 31-Mar-2012 David Xu <davidxu@FreeBSD.org>

Remove stale comments.


# b29d7d9b 29-Mar-2012 David Xu <davidxu@FreeBSD.org>

Remove trailing semicolon, it is a typo.


# 0cf573e9 30-Mar-2012 David Xu <davidxu@FreeBSD.org>

Fix COMPAT_FREEBSD32 build.

Submitted by: Andreas Tobler < andreast at fgznet dot ch >


# 4ed8858d 29-Mar-2012 David Xu <davidxu@FreeBSD.org>

Remove trailing space.


# e05171d9 29-Mar-2012 David Xu <davidxu@FreeBSD.org>

Merge umtxq_sleep and umtxq_nanosleep into a single function by using
an abs_timeout structure which describes timeout info.


# d31f470d 28-Mar-2012 David Xu <davidxu@FreeBSD.org>

Reduce code size by creating common timed sleeping function.


# c6111de5 16-Mar-2012 Davide Italiano <davide@FreeBSD.org>

Add rudimentary profiling of the hash table used in the umtx code to
hold active lock queues.

Reviewed by: attilio
Approved by: davidxu, gnn (mentor)
MFC after: 3 weeks


# c9b01ed5 28-Feb-2012 David Xu <davidxu@FreeBSD.org>

Initialize the clock ID and flags only when copying a timespec; a
_umtx_time copy already contains these fields.


# 24c20949 27-Feb-2012 David Xu <davidxu@FreeBSD.org>

Following the changes made in revision 232144, pass an absolute timeout
to the kernel; this eliminates a clock_gettime() syscall.


# df1f1bae 24-Feb-2012 David Xu <davidxu@FreeBSD.org>

In revision 231989, we pass a 16-bit clock ID into the kernel. However,
according to POSIX, the clock ID may be dynamically allocated, and it
is unlikely to fit in 64K forever. To make this future-compatible, we
pack all timeout information into a new structure called _umtx_time and
use the fourth argument as a size indication. A zero size means old code
using a timespec as the timeout value; the new structure also includes
flags and a clock ID, so its size differs from before and is non-zero.
With this change, it is possible for a thread to sleep on any supported
clock, though the current kernel code does not have such a POSIX clock
driver system.
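
A sketch of size-based decoding, assuming the new structure is a timespec followed by 32-bit flags and clock ID, and assuming old-style callers get CLOCK_REALTIME and no flags. struct umtx_time_sketch and decode_timeout() are hypothetical names, and memcpy() stands in for copyin().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical layout modelled on the description above. */
struct umtx_time_sketch {
	struct timespec timeout;
	uint32_t flags;
	uint32_t clockid;
};

/*
 * Decode a user timeout whose layout is chosen by `size`: 0 or the size
 * of a bare timespec means old code. Returns 0, or -1 for unknown sizes.
 */
static int
decode_timeout(const void *uaddr, size_t size, struct umtx_time_sketch *out)
{
	if (size == 0 || size == sizeof(struct timespec)) {
		memcpy(&out->timeout, uaddr, sizeof(struct timespec));
		out->flags = 0;                        /* assumed default */
		out->clockid = (uint32_t)CLOCK_REALTIME; /* assumed default */
		return (0);
	}
	if (size == sizeof(struct umtx_time_sketch)) {
		memcpy(out, uaddr, sizeof(*out));      /* carries its own fields */
		return (0);
	}
	return (-1);
}
```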


# f911d9fa 22-Feb-2012 David Xu <davidxu@FreeBSD.org>

Fix typo.


# b13a8fa7 21-Feb-2012 David Xu <davidxu@FreeBSD.org>

Use the unused fourth argument of umtx_op to pass flags to the kernel for
the UMTX_OP_WAIT operation. The upper 16 bits are enough to hold a clock
ID, and the lower 16 bits are used to pass flags. The change saves a
clock_gettime() syscall from libthr.
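
The 16/16 split described above amounts to simple bit packing. A minimal sketch with hypothetical helper names; note that the df1f1bae entry above replaced this scheme with _umtx_time precisely because a 16-bit clock ID may not be enough.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a clock ID (upper 16 bits) and flags (lower 16 bits). */
static uint32_t
pack_wait_arg(uint16_t clockid, uint16_t flags)
{
	return (((uint32_t)clockid << 16) | flags);
}

static uint16_t
wait_arg_clockid(uint32_t arg)
{
	return ((uint16_t)(arg >> 16));
}

static uint16_t
wait_arg_flags(uint32_t arg)
{
	return ((uint16_t)(arg & 0xffffu));
}
```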


# 29a06690 15-Jan-2012 David Xu <davidxu@FreeBSD.org>

Eliminate a branch and insert an explicit reader memory barrier to ensure
that the waiter bit is set before reading the semaphore count.


# 662ebe9b 02-Dec-2011 Peter Holm <pho@FreeBSD.org>

Add umtx_copyin_timeout() and move parameter checks here.

In collaboration with: kib
MFC after: 1 week
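
A sketch of a copy-in-and-validate helper of the kind described above, assuming an out-of-range timespec is rejected with EINVAL. sketch_copyin_timeout() is a hypothetical name and memcpy() stands in for copyin().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <time.h>

/*
 * Copy a timespec in and validate it: negative seconds, negative
 * nanoseconds, or nanoseconds of a full second or more are rejected.
 */
static int
sketch_copyin_timeout(const void *uaddr, struct timespec *tsp)
{
	memcpy(tsp, uaddr, sizeof(*tsp));
	if (tsp->tv_sec < 0 || tsp->tv_nsec < 0 ||
	    tsp->tv_nsec >= 1000000000L)
		return (EINVAL);
	return (0);
}
```

Doing the checks once at copy-in means every timed operation can assume a valid timespec afterwards.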


# ff77dfb0 02-Dec-2011 Peter Holm <pho@FreeBSD.org>

Rename copyin_timeout32 to umtx_copyin_timeout32 and move parameter
check here. Include check for negative seconds value.

In collaboration with: kib
MFC after: 1 week


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 3379ac59 23-Feb-2011 John Baldwin <jhb@FreeBSD.org>

Expose the umtx_key structure and API to the rest of the kernel.

MFC after: 3 days


# c8e368a9 29-Dec-2010 David Xu <davidxu@FreeBSD.org>

- Following r216313, sched_unlend_user_prio is no longer needed; always
use sched_lend_user_prio to set the lent priority.
- Improve the pthread priority-inherit mutex: when a contender's priority
is lowered, repropagate priorities. This may cause the mutex owner's
priority to be lowered; in the old code, the owner's priority could only rise.


# 1c45127b 22-Dec-2010 David Xu <davidxu@FreeBSD.org>

Enlarge hash table for new condition variable.


# d1078b0b 21-Dec-2010 David Xu <davidxu@FreeBSD.org>

MFp4:

- Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for umtx kernel based
condition variable, this should eliminate an extra system call to get
current time.

- Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in a single
system call. Create a userland sleep queue for condition variables; in
most cases, a thread will wait in the queue, and pthread_cond_signal will
defer the thread wakeup until the mutex is unlocked. This tries to avoid
an extra system call and an extra context switch in the time window
between pthread_cond_signal and pthread_mutex_unlock.

The changes are part of process-shared mutex project.


# e0f389c8 15-Dec-2010 Matthew D Fleming <mdf@FreeBSD.org>

One of the compat32 functions was copying in a raw timespec, instead of
a 32-bit one. This can cause weird timeout issues, as the copying reads
garbage from the user.

Code by: Deepak Veliath <deepak dot veliath at isilon dot com>
MFC after: 1 week


# acbe332a 08-Dec-2010 David Xu <davidxu@FreeBSD.org>

MFp4:
It is possible for a lower-priority thread to lend priority to a
higher-priority thread; the old code ignored this. However, the lending
should always be recorded, so add a field td_lend_user_pri to fix the
problem. If a thread does not have a borrowed priority, its value is PRI_MAX.

MFC after: 1 week


# b169d0ef 21-Nov-2010 David Xu <davidxu@FreeBSD.org>

Use an atomic instruction to set _has_writer; otherwise a race can
cause userland to not wake up a thread sleeping in the kernel.

MFC after: 3 days


# 32c63db5 15-Nov-2010 David Xu <davidxu@FreeBSD.org>

Only unlock process if a thread is found.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# cf7d9a8c 08-Oct-2010 David Xu <davidxu@FreeBSD.org>

Create a global thread hash table to speed up thread lookup, and use an
rwlock to protect the table. In the old code, thread lookup was done with
the process lock held; to find a thread, the kernel had to iterate through
the process and thread lists, which is quite inefficient.
With this change, tests show that in extreme cases performance is
dramatically improved.

Earlier patch was reviewed by: jhb, julian


# df744253 24-Aug-2010 David Xu <davidxu@FreeBSD.org>

If a thread is removed from the umtxq while sleeping, reset the error code
to zero; this gives userland a better indication that the thread need not
be cancelled.


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurrences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# 4ccf64eb 06-Apr-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

MFC r205014,205015:

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

This MFC is required for MFCs of later changes to the freebsd32
compatibility from HEAD.

Requested by: kib


# 841c0c7e 11-Mar-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


# 8251549f 09-Feb-2010 David Xu <davidxu@FreeBSD.org>

In function umtxq_insert_queue, use parameter q (shared/exclusive queue)
instead of hard coded constant. This does not affect RELENG_8 and previous,
because the code only exists in the HEAD.


# 93f01627 08-Feb-2010 David Xu <davidxu@FreeBSD.org>

Set waiters flag before checking semaphore's counter,
otherwise we might lose a wakeup. Tested on postgresql database server.
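
The ordering requirement can be illustrated with C11 atomics. This is a userspace sketch, not the libc or kernel code: struct sketch_sem_old and both helpers are hypothetical, and sequentially consistent operations on both sides stand in for whatever barriers the real code uses. With that ordering, either the waiter sees the new count or the poster sees the waiters flag, so a wakeup cannot be lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical semaphore with separate count and waiters words. */
struct sketch_sem_old {
	_Atomic uint32_t count;
	_Atomic uint32_t has_waiters;
};

/*
 * Waiter side: publish has_waiters *before* reading the count. Reading
 * the count first could let a post slip in, see no waiters, and skip
 * the wakeup just before this thread goes to sleep.
 */
static bool
sketch_wait_must_sleep(struct sketch_sem_old *s)
{
	atomic_store(&s->has_waiters, 1);        /* seq_cst store */
	return (atomic_load(&s->count) == 0);    /* seq_cst load */
}

/* Poster side: bump the count, then check whether anyone may sleep. */
static bool
sketch_post_needs_wake(struct sketch_sem_old *s)
{
	atomic_fetch_add(&s->count, 1);
	return (atomic_load(&s->has_waiters) != 0);
}
```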


# d430c009 07-Feb-2010 David Xu <davidxu@FreeBSD.org>

MFC r203414:
After busying the lock, re-read the state word before checking the waiters
flag; otherwise, the waiters bit may not be set and a wakeup is lost.


# 43271eac 03-Feb-2010 David Xu <davidxu@FreeBSD.org>

Fix comments in do_sem_wait().


# 676e6574 02-Feb-2010 David Xu <davidxu@FreeBSD.org>

After busying the lock, re-read the state word before checking the waiters
flag; otherwise, the waiters bit may not be set and a wakeup is lost.

Submitted by: justin.teller at gmail dot com
MFC after: 3 days


# a4b0b4b0 10-Jan-2010 David Xu <davidxu@FreeBSD.org>

Make a chain be a list of queues, and make threads waiting
for the same key coalesce into the same queue; this makes the search
path shorter and improves performance.
Also fix comments about shared PI mutexes.


# 73532aa7 08-Jan-2010 David Xu <davidxu@FreeBSD.org>

Use enum to define key types.

Suggested by: jmallett


# 4904f91f 08-Jan-2010 David Xu <davidxu@FreeBSD.org>

put semaphore waiter in long term list.


# 2c3b3fef 08-Jan-2010 David Xu <davidxu@FreeBSD.org>

Add key type TYPE_SEM.


# 79f8b619 03-Jan-2010 David Xu <davidxu@FreeBSD.org>

Add a user-level semaphore synchronization type. This change allows
multiple processes to share a semaphore by using a shared memory area. In
the simplest case, only one atomic operation is needed in userland: the
waiter flag is maintained by the kernel and userland only checks the flag;
if the flag is set, user code enters the kernel and does a wakeup() call.
Move type definitions into the file _umtx.h to minimize compile time.
Also, type names need to be prefixed with an underscore character; this
reduces name conflicts (still in progress).


# be0ac160 13-Oct-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r197476:
In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN; otherwise there must
be a memory problem, and nobody can rescue the buggy application.

Approved by: re (kib), davidxu


# b101b127 24-Sep-2009 David Xu <davidxu@FreeBSD.org>

In function do_rw_wrlock, when a writer got an error and before returning,
check if there are readers blocked by us via URWLOCK_WRITE_WAITERS flag,
and resume the readers. The error must be EAGAIN; otherwise there must
be a memory problem, and nobody can rescue the buggy application.

The revision 197445 might be reverted.


# 94548829 12-Apr-2009 David Xu <davidxu@FreeBSD.org>

Make UMTX_OP_WAIT_UINT actually wait for an unsigned integer on 64-bits
machine.

MFC after: 1 week


# 326bf949 13-Mar-2009 David Xu <davidxu@FreeBSD.org>

1) Check NULL pointer before calling umtx_pi_adjust_locked(), this avoids
a PANIC.
2) Rework locking for POSIX priority-mutex, this fixes a
race where a thread may wait there forever even if the mutex is unlocked.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 7de1ecef 24-Jun-2008 David Xu <davidxu@FreeBSD.org>

Add two commands to the _umtx_op system call to allow a simple mutex to be
locked and unlocked completely in userland. Locking and unlocking the mutex
in userland reduces the total time a mutex is held by a thread. In some
application code, a mutex only protects a small piece of code whose
execution time is less than that of a simple system call. If lock
contention happens, however, in the current implementation the lock holder
has to extend its locking time and enter the kernel to unlock it. The change
avoids this disadvantage: it first sets the mutex to the free state, then
enters the kernel and wakes one waiter up. This improves performance
dramatically in some sysbench mutex tests.

Tested by: kris
Sounds great: jeff


# 6e24e617 29-May-2008 David Xu <davidxu@FreeBSD.org>

Use a separate hash table for mutexes and rwlocks, to avoid wasting time
walking through idle threads sleeping on condition variables.


# 727158f6 28-Apr-2008 David Xu <davidxu@FreeBSD.org>

Introduce command UMTX_OP_WAIT_UINT_PRIVATE and UMTX_OP_WAKE_PRIVATE
to allow userland to specify that an address is not shared by multiple
processes.


# 44253336 03-Apr-2008 David Xu <davidxu@FreeBSD.org>

Let umtxq_busy() only spin on MP machines. Rename the function to
do_rwlock_unlock to be consistent with the others.


# fadd84c5 01-Apr-2008 David Xu <davidxu@FreeBSD.org>

Fix compiling problem for amd64.


# 11b1023b 01-Apr-2008 David Xu <davidxu@FreeBSD.org>

Er, don't restart a timeout version.


# 1a30511c 01-Apr-2008 David Xu <davidxu@FreeBSD.org>

Introduce a kernel-based userland rwlock. Each umtx chain now has two
lists, one for readers and one for writers; other types of synchronization
objects just use the first list.

Asked by: jeff


# 7fab871d 17-Dec-2007 David Xu <davidxu@FreeBSD.org>

Check NULL pointer.


# 9514dcc0 16-Dec-2007 David Xu <davidxu@FreeBSD.org>

Add missing changes for fixing LOR of umtx lock and thread lock, follow
the committing of files:
kern_resource.c revision 1.181
sched_4bsd.c revision 1.111
sched_ule.c revision 1.218


# 110de0cf 20-Nov-2007 David Xu <davidxu@FreeBSD.org>

Add function UMTX_OP_WAIT_UINT, the function causes thread to wait for
an integer to be changed.


# 42ce445f 06-Jun-2007 David Xu <davidxu@FreeBSD.org>

Backout experimental adaptive-spin umtx code.


# 3c2e4436 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 8/14 of sched_lock decomposition.
- Use a global umtx spinlock to protect the sleep queues now that there
is no global scheduler lock.
- Use thread_lock() to protect thread state.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 873fbcd7 05-Mar-2007 Robert Watson <rwatson@FreeBSD.org>

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 4e32b7b3 19-Dec-2006 David Xu <davidxu@FreeBSD.org>

Add a lwpid field to the per-cpu structure; the lwpid represents the
currently running thread's id on each cpu. This allows us to add in-kernel
adaptive spinning for user-level mutexes. While spinning in user space is
possible, without the correct thread running state exported from the kernel
it can hardly be implemented efficiently without wasting cpu cycles, and
exporting the thread running state is unlikely to be implemented soon, as
it requires designing and stabilizing interfaces. This implementation is
transparent to user space, and it can be disabled dynamically. With this
change, the mutex ping-pong program's performance is improved massively on
SMP machines. Performance of the mysql super-smack select benchmark
increased by about 7% on an Intel dual dual-core2 Xeon machine, which
indicates that on systems with many cpus and low system-call overhead
(athlon64, opteron, and core-2 are known to be fast), adaptive spinning
does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets the upper limit of the spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


# ad1e7d28 05-Dec-2006 Julian Elischer <julian@FreeBSD.org>

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# 745fbd3a 04-Dec-2006 David Xu <davidxu@FreeBSD.org>

If a thread blocked on a userland condition variable is
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(); therefore, we use thr_wake()
to mark a flag, which tells a thread calling do_cv_wait()
in the umtx code not to block on a condition variable.
The thread library is expected, once a thread detects that it
is in pthread_cond_wait, to call thr_wake() for itself
in its SIGCANCEL handler.


# a6abdf32 02-Dec-2006 David Xu <davidxu@FreeBSD.org>

Introduce userspace condition variables. Since we already have POSIX
priority mutexes implemented, it is time to introduce this as well;
now we can use umutex and ucond together to implement pthread's
condition wait/signal.


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 4c9b02c2 26-Oct-2006 David Xu <davidxu@FreeBSD.org>

Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop,
and add a fast path for when a umtx_pi can be allocated without blocking.


# 7c24ae41 25-Oct-2006 David Xu <davidxu@FreeBSD.org>

In order to eliminate a branch, convert the opcode to an unsigned integer.
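The trick can be illustrated with a small sketch: casting a signed opcode to unsigned makes negative values wrap to very large ones, so a single `<` comparison against the table size rejects both negative and out-of-range opcodes. `SK_NOPS` and both helper functions are hypothetical; they do not reproduce the actual _umtx_op dispatch code.

```c
#include <assert.h>

/* Hypothetical size of an opcode dispatch table. */
#define SK_NOPS	4

/* Signed check: two comparisons guard the table index. */
static int
sk_op_valid_signed(int op)
{
	return (op >= 0 && op < SK_NOPS);
}

/* Unsigned check: a negative op wraps to a huge value, so one
 * comparison rejects both negative and too-large opcodes. */
static int
sk_op_valid_unsigned(int op)
{
	return ((unsigned int)op < (unsigned int)SK_NOPS);
}
```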


# 91d0b4d6 25-Oct-2006 David Xu <davidxu@FreeBSD.org>

Eliminate an unnecessary `if' statement.


# 5f641fc0 16-Oct-2006 David Xu <davidxu@FreeBSD.org>

o Add keyword volatile for user mutex owner field.
o Fix a type consistency problem by using type long for the old
umtx and the wait channel.
o Rename casuptr to casuword.


# ae7d8a67 06-Oct-2006 David Xu <davidxu@FreeBSD.org>

Implement the 32-bit umtx_lock and umtx_unlock system calls. These two
system calls are not used by libthr in RELENG_6 and HEAD; they are only
used by libthr in RELENG_5. The _umtx_op system call can absorb more
incremental changes than these two system calls, without having to
introduce new system calls or throw away old ones as things evolve.


# e58b17ea 22-Sep-2006 David Xu <davidxu@FreeBSD.org>

Fix a umtx command ordering error for 32-bit FreeBSD.


# 1eec02f5 21-Sep-2006 David Xu <davidxu@FreeBSD.org>

Add umtx support for 32-bit processes on AMD64 machines.


# 654d6b2e 04-Sep-2006 David Xu <davidxu@FreeBSD.org>

Merge all code of do_lock_normal, do_lock_pi and do_lock_pp into
function do_lock_umutex.


# 295ce693 02-Sep-2006 David Xu <davidxu@FreeBSD.org>

Check whether the caller is the root user in do_unlock_pp.


# 81273e06 01-Sep-2006 David Xu <davidxu@FreeBSD.org>

Make sure we get the new m_owner value if we cannot unlock the mutex in
the uncontested case. Reorder statements in do_unlock_umutex.


# 8a156460 30-Aug-2006 David Xu <davidxu@FreeBSD.org>

Reorder some statements. Fix a typo and remove stale comments.


# a324b5ec 28-Aug-2006 David Xu <davidxu@FreeBSD.org>

Update comments about interrupted mutex locking.


# d10183d9 27-Aug-2006 David Xu <davidxu@FreeBSD.org>

This is the initial version of POSIX priority mutex support. A new
userland mutex structure is added as follows:
struct umutex {
__lwpid_t m_owner;
uint32_t m_flags;
uint32_t m_ceilings[2];
uint32_t m_spare[4];
};
The m_owner field identifies the owner thread by its thread ID. In the
uncontested case, userland can simply use atomic_cmpset_int to lock the
mutex; if the mutex is contested, the high-order bit is set, and userland
should lock and unlock it via a kernel syscall. Flag UMUTEX_PRIO_INHERIT
represents pthread's PTHREAD_PRIO_INHERIT mutex, for which the kernel
propagates priority when contention happens. Flag UMUTEX_PRIO_PROTECT
indicates pthread's PTHREAD_PRIO_PROTECT mutex; userland should initialize
m_owner to the contested state UMUTEX_CONTESTED so that atomic_cmpset_int
fails and a kernel syscall is invoked to do the locking. This is because,
for such a mutex, the kernel must always boost the thread's priority
before the thread can lock it. m_ceilings is used by PTHREAD_PRIO_PROTECT
mutexes: the first element is used to boost the thread's priority when it
locks the mutex, and the second element is used when the mutex is
unlocked. The PTHREAD_PRIO_PROTECT mutex's linked list is kept in
userland, and m_ceilings[1] is managed by the thread library, so the
kernel need not allocate memory to keep the list. When such a mutex is
unlocked, the kernel resets m_owner to UMUTEX_CONTESTED.
Flag USYNC_PROCESS_SHARED indicates whether the synchronization object is
process-shared; if the flag is not set, a vm_map_lookup() call is saved.
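The userland fast path described above can be sketched with C11 atomics, using `atomic_compare_exchange_strong` in place of atomic_cmpset_int. This is a sketch under stated assumptions: `struct sk_umutex`, `SK_UNOWNED`, `SK_CONTESTED` (the high-order bit, per the description) and `sk_kernel_lock()` are hypothetical, and the kernel stub merely counts calls instead of entering _umtx_op. It shows why a PTHREAD_PRIO_PROTECT mutex initialized to the contested state always reaches the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch values: 0 means unowned, the high-order bit marks contention. */
#define SK_UNOWNED	0u
#define SK_CONTESTED	0x80000000u

struct sk_umutex {
	_Atomic uint32_t m_owner;	/* owner thread ID, plus contested bit */
};

/* Hypothetical stand-in for the kernel syscall path; it only counts calls. */
static int sk_kernel_lock_calls;

static int
sk_kernel_lock(struct sk_umutex *m, uint32_t tid)
{
	(void)m;
	(void)tid;
	sk_kernel_lock_calls++;
	return (0);
}

/* Fast path: try to swing m_owner from unowned to our thread ID. */
static int
sk_umutex_lock(struct sk_umutex *m, uint32_t tid)
{
	uint32_t expected = SK_UNOWNED;

	if (atomic_compare_exchange_strong(&m->m_owner, &expected, tid))
		return (0);			/* uncontested: no syscall */
	return (sk_kernel_lock(m, tid));	/* owned or contested */
}
```

A priority-protect mutex starts out as `{ SK_CONTESTED }`, so the compare-and-swap can never succeed and every lock attempt goes through the kernel, which can then boost the caller's priority first.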

The umtx chain is still used as a sleep queue. When a thread blocks on a
PTHREAD_PRIO_INHERIT mutex, a umtx_pi is allocated to support priority
propagation; it is dynamically allocated and reference counted. It is
not optimized but works well in my tests. While the umtx chain has its
own locking protocol, priority propagation is entirely protected by
sched_lock, because the priority propagation function is called from
the scheduler with sched_lock held.

No visible performance degradation is found with these changes. Some parameter
names in the _umtx_op syscall are renamed.


# 3db720fd 25-Aug-2006 David Xu <davidxu@FreeBSD.org>

Add user priority loaning code to support priority propagation for
1:1 threading's POSIX priority mutexes; the code is a no-op unless the
priority-aware umtx code is committed.


# 7b8d8212 18-May-2006 David Xu <davidxu@FreeBSD.org>

Move flag TDF_UMTXQ into structure umtxq; this eliminates the need for
the scheduler lock in some umtx code.


# 005efcdb 09-May-2006 David Xu <davidxu@FreeBSD.org>

Use wakeup_one to avoid thundering herd.

Tested by: kris


# 0f180a7c 17-Apr-2006 John Baldwin <jhb@FreeBSD.org>

Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero. This avoids a race where the calling
thread could read a snapshot of its current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread. I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.


# a99f7ca2 03-Feb-2006 David Xu <davidxu@FreeBSD.org>

Axe unused code.


# 4938faa6 26-Oct-2005 David Xu <davidxu@FreeBSD.org>

Do a umtx_wake on the userland thread-exit address so that other userland
threads can wait for a thread to exit and safely assume that the thread
has left userland and is no longer using its userland stack. This is
necessary for pthread_join when a thread is waiting for the exit of
another thread that has a user-supplied stack: after pthread_join returns,
the userland stack can be reused for other purposes. Without this change,
the joiner thread has to spin at the address to ensure the thread has
really exited.


# bc8e6d81 05-Mar-2005 David Xu <davidxu@FreeBSD.org>

Allocate umtx_q from the heap instead of the stack; this avoids
a page-fault panic in the kernel under heavy swapping.


# a2cc61fa 18-Jan-2005 David Xu <davidxu@FreeBSD.org>

Revert my previous errno hack. The errno problem is certainly an
issue, and always has been, but the system call itself returns
errno in a register, so the problem is really a function of
libc, not of the system call.

Discussed with: Matthew Dillon <dillon@apollo.backplane.com>


# b7be40d6 14-Jan-2005 David Xu <davidxu@FreeBSD.org>

Make the umtx timeout relative so that userland can select a different
clock type, e.g. CLOCK_REALTIME or CLOCK_MONOTONIC.
Merge umtx_wait and umtx_timedwait into a single function.
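With a relative kernel timeout, the thread library can honor an absolute deadline on whichever clock the caller chose by converting it just before the wait. A minimal userland sketch, assuming POSIX `clock_gettime()`; the helper `sk_abs_to_rel()` is hypothetical and not part of libthr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Convert an absolute deadline on clock 'clk' into the relative
 * interval a kernel wait would be given.  Expired deadlines yield zero.
 */
static int
sk_abs_to_rel(clockid_t clk, const struct timespec *deadline,
    struct timespec *rel)
{
	struct timespec now;

	if (clock_gettime(clk, &now) != 0)
		return (-1);
	rel->tv_sec = deadline->tv_sec - now.tv_sec;
	rel->tv_nsec = deadline->tv_nsec - now.tv_nsec;
	if (rel->tv_nsec < 0) {
		/* Borrow one second to normalize the nanosecond field. */
		rel->tv_sec--;
		rel->tv_nsec += 1000000000L;
	}
	if (rel->tv_sec < 0) {
		/* The deadline has already passed. */
		rel->tv_sec = 0;
		rel->tv_nsec = 0;
	}
	return (0);
}
```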


# 3963baec 12-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Comment out debugging printf which doesn't compile on amd64.


# 333d4875 11-Jan-2005 David Xu <davidxu@FreeBSD.org>

Let _umtx_op directly return the error code rather than setting errno,
because errno can potentially be tampered with by a nested signal handler.
All error codes are now returned as negative values; positive values are
reserved for future expansion.


# 3e380f0d 07-Jan-2005 David Xu <davidxu@FreeBSD.org>

Break out of the loop earlier if the error is not a timeout.


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 476e1d07 05-Jan-2005 David Xu <davidxu@FreeBSD.org>

Return ETIMEDOUT when a thread times out, since the POSIX thread
APIs expect ETIMEDOUT, not EAGAIN; this simplifies userland code a
bit.


# cc1000ac 29-Dec-2004 David Xu <davidxu@FreeBSD.org>

Make umtx_wait and umtx_wake behave more like the Linux futex, which is
more general than the previous interface. It also lets me implement
cancellation points in the thread library. In theory, umtx_lock and
umtx_unlock can also be implemented with umtx_wait and umtx_wake, with
all atomic operations done in userland without the kernel's casuptr().


# c180db2b 25-Dec-2004 David Xu <davidxu@FreeBSD.org>

Make _umtx_op() a more general interface: the final parameter need not be
a timespec pointer; every parameter is interpreted according to its opcode.


# 8b37fbab 24-Dec-2004 David Xu <davidxu@FreeBSD.org>

1. Introduce umtx_owner to get the owner of a umtx.
2. Add const qualifier to umtx_timedlock and umtx_timedwait.
3. Add missing brackets in umtx do_unlock_and_wait.


# 3dd213f1 24-Dec-2004 David Xu <davidxu@FreeBSD.org>

Add umtxq_lock/unlock around umtx_signal, fix the debug kernel build,
and let umtx_lock return EINTR when it gets ERESTART; this gives
userland a chance to back off in the mutex lock code when needed.


# a08c214a 24-Dec-2004 David Xu <davidxu@FreeBSD.org>

1. Fix a race condition between umtx lock and unlock; heavy testing
on SMP can expose the bug.
2. Let umtx_wake return the number of threads that have been woken.


# 839f811c 18-Dec-2004 David Xu <davidxu@FreeBSD.org>

1. msleep returns EWOULDBLOCK not ETIMEDOUT, use EWOULDBLOCK instead.
2. Eliminate a possible lock leak in timed wait loop.


# 50586e8b 17-Dec-2004 David Xu <davidxu@FreeBSD.org>

1. Make umtx sharable between processes: two or more processes
call mmap() to create a shared region and then initialize a umtx in it;
after that, threads in different processes can use the umtx just
as threads in the same process do.
2. Introduce a new syscall, _umtx_op, to support timed lock and condition
variable semantics. The original umtx_lock and umtx_unlock inline
functions are now reimplemented using _umtx_op, and _umtx_op can
use an arbitrary ID, not just a thread ID.


# d111b340 29-Nov-2004 David Xu <davidxu@FreeBSD.org>

Forgot to inline umtxq_unlock.


# 3f76af0f 29-Nov-2004 David Xu <davidxu@FreeBSD.org>

1. Use a per-chain mutex instead of a global mutex to reduce
lock contention.
2. Fix two race conditions. One is between _umtx_unlock and signal
delivery: after a thread was marked TDF_UMTXWAKEUP by _umtx_unlock,
a signal delivered to the thread could cause msleep to return
EINTR, and the thread would break out of the loop, so umtx
ownership was not transferred to the thread. The other is in
_umtx_unlock itself: when the function sets the umtx to the
UMTX_UNOWNED state, a new thread can come in and lock the umtx,
and the function's subsequent attempt to set the contested bit
fails. Although the function will wake a blocked thread, if that
thread breaks out of the loop because of a signal, no contested
bit will be set.


# c21e3b38 12-Jul-2004 Mike Makonnen <mtm@FreeBSD.org>

writers must hold both sched_lock and the process lock; therefore, readers
need only obtain the process lock.


# cd28f17d 01-Jul-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Change the thread ID (thr_id_t) used for 1:1 threading from being a
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referring to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.

To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.
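A pointer to an aligned struct thread always has its low-order bit clear, so that bit was free to mark contention; a small integer lwpid_t can be odd, which is presumably why the bit had to move to the top of the word. The sketch below shows the resulting encoding only. `SK_CONTESTED` and the `sk_` helpers are hypothetical, and an unsigned long lock word is assumed for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Sketch only: the most significant bit of the lock word marks contention. */
#define SK_CONTESTED	(~(ULONG_MAX >> 1))

static unsigned long
sk_mark_contested(unsigned long word)
{
	return (word | SK_CONTESTED);
}

static unsigned long
sk_owner_id(unsigned long word)
{
	/* Strip the contested bit to recover the owning thread ID. */
	return (word & ~SK_CONTESTED);
}

static int
sk_is_contested(unsigned long word)
{
	return ((word & SK_CONTESTED) != 0);
}
```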

Reviewed by: mtm@
ABI preservation tested on: i386, ia64


# 0af67a2e 27-Mar-2004 Mike Makonnen <mtm@FreeBSD.org>

Use the proc lock to sleep on a libthr umtx.


# f05a427a 07-Sep-2003 Tim J. Robbins <tjr@FreeBSD.org>

Return EINVAL if the contested bit is not set on the umtx passed to
_umtx_unlock() instead of firing a KASSERT.


# 80611144 23-Jul-2003 Peter Wemm <peter@FreeBSD.org>

Initialize 'blocked' to NULL. I think this was a real problem, but I
am not sure about that. The lack of -Werror and the inline noise hid
this for a while.


# 6022ec67 19-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Turn a KASSERT back into an EINVAL return value. So, next time someone
comes across it, it will turn into a core dump in userland instead of
a kernel panic. I had also inverted the sense of the test, so

Double pointy hat to: mtm


# 5c6edbec 18-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Remove a lock held across casuptr() that snuck in last commit.


# 7df7f5c5 18-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Move the decision on whether to unset the contested
bit or not from lock to unlock time.

Suggested by: jhb


# 994599d7 17-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

Fix umtx locking, for libthr, in the kernel.
1. There was a race condition between a thread unlocking
a umtx and the thread contesting it. If the unlocking
thread won the race it may try to wakeup a thread that
was not yet in msleep(). The contesting thread would then
go to sleep to await a wakeup that would never come. It's
not possible to close the race by using a lock because
calls to casuptr() may have to fault a page in from swap.
Instead, the race was closed by introducing a flag that
the unlocking thread will set when waking up a thread.
The contesting thread will check for this flag before
going to sleep. For now the flag is kept in td_flags,
but it may be better to use some other member or create
a new one because of the possible performance/contention
issues of having to own sched_lock. Thanks to jhb for
pointing me in the right direction on this one.

2. Once a umtx was contested all future locks and unlocks
were happening in the kernel, regardless of whether it
was contested or not. To prevent this from happening,
when a thread locks a umtx it checks the queue for that
umtx and unsets the contested bit if there are no other
threads waiting on it. Again, this is slightly more
complicated than it needs to be because we can't hold
a lock across casuptr(). So, the thread has to check
the queue again after unsetting the bit, and reset the
contested bit if it finds that another thread has put
itself on the queue in the meantime.

3. Remove the if... block for unlocking an uncontested
umtx, and replace it with a KASSERT. The _only_ time
a thread should be unlocking a umtx in the kernel is
if it is contested.


# e55c35c4 04-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

I was so happy I found the semi-colon from hell that I didn't
notice another typo in the same line. This typo makes libthr unusable,
but its effects were counterbalanced by the extra semicolon, which
made libthr remarkably usable for the past several months.


# 1069e3a6 04-Jul-2003 Mike Makonnen <mtm@FreeBSD.org>

It's unfair how one extraneous semi-colon can cause so much grief.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 980c75b4 02-Jun-2003 Jeff Roberson <jeff@FreeBSD.org>

- Remove the blocked pointer from the umtx structure.
- Use a hash of umtx queues to queue blocked threads. We hash on pid and the
virtual address of the umtx structure. This eliminates cases where we
previously held a lock across a casuptr call.
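Hashing on both the pid and the virtual address matters because the same address in two processes names two unrelated umtx objects. A minimal sketch of such a bucket function follows; the multiplicative constant, the table size `SK_UMTX_CHAINS`, and `sk_umtx_hash()` are all hypothetical choices, not the hash actually used in kern_umtx.c.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

#define SK_UMTX_CHAINS	128	/* hypothetical number of queue chains */

/*
 * Map a (pid, user address) pair to a queue chain index.  The pid is
 * part of the key so equal addresses in different processes are not
 * treated as the same umtx.
 */
static unsigned int
sk_umtx_hash(pid_t pid, uintptr_t uaddr)
{
	uint64_t h;

	/* Fold the pid into the address, then apply a multiplicative mix. */
	h = ((uint64_t)uaddr ^ (uint64_t)(uint32_t)pid) *
	    0x9E3779B97F4A7C15ULL;
	/* Use the well-mixed upper half to pick a chain. */
	return ((unsigned int)((h >> 32) % SK_UMTX_CHAINS));
}
```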

Reviewed by: jhb (quickly)


# 0003d1b7 25-May-2003 Jeff Roberson <jeff@FreeBSD.org>

- Create a new lock, umtx_lock, for use instead of the proc lock for
protecting the umtx queues. We can't use the proc lock because we need
to hold the lock across calls to casuptr, which can fault.

Approved by: re


# cef57e76 02-Apr-2003 Jake Burkholder <jake@FreeBSD.org>

- Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.
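Returning the old value lets the caller both detect success (the returned value equals the expected one) and learn the current contents on failure without a separate load. The sketch below is a userland analogue built on C11 atomics; `sk_casuptr()` is hypothetical and, unlike the kernel routine, does not access user memory or handle faults.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-swap that returns the value found at *p rather than a
 * success flag.  The swap happened iff the return value equals oldval.
 */
static intptr_t
sk_casuptr(_Atomic intptr_t *p, intptr_t oldval, intptr_t newval)
{
	intptr_t found = oldval;

	/* On failure, 'found' is overwritten with the value actually stored. */
	atomic_compare_exchange_strong(p, &found, newval);
	return (found);
}
```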

Reviewed by: jeff


# 69404b50 31-Mar-2003 Jeff Roberson <jeff@FreeBSD.org>

- Add an api for doing smp safe locks in userland.
- umtx_lock() is defined as an inline in umtx.h. It tries to do an
uncontested acquire of a lock which falls back to the _umtx_lock()
system-call if that fails.
- umtx_unlock() is also an inline which falls back to _umtx_unlock() if the
uncontested unlock fails.
- Locks are keyed off of the thr_id_t of the currently running thread which
is currently just the pointer to the 'struct thread' in kernel.
- _umtx_lock() uses the proc pointer to synchronize access to blocked thread
queues which are stored in the first blocked thread.
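The inline fast paths described above can be sketched with C11 atomics: lock tries to swing the word from unowned to the caller's ID, unlock tries to swing it back, and either falls back to the system call when the atomic operation fails. The `sk_` names are hypothetical, the slow-path stubs only count calls instead of entering _umtx_lock()/_umtx_unlock(), and IDs are plain integers here rather than thr_id_t pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_UNOWNED	((intptr_t)0)

struct sk_umtx {
	_Atomic intptr_t u_owner;	/* SK_UNOWNED or the owner's ID */
};

/* Hypothetical slow paths standing in for the system calls. */
static int sk_lock_syscalls, sk_unlock_syscalls;

static int
sk_umtx_lock_syscall(struct sk_umtx *u, intptr_t id)
{
	(void)u;
	(void)id;
	sk_lock_syscalls++;
	return (0);
}

static int
sk_umtx_unlock_syscall(struct sk_umtx *u, intptr_t id)
{
	(void)u;
	(void)id;
	sk_unlock_syscalls++;
	return (0);
}

static inline int
sk_umtx_lock(struct sk_umtx *u, intptr_t id)
{
	intptr_t expected = SK_UNOWNED;

	/* Uncontested acquire: no kernel entry needed. */
	if (atomic_compare_exchange_strong(&u->u_owner, &expected, id))
		return (0);
	return (sk_umtx_lock_syscall(u, id));
}

static inline int
sk_umtx_unlock(struct sk_umtx *u, intptr_t id)
{
	intptr_t expected = id;

	/* Uncontested release: the word still holds exactly our ID. */
	if (atomic_compare_exchange_strong(&u->u_owner, &expected, SK_UNOWNED))
		return (0);
	return (sk_umtx_unlock_syscall(u, id));
}
```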