History log of /openbsd-current/sys/sys/timeout.h
Revision Date Author Comments
# 1.48 12-Oct-2023 cheloha

timeout: add TIMEOUT_MPSAFE flag

Add a TIMEOUT_MPSAFE flag to signal that a timeout is safe to run
without the kernel lock. Currently, TIMEOUT_MPSAFE requires
TIMEOUT_PROC. When the softclock() is unlocked in the future this
dependency will be removed.

On MULTIPROCESSOR kernels, softclock() now shunts TIMEOUT_MPSAFE
timeouts to a dedicated "timeout_proc_mp" bucket for processing by the
dedicated softclock_thread_mp() kthread. Unlike softclock_thread(),
softclock_thread_mp() is not pinned to any CPU and runs at IPL_NONE.
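
The routing described above can be sketched roughly as follows. This is
a userland model, not the kernel code: the flag values, the bucket enum,
and the helper names flags_valid() and choose_bucket() are assumptions
made for illustration, and the non-MULTIPROCESSOR fallback to the
ordinary thread bucket is inferred from the wording of the message.

```c
#include <assert.h>

/* Illustrative flag values; the real definitions live in <sys/timeout.h>. */
#define TIMEOUT_PROC	0x01	/* handler needs a process context */
#define TIMEOUT_MPSAFE	0x10	/* handler may run without the kernel lock */

/* Hypothetical names for where softclock() leaves an expired timeout. */
enum bucket {
	BUCKET_SOFTINTR,	/* run directly from softclock() */
	BUCKET_PROC,		/* handed to softclock_thread() */
	BUCKET_PROC_MP		/* handed to softclock_thread_mp() */
};

/* For now, TIMEOUT_MPSAFE is only valid together with TIMEOUT_PROC. */
static int
flags_valid(int flags)
{
	return !(flags & TIMEOUT_MPSAFE) || (flags & TIMEOUT_PROC);
}

/* Pick a bucket for an expired timeout (assumed policy, see lead-in). */
static enum bucket
choose_bucket(int flags, int multiprocessor)
{
	assert(flags_valid(flags));
	if (!(flags & TIMEOUT_PROC))
		return BUCKET_SOFTINTR;
	if (multiprocessor && (flags & TIMEOUT_MPSAFE))
		return BUCKET_PROC_MP;
	return BUCKET_PROC;
}
```

In this model a timeout flagged TIMEOUT_PROC | TIMEOUT_MPSAFE lands in
the MP bucket only when the multiprocessor argument is nonzero.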

Prompted by bluhm@. Lots of input from bluhm@. Joint work with mvs@.

Prompt: https://marc.info/?l=openbsd-tech&m=169646019109736&w=2
Thread: https://marc.info/?l=openbsd-tech&m=169652212131109&w=2

ok mvs@


Revision tags: OPENBSD_7_3_BASE OPENBSD_7_4_BASE
# 1.47 31-Dec-2022 cheloha

timeout: rename "timeout_at_ts" to "timeout_abs_ts"

I think "abs" ("absolute timeout") is a better mnemonic than
"at" ("at the given time").

The interface is undocumented and there are only two callers, so
renaming it is not a big deal.

probably ok kn@


# 1.46 11-Nov-2022 cheloha

timeout(9): remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK()

We have too many timeout(9) initialization functions and macros.
Let's slim it down and combine some interfaces.

- Remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK().
- Expand timeout_set_flags(), TIMEOUT_INITIALIZER_FLAGS() to accept
an additional "kclock" parameter.
- Reimplement timeout_set(), timeout_set_proc() with timeout_set_flags().
- Reimplement TIMEOUT_INITIALIZER() with TIMEOUT_INITIALIZER_FLAGS().
- Update the sole timeout_set_flags() user to pass a kclock parameter.
- Update the sole timeout_set_kclock() user to call timeout_set_flags().
- Update the sole TIMEOUT_INITIALIZER_FLAGS() user to provide a kclock
parameter.
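
A minimal sketch of the consolidated initializers, assuming a cut-down
struct timeout and illustrative constant values (KCLOCK_NONE and
TIMEOUT_PROC as written here are stand-ins, and the real
timeout_set_flags() does more bookkeeping). It only shows the shape of
the change: one general function taking kclock before flags, with the
older entry points as thin wrappers.

```c
#include <assert.h>
#include <stddef.h>

#define KCLOCK_NONE	(-1)	/* assumed value meaning "tick-based" */
#define TIMEOUT_PROC	0x01	/* illustrative value */

/* Cut-down stand-in for struct timeout; the real one has more members. */
struct timeout {
	void (*to_func)(void *);
	void *to_arg;
	int to_kclock;
	int to_flags;
};

/* The single general initializer: kclock comes before flags. */
static void
timeout_set_flags(struct timeout *to, void (*fn)(void *), void *arg,
    int kclock, int flags)
{
	assert(to != NULL);
	to->to_func = fn;
	to->to_arg = arg;
	to->to_kclock = kclock;
	to->to_flags = flags;
}

/* The older interfaces become wrappers around timeout_set_flags(). */
static void
timeout_set(struct timeout *to, void (*fn)(void *), void *arg)
{
	timeout_set_flags(to, fn, arg, KCLOCK_NONE, 0);
}

static void
timeout_set_proc(struct timeout *to, void (*fn)(void *), void *arg)
{
	timeout_set_flags(to, fn, arg, KCLOCK_NONE, TIMEOUT_PROC);
}
```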

The timeout(9) code is now a bit out of sync with the manpage. This
will be corrected in a subsequent commit.

ok kn@


# 1.45 09-Nov-2022 cheloha

timeout(9): remove TIMEOUT_KCLOCK flag

I never should have added the TIMEOUT_KCLOCK flag. It is redundant
and only serves to complicate the timeout(9) logic. In every place
where we check for the flag we can just use timeout.to_kclock.

So, remove the flag from <sys/timeout.h> and rewrite all affected
logic to use the value of timeout.to_kclock instead.

ok kn@


# 1.44 08-Nov-2022 cheloha

timeout(9): remove unused, undocumented timeout_in_nsec() interface

The kernel is not quite ready for timeout_in_nsec(). Remove it and
kclock_nanotime(). Both are unused.

Prompted by jsg@.

ok kn@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE OPENBSD_7_2_BASE
# 1.43 13-Jul-2021 mvs

Fix TIMEOUT_INITIALIZER_{FLAGS,KCLOCK}() macro.

ok cheloha@


# 1.42 19-Jun-2021 cheloha

timeout(9): change argument order for timeout_set_kclock()

Move the kclock argument before the flags argument. XORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.

timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.

Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desirable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.
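
To illustrate why timespec bookkeeping costs more than tick
bookkeeping, here is a rough userland comparison of the two "is this
timeout due?" checks. The function names tick_due() and ts_due() are
hypothetical, and this is not how softclock() is written; it only
contrasts a single wrapping 32-bit subtraction with a two-field
comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Tick-based: one wrapping 32-bit subtraction decides whether it is due. */
static int
tick_due(uint32_t now_ticks, uint32_t expiry_ticks)
{
	return (int32_t)(now_ticks - expiry_ticks) >= 0;
}

/* Kclock-based: compare seconds first, then nanoseconds. */
static int
ts_due(const struct timespec *now, const struct timespec *abstime)
{
	assert(now->tv_nsec >= 0 && now->tv_nsec < 1000000000L);
	assert(abstime->tv_nsec >= 0 && abstime->tv_nsec < 1000000000L);
	if (now->tv_sec != abstime->tv_sec)
		return now->tv_sec > abstime->tv_sec;
	return now->tv_nsec >= abstime->tv_nsec;
}
```

The tick version also handles counter wraparound for free, which is
part of why it is so cheap; the timespec version needs a branch per
field.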

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased with the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
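
The hash layout described above can be sketched as follows. This is a
reconstruction from the prose only (the commit was later reverted), so
the names SUBSEC_BITS, NSEC_PER_SLOT, and ts_to_hash() are assumptions
and the committed code may have differed in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SUBSEC_BITS	7				/* 2^7 = 128 slots per second */
#define NSEC_PER_SLOT	(1000000000L >> SUBSEC_BITS)	/* 7812500 ns = 1/128 s */

/* Build a 32-bit hash: 25 bits of seconds above 7 bits of subseconds. */
static uint32_t
ts_to_hash(const struct timespec *ts)
{
	uint32_t sec, subsec;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
	sec = (uint32_t)ts->tv_sec & 0x01ffffff;		/* low 25 bits */
	subsec = (uint32_t)(ts->tv_nsec / NSEC_PER_SLOT);	/* 0..127 */
	return (sec << SUBSEC_BITS) | subsec;
}
```

With 256 buckets on the lowest wheel level, the low 8 bits of this hash
span 256/128 = 2 seconds, which matches the level width quoted above.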

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.
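
A rough sketch of the "faked ticks" computation, assuming the interval
has already been converted to a tick count. The helper name
ticks_to_deadline() is hypothetical and the rounding is simplified; it
only shows a tick interval being turned into a timespec interval and
added to the timestamp of the last timeout_hardclock_update() call.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Turn a tick count into an absolute deadline relative to last_update. */
static struct timespec
ticks_to_deadline(int hz, uint64_t nticks, struct timespec last_update)
{
	struct timespec deadline;
	uint64_t nsec_per_tick, interval;

	assert(hz > 0);
	nsec_per_tick = 1000000000ULL / (uint64_t)hz;
	interval = nticks * nsec_per_tick;	/* interval in nanoseconds */

	deadline.tv_sec = last_update.tv_sec +
	    (time_t)(interval / 1000000000ULL);
	deadline.tv_nsec = last_update.tv_nsec +
	    (long)(interval % 1000000000ULL);
	if (deadline.tv_nsec >= 1000000000L) {	/* carry into seconds */
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	return deadline;
}
```

For example, at hz = 100 a 150-tick interval is 1.5 seconds, so a last
update of 10.9 s yields a deadline of 12.4 s.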

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout waits (that is, sleeps) for a non-negligible
amount of time, it might delay other timeouts needing a process
context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.
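
a static initializer of this kind might look roughly like the sketch
below. the struct is a cut-down stand-in, the flag value is
illustrative, and tick_handler, tick_count and tick_to are hypothetical
names; the designated-initializer form shown is the style a later
revision of the macro moved to, not necessarily what this revision
contained.

```c
#include <assert.h>

#define TIMEOUT_INITIALIZED	0x04	/* illustrative value */

/* Cut-down stand-in for struct timeout. */
struct timeout {
	void (*to_func)(void *);
	void *to_arg;
	int to_flags;
};

/* Initialize a timeout at declaration time instead of at run time. */
#define TIMEOUT_INITIALIZER(_f, _a) {		\
	.to_func = (_f),			\
	.to_arg = (_a),				\
	.to_flags = TIMEOUT_INITIALIZED		\
}

/* Hypothetical handler and its state, used only for this example. */
static int tick_count;

static void
tick_handler(void *arg)
{
	(*(int *)arg)++;
}

static struct timeout tick_to = TIMEOUT_INITIALIZER(tick_handler, &tick_count);
```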

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
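
a small userland model of the refcounting pattern this enables. the
struct layouts, the flag value, and the conn_arm() helper are
hypothetical; only the return convention (1 when this call scheduled
the timeout, 0 when it was already pending) comes from the text above.

```c
#include <assert.h>

#define TIMEOUT_ONQUEUE	0x02	/* illustrative value */

/* Cut-down stand-in for struct timeout. */
struct timeout {
	int to_flags;
	int to_time;
};

/* Returns 1 if this call scheduled the timeout, 0 if it was pending. */
static int
timeout_add(struct timeout *to, int to_ticks)
{
	int newly = !(to->to_flags & TIMEOUT_ONQUEUE);

	assert(to_ticks >= 0);
	to->to_time = to_ticks;		/* (re)schedule either way */
	to->to_flags |= TIMEOUT_ONQUEUE;
	return newly;
}

/* Hypothetical object whose lifetime is tied to a pending timeout. */
struct conn {
	int refs;
	struct timeout to;
};

static void
conn_arm(struct conn *c, int to_ticks)
{
	/* Take a reference only when the timeout goes from idle to pending. */
	if (timeout_add(&c->to, to_ticks))
		c->refs++;
}
```

arming twice in a row reschedules the timeout but takes only one
reference, so the handler can drop exactly one when it runs.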

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
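
a sketch of how a caller might use the return value during teardown.
this is a userland model: the structs, the flag value, and
conn_teardown() are hypothetical, and the reference-counting policy
shown is just one way to cope with a timeout that is about to fire.

```c
#include <assert.h>

#define TIMEOUT_ONQUEUE	0x02	/* illustrative value */

/* Cut-down stand-in for struct timeout. */
struct timeout {
	int to_flags;
};

/* Returns 1 if the timeout was pending and has been removed, else 0. */
static int
timeout_del(struct timeout *to)
{
	int removed = (to->to_flags & TIMEOUT_ONQUEUE) != 0;

	to->to_flags &= ~TIMEOUT_ONQUEUE;
	return removed;
}

/* Hypothetical object; the pending timeout holds one reference. */
struct conn {
	int refs;
	struct timeout to;
};

/* Returns 1 if the caller may free c now, 0 if the handler still owns it. */
static int
conn_teardown(struct conn *c)
{
	assert(c->refs > 0);
	if (timeout_del(&c->to))
		c->refs--;	/* handler will not run; drop its reference */
	return --c->refs == 0;	/* drop the caller's own reference */
}
```

when timeout_del() returns 0 here, the handler is assumed to be about
to run and to release the last reference itself.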

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Opps. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.47 31-Dec-2022 cheloha

timeout: rename "timeout_at_ts" to "timeout_abs_ts"

I think "abs" ("absolute timeout") is a better mnemonic than
"at" ("at the given time").

The interface is undocumented and there are only two callers, so
renaming it is not a big deal.

probably ok kn@


# 1.46 11-Nov-2022 cheloha

timeout(9): remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK()

We have too many timeout(9) initialization functions and macros.
Let's slim it down and combine some interfaces.

- Remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK().
- Expand timeout_set_flags(), TIMEOUT_INITIALIZER_FLAGS() to accept
an additional "kclock" parameter.
- Reimplement timeout_set(), timeout_set_proc() with timeout_set_flags().
- Reimplement TIMEOUT_INITIALIZER() with TIMEOUT_INITIALIZER_FLAGS().
- Update the sole timeout_set_flags() user to pass a kclock parameter.
- Update the sole timeout_set_kclock() user to call timeout_set_flags().
- Update the sole TIMEOUT_INITIALIZER_FLAGS() user to provide a kclock
parameter.

The timeout(9) code is now a bit out of sync with the manpage. This
will be corrected in a subsequent commit.

ok kn@


# 1.45 09-Nov-2022 cheloha

timeout(9): remove TIMEOUT_KCLOCK flag

I never should have added the TIMEOUT_KCLOCK flag. It is redundant
and only serves to complicate the timeout(9) logic. In every place
where we check for the flag we can just use timeout.to_kclock.

So, remove the flag from <sys/timeout.h> and rewrite all affected
logic to use the value of timeout.to_kclock instead.

ok kn@


# 1.44 08-Nov-2022 cheloha

timeout(9): remove unused, undocumented timeout_in_nsec() interface

The kernel is not quite ready for timeout_in_nsec(). Remove it and
kclock_nanotime(). Both are unused.

Prompted by jsg@.

ok kn@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE OPENBSD_7_2_BASE
# 1.43 13-Jul-2021 mvs

Fix TIMEOUT_INITIALIZER_{FLAGS,KCLOCK}() macro.

ok cheloha@


# 1.42 19-Jun-2021 cheloha

timeout(9): change argument order for timeout_set_kclock()

Move the kclock argument before the flags argument. XORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.

timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.

Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desireable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread,
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased with the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't forsee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
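
As a rough illustration of the hashing described above, the sketch below
packs the low 25 bits of .tv_sec and 7 bits of subseconds into a single
32-bit value, so that each step of the low 7 bits is 1/128 of a second.
The name wheel_hash() and the constants are hypothetical and chosen for
this example only; the code in the (later reverted) commit may have
differed.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SEC_BITS	25		/* bits of .tv_sec kept in the hash */
#define SUBSEC_BITS	7		/* 2^7 = 128 buckets per second */
#define NSEC_PER_SEC	1000000000L

/* Hypothetical helper: hash an uptime timestamp into a 32-bit wheel key. */
static uint32_t
wheel_hash(const struct timespec *ts)
{
	uint32_t sec, sub;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);

	/* Keep only the low 25 bits of the seconds. */
	sec = (uint32_t)ts->tv_sec & ((1U << SEC_BITS) - 1);
	/* Scale nanoseconds to units of 1/128 second (range 0..127). */
	sub = (uint32_t)(((uint64_t)ts->tv_nsec << SUBSEC_BITS) / NSEC_PER_SEC);

	return (sec << SUBSEC_BITS) | sub;
}
```

With this layout, a timestamp of 1.5 seconds hashes to 128 + 64 = 192,
and 256 consecutive hash values span 2 seconds, the stated width of the
lowest wheel level.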

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.
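
The conversion described above can be sketched in userland as follows.
The helper names nsec_to_ticks() and tick_deadline() are hypothetical,
and rounding the interval up to a whole tick is an assumption made for
this example, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000ULL

/* Hypothetical: round a nanosecond interval up to whole ticks at rate hz. */
static uint64_t
nsec_to_ticks(uint64_t nsec, int hz)
{
	uint64_t tick_nsec;

	assert(hz > 0);
	tick_nsec = NSEC_PER_SEC / (uint64_t)hz;
	return (nsec + tick_nsec - 1) / tick_nsec;
}

/*
 * Hypothetical: absolute deadline = timestamp of the most recent
 * hardclock update + the timespec equivalent of "ticks" ticks.
 */
static struct timespec
tick_deadline(struct timespec last, uint64_t ticks, int hz)
{
	uint64_t tick_nsec, total;

	assert(hz > 0);
	tick_nsec = NSEC_PER_SEC / (uint64_t)hz;
	total = ticks * tick_nsec + (uint64_t)last.tv_nsec;

	last.tv_sec += (time_t)(total / NSEC_PER_SEC);
	last.tv_nsec = (long)(total % NSEC_PER_SEC);
	return last;
}
```

Under these assumptions, a 15 ms interval at 100hz becomes 2 ticks, and
the deadline lands 20 ms after the last hardclock timestamp.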

A new interface, timeout_at_ts(9), is introduced here to bypass this
backward-compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) needs
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.
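
The queue-ordering idea behind the second point can be sketched with a
single-threaded userland stand-in. Everything here (struct workq,
workq_add(), workq_run(), barrier_fn()) is hypothetical and is not the
kernel's implementation; in the kernel the barrier caller would sleep
until the barrier work has run.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN	8

typedef void (*work_fn)(void *);

struct work {
	work_fn	 fn;
	void	*arg;
};

/* Hypothetical stand-in for the softclock thread's FIFO work queue. */
struct workq {
	struct work	items[QLEN];
	size_t		head, tail;
};

static void
workq_add(struct workq *q, work_fn fn, void *arg)
{
	assert(q->tail < QLEN);
	q->items[q->tail].fn = fn;
	q->items[q->tail].arg = arg;
	q->tail++;
}

/* Run queued work strictly in FIFO order, as a single thread would. */
static void
workq_run(struct workq *q)
{
	while (q->head < q->tail) {
		struct work *w = &q->items[q->head++];
		w->fn(w->arg);
	}
}

/* An ordinary timeout handler: counts how many handlers have completed. */
static void
handler(void *arg)
{
	(*(int *)arg)++;
}

struct barrier {
	const int	*done;	/* handlers completed so far */
	int		 seen;	/* value of *done when the barrier ran */
};

/* Barrier work: everything queued before it has finished by now. */
static void
barrier_fn(void *arg)
{
	struct barrier *b = arg;

	b->seen = *b->done;
}
```

If two handlers are queued, then the barrier, then a third handler, the
barrier observes exactly two completed handlers when it runs.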

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout waits (that is, sleeps) for a non-negligible
amount of time, it might delay other timeouts needing a process
context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
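
a minimal userland sketch of the refcounting pattern this return value
allows. mock_timeout_add() and the structs are stand-ins written for
this example; they only mimic the return value described above and are
not the kernel's timeout_add(9).

```c
#include <assert.h>

/* Userland stand-in; NOT the kernel's struct timeout. */
struct mock_timeout {
	int	pending;	/* nonzero while scheduled */
};

struct obj {
	int			refs;
	struct mock_timeout	to;
};

/* Mimics the return value: 1 if newly scheduled, 0 if already pending. */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = 1;
	return 1;
}

/* Take exactly one reference per pending timeout, however often it is re-added. */
static void
obj_kick(struct obj *o)
{
	if (mock_timeout_add(&o->to))
		o->refs++;
}
```

calling obj_kick() twice in a row takes only one extra reference,
because the second add reports that the timeout was already pending.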

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
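
a minimal userland sketch of how a caller might act on this return
value. mock_timeout_del() and struct conn are stand-ins written for this
example and are not the kernel's timeout_del(9); what to do when the
timeout was not pending is left to the caller, as the message says.

```c
#include <assert.h>

/* Userland stand-in; NOT the kernel's struct timeout. */
struct mock_timeout {
	int	pending;	/* nonzero while scheduled */
};

struct conn {
	int			refs;	/* one reference is held by a pending timeout */
	struct mock_timeout	to;
};

/* Mimics the return value: 1 if a pending timeout was removed, else 0. */
static int
mock_timeout_del(struct mock_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = 0;
	return 1;
}

/* Returns 1 if the timeout was cancelled and cleanup may proceed now. */
static int
conn_stop(struct conn *c)
{
	if (mock_timeout_del(&c->to)) {
		c->refs--;	/* drop the reference the timeout held */
		return 1;
	}
	/* Not pending: the handler may have run or be about to run. */
	return 0;
}
```

only the first conn_stop() on a pending timeout returns 1 and drops the
reference; a second call finds nothing to remove and leaves the count
alone.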

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.



# 1.45 09-Nov-2022 cheloha

timeout(9): remove TIMEOUT_KCLOCK flag

I never should have added the TIMEOUT_KCLOCK flag. It is redundant
and only serves to complicate the timeout(9) logic. In every place
where we check for the flag we can just use timeout.to_kclock.

So, remove the flag from <sys/timeout.h> and rewrite all affected
logic to use the value of timeout.to_kclock instead.

ok kn@


# 1.44 08-Nov-2022 cheloha

timeout(9): remove unused, undocumented timeout_in_nsec() interface

The kernel is not quite ready for timeout_in_nsec(). Remove it and
kclock_nanotime(). Both are unused.

Prompted by jsg@.

ok kn@


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE OPENBSD_7_2_BASE
# 1.43 13-Jul-2021 mvs

Fix TIMEOUT_INITIALIZER_{FLAGS,KCLOCK}() macro.

ok cheloha@


# 1.42 19-Jun-2021 cheloha

timeout(9): change argument order for timeout_set_kclock()

Move the kclock argument before the flags argument. ORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.

timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.

Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desirable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts; TCP/IP
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.
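
The cost difference in the second point can be pictured with a small
userland sketch. Both helper names (tick_expired, ts_expired) are
hypothetical and this is not the softclock() code; it only contrasts a
single integer subtraction with a two-field timespec comparison.

```c
#include <time.h>

/*
 * Tick deadlines: one subtraction on 32-bit integers.  Doing the
 * subtraction in unsigned arithmetic keeps it well defined across
 * counter wraparound (the final conversion to int behaves as expected
 * on two's complement machines).
 */
static int
tick_expired(int deadline, int now)
{
	return (int)((unsigned int)deadline - (unsigned int)now) <= 0;
}

/*
 * Kclock deadlines: compare seconds first, then nanoseconds.
 * Assumes both timespecs are normalized (0 <= tv_nsec < 1000000000).
 */
static int
ts_expired(const struct timespec *deadline, const struct timespec *now)
{
	if (deadline->tv_sec != now->tv_sec)
		return deadline->tv_sec < now->tv_sec;
	return deadline->tv_nsec <= now->tv_nsec;
}
```

The extra branch and the wider fields are one plausible source of the
additional cycles quoted above; the sketch makes no claim about where
the real overhead comes from.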

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore grown by the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
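
As an illustration only (not the committed kernel code), the 25/7 split
described above could be computed roughly as follows. The names
ts_to_hash, SUBSEC_BITS and NSEC_PER_SLOT are assumptions made for this
sketch.

```c
#include <stdint.h>
#include <time.h>

#define SUBSEC_BITS	7				/* 7 bits of subseconds */
#define NSEC_PER_SLOT	(1000000000L >> SUBSEC_BITS)	/* 7812500 ns = 1/128 s */

/*
 * Hypothetical helper: fold a normalized timespec into a 32-bit value
 * with the low 25 bits of tv_sec on top and the subsecond part, in
 * units of 1/128 second, in the low 7 bits.
 */
static uint32_t
ts_to_hash(const struct timespec *ts)
{
	uint32_t sec = (uint32_t)ts->tv_sec & 0x1ffffff;	/* 25 bits */
	uint32_t sub = (uint32_t)(ts->tv_nsec / NSEC_PER_SLOT);	/* 0..127 */

	return (sec << SUBSEC_BITS) | sub;
}
```

With this layout two deadlines fall into different slots only when they
differ by at least one 1/128-second step, which matches the bucket width
given in the text.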

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.
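
A rough sketch of that conversion, assuming a fixed hz and using
hypothetical names (ticks_to_ts, deadline_from_ticks); the real code in
the timeout layer differs in detail:

```c
#include <time.h>

/* Convert a tick count into a timespec interval, given hz ticks per second. */
static struct timespec
ticks_to_ts(unsigned int ticks, unsigned int hz)
{
	struct timespec ts;
	long nsec_per_tick = 1000000000L / hz;

	ts.tv_sec = ticks / hz;
	ts.tv_nsec = (long)(ticks % hz) * nsec_per_tick;	/* always < 1 second */
	return ts;
}

/*
 * Add the interval to the timestamp of the most recent hardclock
 * update to obtain an absolute deadline.
 */
static struct timespec
deadline_from_ticks(struct timespec last_update, unsigned int ticks,
    unsigned int hz)
{
	struct timespec iv = ticks_to_ts(ticks, hz);
	struct timespec d;

	d.tv_sec = last_update.tv_sec + iv.tv_sec;
	d.tv_nsec = last_update.tv_nsec + iv.tv_nsec;
	if (d.tv_nsec >= 1000000000L) {		/* carry into seconds */
		d.tv_sec++;
		d.tv_nsec -= 1000000000L;
	}
	return d;
}
```

Because the deadline is anchored to the last hardclock update rather
than to the current clock reading, callers see the same granularity
they had with real ticks.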

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout waits (that is, sleeps) for a
non-negligible amount of time, it might delay other timeouts needing a
process context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initializing timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
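
the refcounting pattern this enables can be sketched in userland with a
mock. mock_timeout_add, struct mock_timeout, struct obj and obj_kick are
all hypothetical names; the mock models only the 1/0 return value and
ignores the expiry time entirely.

```c
#include <stdbool.h>

/* Userland mock, NOT the kernel API: tracks only whether it is pending. */
struct mock_timeout {
	bool pending;
};

/* Returns 1 if this call scheduled the timeout, 0 if it was already pending. */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = true;
	return 1;
}

struct obj {
	int refs;
	struct mock_timeout to;
};

/* Take a reference for the pending timeout only when we scheduled it. */
static void
obj_kick(struct obj *o)
{
	if (mock_timeout_add(&o->to))
		o->refs++;
}
```

calling obj_kick twice in a row takes exactly one reference, so the
handler can drop exactly one when it runs.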

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
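
a minimal userland sketch of how a caller might act on that return
value. mock_timeout_del, struct mock_timer, struct conn and conn_cancel
are hypothetical names; the mock only models "removed" versus "not on
the queue" and says nothing about a handler that is already running.

```c
#include <stdbool.h>

/* Userland mock, NOT the kernel API. */
struct mock_timer {
	bool pending;
};

/* Returns 1 if the timeout was pending and has been removed, else 0. */
static int
mock_timeout_del(struct mock_timer *to)
{
	if (!to->pending)
		return 0;
	to->pending = false;
	return 1;
}

struct conn {
	int refs;		/* one reference is held by a pending timeout */
	struct mock_timer to;
};

/* Drop the timeout's reference only if we actually dequeued it. */
static void
conn_cancel(struct conn *c)
{
	if (mock_timeout_del(&c->to))
		c->refs--;
	/* else: the handler ran or is about to run and drops it itself */
}
```

the else branch is the case the text leaves to the caller; the sketch
assumes the handler owns and releases its own reference.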

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL; they maliciously pollute the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problem of slow
handling of a large number of pending timeouts (this will be solved in
future work), although hardclock is now guaranteed to take constant time
for handling of timeouts.

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.43 13-Jul-2021 mvs

Fix TIMEOUT_INITIALIZER_{FLAGS,KCLOCK}() macro.

ok cheloha@


# 1.42 19-Jun-2021 cheloha

timeout(9): change argument order for timeout_set_kclock()

Move the kclock argument before the flags argument. XORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.

timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.

Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desireable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread,
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased with the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't forsee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting (that is, sleeping) for a
non-negligible amount of time, it might delay other timeouts needing a
process context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initializing timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.

ok mpi@ matthew@ mikeb@ guenther@
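
The refcounting pattern this return value enables can be sketched in userland C. This is only a model, not kernel code: the model_* names and structs are hypothetical stand-ins, and the real timeout_add(9) also reschedules a pending timeout, which the model ignores. It pairs the return value described here with the timeout_del return value from revision 1.21.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct timeout: only tracks "is it pending?". */
struct model_timeout {
	bool pending;
};

/* Hypothetical refcounted object that owns a timeout. */
struct model_obj {
	int refs;
	struct model_timeout to;
};

/* Models the return value: 1 if this call scheduled it, 0 if already pending. */
static int
model_timeout_add(struct model_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = true;
	return 1;
}

/* Models the delete side: 1 if a pending timeout was removed, 0 otherwise. */
static int
model_timeout_del(struct model_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = false;
	return 1;
}

/* Take a reference only when this call is the one that scheduled the timeout. */
static void
model_obj_arm(struct model_obj *o)
{
	if (model_timeout_add(&o->to))
		o->refs++;
}

/* Drop the reference only when the pending timeout was actually removed. */
static void
model_obj_disarm(struct model_obj *o)
{
	if (model_timeout_del(&o->to))
		o->refs--;
}
```

Arming twice takes one reference and disarming twice drops one, so the count stays balanced however many times the caller reschedules.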


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.42 19-Jun-2021 cheloha

timeout(9): change argument order for timeout_set_kclock()

Move the kclock argument before the flags argument. ORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.

timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.

Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desirable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).
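
The difference between a relative and an absolute kclock timeout can be sketched as timespec arithmetic. This is a minimal userland sketch, not the kernel implementation: the sketch_* helpers are hypothetical, and the caller is assumed to supply "now" from whatever clock it uses (in the kernel that would be the uptime clock).

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Convert a relative interval in nanoseconds into an absolute deadline
 * on the same clock that produced "now". Assumes now.tv_nsec is already
 * normalized to the range [0, 1e9).
 */
static struct timespec
sketch_abs_deadline(struct timespec now, uint64_t nsecs)
{
	struct timespec abs;

	abs.tv_sec = now.tv_sec + (time_t)(nsecs / NSEC_PER_SEC);
	abs.tv_nsec = now.tv_nsec + (long)(nsecs % NSEC_PER_SEC);
	if (abs.tv_nsec >= NSEC_PER_SEC) {
		/* Carry the overflowing nanoseconds into the seconds field. */
		abs.tv_sec++;
		abs.tv_nsec -= NSEC_PER_SEC;
	}
	return abs;
}

/* A deadline has expired once "now" is at or past it, never before. */
static int
sketch_expired(struct timespec now, struct timespec abs)
{
	if (now.tv_sec != abs.tv_sec)
		return now.tv_sec > abs.tv_sec;
	return now.tv_nsec >= abs.tv_nsec;
}
```

An absolute timeout skips the first helper and stores the caller's timespec directly, which is why only the relative form pays for a clock read when it is scheduled.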

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased with the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
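
The hash described above can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the sketch_* names are hypothetical, and the 256-bucket lowest level is inferred from the figures in this message (2 seconds at 1/128 second per bucket).

```c
#include <stdint.h>
#include <time.h>

#define SUBSEC_BITS	7				/* 7 bits of subseconds */
#define NSEC_PER_SLOT	(1000000000L >> SUBSEC_BITS)	/* 7812500 ns = 1/128 s */
#define LEVEL0_BUCKETS	256				/* assumed: 2 s / (1/128 s) */

/*
 * Build a 32-bit hash from a timespec: the low 25 bits of tv_sec occupy
 * the upper bits and a 7-bit subsecond slot occupies the lower bits.
 */
static uint32_t
sketch_hash(const struct timespec *ts)
{
	/* Shifting a 32-bit value left by 7 discards all but 25 bits of seconds. */
	uint32_t sec = (uint32_t)ts->tv_sec << SUBSEC_BITS;
	/* tv_nsec in [0, 1e9) maps to a slot in [0, 127]. */
	uint32_t sub = (uint32_t)(ts->tv_nsec / NSEC_PER_SLOT);

	return sec | sub;
}

/* The lowest wheel level is indexed by the low bits of the hash. */
static uint32_t
sketch_level0_bucket(uint32_t hash)
{
	return hash & (LEVEL0_BUCKETS - 1);
}
```

With these constants, consecutive buckets on the lowest level are 1/128 second apart and the level wraps every 2 seconds, matching the widths quoted in the message.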

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.41 29-May-2021 cheloha

timeout.h: remove API documentation comment

Details about using the timeout API can be found in the timeout.9
manpage. We don't need this comment.

ok mvs@


Revision tags: OPENBSD_6_9_BASE
# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desireable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread,
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore increased with the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.37 25-Jul-2020 cheloha

timeout(9): remove TIMEOUT_SCHEDULED flag

The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate
between wheel timeouts and new timeouts during softclock(). The
distinction is useful when incrementing the "rescheduled" stat and the
"late" stat.

Now that we have an intermediate queue for new timeouts, timeout_new,
we don't need the flag. The distinction between wheel timeouts and
new timeouts can be made computationally.

Suggested by procter@ several months ago.


Revision tags: OPENBSD_6_7_BASE
# 1.36 03-Jan-2020 cheloha

timeout(9): Add timeout_set_flags(9) and TIMEOUT_INITIALIZER_FLAGS(9)

These allow the caller to initialize timeouts with arbitrary flags. We
only have one flag at the moment, TIMEOUT_PROC, but experimenting with
other flags is easier if these interfaces are available in-tree.

With input from bluhm@, guenther@, and visa@.

"makes sense to me" bluhm@, ok visa@


# 1.35 03-Jan-2020 cheloha

timeout(9): Rename the TIMEOUT_NEEDPROCCTX flag to TIMEOUT_PROC.

This makes it more likely to fit into 80 columns if used alongside
the forthcoming timeout_set_flags() and TIMEOUT_INITIALIZER_FLAGS()
interfaces.

"makes sense to me" bluhm@, ok visa@


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
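
The hashing described above can be illustrated with a small userland
sketch. It is not the kernel's code: the names timespec_hash() and
level0_bucket(), the macro names, and the exact bit layout (seconds in
the upper 25 bits, 1/128-second units in the lower 7) are assumptions
made for illustration, as is the 256-bucket lowest level implied by the
2-second width.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SUBSEC_BITS	7	/* 2^7 = 128 subsecond units per second */
#define SEC_BITS	25	/* remaining bits of the 32-bit hash */
#define NSEC_PER_UNIT	(1000000000L / (1L << SUBSEC_BITS))	/* 7812500 ns */

/* Build a 32-bit hash: 25 bits of seconds, 7 bits of subseconds. */
static uint32_t
timespec_hash(const struct timespec *ts)
{
	uint32_t sec, subsec;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
	sec = (uint32_t)ts->tv_sec & ((1U << SEC_BITS) - 1);
	subsec = (uint32_t)(ts->tv_nsec / NSEC_PER_UNIT);
	return (sec << SUBSEC_BITS) | subsec;
}

/* Assumed: the lowest wheel level has 256 buckets, i.e. 2 seconds. */
static unsigned int
level0_bucket(uint32_t hash)
{
	return hash & 0xff;
}
```

Under these assumptions an uptime of 1.5 seconds hashes to 192
(1 * 128 + 64), and an uptime of exactly 2 seconds wraps around to
bucket 0 of the lowest level.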

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backward-compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting (that is, sleeping) for a
non-negligible amount of time, it might delay other timeouts needing a
process context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initializing timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, i.e. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
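
A minimal userland sketch of the pattern this return value enables,
combined with the timeout_add return value from revision 1.24.
Everything here is a mock: struct mock_timeout, mock_timeout_add(),
mock_timeout_del(), and struct obj are hypothetical stand-ins that only
imitate the 1/0 return convention described in this log. They are not
the kernel implementation and do not model expiry times.

```c
#include <assert.h>

/* Hypothetical stand-in for struct timeout: only tracks "pending". */
struct mock_timeout {
	int pending;
};

/* Returns 1 if this call scheduled the timeout, 0 if already pending. */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = 1;
	return 1;
}

/* Returns 1 if this call removed a pending timeout, 0 otherwise. */
static int
mock_timeout_del(struct mock_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = 0;
	return 1;
}

/* Hypothetical object: a pending timeout holds exactly one reference. */
struct obj {
	int refs;
	struct mock_timeout to;
};

static void
obj_arm(struct obj *o)
{
	/* Take a reference only if this call did the scheduling. */
	if (mock_timeout_add(&o->to))
		o->refs++;
}

static void
obj_disarm(struct obj *o)
{
	/* Drop the reference only if this call did the removal. */
	if (mock_timeout_del(&o->to)) {
		assert(o->refs > 0);
		o->refs--;
	}
}
```

Arming twice takes only one reference and disarming twice drops only
one, so the count stays balanced. When the delete returns 0, the handler
may be about to run, and in this sketch's model the handler itself would
be responsible for releasing the reference.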

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL; they maliciously pollute the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.40 15-Oct-2020 cheloha

timeout(9): basic support for kclock timeouts

A kclock timeout is a timeout that expires at an absolute time on one
of the kernel's clocks. A timeout's absolute expiration time is kept
in a new member of the timeout struct, to_abstime. The timeout's
kclock is set at initialization and is kept in another new member of
the timeout struct, to_kclock.

Kclock timeouts are desirable because they have nanosecond
resolution, regardless of the value of hz(9). The timecounter
subsystem is also inherently NTP-sensitive, so timeouts scheduled
against the subsystem are NTP-sensitive. These two qualities
guarantee that a kclock timeout will never expire early.

Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
(the UTC clock) is planned for the future.

Support for these additional kclocks will allow us to implement some
of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep() and
timer_create(). We could also use it to provide proper absolute
timeouts for e.g. pthread_mutex_timedlock(3).

Kclock timeouts are initialized with timeout_set_kclock(). They can
be scheduled with either timeout_in_nsec() (relative timeout) or
timeout_at_ts() (absolute timeout). They are incompatible with
timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
They can be cancelled with timeout_del(9) or timeout_del_barrier(9).
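
The difference between a relative and an absolute kclock timeout comes
down to timespec arithmetic. The following userland sketch shows the
conversion under stated assumptions: deadline_from_nsec() and
deadline_reached() are hypothetical helper names, not kernel
interfaces, and the caller supplies the current clock reading instead
of calling nanouptime(9).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000L

/* Turn "now + nsecs" into an absolute deadline, carrying nanoseconds. */
static struct timespec
deadline_from_nsec(const struct timespec *now, uint64_t nsecs)
{
	struct timespec deadline;

	assert(now->tv_nsec >= 0 && now->tv_nsec < NSEC_PER_SEC);
	deadline.tv_sec = now->tv_sec + (time_t)(nsecs / NSEC_PER_SEC);
	deadline.tv_nsec = now->tv_nsec + (long)(nsecs % NSEC_PER_SEC);
	if (deadline.tv_nsec >= NSEC_PER_SEC) {
		deadline.tv_sec++;
		deadline.tv_nsec -= NSEC_PER_SEC;
	}
	return deadline;
}

/* Nonzero once the clock reading is at or past the deadline. */
static int
deadline_reached(const struct timespec *now, const struct timespec *deadline)
{
	if (now->tv_sec != deadline->tv_sec)
		return now->tv_sec > deadline->tv_sec;
	return now->tv_nsec >= deadline->tv_nsec;
}
```

For example, a 250 ms interval added to a reading of 10.9 seconds yields
a deadline of 11.15 seconds. The two-field comparison and the carry step
hint at why timespec handling costs more than comparing 32-bit tick
counts.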

Documentation for the new interfaces is a work in progress.

For now, tick-based timeouts remain supported alongside kclock
timeouts. They will remain supported until we are certain we don't
need them anymore. It is possible we will never remove them. I would
rather not keep them around forever, but I cannot predict what
difficulties we will encounter while converting tick-based timeouts to
kclock timeouts. There are a *lot* of timeouts in the kernel.

Kclock timeouts are more costly than tick-based timeouts:

- Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
the hardware timecounter is too expensive in some contexts, so care
must be taken when converting existing timeouts.

We may add a flag in the future to cause timeout_in_nsec() to use
getnanouptime(9) instead of nanouptime(9), which is much cheaper.
This may be appropriate for certain classes of timeouts. tcp/ip
session timeouts come to mind.

- Kclock timeout expirations are kept in a timespec. Timespec
arithmetic has more overhead than 32-bit tick arithmetic, so
processing kclock timeouts during softclock() is more expensive.
On my machine the overhead for processing a tick-based timeout is
~125 cycles. The overhead for a kclock timeout is ~500 cycles.

The overhead difference on 32-bit platforms is unknown. If it
proves too large we may need to use a 64-bit value to store the
expiration time. More measurement is needed.

Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
the kevent(2) EVFILT_TIMER timers. Others will follow.

With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
deraadt@, probably many others. Older version tested by visa@.
Problems found in older version by bluhm@. Current version tested by
Yuichiro Naito.

"wait until after unlock" deraadt@, ok kettenis@


Revision tags: OPENBSD_6_8_BASE
# 1.39 07-Aug-2020 cheloha

timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)

These two interfaces have been entirely unused since introduction.
Remove them and thin the "timeout" namespace a bit.

Discussed with mpi@ and ratchov@ almost a year ago, though I blocked
the change at that time. Also discussed with visa@.

ok visa@, mpi@


# 1.38 01-Aug-2020 anton

Add support for remote coverage to kcov. Remote coverage is collected
from threads other than the one currently having kcov enabled. A thread
with kcov enabled occasionally delegates work to another thread;
collecting coverage from such threads improves the ability of syzkaller
to correlate side effects in the kernel caused by issuing a syscall.

Remote coverage is divided into subsystems. The only supported subsystem
right now collects coverage from scheduled tasks and timeouts on behalf
of a kcov enabled thread. In order to make this work `struct task' and
`struct timeout' must be extended with a new field keeping track of the
process that scheduled the task/timeout. Both aforementioned structures
have therefore grown by the size of a pointer on all
architectures.

The kernel API is documented in a new kcov_remote_register(9) manual.

Remote coverage is also supported by kcov on NetBSD and Linux.

ok mpi@


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't foresee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
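
The hashing described above can be sketched in userland C. This is an
illustration only, not the code from the commit: the function name, the
macro names, and the exact way the subsecond bits are derived from
.tv_nsec are assumptions. It packs the low 25 bits of .tv_sec above a
7-bit subsecond index, so each index step covers 1/128 second
(7812500 ns).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical names; the 25/7 bit split is taken from the log text. */
#define SK_SUBSEC_BITS		7
#define SK_SEC_MASK		0x1ffffffU		/* low 25 bits of seconds */
#define SK_NSEC_PER_BUCKET	(1000000000L >> SK_SUBSEC_BITS)	/* 7812500 ns */

/*
 * Build a 32-bit hash: 25 bits of seconds in the high part and
 * 7 bits of subseconds (1/128 second units) in the low part.
 */
static uint32_t
sk_timeout_hash(const struct timespec *ts)
{
	uint32_t sec = (uint32_t)ts->tv_sec & SK_SEC_MASK;
	uint32_t sub = (uint32_t)(ts->tv_nsec / SK_NSEC_PER_BUCKET); /* 0..127 */

	return (sec << SK_SUBSEC_BITS) | sub;
}
```

Under these assumptions a deadline of exactly 1 second hashes to 128,
and any deadline inside the last 1/128 of a second hashes to subsecond
index 127.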

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.
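
The "insert work into the thread" idea can be modelled with a plain
FIFO queue. The sketch below is a single-threaded userland model, not
the kernel implementation: every name in it is hypothetical, and the
real barrier has the caller sleep until the marker runs. It shows only
the ordering argument: a marker queued after other work runs after that
work has finished.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_QLEN 8

/* Hypothetical work item and FIFO queue standing in for the thread. */
struct sk_work {
	void (*fn)(void *);
	void *arg;
};

struct sk_workq {
	struct sk_work items[SK_QLEN];
	size_t head, tail;	/* head: next to run, tail: next free slot */
};

/* State for the barrier marker: records what it saw when it ran. */
struct sk_barrier {
	const int *counter;
	int seen;
	bool done;
};

/* Append one work item; returns false if the queue is full. */
static bool
sk_workq_add(struct sk_workq *q, void (*fn)(void *), void *arg)
{
	if (q->tail - q->head == SK_QLEN)
		return false;
	q->items[q->tail % SK_QLEN] = (struct sk_work){ fn, arg };
	q->tail++;
	return true;
}

/* Run queued work strictly in insertion order. */
static void
sk_workq_run(struct sk_workq *q)
{
	while (q->head != q->tail) {
		struct sk_work w = q->items[q->head % SK_QLEN];

		q->head++;
		w.fn(w.arg);
	}
}

/* Ordinary work: bump a counter. */
static void
sk_count_up(void *arg)
{
	(*(int *)arg)++;
}

/* Barrier marker: by the time it runs, all earlier work has run. */
static void
sk_barrier_fn(void *arg)
{
	struct sk_barrier *b = arg;

	b->seen = *b->counter;
	b->done = true;
}
```

If two sk_count_up items are queued before the marker, the marker
observes a counter of 2 when it runs, which is the guarantee a barrier
caller relies on before freeing shared state.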

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting, i.e. sleeping, for a
non-negligible amount of time, it might delay other timeouts needing a
process context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
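
A minimal sketch of the refcounting pattern this return value enables.
The types and functions below are userland stand-ins invented for
illustration (mock_timeout_add() only models the 1/0 return value
described above and does no real scheduling), so treat it as a sketch
under those assumptions rather than kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct timeout: only tracks "pending". */
struct mock_timeout {
	bool pending;
};

/* Hypothetical object whose lifetime is tied to a pending timeout. */
struct mock_obj {
	int refs;
	struct mock_timeout to;
};

/*
 * Model of the return convention: 1 if this call scheduled the
 * timeout, 0 if it was already scheduled by a previous call.
 */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = true;
	return 1;
}

/* Take a reference only when this call is the one that scheduled it. */
static void
mock_obj_arm(struct mock_obj *o)
{
	if (mock_timeout_add(&o->to))
		o->refs++;
}
```

Arming the same object twice therefore takes exactly one reference,
which the timeout handler would be expected to drop when it runs.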

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
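
One way a caller might use this return value, sketched in userland C.
Everything here is hypothetical: demo_timeout_del() only models the
return convention described above, and the "dying" flag is just one
possible policy for the case where the timeout could not be removed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct timeout: only tracks "pending". */
struct demo_timeout {
	bool pending;
};

/* Hypothetical driver state that the timeout handler would touch. */
struct demo_softc {
	struct demo_timeout to;
	bool dying;	/* tells the handler to finish the teardown */
};

/* Model of the return convention: 1 if removed, 0 if nothing to remove. */
static int
demo_timeout_del(struct demo_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = false;
	return 1;
}

/*
 * Free the state right away only when the timeout was removed and so
 * cannot fire; otherwise leave the final free to the handler.
 * Returns true if the state was freed here.
 */
static bool
demo_detach(struct demo_softc *sc)
{
	if (demo_timeout_del(&sc->to)) {
		free(sc);
		return true;
	}
	sc->dying = true;
	return false;
}
```

The point of the sketch is the branch: a return of 1 makes the
immediate free safe, while a return of 0 means the caller must not
assume the handler is finished with the state.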

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.



ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting (that is, sleeping) for a
non-negligible amount of time, it might delay other timeouts needing a
process context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initializing timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.
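
The refcounting pattern this enables might look like the following userland
sketch. Everything here is a stand-in: struct mock_timeout,
mock_timeout_add(), struct conn, and conn_arm() are hypothetical names, and
the mock only imitates the return convention described above (1 if
scheduled by this call, 0 if it was already pending). It does not model
rescheduling or the real struct timeout layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland stand-in for the kernel's struct timeout; NOT the real layout. */
struct mock_timeout {
	bool pending;
};

/* Imitates only the return convention: 1 if newly scheduled, 0 otherwise. */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = true;
	return 1;
}

/* Hypothetical refcounted object that owns a timeout. */
struct conn {
	int refs;
	struct mock_timeout to;
};

/* Arm the timeout, making sure a pending timeout holds exactly one ref. */
static void
conn_arm(struct conn *c)
{
	assert(c->refs > 0);		/* caller must already hold a reference */
	c->refs++;			/* reference taken on behalf of the timeout */
	if (mock_timeout_add(&c->to) == 0)
		c->refs--;		/* already pending: it already holds one */
}
```

Taking the reference before the add and dropping it again on a 0 return is
one possible ordering; it is a design choice of this sketch, not something
the log message prescribes.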

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.
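
A caller might use the return value roughly as in this userland sketch.
mock_timeout_del(), struct conn, and conn_teardown() are hypothetical
stand-ins, and deferring the free to the handler via a "dying" flag is just
one assumed way of coping with a timeout that is about to fire; the log
message leaves that choice to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userland stand-in for the kernel's struct timeout; NOT the real layout. */
struct mock_timeout {
	bool pending;
};

/* Imitates only the return convention: 1 if a pending timeout was removed. */
static int
mock_timeout_del(struct mock_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = false;
	return 1;
}

/* Hypothetical heap-allocated object whose handler uses its state. */
struct conn {
	bool dying;
	struct mock_timeout to;
};

/*
 * Returns true if the object was freed here. On false, the timeout was
 * not removed (it may be about to run), so cleanup is left to the handler.
 */
static bool
conn_teardown(struct conn *c)
{
	assert(c != NULL);
	if (mock_timeout_del(&c->to)) {
		free(c);		/* timeout will not fire: safe to free */
		return true;
	}
	c->dying = true;		/* assumed convention: handler frees later */
	return false;
}
```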

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.34 25-Dec-2019 cheloha

TIMEOUT_INITIALIZER(9): C99 initializers


# 1.33 25-Dec-2019 cheloha

timeout(9): new flag: TIMEOUT_SCHEDULED, new statistic: tos_scheduled

This flag is set whenever a timeout is put on the wheel and cleared upon
(a) running, (b) deletion, and (c) readdition. It serves two purposes:

1. Facilitate distinguishing scheduled and rescheduled timeouts. When a
timeout is put on the wheel it is "scheduled" for a later softclock().
If this happens two or more times it is also said to be "rescheduled".
The tos_rescheduled value thus indicates how many distant timeouts
have been cascaded into a lower wheel level.

2. Eliminate false late timeouts. A timeout is not late if it is due
before softclock() has had a chance to schedule it. To track this we
need additional state, hence a new flag.

rprocter@ raises some interesting questions. Some answers:

- This interface is not stable and name changes are possible at a
later date.

- Although rescheduling timeouts is a side effect of the underlying
implementation, I don't forsee us using anything but a timeout wheel
in the future. Other data structures are too slow in practice, so
I doubt that the concept of a rescheduled timeout will be irrelevant
any time soon.

- I think the development utility of gathering these sorts of statistics
is high. Watching the distribution of timeouts under a given workflow
is informative.

ok visa@


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting, understand sleeping, for a non
negligible amount of time it might delay other timeouts needing a process
context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declaractions.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtimes clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout wont fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of its new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, maliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.32 02-Dec-2019 cheloha

Revert "timeout(9): switch to tickless backend"

It appears to have caused major performance regressions all over the
network stack.

Reported by bluhm@

ok deraadt@


# 1.31 26-Nov-2019 cheloha

timeout(9): switch to tickless backend

Rebase the timeout wheel on the system uptime clock. Timeouts are now
set to run at or after an absolute time as returned by nanouptime(9).
Timeouts are thus "tickless": they expire at a real time on that clock
instead of at a particular value of the global "ticks" variable.

To facilitate this change the timeout struct's .to_time member becomes a
timespec. Hashing timeouts into a bucket on the wheel changes slightly:
we build a 32-bit hash with 25 bits of seconds (.tv_sec) and 7 bits of
subseconds (.tv_nsec). 7 bits of subseconds means the width of the
lowest wheel level is now 2 seconds on all platforms and each bucket in
that lowest level corresponds to 1/128 seconds on the uptime clock.
These values were chosen to closely align with the current 100hz
hardclock(9) typical on almost all of our platforms. At 100hz a bucket
is currently ~1/100 seconds wide on the lowest level and the lowest
level itself is ~2.56 seconds wide. Not a huge change, but a change
nonetheless.
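
The hash layout described above can be sketched in userland C as
follows. This is only an illustration under stated assumptions, not the
committed kernel code: the function name timespec_hash(), the macro
names, and the exact rounding of the subsecond field are hypothetical.
It only shows how 25 bits of seconds and 7 bits of subseconds fit in a
32-bit value where one step of the low field is 1/128 second.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SEC_BITS	25		/* bits taken from tv_sec (from the text) */
#define SUBSEC_BITS	7		/* bits taken from tv_nsec (from the text) */
#define NSEC_PER_SEC	1000000000ULL

/*
 * Hypothetical helper: pack a timespec into a 32-bit hash.
 * The low 7 bits count 1/128-second steps within the second;
 * the high 25 bits hold the low 25 bits of the seconds value.
 */
static uint32_t
timespec_hash(const struct timespec *ts)
{
	uint32_t sec, subsec;

	/* Assumes a normalized timespec. */
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);

	sec = (uint32_t)((uint64_t)ts->tv_sec & ((1U << SEC_BITS) - 1));
	subsec = (uint32_t)(((uint64_t)ts->tv_nsec << SUBSEC_BITS) /
	    NSEC_PER_SEC);

	return (sec << SUBSEC_BITS) | subsec;
}
```

With this layout, a lowest wheel level of 256 buckets (a figure implied
by the 2 second width and 1/128 second bucket size quoted above) would
be indexed by the low 8 bits of the hash.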

Because a bucket no longer corresponds to a single tick more than one
bucket may be dumped during an average timeout_hardclock_update() call.
On 100hz platforms you now dump ~2 buckets. On 64hz machines (sh) you
dump ~4 buckets. On 1024hz machines (alpha) you dump only 1 bucket,
but you are doing extra work in softclock() to reschedule timeouts
that aren't due yet.

To avoid changing current behavior all timeout_add*(9) interfaces
convert their timeout interval into ticks, compute an equivalent
timespec interval, and then add that interval to the timestamp of
the most recent timeout_hardclock_update() call to determine an
absolute deadline. So all current timeouts still "use" ticks,
but the ticks are faked in the timeout layer.

A new interface, timeout_at_ts(9), is introduced here to bypass this
backwardly compatible behavior. It will be used in subsequent diffs
to add absolute timeout support for userland and to clean up some of
the messier parts of kernel timekeeping, especially at the syscall
layer.

Because timeouts are based against the uptime clock they are subject to
NTP adjustment via adjtime(2) and adjfreq(2). Unless you have a crazy
adjfreq(2) adjustment set this will not change the expiration behavior
of your timeouts.

Tons of design feedback from mpi@, visa@, guenther@, and kettenis@.
Additional amd64 testing from anton@ and visa@. Octeon testing from visa@.
macppc testing from me.

Positive feedback from deraadt@, ok visa@


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


# 1.30 02-Nov-2019 cheloha

softclock: move softintr registration/scheduling into timeout module

softclock() is scheduled from hardclock(9) because long ago callouts were
processed from hardclock(9) directly. The introduction of timeout(9) circa
2000 moved all callout processing into a dedicated module, but the softclock
scheduling stayed behind in hardclock(9).

We can move all the softclock() "stuff" into the timeout module to make
kern_clock.c a bit cleaner. Neither initclocks() nor hardclock(9) need
to "know" about softclock(). The initial softclock() softintr registration
can be done from timeout_proc_init() and softclock() can be scheduled
from timeout_hardclock_update().

ok visa@


Revision tags: OPENBSD_6_6_BASE
# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout is waiting, understand sleeping, for a non
negligible amount of time it might delay other timeouts needing a process
context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declaractions.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.

ok mpi@ matthew@ mikeb@ guenther@


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtimes clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout wont fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.

discussed with drinking art and art, and most of k2k11
ok miod@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.20 26-May-2010 deraadt

libevent has named two of it's new macros by the same name as our kernel
macros, which are visible, and get pulled into some source code... Hide
the kernel ones inside _KERNEL, and make trpt (the only userland viewer of
them) define _KERNEL temporarily. This is really gross. libevent is doing
a poor job of choosing function names!
ok tedu guenther


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE
# 1.19 02-Jun-2009 guenther

Constipate the second argument to timeout_add_*(). Also, use
nitems() in two places instead of coding the array size and fix a
spot of whitespace.

ok miod@ blambert@


Revision tags: OPENBSD_4_5_BASE
# 1.18 22-Oct-2008 blambert

Add timeout_add_msec(), for timeouts in milliseconds.

Idea and original patch mk@

ok mk@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.17 11-Jul-2008 blambert

Add timeout_add_{tv,ts,bt,sec,usec,nsec} so that we can add timeouts
in something other than clock ticks. From art@'s punchlist and (for
the time being) not yet used.

"you're doing it wrong" art@,ray@,otto@,tedu@

ok art@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE SMP_SYNC_A SMP_SYNC_B
# 1.16 03-Jun-2003 art

license cleaning.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.15 25-Jun-2002 mickey

still export the macros, some userland uses it


# 1.14 25-Jun-2002 mickey

protos and macros are only for _KERNEL, malliciously pollutes the user name space otherwise


Revision tags: OPENBSD_3_1_BASE
# 1.13 14-Mar-2002 millert

First round of __P removal in sys


# 1.12 22-Dec-2001 nordin

New scalable implementation with constant time add and delete. ok deraadt@


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.11 12-Sep-2001 art

branches: 1.11.4;
Rename timeout_init to timeout_startup to deconfuse a bit.


# 1.10 23-Aug-2001 art

Remove more.


# 1.9 23-Aug-2001 art

Remove even more old timeout tentacles.


# 1.8 23-Aug-2001 miod

Remove the old timeout legacy code.


Revision tags: OPENBSD_2_9_BASE
# 1.7 15-Mar-2001 csapuntz

Triggered mechanism allows a handler to figure out whether a given
timeout is actually executing.


Revision tags: OPENBSD_2_7_BASE OPENBSD_2_8_BASE
# 1.6 23-Mar-2000 art

branches: 1.6.2;
Another typo. Noted by aaron.


# 1.5 23-Mar-2000 art

Oops. Fix a comment from "should" to "should not".
Thanks to mickey@ for pointing this out.


# 1.4 23-Mar-2000 art

Protect from multiple include.


# 1.3 23-Mar-2000 art

Speling.


# 1.2 23-Mar-2000 art

Provide methods to check if a timeout was initialized and if it is scheduled.


# 1.1 23-Mar-2000 art

New API for timeouts. Replaces the old timeout()/untimeout() API and
makes it the caller's responsibility to allocate resources for the
timeouts.

This is a KISS implementation and does _not_ solve the problems of slow
handling of a large number of pending timeouts (this will be solved in
future work) (although hardclock is now guaranteed to take constant time
for handling of timeouts).

Old timeout() and untimeout() are implemented as wrappers around the new
API and kept for compatibility. They will be removed as soon as all
subsystems are converted to use the new API.


# 1.29 12-Jul-2019 cheloha

sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.

With these totals one can track the throughput of the timeout(9) layer
from userspace.

With input from mpi@.

ok mpi@


# 1.28 14-Apr-2019 visa

Add lock order checking for timeouts

The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.

This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.

In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).

OK dlg@ mpi@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.27 24-Nov-2017 dlg

add timeout_barrier, which is like intr_barrier and taskq_barrier.

if you're trying to free something that a timeout is using, you
have to wait for that timeout to finish running before doing the
free. timeout_del can stop a timeout from running in the future,
but it doesn't know if a timeout has finished being scheduled and
is now running.

previously you could know that timeouts are not running by simply
masking softclock interrupts on the cpu running the kernel. however,
code is now running outside the kernel lock, and timeouts can run
in a thread instead of softclock.

timeout_barrier solves the first problem by taking the kernel lock
and then masking softclock interrupts. that is enough to ensure
that any further timeout processing is waiting for those resources
to run again.

the second problem is solved by having timeout_barrier insert work
into the thread. when that work runs, that means all previous work
running in that thread has completed.

fixes and ok visa@, who thinks this will be useful for his work
too.


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.26 22-Sep-2016 mpi

Introduce a new 'softclock' thread that will be used to execute timeout
callbacks needing a process context.

The function timeout_set_proc(9) has to be used instead of timeout_set(9)
when a timeout callback needs a process context.

Note that if such a timeout waits (that is, sleeps) for a non-negligible
amount of time, it might delay other timeouts needing a process
context.

dlg@ agrees with this as a temporary solution.

Manpage tweaks from jmc@

ok kettenis@, bluhm@, mikeb@


Revision tags: OPENBSD_5_7_BASE OPENBSD_5_8_BASE OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.25 22-Dec-2014 dlg

add TIMEOUT_INITIALIZER for initting timeout declarations.

similar to TASK_INITIALIZER and all the queue _INITIALIZER things.

ok deraadt@


Revision tags: OPENBSD_5_5_BASE OPENBSD_5_6_BASE
# 1.24 27-Nov-2013 dlg

make timeout_add and its wrappers return whether the timeout was scheduled
in this call by returning 1, or a previous call by returning 0. this makes
it easy to refcount the stuff we're scheduling a timeout for, and brings
the api in line with what task_add(9) provides.

ok mpi@ matthew@ mikeb@ guenther@
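
As a rough illustration of the refcounting pattern this return value
enables, here is a userland sketch. struct mock_timeout,
mock_timeout_add(), struct obj, obj_schedule() and obj_fire() are
hypothetical names written for this example; they only model the 1/0
return convention described above and are not the kernel's timeout(9)
code.

```c
#include <assert.h>

/* Hypothetical userland stand-in; not the kernel's struct timeout. */
struct mock_timeout {
	int pending;		/* 1 while scheduled and not yet fired */
};

/* Hypothetical object kept alive while its timeout is scheduled. */
struct obj {
	int refs;
	struct mock_timeout to;
};

/* Models the convention: 1 if this call scheduled it, 0 if already pending. */
static int
mock_timeout_add(struct mock_timeout *to)
{
	if (to->pending)
		return 0;
	to->pending = 1;
	return 1;
}

/* Take a reference only when this call actually scheduled the timeout. */
static void
obj_schedule(struct obj *o)
{
	if (mock_timeout_add(&o->to))
		o->refs++;
}

/* Models the handler running: clear pending and drop the timeout's reference. */
static void
obj_fire(struct obj *o)
{
	assert(o->to.pending);
	o->to.pending = 0;
	o->refs--;
}
```

Because a second obj_schedule() on an already pending timeout returns
without touching the count, the object holds at most one reference on
behalf of the timeout.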


# 1.23 23-Oct-2013 deraadt

need a forward declaration of bintime for the _KERNEL case, ie. trpt
ok guenther


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.22 24-May-2012 guenther

On resume, run forward the monotonic and realtime clocks instead of jumping
just the realtime clock, triggering and adjusting timeouts to reflect that.

ok matthew@ deraadt@


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.21 10-May-2011 dlg

tweak timeout_del so it can tell the caller if it actually did remove a
timeout or not.

without this it is impossible to tell if the timeout was removed
or if it is just about to run. if the caller of timeout_del is about
to free some state the timeout itself might use, this could lead
to a use after free.

now if timeout_del returns 1, you know the timeout won't fire and
you can proceed with cleanup. how you cope with the timeout being
about to fire is up to the caller of timeout_del.

discussed with drinking art and art, and most of k2k11
ok miod@
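
a minimal userland sketch of how a caller might act on that return
value. struct mock_timeout, mock_timeout_del(), struct conn and
conn_teardown() are hypothetical names made up for this example, and
the "dying" flag is just one assumed way of coping with the 0 case;
none of this is the kernel's timeout(9) code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userland stand-in; not the kernel's struct timeout. */
struct mock_timeout {
	int pending;		/* 1 while scheduled and not yet fired */
};

/* Hypothetical state that the timeout handler would also touch. */
struct conn {
	struct mock_timeout to;
	int dying;		/* asks the handler to do the final free */
};

/* Models the convention: 1 if a pending timeout was removed, else 0. */
static int
mock_timeout_del(struct mock_timeout *to)
{
	if (!to->pending)
		return 0;
	to->pending = 0;
	return 1;
}

/*
 * Returns 1 if the state was freed here, 0 if the free was left to
 * the handler because the timeout could not be removed.
 */
static int
conn_teardown(struct conn *c)
{
	if (mock_timeout_del(&c->to)) {
		free(c);	/* the timeout will not fire; safe to free */
		return 1;
	}
	c->dying = 1;		/* handler may still run; defer the free */
	return 0;
}
```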

