History log of /freebsd-current/sys/kern/kern_tc.c
Revision Date Author Comments
# af93fea7 23-Aug-2023 Jake Freeland <jfree@freebsd.org>

timerfd: Move implementation from linux compat to sys/kern

Move the timerfd impelemntation from linux compat code to sys/kern. Use
it to implement the new system calls for timerfd. Add a hook to kern_tc
to allow timerfd to know when the system time has stepped. Add kqueue
support to timerfd. Adjust a few names to be less Linux centric.

RelNotes: YES
Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack)
Differential Revision: https://reviews.freebsd.org/D38459


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 28ed159f 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Round to closest integer in pps_event()

The comment above bintime2timespec() says:

When converting between timestamps on parallel timescales of differing
resolutions it is historical and scientific practice to round down.

However, the delta_nsec value is a time difference and not a timestamp. Also
the rounding errors accumulate in the frequency accumulator, see hardpps().
So, rounding to the closest integer is probably slightly better.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# 1e48d9d3 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Simplify the nsec calculation in pps_event()

Let A be the current calculation of the frequency accumulator (pps_fcount)
update in pps_event()

scale = (uint64_t)1 << 63;
scale /= captc->tc_frequency;
scale *= 2;
bt.sec = 0;
bt.frac = 0;
bintime_addx(&bt, scale * tcount);
bintime2timespec(&bt, &ts);
hardpps(tsp, ts.tv_nsec + 1000000000 * ts.tv_sec);

and hardpps(..., delta_nsec):

u_nsec = delta_nsec;
if (u_nsec > (NANOSECOND >> 1))
u_nsec -= NANOSECOND;
else if (u_nsec < -(NANOSECOND >> 1))
u_nsec += NANOSECOND;
pps_fcount += u_nsec;

This change introduces a new calculation which is slightly simpler and more
straight forward. Name it B.

Consider the following sample values with a tcount of 2000000100 and a
tc_frequency of 2000000000 (2GHz).

For A, the scale is 9223372036. Then scale * tcount is 18446744994337203600
which is larger than UINT64_MAX (= 18446744073709551615). The result is
920627651984 == 18446744994337203600 % UINT64_MAX. Since all operands are
unsigned the result is well defined through modulo arithmetic. The result of
bintime2timespec(&bt, &ts) is 49. This is equal to the correct result
1000000049 % NANOSECOND.

In hardpps(), both conditional statements are not executed and pps_fcount is
incremented by 49.

For the new calculation B, we have 1000000000 * tcount is 2000000100000000000
which is less than UINT64_MAX. This yields after the division with tc_frequency
the correct result of 1000000050 for delta_nsec.

In hardpps(), the first conditional statement is executed and pps_fcount is
incremented by 50.

This shows that both methods yield roughly the same results. However, method B
is easier to understand and requires fewer conditional statements.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# 8a142484 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Directly assign the timestamps in pps_event()

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# 0448501f 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Move pcount assignment in pps_event()

Move the pseq increment. This makes it possible to reuse registers earlier.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# fd88f4e1 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Simplify capture and event processing

Use local variables for the captured timehand and timecounter in pps_event().
This fixes a potential issue in the nsec preparation for hardpps(). Here the
timecounter was accessed through the captured timehand after the generation was
checked.

Make a snapshot of the relevent timehand values early in pps_event(). Check
the timehand generation only once during the capture and event processing. Use
atomic_thread_fence_acq() similar to the other readers.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# cb2a028b 27-Feb-2023 Sebastian Huber <sebastian.huber@embedded-brains.de>

pps: Load timecounter once in pps_capture()

This ensures that the timecounter and the tc_get_timecount handler belong
together.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/604


# 8701571d 21-Jun-2022 Mitchell Horne <mhorne@FreeBSD.org>

set_cputicker: use a bool

The third argument to this function indicates whether the supplied
ticker is fixed or variable, i.e. requiring calibration. Give this
argument a type and name that better conveys this purpose.

Reviewed by: kib, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35459


# bb53dd56 21-Mar-2022 firk <firk@cantconnect.ru>

kern_tc.c/cputick2usec() (which is used to calculate cputime from
cpu ticks) has some imprecision and, worse, huge timestep (about
20 minutes on 4GHz CPU) near 53.4 days of elapsed time.

kern_time.c/cputick2timespec() (it is used for clock_gettime() for
querying process or thread consumed cpu time) Uses cputick2usec()
and then needlessly converting usec to nsec, obviously losing
precision even with fixed cputick2usec().

kern_time.c/kern_clock_getres() uses some weird (anyway wrong)
formula for getting cputick resolution.

PR: 262215
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D34558


# 3d9d64aa 30-Nov-2021 Andriy Gapon <avg@FreeBSD.org>

kern_tc: unify timecounter to bintime delta conversion

There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small. But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different. Both functions use
approximate calculations based on th_scale that avoid division. Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be. tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.

As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta). Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off. tc_windup does the switch using its own
calculations of a new th_offset using the large delta. As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime. So, for a period of time new binuptime values may
be "back in time" comparing to values just before the switch.

Such a jump must never happen. All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken. For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it. In that case the target thread
would never get woken up.

The unified calculations should ensure the monotonic property of the
uptime.

The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100). But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Panzura LLC


# 33399501 19-Nov-2021 Mark Johnston <markj@FreeBSD.org>

timecounter: Initialize tc_lock earlier

Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.

PR: 259878
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014


# 26f76aea 29-Oct-2021 Mark Johnston <markj@FreeBSD.org>

timecounter: Load the currently selected tc once in tc_windup()

Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32729


# ae750fba 28-Oct-2021 Sebastian Huber <sebastian.huber@embedded-brains.de>

kern_tc.c: Scaling/large delta recalculation

This change is a slight performance optimization for systems with a slow
64-bit division.

The th->th_scale and th->th_large_delta values only depend on the
timecounter frequency and the th->th_adjustment. The timecounter
frequency of a timehand only changes when a new timecounter is activated
for the timehand. The th->th_adjustment is only changed by the NTP
second update. The NTP second update is not done for every call of
tc_windup().

Move the code block to recalculate the scaling factor and
the large delta of a timehand to the new helper function
recalculate_scaling_factor_and_large_delta().

Call recalculate_scaling_factor_and_large_delta() when a new timecounter
is activated and a NTP second update occurred.

MFC after: 1 week


# 621fd9dc 16-Oct-2021 Mark Johnston <markj@FreeBSD.org>

timecounter: Lock the timecounter list

Timecounter registration is dynamic, i.e., there is no requirement that
timecounters must be registered during single-threaded boot. Loadable
drivers may in principle register timecounters (which can be switched to
automatically). Timecounters cannot be unregistered, though this could
be implemented.

Registered timecounters belong to a global linked list. Add a mutex to
synchronize insertions and the traversals done by (mpsafe) sysctl
handlers. No functional change intended.

Reviewed by: imp, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32511


# fa9da1f5 08-Oct-2021 Mark Johnston <markj@FreeBSD.org>

timecounter: Let kern.timecounter.stepwarnings be set as a tunable

MFC after: 1 week


# 9feff969 08-Aug-2021 Ed Maste <emaste@FreeBSD.org>

Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights

These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).

Sponsored by: The FreeBSD Foundation


# a512d0ab 05-May-2021 Warner Losh <imp@FreeBSD.org>

kern: clarify boot time

In FreeBSD, the current time is computed from uptime + boottime. Uptime
is a continuous, smooth function that's monotonically increasing. To
effect changes to the current time, boottime is adjusted. boottime is
mutable and shouldn't be cached against future need. Document the
current implementation, with the caveat that we may stop stepping
boottime on resume in the future and will step uptime instead (noted in
the commit message, but not in the code).

Sponsored by: Netflix
Reviewed by: phk, rpokala
Differential Revision: https://reviews.freebsd.org/D30116


# 56b9bee6 07-Mar-2021 Konstantin Belousov <kib@FreeBSD.org>

Make kern.timecounter.hardware tunable

Noted and reviewed by: kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D29122


# 36bcc44e 18-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

Add ddb 'show timecounter' command.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# 30b68ecd 09-Jan-2021 Robert Watson <rwatson@FreeBSD.org>

Changes that improve DTrace FBT reliability on freebsd/arm64:

- Implement a dtrace_getnanouptime(), matching the existing
dtrace_getnanotime(), to avoid DTrace calling out to a potentially
instrumentable function.

(These should probably both be under KDTRACE_HOOKS. Also, it's not clear
to me that they are correct implementations for the DTrace thread time
functions they are used in .. fixes for another commit.)

- Don't allow FBT to instrument functions involved in EL1 exception handling
that are involved in FBT trap processing: handle_el1h_sync() and
do_el1h_sync().

- Don't allow FBT to instrument DDB and KDB functions, as that makes it
rather harder to debug FBT problems.

Prior to these changes, use of FBT on FreeBSD/arm64 rapidly led to kernel
panics due to recursion in DTrace.

Reliable FBT on FreeBSD/arm64 is reliant on another change from @andrew to
have the aarch64 instrumentor more carefully check that instructions it
replaces are against the stack pointer, which can otherwise lead to memory
corruption. That change remains under review.

MFC after: 2 weeks
Reviewed by: andrew, kp, markj (earlier version), jrtc27 (earlier version)
Differential revision: https://reviews.freebsd.org/D27766


# 4149c6a3 10-Jun-2020 Konstantin Belousov <kib@FreeBSD.org>

Remove double-calls to tc_get_timecount() to warm timecounters.

It seems that second call does not add any useful state change for all
implemented timecounters.

Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 6cf2362e 14-Feb-2020 Konstantin Belousov <kib@FreeBSD.org>

Consolidate read code for timecounters and fix possible overflow in
bintime()/binuptime().

The algorithm to read the consistent snapshot of current timehand is
repeated in each accessor, including the details proper rollup
detection and synchronization with the writer. In fact there are only
two different kind of readers: one for bintime()/binuptime() which has
to do the in-place calculation, and another kind which fetches some
member from struct timehand.

Extract the logic into type-checked macros, GETTHBINTIME() for bintime
calculation, and GETTHMEMBER() for safe read of a structure' member.
This way, the synchronization is only written in bintime_off() and
getthmember().

In bintime_off(), use overflow-safe calculation of th_scale *
delta(timecounter). In tc_windup, pre-calculate the min delta value
which overflows and require slow algorithm, into the new timehands
th_large_delta member.

This part with overflow fix was written by Bruce Evans.

Reported by: Mark Millard <marklmi@yahoo.com> (the overflow issue)
Tested by: pho
Discussed with: emaste
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 3 weeks


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# 6c46ce7e 08-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

Initialize timehands linkage much earlier.

Reported and tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4b23dec4 09-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

Make timehands count selectable at boottime.

Tested by: O'Connor, Daniel <darius@dons.net.au>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21563


# 7045ac43 12-Jan-2019 Olivier Houchard <cognet@FreeBSD.org>

Instead of using an incomplete list of platforms that uses 64bits time_t
in 32bits mode, special case amd64, as i386 is the only arch that still
uses 32bits time_t.


# 6040822c 30-Jul-2018 Alan Somers <asomers@FreeBSD.org>

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725


# 2bf95012 05-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Create a new macro for static DPCPU data.

On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140


# 5ec2c936 04-May-2018 Mateusz Guzik <mjg@FreeBSD.org>

tc: bcopy -> memcpy


# 6469bdcd 06-Apr-2018 Brooks Davis <brooks@FreeBSD.org>

Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941


# 6f697994 19-Dec-2017 Konstantin Belousov <kib@FreeBSD.org>

Use atomic_load(9) to read ppsinfo sequence numbers.

In this case volatile qualifiers enusre that a compiler does not
optimize the accesses out.

Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534


# 64de3fdd 30-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

SPDX: use the Beerware identifier.


# 70e3b262 11-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

The th_bintime, th_microtime and th_nanotime members of the timehand
all cache the last system time (uptime + boottime). Only the format
differs. Do not re-calculate the bintime and simply use the value
used to calculate the microtime and nanotime.

Group all the updates under the relevant comment. Remove obsoleted
XXX part.

Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week


# 8addc72b 14-Mar-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

Add missing pieces of r315280

I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.
Specifically:

Add documentation, polish comments, and improve style(9).

Tested by: pho (r315280)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791


# 9dbdf2a1 14-Mar-2017 Eric van Gyzen <vangyzen@FreeBSD.org>

When the RTC is adjusted, reevaluate absolute sleep times based on the RTC

POSIX 2008 says this about clock_settime(2):

If the value of the CLOCK_REALTIME clock is set via clock_settime(),
the new value of the clock shall be used to determine the time
of expiration for absolute time services based upon the
CLOCK_REALTIME clock. This applies to the time at which armed
absolute timers expire. If the absolute time requested at the
invocation of such a time service is before the new value of
the clock, the time service shall expire immediately as if the
clock had reached the requested time normally.

Setting the value of the CLOCK_REALTIME clock via clock_settime()
shall have no effect on threads that are blocked waiting for
a relative time service based upon this clock, including the
nanosleep() function; nor on the expiration of relative timers
based upon this clock. Consequently, these time services shall
expire when the requested relative interval elapses, independently
of the new or old value of the clock.

When the real-time clock is adjusted, such as by clock_settime(3),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.

At present, this affects the following functions:

pthread_cond_timedwait(3)
pthread_mutex_timedlock(3)
pthread_rwlock_timedrdlock(3)
pthread_rwlock_timedwrlock(3)
sem_timedwait(3)
sem_clockwait_np(3)

I'm working on adding clock_nanosleep(2), which will also be affected.

Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed by: jhb, kib
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791


# fd0f5970 13-Dec-2016 Ed Schouten <ed@FreeBSD.org>

Add labels to sysctls related to clocks.

Sysctls like kern.eventtimer.et.*.quality currently embed the name of
the clock device. This is problematic for the Prometheus metrics
exporter for two reasons:

- Some of those clocks have dashes in their names, which Prometheus
doesn't allow to be used in metric names.
- It doesn't allow for extracting the same property of all clocks on the
system from within a single query.

Attach these nodes to have a label, so that the Prometheus metrics
exporter gives these metric a uniform name with the name of the clock
attached as a label.

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775


# 16808549 17-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Implement userspace gettimeofday(2) with HPET timecounter.

Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.

Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.

Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473


# 50c22263 30-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Cache getbintime(9) answer in timehands, similarly to getnanotime(9)
and getmicrotime(9).

Suggested and reviewed by: bde (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month


# 5760b029 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Prevent parallel tc_windup() calls, both parallel top-level calls from
setclock() and from simultaneous top-level and interrupt. For this,
tc_windup() is protected with a tc_setclock_mtx spinlock, in the try
mode when called from hardclock interrupt. If spinlock cannot be
obtained without spinning from the interrupt context, this means that
top-level executes tc_windup() on other core and our try may be
avoided.

The boottimebin and boottime variables should be adjusted from
tc_windup(). To be correct, they must be part of the timehands and
read using lockless protocol. Remove the globals and reimplement the
getboottime(9)/getboottimebin(9) KPI using the timehands read
protocol.

Tested by: pho (as part of the whole patch)
Reviewed by: jhb (same)
Discussed wit: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# 4d29106e 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Style.

Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# a83c016f 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Reduce number of timehands to just two. This is useful because
consumers can now be only one tc_windup() call late.

Use C99 initialization.

Tested by: pho (as part of the whole patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# 584b675e 27-Jul-2016 Konstantin Belousov <kib@FreeBSD.org>

Hide the boottime and bootimebin globals, provide the getboottime(9)
and getboottimebin(9) KPI. Change consumers of boottime to use the
KPI. The variables were renamed to avoid shadowing issues with local
variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure. As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance. Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by: pho (as part of the bigger patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302


# e3043798 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: spelling fixes in comments.

No functional change.


# aaca7045 01-Nov-2015 Enji Cooper <ngie@FreeBSD.org>

Define `fhard` in pps_event(..) only when PPS_SYNC is defined to mute
an -Wunused-but-set-variable warning

Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job
Sponsored by: EMC / Isilon Storage Division


# b2557db6 25-Sep-2015 Konstantin Belousov <kib@FreeBSD.org>

Use per-cpu values for base and last in tc_cpu_ticks(). The values
are updated lockess, different CPUs write its own view of timecounter
state. The critical section is done for safety, callers of
tc_cpu_ticks() are supposed to already enter critical section, or to
own a spinlock.

The change fixes sporadical reports of too high values reported for
the (W)CPU on platforms that do not provide cpu ticker and use
tc_cpu_ticks(), in particular, arm*.

Diagnosed and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e8bac3f2 12-Aug-2015 Ian Lepore <ian@FreeBSD.org>

If a specific timecounter has been chosen via sysctl, and a new timecounter
with higher quality registers (presumably in a module that has just been
loaded), do not undo the user's choice by switching to the new timecounter.

Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).


# 721b5817 07-Aug-2015 Ian Lepore <ian@FreeBSD.org>

Only process the PPS event types currently enabled in pps_params.mode.

This makes the PPS API behave correctly, but isn't ideal -- we still end
up capturing PPS data for non-enabled edges, we just don't process the
data into an event that becomes visible outside of kern_tc. That's because
the event type isn't passed to pps_capture(), so it can't do the filtering.
Any solution for capture filtering is going to require touching every driver.


# 6f7a9f7c 07-Aug-2015 Ian Lepore <ian@FreeBSD.org>

RFC 2783 requires a status of ETIMEDOUT, not EWOULDBLOCK, on a timeout.


# f4b5a972 08-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Reimplement the ordering requirements for the timehands updates, and
for timehands consumers, by using fences.

Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).

Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].

Noted by: alc [*]
Requested by: bde [**]
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


# 529c9788 10-Jun-2015 Konstantin Belousov <kib@FreeBSD.org>

Tweaks for r284178:

Do not include machine/atomic.h explicitely, the header is already included
by sys/systm.h.

Force inlining of tc_getgen() and tc_setgen(). The functions are used
more than once, which causes compilers with non-aggressive inlining
policies to generate calls.

Suggested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2c6946dc 09-Jun-2015 Konstantin Belousov <kib@FreeBSD.org>

When updating/accessing the timehands, barriers are needed to ensure
that:
- th_generation update is visible after the parameters update is
visible;
- the read of parameters is not reordered before initial read of
th_generation.

On UP kernels, compiler barriers are enough. For SMP machines, CPU
barriers must be used too, as was confirmed by submitter by testing on
the Freescale T4240 platform with 24 PowerPC processors.

Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week


# 28315e27 04-May-2015 Ian Lepore <ian@FreeBSD.org>

Implement a mechanism for making changes in the kernel<->driver PPS
interface without breaking ABI or API compatibility with existing drivers.

The existing data structures used to communicate between the kernel and
driver portions of PPS processing contain no spare/padding fields and no
flags field or other straightforward mechanism for communicating changes
in the structures or behaviors of the code. This makes it difficult to
MFC new features added to the PPS facility. ABI compatibility is
important; out-of-tree drivers in module form are known to exist. (Note
that the existing api_version field in the pps_params structure must
contain the value mandated by RFC 2783 and any RFCs that come along after.)

These changes introduce a pair of abi-version fields which are filled in
by the driver and the kernel respectively to indicate the interface
version. The driver sets its version field before calling the new
pps_init_abi() function. That lets the kernel know how much of the
pps_state structure is understood by the driver and it can avoid using
newer fields at the end of the structure that it knows about if the driver
is a lower version. The kernel fills in its version field during the init
call, letting the driver know what features and data the kernel supports.

To implement the new version information in a way that is backwards
compatible with code from before these changes, the high bit of the
lightly-used 'kcmode' field is repurposed as a flag bit that indicates the
driver is aware of the abi versioning scheme. Basically if this bit is
clear that indicates a "version 0" driver and if it is set the driver_abi
field indicates the version.

These changes also move the recently-added 'mtx' field of pps_state from
the middle to the end of the structure, and make the kernel code that uses
this field conditional on the driver being abi version 1 or higher. It
changes the only driver currently supplying the mtx field, usb_serial, to
use pps_init_abi().

Reviewed by: hselasky@


# 91d9eda2 14-Mar-2015 Ian Lepore <ian@FreeBSD.org>

Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf().


# 35ee8a4a 07-Mar-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Add mutex support to the pps_ioctl() API in the kernel.
Bump kernel version to reflect structure change.

PR: 196897
MFC after: 1 week


# d1b1b600 19-Jan-2015 Neel Natu <neel@FreeBSD.org>

Update the vdso timehands only via tc_windup().

Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter
hardware was changed via 'sysctl kern.timecounter.hardware'. This happened
because the vdso timehands update was missing the special treatment in
tc_windup() when changing timecounters.

Reviewed by: kib


# 92597e06 05-Jan-2015 John Baldwin <jhb@FreeBSD.org>

On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST). Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR: 192316
Differential Revision: https://reviews.freebsd.org/D1441
No objection from: jkim
MFC after: 1 week


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 5b999a6b 04-Mar-2013 Davide Italiano <davide@FreeBSD.org>

- Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with: mav
Reviewed by: attilio, bde, luigi, phk
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm),
markj (amd64), mav, Fabian Keil


# a1137de9 15-Feb-2013 Ian Lepore <ian@FreeBSD.org>

Add PPS_CANWAIT support for time_pps_fetch(). This adds support for all three
blocking modes described in section 3.4.3 of RFC 2783, allowing the caller
to retrieve the most recent values without blocking, to block for a specified
time, or to block forever.

Reviewed by: discussion on hackers@


# a8df530d 28-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Mark 'ticks', 'time_second', and 'time_uptime' as volatile to prevent the
compiler from caching their values in tight loops.

Reviewed by: bde
MFC after: 1 week


# 57d025c3 16-Jul-2012 George V. Neville-Neil <gnn@FreeBSD.org>

Add support for walltimestamp in DTrace.

Submitted by: Fabian Keil
MFC after: 2 weeks


# 21c295ef 23-Jun-2012 Konstantin Belousov <kib@FreeBSD.org>

Stop updating the struct vdso_timehands from even handler executed in
the scheduled task from tc_windup(). Do it directly from tc_windup in
interrupt context [1].

Establish the permanent mapping of the shared page into the kernel
address space, avoiding the potential need to sleep waiting for
allocation of sf buffer during vdso_timehands update. As a
consequence, shared_page_write_start() and shared_page_write_end()
functions are not needed anymore.

Guess and memorize the pointers to native host and compat32 sysentvec
during initialization, to avoid the need to get shared_page_alloc_sx
lock during the update.

In tc_fill_vdso_timehands(), do not loop waiting for timehands
generation to stabilize, since vdso_timehands is written in the same
interrupt context which wrote timehands.

Requested by: mav [1]
MFC after: 29 days


# aea81038 22-Jun-2012 Konstantin Belousov <kib@FreeBSD.org>

Implement mechanism to export some kernel timekeeping data to
usermode, using shared page. The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with: bde
Reviewed by: jhb
Tested by: flo
MFC after: 1 month


# 9624d947 03-Mar-2012 Juli Mallett <jmallett@FreeBSD.org>

o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.

Reviewed by: gonzo


# de02885a 09-Feb-2012 Kevin Lo <kevlo@FreeBSD.org>

Add a missing break. This bug was introduced in r228856.


# 6cedd609 23-Dec-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Introduce the sysclock_getsnapshot() and sysclock_snap2bintime() KPIs. The
sysclock_getsnapshot() function allows the caller to obtain a snapshot of all
the system clock and timecounter state required to create time stamps at a later
point. The sysclock_snap2bintime() function converts a previously obtained
snapshot into a bintime time stamp according to the specified flags e.g. which
system clock, uptime vs absolute time, etc.

These KPIs enable useful functionality, including direct comparison of the
feedback and feed-forward system clocks and generation of multiple time stamps
with different formats from a single timecounter read.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with: Julien Ridoux (jridoux at unimelb edu au)


# 88394fe4 29-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Do away with the somewhat clunky sysclock_ops structure and associated code,
reimplementing the [get]{bin,nano,micro}[up]time() wrapper functions in terms of
the new "fromclock" API instead.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Discussed with: Julien Ridoux (jridoux at unimelb edu au)
Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# e977bac3 28-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Make the fbclock_[get]{bin,nano,micro}[up]time() function prototypes public so
that new APIs with some performance sensitivity can be built on top of them.
These functions should not be called directly except in special circumstances.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Discussed with: Julien Ridoux (jridoux at unimelb edu au)
Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# c2a4ee99 28-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Fix an oversight in r227747 by calling fbclock_bin{up}time() directly from the
fbclock_{nanouptime|microuptime|bintime|nanotime|microtime}() functions to avoid
indirecting through a sysclock_ops wrapper function.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# 65e359a1 21-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

- Add Pulse-Per-Second timestamping using raw ffcounter and corresponding
ffclock time in seconds.

- Add IOCTL to retrieve ffclock timestamps from userland.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# 9bce0f05 19-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

- Provide a sysctl interface to change the active system clock at runtime.

- Wrap [get]{bin,nano,micro}[up]time() functions of sys/time.h to allow
requesting time from either the feedback or the feed-forward clock. If a
feedback (e.g. ntpd) and feed-forward (e.g. radclock) daemon are both running
on the system, both kernel clocks are updated but only one serves time.

- Add similar wrappers for the feed-forward difference clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# b0fdc837 19-Nov-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Core structure and functions to support a feed-forward clock within the kernel.
Implement ffcounter, a monotonically increasing cumulative counter on top of the
active timecounter. Provide low-level functions to read the ffcounter and
convert it to absolute time or a time interval in seconds using the current
ffclock estimates, which track the drift of the oscillator. Add a ring of
fftimehands to track passing of time on each kernel tick and pick up updates of
ffclock estimates.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 08e1b4f4 14-Jul-2011 Jung-uk Kim <jkim@FreeBSD.org>

If TSC stops ticking in C3, disable deep sleep when the user forcefully
select TSC as timecounter hardware.

Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)


# cbc134ad 19-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Introduce signed and unsigned version of CTLTYPE_QUAD, renaming
existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().


# 772d1e42 22-Nov-2010 Colin Percival <cperciva@FreeBSD.org>

Add parentheses for clarity. The parentheses around the two terms of the &&
are unnecessary but I'm leaving them in for the sake of avoiding confusion
(I confuse easily).

Submitted by: bde


# aa519c0a 22-Nov-2010 Colin Percival <cperciva@FreeBSD.org>

In tc_windup, handle the case where the previous call to tc_windup was
more than 1s earlier. Prior to this commit, the computation of
th_scale * delta (which produces a 64-bit value equal to the time since
the last tc_windup call in units of 2^(-64) seconds) would overflow and
any complete seconds would be lost.

We fix this by repeatedly converting tc_frequency units of timecounter
to one seconds; this is not exactly correct, since it loses the NTP
adjustment, but if we find ourselves going more than 1s at a time between
clock interrupts, losing a few seconds worth of NTP adjustments is the
least of our problems...


# 8d065a39 14-Nov-2010 Rebecca Cran <brucec@FreeBSD.org>

Fix some more style(9) issues.


# b389be97 14-Nov-2010 Rebecca Cran <brucec@FreeBSD.org>

Fix style(9) issues from r215281 and r215282.

MFC after: 1 week


# 2baa5cdd 13-Nov-2010 Rebecca Cran <brucec@FreeBSD.org>

Add some descriptions to sys/kern sysctls.

PR: kern/148710
Tested by: Chip Camden <sterling at camdensoftware.com>
MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 95d23438 21-Sep-2010 Alexander Motin <mav@FreeBSD.org>

Until hardclock() and respectively tc_windup() called first time, system
is running on "dummy" time counter. But to function properly in one-shot
mode, event timer management code requires working time counter. Slow
moving "dummy" time counter delays first hardclock() call by few seconds
on my systems, even though timer interrupts were correctly kicking kernel.
That causes few seconds delay during boot with one-shot mode enabled.

To break this loop, explicitly call tc_windup() first time during
initialization process to let it switch to some real time counter.


# 0e189873 14-Sep-2010 Alexander Motin <mav@FreeBSD.org>

Make kern_tc.c provide minimum frequency of tc_ticktock() calls, required
to handle current timecounter wraps. Make kern_clocksource.c to honor that
requirement, scheduling sleeps on first CPU for no more then specified
period. Allow other CPUs to sleep up to 1/4 second (for any case).


# a157e425 13-Sep-2010 Alexander Motin <mav@FreeBSD.org>

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


# 1a996ed1 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by: cperciva@


# 805cc58a 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.


# 3bc5958c 11-Jul-2010 Alexander Motin <mav@FreeBSD.org>

Remove interval validation from cpu_tick_calibrate(). As I found, check
was needed at preliminary version of the patch, where number of CPU ticks
was divided strictly on 16 seconds. Final code instead uses real interval
duration, so precise interval should not be important. Same time aliasing
issues around second boundary causes false positives, periodically logging
useless "t_delta ... too long/short" messages when HZ set below 256.


# 60ae52f7 21-Jun-2010 Ed Schouten <ed@FreeBSD.org>

Use ISO C99 integer types in sys/kern where possible.

There are only about 100 occurences of the BSD-specific u_int*_t
datatypes in sys/kern. The ISO C99 integer types are used here more
often.


# 547d94bd 15-Jun-2010 Jung-uk Kim <jkim@FreeBSD.org>

Implement flexible BPF timestamping framework.

- Allow setting format, resolution and accuracy of BPF time stamps per
listener. Previously, we were only able to use microtime(9). Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command. Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts. bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp. The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries. On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long. On 32-bit platforms, the size may
increase by 8 bytes. For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested. However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by: net@


# 89f28b1b 11-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Remove conditionally compiled time counter statistics; tools like
DTrace, kernel profiling, etc, can provide this information without
the overhead.

MFC after: 3 days
Suggested by: bde


# 83160d14 08-Mar-2009 Robert Watson <rwatson@FreeBSD.org>

By default, don't compile in counters of calls to various time
query functions in the kernel, as these effectively serialize
parallel calls to the gettimeofday(2) system call, as well as
other kernel services that use timestamps.

Use the NetBSD version of the fix (kern_tc.c:1.32 by ad@) as
they have picked up our timecounter code and also ran into the
same problem.

Reported by: kris
Obtained from: NetBSD
MFC after: 3 days


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# 6b4d690c 16-Feb-2008 Warner Losh <imp@FreeBSD.org>

Fix typo in comment.


# bedff79a 02-Jan-2008 David E. O'Brien <obrien@FreeBSD.org>

Note what is too {short,long}.


# 041b706b 04-Jun-2007 David Malone <dwmalone@FreeBSD.org>

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.


# 776fc0e9 04-Aug-2006 Yaroslav Tykhiy <ytykhiy@gmail.com>

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


# 93ef14a7 16-Jun-2006 David Malone <dwmalone@FreeBSD.org>

Add a kern.timecounter.tc sysctl tree that contains the mask,
frequency, quality and current value of each available time counter.

At the moment all of these are read-only, but it might make sense to
make some of these read-write in the future.

MFC after: 3 months


# 59048707 15-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Disable the "cputick increased..." message now that the dust has settled.


# fef527ee 09-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Oops, forgot newline.


# 6cda760f 09-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

silence cpu_tick calibration and notice only (under bootverbose)
when the frequency increases.


# 88ca07e7 07-Mar-2006 John Baldwin <jhb@FreeBSD.org>

Style nit.


# fccfcfba 03-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Add missing cast.


# 5b51d1de 03-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

More detailed logging if timestepwarnings are enabled.


# 301af28a 02-Mar-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Suffer a little bit of math every 16 second and tighten calibration of
cpu_ticks to the low side of PPM.


# e8444a7e 11-Feb-2006 Poul-Henning Kamp <phk@FreeBSD.org>

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


# 5b1a8eb3 07-Feb-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


# e452573d 19-Sep-2005 Andre Oppermann <andre@FreeBSD.org>

Start time_uptime with 1 instead of 0.

Discussed with: phk


# 5b1c0294 07-Sep-2005 David E. O'Brien <obrien@FreeBSD.org>

Forward declaring static variables as extern is invalid ISO-C. Now that
GCC can properly handle forward static declarations, do this properly.


# f8385624 26-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

s/ENOTTY/ENOIOCTL/


# a7bc3102 11-Oct-2004 Peter Wemm <peter@FreeBSD.org>

Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.


# d8e8b675 14-Aug-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add some KASSERTS.


# 1e0e79c9 04-Mar-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Just because the timecounter reads the same value on two samples
after each other doesn't mean that nothing happened.


# ee57aeea 22-Jan-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Write 100 times for tomorrow:
"Always print time_t as %jd, you never know what width it has"


# 4e74721c 21-Jan-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add a sysctl (default: off) which enables a log(LOG_INFO...) warning
if the clock is stepped.


# 555a5de2 13-Nov-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Various minor details:
Give the HZ/overflow check a 10% margin.
Eliminate bogus newline.
If timecounters have equal quality, prefer higher frequency.

Some inspiration from: bde


# c679c734 03-Sep-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Use the quality to disable timecounters for which we deem Hz too low.


# c1cccd1e 20-Aug-2003 Warner Losh <imp@FreeBSD.org>

bde made a number of suggested improvements to the code. This commit
represents the pruely stylistic changes and should have no net impact
on the rest of the code.

bde's more substantive changes will follow in a separate commit once
we've come to closure on them.

Submitted by: bde


# 45cc9f5f 19-Aug-2003 Warner Losh <imp@FreeBSD.org>

Fix an extreme edge case in leap second handling. We need to call
ntp_update_second twice when we have a large step in case that step
goes across a scheduled leap second. The only way this could happen
would be if we didn't call tc_windup over the end of day on the day of
a leap second, which would only happen if timeouts were delayed for
seconds. While it is an edge case, it is an important one to get
right for my employer.

Sponsored by: Timing Solutions Corporation


# 78a49a45 16-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Give timecounters a numeric quality field.

A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.

Add a sysctl to report all available timecounters and their qualities.

Give the dummy timecounter a solid negative quality of minus a million.

Give the i8254 zero and the ACPI 1000.

The TSC gets 800, unless APM or SMP forces it negative.

Other timecounters default to zero quality and thereby retain current
selection behaviour.


# affd4332 12-Aug-2003 Maxime Henrion <mux@FreeBSD.org>

Remove extra space.


# d94e3652 02-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

typo fix in comment.


# 4f2073fb 25-Jun-2003 Warner Losh <imp@FreeBSD.org>

Fix leap second processing by the kernel time keeping routines.
Before, we would add/subtract the leap second when the system had been
up for an even multiple of days, rather than at the end of the day, as
a leap second is defined (at least wrt ntp). We do this by
calculating the notion of UTC earlier in the loop, and passing that to
get it adjusted. Any adjustments that ntp_update_second makes to this
time are then transferred to boot time. We can't pass it either the
boot time or the uptime because their sum is what determines when a
leap second is needed. This code adds an extra assignment and two
extra compare in the typical case, which is as cheap as I could made
it.

I have confirmed with this code the kernel time does the correct thing
for both positive and negative leap seconds. Since the ntp interface
doesn't allow for +2 or -2, those cases can't be tested (and the folks
in the know here say there will never be a +2s or -2s leap event, but
rather two +1s or -1s leap events).

There will very likely be no leap seconds for a while, given how the
earth is speeding up and slowing down, so there will be plenty of time
for this fix to propigate. UT1-UTC is currently at "about -0.4s" and
decrementing by .1s every 8 months or so. 6 * 8 is 48 months, or 4
years.

-stable has different code, but a similar bug that was introduced
about the time of the last leap second, which is why nobody has
noticed until now.

MFC After: 3 weeks
Reviewed by: phk

"Furthermore, leap seconds must die." -- Cato the Elder


# 4e82e5f6 23-Jun-2003 Warner Losh <imp@FreeBSD.org>

Use UTC rather than GMT to describe time scale. latter is obsolete.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# b4b138c2 18-Mar-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# 60ca3996 29-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move timecounters notion of frequency to 64 bits.

[WARNING: CPUs in the distant future may be closer than they appear!]


# 4394f476 25-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add sysctl kern.timecounter.nsetclock which indicates the number of
potential discontinuities in our UTC timescale.

Applications can monitor this variable if they want to be informed
about steps in the timescale. Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock. No attempt is made to classify
size or direction of the step.


# ce9fac00 16-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Move a local variable to avoid the compiler warning about it being unused.


# b1e7e201 16-Jan-2003 John Hay <jhay@FreeBSD.org>

hardpps() wants the raw hardware counter value converted to nanoseconds.


# ff292556 05-Jan-2003 Peter Wemm <peter@FreeBSD.org>

Explicitly have the timecounter init happen after the cpu_initclocks is
called. Otherwise (depending on a non-deterministic sort), the timecounter
code can be initialized before the clock rate has been set (on ia64) and it
assumes hz = 100, rather than the real value of 1024. I'm not sure how much
gets upset by this.

Glanced at by: phk


# b3ed130c 04-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Export tc_tick with sysctl, not tick.

Spotted by: bde


# 38b0884c 01-Nov-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce a "time_uptime" global variable which holds the time since boot
in seconds.


# e80fb434 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.


# e46eeb89 04-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Do not employ timecounter hardware if our hz does not support their
correct rewinding.


# e7fa55af 04-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Give up on calling tc_ticktock() from a timeout, we have timeout
functions which run for several milliseconds at a time and getting
in queue behind one or more of those makes us miss our rewind.

Instead call it from hardclock() like we used to do, but retain the
prescaler so we still cope with high HZ values.


# 4f8cb019 15-Jul-2002 Mark Murray <markm@FreeBSD.org>

Use a semicolon at the end of a function-like macro invocation. Kills
warnings and makes the visual style easier.


# e3f0c575 11-Jun-2002 Kelly Yancey <kbyanc@FreeBSD.org>

Time counter stats are unsigned, advertise them to sysctl(8) that way.

PR: (one small part of) 19720
Approved by: phk


# eef633a7 30-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Mistyped and lost a '&' in previous commit.


# fe712246 30-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Don't forget to factor in the boottime when we calculate PPS timestamps.

Submitted by: Akira Watanabe <akira@myaw.ei.meisei-u.ac.jp>


# 48e5da55 03-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Initialize time_second to 1 instead of zero to pacify slightly bogus arp code.

Various minor style fixes from BDE.


# aed05564 30-Apr-2002 Peter Wemm <peter@FreeBSD.org>

kern_tc.c doesn't use <machine/psl.h>, and having this #include breaks
other platforms.


# 39acc78a 30-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Brucifixion ? Yes, out that door, row on the left, one patch each.

Many thanks to: bde


# 6b00cf46 28-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Stylistic sweep through the timecounter code.
Renovate comments.


# d25917e8 28-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Don't screw up our uptime with historical dates.


# f5d157fb 27-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Explain magic number.

Add magic date no explanation.

Add a delta which was lost in transit yesterday which prevented
other timecounters from actually being used.


# f175569a 27-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make the dummy timecounter actually tick or we will never get anyhere.


# 62efba6a 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Now that the private parts of timecounters are no longer being fingered
by other bits of code, split struct timecounter into two.

struct timecounter contains just the bits which pertains to the hardware
counter and the reading of it.

struct timehands (as in "the hands on a clock") contains all the ugly bit
fidling stuff. Statically compile ten timehands.

This commit is the functional part. A later cosmetic patch will rename
various variables and fieldnames.


# b4a1d0de 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Hide the private parts of timecounter from a couple of places that don't
really need to know the gory details.


# 7bf758bf 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Simplify the RFC2783 and PPS_SYNC timestamp collection API.


# 9e1b5510 25-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move the winding of timecounters out of hardclock and into a normal
timeout loop.

Limit the rate at which we wind the timecounters to approx 1000 Hz.

This limits the precision of the get{bin,nano,micro}[up]time(9)
functions to roughly a millisecond.


# 056abcab 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Various cleanup and sorting of clock reading functions. Add the two
functions missing in the complete 12 function complement.


# 656d3e04 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Rename tco_setscales() and tco_delta() to use the same tc_ prefix as
the rest of this file.


# 7e2d76ff 26-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.


# e1d970f1 14-Apr-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Improve the implementation of adjtime(2).

Apply the change as a continuous slew rather than as a series of
discrete steps and make it possible to adjust arbitraryly huge
amounts of time in either direction.

In practice this is done by hooking into the same once-per-second
loop as the NTP PLL and setting a suitable frequency offset deducting
the amount slewed from the remainder. If the remaining delta is
larger than 1 second we slew at 5000PPM (5msec/sec), for a delta
less than a second we slew at 500PPM (500usec/sec) and for the last
one second period we will slew at whatever rate (less than 500PPM)
it takes to eliminate the delta entirely.

The old implementation stepped the clock a number of microseconds
every HZ to acheive the same effect, using the same rates of change.

Eliminate the global variables tickadj, tickdelta and timedelta and
their various use and initializations.

This removes the most significant obstacle to running timecounter and
NTP housekeeping from a timeout rather than hardclock.


# 45609bea 28-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Get the magnitude of the NTP adjustment right.


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 1634e908 26-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variable.


# 5b7d8efa 24-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Add a generation number to timecounters and spin if it changes under
our feet when we look inside timecounter structures.

Make the "sync_other" code more robust by never overwriting the
tc_next field.

Add counters for the bin[up]time functions.

Call tc_windup() in tc_init() and switch_timecounter() to make sure
we all the fields set right.


# 4e2befc0 21-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Use better scaling factor for NTPs correction.
Explain the magic.


# 2028c0cd 07-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Revise timercounters to use binary fixed point format internally.

The binary format "bintime" is a 32.64 format, it will go to 64.64
when time_t does.

The bintime format is available to consumers of time in the kernel,
and is preferable where timeintervals needs to be accumulated.

This change simplifies much of the magic math inside the timecounters
and improves the frequency and time precision by a couple of bits.

I have not been able to measure a performance difference which was not
a tiny fraction of the standard deviation on the measurements.


# a3058964 05-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Let the number of timecounters follow hz, otherwise people with
HZ=BIGNUM will strain the assumptions behind timecounters to the
point where they break.

This may or may not help people seeing microuptime() backwards messages.

Make the global timecounter variable volatile, it makes no difference in
the code GCC generates, but it makes represents the intent correctly.

Thanks to: jdp
MFC after: 2 weeks


# 05a2f798 25-Jan-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be more conservative about interrupt latency, it aint getting better it seems.


# 8eb6e436 01-Jan-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bogus #ifdef KTR stanza.

Noticed by: Alexander Langer <alex@big.endian.de>


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 77978ab8 04-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 82d9ae4e 03-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# 91266b96 20-Mar-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Isolate the Timecounter internals in their own two files.

Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
kern/kern_clock.c -> kern/kern_tc.c
sys/time.h -> sys/timetc.h
(thanks peter!)


# f1220da4 13-Feb-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Fix sign reversal in adjtime(2).

Approved by: jkh


# 2ac0d0a1 08-Dec-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Make adjtime(2) adjust boottime so it doesn't cause non-monotonous
uptime.


# 71a62f8a 27-Nov-1999 Bruce Evans <bde@FreeBSD.org>

Fixed some comments in statclock(). The previous commit made it clearer
that one comment was attached to null code.


# 8a9d4d98 26-Nov-1999 Bruce Evans <bde@FreeBSD.org>

Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by: dufault


# de3f8889 10-Oct-1999 Peter Wemm <peter@FreeBSD.org>

#ifdef PPS_SYNC around "kapi" declaration to fix a -Wunused warning.


# b7424f2d 09-Oct-1999 John Hay <jhay@FreeBSD.org>

Update the PPSAPI to draft-mogul-pps-api-05.txt which is the latest.

NOTE: This will break building ntpd until ntpd has been upgraded to also
support draft 05. People that want to build ntpd in the meantime can
get patches from me.


# 37d38777 13-Sep-1999 Bruce Evans <bde@FreeBSD.org>

Moved the definition of `boottime' and its sysctl to the correct file.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# a1a10fdf 24-Jul-1999 Bruce Evans <bde@FreeBSD.org>

Oops, the previous commit only worked in the one case it was tested for.


# 6b6ef746 18-Jul-1999 Bruce Evans <bde@FreeBSD.org>

Added a sysctl "kern.timecounter.hardware" for selecting the hardware
used for timecounting. The possible values are the names of the
physically present harware timecounters ("i8254" and "TSC" on i386's).

Fixed some nearby bitrot in comments in <sys/time.h>.

Reviewed by: phk


# 7ac9503b 17-Jul-1999 John Polstra <jdp@FreeBSD.org>

Remove four no-op casts.


# 0bb2226a 25-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by: bde, jhay


# c4a6db71 02-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Don't open window for race condition.

Detected by: Reg Clemens <reg@dwf.com>


# 30f27235 12-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Fix an old cut&paste bogon.

Noticed by: bde


# 37d39f0a 12-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Remove duplicate include.

Noticed by: bde


# 32c20357 11-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Make even more of the PPSAPI implementations generic.

FLL support in hardpps()

Various magic shuffles and improved comments

Style fixes from Bruce.


# c68996e2 07-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Integrate the new "nanokernel" PLL from Dave Mills.

This code is backwards compatible with the older "microkernel" PLL, but
allows ntpd v4 to use nanosecond resolution. Many other improvements.

PPS_SYNC and hardpps() are NOT supported yet.


# 1c6d46f9 19-Feb-1999 Luoqi Chen <luoqi@FreeBSD.org>

Introduce machine-dependent macro pgtok() to convert page count to number
of kilobytes. Its definition for each architecture could be optimized to
avoid potential numerical overflows.


# b1028ad1 19-Feb-1999 Luoqi Chen <luoqi@FreeBSD.org>

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>


# 510eb5b9 29-Nov-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make the previous behaviour the default, add a sysctl which you
can set if your hw/sw produces the "calcru negative..." message.

Setting the alternate method (sysctl -w kern.timecounter.method=1)
makes the the get{nano|micro}*() functions call the real thing at
resulting in a measurable but minor overhead.

I decided to NOT have the "calcru" change the method automatically
because you should be aware of this problem if you have it.

The problems currently seen, related to usleep and a few other corners
are fixed for both methods.


# c2906d55 23-Nov-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make timecounters more resistant to badly behaved SW/HW which locks
out interrupts for too long. If you still see the "calcru: negative
time..." message you can increase NTIMECOUNTER (see LINT).

Sideeffect is that a timecounter is required to not wrap around in
less than (1 + delta) seconds instead of the (1/hz + delta) required
until now.

Many thanks to: msmith, wpaul, wosch & bde


# 8843cc35 23-Nov-1998 Søren Schmidt <sos@FreeBSD.org>

Add a kludge to prevent panicing when using VM86 and hitting here
with a NULL curproc.

Originally by: Tor Egge (IIRC)


# fffd686a 25-Oct-1998 Bruce Evans <bde@FreeBSD.org>

Fixed breakage of the GPROF case of statclock() in the previous commit.


# f5ef029e 25-Oct-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.


# 3bac064f 23-Oct-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Change the way we simulate stable storage for timecounters.

If you have problems with the "calcru" messages and processes being
killed for excessive cpu time, try to increase the NTIMECOUNTER
#define and report your findings.


# d6116663 06-Oct-1998 Alexander Langer <alex@FreeBSD.org>

Cast the return value of tvtohz() from a long to an int to satisfy the
compiler that we know what we're doing (the value returned has already
been restricted to int ranges).

Reviewed by: bde


# 7ea97031 15-Sep-1998 Justin T. Gibbs <gibbs@FreeBSD.org>

kern_clock.c:
Remove old disk statistics variables.

vfs_bio.c:
Enable bowrite now that B_ORDERED works for all buffer devices.


# 63606283 05-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Removed unused function hzto().


# ac1e407b 11-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed printf format errors.


# 52f8e5d6 04-Jul-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Hmm, braino in last commit.


# 0edd53d2 04-Jul-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Change the sign on a race-condition, so that instead of ending up several
tens of milliseconds out in the future we end up the right place with
a subweeniesecond error.


# 6ca4ca24 02-Jul-1998 Poul-Henning Kamp <phk@FreeBSD.org>

When we transfer time from one timecounter to the next, use nanouptime(),
not nanotime(); Otherwise we end up in 2026...

Fix the arg to dummy_get_timecount()


# a58f0f8e 09-Jun-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a tc_ prefix to struct timecounter members.

Urged by: bde


# 48115288 07-Jun-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a member function more to the timecounters, this one is for use
with latch based PPS implementations. The client that uses it will
be committed after more testing.


# dbb34755 07-Jun-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a "this" style argument and a "void *private" so timecounters can
figure out which instance to wount with.


# e796e00d 28-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


# 579f4456 19-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Change a data type internal to the timecounters, and remove the "delta"
function.

Reviewed, but not entirely approved by: bde


# c21410e1 17-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


# 5f88ec36 08-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Minor adjustments to the timecounting and proc0.

Mostly Submitted by: bde


# 4cf41af3 06-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


# bfe6c9fa 05-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make the dummy timecounter run at 1 MHz rather than 100kHz (noticed by bde)
fix the itimer(REAL) handling.


# 91ad39c6 04-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Handle double fraction overflow in nano & microtime functions (spotted by Bruce)
Use tvtohz() a place where it fits.


# 00af9731 04-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


# 460608e7 31-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Fix an off by 1<<32 error.


# 75da0aa2 31-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a dummy timecounter until we find the real thing(s).


# 227ee8a1 30-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# a0502b19 26-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.


# b05dcf3c 16-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

A bunch of BNN (Bruce Normal Nits) from bde:
Bring back the softclock inlining
save a couple of <<32's
many white-space shuffles.


# 7ec73f64 20-Feb-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Replace TOD clock code with more systematic approach.

Highlights:
* Simple model for underlying hardware.
* Hardware basis for timekeeping can be changed on the fly.
* Only one hardware clock responsible for TOD keeping.
* Provides a real nanotime() function.
* Time granularity: .232E-18 seconds.
* Frequency granularity: .238E-12 s/s
* Frequency adjustment is continuous in time.
* Less overhead for frequency adjustment.
* Improves xntpd performance.

Reviewed by: bde, bde, bde


# c7c9a816 15-Feb-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a nanotime() function so that we can start to use this call.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 6f70df15 14-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Move almost all the ntp related stuff from kern_clock.c to
kern_ntptime.c. The only bit left over is that which is executed
in all calls to hardclock(). Various cleanups and staticizing
along the road.


# bb303fe2 11-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Try to solve timeout race by not touching softtics here.


# 55c449bc 10-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Fix softclock calling so we don't loose timeouts (I broke this ~10h ago)


# eeb355f7 10-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Whoops. softclock is called from doreti_swi as well. Abandon call from
hardclock().

Forgot this:

Pointed hat sent by: bd


# a50ec505 10-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Effect the divorce of kern_clock.c and kern_timeout.c (which was
repository copied from kern_clock.c)


# 9cf25fbe 06-Jan-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Improve hardpps readability a bit:
* Rename usec to p_usec so you can search for it.
* Macroize the huge median_of_3_samples if statement.


# e1d6dc65 23-Dec-1997 Nate Williams <nate@FreeBSD.org>

This patch causes the "calltodo" timer list to be decremented by the amount
of time that the laptop was suspending. Thus, select() calls that might have
suspended rather than firing at 1hr + "time suspended" since the timer was
posted.

Adding:

options APM_FIXUP_CALLTODO

to the kernel config enables the patch.

[
This patch was slightly modified to use a consistant indent style and
I removed some unused local variables. After this has been tested a
few weeks we'll make the options the default, so for now I'm now
documenting it in LINT. Mike can later if he wants.
]

Reviewed by: Mike Smith <msmith@freebsd.org>
Submitted by: Ken Key <key@cs.utk.edu>


# eae8fc2c 08-Dec-1997 Steve Passe <fsmp@FreeBSD.org>

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smpyests.h)

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


# b672aa4b 24-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Removed all traces of P_IDLEPROC. It was tested but never set.


# 99af8d4e 17-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #include.


# 4a11ca4e 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 45327611 24-Sep-1997 Justin T. Gibbs <gibbs@FreeBSD.org>

Store an absolute tick value in callout entries so that a subtraction on
hash chain traversal isn't needed. This also allows untimeout to recompute
the hash to find the bucket that the entry to remove is stored in so
that each callout entry no longer needs to store that information.

Reviewed by: Nate Williams <nate@mt.sri.com>


# ab36c067 21-Sep-1997 Justin T. Gibbs <gibbs@FreeBSD.org>

init_main.c subr_autoconf.c:
Add support for "interrupt driven configuration hooks".
A component of the kernel can register a hook, most likely
during auto-configuration, and receive a callback once
interrupt services are available. This callback will occur before
the root and dump devices are configured, so the configuration
task can affect the selection of those two devices or complete
any tasks that need to be performed prior to launching init.
System boot is posponed so long as a hook is registered. The
hook owner is responsible for removing the hook once their task
is complete or the system boot can continue.

kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
Change the interface and implementation for the kernel callout
service. The new implemntaion is based on the work of
Adam M. Costello and George Varghese, published in a technical
report entitled "Redesigning the BSD Callout and Timer Facilities".
The interface used in FreeBSD is a little different than the one
outlined in the paper. The new function prototypes are:

struct callout_handle timeout(void (*func)(void *),
void *arg, int ticks);

void untimeout(void (*func)(void *), void *arg,
struct callout_handle handle);

If a client wishes to remove a timeout, it must store the
callout_handle returned by timeout and pass it to untimeout.

The new implementation gives 0(1) insert and removal of callouts
making this interface scale well even for applications that
keep 100s of callouts outstanding.

See the updated timeout.9 man page for more details.


# bea0f0be 06-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Some staticized variables were still declared to be extern.


# e4ba6a82 02-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# b1037dcd 21-Aug-1997 Bruce Evans <bde@FreeBSD.org>

#include <machine/limits.h> explicitly in the few places that it is required.


# 5faa3121 24-Jun-1997 John Hay <jhay@FreeBSD.org>

Add tickadj to struct clockinfo, like NetBSD and OpenBSD.
NOTE: libc, time, kgmon and rpc.rstatd will have to be recompiled.


# 477a642c 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


# 9a8f4a4c 22-Mar-1997 Mike Pritchard <mpp@FreeBSD.org>

Restore Bruce's original comment. It seems that "iff" = if and only if,
and is not a typo. It is used other places in the kernel, too.


# 269ebc86 22-Mar-1997 Mike Pritchard <mpp@FreeBSD.org>

Fix a typo in a comment of a recent commit.


# 3c816944 21-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'. Use a new function gettime(). The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 885bd8e4 30-Dec-1996 John Hay <jhay@FreeBSD.org>

Update our kernel ntp code to the latest from David Mills. The main change
is the addition of the FLL code, which is used by the latest versions of
xntpd. The kernel PPS code is also updated, although I can't test that yet.


# 835bd1ce 25-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Improved biasing of i586 clock by adjusting for hardclock() latency.
I decided to do this for every hardclock() call instead of lazily
in microtime(). The lazy method is simpler but has more overhead
if microtime() is called a lot.

CPU_THISTICKLEN() is now a no-op and should probably go away.
Previously it did nothing directly but had the side effect of
setting i586_last_tick for CPU_CLOCKUPDATE() and i586_avg_tick for
debugging. CPU_CLOCKUPDATE() now uses a better method and
i586_avg_tick is too much trouble to maintain.

Reduced nesting of #includes in the usual case.

Increased nesting of #includes when CLOCK_HAIR is defined. This
is a kludge to get typedefs for inline functions only when the
inline functions are used. Normally only kern_clock.c defines
this. kern_clock.c can't include the i386 headers directly.

Removed unused LOCORE support.


# a0ea75ec 10-Oct-1996 Bruce Evans <bde@FreeBSD.org>

Don't include "opt_cpu.h" in <machine/clock.h>, since this breaks lkm's.
The change breaks kern_clock.c; fix that temporarily by including
"opt_cpu.h" there.


# f5e9e8ec 30-Jul-1996 Bruce Evans <bde@FreeBSD.org>

Fixed resource usage integrals. They were too large by a factor of
of profhz/stathz when profiling was enabled.


# cc3d5226 23-Jun-1996 Bruce Evans <bde@FreeBSD.org>

Unstaticize psratio and staticize profprocs. psratio needs to be exported
to trap.c to fix user profiling.


# 27a0b398 17-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize.
Unstaticize a function in scsi/scsi_base that was used, with an undocumented
option.
My last count on the LINT kernel shows:
Total symbols: 3647
unref symbols: 463
undef symbols: 4
1 ref symbols: 1751
2 ref symbols: 485
Approaching the pain threshold now.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 65d0bc13 06-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A couple of minor tweaks to the sysctl stuff.


# 946bb7a2 04-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

A major sweep over the sysctl stuff.

Move a lot of variables home to their own code (In good time before xmas :-)

Introduce the string descrition of format.

Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.

Next is adding vars on the fly, and sysctl looking at them too.

Removed a tine bit of defunct and #ifdefed notused code in swapgeneric.


# d841aaa7 02-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Finished (?) cleaning up sysinit stuff.


# ae0eb976 12-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

The entire sysctl callback to read/write version. I havn't tested this as
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.


# 787d58f2 08-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Fix some of the sysctl broke, and add a lot more to it.


# 07e3b0c2 12-Oct-1995 Garrett Wollman <wollman@FreeBSD.org>

Improve clock accuracy by accounting for late/missed clock interrupts
if the hardware supports it.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 28f8db14 29-Jul-1995 Bruce Evans <bde@FreeBSD.org>

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6976af69 12-Dec-1994 Bruce Evans <bde@FreeBSD.org>

Obtained from: my old fix for 1.1.5

Improve hzto():

Round up instead of down and then add 1 tick. This fixes sleep(1)
sometimes sleeping for < 1 second and usleep(10000) sometimes sleeping
for as little as 1 usec + syscall time.

Don't do all the calculations at splhigh().

Don't depend on `tick' being a multiple of 1000.

Don't lose accuracy for `sec' between 0x7fffffff / 1000 - 1000 and
0x7fffffff / hz.

Don't assume that longs are 32 bits or that ints have the same size as
longs.


# 8478caba 15-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

kern_clock.c: define dk_names[][].
kern_sysctl.c: call dev_sysctl for hw.devconf mib subtree
kern_devconf.c: sysctl-accessible device-configuration and -management
interface


# 797f2d22 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 93e010b0 28-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Fixed bug in hardclock() that caused adjtime() to fail when given
a negative offset. This would be seen in xntpd as a rash of
``Previous time adjustment didn't complete'' messages on startup.


# bb56ec4a 25-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

While in the real world, I had a bad case of being swapped out for a lot of
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.


# 3f31c649 18-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Redo Kernel NTP PLL support, kernel side.

This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:

1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.

2) mono_time does not participate in the PLL adjustments.

3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.


# 8a129cae 27-Aug-1994 David Greenman <dg@FreeBSD.org>

1) Changed ddb into a option rather than a pseudo-device (use options DDB
in your kernel config now).
2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its
own file.
3) Added \r handing in db_printf.
4) Added missing memory usage stats to statclock().
5) Added dummy function to pseudo_set so it will be emitted if there
are no other pseudo declarations.


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources