History log of /freebsd-10.1-release/sys/sys/systm.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 266347 17-May-2014 ian

MFC 264019, 264041, 264048, 264049, 264050, 264051

Add support for event timers whose clock frequency can change while running.

Apparently all ARM configs build kern_et.c, but only a few of them also
build kern_clocksource.c, un-break the build by not referencing functions in
kern_clocksource if NO_EVENTTIMERS is defined.

Add variable-frequency support to the arm mpcore eventtimer driver.

mpcore_timer: Disable the timer and clear any pending bit, then setup the
new counter register values, then restart the timer. Also re-nest the parens
properly for casting the result of converting time and frequency to a count.


# 266197 15-May-2014 ian

MFC r261786, r261789

Rework the EARLY_PRINTF mechanism. Instead of defining a special eprintf()
routine, now a platform can provide a pointer to an early_putc() routine
which is used instead of cn_putc(). Control can be handed off from early
printf support to standard console support by NULLing out the pointer
during standard console init.

Convert two while(1); statements into proper panics.


# 266094 14-May-2014 ian

MFC r261038, r261039, r261040, r261041

Implement generic support for early printf.


# 258127 14-Nov-2013 pluknet

Merge r257996,r258001,r258069 from head: fixes for HyperV guest.

- Set description string for VM_GUEST_HV (HyperV guest).
- Add a brief comment about VM_GUEST and vm_guest_sysctl_names relationship.
- CTASSERT that vm_guest range is covered by vm_guest_sysctl_names.

Approved by: re (glebius)


# 257122 25-Oct-2013 kib

MFC r256502:
Similar to debug.iosize_max_clamp sysctl, introduce
devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized
i/o requests on the devfs files.

Approved by: re (glebius)


# 256758 18-Oct-2013 gibbs

MFC r256425:

Centralize the detection logic for the Hyper-V hypervisor.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs, grehan
Approved by: re (gjb)

sys/sys/systm.h:
* Add a new VM_GUEST type, VM_GUEST_HV (HyperV guest).

sys/dev/hyperv/vmbus/hv_vmbus_drv_freebsd.c:
sys/dev/hyperv/vmbus/hv_hv.c:
sys/dev/hyperv/stordisengage/hv_ata_pci_disengage.c:
* Set vm_guest to VM_GUEST_HV and use that on other HyperV related
devices instead of cloning the cpuid hypervisor check.
* Cleanup the vmbus_identify function.
------------------------------------------------------------------------


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255351 07-Sep-2013 np

Add a vtprintf. It is to tprintf what vprintf is to printf.

Reviewed by: kib


# 255057 30-Aug-2013 kib

Move the definition of the struct unrhdr into a separate header file,
to allow embedding the struct. Add init_unrhdr(9) initializer, which
sets up preallocated unrhdr.

Reviewed by: alc
Tested by: pho, bf


# 253007 07-Jul-2013 alfred

Make kassert_printf use __printflike.

Fix associated errors/warnings while I'm here.

Requested by: avg


# 249189 06-Apr-2013 glebius

Move CRITICAL_ASSERT() macro to systm.h, where the critical(9)
functions are declared.


# 248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


# 248032 08-Mar-2013 andre

Move the callout subsystem initialization to its own SYSINIT()
from being indirectly called via cpu_startup()+vm_ksubmap_init().
The boot order position remains the same at SI_SUB_CPU.

Allocation of the callout array is changed to stardard kernel malloc
from a slightly obscure direct kernel_map allocation.

kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init()
to better describe its purpose.
kern_timeout_callwheel_init() is removed simplifying the per-cpu
initialization.

Reviewed by: davide


# 247787 04-Mar-2013 davide

MFcalloutng:
Introduce sbt variants of msleep(), msleep_spin(), pause(), tsleep() in
the KPI, allowing to specify timeout in 'sbintime_t' rather than ticks.

Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, markj, Fabian Keil


# 247777 04-Mar-2013 davide

- Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts which are not anymore constrained to wait next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, as long as interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather informations about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields has been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with: mav
Reviewed by: attilio, bde, luigi, phk
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm),
markj (amd64), mav, Fabian Keil


# 247714 03-Mar-2013 davide

Remove a couple of unused include.


# 247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.


# 247110 21-Feb-2013 imp

splsoftvm() is no longer in the tree. gc.


# 247108 21-Feb-2013 imp

Remove splsoftclock() since it is now gone.


# 247059 20-Feb-2013 imp

Remove the unused spl functions: spl0, splsoftcam, splsofttty,
splsofttq and splstatclock.

Other used spl functions to follow.


# 246254 02-Feb-2013 avg

fix some fat-fingering in r246246

Submitted by: mjg
Pointyhat to: avg
MFC after: 5 days
X-MFC with: r246246


# 246246 02-Feb-2013 avg

print compiler version in the kernel banner

And provide kernel compiler version as a sysctl as well.
This is useful while we have gcc and clang cohabitation.
This could be even more useful when we have support
for external toolchains.

In cooperation with: mjg
MFC after: 13 days


# 244105 10-Dec-2012 alfred

Switch the hardwired WITNESS panics to kassert_panic.

This is an ongoing effort to provide runtime debug information
useful in the field that does not panic existing installations.

This gives us the flexibility needed when shipping images to a
potentially large audience with WITNESS enabled without worrying
about formerly non-fatal LORs hurting a release.

Sponsored by: iXsystems


# 243980 07-Dec-2012 alfred

Allow KASSERT to log instead of panic.

This is to allow debug images to be used without taking down the
system when non-fatal asserts are hit.

The following sysctls are added:

debug.kassert.warn_only: 1 = log, 0 = panic

debug.kassert.do_ktr: set to a ktr mask for logging via KTR

debug.kassert.do_log: 1 = log, 0 = quiet

debug.kassert.warnings: stats, number of kasserts hit

debug.kassert.log_panic_at:
number of kasserts before we actually panic, 0 = never

debug.kassert.log_pps_limit: pps limit for log messages

debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop

debug.kassert.kassert: set this sysctl to trigger a kassert

Discussed with: scottl, gnn, marcel
Sponsored by: iXsystems


# 234785 29-Apr-2012 dim

Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after: 2 weeks


# 232783 10-Mar-2012 mav

Idle ticks optimization:
- Pass number of events to the statclock() and profclock() functions
same as to hardclock() before to not call them many times in a loop.
- Rename them into statclock_cnt() and profclock_cnt().
- Turn statclock() and profclock() into compatibility wrappers,
still needed for arm.
- Rename hardclock_anycpu() into hardclock_cnt() for unification.

MFC after: 1 week


# 231949 20-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# 230643 28-Jan-2012 attilio

Avoid to check the same cache line/variable from all the locking
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.

STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.

In collabouration with: flo
Reviewed by: avg
MFC after: 3 months (or never)
X-MFC: r228424


# 228478 13-Dec-2011 ed

Reimplement CTASSERT() using _Static_assert().


# 228424 11-Dec-2011 avg

panic: add a switch and infrastructure for stopping other CPUs in SMP case

Historical behavior of letting other CPUs merily go on is a default for
time being. The new behavior can be switched on via
kern.stop_scheduler_on_panic tunable and sysctl.

Stopping of the CPUs has (at least) the following benefits:
- more of the system state at panic time is preserved intact
- threads and interrupts do not interfere with dumping of the system
state

Only one thread runs uninterrupted after panic if stop_scheduler_on_panic
is set. That thread might call code that is also used in normal context
and that code might use locks to prevent concurrent execution of certain
parts. Those locks might be held by the stopped threads and would never
be released. To work around this issue, it was decided that instead of
explicit checks for panic context, we would rather put those checks
inside the locking primitives.

This change has substantial portions written and re-written by attilio
and kib at various times. Other changes are heavily based on the ideas
and patches submitted by jhb and mdf. bde has provided many insights
into the details and history of the current code.

The new behavior may cause problems for systems that use a USB keyboard
for interfacing with system console. This is because of some unusual
locking patterns in the ukbd code which have to be used because on one
hand ukbd is below syscons, but on the other hand it has to interface
with other usb code that uses regular mutexes/Giant for its concurrency
protection. Dumping to USB-connected disks may also be affected.

PR: amd64/139614 (at least)
In cooperation with: attilio, jhb, kib, mdf
Discussed with: arch@, bde
Tested by: Eugene Grosbein <eugen@grosbein.net>,
gnn,
Steven Hartland <killing@multiplay.co.uk>,
glebius,
Andrew Boyer <aboyer@averesystems.com>
(various versions of the patch)
MFC after: 3 months (or never)


# 224307 25-Jul-2011 avg

remove RESTARTABLE_PANICS option

This is done per request/suggestion from John Baldwin
who introduced the option. Trying to resume normal
system operation after a panic is very unpredictable
and dangerous. It will become even more dangerous
when we allow a thread in panic(9) to penetrate all
lock contexts.
I understand that the only purpose of this option was
for testing scenarios potentially resulting in panic.

Suggested by: jhb
Reviewed by: attilio, jhb
X-MFC-After: never
Approved by: re (kib)


# 223889 09-Jul-2011 kib

Add a facility to disable processing page faults. When activated,
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc (previous version)


# 223884 09-Jul-2011 kib

Implement bitcount16.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 223426 22-Jun-2011 jkim

Set negative quality to TSC timecounter when C3 state is enabled for Intel
processors unless the invariant TSC bit of CPUID is set. Intel processors
may stop incrementing TSC when DPSLP# pin is asserted, according to Intel
processor manuals, i. e., TSC timecounter is useless if the processor can
enter deep sleep state (C3/C4). This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).

Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de)
Ian FREISLICH (ianf at clue dot co dot za)
Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)
- Core2 Duo T5870 (C3 state available/enabled)
jkim - Xeon X5150 (C3 state unavailable)


# 221855 13-May-2011 mdf

Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk. Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file. (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by: alc
MFC after: 1 week
MFC with: r221853


# 221853 13-May-2011 mdf

Usa a globally visible region of zeros for both /dev/zero and the md
device. There are likely other kernel uses of "blob of zeros" than can
be converted.

Reviewed by: alc
MFC after: 1 week


# 221741 10-May-2011 avg

bitcount32: replace lengthy comment with a reference to SWAR

MFC after: 5 days


# 220987 24-Apr-2011 kib

Fix typo.

MFC after: 3 days


# 219920 23-Mar-2011 alc

Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum. As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.

While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386. Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map. The memory backing pipes is not
allocated from the kmem map. It is allocated from its own submap of
the kernel map. In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.) Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386. However, on amd64,
the value will be reduced by 2/3. This is intentional. The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva. This change is
effectively restoring maxpipekva on amd64 to its prior value.

Eliminate init_param3() since it is no longer used.


# 214004 18-Oct-2010 marcel

Rename boot() to kern_reboot() and make it visible outside of
kern_shutdown.c. This makes it easier for emulators and other
parts of the kernel to initiate a reboot.


# 212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


# 210131 15-Jul-2010 mav

Move functions declaration to MI code, following implementation.


# 209710 05-Jul-2010 jh

Extend the kernel unit number allocator for allocating specific unit
numbers. This change adds a new function alloc_unr_specific() which
returns the requested unit number if it is free. If the number is
already allocated or out of the range, -1 is returned.

Update alloc_unr(9) manual page accordingly and add a MLINK for
alloc_unr_specific(9).

Discussed on: freebsd-hackers


# 209371 20-Jun-2010 mav

Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.


# 208494 24-May-2010 mav

- Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.


# 204633 03-Mar-2010 ivoras

Make the comment follow style(9) format.

Spotted by: jhb


# 204611 02-Mar-2010 ivoras

Document the VM detection type and sysctl a bit better.


# 204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


# 204139 20-Feb-2010 alc

Eliminate an unused declaration.


# 202143 12-Jan-2010 brooks

Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1. Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after: 1 month


# 202050 10-Jan-2010 imp

Merge change r198561 from projects/mips to head:

r198561 | thompsa | 2009-10-28 15:25:22 -0600 (Wed, 28 Oct 2009) | 4 lines
Allow a scratch buffer to be set in order to be able to use setenv() while
booting, before dynamic kenv is running. A few platforms implement their own
scratch+sprintf handling to save data from the boot environment.


# 197316 18-Sep-2009 alc

Add a new sysctl for reporting all of the supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


# 196334 17-Aug-2009 attilio

* Change the scope of the ASSERT_ATOMIC_LOAD() from a generic check to
a pointer-fetching specific operation check. Consequently, rename the
operation ASSERT_ATOMIC_LOAD_PTR().
* Fix the implementation of ASSERT_ATOMIC_LOAD_PTR() by checking
directly alignment on the word boundry, for all the given specific
architectures. That's a bit too strict for some common case, but it
assures safety.
* Add a comment explaining the scope of the macro
* Add a new stub in the lockmgr specific implementation

Tested by: marcel (initial version), marius
Reviewed by: rwatson, jhb (comment specific review)
Approved by: re (kib)


# 196226 14-Aug-2009 bz

Add a new macro to test that a variable could be loaded atomically.
Check that the given variable is at most uintptr_t in size and that
it is aligned.

Note: ASSERT_ATOMIC_LOAD() uses ALIGN() to check for adequate
alignment -- however, the function of ALIGN() is to guarantee
alignment, and therefore may lead to stronger alignment
enforcement than necessary for types that are smaller than
sizeof(uintptr_t).

Add checks to mtx, rw and sx locks init functions to detect possible
breakage. This was used during debugging of the problem fixed with
r196118 where a pointer was on an un-aligned address in the dpcpu area.

In collaboration with: rwatson
Reviewed by: rwatson
Approved by: re (kib)


# 192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# 192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


# 190878 10-Apr-2009 thompsa

Revert r190676,190677

The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by: scottl


# 190676 03-Apr-2009 thompsa

Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.


# 189450 06-Mar-2009 kib

Extract the no_poll() and vop_nopoll() code into the common routine
poll_no_poll().
Return a poll_no_poll() result from devfs_poll_f() when
filedescriptor does not reference the live cdev, instead of ENXIO.

Noted and tested by: hps
MFC after: 1 week


# 189170 28-Feb-2009 ed

Add memmove() to the kernel, making the kernel compile with Clang.

When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on: arch@
Reviewed by: rdivacky


# 183982 17-Oct-2008 bz

Add cr_canseeinpcb() doing checks using the cached socket
credentials from inp_cred which is also available after the
socket is gone.
Switch cr_canseesocket consumers to cr_canseeinpcb.
This removes an extra acquisition of the socket lock.

Reviewed by: rwatson
MFC after: 3 months (set timer; decide then)


# 183406 27-Sep-2008 ed

Move uminor() and umajor() to the same place as userspace minor() and major().

The uminor() and umajor() functions have the same use in kernel space as
the minor() and major() functions in userspace. If we ever get rid of
the minor() function in kernel space, we could decide to just expose
minor() and major() to kernel space, making uminor() and umajor()
redundant.

There are two reasons why we want to have uminor() and umajor() in
<sys/types.h>:

- Having them close together prevents them from diverting. Even though
it's unlikely the definitions will change, it's a good habit to have
them at the same place.

- They don't really belong in kern_conf.c. kern_conf.c has been
liberated from dealing with device major and minor number handling.

The device_ids(9) manpage now lists the wrong #include's, because it
should only list <sys/types.h> now. I'm leaving it as it is now, because
I wonder if we should document them anyway. We're probably better off
documenting minor(3) and major(3).


# 182909 10-Sep-2008 jhb

Various whitespace fixes.


# 179757 12-Jun-2008 ed

Turn dev2unit(), minor(), unit2minor() and minor2unit() into macro's.

Now that we got rid of the minor-to-unit conversion and the constraints
on device minor numbers, we can convert the functions that operate on
minor and unit numbers to simple macro's. The unit2minor() and
minor2unit() macro's are now no-ops.

The ZFS code als defined a macro named `minor'. Change the ZFS code to
use umajor() and uminor() here, as it is the correct approach to do
this. Also add $FreeBSD$ to keep SVN happy.

Approved by: philip (mentor), pjd


# 179096 18-May-2008 jb

Remove some DTrace hook definitions that are now in dtrace_bsd.h
which contains all the hook definitions rather than splattering
them all over the header files.

The definitions are only valid when the KDTRACE_HOOKS kernel
option is defined, so other kernel sources have no need to
see them.


# 177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


# 177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 174647 16-Dec-2007 jeff

Refactor select to reduce contention and hide internal implementation
details from consumers.

- Track individual selecters on a per-descriptor basis such that there
are no longer collisions and after sleeping for events only those
descriptors which triggered events must be rescaned.
- Protect the selinfo (per descriptor) structure with a mtx pool mutex.
mtx pool mutexes were chosen to preserve api compatibility with
existing code which does nothing but bzero() to setup selinfo
structures.
- Use a per-thread wait channel rather than a global wait channel.
- Hide select implementation details in a seltd structure which is
opaque to the rest of the kernel.
- Provide a 'selsocket' interface for those kernel consumers who wish to
select on a socket when they have no fd so they no longer have to
be aware of select implementation details.

Tested by: kris
Reviewed on: arch


# 174253 04-Dec-2007 kib

Implement fetching of the __FreeBSD_version from the ELF ABI-tag note.
The value is read into the p_osrel member of the struct proc. p_osrel
is set to 0 for the binaries without the note.

MFC after: 3 days


# 172612 13-Oct-2007 des

I don't know what I was smoking when I wrote these three years ago; the
return value is an error code, hence always an int.

While I'm here, add getenv_uint() for completeness.


# 171202 04-Jul-2007 kib

Since cdev mutex is after system map mutex in global lock order, free()
shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as
mutex, thus creating this LOR situation.

Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.

Reviewed by: phk
Tested by: Peter Holm
LOR: 80
Approved by: re (kensmith)


# 171156 02-Jul-2007 rwatson

Continue kernel privilege cleanup for 7.0: unstaticize suser_enabled and
stop declaring it in systm.h -- it's used only in kern_priv.c and is not
required elsewhere.

Approved by: re (kensmith)


# 170587 11-Jun-2007 rwatson

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# 170468 09-Jun-2007 attilio

Since locking in kern/subr_prof.c is changed a bit, we need nomore of
time_lock spinlock exported.

Approved by: jeff (mentor)


# 169803 20-May-2007 jeff

- Move clock synchronization into a seperate clock lock so the global
scheduler lock is not involved. sched_lock still protects the sched_clock
call. Another patch will remedy this.

Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff


# 168506 08-Apr-2007 pjd

Add root_mounted() function that returns true if the root file system is
already mounted.


# 168300 03-Apr-2007 pjd

Add root_mount_wait() function which can be used to wait until the root
file system is mounted. This is useful for kernel modules loaded from
/boot/loader.conf, that have to access file system.


# 167787 21-Mar-2007 jhb

Rename the 'mtx_object', 'rw_object', and 'sx_object' members of mutexes,
rwlocks, and sx locks to 'lock_object'.


# 167387 09-Mar-2007 jhb

Allow threads to atomically release rw and sx locks while waiting for an
event. Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change it's 'struct mtx' object to a
'struct lock_object' pointer. _sleep() uses the recently added
lc_unlock() and lc_lock() function pointers for the lock class of the
specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
(rw_sleep()), and sx locks (sx_sleep()). msleep() still exists and
is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
warn if you try to pass a NULL wait channel. The functions already have
a KASSERT to that effect.


# 167111 28-Feb-2007 thomas

Minor reformatting.


# 166908 23-Feb-2007 jhb

Add a new kernel sleep function pause(9). pause(9) is for places that
want an equivalent of DELAY(9) that sleeps instead of spins. It accepts
a wmesg and a timeout and is not interrupted by signals. It uses a private
wait channel that should never be woken up by wakeup(9) or wakeup_one(9).

Glanced at by: phk


# 166698 14-Feb-2007 cperciva

Optimize bitcount32 by replacing 6 logical operations with 2. The key
observation here is that it doesn't matter what garbage accumulates in
bits which we're going to end up masking away anyway, as long as the
garbage doesn't overflow into bits which we care about.

This improved version may not be the fastest possible on all systems,
but it's certainly going to be better than what was here before.


# 166022 15-Jan-2007 rrs

Reviewed by: rwatson
Approved by: gnn

Add a new function hashinit_flags() which allows NOT-waiting
for memory (or waiting). The old hashinit() function now
calls hashinit_flags(..., HASH_WAITOK);


# 165547 26-Dec-2006 kmacy

turn non-INVARIANT KASSERT into an empty but real C
statement so that syntax errors will still be caught
even if INVARIANTS aren't enabled in the config file


# 164032 06-Nov-2006 rwatson

Add a new priv(9) kernel interface for checking the availability of
privilege for threads and credentials. Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed. Two interfaces are provided, replacing the
existing suser(9) interface:

suser(td) -> priv_check(td, priv)
suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags)

A comprehensive list of currently available kernel privileges may be
found in priv.h. New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.

The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail. As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.

The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.

The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated. The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.

This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


# 162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


# 161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.


# 160988 04-Aug-2006 jb

Add a type definition for the cyclic timer callback function.

The cyclic timer is a high-resolution timer allows timeouts at nanosecond
intervals where hardware support is available. Typically on i386 there
is no HPET (high performance event timer) like the one Intel started
specifying some time in 2004, so the best that tye cyclic timer subsystem
can do is run at Hz.

The cyclic timer code itself is ported from OpenSolaris and is covered
by the CDDL, so it is only loaded as a module. This function type definition
is used in machine-dependent code to provide a hook for the module to
register it's callback function.


# 160217 09-Jul-2006 scottl

Use a sleep mutex instead of an sx lock for the kernel environment. This
allows greater flexibility for drivers that want to query the environment.

Reviewed by: jhb, mux


# 155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


# 155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


# 155383 06-Feb-2006 jeff

- Add the global 'rebooting' variable that is used to detect when
boot() has been called.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# 153855 29-Dec-2005 jhb

Add a new function msleep_spin() which is a slightly stripped down version
of msleep(). msleep_spin() doesn't support changing the priority of the
thread while it is asleep nor does it support interruptible sleeps (PCATCH)
or the PDROP flag. It does support timeouts however. It differs from
msleep() in that the passed in mutex is a spin mutex. This means one can
use msleep_spin() and wakeup() with a spin mutex similar to msleep() and
wakeup() with a regular mutex. Note that the spin mutex in question needs
to come before sched_lock and the sleepq locks in lock order.


# 153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


# 149300 19-Aug-2005 pjd

Avoid code duplication and implement bitcount32() function in systm.h only.

Reviewed by: cperciva
MFC after: 3 days


# 145609 28-Apr-2005 marcel

Inline functions belong in <sys/libkern.h>, not in <sys/systm.h>.
Move crc32() and crc32_raw() from the latter to the former. Move
the declaration of crc32_tab[] to <sys/libkern.h> as well.

Pointed out by: bde@
Tested on: ia64, sparc64


# 145604 27-Apr-2005 marcel

Refactor the CRC-32 code to enhance its usability. Move the actual
CRC logic to a new function: crc32_raw() that obtains the initial
CRC value as well as leaves any post-processing to the caller. As
such, it can be used when the initial CRC value is not ~0U or when
the final CRC value does need to be inverted (bitwise). It also
means that crc32_raw() can be called repeatedly when the data is
not available as a single block, such as for scatter/gather lists
and the likes.

Avoid the additional call overhead incured by the refactoring by
moving the implementation off crc32() to sys/systm.h and making it
inlinable. Since crc32_raw() is itself trivial and since it may
be used in loops that iterate over fragments, having it available
for inlining can be beneficial. Hence, move its implementation
to sys/systm.h as well.

Keep the original implementation of crc32() in libkern/crc32.c for
documentation purposes (as a comment of course).

Triggered by: Jose M Rodriguez (josemi at freebsd dot jazztel dot es)
Discussed on: current@
Tested on: amd64, ia64 (BVO having GPT partitions)
Jargon file candidate: BVO = By Virtue Of :-)


# 145250 18-Apr-2005 phk

Add a named reference-count KPI to hold off mounting of the root filesystem.

While we wait for holds to be released, print a list of who holds us
back once per second.

Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().

Use the new KPI also from ata.

With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.


# 144706 06-Apr-2005 phk

Constify hexdump() harder.


# 144279 29-Mar-2005 phk

Privatize major().


# 143639 15-Mar-2005 phk

Remove findcdev().


# 143622 15-Mar-2005 phk

Move devtoname() prototype to systm.h to reduce #include pollution,
it is (or should be) used in many printf() calls.


# 143283 08-Mar-2005 phk

Reengineer subr_unit

Add support for passing in a mutex. If NULL is passed a global
subr_unit mutex is used.

Add alloc_unrl() which expects the mutex to be held.

Allocating a unit will never sleep as it does not need to allocate
memory.

Cut possible range in half so we can use -1 to mean "out of number".

Collapse first and last runs into the head by means of counters.
This saves memory in the common case(s).


# 143238 07-Mar-2005 phk

Add placeholder mutex argument to new_unrhdr().


# 142834 28-Feb-2005 wes

Add a sysctl that records the amount of physical memory in the machine.

Submitted by: Nicko Dehaine <nicko@stbernard.com>
MFC after: 1 day


# 141458 07-Feb-2005 phk

Add VNASSERT() which is just like KASSERT() but takes a vnode argument
which it will vn_printf() if it triggers.


# 140836 25-Jan-2005 sobomax

More kern_{get,set}itiver() where they belong.

Submitted by: dwmalone
MFC after: 2 weeks


# 140832 25-Jan-2005 sobomax

Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after: 2 weeks


# 138509 07-Dec-2004 phk

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway. Correct information is provided by
statfs(/).


# 137925 20-Nov-2004 das

Remove the declaration of uarea_pages.

Reviewed by: arch@


# 137099 31-Oct-2004 des

Add TUNABLE_LONG and TUNABLE_ULONG, and use the latter for the
hw.pci.host_mem_start tunable. Add comments to TUNABLE_INT and
TUNABLE_QUAD recommending against their use.

MFC after: 3 weeks


# 137098 31-Oct-2004 des

Whitespace cleanup


# 136945 25-Oct-2004 phk

Add delete_unrhdr() function.

It will fail fatally if all allocated numbers have not been returned first.


# 136836 23-Oct-2004 phk

Move the prototype for g_waitidle() to a more visible place.


# 135956 30-Sep-2004 phk

Add a new API for allocating unit number (-like) resources.

Allocation is always lowest free unit number.

A mixed range/bitmap strategy for maximum memory efficiency. In
the typical case where no unit numbers are freed total memory usage
is 56 bytes on i386.

malloc is called M_WAITOK but no locking is provided (yet). A bit of
experience will be necessary to determine the best strategy. Hopefully
a "caller provides locking" strategy can be maintained, but that may
require use of M_NOWAIT allocation and failure handling.

A userland test driver is included.


# 134493 29-Aug-2004 des

Remove the last remnants of HW_WDOG (no, really!)


# 133159 05-Aug-2004 cperciva

Finish the PRISON_ROOT -> SUSER_ALLOWJAIL renaming by removing
the definition of the old name.


# 132805 28-Jul-2004 phk

Remove global variable rootdevs and rootvp, they are unused as such.

Add local rootvp variables as needed.

Remove checks for miniroot's in the swappartition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.


# 132653 26-Jul-2004 cperciva

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# 132255 16-Jul-2004 cperciva

Add a SUSER_RUID flag to suser_cred. This flag indicates that we want to
check if the *real* user is the superuser (vs. the normal behaviour, which
checks the effective user).

Reviewed by: rwatson


# 131935 10-Jul-2004 marcel

Update for the KDB framework:
o Remove prototype of Debugger().
o Remove prototype of backtrace().


# 130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 130164 06-Jun-2004 phk

Remove filename+line number from panic messages.


# 127976 07-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 126326 27-Feb-2004 jhb

Switch the sleep/wakeup and condition variable implementations to use the
sleep queue interface:
- Sleep queues attempt to merge some of the benefits of both sleep queues
and condition variables. Having sleep qeueus in a hash table avoids
having to allocate a queue head for each wait channel. Thus, struct cv
has shrunk down to just a single char * pointer now. However, the
hash table does not hold threads directly, but queue heads. This means
that once you have located a queue in the hash bucket, you no longer have
to walk the rest of the hash chain looking for threads. Instead, you have
a list of all the threads sleeping on that wait channel.
- Outside of the sleepq code and the sleep/cv code the kernel no longer
differentiates between cv's and sleep/wakeup. For example, calls to
abortsleep() and cv_abort() are replaced with a call to sleepq_abort().
Thus, the TDF_CVWAITQ flag is removed. Also, calls to unsleep() and
cv_waitq_remove() have been replaced with calls to sleepq_remove().
- The sched_sleep() function no longer accepts a priority argument as
sleep's no longer inherently bump the priority. Instead, this is soley
a propery of msleep() which explicitly calls sched_prio() before
blocking.
- The TDF_ONSLEEPQ flag has been dropped as it was never used. The
associated TDF_SET_ONSLEEPQ and TDF_CLR_ON_SLEEPQ macros have also been
dropped and replaced with a single explicit clearing of td_wchan.
TD_SET_ONSLEEPQ() would really have only made sense if it had taken
the wait channel and message as arguments anyway. Now that that only
happens in one place, a macro would be overkill.


# 126081 21-Feb-2004 phk

Device megapatch 5/6:

Remove the unused second argument from udev2dev().

Convert all remaining users of makedev() to use udev2dev(). The
semantic difference is that udev2dev() will only locate a pre-existing
dev_t, it will not line makedev() create a new one.

Apart from the tiny well controlled windown in D_PSEUDO drivers,
there should no longer be any "anonymous" dev_t's in the system
now, only dev_t's created with make_dev() and make_dev_alias()


# 124732 19-Jan-2004 phk

Add linenumber and source filename to panic(9) output.

Ideally a traceback should be printed too, any takers ?


# 123852 26-Dec-2003 alfred

Add __restrict qualifiers to copyinfrom, copyinstrfrom, copystr, copyinstr,
copyin and copyout.


# 123215 07-Dec-2003 scottl

Re-arrange and consolidate some random debugging stuff


# 120610 30-Sep-2003 mux

Use __predict_false() in the KASSERT() macro. We expect this test
to fail most of the time.


# 120514 27-Sep-2003 phk

Introduce no_poll() default method for device drivers. Have it
do exactly the same as vop_nopoll() for consistency and put a
comment in the two pointing at each other.

Retire seltrue() in favour of no_poll().

Create private default functions in kern_conf.c instead of public
ones.

Change default strategy to return the bio with ENODEV instead of
doing nothing which would lead the bio stranded.

Retire public nullopen() and nullclose() as well as the entire band
of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump}
funtions, they are the default actions now.

Move the final two trivial functions from subr_xxx.c to kern_conf.c
and retire the now empty subr_xxx.c


# 117860 22-Jul-2003 phk

Remove __nonnull() on the second argument of strto[u]l() which I used
to test that the warning actually was emitted.

Spotted by: scottl


# 117856 21-Jul-2003 silby

Fix apparent typo in previous commit.


# 117837 21-Jul-2003 phk

Add a new macro __nonnull(x) to use the new GCC33 attribute which checks
that an argument is not a NULL pointer.

Apply various obvious places.

I belive __printf*() implies __nonnull() so it is not needed on functions
already tagged that way.


# 117391 10-Jul-2003 silby

Add init_param3() to subr_param. This function is called
immediately after the kernel map has been sized, and is
the optimal place for the autosizing of memory allocations
which occur within the kernel map to occur.

Suggested by: bde


# 113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


# 112898 31-Mar-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


# 112564 24-Mar-2003 jhb

Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers. This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by: phk, jake, almost complete silence on arch@


# 112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


# 110316 04-Feb-2003 phk

Add vsnrprintf() which is just like vsnprintf() but takes a "radix"
argument for the kernel-special %r format.


# 110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


# 110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


# 109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


# 109316 15-Jan-2003 phk

Allow linters to override the CTASSERT macro, since they are unlikely to
like the results.


# 108682 04-Jan-2003 phk

Introduce the
void backtrace(void);
function which will print a backtrace if DDB is in the kernel and an
explanation if not.

This is useful for recording backtraces in non-fatal circumstances and
does not require pollution with DDB #includes in the files where it
is used.

It would of course be nice to have a non-DDB dependent version too,
but since the meat of a backtrace is MD it is probably not worth it.


# 103083 07-Sep-2002 peter

Make UAREA_PAGES and KSTACK_PAGES visible to userland via sysctl, like
PS_STRINGS and USRSTACK is. This is necessary in order to decode a.out
core dumps. kern_proc.c was already referring to both of these values
but was missing the #include "opt_kstack_pages.h". Make the sysctl
variables visible so that certain kld modules can see how their parent
kernel was configured.


# 102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


# 102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


# 100073 15-Jul-2002 markm

Fix warning by marking unused function parameter.


# 99098 30-Jun-2002 iedowse

Add a hashdestroy() function to undo the actions of hashinit().


# 99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 98482 20-Jun-2002 peter

Use suword16/fuword16 instead of susword/fusword - this has two different
definitions so far.. 16 bit on x86 and appears to be 32 bit on sparc64.
Be explicit to avoid suprises.


# 98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


# 98133 12-Jun-2002 kbyanc

Make nselcol, the number of select collisions since boot, unsigned as
negative collisions simply doesn't make sense.

PR: (one small part of) 19720
Approved by: alfred


# 97512 29-May-2002 phk

Add one copy of crc32() and crc32_tab[] in libkern, and remove it two other
places.

Comment out crc32 related definitions in zlib.h, we don't seem to have the
corresponding code in our kernel.


# 97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


# 94959 17-Apr-2002 mux

Avoid calling malloc() or free() while holding the
kenv lock.

Reviewed by: jake


# 94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


# 93600 01-Apr-2002 jake

Move the CTASSERT macro from MD code to systm.h alongside KASSERT so other
code can use it. This takes a single constant argument and fails to compile
if it is 0 (false). The main application of this is to make assertions about
structure sizes at compile time, in order to validate assumptions made in
other code. Examples:

CTASSERT(sizeof(struct foo) == FOO_SIZEOF);
CTASSERT(sizeof(struct foo) == (1 << FOO_SHIFT));

Requested by: jhb, phk


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


# 93451 30-Mar-2002 phk

Move the "dumping" variable from systm.h to conf.h.


# 93008 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 92976 22-Mar-2002 rwatson

Merge from TrustedBSD MAC branch:

Move the network code from using cr_cansee() to check whether a
socket is visible to a requesting credential to using a new
function, cr_canseesocket(), which accepts a subject credential
and object socket. Implement cr_canseesocket() so that it does a
prison check, a uid check, and add a comment where shortly a MAC
hook will go. This will allow MAC policies to seperately
instrument the visibility of sockets from the visibility of
processes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 92719 19-Mar-2002 alfred

Remove __P


# 92252 13-Mar-2002 alfred

Fixes to make select/poll mpsafe.

Problem:
selwakeup required calling pfind which would cause lock order
reversals with the allproc_lock and the per-process filedesc lock.
Solution:
Instead of recording the pid of the select()'ing process into the
selinfo structure, actually record a pointer to the thread. To
avoid dereferencing a bad address all the selinfo structures that
are in use by a thread are kept in a list hung off the thread
(protected by sellock). When a selwakeup occurs the selinfo is
removed from that threads list, it is also removed on the way out
of select or poll where the thread will traverse its list removing
all the selinfos from its own list.

Problem:
Previously the PROC_LOCK was used to provide the mutual exclusion
needed to ensure proper locking, this couldn't work because there
was a single condvar used for select and poll and condvars can
only be used with a single mutex.
Solution:
Introduce a global mutex 'sellock' which is used to provide mutual
exclusion when recording events to wait on as well as performing
notification when an event occurs.

Interesting note:
schedlock is required to manipulate the per-thread TDF_SELECT
flag, however if given its own field it would not need schedlock,
also because TDF_SELECT is only manipulated under sellock one
doesn't actually use schedlock for syncronization, only to protect
against corruption.

Proc locks are no longer used in select/poll.

Portions contributed by: davidc


# 91429 27-Feb-2002 jhb

Back out part of KSE/M2 that snuck in under the radar: changing the
prototype of bzero() on the i386 to have a volatile first argument.

Requested by: bde, jake


# 90503 11-Feb-2002 bde

Move the declaration of panic() from sys/param.h back to sys/systm.h.

Almost all .c files have to include <sys/systm.h> for more than its
declaration of panic(), so little is gained from declaring panic() in
a wrong place. This probably depends on missing garbage collection
of the includes of <sys/systm.h> that were added to get snprintf()
declared for old versions of the ktr macros.


# 90491 10-Feb-2002 phk

GC the unused einval()

Obtained from: ~bde/sys.dif.gz


# 88633 29-Dec-2001 alfred

Make AIO a loadable module.

Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int". Fix
it by using an inline version of the AS macro against the syscall
arguments. (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.


# 88088 17-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


# 87546 08-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


# 86313 12-Nov-2001 ps

Fix a signed bug in the crashdump code for systems with > 2GB of ram.

Reviewed by: peter


# 85385 23-Oct-2001 jhb

- Change getenv_quad() to return an int instead of a quad_t since it
returns an success/failure code rather than the actual value.
- Add getenv_string() which copies a string from the environment to another
string and returns true on success.


# 83742 20-Sep-2001 rwatson

o Rename u_cansee() to cr_cansee(), making the name more comprehensible
in the face of a rename of ucred to cred, and possibly generally.

Obtained from: TrustedBSD Project


# 83595 17-Sep-2001 peter

Fix a fatal type mismatch (char *static_env; vs char static_env[]).

Submitted by: bde


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 82693 31-Aug-2001 rwatson

o Screw over users of the kern.{security.,}suser_permitted sysctl again,
by renaming it to kern.security.suser_enabled. This makes the name
consistent with other use: "permitted" now refers to a specific right
or privilege, whereas "enabled" refers to a feature. As this hasn't
been MFC'd, and using this destroys a running system currently, I believe
the user base of the sysctl will not be too unhappy.
o While I'm at it, un-staticize and export the supporting variable, as it
will be used by kern_cap.c shortly.

Obtained from: TrustedBSD Project


# 82393 27-Aug-2001 peter

Enable hardwiring of things like tunables from embedded enironments
that do not start from loader(8).


# 82127 22-Aug-2001 dillon

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


# 81397 10-Aug-2001 jhb

- Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
The only caller outside of M_ASLEEP was the ata driver, which called both
asleep() and await() with spl-raised, so there was no need for the
asleep() and await() pair. M_ASLEEP was unused.

Reviewed by: jasone, peter


# 80766 31-Jul-2001 jhb

Apply the cluebat to myself and undo the await() -> mawait() rename. The
asleep() and await() functions split the functionality of msleep() up into
two halves. Only the asleep() half (which is what puts the process on the
sleep queue) actually needs the lock usually passed to msleep() held to
prevent lost wakeups. await() does not need the lock held, so the lock
can be released prior to calling await() and does not need to be passed in
to the await() function. Typical usage of these functions would be as
follows:

mtx_lock(&foo_mtx);
... do stuff ...
asleep(&foo_cond, PRIxx, "foowt", hz);
...
mtx_unlock&foo_mtx);
...
await(-1, -1);

Inspired by: dillon on the couch at Usenix


# 80418 26-Jul-2001 peter

Move param.c out of the conf directory and make it fully dynamic.
Tunables are now derived at boot time from maxusers. ie: change maxusers
via a tunable and all the derivative settings change. You can change
the other tunables individually as well. Even hz etc is tunable.


# 79419 08-Jul-2001 julian

Small whitespace fix.
BDE'd by: BDE


# 79418 08-Jul-2001 julian

A set of changes to reduce the number of include files the kernel
takes from /usr/include. I cannot check them on alpha.. (will try beast)

Briefly looked at by: Warner Losh <imp@harmony.village.org>


# 79343 05-Jul-2001 jake

Backout mwakeup, etc.


# 79172 03-Jul-2001 jake

Implement mwakeup, mwakeup_one, cv_signal_drop and cv_broadcast_drop.
These take an additional mutex argument, which is dropped before any
processes are made runnable. This can avoid contention on the mutex
if the processes would immediately acquire it, and is done in such a
way that wakeups will not be lost.

Reviewed by: jhb


# 78247 15-Jun-2001 peter

Fix some warnings in kern_environment.c. Make the getenv*() family
take a const 'name', since they dont modify anything.
159: warning: passing arg 1 of `getenv_int' discards qualifiers...
167: warning: passing arg 1 of `getenv' discards qualifiers from pointer..


# 76564 14-May-2001 tanimura

- Convert msleep(9) in select(2) and poll(2) to cv_*wait*(9).

- Since polling should not involve sleeping, keep holding a
process lock upon scanning file descriptors.

- Hold a reference to every file descriptor prior to entering
polling loop in order to avoid lock order reversal between
lockmgr and p_mtx upon calling fdrop() in fo_poll().
(NOTE: this work has not been done for netncp and netsmb
yet because a socket itself has no reference counts.)

Reviewed by: jhb


# 76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# 74956 28-Mar-2001 rwatson

o introduce u_cansee(), which performs access control checks between
two subject ucreds. Unlike p_cansee(), u_cansee() doesn't have
process lock requirements, only valid ucred reference requirements,
so is prefered as process locking improves. For now, back p_cansee()
into u_cansee(), but eventually p_cansee() will go away.

Reviewed by: jhb, tmm
Obtained from: TrustedBSD Project


# 74890 27-Mar-2001 ps

Last commit was broken.. It always prints '[CTRL-C to abort]'.
Move duplicate code for printing the status of the dump and checking
for abort into a separate function.

Pointy hat to: me


# 72908 22-Feb-2001 jhb

The ia64 hasn't needed machine/ipl.h included in sys/systm.h for a while
now.


# 72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# 72376 11-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 71242 19-Jan-2001 peter

Remove unused splsoftcambio(), splsoftcamnet(), splq() and splz() inlines.


# 71240 19-Jan-2001 peter

Remove the static splXXX functions and replace them by static __inline
stubs. Remove the xxx_imask variables which have been all but gone for
a while.


# 70239 20-Dec-2000 phk

Replace logwakeup() with "int msgbuftrigger". There is little
point in calling a function just to set a flag.

Keep better track of the syslog FAC/PRI code and try to DTRT if
they mingle.

Log all writes to /dev/console to syslog with <console.info>
priority. The formatting is not preserved, there is no robust,
way of doing it. (Ideas with patches welcome).


# 69664 06-Dec-2000 peter

Untangle vfsinit() a bit. Use seperate sysinit functions rather than
having a super-function calling bits all over the place.


# 69214 26-Nov-2000 phk

Simplify the tprintf() API.

Loose the special <sys/tprintf.h> #include file.


# 69211 26-Nov-2000 phk

Make log(-1, ...) do what addlog(...) did.

Replace all uses of addlog(...) with log(-1, ...)

Remove bogus "register" keywords in subr_prf.c

Make log() return void.


# 68794 15-Nov-2000 jhb

- Rename await() to mawait(). mawait() is to await() as msleep() is to
tsleep(). Namely, mawait() takes an extra argument which is a mutex
to drop when going to sleep. Just as with msleep(), if the priority
argument includes the PDROP flag, then the mutex will be dropped and will
not be reacquired when the process wakes up.
- Add in a backwards compatible macro await() that passes in NULL as the
mutex argument to mawait().


# 68443 07-Nov-2000 jhb

Remove the now unused and unneeded splassert macros and prototypes.


# 67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


# 67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


# 67364 20-Oct-2000 jhb

GC the unused safepri variable.


# 67158 15-Oct-2000 phk

Move DELAY() from <machine/clock.h> to <sys/systm.h>


# 67093 13-Oct-2000 ps

Do not allocate a callout for all crashdumps, not just when you panic.


# 66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


# 66457 29-Sep-2000 dfr

Add ia64 support.


# 65708 10-Sep-2000 jake

Rename tsleep to msleep and add a mutex argument, which is
released before sleeping and re-acquired before msleep
returns. A compatibility cpp macro has been provided for
tsleep to avoid changing all occurences of it in the kernel.

Remove an assertion that the Giant mutex be held before
calling tsleep or asleep.

This is intended to serve the same purpose as condition
variables, but does not preclude their addition in the
future.

Approved by: jasone
Obtained from: BSD/OS


# 65268 30-Aug-2000 msmith

Make it possible to pass boot()'s flags to shutdown_nice() so that the
kernel can instigate an orderly shutdown but still determine the form of
that shutdown. Make it possible eg. to cleanly shutdown and power off the
system under ACPI when the power button is pressed.


# 61287 05-Jun-2000 rwatson

o bde suggested moving the SYSCTL from kern_mib to the more appropriate
kern_prot, which cleans up some namespace issues
o Don't need a special handler to limit un-setting, as suser is used to
protect suser_permitted, making it one-way by definition.

Suggested by: bde


# 61282 05-Jun-2000 rwatson

o Introduce kern.suser_permitted, a sysctl that disables the suser_xxx()
returning anything but EPERM.
o suser is enabled by default; once disabled, cannot be reenabled
o To be used in alternative security models where uid0 does not connote
additional privileges
o Should be noted that uid0 still has some additional powers as it
owns many important files and executables, so suffers from the same
fundamental security flaws as securelevels. This is fixed with
MAC integrity protection code (in progress)
o Not safe for consumption unless you are *really* sure you don't want
things like shutdown to work, et al :-)

Obtained from: TrustedBSD Project


# 61077 29-May-2000 dfr

Declare splsoftqtassert().


# 61033 28-May-2000 dfr

Add taskqueue system for easy-to-use SWIs among other things.

Reviewed by: arch


# 58283 19-Mar-2000 ps

Add conditional splassert.

Reviewed by: peter


# 57704 02-Mar-2000 dufault

I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority. This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by: bde


# 57701 02-Mar-2000 dufault

Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by: jkh, bde


# 57294 17-Feb-2000 bde

Don't include <machine/ipl.h> in <sys/systm.h> in the i386 case. This
fixes some namespace pollution in general and breakage of modules that
aren't in the sys tree in particular (<machine/ipl.h> includes further
headers that aren't installed under /usr/include).

Reimplemented SPLASSERT() so that it is more machine independent and
less bloated and doesn't require the <machine/ipl.h> include spam.
In particular, don't assume that `cpl' can be printed using %08x
format. The alpha arch doesn't even have `cpl'. SPLASSERT() was
harmless on alphas because it isn't actually used.


# 56079 16-Jan-2000 jmb

Add SPLASSERT() macro. SPLASSERT() compiles to a no-op
unless both "option INVARIANTS" and "options INVARIANT_SUPPORT"
are defined in the kernel's config(8) file.

SPLASSERT(expression, msg) used KASSERT to check that the
expression is true, panic()ing the kernel otherwise.

Approved by: jkh
Reviewed by: jdp, dfr, phk, eivind and green


# 54471 12-Dec-1999 green

Move the wakeup_one() prototype from proc.h to systm.h. It now hangs
out with it's sibling, wakeup().


# 54216 06-Dec-1999 archie

Prototypes should either have explicit parameter names, or not,
but some combination of the two.


# 53899 29-Nov-1999 phk

Report swapdevices as cdevs rather than bdevs.

Remove unused dev2budev() function.


# 53648 23-Nov-1999 archie

Change the prototype of the strto* routines to make the second
parameter a char ** instead of a const char **. This make these
kernel routines consistent with the corresponding libc userland
routines.

Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.

Reviewed by (in concept): phk, bde


# 53592 22-Nov-1999 phk

Isolate the swapdev_vp "not quite" vnode in the only source file which
needs it now that /dev/drum is gone.

Reviewed by: eivind, peter


# 52946 06-Nov-1999 mjacob

add in getenv_quad function


# 52843 03-Nov-1999 phk

Move isfoo() and friends to the newly created sys/ctype.h.

Urged by: bde


# 52815 02-Nov-1999 archie

Consolidate some of the various ctype(3) macros in one location.


# 52757 01-Nov-1999 phk

Add strtol & strtoul to kernel. Derived from libc versions.


# 51680 26-Sep-1999 eivind

Move the declaration of panic() from sys/systm.h to sys/param.h.
Rationale: Wider access, so we can add assertions to header files.
panicstr is still in sys/systm.h

Suggested by: phk
Discussed with: peter


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50107 21-Aug-1999 msmith

Implement a new generic mechanism for attaching handler functions to
events, in order to pave the way for removing a number of the ad-hoc
implementations currently in use.

Retire the at_shutdown family of functions and replace them with
new event handler lists.

Rework kern_shutdown.c to take greater advantage of the use of event
handlers.

Reviewed by: green


# 49047 24-Jul-1999 dfr

This makes the in kernel printf routines conform to the documented
behavior of their userland counterparts with respect to return values.

Submitted by: Matthew N. Dodd <winter@jurai.net>


# 49043 23-Jul-1999 alc

atomic.h:
Change "void *" to "volatile TYPE *", improving type safety
and eliminating some warnings (e.g., mp_machdep.c rev 1.106).

cpufunc.h:
Eliminate setbits. As defined, it's not precisely correct;
and it's redundant. (Use atomic_set_int instead.)

ipl_funcs.c:
Use atomic_set_int instead of setbits.

systm.h:
Include atomic.h.

Reviewed by: bde


# 48948 20-Jul-1999 green

Make a dev2budev() function, and use it. This refixes pstat (working, broken,
working, broken, working) and savecore (working, working, broken, working,
working).

Sorta Reviewed by: phk


# 48868 17-Jul-1999 phk

Centralize dumpdev handling.


# 48859 17-Jul-1999 phk

I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name. s/umakedev/makeudev/g


# 47644 31-May-1999 dt

Remove declaration of the nblkdev and nchrdev variables.


# 47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


# 45897 21-Apr-1999 peter

Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager. There is further room for optimizing
the interfaces still. All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.


# 44666 11-Mar-1999 phk

Make even more of the PPSAPI implementations generic.

FLL support in hardpps()

Various magic shuffles and improved comments

Style fixes from Bruce.


# 44495 05-Mar-1999 bde

Fixed bitrot in the types of fusword() and susword(). swords will
probably always be 16 bits if they exist at all, and fusword() and
susword() are only used in i386 code, so there aren't any portability
functions with them.


# 43311 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 42706 15-Jan-1999 msmith

Add getenv_int(), specifically for retrieving integer values from kernel
environment variables. This makes it easy to pass tuning parameters
in from the bootloader.


# 42680 14-Jan-1999 msmith

Add sscanf/vsscanf/strtoq/strtouq to the kernel. Initially these will be used
for parsing kernel environment values, although they have utility elsewhere.


# 42453 09-Jan-1999 eivind

KNFize, by bde.


# 42408 08-Jan-1999 eivind

Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by: msmith


# 41971 21-Dec-1998 dillon

Add asleep() and await() support. Currently highly experimental. A
small support structure had to be added to the proc structure, and
a few minor conditional panics no longer apply.


# 41479 03-Dec-1998 archie

Add snprintf(3) and vsnprintf(3) capability to the kernel.
Reviewed by: bde


# 40751 30-Oct-1998 msmith

Add the ability to specify where on the at_shutdown queue a handler is
installed.

Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).

Submitted by: Some ideas from Bruce Walter <walter@fortean.com>


# 40096 08-Oct-1998 msmith

Missing defines for the kernel environment and module metadata lookup
functions


# 39248 15-Sep-1998 gibbs

system.h:
Add definition for at_shutdown(9)'s SHUTDOWN_FINAL.

ccdvar.h:
chio.h:
mtio.h:
Add CAM support.


# 38874 06-Sep-1998 ache

Store formatted panic string in static buffer to make it available later
for savecore.
Previous code give only panic format to savecore


# 38130 05-Aug-1998 bde

Removed prototype for gone-away hzto().

Fixed disorder and missing K&R support in prototypes for interrupt
functions.


# 37614 13-Jul-1998 bde

Added macros __printflike() and __scanflike() to <sys/cdefs.h>.
Use them to `make gcc -Wformat' check formats for all printf-like
and scanf-like functions in /usr/src except for the err()/warn()
family. err() isn't quite printf-like since its format arg can
legitimately be NULL. syslog() isn't quite printf-like, but gcc
already accepts %m, even for plain printf() when it shouldn't.


# 36809 09-Jun-1998 bde

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 33777 24-Feb-1998 bde

Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.


# 32677 21-Jan-1998 gibbs

Add prototypes for swi_vm, setsoftvm, schedsoftvm, and splsoftvm that were
missed when I originally committed the bus dma code.


# 32513 14-Jan-1998 phk

Move almost all the ntp related stuff from kern_clock.c to
kern_ntptime.c. The only bit left over is that which is executed
in all calls to hardclock(). Various cleanups and staticizing
along the road.


# 32390 10-Jan-1998 phk

Whoops. softclock is called from doreti_swi as well. Abandon call from
hardclock().


# 32388 10-Jan-1998 phk

Effect the divorce of kern_clock.c and kern_timeout.c (which was
repository copied from kern_clock.c)


# 31960 23-Dec-1997 nate

- Add prototype for adjust_timeout_calltodo().

Submitted/forgotten by: Ken Key <key@cs.utk.edu>


# 31337 21-Nov-1997 bde

Fixed setting of `safepri'. It should be SWI_AST_MASK most of the
time, but was left at 0. This caused the "can't happen" case in
splz_swi to happen for panics when tsleep() calls splx(safepri)
and there is a SWI_AST pending. This was harmless because the
the error handling happens to be right. Debugging this was tricky
because debugger traps force SWI_AST_MASK on in `cpl'.


# 31333 21-Nov-1997 bde

Const poisoning from ks_shortdesc.


# 31276 18-Nov-1997 bde

Staticized boot().


# 30282 10-Oct-1997 phk

Remove a bunch of unused malloc types.
A couple of potential bogons flagged.
Various prototypes changed.


# 29683 21-Sep-1997 gibbs

buf.h:
Change the definition of a buffer queue so that bufqdisksort can
properly deal with bordered writes.

Add inline functions for accessing buffer queues. This should be
considered an opaque data structure by clients.

callout.h:
New callout implementation.

device.h:
Add support for CAM interrupts.

disk.h:
disklabel.h:
tqdisksort->bufqdisksort

kernel.h:
Add new configuration entries for configuration hooks and calling
cpu_rootconf and cpu_dumpconf.

param.h:
Add a priority for sleeping waiting on config hooks.

proc.h:
Update for new callout implementation.

queue.h:
Add TAILQ_HEAD_INITIALIZER from NetBSD.

systm.h:
Add prototypes for cpu_root/dumpconf, splcam, splsoftcam, etc..


# 29509 16-Sep-1997 bde

Removed declaration of nonexistent function fuibyte().

Sorted some declarations.

Fixed missing __P(())'s.

Removed `timeout_func_t (pointer to timeout function) typedef. It was
mainly used in bogus casts. The more useful `timeout_t' (timeout function)
typedef should be used instead.

Cleaned up callout declarations and comments.


# 29208 07-Sep-1997 bde

Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.


# 29190 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


# 28440 20-Aug-1997 fsmp

Moved splq() to isa/ipl_funcs.c for SMP only.
This is in preperation for moving all cpl accesses behind a critical region lock.


# 27997 08-Aug-1997 julian

Use up 4 precious bytes to give the kernel a hook to
support hardware watchdogs. The actual functions would be supplied in an LKM
or a linked file, but they need to hang off something.


# 27993 08-Aug-1997 dyson

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>


# 27585 21-Jul-1997 bde

Store SWI_MASK in a variable so that LKMs can use it portably.


# 26312 31-May-1997 peter

Add prototypes for the spl* funcs and add externs for *_imask. Leaving
the *_imask down in the isa machine dependent layers requires code changes
to all pci drivers, but the interrupt registration mechanism is in flux
at the moment. These can go away when the interface is cleaned and settled.


# 26259 29-May-1997 peter

forward declare struct timeval so that pcibus.c doesn't get a warning.
(it doesn't #include <sys/time.h> since it doesn't need it)


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21101 30-Dec-1996 jhay

Update our kernel ntp code to the latest from David Mills. The main change
is the addition of the FLL code, which is used by the latest versions of
xntpd. The kernel PPS code is also updated, although I can't test that yet.


# 18884 12-Oct-1996 bde

Moved declarations of tsleep() and wakeup() from proc.h to systm.h so
that proc.h doesn't have to be included so often.


# 18883 12-Oct-1996 bde

Removed verbose comment about `securelevel'. It just duplicated part
of init.8 except for bugs and anachronisms.


# 18556 29-Sep-1996 bde

Started unspamming <sys/systm.h>. Don't include <machine/stdarg.h>
to get the declaration of va_list; just use _BSD_VA_LIST. Fixed
the 2 places that need <machine.stdarg.h> but didn't include it
explicitly.


# 18277 13-Sep-1996 bde

Don't use __dead in the kernel. It was an obfuscation for gcc >= 2.5
and a no-op for gcc >= 2.6.


# 17975 31-Aug-1996 bde

cpu_boot() always returns, so don't declare it as __dead*.


# 17768 22-Aug-1996 julian

Some cleanups to the callout lists recently added.
note that at_shutdown has a new parameter to indicate When
during a shutdown the callout should be made. also
add a RB_POWEROFF flag to reboot "howto" parameter..
tells the reboot code in our at_shutdown module to turn off the UPS
and kill the power. bound to be useful eventually on laptops


# 17658 19-Aug-1996 julian

move all functions related to shutting down to one file
called kern_shutdown.c

note: I couldn't see anything machine dependant in the
functions boot() and dumpsys() which were in machdep.c
I have left a prototype for cpu_boot() which would go in
machdep.c, but I have nothing to put in it. Iexpect others will
let me know in no uncertain ways that this or that is machine dependant
and should be there, but I'll way for that to happen.. :)

I haven't actually taken the functions OUT of machdep
or anywhere else yet.. I'm checking in this file so others can have a look
at it and comment. SO PLEASE DO COMMENT!

I am also (in another checkin) addinf a man(9) page for the new
at_shotdown().. er freudian slip there.. at_shutdown() call
so have a look at that (and at_exit and at_fork as well)
and feed me comments..

I'll heck in the changes to make these (shutdown) changes active tomorrow
if no-one objects too strongly..


# 16875 01-Jul-1996 bde

Moved declarations of non-cpu things from <machine/cpufunc.h> to better
places.


# 15680 08-May-1996 gpalmer

Clean up various compiler warnings. Most (if not all) were benign

Reviewed by: bde


# 15113 07-Apr-1996 bde

systm.h:
Moved declaration of bootverbose to a better place. It isn't
machine-dependent.

proc.h:
Moved declaration of cpu_fork() to a better place. Only its
implementation is machine-dependent.


# 14508 11-Mar-1996 hsu

Merge in Lite2: add function prototype.
Note: Lite2 struct sysent changes need to go in FreeBSD sysent.h instead.
Leave out prototype for nosys() until convert makesyscalls.sh
over to Lite2 format, if we ever do so.
Reviewed by: davidg & bde


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 13697 29-Jan-1996 gibbs

Add LIST_INSERT_BEFORE and TAILQ_INSERT_BEFORE. These are used by the
new eisaconf code.


# 13459 16-Jan-1996 bde

Moved BCD declarations to the correct header (libkern.h).

Fixed BCD declarations. They didn't match their definitions...

libkern.h, bcd.c:
KNFised. `indent' worked 99% perfectly on bcd.c. It worked 99%
_imperfectly_ on subr_prf.c.


# 13456 16-Jan-1996 bde

Removed declarations of nonexistent functions copyoutstr(), fuiword()
and suiword(). They are no longer referenced in the machine-independent
code (I think fuiword() and suiword() were only used by old versions of
ptrace(), and copyoutstr() by old versions of exec).

Added `const' where appropriate.

Changed u_int to size_t' where appropriate.

Named last arg of copystr() and copyinstr() better.


# 13446 15-Jan-1996 phk

Get rid of two and a half printf in the kernel.
Add more features to the one remaining to handle the job:
+ signed quantity.
# alternate format
- left padding
* read width as next arg.
n numeric in (argument specified) default radix.

Fix the DDB debugger to use these.
Use vprintf in debug routine in pcvt.

The warnings from gcc may become more wrong and intolerable because
of this.

Warning: I have not checked the entire source for unsupported or
changed constructs, but generally belive that there are only a few.

Suggested by: bde


# 13438 15-Jan-1996 phk

Make bin2bcd and bcd2bin global macroes instead of having local
implementations all over the place.


# 13414 13-Jan-1996 phk

Avoid bzero becomming a common symbol in all .o files.


# 13155 01-Jan-1996 peter

Fix the reversed source and dest args to bcopy() in the kernel space
sysctl handler (ouch!)

Add a "const" qualifier to the source of the copyin() and copyout()
functions - the other const warning in kern_sysctl.c was silenced when
copyout was declared as having a const source.. (which it is)


# 13086 28-Dec-1995 dg

Made bzero a function vector and added a 586/686 optimized version of
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.

The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.


# 12163 09-Nov-1995 bde

Finished nuking enxio, enodev and enoioctl.


# 11332 07-Oct-1995 swallace

Remove prototype definitions from <sys/systm.h>.
Prototypes are located in <sys/sysproto.h>.

Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.

Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.

In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args. This makes
life easier for prototype generation.


# 11329 07-Oct-1995 swallace

Remove compat int from wait_args structure since no longer used or desired.


# 10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 9405 05-Jul-1995 dg

Killed "maxmem" declaration. We don't have that variable in FreeBSD.


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8504 14-May-1995 dg

Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.


# 8215 02-May-1995 dg

Added prototype for memcpy(). Changed size argument of "b" functions to
size_t.


# 7612 04-Apr-1995 dg

Added prototype for phashinit() function.


# 7566 01-Apr-1995 joerg

subr_prf.c used to provide an exported function kprintf(), but only had
a private declaration for it. Declare the function publically instead.


# 7430 28-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


# 7109 17-Mar-1995 phk

<libkern/libkern.h> has moved to <sys/libkern.h> (repository copy).
Since /usr/include/libkern doesn't and shouldn't exist, this is the
least evil way to handle this.


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6358 14-Feb-1995 phk

YFfix.
added int susword __P((void *base, int word));


# 3484 09-Oct-1994 phk

Cosmetics. (sort of) Added 19 prototypes.


# 3304 02-Oct-1994 phk

Prototypes, prototypes and even more prototypes. Not quite done yet, but
getting closer all the time.


# 2865 18-Sep-1994 bde

Remove "#ifdef notdef" around declaration of fuibyte(). fuibyte() _is_
actually used.

Remove "#ifdef __GNUC__" around some __dead* declarations. __dead* is
harmless if __GNUC__ is not defined.

Uniformize idempotency #ifdef.


# 2811 15-Sep-1994 bde

Add some prototypes.


# 2256 24-Aug-1994 sos

Changes preparing for iBCS support

Reviewed by:
Submitted by:


# 2165 21-Aug-1994 paul

Made them all idempotent.
Reviewed by:
Submitted by:


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 2059 13-Aug-1994 dg

Made the kernel compile cleanly with gcc 2.6.0. Thanks go to Bruce
Evans for suggesting a method to detect various versions of gcc.


# 1858 05-Aug-1994 dg

Converted 'vmunix' to 'kernel'.


# 1830 04-Aug-1994 dg

Nuke redefinition of insque and remque.


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources