History log of /freebsd-current/sys/vm/vm_glue.c
Revision Date Author Comments
# 9e016408 10-May-2024 John Baldwin <jhb@FreeBSD.org>

vm: Change the return types of kernacc and useracc to bool

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D45156
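
As a hedged illustration only (not part of the commit), a caller can now
test the result directly; the buffer, length, and protection below are
hypothetical:

  /* Illustrative: validate a user buffer before reading from it. */
  if (!useracc(uaddr, len, VM_PROT_READ))
          return (EFAULT);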


# 661a83f9 28-Apr-2024 Mark Johnston <markj@FreeBSD.org>

vm: Fix error handling in vm_thread_stack_back()

vm_object_page_remove() wants to busy the page, but that won't work
here. (Kernel stack pages are always busy.)

Make the error handling path look more like vm_thread_stack_dispose().

Reported by: pho
Reviewed by: kib, bnovkov
Fixes: 7a79d0669761 ("vm: improve kstack_object pindex calculation to avoid pindex holes")
Differential Revision: https://reviews.freebsd.org/D45019


# 800da341 22-Apr-2024 Mark Johnston <markj@FreeBSD.org>

thread: Simplify sanitizer integration with thread creation

fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA. In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
it.
- Avoid redundant shadow stack initialization: thread_alloc()
initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44891


# 7a79d066 09-Apr-2024 Bojan Novković <bnovkov@FreeBSD.org>

vm: improve kstack_object pindex calculation to avoid pindex holes

This commit replaces the linear transformation of kernel virtual
addresses to kstack_object pindex values with a non-linear
scheme that circumvents physical memory fragmentation caused by
kernel stack guard pages. The new mapping scheme is used to
effectively "skip" guard pages and assign pindices for
non-guard pages in a contiguous fashion.

The new allocation scheme requires that all default-sized kstack KVAs
come from a separate, specially aligned region of the KVA space.
For this to work, this commit introduces a dedicated per-domain
kstack KVA arena used to allocate kernel stacks of default size.
The behaviour on 32-bit platforms remains unchanged due to a
significantly smaller KVA space.

Aside from fulfilling the requirements imposed by the new scheme, a
separate kstack KVA arena facilitates superpage promotion in the rest
of the kernel and causes most kstacks to have guard pages at both ends.

Reviewed by: alc, kib, markj
Tested by: markj
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D38852
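
A minimal sketch of the idea, under the assumption that each default-sized
stack occupies kpages stack pages plus one guard page in the aligned arena;
the name and layout are illustrative, not the committed vm_kstack_pindex():

  /* Map a stack's base KVA to a pindex that skips guard pages. */
  static vm_pindex_t
  kstack_pindex_sketch(vm_offset_t ks, vm_offset_t arena_base, int kpages)
  {
          vm_offset_t slot;

          slot = (ks - arena_base) / ptoa(kpages + 1);
          return (slot * kpages);         /* contiguous, no pindex holes */
  }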


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# bbb6228e 13-Feb-2023 Mateusz Guzik <mjg@FreeBSD.org>

vm: ansify

Sponsored by: Rubicon Communications, LLC ("Netgate")


# f54882a8 05-Jan-2022 Konstantin Belousov <kib@FreeBSD.org>

Remove special kstack allocation code for mips.

The arch required two-page alignment due to a single TLB entry caching
two consecutive mappings.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D33763


# c28e39c3 03-Nov-2021 Gordon Bergling <gbe@FreeBSD.org>

Fix a common typo in sysctl descriptions

- s/maxiumum/maximum/

MFC after: 3 days


# 10094910 10-Aug-2021 Mark Johnston <markj@FreeBSD.org>

uma: Add KMSAN hooks

For now, just hook the allocation path: upon allocation, items are
marked as initialized (absent M_ZERO). Some zones are exempted from
this when it would otherwise raise false positives.

Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations. This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report. For example:
panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe

Sponsored by: The FreeBSD Foundation


# 9246b309 13-May-2021 Mark Johnston <markj@FreeBSD.org>

fork: Suspend other threads if both RFPROC and RFMEM are not set

Otherwise, a multithreaded parent process may trigger races in
vm_forkproc() if one thread calls rfork() with RFMEM set and another
calls rfork() without RFMEM.

Also simplify vm_forkproc() a bit; vmspace_unshare() already checks to
see if the address space is shared.

Reported by: syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com
Reported by: syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30220


# 244f3ec6 13-Apr-2021 Mark Johnston <markj@FreeBSD.org>

kstack: Add KASAN state transitions

We allocate kernel stacks using a UMA cache zone. Cache zones have
KASAN disabled by default, but in this case it makes sense to enable it.

Reviewed by: andrew
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29457


# f7db0c95 04-Nov-2020 Mark Johnston <markj@FreeBSD.org>

vmspace: Convert to refcount(9)

This is mostly mechanical except for vmspace_exit(). There, use the new
refcount_release_if_last() to avoid switching to vmspace0 unless other
processes are sharing the vmspace. In that case, upon switching to
vmspace0 we can unconditionally release the reference.

Remove the volatile qualifier from vm_refcnt now that accesses are
protected using refcount(9) KPIs.

Reviewed by: alc, kib, mmel
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27057
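
A hedged sketch of the vmspace_exit() shape this describes, with the
vmspace0 switch reduced to a hypothetical helper:

  bool released;

  /* Release without switching iff we hold the last reference. */
  released = refcount_release_if_last(&vm->vm_refcnt);
  if (!released) {
          switch_proc_to_vmspace0(td);    /* hypothetical helper */
          released = refcount_release(&vm->vm_refcnt);
  }
  if (released)
          vmspace_dofree(vm);             /* final teardown */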


# 89d2fb14 08-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Add interruptible variant of vm_wait(9), vm_wait_intr(9).

Also add msleep flags argument to vm_wait_doms(9).

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652


# f13fa9df 26-Apr-2020 Mark Johnston <markj@FreeBSD.org>

Use a single VM object for kernel stacks.

Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks. This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.

Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.

Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24473
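
A hedged sketch of the linear KVA-to-pindex mapping described above
(locking elided; allocation flags illustrative):

  vm_pindex_t pindex;
  vm_page_t m;

  /* Index a stack page within the global object by its KVA. */
  pindex = atop(ks - VM_MIN_KERNEL_ADDRESS);
  m = vm_page_grab(kstack_object, pindex, VM_ALLOC_WIRED | VM_ALLOC_NORMAL);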


# 7aaf252c 28-Feb-2020 Jeff Roberson <jeff@FreeBSD.org>

Convert a few trivial consumers to the new unlocked grab API.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23847


# e9ceb9dd 19-Feb-2020 Jeff Roberson <jeff@FreeBSD.org>

Don't release xbusy on kmem pages. After lockless page lookup we will not
be able to guarantee that they can be reacquired without blocking.

Reviewed by: kib
Discussed with: markj
Differential Revision: https://reviews.freebsd.org/D23506


# 91e31c3c 22-Jan-2020 Jeff Roberson <jeff@FreeBSD.org>

Consistently use busy and vm_page_valid() rather than touching page bits
directly. This improves API compliance, asserts, etc.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23283


# a314aba8 11-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vm: add missing CTLFLAG_MPSAFE annotations

This covers all vm/* files.


# dfe13344 04-Jan-2020 Jeff Roberson <jeff@FreeBSD.org>

UMA NUMA flag day. UMA_ZONE_NUMA was a source of confusion. Make the names
more consistent with other NUMA features as UMA_ZONE_FIRSTTOUCH and
UMA_ZONE_ROUNDROBIN. The system will now select a default depending
on kernel configuration. API users need only specify one if they want to
override the default.

Remove the UMA_XDOMAIN and UMA_FIRSTTOUCH kernel options and key only off
of NUMA. XDOMAIN is now fast enough in all cases to enable whenever NUMA
is.

Reviewed by: markj
Discussed with: rlibby
Differential Revision: https://reviews.freebsd.org/D22831


# 91caa9b8 07-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vm: fix sysctl vm.kstack_cache_size change report

Cache gets resized correctly, but sysctl reports the wrong number:
# sysctl vm.kstack_cache_size=512
vm.kstack_cache_size: 128 -> 128

patched:
vm.kstack_cache_size: 128 -> 512

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22717
Fixes: r355002 "Revise the page cache size policy."


# 0f9e06e1 02-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22611


# 003cf08b 22-Nov-2019 Mark Johnston <markj@FreeBSD.org>

Revise the page cache size policy.

In r353734 the use of the page caches was limited to systems with a
relatively large amount of RAM per CPU. This was to mitigate some
issues reported with the system being unable to keep up with memory pressure
in cases where it had been able to do so prior to the addition of the
direct free pool cache. This change re-enables those caches.

The change modifies uma_zone_set_maxcache(), which was introduced
specifically for the page cache zones. Rather than using it to limit
only the full bucket cache, have it also set uz_count_max to provide an
upper bound on the per-CPU cache size that is consistent with the number
of items requested. Remove its return value since it has no use.

Enable the page cache zones unconditionally, and limit them to 0.1% of
the domain's pages. The limit can be overridden by the
vm.pgcache_zone_max tunable as before.

Change the item size parameter passed to uma_zcache_create() to the
correct size, and stop setting UMA_ZONE_MAXBUCKET. This allows the page
cache buckets to be adaptively sized, like the rest of UMA's caches.
This also causes the initial bucket size to be small, so only systems
which benefit from large caches will get them.

Reviewed by: gallatin, jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22393


# 4a9c211a 19-Sep-2019 Allan Jude <allanjude@FreeBSD.org>

sys/vm/vm_glue.c: Incorrect function name in panic string

Use __func__ to avoid this issue in the future.

Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>
Reviewed by: markj, emaste
Obtained from: https://github.com/freebsd/freebsd/pull/410
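
For illustration, the resulting pattern looks like this (message text
hypothetical):

  panic("%s: kstack allocation failed", __func__);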


# c7575748 10-Sep-2019 Jeff Roberson <jeff@FreeBSD.org>

Replace redundant code with a few new vm_page_grab facilities:
- VM_ALLOC_NOCREAT will grab without creating a page.
- vm_page_grab_valid() will grab and page in if necessary.
- vm_page_busy_acquire() automates some busy acquire loops.

Discussed with: alc, kib, markj
Tested by: pho (part of larger branch)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21546
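
A minimal hedged sketch of vm_page_grab_valid() (object and pindex assumed
to be set up by the caller; error handling trimmed):

  vm_page_t m;

  VM_OBJECT_WLOCK(object);
  /* Grab the page, paging it in from the pager if necessary. */
  if (vm_page_grab_valid(&m, object, pindex, VM_ALLOC_NORMAL) !=
      VM_PAGER_OK) {
          VM_OBJECT_WUNLOCK(object);
          return (EIO);
  }
  VM_OBJECT_WUNLOCK(object);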


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchronization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486


# b5d239cb 28-Aug-2019 Mark Johnston <markj@FreeBSD.org>

Wire pages in vm_page_grab() when appropriate.

uiomove_object_page() and exec_map_first_page() would previously wire a
page after having grabbed it. Ask vm_page_grab() to perform the wiring
instead: this removes some redundant code, and is cheaper in the case
where the requested page is not resident since the page allocator can be
asked to initialize the page as wired, whereas a separate vm_page_wire()
call requires the page lock.

In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of
vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued
before returning, but this is unnecessary since vm_page_free() will
trigger a batched dequeue of the page.

Reviewed by: alc, kib
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21440


# b48d4efe 25-Aug-2019 Mark Johnston <markj@FreeBSD.org>

Handle UMA_ANYDOMAIN in kstack_import().

The kernel thread stack zone performs first-touch allocations by
default, and must handle the case where the local memory domain
is empty. For most UMA zones this is handled in the keg layer,
but cache zones currently must implement a policy for this case.
Simply use a round-robin policy if UMA_ANYDOMAIN is passed.

Reported and tested by: bcran
Reviewed by: kib
Sponsored by: The FreeBSD Foundation


# 0b26119b 06-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Cache kernel stacks in UMA. This gives us NUMA support, better concurrency,
and more statistics.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20931


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


# 88ea538a 07-Jun-2019 Mark Johnston <markj@FreeBSD.org>

Replace uses of vm_page_unwire(m, PQ_NONE) with vm_page_unwire_noq(m).

These calls are not the same in general: the former will dequeue the
page if it is enqueued, while the latter will just leave it alone. But,
all existing uses of the former apply to unmanaged pages, which are
never enqueued in the first place. No functional change intended.

Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20470


# 54a3a114 13-May-2019 Mark Johnston <markj@FreeBSD.org>

Provide separate accounting for user-wired pages.

Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
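
For example, an administrator wiring large bhyve -S guests might raise the
limit (the value shown is purely illustrative):

# sysctl vm.max_user_wired=8388608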


# 4c29d2de 23-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Refactor domainset iterators for use by malloc(9) and UMA.

Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc". The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy. Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.

In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations. To that end, refactor the vm_domainset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.

Reviewed by: jeff (previous version)
Tested by: pho (part of a larger patch)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17417


# 2801dd08 22-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Fix the build after r339601.

I committed some patches out of order and didn't build-test one of them.

Reported by: Jenkins, O. Hartmann <ohartmann@walstatt.org>
X-MFC with: r339601


# b0058196 22-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Swap in processes unless there's a global memory shortage.

On NUMA systems, we would not swap in processes unless all domains
had some free pages. This is too conservative in general. Instead,
permit swapins so long as at least one domain has free pages, and add
a kernel stack NUMA policy which ensures that we will try to allocate
kernel stack pages from any domain.

Reported and tested by: pho, Jan Bramkamp <crest@bultmann.eu>
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17304


# c0352925 19-Sep-2018 Mateusz Guzik <mjg@FreeBSD.org>

vm: check for empty kstack cache before locking

The current cache logic checks the total number of stacks in the kernel,
which even on small boxes significantly exceeds the 128 limit (e.g. an
8-way box with zfs has almost 800 stacks allocated).

Stacks are cached earlier for each main thread.

As a result the code is rarely executed, but when it is (on boxes like
the above) it always fails. Since there are no provisions made for NUMA and
release time is approaching, just do a quick check to avoid acquiring the
lock.

Approved by: re (kib)
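
A hedged sketch of the pattern (names illustrative, not the committed diff):

  /* Unlocked peek; a stale read only costs a missed cache hit. */
  if (kstack_cache != NULL) {
          mtx_lock(&kstack_cache_mtx);
          /* ... re-check and pop a cached stack under the lock ... */
          mtx_unlock(&kstack_cache_mtx);
  }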


# 23984ce5 06-Sep-2018 Mark Johnston <markj@FreeBSD.org>

Avoid resource deadlocks when one domain has exhausted its memory. Attempt
other allowed domains if the requested domain is below the minimum paging
threshold. Block in fork only if all domains available to the forking
thread are below the severe threshold rather than any.

Submitted by: jeff
Reported by: mjg
Reviewed by: alc, kib, markj
Approved by: re (rgrimes)
Differential Revision: https://reviews.freebsd.org/D16191


# f0165b1c 28-Aug-2018 Konstantin Belousov <kib@FreeBSD.org>

Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.

Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.

Based on the submission by: royger
Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Approved by: re (marius)
Differential revision: https://reviews.freebsd.org/D16881


# ed9e8bc4 24-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Account the size of the vslock-ed memory by the thread.
Assert that all such memory is unwired on return to usermode.

The count of the wired memory will be used to detect the copyout mode.

Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e958ad4c 12-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
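
A hedged sketch of the counter(9) KPI involved (names and the printf are
illustrative):

  static counter_u64_t wire_cnt;

  wire_cnt = counter_u64_alloc(M_WAITOK);         /* per-CPU storage */
  counter_u64_add(wire_cnt, 1);                   /* no cache line bouncing */
  printf("wired: %ju\n", (uintmax_t)counter_u64_fetch(wire_cnt));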


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# d0389015 11-Jan-2018 Ed Maste <emaste@FreeBSD.org>

ANSIfy function definitions in sys/vm/


# 796df753 30-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

SPDX: Consider code from Carnegie-Mellon University.

Interesting cases, most likely from CMU Mach sources.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# ac04195b 20-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Move swapout code into vm/vm_swapout.c.

There is no NO_SWAPPING #ifdef left in the code.

Requested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12663


# 9df950b3 11-Aug-2017 Mark Johnston <markj@FreeBSD.org>

Modify vm_page_grab_pages() to handle VM_ALLOC_NOWAIT.

This will allow its use in sendfile_swapin().

Reviewed by: alc, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11942


# 5471caf6 08-Aug-2017 Alan Cox <alc@FreeBSD.org>

Introduce vm_page_grab_pages(), which is intended to replace loops calling
vm_page_grab() on consecutive page indices. Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().

Reviewed by: kib, markj
Tested by: pho (an earlier version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11926
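
A hedged before/after sketch (locking and error handling elided):

  /* Before: one call per consecutive index. */
  for (i = 0; i < count; i++)
          ma[i] = vm_page_grab(object, pindex + i, VM_ALLOC_NORMAL);

  /* After: a single batched call. */
  vm_page_grab_pages(object, pindex, VM_ALLOC_NORMAL, ma, count);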


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# ce3ee09b 14-Aug-2016 Alan Cox <alc@FreeBSD.org>

Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when
neither vm_pager_has_page() nor vm_pager_get_pages() is called.

Reviewed by: kib, markj
MFC after: 3 weeks


# 95e2409a 22-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Fix a LOR between vnode locks and allproc_lock.

There is an order between covered vnode lock and allproc_lock, which
is established by calling mountcheckdirs() while owning the covered
vnode lock. mountcheckdirs() iterates over the processes, protected by
allproc_lock. This order is needed and seems unavoidable.

On the other hand, various VM daemons also need to iterate over all
processes, and they lock and unlock user maps. Since unlock of the
user map may trigger processing of the deferred map entries, it causes
vnode locking to occur. Or, when a vmspace is freed, dropping references
on the vnode-backed object also locks vnodes. We get the reverse of the
order established by mount/unmount.

For VM daemons, there is no need to own allproc_lock while we operate
on vmspaces. If the process is held, it serves as a marker for the
allproc list, which allows the iteration to continue.

Add _PHOLD_LITE() macro, similar to _PHOLD(), but not causing swap-in
of the kernel stacks. It is used instead of _PHOLD() in vm code,
since e.g. calling faultin() in OOM conditions only exacerbates the
problem.

Modernize comment describing PHOLD.

Reported by: lists@yamagi.org
Tested by: pho (previous version)
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6679


# a525fd17 22-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Remove false claim. Giant is dropped by mi_startup() before passing
control to the swapper.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 763df3ec 02-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/vm: minor spelling fixes in comments.

No functional change.


# b0cd2017 16-Dec-2015 Gleb Smirnoff <glebius@FreeBSD.org>

A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().

o With the new KPI, consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, as was done
before with the 'reqpage' only. Now the reqpage goes away. With the
new interface it is easier to implement code protected from race
conditions.

Such arrayed requests for now should be preceded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.

Strengthening the promises on the busy state of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().

o The new KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind that a pager may do, if it
can. These pages are completely owned by the pager, and not controlled
by the caller.

This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.

Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# e595970a 08-Oct-2015 Alan Cox <alc@FreeBSD.org>

Exploit r288122 to avoid pointlessly enqueueing a page that is about to be
freed.

Submitted by: kmacy
Differential Revision: https://reviews.freebsd.org/D1674


# edc82223 10-Aug-2015 Konstantin Belousov <kib@FreeBSD.org>

Make kstack_pages a tunable on arm, x86, and powerpc. On i386, the
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.

The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already
incorrect for threads with a non-default kstack size. I only
changed the macros to use a variable instead of a constant, since I
cannot test.

On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 093c7f39 12-Jun-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Make the KPI of vm_pager_get_pages() more strict: if a pager changes a page
in the requested array, then it is responsible for the disposition of the
previous page and for updating the entry in the requested array.
Now consumers of the KPI do not need to re-lookup the pages after a call to
vm_pager_get_pages().

Reviewed by: kib
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# b7627840 04-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Add kernel option KSTACK_USAGE_PROF to sample the stack depth on
interrupts and report the largest value seen as sysctl
debug.max_kstack_used. Useful to estimate how close the kernel stack
size is to overflow.

In collaboration with: Larry Baird <lab@gta.com>
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
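
With the option compiled in, the high-water mark can be read with:

# sysctl debug.max_kstack_used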


# 70978c93 12-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

If vm_page_grab() allocates a new page, the page is not inserted into
a page queue even when the allocation is not wired. It is the
responsibility of the vm_page_grab() caller to ensure that the page
does not end up on the vm_object queue while staying off the pagedaemon
queue, which would effectively create an unpageable, unwired page.

In exec_map_first_page() and vm_imgact_hold_page(), activate the page
immediately after unbusying it, to avoid leak.

In uiomove_object_page(), deactivate the page before the object is
unlocked. There is no leak, since the page is deactivated after
uiomove_fromphys() finishes. But allowing a non-queued, non-wired page
on the unlocked object's queue makes it impossible to assert that the
leak does not happen in other places.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 3ae10f74 16-Jun-2014 Attilio Rao <attilio@FreeBSD.org>

- Modify vm_page_unwire() and vm_page_enqueue() to directly accept
the queue where to enqueue pages that are going to be unwired.
- Add stronger checks to the enqueue/dequeue for the pagequeues when
adding and removing pages to them.

Of course, for unmanaged pages the queue parameter of vm_page_unwire() will
be ignored, just as the active parameter today.
This makes adding new pagequeues quicker.

This change effectively modifies the KPI. __FreeBSD_version will,
however, be bumped only when the full cache of free pages is
evicted.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff, the struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version in case some out-of-tree module/code relies on
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# 5944de8e 22-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


# c7aebda8 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared-busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed, which is why the version is bumped.
It is very likely that some VM consumers in ports will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


# 5df87b21 07-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# be996836 05-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

Revert r253939:
We cannot busy a page before doing pagefaults.
In fact, it can deadlock against the vnode lock, as it tries to vget().
Other functions, right now, have an opposite lock ordering, like
vm_object_sync(), which acquires the vnode lock first and then
sleeps on the busy mechanism.

Before this patch is reinserted we need to break this ordering.

Sponsored by: EMC / Isilon storage division
Reported by: kib


# 3b6714ca 04-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

The page hold mechanism is fast but it has a couple of drawbacks:
- It does not let pages respect the LRU policy
- It bloats the active/inactive queues with a few pages

Try to avoid it as much as possible, with the long-term goal of
removing it completely.
Use the soft-busy mechanism to protect page content accesses during
short-term operations (like uiomove_fromphys()).

After this change only vm_fault_quick_hold_pages() is still using the
hold mechanism for page content access.
There is additional complexity there, as the quick path cannot
immediately access the page object to busy the page, while the slow path
cannot busy more than one page at a time (to avoid deadlocks).

Fixing this primitive would allow the complete removal of the page hold
mechanism.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff
Tested by: pho


# 785797c3 24-Jul-2013 Andriy Gapon <avg@FreeBSD.org>

rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST

Also directly call swapper() at the end of mi_startup instead of
relying on swapper being the last thing in sysinits order.

Rationale:

- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be
invoked depending on its relative order with scheduler; this was not obvious
and the bug actually used to exist

Reviewed by: kib (earlier version)
MFC after: 14 days


# 002f377a 06-Jun-2013 Attilio Rao <attilio@FreeBSD.org>

Complete r251452:
Avoid busying/unbusying a page in cases where there is no need to drop the
vm_obj lock, most notably when the page is fully valid after
vm_page_grab().

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable further
optimizations in the future, where the vm_object lock will be held
in read mode most of the time the page-cache-resident pool of pages
is accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing consumers
to include sys/mutex.h directly to cater to its inline functions
using VM_OBJECT_LOCK()) imposes that all vm/vm_pager.h
consumers must now also include sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 25c1e164 26-Oct-2012 Andre Oppermann <andre@FreeBSD.org>

Move the corresponding MTX_SYSINIT() next to their struct mtx declarations
to make their relationship more obvious, as done with the other such mutexes.


# e145130e 30-Aug-2012 Andrey Zonov <zont@FreeBSD.org>

- Remove accounting of locked memory from vsunlock(9) that I missed in r239818.

Approved by: kib (mentor)


# 126a63ce 29-Aug-2012 Andrey Zonov <zont@FreeBSD.org>

- Don't take an account of locked memory for current process in vslock(9).

There are two consumers of vslock(9): the sysctl code and the drm driver.
These consumers use locked memory as transient memory; it doesn't belong
to a process's memory.

Suggested by: avg
Reviewed by: alc
Approved by: kib (mentor)
MFC after: 2 weeks


# e878d997 16-Dec-2011 Konstantin Belousov <kib@FreeBSD.org>

Move kstack_cache_entry into the private header, and make the
stack cache list header accessible outside vm_glue.c.

MFC after: 1 week


# afcc55f3 06-Jul-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

All the racct_*() calls need to happen with the proc locked. Fixing this
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".


# e806d352 06-Apr-2011 John Baldwin <jhb@FreeBSD.org>

Fix several places to ignore processes that are not yet fully constructed.

MFC after: 1 week


# 1ba5ad42 05-Apr-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add accounting for most of the memory-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 50a57dfb 08-Jan-2011 Konstantin Belousov <kib@FreeBSD.org>

Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by: alc


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# ef7b0ac4 13-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r207365:
When doing kstack swapin, read as much pages in one run as possible.


# 83d6c6d8 13-May-2010 Konstantin Belousov <kib@FreeBSD.org>

MFC r206545 (by alc):
Simplify vm_thread_swapin().

Approved by: alc


# eb00b276 06-May-2010 Alan Cox <alc@FreeBSD.org>

Eliminate page queues locking around most calls to vm_page_free().


# e3ef0d2f 04-May-2010 Alan Cox <alc@FreeBSD.org>

Push down the acquisition of the page queues lock into vm_page_unwire().

Update the comment describing which lock should be held on entry to
vm_page_wire().

Reviewed by: kib


# 2965a453 29-Apr-2010 Kip Macy <kmacy@FreeBSD.org>

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 6fb8c0c1 29-Apr-2010 Konstantin Belousov <kib@FreeBSD.org>

When doing kstack swapin, read as much pages in one run as possible.

Suggested and reviewed by: alc (previous version)
Tested by: pho
MFC after: 2 weeks


# 7b7d5b6c 18-Apr-2010 Alan Cox <alc@FreeBSD.org>

vm_thread_swapout() can safely dirty the page before rather than after
acquiring the page queues lock.


# ca596a25 18-Apr-2010 Juli Mallett <jmallett@FreeBSD.org>

o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
address space for an address as aligned by the new pmap_align_tlb()
function, which is for constraints imposed by the TLB. [1]
o) Add a kmem_alloc_nofault_space() function, which acts like
kmem_alloc_nofault() but allows the caller to specify which find-space
option to use. [1]
o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the
kernel stack address on MIPS. [1]
o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on
an odd boundary within the TLB, so that they are suitable for insertion as
wired entries and do not have to share a TLB entry with another mapping,
assuming they are appropriately-sized.
o) Eliminate md_realstack now that the kstack will be appropriately-aligned on
MIPS.
o) Increase the number of guard pages to 2 so that we retain the proper
alignment of the kstack address.

Reviewed by: [1] alc
X-MFC-after: Making sure alc has not come up with a better interface.


# 6c68c971 13-Apr-2010 Alan Cox <alc@FreeBSD.org>

Simplify vm_thread_swapin().


# ac45ee97 11-Apr-2010 Alan Cox <alc@FreeBSD.org>

Initialize the virtual memory-related resource limits in a single place.
Previously, one of these limits was initialized in two places to a
different value in each place. Moreover, because an unsigned int was used
to represent the amount of pageable physical memory, some of these limits
were incorrectly initialized on 64-bit architectures. (Currently, this
error is masked by login.conf's default settings.)

Make vm_thread_swapin() and vm_thread_swapout() static.

Submitted by: bde (an earlier version)
Reviewed by: kib


# dfeca187 30-Mar-2010 Marcel Moolenaar <marcel@FreeBSD.org>

MFC rev 198341 and 198342:
o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).


# 1a4fcaeb 21-Oct-2009 Marcel Moolenaar <marcel@FreeBSD.org>

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


# 2af00dec 08-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r196730:
Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that the requested stack size is equal to the
actual, and reallocate the stack if sizes differ.

Introduce separate kernel stack cache that keeps some limited amount of
preallocated kernel stacks to lower the latency of thread allocation.

Not a merge: instead of removing td_altkstack* members of struct thread,
replace them with placeholders to keep struct thread layout on the
stable branch.

Also, record r196640, r196644 and r196648 as merged.

Approved by: re (kensmith)


# 8a945d10 01-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

Reintroduce r196640, after fixing the problem with my testing.

Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that the requested stack size is equal to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory conditions. This way, systems with a reasonable number of
threads get lower latency of thread creation, while still not exhausting
a significant portion of KVA for unused kstacks.

Submitted by: peter [1]
Discussed with: jhb, julian, peter
Reviewed by: jhb
Tested by: pho (and retested according to new test scenarios)
MFC after: 1 week


# f25fa6ab 29-Aug-2009 Konstantin Belousov <kib@FreeBSD.org>

Reverse r196640 and r196644 for now.


# c3cf0b47 29-Aug-2009 Konstantin Belousov <kib@FreeBSD.org>

Remove the altkstacks, instead instantiate threads with kernel stack
allocated with the right size from the start. For the thread that has
kernel stack cached, verify that the requested stack size is equal to the
actual, and reallocate the stack if sizes differ [1].

This fixes the bug introduced by r173361 that was committed several days
after r173004 and consisted of kthread_add(9) ignoring the non-default
kernel stack size.

Also, r173361 removed the caching of the kernel stacks for a non-first
thread in the process. Introduce separate kernel stack cache that keeps
some limited amount of preallocated kernel stacks to lower the latency
of thread allocation. Add vm_lowmem handler to prune the cache on
low memory conditions. This way, systems with a reasonable number of
threads get lower latency of thread creation, while still not exhausting
a significant portion of KVA for unused kstacks.

Submitted by: peter [1]
Discussed with: jhb, julian, peter
Reviewed by: jhb
Tested by: pho
MFC after: 1 week


# 0a2e596a 07-Jun-2009 Alan Cox <alc@FreeBSD.org>

Eliminate unnecessary obfuscation when testing a page's valid bits.


# d1a6e42d 06-Jun-2009 Alan Cox <alc@FreeBSD.org>

If vm_pager_get_pages() returns VM_PAGER_OK, then there is no need to check
the page's valid bits. The page is guaranteed to be fully valid. (For the
record, this is documented in vm/vm_pager.h's comments.)


# 7a122777 05-Jun-2009 Alan Cox <alc@FreeBSD.org>

vm_thread_swapin() needn't validate any pages. The pages are already
validated by vm_pager_get_pages().


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# da7bbd2c 05-Aug-2008 John Baldwin <jhb@FreeBSD.org>

If a thread that is swapped out is made runnable, then the setrunnable()
routine wakes up proc0 so that proc0 can swap the thread back in.
Historically, this has been done by waking up proc0 directly from
setrunnable() itself via a wakeup(). When waking up a sleeping thread
that was swapped out (the usual case when waking proc0 since only sleeping
threads are eligible to be swapped out), this resulted in a bit of
recursion (e.g. wakeup() -> setrunnable() -> wakeup()).

With sleep queues having separate locks in 6.x and later, this caused a
spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq lock).
An attempt was made to fix this in 7.0 by making the proc0 wakeup use
the ithread mechanism for doing the wakeup. However, this required
grabbing proc0's thread lock to perform the wakeup. If proc0 was asleep
elsewhere in the kernel (e.g. waiting for disk I/O), then this degenerated
into the same LOR since the thread lock would be some other sleepq lock.

Fix this by deferring the wakeup of the swapper until after the sleepq
lock held by the upper layer has been locked. The setrunnable() routine
now returns a boolean value to indicate whether or not proc0 needs to be
woken up. The end result is that consumers of the sleepq API such as
*sleep/wakeup, condition variables, sx locks, and lockmgr, have to wake up
proc0 if they get a non-zero return value from sleepq_abort(),
sleepq_broadcast(), or sleepq_signal().

Discussed with: jeff
Glanced at by: sam
Tested by: Jurgen Weber jurgen - ish com au
MFC after: 2 weeks
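
A hedged sketch of the consumer-side pattern this describes (simplified;
kick_proc0() stands in for the proc0 wakeup helper):

  int wakeup_swapper;

  sleepq_lock(wchan);
  wakeup_swapper = sleepq_broadcast(wchan, SLEEPQ_SLEEP, 0, 0);
  sleepq_release(wchan);
  if (wakeup_swapper)
          kick_proc0();   /* wake proc0 only after the sleepq lock is dropped */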


# 8df78c41 16-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
- In reset walk the children of kern_sched_stats and reset the counters
via the oid_arg1 pointer. This allows us to add arbitrary counters to
the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch().
These types are named SWT_*. These types correspond to SCHED_STATS
counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats.
There are now stats for idle switches, remote idle wakeups, remote
preemption ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include
how much migration there is, how often affinity was successful, how
often threads were migrated to the local cpu on wakeup, etc.

Sponsored by: Nokia


# 374ae2a3 19-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink
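
For illustration, invocations now carry the trailing semicolon (identifiers
hypothetical):

  SYSINIT(kstacks, SI_SUB_KTHREAD_PAGE, SI_ORDER_ANY, kstack_init, NULL);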


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# c5aa6b58 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Pass the priority argument from *sleep() into sleepq and down into
sched_sleep(). This removes extra thread_lock() acquisition and
allows the scheduler to decide what to do with the static boost.
- Change the priority arguments to cv_* to match sleepq/msleep/etc.
where 0 means no priority change. Catch -1 in cv_broadcastpri() and
convert it to 0 for now.
- Set a flag when sleeping in a way that is compatible with swapping
since direct priority comparisons are meaningless now.
- Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
controls the boost behavior. Turning it off gives better performance
in some workloads but needs more investigation.
- While we're modifying sleepq, change signal and broadcast to both
return with the lock held as the lock was held on enter.

Reviewed by: jhb, peter


# 89b57fcf 05-Nov-2007 Konstantin Belousov <kib@FreeBSD.org>

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing NULL.

As a consequence, vmspace_exec() and vmspace_unshare() return an errno
int. A struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


# 258853ab 20-Sep-2007 Jeff Roberson <jeff@FreeBSD.org>

- Redefine p_swtime and td_slptime as p_swtick and td_slptick. This
changes the units from seconds to the value of 'ticks' when swapped
in/out. ULE does not have a periodic timer that scans all threads in
the system and as such maintaining a per-second counter is difficult.
- Change computations requiring the unit in seconds to subtract ticks
and divide by hz. This does make the wraparound condition hz times
more frequent but this is still in the range of several months to
years and the adverse effects are minimal.

Approved by: re
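
For example, code that previously read a sleep time in seconds now
derives it from the tick stamp, roughly (a sketch using the kernel
globals 'ticks' and 'hz'):

    int slept;

    slept = (ticks - td->td_slptick) / hz;	/* seconds asleep */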


# b61ce5b0 16-Sep-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in-source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 1c4bcd05 31-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- Move rusage from being per-process in struct pstats to per-thread in
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock, which is going away. All modifications to rusage are now
done in the context of the owning thread. Reads proceed without locks.
- Aggregate an exiting thread's rusage in thread_exit() such that the
exiting thread's rusage is not lost.
- Provide a new routine, rufetch(), to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to its exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.

Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)


# 2feb50bf 31-May-2007 Attilio Rao <attilio@FreeBSD.org>

Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 222d0195 18-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details, and it also
changes all counters to use atomics. This means sched_lock is no
longer responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# f0393f06 23-Jan-2007 Jeff Roberson <jeff@FreeBSD.org>

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


# ad1e7d28 05-Dec-2006 Julian Elischer <julian@FreeBSD.org>

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning, and some functions
that now do ALMOST nothing will go away, but I thought I'd do that as a
separate commit.

Tested by David Xu and Dan Eischen using libthr and libpthread.


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 66bdd5d6 22-Oct-2006 Alan Cox <alc@FreeBSD.org>

The page queues lock is no longer required by vm_page_wakeup().


# 57051fdc 29-May-2006 Tor Egge <tegge@FreeBSD.org>

Close race between vmspace_exitfree() and exit1() and races between
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.

Factor out part of exit1() into new function vmspace_exit(). Attach
to vmspace0 to allow old vmspace to be freed earlier.

Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process. Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.

Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

Reviewed by: alc
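
A sketch of the 0 -> 1 refusal described above, assuming an atomic
compare-and-set loop (details differ from the committed code):

    struct vmspace *vm;
    int refcnt;

    vm = p->p_vmspace;
    do {
        refcnt = vm->vm_refcnt;
        if (refcnt <= 0) {
            /* Refuse to resurrect a dying vmspace. */
            vm = NULL;
            break;
        }
    } while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt + 1));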


# da61b9a6 16-Dec-2005 Alan Cox <alc@FreeBSD.org>

Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space. There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail. When it fails, the nascent process is aborted
(SIGABRT). Whereas, this reimplementation using sf_buf_alloc()
sleeps. (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster. Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.

Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks


# d13ec713 23-May-2005 Stephan Uphoff <ups@FreeBSD.org>

Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.


# 10c447fa 22-May-2005 Alan Cox <alc@FreeBSD.org>

Swap in can occur safely without Giant. Release Giant on entry to
scheduler().


# 35cf2323 21-May-2005 Alan Cox <alc@FreeBSD.org>

Remove GIANT_REQUIRED from swapout_procs().


# 75337a56 22-Jan-2005 Alan Cox <alc@FreeBSD.org>

Guard against address wrap in kernacc(). Otherwise, a program accessing a
bad address range through /dev/kmem can panic the machine.

Submitted by: Mark W. Krentel
Reported by: Kris Kennaway
MFC after: 1 week
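
The guard amounts to a wrap check before the range is inspected; a
sketch, not the literal committed diff:

    vm_offset_t saddr, eaddr;

    saddr = trunc_page((vm_offset_t)addr);
    eaddr = round_page((vm_offset_t)addr + len);
    if (eaddr < saddr)	/* addr + len wrapped past the end */
        return (FALSE);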


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 6004362e 26-Nov-2004 David Schultz <das@FreeBSD.org>

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


# 9799b417 19-Nov-2004 David Schultz <das@FreeBSD.org>

Disable U area swapping and remove the routines that create, destroy,
copy, and swap U areas.

Reviewed by: arch@


# d19ef814 03-Nov-2004 Alan Cox <alc@FreeBSD.org>

The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.


# ddf4bb37 24-Oct-2004 Alan Cox <alc@FreeBSD.org>

Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().


# 8daa8c60 19-Sep-2004 David Schultz <das@FreeBSD.org>

The zone from which proc structures are allocated is marked
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.


# 94ddc707 02-Sep-2004 Alan Cox <alc@FreeBSD.org>

Push Giant deep into vm_forkproc(), acquiring it only if the process has
mapped System V shared memory segments (see shmfork_myhook()) or requires
the allocation of an ldt (see vm_fault_wire()).


# 9be60284 30-Jul-2004 Alan Cox <alc@FreeBSD.org>

Giant is no longer required by vm_waitproc() and vmspace_exitfree().
Eliminate its acquisition and release around vm_waitproc() in kern_wait().


# 1a276a3f 26-Jul-2004 Alan Cox <alc@FreeBSD.org>

- Use atomic ops for updating the vmspace's refcnt and exitingcnt.
- Push down Giant into shmexit(). (Giant is acquired only if the vmspace
contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
address space.


# d202e0cc 01-Jul-2004 John Baldwin <jhb@FreeBSD.org>

- Don't use a variable to point to the user area that we only use once.
Just use p2->p_uarea directly instead.
- Remove an old and mostly bogus assertion regarding p2->p_sigacts.
- Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.


# 17d9d0d0 26-Jun-2004 David Schultz <das@FreeBSD.org>

Update a stale comment. The heuristic to swap processes out based on
the number of pages already paged out was broken in rev 1.10 and
removed in rev 1.11.


# fa885116 15-Jun-2004 Julian Elischer <julian@FreeBSD.org>

Nice is a property of the process as a whole.
I mistakenly moved it to the ksegroup when breaking up the process
structure. Put it back in the proc structure.


# d9b2500e 07-May-2004 Brian Feldman <green@FreeBSD.org>

In r1.190, vslock() and vsunlock() were bogusly made to do a "user wire"
and a "system unwire." Make this a "system wire" and "system unwire."

Reviewed by: alc


# 05eb3785 06-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# bb734798 15-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Make overflow/wraparound checking more robust and unbreak len=0 in
vslock(), mlock(), and munlock().

Reviewed by: bde


# f0ea4612 14-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Style(9) changes.

Pointed out by: bde


# ce8660e3 14-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Revert to the original vslock() and vsunlock() API with the following
exceptions:
Retain the recently added vslock() error return.

The type of the len argument should be size_t, not u_int.

Suggested by: bde


# fcffa790 07-Mar-2004 Alan Cox <alc@FreeBSD.org>

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


# 16929939 05-Mar-2004 Don Lewis <truckman@FreeBSD.org>

Undo the merger of mlock()/vslock and munlock()/vsunlock() and the
introduction of kern_mlock() and kern_munlock() in
src/sys/kern/kern_sysctl.c 1.150
src/sys/vm/vm_extern.h 1.69
src/sys/vm/vm_glue.c 1.190
src/sys/vm/vm_mmap.c 1.179
because different resource limits are appropriate for transient and
"permanent" page wiring requests.

Retain the kern_mlock() and kern_munlock() API in the revived
vslock() and vsunlock() functions.

Combine the best parts of each of the original sets of implementations
with further code cleanup. Make the mlock() and vslock()
implementations as similar as possible.

Retain the RLIMIT_MEMLOCK check in mlock(). Move the most stringent
test, which can return EAGAIN, last so that requests that have no
hope of ever being satisfied will not be retried unnecessarily.

Disable the test that can return EAGAIN in the vslock() implementation
because it will cause the sysctl code to wedge.

Tested by: Cy Schubert <Cy.Schubert AT komquats.com>


# 47934cef 25-Feb-2004 Don Lewis <truckman@FreeBSD.org>

Split the mlock() kernel code into two parts, mlock(), which unpacks
the syscall arguments and does the suser() permission check, and
kern_mlock(), which does the resource limit checking and calls
vm_map_wire(). Split munlock() in a similar way.

Enable the RLIMIT_MEMLOCK checking code in kern_mlock().

Replace calls to vslock() and vsunlock() in the sysctl code with
calls to kern_mlock() and kern_munlock() so that the sysctl code
will obey the wired memory limits.

Nuke the vslock() and vsunlock() implementations, which are no
longer used.

Add a member to struct sysctl_req to track the amount of memory
that is wired to handle the request.

Modify sysctl_wire_old_buffer() to return an error if its call to
kern_mlock() fails. Only wire the minimum of the length specified
in the sysctl request and the length specified in its argument list.
It is recommended that sysctl handlers that use sysctl_wire_old_buffer()
specify reasonable estimates for the amount of data they want to
return so that only the minimum amount of memory is wired no matter
what length has been specified by the request.

Modify the callers of sysctl_wire_old_buffer() to look for the
error return.

Modify sysctl_old_user to obey the wired buffer length and clean up
its implementation.

Reviewed by: bms
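
A sketch of the resulting handler pattern; struct example_stats is a
hypothetical payload, the rest is the stock sysctl machinery:

    static int
    sysctl_example(SYSCTL_HANDLER_ARGS)
    {
        struct example_stats st;
        int error;

        /* Wire no more of the old buffer than we will return. */
        error = sysctl_wire_old_buffer(req, sizeof(st));
        if (error != 0)
            return (error);
        /* ... fill in st; the copyout can no longer fault ... */
        return (SYSCTL_OUT(req, &st, sizeof(st)));
    }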


# 91d5354a 04-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that it is always
copy-on-write, so having a reference to a structure is sufficient to
read from it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't use the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
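
Reading a limit under the new API looks roughly like this (a sketch;
the proc lock pins p_limit while it is read):

    rlim_t dlim;

    PROC_LOCK(p);
    dlim = lim_cur(p, RLIMIT_DATA);
    PROC_UNLOCK(p);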


# 9a44a82b 28-Jan-2004 Bruce Evans <bde@FreeBSD.org>

Fixed breakage of scheduling in rev.1.29 of subr_4bsd.c. The
"scheduler" here has very little to do with scheduling. It is actually
the swapper, and it really must be the last SYSINIT'ed item like its
comment says, since proc0 metamorphoses into the swapper by calling
scheduler() last in mi_start(), and scheduler() never returns. Rev.1.29
of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item
(kproc_start() for schedcpu_thread()) onto the SI_SUB_RUN_SCHEDULER
list. The sorting of SYSINITs with identical orders (at all levels) is
apparently nondeterministic, so this resulted in scheduler() sometimes
being called second-last and schedcpu_thread() not being called at all.

This quick fix just changes the code to almost match the comment
(SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and
there is no way to ensure that there is only 1 very last SYSINIT.
A more complete fix would remove the SYSINIT obfuscation.


# d8834602 09-Nov-2003 Alan Cox <alc@FreeBSD.org>

- The Open Group Base Specifications Issue 6 specifies that an munmap(2)
must return EINVAL if size is zero. Submitted by: tegge
- In order to avoid a race condition in multithreaded applications, the
check and removal operations by munmap(2) must be in the same critical
section. To accommodate this, vm_map_check_protection() is modified to
require its caller to obtain at least a read lock on the map.


# 5d264f84 04-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Revert previous commit. Come back vslock(), all is forgiven.

Pointy hat to: bms


# aac7652e 05-Oct-2003 Bruce M Simpson <bms@FreeBSD.org>

Retire vslock() and vsunlock() with extreme prejudice.

Discussed with: pete


# ef13663b 17-Aug-2003 Alan Cox <alc@FreeBSD.org>

Three unrelated changes to vm_proc_new(): (1) add vm object locking on the
U pages object; (2) reorganize such that the U pages object is created and
filled in one block; and (3) remove an unnecessary clearing of PG_ZERO.


# 710338e9 16-Aug-2003 Marcel Moolenaar <marcel@FreeBSD.org>

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


# abd498aa 11-Aug-2003 Bruce M Simpson <bms@FreeBSD.org>

Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quashed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks
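
From userland the new calls look like this (a minimal sketch):

    #include <sys/mman.h>
    #include <err.h>

    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        err(1, "mlockall");
    /* ... latency-sensitive work, no page faults ... */
    if (munlockall() == -1)
        err(1, "munlockall");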


# 8f60c087 03-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Change the layout policy of the swap_pager from a hardcoded width
striping to a per device round-robin algorithm.

Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume its 1/N share of the pages if there
is plenty of space on the other devices.

Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.

Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.

Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^44 bytes = 16TB (for i386, with
4K pages).

We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.

A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).

The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.

Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.

Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.

Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.


# 15a7ad60 30-Jul-2003 Peter Wemm <peter@FreeBSD.org>

Add #include "opt_kstack_pages.h" and "opt_kstack_max_pages.h" to remain
in sync with the backend machdep code. When cpu_thread_init() does not
have the same idea of KSTACK_PAGES as the thing that created the kstack,
all hell breaks loose.

Bad alc! no cookie! :-)


# a04a7f22 14-Jun-2003 Alan Cox <alc@FreeBSD.org>

Use #ifdef __alpha__, not __alpha.


# 49a2507b 14-Jun-2003 Alan Cox <alc@FreeBSD.org>

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable the guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


# 89f4fca2 14-Jun-2003 Alan Cox <alc@FreeBSD.org>

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


# 8630c117 12-Jun-2003 Alan Cox <alc@FreeBSD.org>

Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.


# 874651b1 11-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 77e2a274 10-Jun-2003 Peter Wemm <peter@FreeBSD.org>

GC unused cpu_wait() function


# 0b074f6c 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variables

Found by: FlexeLint


# 90af4afa 13-May-2003 John Baldwin <jhb@FreeBSD.org>

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


# 104a9b7e 29-Apr-2003 Alexander Kabaev <kan@FreeBSD.org>

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


# 17cd3642 28-Apr-2003 Alan Cox <alc@FreeBSD.org>

- Lock the vm_object when performing swap_pager_isswapped().
- Assert that the vm_object is locked in swap_pager_isswapped().


# 8f887403 25-Apr-2003 John Baldwin <jhb@FreeBSD.org>

- Don't bother using the proc lock to test just P_SYSTEM as that is set in
fork1() and never changes.
- The proc lock is enough to cover reading p_state, so push down sched_lock
into the PRS_NORMAL case of the switch on p_state.


# 6a07e90d 25-Apr-2003 Alan Cox <alc@FreeBSD.org>

- Lock the vm_object when iterating over its list of resident pages.


# 11edc1e0 23-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Fix compiling in the NO_SWAPPING case.

Submitted by: bde (partially)


# 664f718b 22-Apr-2003 John Baldwin <jhb@FreeBSD.org>

- Always call faultin() in _PHOLD() if PS_INMEM is clear. This closes a
race where a thread could assume that a process was swapped in by
PHOLD() when it actually wasn't fully swapped in yet.
- In faultin(), always msleep() if PS_SWAPPINGIN is set instead of doing
this check after bumping p_lock in the PS_INMEM == 0 case. Also,
sched_lock is only needed for setting and clearing swapping PS_*
flags and the swap thread inhibitor.
- Don't set and clear the thread swap inhibitor in the same loops as the
pmap_swapin/out_thread() since we have to do it under sched_lock.
Instead, mimic the treatment of the PS_INMEM flag and use separate loops
to set the inhibitors when clearing PS_INMEM and clear the inhibitors
when setting PS_INMEM.
- swapout() now returns with the proc lock held as it holds the lock
while adjusting the swapping-related PS_* flags so that the proc lock
can be used to test those flags.
- Only use the proc lock to check the swapping-related PS_* flags in
several places.
- faultin() no longer requires sched_lock to be held by callers.
- Rename PS_SWAPPING to PS_SWAPPINGOUT to be less ambiguous now that we
have PS_SWAPPINGIN.
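
The resulting usage pattern, sketched; PHOLD() may now sleep while
faultin() brings the process back in:

    PHOLD(p);	/* swaps the process in if necessary; may sleep */
    /* ... safely reference the process, e.g. its kernel stack ... */
    PRELE(p);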


# 9faaf3b3 17-Apr-2003 Tom Rhodes <trhodes@FreeBSD.org>

Add some tunable descriptions.

Submitted by: hmp
Discussed with: bde


# 2a3eeaa2 17-Apr-2003 Tom Rhodes <trhodes@FreeBSD.org>

Pre-content whitespace commit.

Discussed with: bde


# c3dfdfd1 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

use 'void *' instead of 'caddr_t' for useracc, kernacc, vslock and vsunlock.


# 2d5c7e45 20-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Close the remaining user address mapping races for physical
I/O, CAM, and AIO. Still TODO: streamline useracc() checks.

Reviewed by: alc, tegge
MFC after: 7 days


# dc907f66 23-Dec-2002 Alan Cox <alc@FreeBSD.org>

- Hold the page queues lock around vm_page_wakeup().


# 92da00bb 15-Dec-2002 Matthew Dillon <dillon@FreeBSD.org>

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


# 1c865ac7 22-Oct-2002 John Baldwin <jhb@FreeBSD.org>

- Check that a process isn't a new process (p_state == PRS_NEW) before
trying to acquire its proc lock since the proc lock may not have been
constructed yet.
- Split up the one big comment at the top of the loop and put the pieces
in the right order above the various checks.

Reported by: kris (1)


# d524d69b 14-Oct-2002 Julian Elischer <julian@FreeBSD.org>

Remove old useless debugging code


# 37c84183 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 05ba50f5 21-Sep-2002 Jake Burkholder <jake@FreeBSD.org>

Use the fields in the sysentvec and in the vm map header in place of the
constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS.
This is mainly so that they can be variable even for the native abi, based
on different machine types. Get stack protections from the sysentvec too.
This makes it trivial to map the stack non-executable for certain abis, on
machines that support it.


# 71fad9fd 11-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# b1f99ebe 09-Sep-2002 Seigo Tanimura <tanimura@FreeBSD.org>

- Do not swap out a process if it is in creation. The process may have no
address space yet.

- Check whether a process is a system process prior to dereferencing
its p_vmspace. AIO assumes that only the curthread switches the address
space of a system process.


# 1faf202e 06-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Use UMA as a complex object allocator.
The process allocator now caches and hands out complete process structures
*including substructures* .

i.e. it gets the process structure with the first thread (and soon KSE)
already allocated and attached, all in one hit.

For the average non-threaded program (non-KSE, that is) the allocated
thread and its stack remain attached to the process, even when the
process is unused and in the process cache. This saves having to
allocate and attach it later, effectively bringing us (hopefully) close
to the efficiency of pre-KSE systems where these were a single
structure.

Reviewed by: davidxu@freebsd.org, peter@freebsd.org


# 1279572a 05-Sep-2002 David Xu <davidxu@FreeBSD.org>

s/SGNL/SIG/
s/SNGL/SINGLE/
s/SNGLE/SINGLE/

Fix abbreviations for the P_STOPPED_* etc. flags; in the original code
they were inconsistent and difficult to distinguish.

Approved by: julian (mentor)


# 239b5b97 31-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


# 9eb881f8 30-Jul-2002 Seigo Tanimura <tanimura@FreeBSD.org>

- Optimize wakeup() and its friends; if a thread woken up is being
swapped in, we do not have to ask for the scheduler thread to do
that.

- Assert that a process is not swapped out in runq functions and
swapout().

- Introduce thread_safetoswapout() for readability.

- In swapout_procs(), perform a test that may block (check of a
thread working on its vm map) first. This lets us call swapout()
with the sched_lock held, providing a better atomicity.


# b7f2cf17 29-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Remove an XXXKSE comment; the code is no longer a problem.


# 1d7b9ed2 29-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Create a new thread state to describe threads that would be ready to run
except for the fact that they are presently swapped out. Also add a process
flag to indicate that the process has started the struggle to swap
back in. This will be needed for the case where multiple threads
start the swapin action, to stop a collision. Also add code to stop
a process from being swapped out if one of the threads in this
process is actually off running on another CPU.. that might hurt...

Submitted by: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>


# 14f8ceaa 28-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


# 1b64ed3b 21-Jul-2002 Seigo Tanimura <tanimura@FreeBSD.org>

Do not pass a thread with the state TDS_RUNQ to setrunqueue(), otherwise
an assertion in setrunqueue() fails.


# e16cfdbe 14-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Lock page queue accesses by vm_page_wire().


# 2d09a6ad 13-Jul-2002 Alan Cox <alc@FreeBSD.org>

o Lock some page queue accesses, in particular, those by vm_page_unwire().


# a7e9138e 11-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Avoid a vm_page_lookup() - that uses a spinlock-protected hash. We can
just use the object's memq for our nefarious purposes.


# f59685a4 07-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Avoid vm_page_lookup() [grabs a spinlock] and just process the upage
object memq instead.

Suggested by: alc


# a136efe9 07-Jul-2002 Peter Wemm <peter@FreeBSD.org>

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


# 8108a145 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

A small cleanup.


# a30ec8f8 03-Jul-2002 Julian Elischer <julian@FreeBSD.org>

Don't call the thread setup routines from here; they are already
called when UMA calls thread_init().


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 43a90f3a 21-Jun-2002 Alan Cox <alc@FreeBSD.org>

o Remove GIANT_REQUIRED from vslock().
o Annotate kernacc(), useracc(), and vslock() as MPSAFE.

Motivated by: alfred


# 319490fb 15-Jun-2002 Alan Cox <alc@FreeBSD.org>

o Remove GIANT_REQUIRED from useracc() and vsunlock(). Neither
vm_map_check_protection() nor vm_map_unwire() expect Giant
to be held.


# 1d7cf06c 14-Jun-2002 Alan Cox <alc@FreeBSD.org>

o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by: tegge


# d974f03c 28-Apr-2002 Alan Cox <alc@FreeBSD.org>

o Introduce and use vm_map_trylock() to replace several direct uses
of lockmgr().
o Add missing synchronization to vmspace_swap_count(): Obtain a read lock
on the vm_map before traversing it.


# 11caded3 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 30171114 19-Mar-2002 Peter Wemm <peter@FreeBSD.org>

Fix a gcc-3.1+ warning.
warning: deprecated use of label at end of compound statement

i.e. you cannot do this anymore:
switch(foo) {
....

default:
}
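
The accepted form simply gives the label a statement to precede, e.g.:

    switch (foo) {
    ....

    default:
        break;
    }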


# 25adb370 18-Mar-2002 Brian Feldman <green@FreeBSD.org>

Back out the modification of vm_map locks from lockmgr to sx locks. The
best path forward now is likely to change the lockmgr locks to simple
sleep mutexes, then see if any extra contention it generates is greater
than the removed overhead of managing local locking state information,
the cost of extra calls into lockmgr, etc.

Additionally, making the vm_map lock a mutex and respecting it properly
will put us much closer to not needing Giant magic in vm.


# 5ee9fe6b 17-Mar-2002 Alan Cox <alc@FreeBSD.org>

Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(),
the motivation for saving and restoring the map->hint in useracc() is gone.
(The same tests that motivated this change in revision 1.57 now show that
there is no performance loss from removing it.) This was really a hack and
some day we would have had to add new synchronization here on map->hint
to maintain it.


# 2f6c16e1 16-Mar-2002 Alan Cox <alc@FreeBSD.org>

Acquire a read lock on the map inside of vm_map_check_protection() rather
than expecting the caller to do so. This (1) eliminates duplicated code in
kernacc() and useracc() and (2) fixes missing synchronization in munmap().


# 0e0af8ec 13-Mar-2002 Brian Feldman <green@FreeBSD.org>

Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
While doing this, move it earlier in the sysinit boot process so that the
VM system can use it.

After that, the system is now able to use sx locks instead of lockmgr
locks in the VM system. To accomplish this, some of the more
questionable uses of the locks (such as testing whether they are
owned or not, as well as allowing shared+exclusive recursion) are
removed, and simpler logic throughout is used so locks should also be
easier to understand.

This has been tested on my laptop for months, and has not shown any
problems on SMP systems, either, so appears quite safe. One more
user of lockmgr down, many more to go :)


# a1287949 10-Mar-2002 Eivind Eklund <eivind@FreeBSD.org>

- Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by: alc


# dd50331c 25-Feb-2002 Peter Wemm <peter@FreeBSD.org>

Remove unused variable (td)


# 2c100766 11-Feb-2002 Julian Elischer <julian@FreeBSD.org>

In a threaded world, different priorities become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there,
but this removes all the code that assumes that, in preparation for the
next commit which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# 582ec34c 05-Feb-2002 Alfred Perlstein <alfred@FreeBSD.org>

Fix a race with free'ing vmspaces at process exit when vmspaces are
shared.

Also introduce vm_endcopy instead of using pointer tricks when
initializing new vmspaces.

The race occurred because of how the reference was utilized:
test vmspace reference,
possibly block,
decrement reference

When sharing a vmspace between multiple processes it was possible
for two processes exiting at the same time to test the reference
count, possibly block and neither one free because they wouldn't
see the other's update.

Submitted by: green


# e50f5c2e 17-Jan-2002 Bruce Evans <bde@FreeBSD.org>

Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.

Fixed some style bugs.


# c86b6ff5 05-Jan-2002 John Baldwin <jhb@FreeBSD.org>

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex release and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# cbc89bfb 10-Oct-2001 Paul Saab <ps@FreeBSD.org>

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks
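
So the old compile-time constants can now be overridden from
loader.conf, e.g. (values are purely illustrative):

    kern.maxdsiz="1073741824"	# MAXDSIZ: max data segment size
    kern.maxssiz="134217728"	# MAXSSIZ: max stack segment size
    kern.sgrowsiz="131072"	# SGROWSIZ: stack growth increment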


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# eb30c1c0 09-Sep-2001 Peter Wemm <peter@FreeBSD.org>

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


# 54d92145 04-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

whitespace / register cleanup


# 0cddd8f0 04-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 69a78d46 19-Jun-2001 John Baldwin <jhb@FreeBSD.org>

Put the scheduler, vmdaemon, and pagedaemon kthreads back under Giant for
now. The proc locking isn't actually safe yet and won't be until the proc
locking is finished.


# 3a2189d4 23-May-2001 John Baldwin <jhb@FreeBSD.org>

- Lock the VM around the pmap_swapin_proc() call in faultin().
- Don't lock Giant in the scheduler() function except for when calling
faultin().
- In swapout_procs(), lock the VM before the process to avoid a lock order
violation.
- In swapout_procs(), release the allproc lock before calling swapout().
We restart the process scan after swapping out a process.
- In swapout_procs(), un #if 0 the code to bump the vmspace reference count
and lock the process' vm structures. This bug was introduced by me and
could result in the vmspace being free'd out from under a running
process.
- Fix an old bug where the vmspace reference count was not free'd if we
failed the swap_idle_threshold2 test.


# 23955314 18-May-2001 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# ea754954 17-May-2001 John Baldwin <jhb@FreeBSD.org>

- Use a timeout for the tsleep in scheduler() instead of having vmmeter()
wakeup proc0 by hand to enforce the timeout.
- When swapping out a process, keep the process locked via the proc lock
from the first checks up until we clear PS_INMEM and set PS_SWAPPING in
swapout(). The swapout() function now must be called with the proc lock
held and releases it before returning.
- Comment out the code to attempt to lock a process' VM structures before
swapping out. It is broken in that it releases the lock after obtaining
it. If it does grab the lock, it needs to hand it off to swapout()
instead of releasing it. This can be revisited when the VM is locked
as this is a valid test to perform. It also causes a lock order reversal
for the time being, which is the immediate cause for temporarily
disabling it.


# c96d52a9 15-May-2001 John Baldwin <jhb@FreeBSD.org>

- Use PROC_LOCK_ASSERT instead of a direct mtx_assert.
- Don't hold Giant in the swapper daemon while we walk the list of
processes looking for a process to swap back in.
- Don't bother grabbing the sched_lock while checking a process' sleep
time in swapout_procs() to ensure that a process has been idle for at
least swap_idle_threshold2 before swapping it out. If we lose the race
we just let a process stay in memory until the next call of
swapout_procs().
- Remove some unneeded spl's, sched_lock does all the locking needed in
this case.


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propagate_priority(). It should now have the desired
effect without needing to also propagate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propagate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
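
In practice the new interface reads like this (a sketch; the sleep
vs. spin distinction now lives in the function name rather than a
flag argument):

    mtx_lock(&example_mtx);		/* MTX_DEF (sleep) mutex */
    /* ... critical section ... */
    mtx_unlock(&example_mtx);

    mtx_lock_spin(&example_spin_mtx);	/* MTX_SPIN mutex */
    mtx_unlock_spin(&example_spin_mtx);

    /* The surviving flags go through the explicit wrappers: */
    mtx_lock_flags(&example_mtx, MTX_QUIET);
    mtx_unlock_flags(&example_mtx, MTX_QUIET);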


# 45ece682 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Doh, lock faultin() with proc lock in scheduler().
- Lock p_swtime with sched_lock in scheduler() as well.


# 69b40456 23-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Argh, I didn't get this test right when I converted it. Break this up
into two separate if's instead of nested if's. Also, reorder things
slightly to avoid unnecessary mutex operations.


# 5074aecd 24-Jan-2001 John Baldwin <jhb@FreeBSD.org>

- Catch up to proc flag changes.
- Proc locking in a few places.
- faultin() now must be called with the proc lock held.
- Split up swappable() into a couple of tests so that it can be locked in
swapout_procs().
- Use queue macros.


# c0c25570 12-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
passed to specify shared, exclusive or release, which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# c8a6b001 01-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Protect p_stat with sched_lock.


# 553629eb 22-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 915cf38b 24-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- Catch a machine/mutex.h -> sys/mutex.h I somehow missed.
- Close a small race condition. The sched_lock mutex protects
p->p_stat as well as the run queues. Another CPU could change p_stat
of the process while we are waiting for the lock, and we would end up
scheduling a process that isn't runnable.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# ed6aff73 18-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


# 5929bcfa 27-Mar-2000 Philippe Charnier <charnier@FreeBSD.org>

Revert spelling mistake I made in the previous commit
Requested by: Alan and Bruce


# 956f3135 26-Mar-2000 Philippe Charnier <charnier@FreeBSD.org>

Spelling


# 7de47255 13-Mar-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused 3rd argument from vsunlock() which abused B_WRITE.


# 91c28bfd 05-Dec-1999 Luoqi Chen <luoqi@FreeBSD.org>

User ldt sharing.


# 12a69a6a 30-Oct-1999 Alan Cox <alc@FreeBSD.org>

Reverse the sense of the test in the KASSERT's from the last commit.


# 02c58685 30-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting by the confusion caused by this.

Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)
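
Callers now express intent with VM_PROT_* bits, e.g. (a sketch;
useracc() still took a caddr_t at the time):

    if (!useracc((caddr_t)uaddr, len, VM_PROT_READ | VM_PROT_WRITE))
        return (EFAULT);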


# 923502ff 29-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 90ecac61 16-Sep-1999 Matthew Dillon <dillon@FreeBSD.org>

Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>

Replace various VM related page count calculations strewn over the
VM code with inlines to aid in readability and to reduce fragility
in the code where modules depend on the same test being performed
to properly sleep and wakeup.

Split out a portion of the page deactivation code into an inline
in vm_page.c to support vm_page_dontneed().

add vm_page_dontneed(), which handles the madvise MADV_DONTNEED
feature in a related commit coming up for vm_map.c/vm_object.c. This
code prevents degenerate cases where an essentially active page may
be rotated through a subset of the paging lists, resulting in premature
disposal.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 8928d440 18-Aug-1999 Peter Wemm <peter@FreeBSD.org>

Update for run queue code.


# d4da2dba 21-Jul-1999 Alan Cox <alc@FreeBSD.org>

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


# 6ea5bd80 19-Jun-1999 Alan Cox <alc@FreeBSD.org>

Remove some unused function and variable declarations.


# 637cae1d 05-Apr-1999 Peter Wemm <peter@FreeBSD.org>

Only use p->p_lock (managed by PHOLD()/PRELE()) - P_NOSWAP/P_PHYSIO is no
longer set.


# b1028ad1 19-Feb-1999 Luoqi Chen <luoqi@FreeBSD.org>

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>
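
The accessor itself is a one-liner; a sketch, assuming the pmap is still
embedded directly in the vmspace:

    static __inline pmap_t
    vmspace_pmap(struct vmspace *vmspace)
    {
        return (&vmspace->vm_pmap);
    }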


# 88c5ea45 25-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Enable Linux threads support by default.
This removes the conditionals from code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 6de60793 21-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Removed low-memory blockages at fork. This is the wrong place to put
this sort of test. We need to fix the low-memory handling in general.


# 1c7c3c6a 21-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# dc9c271a 07-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Changes to the LINUX_THREADS support to allocate the extra memory for
shared signal handling only when shared signal handling is actually
in use.

This removes the main objection to making the shared signal handling
a standard ability in rfork() and friends and 'unconditionalising'
this code. (i.e. the allocation of an extra 328 bytes per process).

Signal handling information remains in the U area until such a time as
its reference count would be incremented to > 1. At that point a new
struct is malloc'd and maintained in KVM so that it can be shared between
the processes (threads) using it.

A function to check the reference count and move the struct back to the U
area when it drops back to 1 is also supplied. Signal information is
therefore now swappable for all processes that are not sharing that
information with other processes. This should address the concerns raised
by Garrett and others.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 39fb8e6b 19-Dec-1998 Julian Elischer <julian@FreeBSD.org>

Fix two bogons created by 'patch(1)' in my last commit.


# 6626c604 18-Dec-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

Not enabled by default. This code depends on the conditional
COMPAT_LINUX_THREADS (suggested by Garrett). This is not yet a 'real'
option, but will become one within some number of hours.


# 6cde7a16 13-Oct-1998 David Greenman <dg@FreeBSD.org>

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
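
A sketch of the class of bug in 2) (illustrative macro text, not the
exact historical definition):

    /* Buggy: the cast truncates a 64-bit off_t to a 32-bit vm_offset_t
     * before rounding, scrambling large offsets and sizes. */
    #define round_page(x)   ((((vm_offset_t)(x)) + PAGE_MASK) & ~PAGE_MASK)

    /* Fixed: no cast inside the macro; callers now cast explicitly
     * where their types are known to fit. */
    #define round_page(x)   (((x) + PAGE_MASK) & ~PAGE_MASK)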


# faa5f8d8 29-Sep-1998 Andrzej Bialecki <abial@FreeBSD.org>

Make #define NO_SWAPPING a normal kernel config option.

Reviewed by: jkh


# 917e476d 04-Mar-1998 Peter Dufault <dufault@FreeBSD.org>

Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables. Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.


# 303b270b 08-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Staticize.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 95461b45 04-Feb-1998 John Dyson <dyson@FreeBSD.org>

1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a priori, both to avoid some
processor errata and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean up pmap_incore.
8) Cluster initial kern_exec pagein.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects; make sure that vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 2d8acc0f 22-Jan-1998 John Dyson <dyson@FreeBSD.org>

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to the upages structure after being freed.
Struct vmspace continues to point to the pte object and kva space for the
kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM,
allowing us to avoid unneeded restart overhead during traversals where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, clean up, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 3a2dc656 10-Dec-1997 John Dyson <dyson@FreeBSD.org>

Fix the prototype for swapout_procs().
Submitted by: dima@best.net


# ceb0cf87 05-Dec-1997 John Dyson <dyson@FreeBSD.org>

Support an optional, sysctl-enabled feature of idle process swapout. This
is apparently useful for large shell systems, or systems with long running
idle processes. To enable the feature:

sysctl -w vm.swap_idle_enabled=1

Please note that some of the other vm sysctl variables have been renamed
to be more accurate.
Submitted by: Much of it from Matt Dillon <dillon@best.net>


# 4a11ca4e 07-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 79624e21 31-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# b1037dcd 21-Aug-1997 Bruce Evans <bde@FreeBSD.org>

#include <machine/limits.h> explicitly in the few places that it is required.


# e108835b 13-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Unused variable (upobj is now purely handled within pmap)


# 5856e12e 12-Apr-1997 John Dyson <dyson@FreeBSD.org>

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads' address space via an exec, so we
now automatically divorce the address space before modifying it.
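
Illustrative calls for the new modes (hypothetical call sites; flags from
<unistd.h>):

    rfork(RFFDG);   /* no RFPROC: split this thread's descriptor
                     * table off from the rest of the group */
    rfork(RFCFDG);  /* no RFPROC: close all file descriptors */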


# a2a1c95c 07-Apr-1997 Peter Wemm <peter@FreeBSD.org>

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer to each
process's tss, since the esp0 pointer is a 32-bit pointer and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts; we no longer have to
double map the UPAGES, which simplifies things and should measurably speed
up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffrey Hsu <hsu@freebsd.org>


# 16a02c11 15-Jan-1997 Bruce Evans <bde@FreeBSD.org>

Removed redundant spl0()'s from kernel processes. They were work-arounds
for a bug in fork().


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# bc0d3334 29-Dec-1996 John Dyson <dyson@FreeBSD.org>

EEEK!!! useracc and kernacc didn't lock their respective
maps. Additionally, eliminate the map->hint distortion
associated with useracc. That may/may-not be the "right"
thing to do -- but time will tell.
Submitted by: Partially by Alan Cox <alc@cs.rice.edu>


# e9822d92 22-Dec-1996 Joerg Wunsch <joerg@FreeBSD.org>

Make DFLDSIZ and MAXDSIZ fully-supported options.

"Don't forget to do a ``make depend''" :-)


# a669a6e9 16-Oct-1996 John Dyson <dyson@FreeBSD.org>

Make processes waken up eligible for immediate swap-in.


# 675878e7 14-Oct-1996 John Dyson <dyson@FreeBSD.org>

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


# dd106ca7 15-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Removed iprintf(). It was copied to db_iprintf() in ddb.


# a6e6bcc5 01-Jul-1996 John Dyson <dyson@FreeBSD.org>

Properly set the PG_MAPPED and PG_WRITEABLE flags. This fixes some potential
problems with vm_map_remove/vm_map_delete.


# 877329e0 30-Jun-1996 John Dyson <dyson@FreeBSD.org>

Make -current consistent with -stable regarding the time that a process
sleeps before being swapped out. The time is increased from 4 secs to
10 secs. Originally I had decreased it from 20 to 4, but that is a bit
severe. 20 is too long though.


# f35329ac 30-May-1996 John Dyson <dyson@FreeBSD.org>

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


# b18bfc3d 17-May-1996 John Dyson <dyson@FreeBSD.org>

This set of commits to the VM system does the following, and contains
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existent condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


# a8c5fef5 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

KGDB is dead. It may come back one day if somebody does it.


# d3a34985 08-Apr-1996 John Dyson <dyson@FreeBSD.org>

Reinstitute the map lock for processes being swapped out. This
is needed because of the vm_fault used to bring the page table page
for the kernel stack (UPAGES) back in. The consequence of the
previous incorrect change was a system hang.


# b5cfb15f 07-Apr-1996 John Dyson <dyson@FreeBSD.org>

Map lock checks not needed anymore for swapping out. We don't use
map operations for it anymore. Certain deadlocks should never happen
anymore.


# 6ffde942 07-Apr-1996 Bruce Evans <bde@FreeBSD.org>

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


# 030ad080 02-Apr-1996 John Dyson <dyson@FreeBSD.org>

Fixed a problem where the UPAGES of a process were being run down
in a suboptimal manner. I had also noticed some panics that appeared
to be at least superficially caused by this problem. Also, included
are some minor mods to support more general handling of page table page
faulting. More details in a future commit.


# 8169788f 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.


# 1b67ec6d 10-Mar-1996 Jeffrey Hsu <hsu@FreeBSD.org>

For Lite2: proc LIST changes.
Reviewed by: davidg & bde


# 9ea85708 08-Mar-1996 John Dyson <dyson@FreeBSD.org>

Delay forking a process until there are more pages available. It was
possible to deadlock with the low threshold that we had used.


# de5f6a77 01-Mar-1996 John Dyson <dyson@FreeBSD.org>

1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.


# dabee6fe 23-Feb-1996 Peter Wemm <peter@FreeBSD.org>

kern_descrip.c: add fdshare()/fdcopy()
kern_fork.c: add the tiny bit of code for rfork operation.
kern/sysv_*: shmfork() takes one less arg, it was never used.
sys/shm.h: drop "isvfork" arg from shmfork() prototype
sys/param.h: declare rfork args.. (this is where OpenBSD put it..)
sys/filedesc.h: protos for fdshare/fdcopy.
vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where
it makes sense.
vm/*: drop unused isvfork arg.

Note: this rfork() implementation copies the address space mappings,
it does not connect the mappings together. ie: once the two processes
have split, the pages may be shared, but the address space is not. If one
does an mmap() etc., it does not appear in the other. This makes it
unsuitable for pthreads, but it is useful in its own right for having
light-weight threads in a static shared address space.

Obtained from: Original by Ron Minnich, extended by OpenBSD
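
A sketch of the new minherit interface from the caller's side
(hypothetical helper; INHERIT_* constants from <sys/mman.h>):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <err.h>

    static void
    private_region(void *addr, size_t len)
    {
        /* Keep this mapping out of any child created later. */
        if (minherit(addr, len, INHERIT_NONE) == -1)
            err(1, "minherit");
    }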


# 5afce282 22-Feb-1996 David Greenman <dg@FreeBSD.org>

Add a "NO_SWAPPING" option to disable swapping. This was originally done
to help diagnose a problem on wcarchive (where the kernel stack was
sometimes not present), but is useful in its own right since swapping
actually reduces performance on some systems (such as wcarchive).
Note: swapping in this context means making the U pages pageable and has
nothing to do with generic VM paging, which is unaffected by this option.

Reviewed by: <dyson>


# 2c68345a 28-Jan-1996 David Greenman <dg@FreeBSD.org>

Added a check/panic for vm_map_find failing to find space for the page
tables/u-pages when forking. This is a "can't happen" case. :-)


# f782b11a 25-Jan-1996 Poul-Henning Kamp <phk@FreeBSD.org>

Don't use %r, we haven't got it anymore.
Submitted by: bde


# bd7e5f99 18-Jan-1996 John Dyson <dyson@FreeBSD.org>

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30 seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 0e41ee30 04-Jan-1996 Garrett Wollman <wollman@FreeBSD.org>

Convert DDB to new-style option.
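
With a new-style option, the knob moves into a generated per-option
header; a sketch of the convention, assuming the generated name opt_ddb.h:

    /* The kernel config says "options DDB"; sources that care do: */
    #include "opt_ddb.h"

    #ifdef DDB
        /* debugger-only code */
    #endif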


# 50c73f36 04-Jan-1996 Garrett Wollman <wollman@FreeBSD.org>

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


# f708ef1b 14-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Another mega commit to staticize things.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# d841aaa7 02-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Finished (?) cleaning up sysinit stuff.


# aef922f5 05-Nov-1995 John Dyson <dyson@FreeBSD.org>

Greatly simplify the msync code. Eliminate complications in vm_pageout
for msyncing. Remove a bug that manifests itself primarily on NFS
(the dirty range on the buffers is not set on msync.)


# a91c5a7e 22-Oct-1995 John Dyson <dyson@FreeBSD.org>

Get rid of machine-dependent NBPG and replace with PAGE_SIZE.


# 0bc3a911 15-Oct-1995 John Dyson <dyson@FreeBSD.org>

Remove an unnecessary tsleep in the swapin code. This tsleep
can defer swapping in processes and is just not the right thing to do.


# 0e27a03f 24-Sep-1995 John Dyson <dyson@FreeBSD.org>

Perform more checking for proper loading of the UPAGES when a process
is swapped in. Also, remove unnecessary map locking/unlocking during
selection of processes to be swapped out.

This code might afford proper panics as opposed to spontaneous reboots
on certain systems. This should allow us to debug these problems better.


# c87ad37e 16-Sep-1995 David Greenman <dg@FreeBSD.org>

Check the return value from vm_map_pageable() when mapping the process's
UPAGES and associated page table page. Panic on error. This is less than
optimal and will be fixed in the future, but is better than the old
behavior of panicking with a "kernel page directory invalid" in pmap_enter.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
They are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. Thus a new module can
be added to the kernel without editing any files other than the
'files' file.
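
A minimal sketch of the resulting convention (hypothetical subsystem and
order values; the SYSINIT machinery grew out of this init work):

    static void
    mymod_init(void *dummy)
    {
        /* Called automatically during kernel startup. */
    }
    SYSINIT(mymod, SI_SUB_KMEM, SI_ORDER_ANY, mymod_init, NULL);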


# 24a1cce3 13-Jul-1995 David Greenman <dg@FreeBSD.org>

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers". (A sketch of such an operations
table appears after this list.)

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, an un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although this will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Its marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes)
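
A sketch of the per-type operations table described in 1) (assumed field
names and types; the real struct pagerops differs in detail):

    struct pagerops {
        void        (*pgo_init)(void);
        vm_object_t (*pgo_alloc)(void *handle, vm_size_t size,
                        vm_prot_t prot, vm_offset_t offset);
        void        (*pgo_dealloc)(vm_object_t object);
        int         (*pgo_getpages)(vm_object_t object, vm_page_t *m,
                        int count, int reqpage);
        int         (*pgo_putpages)(vm_object_t object, vm_page_t *m,
                        int count, boolean_t sync, int *rtvals);
        boolean_t   (*pgo_haspage)(vm_object_t object, vm_offset_t offset,
                        int *before, int *after);
        void        (*pgo_sync)(void);
    };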

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 6306c897 10-Jul-1995 David Greenman <dg@FreeBSD.org>

swapout_threads() -> swapout_procs().


# f2daac0c 10-Jul-1995 David Greenman <dg@FreeBSD.org>

Increased global RSS limit to total RAM.


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# c3cb3e12 15-Apr-1995 David Greenman <dg@FreeBSD.org>

Moved some zero-initialized variables into .bss. Made code intended to be
called only from DDB #ifdef DDB. Removed some completely unused globals.


# 3aa12267 28-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


# 2c0bc396 20-Feb-1995 David Greenman <dg@FreeBSD.org>

Panic if u_map allocation fails.


# ff16eeef 20-Feb-1995 David Greenman <dg@FreeBSD.org>

VM for the kernel stack and page tables doesn't need to be explicitly
deallocated as it isn't inherited across the fork.
Use vm_map_find not vm_allocate.

Submitted by: John Dyson


# 7aa3a026 13-Feb-1995 Poul-Henning Kamp <phk@FreeBSD.org>

YF Fix.


# a1f6d91c 02-Feb-1995 David Greenman <dg@FreeBSD.org>

swap_pager.c:
Fixed long standing bug in freeing swap space during object collapses.
Fixed 'out of space' messages from printing out too often.
Modified to use new kmem_malloc() calling convention.
Implemented an additional stat in the swap pager struct to count the
amount of space allocated to that pager. This may be removed at some
point in the future.
Minimized unnecessary wakeups.

vm_fault.c:
Don't try to collect fault stats on 'swapped' processes - there aren't
any upages to store the stats in.
Changed read-ahead policy (again!).

vm_glue.c:
Be sure to gain a reference to the process's map before swapping.
Be sure to lose it when done.

kern_malloc.c:
Added the ability to specify if allocations are at interrupt time or
are 'safe'; this affects what types of pages can be allocated.

vm_map.c:
Fixed a variety of map lock problems; there's still a lurking bug that
will eventually bite.

vm_object.c:
Explicitly initialize the object fields rather than bzeroing the struct.
Eliminated the 'rcollapse' code and folded its functionality into
"real" collapse routine.
Moved an object_unlock() so that the backing_object is protected in
the qcollapse routine.
Make sure nobody fools with the backing_object when we're destroying it.
Added some diagnostic code which can be called from the debugger that
looks through all the internal objects and makes certain that they
all belong to someone.

vm_page.c:
Fixed a rather serious logic bug that would result in random system
crashes. Changed pagedaemon wakeup policy (again!).

vm_pageout.c:
Removed unnecessary page rotations on the inactive queue.
Changed the number of pages to explicitly free to just free_reserved
level.

Submitted by: John Dyson


# 6d40c3d3 24-Jan-1995 David Greenman <dg@FreeBSD.org>

Added ability to detect sequential faults and DTRT. (swap_pager.c)
Added hook for pmap_prefault() and use symbolic constant for new third
argument to vm_page_alloc() (vm_fault.c, various)
Changed the way that upages and page tables are held. (vm_glue.c)
Fixed architectural flaw in allocating pages at interrupt time that was
introduced with the merged cache changes. (vm_page.c, various)
Adjusted some algorithms to achieve better paging performance and to
accommodate the fix for the architectural flaw mentioned above. (vm_pageout.c)
Fixed pbuf handling problem, changed policy on handling read-behind page.
(vnode_pager.c)

Submitted by: John Dyson


# 480dff54 10-Jan-1995 David Greenman <dg@FreeBSD.org>

Fixed some formatting weirdness that I overlooked in the previous commit.


# 0d94caff 09-Jan-1995 David Greenman <dg@FreeBSD.org>

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# dd0bd066 17-Dec-1994 David Greenman <dg@FreeBSD.org>

Change swapping policy to be a bit more aggressive about finding a
candidate for swapout. Increased default RSS limit to a minimum of 2MB.


# bfbfac11 12-Nov-1994 David Greenman <dg@FreeBSD.org>

Implemented swap locking via P_SWAPPING flag. It was possible for a process
to be chosen for swap-in while it was being swapped-out. This was BAD.

Submitted by: John Dyson


# 05f0fdd2 08-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by: davidg


# 11b224dc 12-Sep-1994 David Greenman <dg@FreeBSD.org>

Fixed a bug I introduced when fixing the rss limit code. Changed swapout
policy to be a bit more selective about what processes get swapped out.

Reviewed by: John Dyson


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 92a34c0c 09-Aug-1994 David Greenman <dg@FreeBSD.org>

Removed an old, obsolete call to vmmeter(). This is called now in the
schedcpu() routine in kern/kern_synch.c. This extra call to vmmeter() in
vm_glue.c was what was totally messing up the load average calculations.


# bbc0ec52 03-Aug-1994 David Greenman <dg@FreeBSD.org>

Integrated VM system improvements/fixes from FreeBSD-1.1.5.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources