History log of /freebsd-current/sys/vm/vm_pageout.h
Revision Date Author Comments
# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 81a34d37 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: retire pr_drain and use EVENTHANDLER(9) directly

The method was called for two different conditions: 1) the VM layer is
low on pages or 2) one of the UMA zones of the mbuf allocator is exhausted.
This change moves 2) into a new event handler, but all affected network
subsystems are modified to subscribe to both, so this change shall not
bring functional changes under different low memory situations.

There were three subsystems still using pr_drain: TCP, SCTP and frag6.
The latter had its protosw entry for the only reason to register its
pr_drain method.

Reviewed by: tuexen, melifaro
Differential revision: https://reviews.freebsd.org/D36164


# 89d2fb14 08-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Add interruptible variant of vm_wait(9), vm_wait_intr(9).

Also add msleep flags argument to vm_wait_doms(9).

Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652


# 245139c6 16-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix OOM handling of some corner cases.

In addition to pagedaemon initiating OOM, also do it from the
vm_fault() internals. Namely, if the thread waits for a free page to
satisfy page fault some preconfigured amount of time, trigger OOM.
These triggers are rate-limited, due to a usual case of several
threads of the same multi-threaded process to enter fault handler
simultaneously. The faults from pagedaemon threads participate in the
calculation of OOM rate, but are not under the limit.

Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13671


# 54a3a114 13-May-2019 Mark Johnston <markj@FreeBSD.org>

Provide separate accounting for user-wired pages.

Historically we have not distinguished between kernel wirings and user
wirings for accounting purposes. User wirings (via mlock(2)) were
subject to a global limit on the number of wired pages, so if large
swaths of physical memory were wired by the kernel, as happens with
the ZFS ARC among other things, the limit could be exceeded, causing
user wirings to fail.

The change adds a new counter, v_user_wire_count, which counts the
number of virtual pages wired by user processes via mlock(2) and
mlockall(2). Only user-wired pages are subject to the system-wide
limit which helps provide some safety against deadlocks. In
particular, while sources of kernel wirings typically support some
backpressure mechanism, there is no way to reclaim user-wired pages
short of killing the wiring process. The limit is exported as
vm.max_user_wired, renamed from vm.max_wired, and changed from u_int
to u_long.

The choice to count virtual user-wired pages rather than physical
pages was done for simplicity. There are mechanisms that can cause
user-wired mappings to be destroyed while maintaining a wiring of
the backing physical page; these make it difficult to accurately
track user wirings at the physical page layer.

The change also closes some holes which allowed user wirings to succeed
even when they would cause the system limit to be exceeded. For
instance, mmap() may now fail with ENOMEM in a process that has called
mlockall(MCL_FUTURE) if the new mapping would cause the user wiring
limit to be exceeded.

Note that bhyve -S is subject to the user wiring limit, which defaults
to 1/3 of physical RAM. Users that wish to exceed the limit must tune
vm.max_user_wired.

Reviewed by: kib, ngie (mlock() test changes)
Tested by: pho (earlier version)
MFC after: 45 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19908
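
The accounting described in the entry above can be sketched in userland C
as a counter checked against a limit. This is only an illustration under
stated assumptions: user_wire_reserve() and user_wire_release() are
hypothetical names, and the compare-and-swap loop is one plausible way to
keep the check and the update atomic, not necessarily what the kernel does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of a system-wide user-wiring budget.
 * The names are illustrative and are not the kernel's.
 */
static bool
user_wire_reserve(atomic_ulong *count, unsigned long limit,
    unsigned long npages)
{
	unsigned long old = atomic_load(count);

	do {
		/* Refuse if old + npages would exceed the limit (no overflow). */
		if (npages > limit || old > limit - npages)
			return (false);
	} while (!atomic_compare_exchange_weak(count, &old, old + npages));
	return (true);
}

static void
user_wire_release(atomic_ulong *count, unsigned long npages)
{
	unsigned long prev = atomic_fetch_sub(count, npages);

	assert(prev >= npages);	/* releasing more than was reserved is a bug */
	(void)prev;
}
```

A caller in this model would reserve before wiring and fail the request
(the entry mentions ENOMEM for mmap()) when the reservation is refused.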


# f82dd310 20-Nov-2018 Ben Widawsky <bwidawsk@FreeBSD.org>

linuxkpi: Use pageproc instead of vmproc

According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages. vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary. It's a
somewhat legacy component of the system and isn't required. You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D18061


# c3f4f28c 20-Nov-2018 Ben Widawsky <bwidawsk@FreeBSD.org>

linuxkpi: Add some basic swap functions

These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by: mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D18052


# 23984ce5 06-Sep-2018 Mark Johnston <markj@FreeBSD.org>

Avoid resource deadlocks when one domain has exhausted its memory. Attempt
other allowed domains if the requested domain is below the minimum paging
threshold. Block in fork only if all domains available to the forking
thread are below the severe threshold rather than any.

Submitted by: jeff
Reported by: mjg
Reviewed by: alc, kib, markj
Approved by: re (rgrimes)
Differential Revision: https://reviews.freebsd.org/D16191
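
The change from "any" to "all" in the fork blocking rule above can be
illustrated with domain bitmasks. This is a sketch under assumptions: the
two masks and both function names are hypothetical, and each bit stands
for one memory domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each bit is one memory domain; both masks are hypothetical inputs. */

/* Old rule: block if any allowed domain is below the severe threshold. */
static bool
fork_blocks_any(uint64_t allowed, uint64_t severe)
{
	return ((allowed & severe) != 0);
}

/* New rule: block only if every allowed domain is below the threshold. */
static bool
fork_blocks_all(uint64_t allowed, uint64_t severe)
{
	assert(allowed != 0);	/* a thread must be allowed at least one domain */
	return ((allowed & ~severe) == 0);
}
```

With two allowed domains of which one is severely short, the old rule
blocks and the new rule does not, which is the deadlock case the entry
addresses.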


# 5d3b3666 16-Mar-2018 Conrad Meyer <cem@FreeBSD.org>

Fix GCC build: Remove redundant pagedaemon_wakeup declaration

Introduced in r331018.

Reported by: kevans
Sponsored by: Dell EMC Isilon


# cd84455f 24-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

Hide all vm/vm_pageout.h content under #ifdef _KERNEL.

There are no parts useful for usermode applications in
vm/vm_pageout.h. Even for the specific applications like fstat and
lsof.

In my opinion, this protection is redundant and instead userspace
should not include the header at all. Since there are apparently
broken third party codebases, give them a bit of slack by providing
transitional period.

Reported by: julian
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2c0f13aa 20-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

vm_wait() rework.

Make vm_wait() take the vm_object argument which specifies the domain
set to wait for the min condition pass. If there is no object
associated with the wait, use curthread's policy domainset. The
mechanics of the wait in vm_wait() and vm_wait_domain() is supplied by
the new helper vm_wait_doms(), which directly takes the bitmask of the
domains to wait for passing min condition.

Eliminate pagedaemon_wait(). vm_domain_clear() handles the same
operations.

Eliminate VM_WAIT and VM_WAITPFAULT macros, the direct functions calls
are enough.

Eliminate several control state variables from vm_domain, unneeded
after the vm_wait() conversion.

Sketched and reviewed by: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation, Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D14384


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# 5c515efc 29-Dec-2017 Alan Cox <alc@FreeBSD.org>

After r327168, the variable "vm_pageout_wanted" can be static.

MFC after: 2 weeks


# 280d15cd 24-Dec-2017 Mark Johnston <markj@FreeBSD.org>

Fix two problems with the page daemon control loop.

Both issues caused the page daemon to erroneously go to sleep when
applications are consuming free pages at a high rate, leaving the
application threads blocked in VM_WAIT.

1) After completing an inactive queue scan, concurrent allocations may
have prevented the page daemon from meeting the v_free_min threshold.
In this case, the page daemon was going to sleep even when the
inactive queue contained plenty of clean pages.
2) pagedaemon_wakeup() may be called without the free queues lock held.
This can lead to a lost wakeup if a call occurs after the page daemon
clears vm_pageout_wanted but before going to sleep.

Fix 1) by ensuring that we start a new inactive queue scan immediately
if v_free_count < v_free_min after a prior scan.

Fix 2) by adding a new subroutine, pagedaemon_wait(), called from
vm_wait() and vm_waitpfault(). It wakes up the page daemon if either
vm_pages_needed or vm_pageout_wanted is false, and atomically sleeps
on v_free_count.

Reported by: jeff
Reviewed by: alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D13424


# 796df753 30-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

SPDX: Consider code from Carnegie-Mellon University.

Interesting cases, most likely from CMU Mach sources.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# ac04195b 20-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Move swapout code into vm/vm_swapout.c.

There is no NO_SWAPPING #ifdef left in the code.

Requested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12663


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 9b43bc27 25-Feb-2017 Andriy Gapon <avg@FreeBSD.org>

call vm_lowmem hook in uma_reclaim_worker

A comment near kmem_reclaim() implies that we already did that.
Calling the hook is useful, because some handlers, e.g. ARC,
might be able to release significant amounts of KVA.

Now that we have more than one place where vm_lowmem hook is called,
use this change as an opportunity to introduce flags that describe
a reason for calling the hook. No handler makes use of the flags yet.

Reviewed by: markj, kib
MFC after: 1 week
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9764
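
A hook that carries a reason, as introduced in the entry above, might look
like the following userland sketch. Everything here is an assumption made
for illustration: the SKETCH_LOW_* flag names, the fixed-size handler
table, and the lowmem_register()/lowmem_invoke() helpers are hypothetical
and do not reproduce the kernel's event handler interface. The example
handler inspects the flags only to show what they make possible; the
entry itself notes that no handler did so at the time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason flags; the commit does not name the real ones. */
#define SKETCH_LOW_KMEM  0x01	/* kernel virtual address space is short */
#define SKETCH_LOW_PAGES 0x02	/* physical pages are short */

typedef void (*lowmem_fn)(void *arg, int flags);

#define SKETCH_MAX_HANDLERS 8
static struct { lowmem_fn fn; void *arg; } handlers[SKETCH_MAX_HANDLERS];
static size_t nhandlers;

static void
lowmem_register(lowmem_fn fn, void *arg)
{
	assert(nhandlers < SKETCH_MAX_HANDLERS);
	handlers[nhandlers].fn = fn;
	handlers[nhandlers].arg = arg;
	nhandlers++;
}

/* Call every handler, passing along why the hook fired. */
static void
lowmem_invoke(int flags)
{
	for (size_t i = 0; i < nhandlers; i++)
		handlers[i].fn(handlers[i].arg, flags);
}

/* Example handler: only reacts to KVA pressure, ignores page shortage. */
static void
cache_lowmem(void *arg, int flags)
{
	int *released = arg;

	if (flags & SKETCH_LOW_KMEM)
		(*released)++;
}
```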


# 56ce0690 27-May-2016 Alan Cox <alc@FreeBSD.org>

The flag "vm_pages_needed" has long served two distinct purposes: (1) to
indicate that threads are waiting for free pages to become available and
(2) to indicate whether a wakeup call has been sent to the page daemon.
The trouble is that a single flag cannot really serve both purposes, because
we have two distinct targets for when to wakeup threads waiting for free
pages versus when the page daemon has completed its work. In particular,
the flag will be cleared by vm_page_free() before the page daemon has met
its target, and this can lead to the OOM killer being invoked prematurely.
To address this problem, a new flag "vm_pageout_wanted" is introduced.

Discussed with: jeff
Reviewed by: kib, markj
Tested by: markj
Sponsored by: EMC / Isilon Storage Division


# c869e672 19-Dec-2015 Alan Cox <alc@FreeBSD.org>

Introduce a new mechanism for relocating virtual pages to a new physical
address and use this mechanism when:

1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical
memory allocator's free page lists. This replaces the long-standing
approach of scanning the active and inactive queues, converting clean
pages into PG_CACHED pages and laundering dirty pages. In contrast, the
new mechanism does not use PG_CACHED pages nor does it trigger a large
number of I/O operations.

2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find
free pages in the physical memory allocator's free page lists that are
covered by the direct map. Tested by: adrian

3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable
free pages in the physical memory allocator's free page lists.

In the coming months, I expect that this new mechanism will be applied in
other places. For example, balloon drivers should use relocation to
minimize fragmentation of the guest physical address space.

Make vm_phys_alloc_contig() a little smarter (and more efficient in some
cases). Specifically, use vm_phys_segs[] earlier to avoid scanning free
page lists that can't possibly contain suitable pages.

Reviewed by: kib, markj
Glanced at: jhb
Discussed with: jeff
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4444


# bc727596 03-Oct-2015 Alan Cox <alc@FreeBSD.org>

Reduce the scope of a variable to the only file where it is used.


# 85eeca35 17-Jul-2012 Alan Cox <alc@FreeBSD.org>

Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar
code resides. Rename vm_contig_grow_cache() to vm_pageout_grow_cache().

Reviewed by: kib


# 126d6082 17-Mar-2012 Konstantin Belousov <kib@FreeBSD.org>

In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag
if the filesystem performed short write and we are skipping the page
due to this.

Propagate write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.

While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.

PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks


# 1e8a675c 18-Nov-2010 Konstantin Belousov <kib@FreeBSD.org>

vm_pageout_flush() might cache the pages that finished write to the
backing storage. Such pages might be then reused, racing with the
assert in vm_object_page_collect_flush() that verified that dirty
pages from the run (most likely, pages with VM_PAGER_AGAIN status) are
write-protected still. In fact, the page indexes for the pages that
were removed from the object page list should be ignored by
vm_object_page_clean().

Return the length of successfully written run from vm_pageout_flush(),
that is, the count of pages between requested page and first page
after requested with status VM_PAGER_AGAIN. Supply the requested page
index in the array to vm_pageout_flush(). Use the returned run length
to forward the index of next page to clean in vm_object_page_clean().

Reported by: avg
Reviewed by: alc
MFC after: 1 week
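
The run length described in the entry above can be computed as in the
sketch below. It is a minimal model under assumptions: the status enum and
flush_run_length() are hypothetical stand-ins, and the only behavior taken
from the entry is that the run stops at the first page, counting from the
requested one, whose status is "again".

```c
#include <assert.h>

/* Hypothetical status codes standing in for the pager's results. */
enum sketch_status { SKETCH_OK, SKETCH_AGAIN };

/*
 * Number of pages from index mreq up to, but not including, the first
 * page at or after mreq whose status is SKETCH_AGAIN.
 */
static int
flush_run_length(const enum sketch_status *status, int count, int mreq)
{
	int i;

	assert(mreq >= 0 && mreq < count);
	for (i = mreq; i < count; i++)
		if (status[i] == SKETCH_AGAIN)
			break;
	return (i - mreq);
}
```

A caller that cleans pages in order could advance its index by the
returned length and retry from the page that reported "again".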


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 17dca144 04-Jun-2010 Jayachandran C. <jchandra@FreeBSD.org>

Make vm_contig_grow_cache() extern, and use it when vm_phys_alloc_contig()
fails to allocate MIPS page table pages. The current usage of VM_WAIT in
case of vm_phys_alloc_contig() failure is not correct, because:

"There is no guarantee that any of the available free (or cached) pages
after the VM_WAIT will fall within the range of suitable physical
addresses. Every time this function sleeps and a single page is freed
(or cached) by someone else, this function will be reawakened. With
a little bad luck, you could spin indefinitely."

We also add low and high parameters to vm_contig_grow_cache() and
vm_contig_launder() so that we restrict vm_contig_launder() to the range
of pages we are interested in.

Reported by: alc

Reviewed by: alc
Approved by: rrs (mentor)


# 8c616246 05-May-2010 Konstantin Belousov <kib@FreeBSD.org>

Add a helper function vm_pageout_page_lock(), similar to tegge's
vm_pageout_fallback_object_lock(), to obtain the page lock
while having page queue lock locked, and still maintain the
page position in a queue.

Use the helper to lock the page in the pageout daemon and contig launder
iterators instead of skipping the page if its lock is contested.
Skipping locked pages easily causes pagedaemon or launder to not make
progress with page cleaning.

Proposed and reviewed by: alc


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 2025d69b 29-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

Move the code for doing out-of-memory grass from vm_pageout_scan()
into the separate function vm_pageout_oom(). Supply a parameter for
vm_pageout_oom() describing a reason for the call.

Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone
is exhausted.

Reviewed by: alc
Tested by: pho, jhb
MFC after: 2 weeks


# da31e3aa 25-Nov-2007 Alan Cox <alc@FreeBSD.org>

Make contigmalloc(9)'s page laundering more robust. Specifically, use
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem. Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.

MFC after: 6 weeks


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 9799b417 19-Nov-2004 David Schultz <das@FreeBSD.org>

Disable U area swapping and remove the routines that create, destroy,
copy, and swap U areas.

Reviewed by: arch@


# 05eb3785 06-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 7a935082 18-Oct-2003 Alan Cox <alc@FreeBSD.org>

- Increase the object lock's scope in vm_contig_launder() so that access
to the object's type field and the call to vm_pageout_flush() are
synchronized.
- The above change allows for the elimination of the last parameter
to vm_pageout_flush().
- Synchronize access to the page's valid field in vm_pageout_flush()
using the containing object's lock.


# 8f60c087 03-Aug-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Change the layout policy of the swap_pager from a hardcoded width
striping to a per device round-robin algorithm.

Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume its 1/N share of the pages if there
is plenty of space on the other devices.

Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.

Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.

Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^44 bytes = 16TB (for i386).

We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.

A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).

The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.

Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.

Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.

Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.
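
The 16TB ceiling mentioned in the entry above follows from the width of
the page number and the page size. The sketch below works through that
arithmetic, assuming 4 KiB pages on i386 (a page shift of 12), which the
entry does not state explicitly; swap_limit_bytes() is a hypothetical
helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 page size of 4 KiB; the commit message does not state it. */
#define SKETCH_PAGE_SHIFT 12

/* Bytes addressable by a page number that is `bits` wide. */
static uint64_t
swap_limit_bytes(unsigned bits)
{
	assert(bits + SKETCH_PAGE_SHIFT < 64);
	return ((uint64_t)1 << (bits + SKETCH_PAGE_SHIFT));
}
```

For a 32-bit page number this gives 2^(32+12) bytes, which is 16 TiB.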


# f92039a1 02-May-2003 Alan Cox <alc@FreeBSD.org>

Move a declaration to its proper place.


# 24b3046a 29-Apr-2003 Alan Cox <alc@FreeBSD.org>

Remove unused declarations and definitions.


# 875791f6 23-Apr-2003 Alan Cox <alc@FreeBSD.org>

- Move swap_pager_isswapped()'s prototype to a more logical place.


# b6e48e03 23-Apr-2003 Alan Cox <alc@FreeBSD.org>

- Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)


# ff2023a5 13-Jan-2003 Alan Cox <alc@FreeBSD.org>

Make vm_pageout_page_free() static.


# 92da00bb 15-Dec-2002 Matthew Dillon <dillon@FreeBSD.org>

This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after: 3 weeks


# 11caded3 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# a1287949 10-Mar-2002 Eivind Eklund <eivind@FreeBSD.org>

- Remove a number of extra newlines that do not belong here according to
style(9)
- Minor space adjustment in cases where we have "( ", " )", if(), return(),
while(), for(), etc.
- Add /* SYMBOL */ after a few #endifs.

Reviewed by: alc


# ef6020d1 19-Feb-2002 Mike Silbersack <silby@FreeBSD.org>

Changes to make the OOM killer much more effective:

- Allow the OOM killer to target processes currently locked in
memory. These very often are the ones doing the memory hogging.
- Drop the wakeup priority of processes currently sleeping while
waiting for their page fault to complete. In order for the OOM
killer to work well, the killed process and other system processes
waiting on memory must be allowed to wakeup first.

Reviewed by: dillon
MFC after: 1 week


# 8ec48c6d 10-Aug-2001 John Baldwin <jhb@FreeBSD.org>

- Remove asleep(), await(), and M_ASLEEP.
- Callers of asleep() and await() have been converted to calling tsleep().
The only caller outside of M_ASLEEP was the ata driver, which called both
asleep() and await() with spl-raised, so there was no need for the
asleep() and await() pair. M_ASLEEP was unused.

Reviewed by: jasone, peter


# c4473420 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs, which made this change quite some time ago. More commits to come.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 1c7c3c6a 21-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 2d8acc0f 22-Jan-1998 John Dyson <dyson@FreeBSD.org>

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


# 925a3a41 11-Jan-1998 John Dyson <dyson@FreeBSD.org>

Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.


# ceb0cf87 05-Dec-1997 John Dyson <dyson@FreeBSD.org>

Support an optional, sysctl enabled feature of idle process swapout. This
is apparently useful for large shell systems, or systems with long running
idle processes. To enable the feature:

sysctl -w vm.swap_idle_enabled=1

Please note that some of the other vm sysctl variables have been renamed
to be more accurate.
Submitted by: Much of it from Matt Dillon <dillon@best.net>


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# e0c5a895 28-Nov-1996 John Dyson <dyson@FreeBSD.org>

Make the kernel smaller with at worst a neutral effect on perf by
de-inlining some VM calls. (Actually, I measured a small improvement.)


# 8169788f 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.


# 512fef80 20-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Completed function declarations and/or added prototypes.


# 3af76890 19-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused vars & funcs, make things static, protoize a little bit.


# aef922f5 05-Nov-1995 John Dyson <dyson@FreeBSD.org>

Greatly simplify the msync code. Eliminate complications in vm_pageout
for msyncing. Remove a bug that manifests itself primarily on NFS
(the dirty range on the buffers is not set on msync.)


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 24a1cce3 13-Jul-1995 David Greenman <dg@FreeBSD.org>

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #13 and #14
were done to simplify the code and improve readability and
maintainability. (As were most all of these changes)
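
As an illustration of items 1 and 2 above, the following is a minimal
user-space sketch of dispatching pager routines through a table indexed by
object type. Every name in it (obj_type, pagerops, pagertab, pager_haspage
and the fields of struct object) is hypothetical and chosen for the example;
it is not the kernel's actual code. The vnode routine reporting every other
page of the object as contiguous is likewise only an assumption made to keep
the sketch short.

```c
#include <assert.h>

/* Hypothetical object types; the names are illustrative only. */
enum obj_type { OBJ_DEFAULT, OBJ_SWAP, OBJ_VNODE, OBJ_NTYPES };

struct object {
	enum obj_type type;	/* selects the pager routines */
	long size;		/* object size in pages */
	union {			/* per-pager leftovers, as in item 2 */
		struct { long nblocks; } swp;
		struct { long filesize; } vnp;
	} un_pager;
};

/* A pager is only a table of routines; it owns no data of its own. */
struct pagerops {
	int (*haspage)(const struct object *, long, int *, int *);
};

static int
default_haspage(const struct object *obj, long pindex, int *before, int *after)
{
	(void)obj;
	(void)pindex;
	*before = *after = 0;
	return 0;		/* nothing is backed yet */
}

static int
swap_haspage(const struct object *obj, long pindex, int *before, int *after)
{
	*before = *after = 0;	/* no clustering information reported */
	return pindex >= 0 && pindex < obj->size;
}

static int
vnode_haspage(const struct object *obj, long pindex, int *before, int *after)
{
	if (pindex < 0 || pindex >= obj->size) {
		*before = *after = 0;
		return 0;
	}
	/* Assumption: the whole object is contiguous on backing store. */
	*before = (int)pindex;
	*after = (int)(obj->size - pindex - 1);
	return 1;
}

/* Indexed by enum obj_type, in declaration order. */
static const struct pagerops pagertab[OBJ_NTYPES] = {
	{ default_haspage },
	{ swap_haspage },
	{ vnode_haspage },
};

/* Callers pass the object itself; its type picks the routine. */
static int
pager_haspage(const struct object *obj, long pindex, int *before, int *after)
{
	assert((int)obj->type >= 0 && (int)obj->type < OBJ_NTYPES);
	return pagertab[obj->type].haspage(obj, pindex, before, after);
}
```

In this sketch a caller never sees a separate "pager" instance: changing
obj->type is enough to route the same call to a different set of routines.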

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
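
To make the clustering idea in item 7 and TODO item 2 more concrete, here is
a small sketch of choosing a run of pages around a requested page, given the
"behind" and "ahead" counts a haspage-style call might report. The function
cluster_window, struct cluster, and the policy of handing unused slots to
the other side are assumptions made for the example, not the algorithm
actually committed.

```c
#include <assert.h>

struct cluster {
	long first;	/* index of the first page in the run */
	long count;	/* number of pages in the run */
};

/*
 * Pick up to maxpages contiguous pages around req, aiming for an even
 * split before and after it, limited by how many neighbouring pages the
 * pager reports as contiguous (behind, ahead).
 */
static struct cluster
cluster_window(long req, int behind, int ahead, int maxpages)
{
	struct cluster c;
	int want, nbefore, nafter, spare, extra;

	assert(maxpages >= 1 && behind >= 0 && ahead >= 0);

	want = maxpages - 1;		/* neighbours besides req itself */
	nbefore = want / 2;
	nafter = want - nbefore;
	if (nbefore > behind)
		nbefore = behind;
	if (nafter > ahead)
		nafter = ahead;

	/* Slots one side could not use are offered to the other side. */
	spare = want - nbefore - nafter;
	extra = behind - nbefore;
	if (extra > spare)
		extra = spare;
	nbefore += extra;
	spare -= extra;
	extra = ahead - nafter;
	if (extra > spare)
		extra = spare;
	nafter += extra;

	c.first = req - nbefore;
	c.count = nbefore + 1 + nafter;
	return c;
}
```

With both counts known up front, the window can be sized in one step instead
of probing each neighbouring block individually, which is the saving TODO
item 2 is after.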


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# f6b04d2b 09-Apr-1995 David Greenman <dg@FreeBSD.org>

Changes from John Dyson and myself:

Fixed remaining known bugs in the buffer IO and VM system.

vfs_bio.c:
Fixed some race conditions and locking bugs. Improved performance
by removing some (now) unnecessary code and fixing some broken
logic.
Fixed process accounting of # of FS outputs.
Properly handle NFS interrupts (B_EINTR).

(various)
Replaced calls to clrbuf() with calls to an optimized routine
called vfs_bio_clrbuf().

(various FS sync)
Sync out modified vnode_pager backed pages.

ffs_vnops.c:
Do two passes: Sync out file data first, then indirect blocks.

vm_fault.c:
Fixed deadly embrace caused by acquiring locks in the wrong order.

vnode_pager.c:
Changed to use buffer I/O system for writing out modified pages. This
should fix the problem with the modification date previously not getting
updated. Also dramatically simplifies the code. Note that this is
going to change in the future and be implemented via VOP_PUTPAGES().

vm_object.c:
Fixed a pile of bugs related to cleaning (vnode) objects. The performance
of vm_object_page_clean() is terrible when dealing with huge objects,
but this will change when we implement a binary tree to keep the object
pages sorted.

vm_pageout.c:
Fixed broken clustering of pageouts. Fixed race conditions and other
lockup style bugs in the scanning of pages. Improved performance.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# f919ebde 01-Mar-1995 David Greenman <dg@FreeBSD.org>

Various changes from John and myself that do the following:

New functions created: vm_object_pip_wakeup and pagedaemon_wakeup, which
are used to reduce the actual number of wakeups.
New function vm_page_protect, which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.
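
One common way a helper such as pagedaemon_wakeup can cut down on wakeups is
to remember that a wakeup is already pending and skip the call otherwise.
The sketch below shows that pattern in user space. The flag, the counter,
and the stand-in wakeup_stub are all hypothetical, and the commit does not
say that this is exactly how the function was written.

```c
#include <assert.h>

static int pages_needed;	/* nonzero while a wakeup is pending */
static int wakeups_issued;	/* counts calls to the stand-in below */

/* Stand-in for the kernel's wakeup(); it only counts invocations. */
static void
wakeup_stub(void *chan)
{
	assert(chan != 0);
	wakeups_issued++;
}

/* Issue a wakeup only if one is not already outstanding. */
static void
pagedaemon_wakeup_sketch(void)
{
	if (!pages_needed) {
		pages_needed = 1;
		wakeup_stub(&pages_needed);
	}
}

/* Called by the (imaginary) daemon once it has run a pass. */
static void
pagedaemon_ran(void)
{
	pages_needed = 0;
}
```

Any number of callers hitting pagedaemon_wakeup_sketch between two daemon
passes result in a single wakeup_stub call.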


# 480dff54 10-Jan-1995 David Greenman <dg@FreeBSD.org>

Fixed some formatting weirdness that I overlooked in the previous commit.


# 0d94caff 09-Jan-1995 David Greenman <dg@FreeBSD.org>

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 05f0fdd2 08-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Cosmetics: unused vars, ()'s, #include's &c &c to silence gcc.
Reviewed by: davidg


# 3301cc3c 21-Aug-1994 Paul Richards <paul@FreeBSD.org>

Made idempotent


# f23b4c91 18-Aug-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources