History log of /freebsd-9.3-release/sys/vm/vm_contig.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 262933 08-Mar-2014 dumbbell

MFC vm_page_alloc_contig()

This function will be used by TTM, a memory manager used by the Radeon
KMS driver.

Compared to HEAD, the type of the "boundary" argument to several functions
is left as "unsigned long" (instead of "vm_paddr_t" in HEAD) to prevent an
API change in a stable branch.

The following revisions were merged in this single commit:

MFC r226928:
Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at
eliminating duplicated code in the various pmap implementations.

Micro-optimize vm_phys_free_pages().

Introduce vm_phys_free_contig(). It is fast routine for freeing an
arbitrary number of physically contiguous pages. In particular, it
doesn't require the number of pages to be a power of two.

Use "u_long" instead of "unsigned long".

Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long". Make this change.

MFC r227012:
Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist()
and use these new options in the mips pmap.

Wake up the page daemon in vm_page_alloc_freelist() if the number of free
and cached pages becomes too low.

Tidy up vm_page_alloc_init(). In particular, add a comment about an
important restriction on its use.

Tested by: jchandra@

MFC r227072:
Simplify the implementation of the failure case in kmem_alloc_attr().

MFC r227127:
Wake up the page daemon in vm_page_alloc_freelist() if it couldn't
allocate the requested page because too few pages are cached or free.

Document the VM_ALLOC_COUNT() option to vm_page_alloc() and
vm_page_alloc_freelist().

Make style changes to vm_page_alloc() and vm_page_alloc_freelist(),
such as using a variable name that more closely corresponds to the
comments.

MFC r227568:
Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.

From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.

Reviewed by: kib
Discussed with: jhb


# 262874 06-Mar-2014 dumbbell

MFC r226891:

Use "u_long" instead of "unsigned long".


# 262863 06-Mar-2014 dumbbell

MFC r226824:

contigmalloc(9) and contigfree(9) are now implemented in terms of other
more general VM system interfaces. So, their implementation can now
reside in kern_malloc.c alongside the other functions that are declared
in malloc.h.


# 240757 20-Sep-2012 alc

MFC r238456 and r238536
Various improvements to vm_contig_grow_cache().


# 233728 31-Mar-2012 kib

MFC r233100:
In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flag
if the filesystem performed short write and we are skipping the page
due to this.

Propogate write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.

While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.

PR: kern/165927


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 224689 06-Aug-2011 alc

Fix an error in kmem_alloc_attr(). Unless "tries" is updated,
kmem_alloc_attr() could get stuck in a loop.

Approved by: re (kib)
MFC after: 3 days


# 217265 11-Jan-2011 jhb

Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by: bde


# 216807 29-Dec-2010 alc

There is no point in vm_contig_launder{,_page}() flushing held pages,
instead skip over them. As long as a page is held, it can't be reclaimed by
contigmalloc(M_WAITOK). Moreover, a held page may be undergoing
modification, e.g., vmapbuf(), so even if the hold were released before the
completion of contigmalloc(), the page might have to be flushed again.

MFC after: 3 weeks


# 215471 18-Nov-2010 kib

vm_pageout_flush() might cache the pages that finished write to the
backing storage. Such pages might be then reused, racing with the
assert in vm_object_page_collect_flush() that verified that dirty
pages from the run (most likely, pages with VM_PAGER_AGAIN status) are
write-protected still. In fact, the page indexes for the pages that
were removed from the object page list should be ignored by
vm_object_page_clean().

Return the length of successfully written run from vm_pageout_flush(),
that is, the count of pages between requested page and first page
after requested with status VM_PAGER_AGAIN. Supply the requested page
index in the array to vm_pageout_flush(). Use the returned run length
to forward the index of next page to clean in vm_object_page_clean().

Reported by: avg
Reviewed by: alc
MFC after: 1 week


# 209647 02-Jul-2010 alc

With the demise of page coloring, the page queue macros no longer serve any
useful purpose. Eliminate them.

Reviewed by: kib


# 208794 04-Jun-2010 jchandra

Make vm_contig_grow_cache() extern, and use it when vm_phys_alloc_contig()
fails to allocate MIPS page table pages. The current usage of VM_WAIT in
case of vm_phys_alloc_contig() failure is not correct, because:

"There is no guarantee that any of the available free (or cached) pages
after the VM_WAIT will fall within the range of suitable physical
addresses. Every time this function sleeps and a single page is freed
(or cached) by someone else, this function will be reawakened. With
a little bad luck, you could spin indefinitely."

We also add low and high parameters to vm_contig_grow_cache() and
vm_contig_launder() so that we restrict vm_contig_launder() to the range
of pages we are interested in.

Reported by: alc

Reviewed by: alc
Approved by: rrs (mentor)


# 208791 03-Jun-2010 kib

Do not leak vm page lock in vm_contig_launder(), vm_pageout_page_lock()
always returns with the page locked.

Submitted by: alc
Pointy hat to: kib


# 207846 10-May-2010 kib

Continue cleaning the queue instead of moving to the next queue or
bailing out if acquisition of page lock caused page position in the
queue to change.

Pointed out by: alc


# 207694 06-May-2010 kib

Add a helper function vm_pageout_page_lock(), similar to tegge'
vm_pageout_fallback_object_lock(), to obtain the page lock
while having page queue lock locked, and still maintain the
page position in a queue.

Use the helper to lock the page in the pageout daemon and contig launder
iterators instead of skipping the page if its lock is contested.
Skipping locked pages easily causes pagedaemon or launder to not make a
progress with page cleaning.

Proposed and reviewed by: alc


# 207552 03-May-2010 alc

The pages allocated by kmem_alloc_attr() and kmem_malloc() are unmanaged.
Consequently, neither the page lock nor the page queues lock is needed to
unwire and free them.


# 207519 02-May-2010 alc

This change addresses the race condition that was introduced by the previous
revision, r207450, to this file. Specifically, between dropping the page
queues lock in vm_contig_launder() and reacquiring it in
vm_contig_launder_page(), the page may be removed from the active or
inactive queue. It could be wired, freed, cached, etc. None of which
vm_contig_launder_page() is prepared for.

Reviewed by: kib, kmacy


# 207450 30-Apr-2010 kmacy

- acquire the page lock in vm_contig_launder_page before checking page fields
- release page queue lock before calling vm_pageout_flush


# 207410 29-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 206409 09-Apr-2010 alc

Introduce the function kmem_alloc_attr(), which allocates kernel virtual
memory with the specified physical attributes. In particular, like
kmem_alloc_contig(), the caller can specify the physical address range
from which the physical pages are allocated and the memory attributes
(i.e., cache behavior) for these physical pages. However, in contrast to
kmem_alloc_contig() or contigmalloc(), the physical pages that are
allocated by kmem_alloc_attr() are not necessarily physically contiguous.
This function is needed by DRM and VirtualBox.

Correct an error in the prototype for kmem_malloc(). The third argument
had the wrong type.

Tested by: rnoland
MFC after: 3 days


# 195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


# 195033 26-Jun-2009 alc

This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with: jhb


# 194376 17-Jun-2009 alc

Refactor contigmalloc() into two functions: a simple front-end that deals
with the malloc tag and calls a new back-end, kmem_alloc_contig(), that
allocates the pages and maps them.

The motivations for this change are two-fold: (1) A cache mode parameter
will be added to kmem_alloc_contig(). In other words, kmem_alloc_contig()
will be extended to support the allocation of memory with caller-specified
caching. (2) The UMA allocation function that is used by the two jumbo
frames zones can use kmem_alloc_contig() in place of contigmalloc() and
thereby avoid having free jumbo frames held by the zone counted as live
malloc()ed memory.


# 194337 17-Jun-2009 alc

Pass the size of the mapping to contigmapping() as a "vm_size_t" rather
than a "vm_pindex_t". A "vm_size_t" is more convenient for it to use.


# 194331 17-Jun-2009 alc

Make the maintenance of a page's valid bits by contigmalloc() more like
kmem_alloc() and kmem_malloc(). Specifically, defer the setting of the
page's valid bits until contigmapping() when the mapping is known to be
successful.


# 193521 05-Jun-2009 alc

Simplify contigfree().


# 192360 18-May-2009 kmacy

- back out direct map hack
- it is no longer needed


# 192207 16-May-2009 kmacy

apply band-aid to x86_64 systems with more physical memory than kmem by allocating from the direct map


# 175294 13-Jan-2008 attilio

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 175202 09-Jan-2008 attilio

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 173918 25-Nov-2007 alc

Make contigmalloc(9)'s page laundering more robust. Specifically, use
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem. Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.

MFC after: 6 weeks


# 173901 25-Nov-2007 alc

Tidy up: Add comments. Eliminate the pointless
malloc_type_allocated(..., 0) calls that occur when contigmalloc() has
failed. Eliminate the acquisition and release of the page queues lock
from vm_page_release_contig(). Rename contigmalloc2() to
contigmapping(), reflecting what it does.


# 172317 25-Sep-2007 alc

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


# 170816 16-Jun-2007 alc

Enable the new physical memory allocator.

This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.

Approved by: re


# 170529 11-Jun-2007 alc

Conditionally acquire Giant in vm_contig_launder_page().


# 170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 168852 19-Apr-2007 alc

Correct contigmalloc2()'s implementation of M_ZERO. Specifically,
contigmalloc2() was always testing the first physical page for PG_ZERO,
not the current page of interest.

Submitted by: Michael Plass
PR: 81301
MFC after: 1 week


# 166508 05-Feb-2007 alc

Change the free page queue lock from a spin mutex to a default (blocking)
mutex. With the demise of Alpha support, there is no longer a reason for
it to be a spin mutex.


# 164089 08-Nov-2006 alc

Ensure that the page's oflags field is initialized by contigmalloc().


# 163604 22-Oct-2006 alc

Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.


# 163259 12-Oct-2006 kmacy

sun4v requires TSBs (translation storage buffers) to be contiguous and be
size aligned requiring heavy usage of vm_page_alloc_contig

This change makes vm_page_alloc_contig SMP safe

Approved by: scottl (acting as backup for mentor rwatson)


# 161968 03-Sep-2006 alc

Make vm_page_release_contig() static.


# 161629 26-Aug-2006 alc

Prevent a call to contigmalloc() that asks for more physical memory than
the machine has from causing a panic.

Submitted by: Michael Plass
PR: 101668
MFC after: 3 days


# 156415 07-Mar-2006 tegge

Ignore dirty pages owned by "dead" objects.


# 156225 02-Mar-2006 tegge

Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held. Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.


# 156224 02-Mar-2006 tegge

Hold extra reference to vm object while cleaning pages.


# 154989 29-Jan-2006 scottl

The change a few years ago of having contigmalloc start its scan at the top
of physical RAM instead of the bottom was a sound idea, but the implementation
left a lot to be desired. Scans would spend considerable time looking at
pages that are above of the address range given by the caller, and multiple
calls (like what happens in busdma) would spend more time on top of that
rescanning the same pages over and over.

Solve this, at least for now, with two simple optimizations. The first is
to not bother scanning high ordered pages that are outside of the provided
address range. Second is to cache the page index from the last successful
operation so that subsequent scans don't have to restart from the top. This
is conditional on the numpages argument being the same or greater between
calls.

MFC After: 2 weeks


# 154849 26-Jan-2006 alc

Plug a leak in the newer contigmalloc() implementation. Specifically, if
a multipage allocation was aborted midway, the pages that were already
allocated were not always returned to the free list.

Submitted by: tegge


# 154799 25-Jan-2006 alc

The previous revision incorrectly changed a switch statement into an if
statement. Specifically, a break statement that previously broke out of
the enclosing switch was not changed. Consequently, the enclosing loop
terminated prematurely.

This could result in "vm_page_insert: page already inserted" panics.

Submitted by: tegge


# 153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


# 148997 12-Aug-2005 tegge

Check for marker pages when scanning active and inactive page queues.

Reviewed by: alc


# 147283 10-Jun-2005 green

The new contigmalloc(9) has a bad degenerate case where there were
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions. While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.

This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead). For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.

Also, fix a bug that in practice most likely couldn't have been triggered,
int the new contigmalloc(9): it walked backwards from the end of memory
without accounting for how many pages it needed. Potentially, nonexistant
pages could have been mapped. This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.

Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 138066 24-Nov-2004 delphij

Try to close a potential, but serious race in our VM subsystem.

Historically, our contigmalloc1() and contigmalloc2() assumes
that a page in PQ_CACHE can be unconditionally reused by busying
and freeing it. Unfortunatelly, when object happens to be not
NULL, the code will set m->object to NULL and disregard the fact
that the page is actually in the VM page bucket, resulting in
page bucket hash table corruption and finally, a filesystem
corruption, or a 'page not in hash' panic.

This commit has borrowed the idea taken from DragonFlyBSD's fix
to the VM fix by Matthew Dillon[1]. This version of patch will
do the following checks:

- When scanning pages in PQ_CACHE, check hold_count and
skip over pages that are held temporarily.
- For pages in PQ_CACHE and selected as candidate of being
freed, check if it is busy at that time.

Note: It seems that this is might be unrelated to kern/72539.

Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1]
Reminded by: Matt Dillon
Reworked by: alc
MFC After: 1 week


# 137168 03-Nov-2004 alc

The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.


# 136924 24-Oct-2004 alc

Acquire the vm object lock before rather than after calling
vm_page_sleep_if_busy(). (The motivation being to transition
synchronization of the vm_page's PG_BUSY flag from the global page queues
lock to the per-object lock.)


# 133185 05-Aug-2004 green

Turn on the new contigmalloc(9) by default. There should not actually
be a reason to use the old contigmalloc(9), but if desired, it the
vm.old_contigmalloc setting can be tuned/sysctld back to 0 for now.


# 132420 19-Jul-2004 green

Remove extraneous locks on the VM free page queue mutex; it is not
meant to be recursed upon, and could cauuse a deadlock inside the
new contigmalloc (vm.old_contigmalloc=0) code.

Submitted by: alc


# 132379 19-Jul-2004 green

Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less. The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block. If this is not enough, the subsequent passes
page out any unwired memory. To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed. Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used. Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month. It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig(). These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats. Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well. This invalidates the BUGS section of the
contigmalloc(9) manpage.


# 130502 14-Jun-2004 green

Make contigmalloc() more reliable:

1. Remove a race whereby contigmalloc() would deadlock against the
running processes in the system if they kept reinstantiating
the memory on the active and inactive page queues that it was
trying to flush out. The process doing the contigmalloc() would
sit in "swwrt" forever and the swap pager would be going at full
force, but never get anywhere. Instead of doing it until the
queues are empty, launder for as many iterations as there are
pages in the queue.
2. Do all laundering to swap synchronously; previously, the vnode
laundering was synchronous and the swap laundering not.
3. Increase the number of launder-or-allocate passes to three, from
two, while failing without bothering to do all the laundering on
the third pass if allocation was not possible. This effectively
gives exactly two chances to launder enough contiguous memory,
helpful with high memory churn where a lot of memory from one pass
to the next (and during a single laundering loop) becomes dirtied
again.

I can now reliably hot-plug hardware requiring a 256KB contigmalloc()
without having the kldload/cbb ithread sit around failing to make
progress, while running a busy X session. Previously, it took killing
X to get contigmalloc() to get further (that is, quiescing the system),
and even then contigmalloc() returned failure.


# 127961 06-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 126911 13-Mar-2004 alc

Remove GIANT_REQUIRED from contigfree().


# 126632 05-Mar-2004 alc

In the last revision, I introduced a physical contiguity check that is both
unnecessary and wrong. While it is necessary to verify that the page is
still free after dropping and reacquiring the free page queue lock, the
physical contiguity of the page can not change, making this check
unnecessary. This check was wrong in that it could cause an out-of-bounds
array access.

Tested by: rwatson


# 126479 02-Mar-2004 alc

Modify contigmalloc1() so that the free page queues lock is not held when
vm_page_free() is called. The problem with holding this lock is that it is
a spin lock and vm_page_free() may attempt the acquisition of a different
default-type lock.


# 125861 16-Feb-2004 alc

Correct a long-standing race condition in vm_contig_launder() that could
result in a panic "vm_page_cache: caching a dirty page, ...": Access to the
page must be restricted or removed before calling vm_page_cache(). This
race condition is identical in nature to that which was addressed by
vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.

MFC after: 7 days


# 124513 14-Jan-2004 alc

Remove vm_page_alloc_contig(). It's now unused.


# 124353 10-Jan-2004 alc

- Unmanage pages allocated by contigmalloc1(). (There is no point in
having PV entries for these pages.)
- Remove splvm() and splx() calls.


# 124261 08-Jan-2004 alc

- Enable recursive acquisition of the mutex synchronizing access to the
free pages queue. This is presently needed by contigmalloc1().
- Move a sanity check against attempted double allocation of two pages
to the same vm object offset from vm_page_alloc() to vm_page_insert().
This provides better protection because double allocation could occur
through a direct call to vm_page_insert(), such as that by
vm_page_rename().
- Modify contigmalloc1() to hold the mutex synchronizing access to the
free pages queue while it scans vm_page_array in search of free pages.
- Correct a potential leak of pages by contigmalloc1() that I introduced
in revision 1.20: We must convert all cache queue pages to free pages
before we begin removing free pages from the free queue. Otherwise,
if we have to restart the scan because we are unable to acquire the
vm object lock that is necessary to convert a cache queue page to a
free page, we leak those free pages already removed from the free queue.


# 124195 06-Jan-2004 alc

Don't bother clearing PG_ZERO in contigmalloc1(), kmem_alloc(), or
kmem_malloc(). It serves no purpose.


# 121226 18-Oct-2003 alc

- Increase the object lock's scope in vm_contig_launder() so that access
to the object's type field and the call to vm_pageout_flush() are
synchronized.
- The above change allows for the eliminaton of the last parameter
to vm_pageout_flush().
- Synchronize access to the page's valid field in vm_pageout_flush()
using the containing object's lock.


# 118771 11-Aug-2003 bms

Add the mlockall() and munlockall() system calls.
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).

PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks


# 118076 27-Jul-2003 mux

Use pmap_zero_page() to zero pages instead of bzero() because
they haven't been vm_map_wire()'d yet.


# 118071 26-Jul-2003 alc

Acquire Giant rather than asserting it is held in contigmalloc(). This is
a prerequisite to removing further uses of Giant from UMA.


# 118029 25-Jul-2003 mux

Add support for the M_ZERO flag to contigmalloc().

Reviewed by: jeff


# 117262 05-Jul-2003 alc

Lock a vm object when freeing a page from it.


# 117143 01-Jul-2003 mux

Fix a few style(9) nits.


# 116226 11-Jun-2003 obrien

Use __FBSDID().


# 113955 24-Apr-2003 alc

- Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)


# 113458 13-Apr-2003 alc

Update locking on the kernel_object to use the new macros.


# 112569 24-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


# 108233 23-Dec-2002 alc

- Hold the kernel_object's lock around vm_page_insert(..., kernel_object,
...).


# 101304 04-Aug-2002 alc

o Extend the scope of the page queues lock in contigmalloc1().
o Replace vm_page_sleep_busy() with vm_page_sleep_if_busy()
in vm_contig_launder().


# 100779 27-Jul-2002 alc

o Require that the page queues lock is held on entry to vm_pageout_clean()
and vm_pageout_flush().
o Acquire the page queues lock before calling vm_pageout_clean()
or vm_pageout_flush().


# 100397 20-Jul-2002 alc

o Lock page queue accesses by vm_page_cache() in vm_contig_launder().
o Micro-optimize the control flow in vm_contig_launder().


# 100031 15-Jul-2002 alc

o Create vm_contig_launder() to replace code that appears twice
in contigmalloc1().


# 99850 12-Jul-2002 alc

o Lock some (unfortunately, not yet all) accesses to the page queues.


# 99427 05-Jul-2002 alc

o Lock accesses to the free page queues in contigmalloc1().


# 98226 14-Jun-2002 alc

o Use vm_map_wire() and vm_map_unwire() in place of vm_map_pageable() and
vm_map_user_pageable().
o Remove vm_map_pageable() and vm_map_user_pageable().
o Remove vm_map_clear_recursive() and vm_map_set_recursive(). (They were
only used by vm_map_pageable() and vm_map_user_pageable().)

Reviewed by: tegge


# 97088 21-May-2002 alc

o Make contigmalloc1() static.


# 91605 03-Mar-2002 alc

Call vm_pageq_remove_nowakeup() rather than duplicating it.


# 85070 17-Oct-2001 dillon

contigmalloc1() could cause the vm_page_zero_count to become incorrect.
Properly track the count.

Submitted by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>


# 84869 13-Oct-2001 dillon

Makes contigalloc[1]() create the vm_map / underlying wired pages in the
kernel map and object in a manner that contigfree() is actually able to
free. Previously contigfree() freed up the KVA space but could not
unwire & free the underlying VM pages due to mismatched pageability between
the map entry and the VM pages.

Submitted by: Thomas Moestl <tmoestl@gmx.net>
Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
MFC after: 3 days


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)