History log of /freebsd-current/sys/vm/vm_reserv.c
Revision Date Author Comments
# e3537f92 03-Jun-2024 Doug Moore <dougm@FreeBSD.org>

Revert "subr_pctrie: use ilog2(x) instead of fls(x)-1"

This reverts commit 574ef650695088d56ea12df7da76155370286f9f.


# 574ef650 02-Jun-2024 Doug Moore <dougm@FreeBSD.org>

subr_pctrie: use ilog2(x) instead of fls(x)-1

In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks. Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.

Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D45330
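
To illustrate the zero-check point made in this message, here is a minimal C sketch. The names sketch_fls and sketch_ilog2 are hypothetical stand-ins rather than the kernel's fls() and ilog2(), and __builtin_clz is assumed to be available (GCC or Clang). The first helper has to test for zero; the second relies on the caller's guarantee that x is nonzero and so needs no branch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for fls(): 1-based index of the highest set bit,
 * or 0 when x == 0. The zero test is the check the compiler cannot drop.
 */
static inline int
sketch_fls(unsigned int x)
{
	return (x == 0 ? 0 : (int)(sizeof(x) * CHAR_BIT) - __builtin_clz(x));
}

/*
 * Hypothetical stand-in for ilog2(): floor(log2(x)). The caller promises
 * that x is nonzero, so no zero test is needed in the computation.
 */
static inline int
sketch_ilog2(unsigned int x)
{
	assert(x != 0);
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clz(x));
}
```

For every nonzero x the two agree, that is, sketch_fls(x) - 1 == sketch_ilog2(x); they differ only in how x == 0 is treated.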


# 989a2cf1 09-Apr-2024 Minsoo Choo <minsoochoo0122@proton.me>

vm_reserv_reclaim_contig: Return NULL not false

Reviewed by: dougm, zlei
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44667


# 1526667b 06-Apr-2024 Doug Moore <dougm@FreeBSD.org>

vm_reserv: Add vm_reserv_is_populated

Add a function to check whether an aligned block of vm pages is
allocated, for use with impending changes to arm64 superpage
management.

Reviewed by: alc
Differential Revision: http://reviews.freebsd.org/D44575


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 84e2ae64 12-Jan-2022 Doug Moore <dougm@FreeBSD.org>

vm_reserv: use enhanced bitstring for popmaps

vm_reserv.c uses its own bitstring implementation for popmaps. Using
the bitstring_t type from a standard header eliminates the code
duplication, allows some bit-at-a-time operations to be replaced with
more efficient bitstring range operations, and, in
vm_reserv_test_contig, allows bit_ffc_area_at to more efficiently
search for a big-enough set of consecutive zero-bits.

Make bitstring changes to improve the vm_reserv code. Define a bit_ntest
method to test whether a range of bits is all set, or all clear.
Define bit_ff_at and bit_ff_area_at to implement the ffs and ffc
versions with a parameter to choose between set- and clear- bits.
Improve the area_at implementation. Modify the bit_nset and
bit_nclear implementations to allow code optimization in the cases
when start or end are multiples of _BITSTR_BITS.

Add a few new cases to bitstring_test.

Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D33312
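
The idea of testing a whole range of bits at once, rather than one bit at a time, can be sketched as follows. This is an illustration only: sk_range_all is a hypothetical helper written for this note, not the bit_ntest from the bitstring header, and its signature is an assumption. It masks the first and last words and compares full words in between.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper, not the bitstring(3) API: return true when every
 * bit in the inclusive range [start, end] equals `match` (0 or 1).
 */
static bool
sk_range_all(const unsigned long *map, size_t start, size_t end, int match)
{
	size_t w, first = start / SK_WORD_BITS, last = end / SK_WORD_BITS;

	assert(start <= end);
	for (w = first; w <= last; w++) {
		unsigned long mask = ~0UL;
		/* Complementing turns an "all clear" test into "all set". */
		unsigned long word = match ? map[w] : ~map[w];

		if (w == first)		/* drop bits below start */
			mask &= ~0UL << (start % SK_WORD_BITS);
		if (w == last)		/* drop bits above end */
			mask &= ~0UL >> (SK_WORD_BITS - 1 - end % SK_WORD_BITS);
		if ((word & mask) != mask)
			return (false);
	}
	return (true);
}
```

Interior words are compared whole, so the cost is one comparison per word rather than one per bit.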


# f76916c0 30-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm_reserv: #include vm_extern.h explicitly, for arm.

Fixes: c606ab59e7f9 vm_extern: use standard address checkers everywhere


# c606ab59 30-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm_extern: use standard address checkers everywhere

Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33685
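
As a rough illustration of what such alignment and boundary checks compute, consider the sketch below. The names sk_align_ok and sk_bound_ok and the use of uint64_t are assumptions made for this note; they are not claimed to match the declarations in vm_extern.h. Both alignment and boundary are assumed to be powers of two, and a boundary of zero is taken to mean "no constraint".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when pa is a multiple of alignment (assumed to be a power of two). */
static inline bool
sk_align_ok(uint64_t pa, uint64_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((pa & (alignment - 1)) == 0);
}

/*
 * True when [pa, pa + size) does not cross a multiple of boundary
 * (assumed to be zero or a power of two). With boundary == 0 the mask
 * -boundary is 0, so the test always passes.
 */
static inline bool
sk_bound_ok(uint64_t pa, uint64_t size, uint64_t boundary)
{
	assert(size != 0 && (boundary & (boundary - 1)) == 0);
	/* The first and last byte must agree in all bits above the boundary. */
	return (((pa ^ (pa + size - 1)) & -boundary) == 0);
}
```

Keeping these as two small predicates lets every caller share one definition instead of carrying slightly different hand-written masks.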


# 49fd2d51 26-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm_reserv: fix zero-boundary error

Handle specially the boundary==0 case of vm_reserv_reclaim_contig,
by turning off boundary adjustment in that case.

Reviewed by: alc
Tested by: pho, madpilot


# 0d5fac28 23-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm: alloc pages from reserv before breaking it

Function vm_reserv_reclaim_contig breaks a reservation with enough
free space to satisfy an allocation request and returns the free space
to the buddy allocator. Change the function to allocate the request
memory from the reservation before breaking it, and return that memory
to the caller. That avoids a second call to the buddy allocator and
guarantees successful allocation after breaking the reservation, where
that success is not currently guaranteed.

Reviewed by: alc, kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33644


# f7aa4476 16-Dec-2021 Doug Moore <dougm@FreeBSD.org>

Correct type size format error in KASSERT.
Reported by: jenkins
Fixes: 6f1c8908272f vm: Don't break vm reserv that can't meet align reqs


# 6f1c8908 15-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm: Don't break vm reserv that can't meet align reqs

Function vm_reserv_test_contig has incorrectly used its alignment
and boundary parameters to find a well-positioned range of empty pages
in a reservation. Consequently, a reservation could be broken
mistakenly when it was unable to provide a satisfactory set of pages.

Rename the function, correct the errors, and add assertions to detect
the error in case it appears again.

Reviewed by: alc, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33344


# 9f32cb5b 05-Dec-2021 Doug Moore <dougm@FreeBSD.org>

Set uninitialized popmap bits in vm_reserv_init

In vm_reserv_init, set all the marker popmap bits, and not just the
bits of the first popmap entry.

Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33258


# 968079f2 11-Mar-2021 Mark Johnston <markjdb@gmail.com>

vm_reserv: Fix list locking in vm_reserv_reclaim_contig()

The per-domain partpop queue is locked by the combination of the
per-domain lock and individual reservation mutexes.
vm_reserv_reclaim_contig() scans the queue looking for partially
populated reservations that can be reclaimed in order to satisfy the
caller's allocation.

During the scan, we drop the per-domain lock. At this point, the rvn
pointer may be invalidated. Take care to load rvn after re-acquiring
the per-domain lock.

While here, simplify the condition used to check whether a reservation
was dequeued while the per-domain lock was dropped.

Reviewed by: alc, kib
Reported by: gallatin
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29203


# 431fb8ab 18-Nov-2020 Mark Johnston <markj@FreeBSD.org>

vm_phys: Try to clean up NUMA KPIs

It can be useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by: alc
Reviewed by: dougm, kib (earlier versions)
Differential Revision: https://reviews.freebsd.org/D27207


# 114484b7 23-Sep-2020 Mark Johnston <markj@FreeBSD.org>

Flag vm_reserv and vm_phys sysctls as MPSAFE.

Nothing in these subsystems relies on Giant.

MFC after: 1 week


# 7988971a 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE

On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].

This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.

The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.

Reviewed by: markj, alc, kib
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26130


# d869a17e 06-Mar-2020 Mark Johnston <markj@FreeBSD.org>

Use COUNTER_U64_DEFINE_EARLY() in places where it simplifies things.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23978


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# a314aba8 11-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vm: add missing CTLFLAG_MPSAFE annotations

This covers all vm/* files.


# b378d296 22-Nov-2019 Mark Johnston <markj@FreeBSD.org>

Fix locking in vm_reserv_reclaim_contig().

We were not properly handling the case where the trylock of the
reservation fails, in which case we could leak the reservation lock.

Introduce a marker reservation to implement precise scanning in
vm_reserv_reclaim_contig(). Before, a race could result in early
termination of the scan in rare situations. Use the marker's lock to
serialize scans of the partpop queue so that a global marker structure
can be used. Modify vm_reserv_reclaim_inactive() to handle the presence
of a marker while minimizing the hold time of domain-global locks.

Reviewed by: alc, jeff, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22392


# 63967687 19-Nov-2019 Jeff Roberson <jeff@FreeBSD.org>

Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
redundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

Reviewed by: alc, dougm, kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22119


# fe6d5344 18-Nov-2019 Mark Johnston <markj@FreeBSD.org>

Group per-domain reservation data in the same structure.

We currently have the per-domain partially populated reservation queues
and the per-domain queue locks. Define a new per-domain padded
structure to contain both of them. This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.

Also fix field packing in the reservation structure. In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well. This reduces the size of the structure by 8 bytes.

Update some comments while here. No functional change intended.

Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22391


# 3e5e1b51 18-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Allocate amd64's page array using pages and page directory pages from the
NUMA domain that the pages describe. Patch original from gallatin.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21252


# 6b821a74 16-Aug-2019 Aleksandr Rybalko <ray@FreeBSD.org>

Check paddr for overflow.
Fix a panic when initializing the "vm reserv" per-superpage lock in the case where RAM ends at the upper boundary of the address space.
Observed on ARM32 board BPI-R2 (2GB RAM 0x80000000-0xffffffff).

PR: 235362
Reviewed by: kib, markj, alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21272


# f96e8a0b 06-Jun-2019 Doug Moore <dougm@FreeBSD.org>

The means of finding ranges of free pages was changed for
vm_reserv_break in r348484, and that change was found to improve
performance minutely and reduce code size. This change applies a similar
change to vm_reserv_reclaim_contig, expecting similar benefits. This change also
allows quick rejection of page ranges that are unsuitable on account
of alignment or boundary issues, where those issues are processed a
page at a time in the current implementation. For contrived test
cases, this can make finding a reservation satisfying a major
alignment requirement around 30 times faster.

Tested by: pho
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20274


# 2d5039db 02-Jun-2019 Alan Cox <alc@FreeBSD.org>

Retire vm_reserv_extend_{contig,page}(). These functions were introduced
as part of a false start toward fine-grained reservation locking. In the
end, they were not needed, so eliminate them.

Order the parameters to vm_reserv_alloc_{contig,page}() consistently with
the vm_page functions that call them.

Update the comments about the locking requirements for
vm_reserv_alloc_{contig,page}(). They no longer require a free page
queues lock.

Wrap several lines that became too long after the "req" and "domain"
parameters were added to vm_reserv_alloc_{contig,page}().

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20492


# b8590dae 31-May-2019 Doug Moore <dougm@FreeBSD.org>

The function vm_phys_free_contig invokes vm_phys_free_pages for every
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time. The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.

The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.

One beneficiary of this change is in breaking reservations. For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.

Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901


# e67a5068 27-May-2019 Doug Moore <dougm@FreeBSD.org>

Reduce the code size and number of ffsl calls in vm_reserv_break. Use
xor to find where free ranges begin and end.

Tested by: pho
Reviewed by: alc
Approved by: markj, kib (mentors)
Differential Revision: https://reviews.freebsd.org/D20256
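
The xor idea can be sketched for a single popmap word as follows. This is an illustration under assumptions, not the code from the commit: sk_free_runs and struct sk_range are hypothetical names, a set bit is taken to mean "page in use", and __builtin_ctzl (GCC or Clang) stands in for ffsl. Xoring the free-bit mask with itself shifted by one leaves set bits only where a free run begins or ends, so each find-first-set call lands directly on a run boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS	((int)(sizeof(unsigned long) * CHAR_BIT))

struct sk_range {
	int lo, hi;		/* free pages occupy bit positions [lo, hi) */
};

/*
 * Hypothetical helper: list the runs of clear (free) bits in one popmap
 * word. Returns the number of runs written to out[], at most max.
 */
static size_t
sk_free_runs(unsigned long popmap, struct sk_range *out, size_t max)
{
	unsigned long freebits = ~popmap;
	/* Bit i of edges is set when bit i of freebits differs from bit i - 1. */
	unsigned long edges = freebits ^ (freebits << 1);
	size_t n = 0;

	while (edges != 0 && n < max) {
		out[n].lo = __builtin_ctzl(edges);	/* a free run starts */
		edges &= edges - 1;			/* consume that edge */
		if (edges != 0) {
			out[n].hi = __builtin_ctzl(edges);	/* the run ends */
			edges &= edges - 1;
		} else
			out[n].hi = SK_BITS;	/* run reaches the top bit */
		n++;
	}
	return (n);
}
```

A word with pages 4 through 7 free and everything else in use yields the single range [4, 8).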


# f2a496d6 18-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

MI VM: Make it possible to set size of superpage at boot instead of compile time.

In order to allow a single kernel to use PAE pagetables on i386 if
hardware supports it, and fall back to classic two-level paging
structures if not, superpage code should be able to adapt to either 2M
or 4M superpage size. Here I make MI VM structures large enough to
track the biggest possible superpage, by allowing an architecture to
define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX constants.
Corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols can be
defined as runtime values and must be less than the _MAX constants.
If architecture does not define _MAXs, it is assumed that _MAX ==
normal constant.

Reviewed by: markj
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18853
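
The fallback rule for the _MAX constants might look roughly like the sketch below. Only the macro names VM_LEVEL_0_ORDER and VM_LEVEL_0_ORDER_MAX come from the message above; the value 9, the sk_ names, and the choice to allow the runtime order to equal the maximum are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example value; a real port sets this in its machine headers. */
#ifndef VM_LEVEL_0_ORDER
#define	VM_LEVEL_0_ORDER	9
#endif

/* Fallback described above: without a _MAX, use the normal constant. */
#ifndef VM_LEVEL_0_ORDER_MAX
#define	VM_LEVEL_0_ORDER_MAX	VM_LEVEL_0_ORDER
#endif

/* Compile-time structures are sized for the largest possible superpage. */
#define	SK_MAX_NPAGES	(1 << VM_LEVEL_0_ORDER_MAX)
static unsigned char sk_popmap[SK_MAX_NPAGES / 8];

/* Size in bytes of the statically sized population map. */
static size_t
sk_popmap_size(void)
{
	return (sizeof(sk_popmap));
}

/* Pages per superpage for an order chosen at boot (hypothetical helper). */
static int
sk_npages(int order)
{
	assert(order >= 0 && order <= VM_LEVEL_0_ORDER_MAX);
	return (1 << order);
}
```

Sizing by the maximum costs a little memory when the smaller superpage is selected at boot, in exchange for not having to allocate these structures dynamically.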


# 2ef6727e 06-Jul-2018 Jeff Roberson <jeff@FreeBSD.org>

Use the ticks since the last update to reduce hysteresis in the partpopq and
contention on the vm_reserv_domain lock.

This gives a roughly 8x speedup on will-it-scale fault1 on a 16 core machine.

Reviewed by: alc, kib, markj


# 2d3f4181 23-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Fix two compilation problems on non-amd64 architectures.


# 72346b22 22-Mar-2018 Cy Schubert <cy@FreeBSD.org>

Fix build on i386 without INVARIANTS following r331369.

--- vm_reserv.o ---
In file included from /opt/src/svn-current/sys/vm/vm_reserv.c:48:
In file included from /opt/src/svn-current/sys/sys/counter.h:37:
./machine/counter.h:174:3: error: implicit declaration of function
'critical_enter' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
critical_enter();

Reviewed by: jeff@


# 5c930c89 22-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Lock reservations with a dedicated lock in each reservation. Protect the
vmd_free_count with atomics.

This allows us to allocate and free from reservations without the free lock
except where a superpage is allocated from the physical layer, which is
roughly 1/512 of the operations on amd64.

Use the counter API to eliminate cache contention on counters.

Reviewed by: markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14707


# 30fbfdda 15-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation. Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup. Only trigger
the pageout daemon on transitions between states. Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.

Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14612


# f4af5959 07-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Don't assert that the domain free lock is held until we're certain that
there is a valid reservation. This can trip erroneously when memory
falls within a domain but doesn't have the reservation initialized because
it does not meet size or alignment requirements.

Reported by: pho, mjg
Sponsored by: Netflix, Dell/EMC Isilon


# 5f70fb14 23-Feb-2018 Mark Johnston <markj@FreeBSD.org>

Correct some comments after r328954.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14486


# ada27a3b 13-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

Cleanup unused page argument for vm_reserv_break().

Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14364


# c4be9169 13-Feb-2018 Konstantin Belousov <kib@FreeBSD.org>

Do not leak rv->psind in some specific situations.

Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver). Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting. Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld. In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.

Clean psind on vm_reserv_break() to avoid the situation.

Reported and tested by: Slava Shwartsman
Reviewed by: markj
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14335


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# 7a469c8e 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement NUMA policy for kmem_*(9). This maintains compatibility with
reservations by giving each memory domain its own KVA space in vmem that
is naturally aligned on superpage boundaries.

Reviewed by: alc, markj, kib (some objections)
Sponsored by: Netflix, Dell/EMC Isilon
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D13289


# ef435ae7 28-Nov-2017 Jeff Roberson <jeff@FreeBSD.org>

Move domain iterators into the page layer where domain selection should take
place. This makes the majority of the phys layer explicitly domain specific.

Reviewed by: markj, kib (some objections)
Discussed with: alc
Tested by: pho
Sponsored by: Netflix & Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13014


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use the BSD 2-Clause license. However, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.


# 8b5e1472 23-Jul-2017 Alan Cox <alc@FreeBSD.org>

Utilize pmap_enter(..., psind=1) in vm_fault_soft_fast() on amd64. (The
Differential Revision discusses the benefits of this change.)

Add a function, vm_reserv_to_superpage(), that returns the superpage
containing the specified base page.

Reviewed by: kib, markj
Tested by: pho
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D11556


# 9ed01c32 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

All these files need sys/vmmeter.h, but now they got it implicitly
included via sys/pcpu.h.


# 920da7e4 28-Dec-2016 Alan Cox <alc@FreeBSD.org>

Relax the object type restrictions on vm_page_alloc_contig(). Specifically,
add support for object types that were previously prohibited because they
could contain PG_CACHED pages.

Roughly halve the number of radix trie operations performed by
vm_page_alloc_contig() using the same approach that is employed by
vm_page_alloc(). Also, eliminate the radix trie lookup performed with the
free page queues lock held.

Tidy up the handling of radix trie insert failures in vm_page_alloc() and
vm_page_alloc_contig().

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8878


# 3453bca8 12-Dec-2016 Alan Cox <alc@FreeBSD.org>

Eliminate every mention of PG_CACHED pages from the comments in the machine-
independent layer of the virtual memory system. Update some of the nearby
comments to eliminate redundancy and improve clarity.

In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per
The Chicago Manual of Style.

Update the comment in vm/vm_page.h defining the four types of page queues to
reflect the elimination of PG_CACHED pages and the introduction of the
laundry queue.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8752


# 7667839a 15-Nov-2016 Alan Cox <alc@FreeBSD.org>

Remove most of the code for implementing PG_CACHED pages. (This change does
not remove user-space visible fields from vm_cnt or all of the references to
cached pages from comments. Those changes will come later.)

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8497


# c869e672 19-Dec-2015 Alan Cox <alc@FreeBSD.org>

Introduce a new mechanism for relocating virtual pages to a new physical
address and use this mechanism when:

1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical
memory allocator's free page lists. This replaces the long-standing
approach of scanning the active and inactive queues, converting clean
pages into PG_CACHED pages and laundering dirty pages. In contrast, the
new mechanism does not use PG_CACHED pages nor does it trigger a large
number of I/O operations.

2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find
free pages in the physical memory allocator's free page lists that are
covered by the direct map. Tested by: adrian

3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable
free pages in the physical memory allocator's free page lists.

In the coming months, I expect that this new mechanism will be applied in
other places. For example, balloon drivers should use relocation to
minimize fragmentation of the guest physical address space.

Make vm_phys_alloc_contig() a little smarter (and more efficient in some
cases). Specifically, use vm_phys_segs[] earlier to avoid scanning free
page lists that can't possibly contain suitable pages.

Reviewed by: kib, markj
Glanced at: jhb
Discussed with: jeff
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4444


# 67b7e434 26-Nov-2015 Alan Cox <alc@FreeBSD.org>

Correct an error in vm_reserv_reclaim_contig(). In the highly unusual
case that the reservation contained "low", the starting position in the
popmap for the free page search was incorrectly calculated. The most
likely (and visible) symptom of this error was the assertion failure,
"vm_reserv_reclaim_contig: pa is too low".


# e0a63baa 06-Aug-2015 Alan Cox <alc@FreeBSD.org>

Introduce a sysctl for reporting the number of fully populated reservations.


# 966272ca3 07-Jun-2015 Alan Cox <alc@FreeBSD.org>

Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHED pages.

Differential Revision: https://reviews.freebsd.org/D2712
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# 6a93e36b 11-Apr-2015 Alan Cox <alc@FreeBSD.org>

Correct an off-by-one error in vm_reserv_reclaim_contig() that results in
an infinite loop.

Submitted by: Svatopluk Kraus
MFC after: 1 week


# ed9dd64b 14-Mar-2015 Ian Lepore <ian@FreeBSD.org>

Revert r279932; this is going to be fixed in the sbuf code instead.

PR: 195668


# f3b9fcf2 12-Mar-2015 Ian Lepore <ian@FreeBSD.org>

Null-terminate strings returned via sysctl.

PR: 195668


# 79d7993d 07-Mar-2015 Konstantin Belousov <kib@FreeBSD.org>

Fix function name in the panic message.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 09e5f3c4 22-Nov-2014 Alan Cox <alc@FreeBSD.org>

By the time that vm_reserv_init() runs, vm_phys_segs[] is initialized. Use
it instead of phys_avail[].

Discussed with: Svatopluk Kraus


# 64f096ee 09-Sep-2014 Alan Cox <alc@FreeBSD.org>

Fix a boundary case error in vm_reserv_alloc_contig(): If a reservation
isn't being allocated for the last of the requested pages, because a
reservation won't fit in the gap between allocated pages, then the
reservation structure shouldn't be initialized.

While I'm here, improve the nearby comments.

Reported by: jeff, pho
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division


# 3180f757 11-Jun-2014 Alan Cox <alc@FreeBSD.org>

Correct a bug in the management of the population map on big-endian
machines. Specifically, there was a mismatch between how the routine
allocation and deallocation operations accessed the population map
and how the aggressively optimized reservation-breaking operation
accessed it. So, problems only occurred when reservations were broken.
This change makes the routine operations access the population map in
the same way as the reservation breaking operation.

This bug was introduced in r259999.

PR: 187080
Tested by: jmg (on an "armeb" machine)
Sponsored by: EMC / Isilon Storage Division


# dd05fa19 07-Jun-2014 Alan Cox <alc@FreeBSD.org>

Add a page size field to struct vm_page. Increase the page size field when
a partially populated reservation becomes fully populated, and decrease this
field when a fully populated reservation becomes partially populated.

Use this field to simplify the implementation of pmap_enter_object() on
amd64, arm, and i386.

On all architectures where we support superpages, the cost of creating a
superpage mapping is roughly the same as creating a base page mapping. For
example, both kinds of mappings entail the creation of a single PTE and PV
entry. With this in mind, use the page size field to make the
implementation of vm_map_pmap_enter(..., MAP_PREFAULT_PARTIAL) a little
smarter. Previously, if MAP_PREFAULT_PARTIAL was specified to
vm_map_pmap_enter(), that function would only map base pages. Now, it will
create up to 96 base page or superpage mappings.

Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division


# a08c1515 28-Dec-2013 Alan Cox <alc@FreeBSD.org>

Add "popmap" assertions: The page being freed isn't already free, and the
page being allocated isn't already allocated.

Sponsored by: EMC / Isilon Storage Division


# ec179322 27-Dec-2013 Alan Cox <alc@FreeBSD.org>

MFp4 alc_popmap
Change the way that reservations keep track of which pages are in use.
Instead of using the page's PG_CACHED and PG_FREE flags, maintain a bit
vector within the reservation. This approach has a couple of benefits.
First, it makes breaking reservations much cheaper because there are
fewer cache misses to identify the unused pages. Second, it is a
prerequisite for supporting two or more reservation sizes.
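
A per-reservation population map of this kind can be sketched as below. The structure and function names are hypothetical, and 512 pages per reservation is an assumed figure (2 MB superpages with 4 KB base pages); the actual struct vm_reserv has more fields. The assertions mirror the invariants that a page being allocated is not already allocated and a page being freed is not already free.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SK_NPAGES	512	/* assumed pages per reservation */
#define SK_WBITS	(sizeof(unsigned long) * CHAR_BIT)
#define SK_NWORDS	((SK_NPAGES + SK_WBITS - 1) / SK_WBITS)

/* Hypothetical, stripped-down reservation: only the population state. */
struct sk_reserv {
	unsigned long popmap[SK_NWORDS];	/* bit set = page in use */
	int popcnt;				/* number of set bits */
};

static bool
sk_in_use(const struct sk_reserv *rv, unsigned int i)
{
	assert(i < SK_NPAGES);
	return ((rv->popmap[i / SK_WBITS] >> (i % SK_WBITS)) & 1UL);
}

/* Mark page i as allocated; it must currently be free. */
static void
sk_populate(struct sk_reserv *rv, unsigned int i)
{
	assert(!sk_in_use(rv, i));
	rv->popmap[i / SK_WBITS] |= 1UL << (i % SK_WBITS);
	rv->popcnt++;
}

/* Mark page i as free; it must currently be allocated. */
static void
sk_depopulate(struct sk_reserv *rv, unsigned int i)
{
	assert(sk_in_use(rv, i));
	rv->popmap[i / SK_WBITS] &= ~(1UL << (i % SK_WBITS));
	rv->popcnt--;
}
```

Because the whole map fits in a handful of words inside the reservation, finding the unused pages touches far less memory than visiting a flag in each page structure.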


# 9eab5484 17-Sep-2013 Konstantin Belousov <kib@FreeBSD.org>

PG_SLAB no longer serves a useful purpose, since m->object is no
longer abused to store pointer to slab. Remove it.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Approved by: re (hrs)


# 404eb1b3 12-May-2013 Alan Cox <alc@FreeBSD.org>

Refactor vm_page_alloc()'s interactions with vm_reserv_alloc_page() and
vm_page_insert() so that (1) vm_radix_lookup_le() is never called while the
free page queues lock is held and (2) vm_radix_lookup_le() is called at most
once. This change reduces the average time that the free page queues lock
is held by vm_page_alloc() as well as vm_page_alloc()'s average overall
running time.

Sponsored by: EMC / Isilon Storage Division


# 774d251d 17-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allow the acquisition of read locking for lookup operations of the
resident/cached pages collections as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers' locking to ensure atomicity of
its operations. In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at most one new node per insert, which means the
maximum number of the desired nodes is the number of available physical
frames themselves. However, a new bisection node is not always
needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan. It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because making many of the sizes in
play configurable could cost performance.
However, efforts to make this code more general, and thus reusable by
other consumers, may well follow.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes, broken into small pieces, can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable further
optimizations in the future, where the vm_object lock is held in read
mode most of the time that the resident pool of page cache pages is
accessed for reading.

The change is mostly mechanical, but a few notes are in order:
* The KPI changes as follows:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* Because vm/vm_pager.h avoids namespace pollution by making its
  consumers include directly the locking header needed by its inline
  functions (previously sys/mutex.h, for VM_OBJECT_LOCK()), all
  vm/vm_pager.h consumers must now also include sys/rwlock.h.
* zfs requires a fairly convoluted fix to include FreeBSD rwlocks in
  the compat layer, because the name clash between the FreeBSD and
  Solaris versions must be avoided.
  For this purpose, zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# 476f7f24 15-Jul-2012 Alan Cox <alc@FreeBSD.org>

Correct an off-by-one error in vm_reserv_alloc_contig() that resulted in
the last reservation of a multi-reservation allocation not being
initialized.


# 908e3da1 08-Apr-2012 Alan Cox <alc@FreeBSD.org>

If a page belonging to a reservation is cached, then mark the reservation so
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.

Reported by: Andrey Zonov


# c68c3537 05-Dec-2011 Alan Cox <alc@FreeBSD.org>

Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to
use superpage reservations. So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages. In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.

Eliminate some duplication of code in vm_reserv_alloc_page().

Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.

Tested by: marius (sparc64)


# 5c1f2cc4 29-Oct-2011 Alan Cox <alc@FreeBSD.org>

Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at
eliminating duplicated code in the various pmap implementations.

Micro-optimize vm_phys_free_pages().

Introduce vm_phys_free_contig(). It is a fast routine for freeing an
arbitrary number of physically contiguous pages. In particular, it
doesn't require the number of pages to be a power of two.
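
To illustrate why a run whose length is not a power of two needs special
handling in a buddy-style allocator, the sketch below splits an arbitrary
page range into naturally aligned power-of-two blocks. It is a userland
illustration under stated assumptions, not the algorithm that
vm_phys_free_contig() actually uses; struct demo_block, DEMO_MAX_ORDER,
and demo_split_contig() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: the largest block is 2^10 pages. */
#define	DEMO_MAX_ORDER	10

/* One power-of-two block: first page index and log2 of its length. */
struct demo_block {
	unsigned long start;
	int order;
};

/*
 * Split the page range [start, start + npages) into blocks that are each
 * a power of two in length and aligned to their own size.  Returns the
 * number of blocks written to out[], which must hold at least that many.
 */
static size_t
demo_split_contig(unsigned long start, unsigned long npages,
    struct demo_block *out, size_t max_out)
{
	size_t n = 0;
	int order;

	while (npages > 0) {
		/* Grow the block while it stays aligned and still fits. */
		order = 0;
		while (order < DEMO_MAX_ORDER &&
		    (start & ((2UL << order) - 1)) == 0 &&
		    (2UL << order) <= npages)
			order++;
		assert(n < max_out);
		out[n].start = start;
		out[n].order = order;
		n++;
		start += 1UL << order;
		npages -= 1UL << order;
	}
	return (n);
}
```

For example, seven pages starting at page 3 split into a 1-page block at
3, a 4-page block at 4, and a 2-page block at 8.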

Use "u_long" instead of "unsigned long".

Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long". Make this change.


# 00f0e671 26-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by: rwatson


# 85f2a0c9 18-Nov-2010 Max Laier <mlaier@FreeBSD.org>

Off by one page in vm_reserv_reclaim_contig(): Also reclaim reservations
with only a single free page if that satisfies the requested size.

MFC after: 3 days
Reviewed by: alc


# 2cf36c8f 10-Nov-2010 Alan Cox <alc@FreeBSD.org>

Enable reservation-based physical memory allocation. Even without the
creation of large page mappings in the pmap, it can provide modest
performance benefits. In particular, for a "buildworld" on a 2x 1GHz
Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system
time by 12.6%.

Tested by: marius@


# d689bc00 30-Oct-2010 Alan Cox <alc@FreeBSD.org>

Correct some format strings used by sysctls.

MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 4e657159 16-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Re-add r212370 now that the LOR in powerpc64 has been resolved:

Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.

Reviewed by: phk (original patch)


# 404a593e 13-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn


# dd67e210 09-Sep-2010 Matthew D Fleming <mdf@FreeBSD.org>

Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.
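
The idea of a drain can be sketched in userland as follows: a small
fixed-size buffer hands its contents to a callback whenever it fills, so
the producer never needs a buffer sized for the whole output. This is an
illustration only and not the sbuf(9) or sysctl interface; every demo_*
name, the 16-byte buffer size, and the sink structure are assumptions made
for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical drain callback: consume len bytes, return 0 or an errno. */
typedef int demo_drain_fn(void *arg, const char *data, size_t len);

struct demo_buf {
	char data[16];		/* deliberately small fixed-length buffer */
	size_t len;		/* bytes currently buffered */
	demo_drain_fn *drain;
	void *drain_arg;
};

/* Hand any buffered bytes to the drain and empty the buffer on success. */
static int
demo_buf_flush(struct demo_buf *b)
{
	int error;

	assert(b->drain != NULL);
	if (b->len == 0)
		return (0);
	error = b->drain(b->drain_arg, b->data, b->len);
	if (error == 0)
		b->len = 0;
	return (error);
}

/* Append a string, draining whenever the fixed buffer fills up. */
static int
demo_buf_cat(struct demo_buf *b, const char *s)
{
	size_t chunk, n, room;
	int error;

	for (n = strlen(s); n > 0; s += chunk, n -= chunk) {
		room = sizeof(b->data) - b->len;
		if (room == 0) {
			error = demo_buf_flush(b);
			if (error != 0)
				return (error);
			room = sizeof(b->data);
		}
		chunk = n < room ? n : room;
		memcpy(b->data + b->len, s, chunk);
		b->len += chunk;
	}
	return (0);
}

/* Example sink standing in for the real consumer of the output. */
struct demo_sink {
	char out[256];
	size_t len;
	int calls;
};

static int
demo_drain_to_sink(void *arg, const char *data, size_t len)
{
	struct demo_sink *sink = arg;

	if (len > sizeof(sink->out) - sink->len)
		return (ENOSPC);
	memcpy(sink->out + sink->len, data, len);
	sink->len += len;
	sink->calls++;
	return (0);
}
```

In this sketch the producer must call demo_buf_flush() once at the end so
that the last partial buffer reaches the sink.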

Reviewed by: phk


# ab5378cf 11-Apr-2009 Alan Cox <alc@FreeBSD.org>

Previously, when vm_page_free_toq() was performed on a page belonging to
a reservation, unless all of the reservation's pages were free, the
reservation was moved to the head of the partially-populated reservations
queue, where it would be the next reservation to be broken in case the
free page queues were emptied. Now, instead, I am moving it to the tail.
Very likely this reservation is in the process of being freed in its
entirety, so placing it at the tail of the queue makes it more likely that
the underlying physical memory will be returned to the free page queues as
one contiguous chunk. If a reservation must be broken, it will, instead,
be the longest unchanged reservation, which is arguably the reservation
that is least likely to ever achieve promotion or be freed in its entirety.
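
The queueing policy described above can be sketched in userland with the
TAILQ macros from <sys/queue.h>. This is a simplified model under stated
assumptions, not the vm_reserv.c code: struct demo_reserv, its fields, and
the demo_* functions are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified reservation: only what the queue policy needs. */
struct demo_reserv {
	TAILQ_ENTRY(demo_reserv) link;
	int popcnt;		/* number of pages currently in use */
	int queued;		/* nonzero while on the partially-populated queue */
};

TAILQ_HEAD(demo_partpopq, demo_reserv);

/* Put a partially populated reservation at the tail of the queue. */
static void
demo_reserv_enqueue(struct demo_partpopq *q, struct demo_reserv *rv)
{
	assert(!rv->queued && rv->popcnt > 0);
	TAILQ_INSERT_TAIL(q, rv, link);
	rv->queued = 1;
}

/*
 * Free one page of rv.  If pages remain in use, the reservation moves to
 * the tail (the new policy) instead of the head (the old policy).
 */
static void
demo_reserv_free_page(struct demo_partpopq *q, struct demo_reserv *rv)
{
	assert(rv->popcnt > 0);
	if (rv->queued) {
		TAILQ_REMOVE(q, rv, link);
		rv->queued = 0;
	}
	rv->popcnt--;
	if (rv->popcnt > 0)
		demo_reserv_enqueue(q, rv);
}

/* The next reservation to break is the head: the longest unchanged one. */
static struct demo_reserv *
demo_reserv_pick_victim(struct demo_partpopq *q)
{
	return (TAILQ_FIRST(q));
}
```

With tail insertion, a reservation that is actively being freed keeps
moving away from the head, so the victim is whichever reservation has gone
longest without a change.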

MFC after: 6 weeks


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 44aab2c3 06-Apr-2008 Alan Cox <alc@FreeBSD.org>

Introduce vm_reserv_reclaim_contig(). This function is used by
contigmalloc(9) as a last resort to steal pages from an inactive,
partially-used superpage reservation.

Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
refactor it so that a separate subroutine is responsible for breaking
the selected reservation. This subroutine is also used by
vm_reserv_reclaim_contig().


# f8a47341 29-Dec-2007 Alan Cox <alc@FreeBSD.org>

Add the superpage reservation system. This is "part 2 of 2" of the
machine-independent support for superpages. (The earlier part was
the rewrite of the physical memory allocator.) The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.

Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64. (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)