History log of /freebsd-current/sys/vm/vm_radix.c
Revision Date Author Comments
# da76d349 03-May-2024 Bojan Novković <bnovkov@FreeBSD.org>

uma: Deduplicate uma_small_alloc

This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc where using the direct
map addresses involved extra steps in some cases.

The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.

Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084


# 10db91ec 12-Sep-2023 Doug Moore <dougm@FreeBSD.org>

vm_radix: add a missing paren

429c871ddddac4bbf6abf1eb9e2e6603f87c2ef5 left parens unbalanced in a
powerpc case that my testing missed. Restore balance.

Reported by: jenkins


# 429c871d 12-Sep-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: have vm_radix use pctrie code

Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, accessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.

Add back some #includes removed in the first attempt, and avoid the
use of a discontinued type in a bit of conditionally compiled code.

Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D41344


# 6cec93da 11-Sep-2023 Doug Moore <dougm@FreeBSD.org>

Revert "radix_trie: have vm_radix use pctrie code"

This reverts commit a494d30465f21e8cb014a5c788a43001397325d7.


# a494d304 11-Sep-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: have vm_radix use pctrie code

Implement everything currently in vm_radix.c with calls to functions
in subr_pctrie.c, accessed via the interface provided by the
DEFINE_PCTRIE_SMR macro.

Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D41344


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ac0572e6 30-Jul-2023 Doug Moore <dougm@FreeBSD.org>

radix_tree: compute slot from keybarr

The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.

This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.

Reviewed by: alc
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41235
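
The combined check can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the struct layout, the sketch_ names, and the 4-bit WIDTH are hypothetical. A single subtract-and-shift both detects a key that lies outside the node's range and, when the key is inside, yields the child slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_WIDTH	4			/* key bits per level (assumption) */
#define	SKETCH_COUNT	(1u << SKETCH_WIDTH)	/* children per node */

struct sketch_node {
	uint64_t	owner;	/* key prefix of this node, low bits zero */
	unsigned	clev;	/* level * WIDTH, always < 64 */
};

/*
 * Return true if 'index' cannot be found below 'n'.  Otherwise store the
 * child slot for 'index' in '*slot' and return false.
 */
static bool
sketch_keybarr(const struct sketch_node *n, uint64_t index, unsigned *slot)
{
	uint64_t d;

	assert(n->clev < 64);
	/* An index below 'owner' wraps to a huge value and fails the test. */
	d = (index - n->owner) >> n->clev;
	if (d >= SKETCH_COUNT)
		return (true);
	*slot = (unsigned)d;
	return (false);
}
```

A search loop built on this helper would call it once per interior node and use the returned slot directly, without a separate slot computation.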


# 38f5cb1b 30-Jul-2023 Doug Moore <dougm@FreeBSD.org>

radix_tree: redefine the clev field

The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.

For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the callee, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.

This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.

Reviewed by: kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41226
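
A rough sketch of the arithmetic, assuming a 4-bit WIDTH and hypothetical sketch_ names (none of this is the committed code). With clev stored pre-multiplied, slot() needs no multiplication, and trimkey() shifts COUNT rather than 1, so the shift amount never reaches 64.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_WIDTH	4			/* key bits per level (assumption) */
#define	SKETCH_COUNT	(1u << SKETCH_WIDTH)
#define	SKETCH_MASK	(SKETCH_COUNT - 1)

/* Old form: 'level' is a digit position and is multiplied on every use. */
static unsigned
sketch_slot_level(uint64_t index, unsigned level)
{
	assert(level * SKETCH_WIDTH < 64);
	return ((unsigned)((index >> (level * SKETCH_WIDTH)) & SKETCH_MASK));
}

/* New form: 'clev' already holds level * WIDTH. */
static unsigned
sketch_slot_clev(uint64_t index, unsigned clev)
{
	assert(clev < 64);
	return ((unsigned)((index >> clev) & SKETCH_MASK));
}

/*
 * Clear every key bit below digit (level + 1).  Shifting COUNT by clev is
 * the same as shifting 1 by clev + WIDTH, but the shift amount stays below
 * 64.  At the top level the product wraps to 0 and the mask clears all bits.
 */
static uint64_t
sketch_trimkey(uint64_t index, unsigned clev)
{
	assert(clev < 64);
	return (index & -((uint64_t)SKETCH_COUNT << clev));
}
```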


# 2d2bcba7 28-Jul-2023 Doug Moore <dougm@FreeBSD.org>

Every path in a radix trie ends with a leaf or a NULL. By replacing
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.

This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.

Reviewed by: alc, kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41171
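
The effect on the search loop can be sketched as below. This is a simplified illustration, not the committed code: it assumes leaves are pointers tagged in the low bit, uses hypothetical sketch_ names, and omits path compression and the key comparison at the leaf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_WIDTH	4
#define	SKETCH_COUNT	(1u << SKETCH_WIDTH)
#define	SKETCH_ISLEAF	((uintptr_t)1)		/* low-bit leaf tag (assumption) */
#define	SKETCH_NULL	((void *)SKETCH_ISLEAF)	/* leaf that carries no value */

struct sketch_node {
	unsigned	clev;			/* level * WIDTH */
	void		*child[SKETCH_COUNT];
};

static int
sketch_isleaf(const void *p)
{
	return (((uintptr_t)p & SKETCH_ISLEAF) != 0);
}

/* Tag a value pointer as a leaf; the value must be at least 2-byte aligned. */
static void *
sketch_toleaf(void *val)
{
	assert(((uintptr_t)val & SKETCH_ISLEAF) == 0);
	return ((void *)((uintptr_t)val | SKETCH_ISLEAF));
}

/* Every empty slot holds the NULL leaf rather than a plain NULL pointer. */
static void
sketch_node_init(struct sketch_node *n, unsigned clev)
{
	n->clev = clev;
	for (unsigned i = 0; i < SKETCH_COUNT; i++)
		n->child[i] = SKETCH_NULL;
}

/* The loop tests only the leaf bit; an empty slot ends it like any leaf. */
static void *
sketch_lookup(void *root, uint64_t index)
{
	void *p = root;

	while (!sketch_isleaf(p)) {
		struct sketch_node *n = p;

		p = n->child[(index >> n->clev) & (SKETCH_COUNT - 1)];
	}
	return ((void *)((uintptr_t)p & ~SKETCH_ISLEAF));
}
```

In this sketch a zero-filled root is a plain NULL pointer, not a leaf, so the loop would dereference it. That is the hazard the commit message describes for tries that skip init().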


# 6f251ef2 19-Jul-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: simplify ge, le lookups

Replace the implementations of lookup_le and lookup_ge with ones
that do not use a stack or climb back up the tree, and instead
exploit the popmap field to quickly identify the place to resume
searching if the straightforward indexed search fails.

The code size of the original functions shrinks by a combined 160
bytes on amd64, and the cumulative cycle count per invocation of
the two functions together is reduced 20% in a buildworld test.

Reviewed by: alc, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40936


# 16e01c05 09-Jul-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: avoid code duplication in insert

Two cases in the insert routine are written differently, when
they're really doing the same thing. Writing that case only once
saves 208 bytes in the compiled vm_radix_insert code and reduces
instructions executed by about 2%.

Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40807


# 8df38859 07-Jul-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: replace node count with popmap

Replace the 'count' field in a trie node with a bitmap that
identifies non-NULL children. Drop the 'last' field, and use the
last bit set in the bitmap instead. In lookup_le, lookup_ge,
remove, and reclaim_all, use the bitmap to find the
previous/next/only/every non-null child in constant time by
examining the bitmask instead of looping across array elements
and null-checking them one-by-one.

A buildworld test suggests that this reduces the cycle count on
those functions that eliminate some null-checks by 4.9%, 1.5%,
0.0% and 13.3%.

Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40775
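
A minimal sketch of the bitmap technique, assuming 16 children per node and GCC/Clang bit-scan builtins; the struct and the sketch_ helpers are hypothetical and only illustrate how a population bitmap replaces per-element NULL checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_COUNT	16	/* children per node (assumption) */

struct sketch_node {
	uint16_t	popmap;			/* bit i set iff child[i] != NULL */
	void		*child[SKETCH_COUNT];
};

/* Store a child pointer and keep the bitmap in sync with it. */
static void
sketch_set_child(struct sketch_node *n, int slot, void *p)
{
	assert(slot >= 0 && slot < SKETCH_COUNT);
	n->child[slot] = p;
	if (p != NULL)
		n->popmap |= (uint16_t)(1u << slot);
	else
		n->popmap &= (uint16_t)~(1u << slot);
}

/* Smallest populated slot >= 'slot', or -1; no loop over the children. */
static int
sketch_next_slot(const struct sketch_node *n, int slot)
{
	unsigned m;

	assert(slot >= 0 && slot < SKETCH_COUNT);
	m = n->popmap & (~0u << slot);
	return (m == 0 ? -1 : __builtin_ctz(m));
}

/* Largest populated slot <= 'slot', or -1. */
static int
sketch_prev_slot(const struct sketch_node *n, int slot)
{
	unsigned m;

	assert(slot >= 0 && slot < SKETCH_COUNT);
	m = n->popmap & ((2u << slot) - 1);
	return (m == 0 ? -1 : 31 - __builtin_clz(m));
}

/* True when exactly one child remains, as a remove path might test. */
static int
sketch_has_single_child(const struct sketch_node *n)
{
	return (n->popmap != 0 && (n->popmap & (n->popmap - 1)) == 0);
}
```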


# da72505f 26-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: pass fewer params to node_get

Let node_get calculate its own owner value. Don't pass the count
parameter, since it's always 2. Save 16 bytes in insert(). Move,
without modifying, slot and trimkey to handle use-before-declaration
problem.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D40723


# 9cfed089 27-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: clean up overlong lines

This is purely a cosmetic change. vm_radix.c has lines that reach past
column 80 and this change cleans that up. The associated changes to
subr_pctrie.c are just to keep mirroring vm_radix.c.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D40764


# 72c3a43b 26-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: skip compare in lookup_le, lookup_ge

In _lookup_ge, where a loop "looks for an available edge or val within
the current bisection node" (to quote the code comment), the value of
index has already been modified to guarantee that it is the least
value that can be found in the non-NULL child node being
examined. Therefore, if the non-NULL child is a leaf, there's no need
to compare 'index' to anything, and the value can just be returned.

The same is true for _lookup_le with 'most' replacing 'least'.

Reviewed by: alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D40746


# a42d8fe0 24-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: simplify trimkey functions

Replacing a branch and two shifts with a single masking operation saves
64 bytes in the pair of functions lookup_le and lookup_ge on amd64.
Refresh the associated comments.

Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40722
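
The equivalence behind this change can be sketched as follows. Both functions are hypothetical reconstructions for illustration, assuming a 4-bit WIDTH; they are not copied from the tree.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_WIDTH	4	/* key bits per level (assumption) */

/* Branch plus two shifts: drop the low digits, then shift zeros back in. */
static uint64_t
sketch_trimkey_shifts(uint64_t index, unsigned level)
{
	assert(level * SKETCH_WIDTH < 64);
	if (level > 0) {
		index >>= level * SKETCH_WIDTH;
		index <<= level * SKETCH_WIDTH;
	}
	return (index);
}

/* Single mask: the negation of a power of two has all higher bits set. */
static uint64_t
sketch_trimkey_mask(uint64_t index, unsigned level)
{
	assert(level * SKETCH_WIDTH < 64);
	return (index & -((uint64_t)1 << (level * SKETCH_WIDTH)));
}
```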


# e8efee29 23-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: avoid reloading radix node

In the vm_radix:remove loop that searches for the last child, load
that child once, without loading it again after the search is over.
Change KASSERTS from index check to NULL node check.

Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40721


# 1efa7dbc 20-Jun-2023 Doug Moore <dougm@FreeBSD.org>

vm_radix: drop unused function; use bool.

Replace boolean_t with bool in vm_radix.c. Drop the function
vm_radix_is_singleton, which is unused and has no corresponding
function in subr_pctrie.c.

Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D40586


# 05963ea4 20-Jun-2023 Doug Moore <dougm@FreeBSD.org>

radix_trie: eliminate iteration in keydiff

Use flsll(), instead of a loop, to find where two keys differ, and
then arithmetic to transform that to a trie level.

Approved by: alc, markj
Differential Revision: https://reviews.freebsd.org/D40585
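
A sketch of the loop-free computation, under assumptions: WIDTH is taken to be 4, level 0 is the least significant digit, and sketch_flsll() is a portable stand-in for flsll() built on a GCC/Clang builtin. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_WIDTH	4	/* key bits per level (assumption) */

/* Stand-in for flsll(): 1-based index of the highest set bit, 0 if none. */
static int
sketch_flsll(uint64_t x)
{
	return (x == 0 ? 0 : 64 - __builtin_clzll(x));
}

/*
 * Level of the most significant digit at which two distinct keys differ.
 * The XOR isolates the differing bits, the bit scan finds the highest one,
 * and the division converts a bit position into a digit position.
 */
static int
sketch_keydiff(uint64_t a, uint64_t b)
{
	assert(a != b);
	return ((sketch_flsll(a ^ b) - 1) / SKETCH_WIDTH);
}
```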


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# c3aa3bf9 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

vm: clean up empty lines in .c and .h files


# c79cee71 13-May-2020 Kyle Evans <kevans@FreeBSD.org>

kernel: provide panicky version of __unreachable

__builtin_unreachable doesn't raise any compile-time warnings/errors on its
own, so problems with its usage can't be easily detected. While it would be
nice for this situation to change and compilers to at least add a warning
for trivial cases where local state means the instruction can't be reached,
this isn't the case at the moment and likely will not happen.

This commit adds an __assert_unreachable, whose intent is incredibly clear:
it asserts that this instruction is unreachable. On INVARIANTS builds, it's
a panic(), and on non-INVARIANTS it expands to __unreachable().

Existing users of __unreachable() are converted to __assert_unreachable,
to improve debuggability if this assumption is violated.

Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D23793
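
A userland sketch of the same idea, with assumptions labeled: SKETCH_INVARIANTS stands in for the kernel's INVARIANTS option, assert()/abort() stand in for panic(), and the enum and function are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef SKETCH_INVARIANTS
/* Checked builds: fail loudly if control ever gets here. */
#define	SKETCH_ASSERT_UNREACHABLE() do {	\
	assert(!"unreachable code reached");	\
	abort();				\
} while (0)
#else
/* Unchecked builds: only a hint to the compiler. */
#define	SKETCH_ASSERT_UNREACHABLE()	__builtin_unreachable()
#endif

enum sketch_access { SKETCH_SMR, SKETCH_LOCKED, SKETCH_UNSERIALIZED };

/* Every enumerator is handled, so the code after the switch is dead. */
static const char *
sketch_access_name(enum sketch_access a)
{
	switch (a) {
	case SKETCH_SMR:
		return ("smr");
	case SKETCH_LOCKED:
		return ("locked");
	case SKETCH_UNSERIALIZED:
		return ("unserialized");
	}
	SKETCH_ASSERT_UNREACHABLE();
}
```

Leaving out a default case keeps -Wswitch able to flag a newly added enumerator, while the macro covers the fall-through path that GCC would otherwise warn about.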


# 3fba8868 06-Mar-2020 Mark Johnston <markj@FreeBSD.org>

Move SMR pointer type definition and access macros to smr_types.h.

The intent is to provide a header that can be included by other headers
without introducing too much pollution. smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.

Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
could be used for resizeable arrays, for example, so TYPE seems too
generic.
- It is useful to be able to define anonymous SMR-protected pointer
types and the _DECLARE suffix makes that look wrong.

Reviewed by: jeff, mjg, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23988


# cef81f8f 22-Feb-2020 Kyle Evans <kevans@FreeBSD.org>

vm_radix: prefer __builtin_unreachable() to an unreachable panic()

This provides the needed hint to GCC and offers an annotation for readers to
observe that it's in fact impossible to hit this point. We'll get hit with
a -Wswitch error if the enum applicable to the switch above were to get
expanded without the new value(s) being handled.


# 4b3dac72 19-Feb-2020 Jeff Roberson <jeff@FreeBSD.org>

Silence a gcc warning about no return from a function that handles every
possible enum in a switch statement. I verified that this emits nothing
as expected on clang. radix relies on constant propagation to eliminate
any branching from these access routines.

Reported by: lwhsu/tinderbox


# 1ddda2eb 19-Feb-2020 Jeff Roberson <jeff@FreeBSD.org>

Use SMR to provide a safe unlocked lookup for vm_radix.

The tree is kept correct for readers with store barriers and careful
ordering. The existing object lock serializes writers. Consumers
will be introduced in later commits.

Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D23446
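
One ingredient of "store barriers and careful ordering" can be sketched with C11 atomics. This is not the kernel's SMR API and it does not show deferred reclamation; it only illustrates, with hypothetical names, how a writer publishes a fully initialized node so that an unlocked reader never observes it half-built.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	int	value;
};

/* Shared slot that readers follow without taking a lock. */
static _Atomic(struct sketch_node *) sketch_slot;

/*
 * Writer side (serialized by a lock in the real code): initialize the
 * node first, then publish it with a release store.
 */
static void
sketch_publish(struct sketch_node *n, int value)
{
	assert(n != NULL);
	n->value = value;
	atomic_store_explicit(&sketch_slot, n, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a
 * non-NULL pointer implies that the node's fields are visible.
 */
static int
sketch_read(int fallback)
{
	struct sketch_node *n;

	n = atomic_load_explicit(&sketch_slot, memory_order_acquire);
	return (n != NULL ? n->value : fallback);
}
```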


# a3d799fb 24-Jun-2018 Mateusz Guzik <mjg@FreeBSD.org>

vm: stop passing M_ZERO when allocating radix nodes

Allocation explicitly initialized the 3 leading fields. The rest is an
array which is supposed to be NULL-ed prior to deallocation.

Delegate zeroing to the infrequently called object initializer.

This gets rid of one of the most common memset consumers.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D15989


# ae941b1b 06-Feb-2018 Gleb Smirnoff <glebius@FreeBSD.org>

Fix boot_pages calculation for machines that don't have UMA_MD_SMALL_ALLOC.

o Call uma_startup1() after initializing kmem, vmem and domains.
o Include eight VM startup pages into uma_startup_count() calculation.
o Account for vmem_startup() and vm_map_startup() preallocating pages.
o Account for extra two allocations done by kmem_init() and vmem_create().
o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT
allowed several other SYSINITs to sneak in before it, thus bumping
requirement for amount of boot pages.


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.


# 8d6fbbb8 07-Nov-2017 Jeff Roberson <jeff@FreeBSD.org>

Replace many instances of VM_WAIT with blocking page allocation flags
similar to the kernel memory allocator.

This simplifies NUMA allocation because the domain will be known at wait
time and races between failure and sleeping are eliminated. This also
reduces boilerplate code and simplifies callers.

A wait primitive is supplied for uma zones for similar reasons. This
eliminates some non-specific VM_WAIT calls in favor of more explicit
sleeps that may be satisfied without new pages.

Reviewed by: alc, kib, markj
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon


# cd1241fb 19-Jul-2017 Konstantin Belousov <kib@FreeBSD.org>

Add pctrie_init() and vm_radix_init() to initialize generic pctrie and
vm_radix trie.

Existing vm_radix_init() function is renamed to vm_radix_zinit().
Inlines moved out of the _ headers.

Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11661


# e94965d8 07-Dec-2016 Alan Cox <alc@FreeBSD.org>

Previously, vm_radix_remove() would panic if the radix trie didn't
contain a vm_page_t at the specified index. However, with this
change, vm_radix_remove() no longer panics. Instead, it returns NULL
if there is no vm_page_t at the specified index. Otherwise, it
returns the vm_page_t. The motivation for this change is that it
simplifies the use of radix tries in the amd64, arm64, and i386 pmap
implementations. Instead of performing a lookup before every remove,
the pmap can simply perform the remove.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D8708


# 8804a2b0 02-Dec-2016 Alan Cox <alc@FreeBSD.org>

Eliminate a stale comment; vm_radix_prealloc() was replaced in r254141.

MFC after: 3 days


# 563a19d5 01-Dec-2016 Alan Cox <alc@FreeBSD.org>

During vm_page_cache()'s call to vm_radix_insert(), if vm_page_alloc() was
called to allocate a new page of radix trie nodes, there could be a call to
vm_radix_remove() on the same trie (of PG_CACHED pages) as the in-progress
vm_radix_insert(). With the removal of PG_CACHED pages, we can simplify
vm_radix_insert() and vm_radix_remove() by removing the flags on the root of
the trie that were used to detect this case and the code for restarting
vm_radix_insert() when it happened.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8664


# b66bb393 22-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Clean up redundant parentheses from existing howmany()/roundup() macro uses.


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, since some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, since malloc()/free() are
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff, the struct pcpu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version in case some out-of-tree module/code relies on the
global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# 703b304f 08-Dec-2013 Alan Cox <alc@FreeBSD.org>

Eliminate a redundant parameter to vm_radix_replace().

Improve the wording of the comment describing vm_radix_replace().

Reviewed by: attilio
MFC after: 6 weeks
Sponsored by: EMC / Isilon Storage Division


# 776cad90 23-Aug-2013 Alan Cox <alc@FreeBSD.org>

Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() can
reclaim the last preexisting cached page in the object, resulting in a call
to vdrop(). Detect this scenario so that the vnode's hold count is
correctly maintained. Otherwise, we panic.

Reported by: scottl
Tested by: pho
Discussed with: attilio, jeff, kib


# e946b949 09-Aug-2013 Attilio Rao <attilio@FreeBSD.org>

On all architectures, avoid preallocating the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid preallocating
the KVA for such nodes.

In order to do so, allow the operations derived from vm_radix_insert()
to fail, and handle all of the resulting failures.

For vm_radix, introduce a new function called vm_radix_replace(),
which can replace a leaf node that is already present with a new one,
and take into account the possibility that, during allocation in
vm_radix_insert(), the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed,
vm_radix_insert() will start from scratch again.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl


# 9f2e6008 11-May-2013 Alan Cox <alc@FreeBSD.org>

To reduce the amount of arithmetic performed in the various radix tree
functions, reverse the numbering scheme for the levels. The highest
numbered level in the tree now appears near the root instead of the leaves.

Sponsored by: EMC / Isilon Storage Division


# bb0e1de4 07-May-2013 Alan Cox <alc@FreeBSD.org>

Remove a redundant call to panic() from vm_radix_keydiff(). The assertion
before the loop accomplishes the same thing.

Sponsored by: EMC / Isilon Storage Division


# 2d4b9a64 04-May-2013 Alan Cox <alc@FreeBSD.org>

Optimize vm_radix_lookup_ge() and vm_radix_lookup_le(). Specifically,
change the way that these functions ascend the tree when the search for a
matching leaf fails at an interior node. Rather than returning to the root
of the tree and repeating the lookup with an updated key, maintain a stack
of interior nodes that were visited during the descent and use that stack
to resume the lookup at the closest ancestor that might have a matching
descendant.

Sponsored by: EMC / Isilon Storage Division
Reviewed by: attilio
Tested by: pho


# 82af926a 28-Apr-2013 Alan Cox <alc@FreeBSD.org>

Eliminate an unneeded call to vm_radix_trimkey() from vm_radix_lookup_le().
This call is clearing bits from the key that will be set again by the next
line.

Sponsored by: EMC / Isilon Storage Division


# 40076ebc 27-Apr-2013 Alan Cox <alc@FreeBSD.org>

Avoid some lookup restarts in vm_radix_lookup_{ge,le}().

Sponsored by: EMC / Isilon Storage Division


# 384875a3 21-Apr-2013 Alan Cox <alc@FreeBSD.org>

Simplify vm_radix_{add,dec}lev().

Sponsored by: EMC / Isilon Storage Division


# 880659fe 17-Apr-2013 Alan Cox <alc@FreeBSD.org>

When calculating the number of reserved nodes, discount the pages that will
be used to store the nodes.

Sponsored by: EMC / Isilon Storage Division


# a08f2cf6 15-Apr-2013 Alan Cox <alc@FreeBSD.org>

Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we have previously created a level zero
interior node at the root of every non-empty trie, even when that node is
not strictly necessary, i.e., it has only one child. This change is the
second (and final) step in eliminating those unnecessary level zero interior
nodes. Specifically, it updates the deletion and insertion functions so
that they do not require a level zero interior node at the root of the trie.
For a "buildworld" workload, this change results in a 16.8% reduction in the
number of interior nodes allocated and a similar reduction in the average
execution time for lookup functions. For example, the average execution
time for a call to vm_radix_lookup_ge() is reduced by 22.9%.

Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division


# 6f9c0b15 12-Apr-2013 Alan Cox <alc@FreeBSD.org>

Although we perform path compression to reduce the height of the trie and
the number of interior nodes, we always create a level zero interior node at
the root of every non-empty trie, even when that node is not strictly
necessary, i.e., it has only one child. This change is the first step in
eliminating those unnecessary level zero interior nodes. Specifically, it
updates all of the lookup functions so that they do not require a level zero
interior node at the root.

Reviewed by: attilio, jeff (an earlier version)
Sponsored by: EMC / Isilon Storage Division


# 2c899fed 06-Apr-2013 Alan Cox <alc@FreeBSD.org>

Micro-optimize the order of struct vm_radix_node's fields. Specifically,
arrange for all of the fields to start at a short offset from the
beginning of the structure.

Eliminate unnecessary masking of VM_RADIX_FLAGS from the root pointer in
vm_radix_getroot().

Sponsored by: EMC / Isilon Storage Division


# c1c82b36 06-Apr-2013 Alan Cox <alc@FreeBSD.org>

Simplify vm_radix_keybarr().

Sponsored by: EMC / Isilon Storage Division


# 72abda64 06-Apr-2013 Alan Cox <alc@FreeBSD.org>

Simplify vm_radix_insert().

Reviewed by: attilio
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 96f1a842 03-Apr-2013 Alan Cox <alc@FreeBSD.org>

Replace the remaining uses of vm_radix_node_page() by vm_radix_isleaf() and
vm_radix_topage(). This transformation eliminates some unnecessary
conditional branches from the inner loops of vm_radix_insert(),
vm_radix_lookup{,_ge,_le}(), and vm_radix_remove().

Simplify the control flow of vm_radix_lookup_{ge,le}().

Reviewed by: attilio (an earlier version)
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 3fc10b73 26-Mar-2013 Alan Cox <alc@FreeBSD.org>

Introduce vm_radix_isleaf() and use it in a couple places. As compared to
using vm_radix_node_page() == NULL, the compiler is able to generate one
less conditional branch when vm_radix_isleaf() is used. More use cases
involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(),
and vm_radix_remove() will follow.

Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division


# 652615dc 24-Mar-2013 Alan Cox <alc@FreeBSD.org>

Micro-optimize the control flow in a few places. Eliminate a panic call
that could never be reached in vm_radix_insert(). (If the pointer being
checked by the panic call were ever NULL, the immediately preceding loop
would have already crashed on a NULL pointer dereference.)

Reviewed by: attilio (an earlier version)
Sponsored by: EMC / Isilon Storage Division


# 774d251d 17-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allow the acquisition of read locks for lookup operations on the
resident/cached page collections, as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie relies on the consumers' locking to ensure atomicity of
its operations. In order to avoid deadlocks, the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at most one new node per insert, which means the
maximum number of nodes needed is the number of available physical
frames. However, a new bisection node is not always needed.

The radix trie implements path compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels that usually must be scanned. It also helps with the pre-fetching
of nodes by introducing the single-node-per-insert property.

This code is not generalized (yet) because of the possible loss of
performance from making many of the sizes in play configurable.
However, efforts to make this code more general, and thus reusable by
other consumers, may well follow.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes, broken into pieces, can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)