History log of /freebsd-current/sys/vm/vm_phys.h
Revision Date Author Comments
# e3537f92 03-Jun-2024 Doug Moore <dougm@FreeBSD.org>

Revert "subr_pctrie: use ilog2(x) instead of fls(x)-1"

This reverts commit 574ef650695088d56ea12df7da76155370286f9f.


# 574ef650 02-Jun-2024 Doug Moore <dougm@FreeBSD.org>

subr_pctrie: use ilog2(x) instead of fls(x)-1

In three instances where fls(x)-1 is used, the compiler does not know
that x is nonzero and so adds needless zero checks. Using ilog2(x)
instead saves, in each instance, about 4 instructions, including a
conditional, and 16 or so bytes, on an amd64 build.

Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D45330
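
A minimal standalone illustration of the difference (hedged: the helpers
below are stand-ins for the kernel's fls(9) and a __builtin_clz-based
ilog2(); only the zero-check behavior is the point):

    #include <assert.h>

    /* fls() must be defined for x == 0, so fls(x) - 1 built from
     * __builtin_clz() carries a zero check. */
    static int
    fls_like(unsigned x)
    {
            return (x == 0 ? 0 : 32 - __builtin_clz(x));
    }

    /* An ilog2()-style helper may assume x != 0 and skip the check. */
    static int
    ilog2_like(unsigned x)
    {
            return (31 - __builtin_clz(x));
    }

    int
    main(void)
    {
            assert(fls_like(64) - 1 == ilog2_like(64));     /* both 6 */
            return (0);
    }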


# cb20a74c 03-Apr-2024 Stephen J. Kiernan <stevek@FreeBSD.org>

vm: add macro to mark arguments used when NUMA is defined

This fixes compiler warnings when -Wunused-arguments is enabled and
not quieted.

Reviewed by: kib, markj
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44623
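
A hedged sketch of the idiom; NUMA_ARG_USED is a hypothetical name (the
macro actually added may differ), and __unused comes from sys/cdefs.h:

    #include <sys/cdefs.h>

    #ifdef NUMA
    #define NUMA_ARG_USED                   /* referenced by NUMA code */
    #else
    #define NUMA_ARG_USED   __unused        /* quiet -Wunused warnings */
    #endif

    static int
    vm_example(int domain NUMA_ARG_USED)
    {
    #ifdef NUMA
            return (domain);
    #else
            return (0);
    #endif
    }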


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 9e817428 16-Jun-2023 Doug Moore <dougm@FreeBSD.org>

vm_phys: add binary segment search

Replace several sequential searches for a segment that contains a
physical address with a call to a function that does it by binary
search. In vm_page_reclaim_contig_domain_ext, find the first segment
to reclaim from, and reclaim from each subsequent appropriate segment.
Eliminate vm_phys_scan_contig.

Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D40058
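
A hedged sketch of the lookup (the helper name is illustrative; the real
function in vm_phys.c may differ in detail). vm_phys_segs[] is sorted by
address, so the containing segment can be found in O(log n):

    /* Find the vm_phys_seg containing pa, or NULL. */
    static struct vm_phys_seg *
    paddr_to_seg(vm_paddr_t pa)
    {
            int lo, hi, mid;

            lo = 0;
            hi = vm_phys_nsegs - 1;
            while (lo <= hi) {
                    mid = lo + (hi - lo) / 2;
                    if (pa >= vm_phys_segs[mid].end)
                            lo = mid + 1;
                    else if (pa < vm_phys_segs[mid].start)
                            hi = mid - 1;
                    else
                            return (&vm_phys_segs[mid]);
            }
            return (NULL);
    }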


# 6062d9fa 05-Jun-2023 Mark Johnston <markj@FreeBSD.org>

vm_phys: Change the return type of vm_phys_unfree_page() to bool

This is in keeping with the trend of removing uses of boolean_t, and the
sole caller was implicitly converting it to a "bool".

No functional change intended.

Reviewed by: dougm, alc, imp, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40401


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC after: 3 days
Sponsored by: Netflix
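
In each affected license header the change is a one-line tag rewrite,
for example:

    -/* SPDX-License-Identifier: BSD-2-Clause-FreeBSD */
    +/* SPDX-License-Identifier: BSD-2-Clause */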


# 8119cdd3 29-Dec-2021 Doug Moore <dougm@FreeBSD.org>

vm_phys: hide vm_phys_set_pool

It is only called in the file that defines it, so make it static and
remove the declaration from the header.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33688


# 31991a5a 29-Sep-2021 Mitchell Horne <mhorne@FreeBSD.org>

minidump: De-duplicate is_dumpable()

The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().

Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
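
A hedged sketch of the now-shared predicate, assuming the usual
convention that dump_avail[] holds (start, end) pairs terminated by a
zero pair (the real vm_phys.c function may differ in detail):

    static bool
    is_dumpable(vm_paddr_t pa)
    {
            int i;

            for (i = 0; dump_avail[i] != 0 || dump_avail[i + 1] != 0;
                i += 2) {
                    if (pa >= dump_avail[i] && pa < dump_avail[i + 1])
                            return (true);
            }
            return (false);
    }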


# 431fb8ab 18-Nov-2020 Mark Johnston <markj@FreeBSD.org>

vm_phys: Try to clean up NUMA KPIs

It can be useful for code outside the VM system to look up the NUMA domain
of a page backing a virtual or physical address, specifically when
creating NUMA-aware data structures. We have _vm_phys_domain() for
this, but the leading underscore implies that it's an internal function,
and vm_phys.h has dependencies on a number of other headers.

Rename vm_phys_domain() to vm_page_domain(), and _vm_phys_domain() to
vm_phys_domain(). Make the latter an inline function.

Add _vm_phys.h and define struct vm_phys_seg there so that it's easier
to use in other headers. Include it from vm_page.h so that
vm_page_domain() can be defined there.

Include machine/vmparam.h from _vm_phys.h since it depends directly on
some constants defined there.

Reviewed by: alc
Reviewed by: dougm, kib (earlier versions)
Differential Revision: https://reviews.freebsd.org/D27207
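
A hedged sketch of the two KPIs after the rename (close to the shape
described above; mem_affinity[] is the zero-terminated range-to-domain
table populated by MD code):

    static inline int
    vm_phys_domain(vm_paddr_t pa)
    {
    #ifdef NUMA
            int i;

            for (i = 0; mem_affinity[i].end != 0; i++)
                    if (mem_affinity[i].start <= pa &&
                        mem_affinity[i].end >= pa)
                            return (mem_affinity[i].domain);
            return (-1);
    #else
            return (0);
    #endif
    }

    static inline int
    vm_page_domain(vm_page_t m)
    {
            return (vm_phys_domain(VM_PAGE_TO_PHYS(m)));
    }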


# ccfd886a 23-Oct-2020 Alan Cox <alc@FreeBSD.org>

Conditionally compile struct vm_phys_seg's md_first field. This field is
only used by arm64's pmap.

Reviewed by: kib, markj, scottph
Differential Revision: https://reviews.freebsd.org/D26907


# 6f3b523c 14-Oct-2020 Konstantin Belousov <kib@FreeBSD.org>

Avoid dump_avail[] redefinition.

Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.

Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26741


# de031846 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

arm64/pmap: Sparsify pv_table

Reviewed by: markj, kib
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26132


# 7988971a 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE

On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].

This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.

The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.

Reviewed by: markj, alc, kib
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26130
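
As a rough, hedged sanity check of those figures (the per-entry size of
vm_reserv metadata is an assumption here): 256 GiB of RAM at 0.4%
density implies an address span of roughly 64 TiB, i.e. about 2^25
possible 2 MiB reservations; at something like 160 bytes of metadata per
entry, a dense array is on the order of 5 GiB.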


# 00e66147 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

Sparsify the vm_page_dump bitmap

On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 MiB to >2 GiB
(and overflowing `int` for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
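
The arithmetic behind those numbers (hedged, assuming 4 KiB pages and
one bit per page): a dense bitmap must cover [0, last_pa], and a ~64 TiB
span needs 2^34 bits = 2 GiB, while covering only the ~256 GiB of
populated RAM needs 2^26 bits = 8 MiB.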


# c3aa3bf9 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

vm: clean up empty lines in .c and .h files


# 81302f1d 28-May-2020 Mark Johnston <markj@FreeBSD.org>

Fix boot on systems where NUMA domain 0 is unpopulated.

- Add vm_phys_early_add_seg(), complementing vm_phys_early_alloc(), to
ensure that segments registered during hammer_time() are placed in the
right domain. Otherwise, since the SRAT is not parsed at that point,
we just add them to domain 0, which may be incorrect and results in a
domain with only several MB worth of memory.
- Fix uma_startup1() to try allocating memory for zones from any domain.
If domain 0 is unpopulated, the allocation will simply fail, resulting
in a page fault slightly later during boot.
- Change _vm_phys_domain() to return -1 for addresses not covered by the
affinity table, and change vm_phys_early_alloc() to handle wildcard
domains. This is necessary on amd64, where the page array is dense
and pmap_page_array_startup() may allocate page table pages for
non-existent page frames.

Reported and tested by: Rafael Kitover <rkitover@gmail.com>
Reviewed by: cem (earlier version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25001
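
The early-boot pair reads as follows (hedged; declarations as in
vm_phys.h of this era, with domain == -1 acting as the wildcard
described above):

    void vm_phys_early_add_seg(vm_paddr_t start, vm_paddr_t end);
    vm_paddr_t vm_phys_early_alloc(int domain, size_t alloc_size);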


# b7565d44 18-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Encapsulate phys_avail manipulation in a set of simple routines. Add a
NUMA-aware boot time memory allocator that will be used to allocate early
domain-correct structures. Code partially submitted by gallatin.

Reviewed by: gallatin, kib
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21251
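
The phys_avail[] convention these routines wrap is a zero-terminated
list of (start, end) pairs; a hedged sketch of the kind of helper this
change introduces (the name here is illustrative):

    /* Total bytes described by phys_avail[]. */
    static vm_paddr_t
    phys_avail_total(void)
    {
            vm_paddr_t total;
            int i;

            total = 0;
            for (i = 0; phys_avail[i + 1] != 0; i += 2)
                    total += phys_avail[i + 1] - phys_avail[i];
            return (total);
    }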


# 21943937 15-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250


# c1685086 06-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Add two new kernel options to control memory locality on NUMA hardware.
- UMA_XDOMAIN enables an additional per-cpu bucket for freed memory that
was freed on a different domain from where it was allocated. This is
only used for UMA_ZONE_NUMA (first-touch) zones.
- UMA_FIRSTTOUCH sets the default UMA policy to be first-touch for all
zones. This tries to maintain locality for kernel memory.

Reviewed by: gallatin, alc, kib
Tested by: pho, gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20929
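
A kernel configuration sketch enabling both options (hedged; the NUMA
and MAXMEMDOM lines are shown for context, values illustrative):

    options         NUMA
    options         MAXMEMDOM=8
    options         UMA_XDOMAIN     # cross-domain free buckets
    options         UMA_FIRSTTOUCH  # first-touch default for all zones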


# b8590dae 31-May-2019 Doug Moore <dougm@FreeBSD.org>

The function vm_phys_free_contig invokes vm_phys_free_pages for every
power-of-two page block it frees, launching an unsuccessful search for
a buddy to pair up with each time. The only possible buddy-up mergers
are across the boundaries of the freed region, so change
vm_phys_free_contig simply to enqueue the freed interior blocks, via a
new function vm_phys_enqueue_contig, and then call vm_phys_free_pages
on the bounding blocks to create as big a cross-boundary block as
possible after buddy-merging.

The only callers of vm_phys_free_contig at the moment call it in
situations where merging blocks across the boundary is clearly
impossible, so just call vm_phys_enqueue_contig in those places and
avoid trying to buddy-up at all.

One beneficiary of this change is in breaking reservations. For the
case where memory is freed in breaking a reservation with only the
first and last pages allocated, the number of cycles consumed by the
operation drops about 11% with this change.

Suggested by: alc
Reviewed by: alc
Approved by: kib, markj (mentors)
Differential Revision: https://reviews.freebsd.org/D16901
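
A hedged outline of the resulting structure (comments only, not the
real body):

    void
    vm_phys_free_contig(vm_page_t m, u_long npages)
    {
            /*
             * 1. vm_phys_free_pages() the block at the low boundary so
             *    it can buddy-merge with free pages below the region.
             * 2. vm_phys_enqueue_contig() the interior power-of-two
             *    blocks; their buddies lie inside the freed region, so
             *    no merge search is needed.
             * 3. vm_phys_free_pages() the block at the high boundary.
             */
    }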


# f2a496d6 18-Jan-2019 Konstantin Belousov <kib@FreeBSD.org>

MI VM: Make it possible to set size of superpage at boot instead of compile time.

In order to allow a single kernel to use PAE pagetables on i386 if the
hardware supports it, and fall back to classic two-level paging
structures if not, the superpage code should be able to adapt to either a
2M or 4M superpage size. To that end, make the MI VM structures large
enough to track the biggest possible superpage, by allowing an
architecture to define VM_NFREEORDER_MAX and VM_LEVEL_0_ORDER_MAX
constants. The corresponding VM_NFREEORDER and VM_LEVEL_0_ORDER symbols
can be defined as runtime values and must be less than the _MAX
constants. If an architecture does not define the _MAX constants, they
are assumed to equal the normal constants.

Reviewed by: markj
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18853


# 662e7fa8 20-Oct-2018 Mark Johnston <markj@FreeBSD.org>

Create some global domainsets and refactor NUMA registration.

Pre-defined policies are useful when integrating the domainset(9)
policy machinery into various kernel memory allocators.

The refactoring will make it easier to add NUMA support for other
architectures.

No functional change intended.

Reviewed by: alc, gallatin, jeff, kib
Tested by: pho (part of a larger patch)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17416


# 89ea39a7 26-Jun-2018 Alan Cox <alc@FreeBSD.org>

Update the physical page selection strategy used by vm_page_import() so
that it does not cause rapid fragmentation of the free physical memory.

Reviewed by: jeff, markj (an earlier version)
Differential Revision: https://reviews.freebsd.org/D15976


# c33e3a64 31-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Add a uma cache of free pages in the DEFAULT freepool. This gives us
per-cpu alloc and free of pages. The cache is filled with as few trips
to the phys allocator as possible by the use of a new
vm_phys_alloc_npages() function which allocates as many as N pages.

This code was originally by markj with the import function rewritten by
me.

Reviewed by: markj, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14905
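
The batch interface (hedged; as declared in later versions of
vm_phys.h):

    int vm_phys_alloc_npages(int domain, int pool, int npages,
        vm_page_t ma[]);

It fills ma[] in a single trip under the free queue lock and returns the
number of pages actually allocated, so a partial result can still refill
the per-cpu cache without retrying.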


# 146bf2c6 26-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Move vm_ndomains to vm.h where it can be used with a single header include
rather than requiring a half-dozen. Many non-vm files may want to know
the number of valid domains.

Sponsored by: Netflix, Dell/EMC Isilon


# e2068d0b 06-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.

Reviewed by: markj, kib, gallatin
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14000


# b6715dab 13-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Move VM_NUMA_ALLOC and DEVICE_NUMA under the single global config option NUMA.

Sponsored by: Netflix, Dell/EMC Isilon
Discussed with: jhb


# 6f4acaf4 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Add support for NUMA domains to bus dma tags. This causes all memory
allocated with a tag to come from the specified domain if it meets the
other constraints provided by the tag. Automatically create a tag at
the root of each bus specifying the domain local to that bus if
available.

Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13545


# 7a469c8e 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement NUMA policy for kmem_*(9). This maintains compatibility with
reservations by giving each memory domain its own KVA space in vmem that
is naturally aligned on superpage boundaries.

Reviewed by: alc, markj, kib (some objections)
Sponsored by: Netflix, Dell/EMC Isilon
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D13289


# 3f289c3f 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Clean up some header pollution created by having seq.h in proc.h.

Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403
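
A hedged userland sketch of the cpuset-based replacement interface (see
cpuset_setdomain(2)); this asks the kernel to prefer domain 0 for the
calling process:

    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <sys/domainset.h>

    int
    prefer_domain_zero(void)
    {
            domainset_t mask;

            DOMAINSET_ZERO(&mask);
            DOMAINSET_SET(0, &mask);
            return (cpuset_setdomain(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                sizeof(mask), &mask, DOMAINSET_POLICY_PREFER));
    }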


# ef435ae7 28-Nov-2017 Jeff Roberson <jeff@FreeBSD.org>

Move domain iterators into the page layer where domain selection should take
place. This makes the majority of the phys layer explicitly domain specific.

Reviewed by: markj, kib (some objections)
Discussed with: alc
Tested by: pho
Sponsored by: Netflix & Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13014


# d2b677ce 27-Nov-2017 Mark Johnston <markj@FreeBSD.org>

Avoid unnecessary lookups when initializing the vm_page array.

This gives a marginal improvement in the vm_page_array initialization
time. Also garbage-collect the now-unused vm_phys_paddr_to_segind().

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13270


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use the BSD 2-Clause license; however, the
tool I was using misidentified many licenses, so this was mostly a
manual, error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.


# b20bf182 26-Nov-2017 Mark Johnston <markj@FreeBSD.org>

Move vm_phys_init_page() to vm_page.c.

Suggested by: kib
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13250


# 1dbf52e7 13-Oct-2017 Mateusz Guzik <mjg@FreeBSD.org>

Reduce traffic on vm_cnt.v_free_count

The variable is modified with the highly contended page free queue lock.
It unnecessarily shares a cacheline with purely read-only fields and is
re-read after the lock is dropped in the page allocation code making the
hold time longer.

Pad the variable just like the others and store the value as found with
the lock held instead of re-reading.

Provides a modest 1%-ish speed up in concurrent page faults.

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D12665


# f93f7cf1 07-Sep-2017 Mark Johnston <markj@FreeBSD.org>

Speed up vm_page_array initialization.

We currently initialize the vm_page array in three passes: one to zero
the array, one to initialize the "order" field of each page (necessary
when inserting them into the vm_phys buddy allocator one-by-one), and
one to initialize the remaining non-zero fields and individually insert
each page into the allocator.

Merge the three passes into one following a suggestion from alc:
initialize vm_page fields in a single pass, and use vm_phys_free_contig()
to efficiently insert physical memory segments into the buddy allocator.
This reduces the initialization time to a third or a quarter of what it
was before on most systems that I tested.

Reviewed by: alc, kib
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D12248


# dbbaf04f 03-Sep-2016 Mark Johnston <markj@FreeBSD.org>

Remove support for idle page zeroing.

Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714


# 62d70a81 09-Apr-2016 John Baldwin <jhb@FreeBSD.org>

Add more fine-grained kernel options for NUMA support.

VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system. DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().

MAXMEMDOM must still be set to a value greater than 1 for any NUMA support
to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5782
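
A kernel configuration sketch of the resulting knobs (hedged; values
illustrative):

    options         MAXMEMDOM=2     # still required for any NUMA support
    options         VM_NUMA_ALLOC   # domain-aware VM allocation
    options         DEVICE_NUMA     # bus_get_domain() affinity reporting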


# c869e672 19-Dec-2015 Alan Cox <alc@FreeBSD.org>

Introduce a new mechanism for relocating virtual pages to a new physical
address and use this mechanism when:

1. kmem_alloc_{attr,contig}() can't find suitable free pages in the physical
memory allocator's free page lists. This replaces the long-standing
approach of scanning the active and inactive queues, converting clean
pages into PG_CACHED pages and laundering dirty pages. In contrast, the
new mechanism does not use PG_CACHED pages nor does it trigger a large
number of I/O operations.

2. on 32-bit MIPS processors, uma_small_alloc() and the pmap can't find
free pages in the physical memory allocator's free page lists that are
covered by the direct map. Tested by: adrian

3. ttm_bo_global_init() and ttm_vm_page_alloc_dma32() can't find suitable
free pages in the physical memory allocator's free page lists.

In the coming months, I expect that this new mechanism will be applied in
other places. For example, balloon drivers should use relocation to
minimize fragmentation of the guest physical address space.

Make vm_phys_alloc_contig() a little smarter (and more efficient in some
cases). Specifically, use vm_phys_segs[] earlier to avoid scanning free
page lists that can't possibly contain suitable pages.

Reviewed by: kib, markj
Glanced at: jhb
Discussed with: jeff
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4444


# 6520495a 11-Jul-2015 Adrian Chadd <adrian@FreeBSD.org>

Add an initial NUMA affinity/policy configuration for threads and processes.

This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.

* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policies.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set by a variety of methods.

This is only relevant for very specific workloads.

This doesn't pretend to be a final NUMA solution.

The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.

This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.

Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.

Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.

Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.

Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!

Tested:

* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thank you norse!)
* sandy bridge, 2 socket (thank you dell!)
* ivy bridge, 2 socket (thank you norse!)
* westmere-EX, 4 socket / 1TB RAM (thank you norse!)
* haswell, 2 socket (thank you norse!)
* haswell v3, 2 socket (thank you dell)
* haswell v3, 2x18 core (thank you scott long / netflix!)

* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)

* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.

Verified:

* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.

Review:

This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).

This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.

Notes:

* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.

* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.

Sponsored by: Norse Corp, Inc.; Dell


# cf82d811 08-May-2015 Adrian Chadd <adrian@FreeBSD.org>

Oops - how'd I miss this. Sorry!


# 415d7cca 07-May-2015 Adrian Chadd <adrian@FreeBSD.org>

Add initial memory locality cost awareness to the VM, and include
a basic ACPI SLIT table parser.

For now this just exports the map via sysctl; it'll eventually be useful
to userland when there's more useful NUMA support in -HEAD.

* Add an optional mem_locality map;
* add a mapping function taking from/to domain and returning the
relative cost, or -1 if it's not available;
* Add a very basic SLIT parser to x86 ACPI.

Differential Revision: https://reviews.freebsd.org/D2460
Reviewed by: rpaulo, stas, jhb
Sponsored by: Norse Corp, Inc (hardware, coding); Dell (hardware)
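
The from/to lookup described above (hedged; declaration as in vm_phys.h):

    int vm_phys_mem_affinity(int f, int t); /* relative cost, or -1 */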


# d866a563 30-Dec-2014 Alan Cox <alc@FreeBSD.org>

The physical memory allocator supports the use of distinct free lists for
managing pages from different address ranges. Generally speaking, this
feature is used to increase the likelihood that physical pages are
available that can meet special DMA requirements or can be accessed through
a limited-coverage direct mapping (e.g., MIPS). However, prior to this
change, the configuration of the free lists was static, i.e., it was
determined at compile time. Consequently, free lists could be created
for address ranges that held no actual pages, for example, on 32-bit MIPS-
based systems with 512 MB or less of physical memory. This change makes
the creation of the free lists dynamic, i.e., it is based on the available
physical memory at boot time.

On 64-bit x86-based systems with 64 GB or more of physical memory, create
free lists for managing pages with physical addresses below 4 GB. This
change is to address reported problems with initializing devices that
require the allocation of physical pages below 4 GB on some systems with
128 GB or more of physical memory.

PR: 185727
Differential Revision: https://reviews.freebsd.org/D1274
Reviewed by: jhb, kib
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division


# 271f0f12 15-Nov-2014 Alan Cox <alc@FreeBSD.org>

Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default
on i386 PAE. Previously, VM_PHYSSEG_SPARSE could not be used on amd64 and
i386 because vm_page_startup() would not create vm_page structures for the
kernel page table pages allocated during pmap_bootstrap() but those vm_page
structures are needed when the kernel attempts to promote the corresponding
kernel virtual addresses to superpage mappings. To address this problem, a
new public function, vm_phys_add_seg(), is introduced and vm_phys_init() is
updated to reflect the creation of vm_phys_seg structures by calls to
vm_phys_add_seg().

Discussed with: Svatopluk Kraus
MFC after: 3 weeks
Sponsored by: EMC / Isilon Storage Division
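
The new registration hook (hedged; declaration as in vm_phys.h), called
from MD bootstrap code before vm_phys_init() builds the vm_phys_seg
array:

    void vm_phys_add_seg(vm_paddr_t start, vm_paddr_t end);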


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version in case some out-of-tree module/code relies on
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# 449c2e92 07-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Split the pagequeues per NUMA domains, and split the pagedaemon process
into threads, each processing a queue in a single domain. The structure
of the pagedaemons and queues is kept intact; most of the changes come
from the need for code to find the owning page queue for a given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons must reach a quorum before starting the OOM kill, since
one thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is the OOM
kill started, by a single selected thread.

Laundering is modified to take into account the segment layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


# 7e226537 13-May-2013 Attilio Rao <attilio@FreeBSD.org>

o Add accessor functions to add and remove pages from a specific
freelist.
o Really split the pool of free page queues by domain, rather than relying
on the definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
function that is called when calculating the allocation domain.
The RR counter is kept, currently, per-thread.
In the future it is expected that this function will evolve into a real
policy decision referee, based on specific information retrieved from
per-thread and per-vm_object attributes.
o Add the concept of "probed domains" in the form of vm_ndomains.
It is the responsibility of every architecture willing to support multiple
memory domains to correctly probe vm_ndomains along with the mem_affinity
segment attributes. Those two values are supposed to remain always
consistent.
Please also note that vm_ndomains and td_dom_rr_idx are both int
because segments already store domains as int. Ideally u_int would
make much more sense. Probably this should be cleaned up in the
future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().

Sponsored by: EMC / Isilon storage division
Partly obtained from: jeff
Reviewed by: alc
Tested by: jeff


# 43f48b65 15-Nov-2012 Konstantin Belousov <kib@FreeBSD.org>

Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by: alc
MFC after: 2 weeks


# b6de32bd 12-May-2012 Konstantin Belousov <kib@FreeBSD.org>

Add a facility to register a range of physical addresses to be used
for allocation of fictitious pages, for which PHYS_TO_VM_PAGE()
returns a proper fictitious vm_page_t. The range should be de-registered
after the consumer has stopped using it.

De-inline the PHYS_TO_VM_PAGE() since it now carries code to iterate
over registered ranges.

A hash container might be developed instead of the range registration
interface, and fake pages could be put automatically into the hash,
where PHYS_TO_VM_PAGE() could look them up later. This should be
considered before the MFC of the commit is done.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
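
The registration pair (hedged; the memattr parameter reflects later
versions of the declarations):

    int vm_phys_fictitious_reg_range(vm_paddr_t start, vm_paddr_t end,
        vm_memattr_t memattr);
    void vm_phys_fictitious_unreg_range(vm_paddr_t start, vm_paddr_t end);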


# fbd80bd0 16-Nov-2011 Alan Cox <alc@FreeBSD.org>

Refactor the code that performs physically contiguous memory allocation,
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.

From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.

Reviewed by: kib
Discussed with: jhb
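
The new interface (hedged; as it appears in vm_page.h):

    vm_page_t vm_page_alloc_contig(vm_object_t object, vm_pindex_t pindex,
        int req, u_long npages, vm_paddr_t low, vm_paddr_t high,
        u_long alignment, vm_paddr_t boundary, vm_memattr_t memattr);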


# 5c1f2cc4 29-Oct-2011 Alan Cox <alc@FreeBSD.org>

Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt at
eliminating duplicated code in the various pmap implementations.

Micro-optimize vm_phys_free_pages().

Introduce vm_phys_free_contig(). It is a fast routine for freeing an
arbitrary number of physically contiguous pages. In particular, it
doesn't require the number of pages to be a power of two.

Use "u_long" instead of "unsigned long".

Bruce Evans (bde@) has convinced me that the "boundary" parameters
to kmem_alloc_contig(), vm_phys_alloc_contig(), and
vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not
"u_long". Make this change.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# a3870a18 27-Jul-2010 John Baldwin <jhb@FreeBSD.org>

Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfilling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.

Reviewed by: alc
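
A hedged sketch of the MD-populated affinity table described above
(the list is terminated by an all-zero entry):

    struct mem_affinity {
            vm_paddr_t start;
            vm_paddr_t end;
            int domain;
    };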


# 49ca10d4 21-Jul-2010 Jayachandran C. <jchandra@FreeBSD.org>

Redo the page table page allocation on MIPS, as suggested by
alc@.

The UMA zone based allocation is replaced by a scheme that creates
a new free page list for the KSEG0 region, and a new function
in sys/vm that allocates pages from a specific free page list.

This also fixes a race condition introduced by the UMA based page table
page allocation code: dropping the page queue and pmap locks before
the call to uma_zfree, and re-acquiring them afterwards, would introduce
a race condition (noted by alc@).

The changes are :
- Revert the earlier changes in MIPS pmap.c that added UMA zone for
page table pages.
- Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that
is not directly mapped (in a 32-bit kernel). Normal page allocations will
first try the HIGHMEM freelist and then the default (direct mapped) freelist.
- Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int
order, int req)' to vm/vm_page.c to allocate a page from a specified
freelist. The MIPS page table pages will be allocated using this function
from the freelist containing direct mapped pages.
- Move the page initialization code from vm_phys_alloc_contig() to a
new function vm_page_alloc_init(), and use this function to initialize
pages in vm_page_alloc_freelist() too.
- Split the function vm_phys_alloc_pages(int pool, int order) to create
vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use
this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages().

Reviewed by: alc


# 3153e878 12-Jul-2009 Alan Cox <alc@FreeBSD.org>

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)
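
The accessor pair (hedged; prototypes as described above, implemented
per-pmap):

    vm_memattr_t pmap_page_get_memattr(vm_page_t m);
    void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma);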


# e999111a 25-Jun-2009 Alan Cox <alc@FreeBSD.org>

This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this change adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with: jhb


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# e35395ce 20-Dec-2007 Alan Cox <alc@FreeBSD.org>

Modify vm_phys_unfree_page() so that it no longer requires the given
page to be in the free lists. Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.

This change is required to support superpage reservations. Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.


# 7bfda801 25-Sep-2007 Alan Cox <alc@FreeBSD.org>

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


# 8941dc44 14-Jul-2007 Alan Cox <alc@FreeBSD.org>

Eliminate two unused functions: vm_phys_alloc_pages() and
vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages(). Add comments regarding the need for the free page
queues lock to be held by callers to these functions. No functional
changes.

Approved by: re (hrs)


# 11752d88 09-Jun-2007 Alan Cox <alc@FreeBSD.org>

Add a new physical memory allocator. However, do not yet connect it
to the build.

This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.

Approved by: re
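
A standalone illustration of the binary buddy rule at the heart of the
allocator (hedged; the PAGE_SHIFT value is assumed and the helper name
is illustrative): the buddy of a 2^order-page block differs from it only
in bit (PAGE_SHIFT + order) of its physical address.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    static uint64_t
    buddy_of(uint64_t pa, int order)
    {
            return (pa ^ ((uint64_t)1 << (PAGE_SHIFT + order)));
    }

    int
    main(void)
    {
            /* The order-0 buddy of the page at 0x2000 is 0x3000. */
            printf("%#jx\n", (uintmax_t)buddy_of(0x2000, 0));
            return (0);
    }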