7a79d066 | 09-Apr-2024 | Bojan Novković <bnovkov@FreeBSD.org>
vm: improve kstack_object pindex calculation to avoid pindex holes. This commit replaces the linear transformation of kernel virtual addresses to kstack_object pindex values with a non-linear scheme that circumvents physical memory fragmentation caused by kernel stack guard pages. The new mapping scheme is used to effectively "skip" guard pages and assign pindices for non-guard pages in a contiguous fashion. The new allocation scheme requires that all default-sized kstack KVAs come from a separate, specially aligned region of the KVA space. For this to work, this commit introduces a dedicated per-domain kstack KVA arena used to allocate kernel stacks of default size. The behaviour on 32-bit platforms remains unchanged due to a significantly smaller KVA space. Aside from fulfilling the requirements imposed by the new scheme, a separate kstack KVA arena facilitates superpage promotion in the rest of the kernel and causes most kstacks to have guard pages at both ends. Reviewed by: alc, kib, markj Tested by: markj Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D38852
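
A sketch of the arithmetic behind such a hole-free mapping, assuming one guard page per stack and a fixed slot layout (illustrative, not the committed code): each default-sized stack occupies a slot of guard-plus-stack pages in the aligned region, so subtracting the guard pages that precede a given page yields a dense pindex range.

    #include <stdint.h>

    #define KSTACK_PAGES       4   /* default stack size in pages (assumed) */
    #define KSTACK_GUARD_PAGES 1   /* guard pages per stack slot (assumed) */

    /*
     * Map a page offset within the aligned kstack region to a
     * kstack_object pindex.  Guard pages are skipped, so non-guard
     * pages receive consecutive pindices with no holes.
     */
    static uint64_t
    kstack_pindex(uint64_t region_page)
    {
        uint64_t slot = region_page / (KSTACK_PAGES + KSTACK_GUARD_PAGES);

        /* Drop the guard pages seen up to and including this slot. */
        return (region_page - (slot + 1) * KSTACK_GUARD_PAGES);
    }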

29363fb4 | 23-Nov-2023 | Warner Losh <imp@FreeBSD.org>
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree via automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix

685dc743 | 16-Aug-2023 | Warner Losh <imp@FreeBSD.org>
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
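
For reference, a typical line matched and removed by this pattern looked like:

    __FBSDID("$FreeBSD$");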

b8ebd99a | 13-Apr-2022 | John Baldwin <jhb@FreeBSD.org>
vm: Use __diagused for variables only used in KASSERT().

0f559a9f | 17-Oct-2021 | Edward Tomasz Napierala <trasz@FreeBSD.org>
Make vmdaemon timeout configurable. Make the vmdaemon timeout configurable, so that one can adjust how often it runs. Here's a trick: set this to 1, then run 'limits -m 0 sh', then run whatever you want with 'ktrace -it XXX', and observe how the working set changes over time. Reviewed by: kib Sponsored by: EPSRC Differential Revision: https://reviews.freebsd.org/D22038
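
A hedged sketch of what such a knob looks like; the sysctl name, default value, and flags here are assumptions, not necessarily those of the commit:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    /* Hypothetical knob: seconds between vm_daemon scans (0 = default). */
    static int vm_daemon_timeout = 0;
    SYSCTL_INT(_vm, OID_AUTO, vmdaemon_timeout, CTLFLAG_RWTUN,
        &vm_daemon_timeout, 0,
        "Time in seconds between vm_daemon runs");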

f13fa9df | 26-Apr-2020 | Mark Johnston <markj@FreeBSD.org>
Use a single VM object for kernel stacks. Previously we allocated a separate VM object for each kernel stack. However, fully constructed kernel stacks are cached by UMA, so there is no harm in using a single global object for all stacks. This reduces memory consumption and makes it easier to define a memory allocation policy for kernel stack pages, with the aim of reducing physical memory fragmentation. Add a global kstack_object, and use the stack KVA address to index into the object like we do with kernel_object. Reviewed by: kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24473
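
The indexing convention can be sketched as follows; the constants below are placeholders (the kernel expresses this with atop() and VM_MIN_KERNEL_ADDRESS):

    #include <stdint.h>

    #define PAGE_SHIFT            12                     /* assumed 4 KiB pages */
    #define VM_MIN_KERNEL_ADDRESS 0xfffffe0000000000ULL  /* placeholder value */

    /*
     * With one global kstack_object, the stack's KVA alone identifies
     * its backing pages, just as kernel KVAs index into kernel_object.
     */
    static uint64_t
    kstack_to_pindex(uint64_t ks_va)
    {
        return ((ks_va - VM_MIN_KERNEL_ADDRESS) >> PAGE_SHIFT);
    }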

c99d0c58 | 28-Feb-2020 | Mark Johnston <markj@FreeBSD.org>
Add a blocking counter KPI. refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy. Reviewed by: jeff, kib, mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23723
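
A hedged sketch of the blockcount KPI applied to object paging-in-progress; the call shapes follow sys/sys/blockcount.h, but the wrappers and wait message are illustrative assumptions:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/blockcount.h>
    #include <sys/priority.h>

    /* Count one in-flight paging operation on an object. */
    static void
    pip_enter(blockcount_t *pip)
    {
        blockcount_acquire(pip, 1);
    }

    /* Finish it; the last release wakes any threads in blockcount_wait(). */
    static void
    pip_exit(blockcount_t *pip)
    {
        blockcount_release(pip, 1);
    }

    /* Sleep until all paging operations drain, interlocked with a lock. */
    static void
    pip_drain(blockcount_t *pip, struct lock_object *lock)
    {
        blockcount_wait(pip, lock, "objpip", PVM);
    }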

e9ceb9dd | 19-Feb-2020 | Jeff Roberson <jeff@FreeBSD.org>
Don't release xbusy on kmem pages. After lockless page lookup we will not be able to guarantee that they can be reacquired without blocking. Reviewed by: kib Discussed with: markj Differential Revision: https://reviews.freebsd.org/D23506

d6e13f3b | 19-Jan-2020 | Jeff Roberson <jeff@FreeBSD.org>
Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033

9f5632e6 | 28-Dec-2019 | Mark Johnston <markj@FreeBSD.org>
Remove page locking for queue operations. With the previous reviews, the page lock is no longer required in order to perform queue operations on a page. It is also no longer needed in the page queue scans. This change effectively eliminates remaining uses of the page lock and also the false sharing caused by multiple pages sharing a page lock. Reviewed by: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22885

3c01c56b | 28-Dec-2019 | Mark Johnston <markj@FreeBSD.org>
Don't update per-page activation counts in the swapout code. This avoids duplicating the work of the page daemon's active queue scan. Moreover, this duplication was inconsistent:
- PGA_REFERENCED is not counted in act_count unless pmap_ts_referenced() returned 0, but the page daemon always counts PGA_REFERENCED towards the activation count.
- The swapout daemon always activates a referenced page, but the page daemon only does so when the containing object is mapped at least once.
The main purpose of swapout_deactivate_pages() is to shrink the number of pages mapped into a given pmap. To do this without unmapping active pages, use the non-destructive pmap_is_referenced() instead of the destructive pmap_ts_referenced() and deactivate pages accordingly. This simplifies some future changes to the locking protocol for page queue state. Reviewed by: kib Discussed with: jeff Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22674
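
The distinction between the two pmap queries, sketched as an illustrative helper (not code from the commit): pmap_is_referenced() only reads the referenced state, while pmap_ts_referenced() clears it as a side effect.

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>
    #include <vm/pmap.h>

    /*
     * Decide whether the swapout scan may deactivate a page without
     * consuming its reference state, leaving the aging of referenced
     * pages to the page daemon's active queue scan.
     */
    static bool
    swapout_can_deactivate(vm_page_t m)
    {
        return (!pmap_is_referenced(m));
    }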

61a74c5c | 15-Dec-2019 | Jeff Roberson <jeff@FreeBSD.org>
schedlock 1/4: Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626

9be9ea42 | 10-Dec-2019 | Mark Johnston <markj@FreeBSD.org>
Add a helper function to the swapout daemon's deactivation code. vm_swapout_object_deactivate_pages() is renamed to vm_swapout_object_deactivate(), and the loop body is moved into the new vm_swapout_object_deactivate_page(). This makes the code a bit easier to follow and is in preparation for some functional changes. Reviewed by: jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22651

5cff1f4d | 10-Dec-2019 | Mark Johnston <markj@FreeBSD.org>
Introduce vm_page_astate. This is a 32-bit structure embedded in each vm_page, consisting mostly of page queue state. The use of a structure makes it easy to store a snapshot of a page's queue state in a stack variable and use cmpset loops to update that state without requiring the page lock. This change merely adds the structure and updates references to atomic state fields. No functional change intended. Reviewed by: alc, jeff, kib Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D22650
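
The update pattern this structure enables looks roughly like the following; the load/fcmpset helper names match those later visible in vm_page.h and are otherwise an assumption:

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    /*
     * Snapshot the page's queue state, modify a local copy, and try to
     * publish it atomically; on failure, "old" is refreshed and the
     * loop retries.  No page lock is required.
     */
    static void
    page_reset_act_count(vm_page_t m)
    {
        vm_page_astate_t old, new;

        old = vm_page_astate_load(m);
        do {
            new = old;
            new.act_count = ACT_INIT;   /* example field update */
        } while (!vm_page_astate_fcmpset(m, &old, new));
    }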

0012f373 | 14-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(4/6) Protect page valid with the busy lock. Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21594

63e97555 | 14-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(1/6) Replace busy checks with acquires where it is trivial to do so. This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix, Intel Differential Revision: https://reviews.freebsd.org/D21548

2288078c | 08-Oct-2019 | Doug Moore <dougm@FreeBSD.org>
Define macro VM_MAP_ENTRY_FOREACH for enumerating the entries in a vm_map. If the implementation ever changes from using a chain of next pointers, changing the macro definition will be necessary, but changing all the files that iterate over vm_map entries will not. Drop a counter in vm_object.c that would have an effect only if the vm_map entry count was wrong. Discussed with: alc Reviewed by: markj Tested by: pho (earlier version) Differential Revision: https://reviews.freebsd.org/D21882
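
The macro hides the list walk behind a single definition; an approximate shape (the authoritative definition lives in vm/vm_map.h) plus a usage sketch:

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_map.h>

    /* Approximate shape of the macro at the time of the commit. */
    #define VM_MAP_ENTRY_FOREACH_SKETCH(it, map)    \
        for ((it) = (map)->header.next;             \
            (it) != &(map)->header;                 \
            (it) = (it)->next)

    /* Callers iterate without knowing the underlying representation. */
    static int
    count_map_entries(vm_map_t map)
    {
        vm_map_entry_t entry;
        int count = 0;

        VM_MAP_ENTRY_FOREACH_SKETCH(entry, map)
            count++;
        return (count);
    }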

e8bcf696 | 16-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Revert r352406, which contained changes I didn't intend to commit.

41fd4b94 | 16-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc Reviewed by: alc, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21639

11b57401 | 12-Sep-2019 | Hans Petter Selasky <hselasky@FreeBSD.org>
Use REFCOUNT_COUNT() to obtain refcount where appropriate. Refcount waiting will set some flag bits in the refcount value. Make sure these bits get cleared by using the REFCOUNT_COUNT() macro to obtain the actual refcount. Differential Revision: https://reviews.freebsd.org/D21620 Reviewed by: kib@, markj@ MFC after: 1 week Sponsored by: Mellanox Technologies
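
The macro simply masks the flag bits out of the raw counter before it is interpreted as a count; a sketch with an assumed flag position:

    /* The top bit flags a sleeping waiter (bit position assumed). */
    #define REFCOUNT_WAITER   (1U << 31)
    #define REFCOUNT_COUNT(x) ((x) & ~REFCOUNT_WAITER)

    /* Usage: compare the true count, never the raw flag-bearing value. */
    static int
    is_last_reference(unsigned int refcnt)
    {
        return (REFCOUNT_COUNT(refcnt) == 1);
    }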

fee2a2fa | 09-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Change synchronization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller.

The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes.

Reviewed by: jeff (earlier version) Tested by: gallatin (earlier version), pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20486
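
A sketch of the vm_page_wire_mapped() idea: take a reference only while the blocking flag is clear, retrying with fcmpset. The flag value is an assumption and the function is illustrative:

    #include <sys/param.h>
    #include <machine/atomic.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    #define VPRC_BLOCKED 0x40000000u    /* new references blocked (assumed) */

    /*
     * Wire a mapped page unless a concurrent unmapping has blocked new
     * references; on failure the caller falls back to the fault handler.
     */
    static bool
    wire_mapped_sketch(vm_page_t m)
    {
        u_int old;

        old = atomic_load_int(&m->ref_count);
        do {
            if ((old & VPRC_BLOCKED) != 0)
                return (false);
        } while (!atomic_fcmpset_int(&m->ref_count, &old, old + 1));
        return (true);
    }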

386eba08 | 23-Aug-2019 | Mark Johnston <markj@FreeBSD.org>
Make vm_pqbatch_submit_page() externally visible. It will become useful for the page daemon to be able to directly create a batch queue entry for a page without modifying the page structure. Rename vm_pqbatch_submit_page() to vm_page_pqbatch_submit() to keep the namespace consistent. No functional change intended. Reviewed by: alc, kib MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21369

930b1952 | 21-Aug-2019 | Mark Johnston <markj@FreeBSD.org>
Don't requeue active pages in vm_swapout_object_deactivate_pages(). As of r332974 the page daemon does not requeue pages during a scan of the active queue, so there is not much value in doing so here either. Reviewed by: alc, dougm, kib MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21343

0b26119b | 06-Aug-2019 | Jeff Roberson <jeff@FreeBSD.org>
Cache kernel stacks in UMA. This gives us NUMA support, better concurrency, and more statistics. Reviewed by: kib, markj Tested by: pho Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20931

eeacb3b0 | 08-Jul-2019 | Mark Johnston <markj@FreeBSD.org>
Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last-unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247
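
A post-change usage sketch: callers get wired pages back and drop them with vm_page_unwire(). The queue argument and error handling here are assumptions:

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <vm/vm.h>
    #include <vm/vm_extern.h>
    #include <vm/vm_page.h>

    /* Illustrative: wire down a user buffer around an I/O operation. */
    static int
    hold_user_buffer(vm_map_t map, vm_offset_t addr, vm_size_t len)
    {
        vm_page_t ma[8];
        int count, i;

        count = vm_fault_quick_hold_pages(map, addr, len, VM_PROT_READ,
            ma, nitems(ma));
        if (count == -1)
            return (EFAULT);
        /* ... perform the I/O against the wired pages ... */
        for (i = 0; i < count; i++)
            vm_page_unwire(ma[i], PQ_ACTIVE);
        return (0);
    }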

f82dd310 | 20-Nov-2018 | Ben Widawsky <bwidawsk@FreeBSD.org>
linuxkpi: Use pageproc instead of vmproc According to markj@: pageproc contains the page daemon and laundry threads, which are responsible for managing the LRU page queues and writing back dirty pages. vmproc's main task is to swap out kernel stacks when the system is under memory pressure, and swap them back in when necessary. It's a somewhat legacy component of the system and isn't required. You can build a kernel without it by specifying "options NO_SWAPPING" (which is a somewhat misleading name), in which vm_swapout_dummy.c is compiled instead of vm_swapout.c. Based on this, we want pageproc to emulate kswapd, not vmproc. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D18061

c3f4f28c | 20-Nov-2018 | Ben Widawsky <bwidawsk@FreeBSD.org>
linuxkpi: Add some basic swap functions. These are used by kms-drm to determine various heuristics related to memory conditions. The number of free swap pages is just a variable, and it could be obtained much more cheaply by either adding a new getter or simply extern'ing swap_total. However, this patch opts to use the more expensive, existing interface, since this isn't in a performance-critical path. This allows us to remove some more GPL'd linuxkpi code and do the following in kms-drm: git rm linuxkpi/gplv2/include/linux/swap.h Reviewed by: mmacy, Johannes Lundberg <johalun0@gmail.com> Approved by: emaste (mentor) Differential Revision: https://reviews.freebsd.org/D18052

2a843ae7 | 22-Oct-2018 | Mark Johnston <markj@FreeBSD.org>
Avoid a redundancy in a comment updated by r339601. Reported by: alc X-MFC with: r339601

b0058196 | 22-Oct-2018 | Mark Johnston <markj@FreeBSD.org>
Swap in processes unless there's a global memory shortage. On NUMA systems, we would not swap in processes unless all domains had some free pages. This is too conservative in general. Instead, permit swapins so long as at least one domain has free pages, and add a kernel stack NUMA policy which ensures that we will try to allocate kernel stack pages from any domain. Reported and tested by: pho, Jan Bramkamp <crest@bultmann.eu> Reviewed by: alc, kib Discussed with: jeff MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17304

c1344d2b | 13-Aug-2018 | Konstantin Belousov <kib@FreeBSD.org>
Prevent some parallel swap-ins, rate-limit swapper swap-ins. If faultin() was called outside the swapper (from PHOLD()), do not allow the swapper to initiate additional swap-ins. Swapper-initiated swap-ins are serialized because they are synchronous and executed in the context of thread0. With the added limitation, we only allow parallel swap-ins from PHOLD(), which is up to PHOLD() users to manage; usually they do not need to. Rate-limit the swapper's swap-ins to one per MAXSLP / 2 seconds interval, counting faultin() swap-ins. Suggested by: alc Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16610
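
The rate limit reduces to a ticks comparison; a sketch in which the bookkeeping variable and constant naming are hypothetical:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/systm.h>

    #define MAXSLP_SECS 20          /* stock MAXSLP value, in seconds */

    static int last_swapin_ticks;   /* hypothetical bookkeeping */

    /* Allow one swapper-initiated swap-in per MAXSLP / 2 seconds. */
    static bool
    swapin_rate_ok(void)
    {
        if ((u_int)(ticks - last_swapin_ticks) < MAXSLP_SECS / 2 * hz)
            return (false);
        last_swapin_ticks = ticks;
        return (true);
    }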

a70e9a13 | 04-Aug-2018 | Konstantin Belousov <kib@FreeBSD.org>
Swap in WKILLED processes. A swapped-out process that is WKILLED must be swapped in as soon as possible: such a process can be killed by the OOM handler, and its pages can only be freed if the process exits. To exit, the kernel stack of the process must be mapped. When allocating pages for the stack of the WKILLED process on swap-in, use VM_ALLOC_SYSTEM requests to increase the chance that the allocation succeeds. Add a counter of swapped-out processes to avoid unneeded iteration over the allproc list when there is no work to do, reducing allproc_lock ownership. Reviewed by: alc, markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16489

5cd29d0f | 24-Apr-2018 | Mark Johnston <markj@FreeBSD.org>
Improve VM page queue scalability.

Currently both the page lock and a page queue lock must be held in order to enqueue, dequeue or requeue a page in a given page queue. The queue locks are a scalability bottleneck in many workloads. This change reduces page queue lock contention by batching queue operations. To detangle the page and page queue locks, per-CPU batch queues are used to reference pages with pending queue operations. The requested operation is encoded in the page's aflags field with the page lock held, after which the page is enqueued for a deferred batch operation. Page queue scans are similarly optimized to minimize the amount of work performed with a page queue lock held.

Reviewed by: kib, jeff (previous versions) Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14893
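
The batching structure and submit path can be sketched like this; the capacity and processing callback are assumptions:

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_page.h>

    #define BATCHQUEUE_SIZE 7   /* assumed per-CPU batch capacity */

    /* Per-CPU collection point for pages with pending queue operations. */
    struct batchqueue {
        vm_page_t bq_page[BATCHQUEUE_SIZE];
        int       bq_cnt;
    };

    /*
     * The requested operation was already encoded in the page's aflags
     * under the page lock; deferring the queue update means the page
     * queue lock is taken once per batch rather than once per page.
     */
    static void
    batchqueue_submit(struct batchqueue *bq, vm_page_t m,
        void (*process)(vm_page_t *, int))
    {
        bq->bq_page[bq->bq_cnt++] = m;
        if (bq->bq_cnt == BATCHQUEUE_SIZE) {
            process(bq->bq_page, bq->bq_cnt);   /* with the queue lock held */
            bq->bq_cnt = 0;
        }
    }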

1d3a1bcf | 07-Feb-2018 | Mark Johnston <markj@FreeBSD.org>
Dequeue wired pages lazily.

Previously, wiring a page would cause it to be removed from its page queue. In the common case, unwiring causes it to be enqueued at the tail of that page queue. This change modifies vm_page_wire() to not dequeue the page, thus avoiding the highly contended page queue locks. Instead, vm_page_unwire() takes care of requeuing the page as a single operation, and the page daemon dequeues wired pages as they are encountered during a queue scan to avoid needlessly revisiting them later. For pages in PQ_ACTIVE we do even better, since a requeue is unnecessary.

The change improves scalability for some common workloads. For instance, threads wiring pages into the buffer cache no longer need to modify global page queues, and unwiring is usually done by the bufspace thread, so concurrency is not as much of an issue. As another example, many sysctl handlers wire the output buffer to avoid faults on copyout, and since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid modifying the page queue in this case.

The change also adds a block comment describing some properties of struct vm_page's reference counters, and the busy lock.

Reviewed by: jeff Discussed with: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D11943

e2068d0b | 06-Feb-2018 | Jeff Roberson <jeff@FreeBSD.org>
Use per-domain locks for vm page queue free. Move paging control from global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000

33937731 | 04-Jan-2018 | Konstantin Belousov <kib@FreeBSD.org>
Restructure swapout tests after vm map locking was removed. Consolidate the regions covered by the process lock. Combine similar condition tests into one, e.g. all process flags can be tested with one logical operation. Add a check for in-exec state, since p_vmspace is dereferenced. Remove labels and gotos by explicitly tracking state. Update comments. Reviewed by: alc, markj (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D13693

36ca312d | 03-Jan-2018 | Alan Cox <alc@FreeBSD.org>
Once we have decided to swap out a process, don't delay the laundering of its per-thread kernel stack pages by making them pass through the inactive queue first. Instead, immediately place them in the laundry so that they might be cleaned and made available for reclamation sooner. Reviewed by: kib, markj MFC after: 1 week

9997c048 | 01-Jan-2018 | Konstantin Belousov <kib@FreeBSD.org>
Do not let vm_daemon run unbounded. Under a load where a single anonymous object consumes almost all memory on a large system, the swapout code iterates over the corresponding object's page queue for a long time while owning the map and object locks. This blocks the pagedaemon, which tries to lock the object, and blocks other threads in the process in vm_fault() waiting for the map lock. Handle the issue by terminating the deactivation loop if it has run for too long and by yielding at the top level in vm_daemon. Reported by: peterj, pho Reviewed by: alc Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13671
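
The bound can be expressed with the stock should_yield() helper; a sketch of the loop shape, with the deactivation body elided:

    #include <sys/param.h>
    #include <sys/queue.h>
    #include <sys/proc.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>
    #include <vm/vm_page.h>

    /*
     * Stop scanning the object's page queue once the scheduler reports
     * that this thread has run long enough, so the map and object
     * locks are not held for the whole queue.
     */
    static void
    deactivate_pages_bounded(vm_object_t object)
    {
        vm_page_t m;

        TAILQ_FOREACH(m, &object->memq, listq) {
            if (should_yield())
                break;
            /* ... deactivate m ... */
        }
    }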

7000c588 | 31-Dec-2017 | Alan Cox <alc@FreeBSD.org>
The variable "minslptime" is pointless and always has been, ever since its introduction in r83366. (At that time, this code appeared in vm/vm_glue.c, because vm/vm_swapout.c did not exist.) When the FOREACH_THREAD loop completes, we know that the sleep time for every thread is above whichever threshold is being applied. Reviewed by: kib X-MFC with: r327354

b6eabc36 | 29-Dec-2017 | Konstantin Belousov <kib@FreeBSD.org>
Do not lock vm map in swapout_procs(). Neither swapout_procs() nor swapout() accesses the map. Since the process' vmspace is referenced only to obtain the pointer to the vm_map, the reference is not needed either. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13681

e258b4a0 | 29-Dec-2017 | Konstantin Belousov <kib@FreeBSD.org>
Style. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13678

89c0e67d | 28-Dec-2017 | Konstantin Belousov <kib@FreeBSD.org>
Clean up the comment. Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13671

0080a8fa | 28-Dec-2017 | Konstantin Belousov <kib@FreeBSD.org>
In vm_swapout_map_deactivate_pages(), it is enough to lock the map for read. Reviewed by: alc, markj (as part of the larger patch) Tested by: pho (again, as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D13671

796df753 | 30-Nov-2017 | Pedro F. Giffuni <pfg@FreeBSD.org>
SPDX: Consider code from Carnegie-Mellon University. Interesting cases, most likely from CMU Mach sources.

ac04195b | 20-Oct-2017 | Konstantin Belousov <kib@FreeBSD.org>
Move swapout code into vm/vm_swapout.c. There is no NO_SWAPPING #ifdef left in the code. Requested by: alc Reviewed by: alc, markj Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D12663