#
272461 |
|
02-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
270920 |
|
01-Sep-2014 |
kib |
Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped wired region. Rework the handling of unwire to do the it in batch, both at pmap and object level.
All commits below are by alc.
MFC r268327: Introduce pmap_unwire().
MFC r268591: Implement pmap_unwire() for powerpc.
MFC r268776: Implement pmap_unwire() for arm.
MFC r268806: pmap_unwire(9) man page.
MFC r269134: When unwiring a region of an address space, do not assume that the underlying physical pages are mapped by the pmap. This fixes a leak of the wired pages on the unwiring of the region mapped with no access allowed.
MFC r269339: In the implementation of the new function pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.
MFC r269365: Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed by the combination of r268591 and r269134: When we attempt to add the wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing. (They only set the wired attribute on newly created mappings.)
MFC r269433: Handle wiring failures in vm_map_wire() with the new functions pmap_unwire() and vm_object_unwire(). Retire vm_fault_{un,}wire(), since they are no longer used.
MFC r269438: Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable "rv" is uninitialized.
MFC r269485: Retire pmap_change_wiring().
Reviewed by: alc
|
#
270441 |
|
24-Aug-2014 |
kib |
MFC r270038: Complete r254667, do not destroy pmap lock if KVA allocation failed.
|
#
270439 |
|
24-Aug-2014 |
kib |
Merge the changes to pmap_enter(9) for sleep-less operation (requested by flag). The ia64 pmap.c changes are direct commit, since ia64 is removed on head.
MFC r269368 (by alc): Retire PVO_EXECUTABLE.
MFC r269728: Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused).
MFC r269759 (by alc): Update the text of a KASSERT() to reflect the changes in r269728.
MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table page allocation.
MFC r270151 (by alc): Replace KASSERT that no PV list locks are held with a conditional unlock.
Reviewed by: alc Approved by: re (gjb) Sponsored by: The FreeBSD Foundation
|
#
267964 |
|
27-Jun-2014 |
jhb |
MFC 261781: Don't waste a page of KVA for the boot-time memory test on x86. For amd64, reuse the first page of the crashdumpmap as CMAP1/CADDR1. For i386, remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
255724 |
|
20-Sep-2013 |
alc |
The pmap function pmap_clear_reference() is no longer used. Remove it.
pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists.
Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
|
#
255028 |
|
29-Aug-2013 |
alc |
Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2).
Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.
Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise().
Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
|
#
254667 |
|
22-Aug-2013 |
kib |
Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release().
Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
254138 |
|
09-Aug-2013 |
attilio |
The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it.
Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag
The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code.
Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
|
#
254025 |
|
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
#
251703 |
|
13-Jun-2013 |
jeff |
- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()
Discussed with: attilio Sponsored by: EMC / Isilon Storage Division
|
#
250884 |
|
21-May-2013 |
attilio |
o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks.
Sponsored by: EMC / Isilon storage division Reviewed by: alc
|
#
248508 |
|
19-Mar-2013 |
kib |
Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads.
The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag.
When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation.
Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap.
The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests.
Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested.
In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached.
By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions.
Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
|
#
248449 |
|
17-Mar-2013 |
attilio |
Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie.
This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections.
The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed.
The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property.
This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done.
The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped.
Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/
Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)
|
#
248280 |
|
14-Mar-2013 |
kib |
Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified.
The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture.
Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement.
For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking.
Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
|
#
248084 |
|
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
247678 |
|
02-Mar-2013 |
attilio |
Fix-up r247622 by also renaming pv_list iterator into the xen pmap verbatim copy.
Sponsored by: EMC / Isilon storage division Reported by: tinderbox
|
#
247400 |
|
27-Feb-2013 |
attilio |
Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions.
Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
|
#
246855 |
|
15-Feb-2013 |
jkim |
Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is no functional change.
|
#
241498 |
|
12-Oct-2012 |
alc |
Replace all uses of the vm page queues lock by a new R/W lock. Unfortunately, this lock cannot be defined as static under Xen because it is (ab)used to serialize queued page table changes.
Tested by: sbruno
|
#
241400 |
|
10-Oct-2012 |
alc |
MFi386 r241356 Add several asserts.
MFC after: 3 days
|
#
241353 |
|
08-Oct-2012 |
alc |
In a few places, like the implementation of ptrace(), a thread may call upon pmap_enter() to create a mapping within a different address space, i.e., not the thread's own address space. On i386, this entails the creation of a temporary mapping to the affected page table page (PTP). In general, pmap_enter() will read from this PTP, allocate a PV entry, and write to this PTP. The trouble comes when the system is short of memory. In order to allocate a new PV entry, an older PV entry has to be reclaimed. Reclaiming a PV entry involves destroying a mapping, which requires access to the affected PTP. Thus, the PTP mapped at the beginning of pmap_enter() is no longer mapped at the end of pmap_enter(), which leads to pmap_enter() modifying the wrong PTP. To address this problem, pmap_pv_reclaim() is changed to use an alternate method of mapping PTPs.
Update a related comment.
Reported by: pho Diagnosed by: kib MFC after: 5 days
|
#
241020 |
|
28-Sep-2012 |
alc |
Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.
|
#
240317 |
|
10-Sep-2012 |
alc |
Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(), pmap_unmapdev()'s own direct efforts to destroy the page table entries are redundant, so eliminate them.
Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS. Setting PTE_W on MIPS is inconsistent with the implementation of this function on other architectures. Moreover, PTE_W should not be set, unless the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}() doesn't do.
MFC after: 10 days
|
#
240126 |
|
05-Sep-2012 |
alc |
Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the comment describing them. Both the function names and the comment had grown stale. Quite some time has passed since these pmap implementations last used the page's hold count to track the number of valid mapping within a page table page. Also, returning TRUE from pmap_unwire_ptp() rather than _pmap_unwire_ptp() eliminates a few instructions from callers like pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used directly by a conditional statement.
|
#
239171 |
|
10-Aug-2012 |
alc |
Eliminate an unnecessary acquisition and release of the page queues lock from pmap_pte(). PT_SET_MA() is not a queued mapping update, but instead an immediate mapping update, so the page queues lock is not required here.
Reviewed by: cperciva
|
#
236534 |
|
04-Jun-2012 |
alc |
Various small changes to PV entry management:
Constify pc_freemask[].
pmap_pv_reclaim() Eliminate "freemask" because it was a pessimization. Add a comment about the resident count adjustment.
free_pv_entry() [i386 only] Merge an optimization from amd64 (r233954).
get_pv_entry() Eliminate the move to tail of the pv_chunk on the global pv_chunks list. (The right strategy needs more thought. Moreover, there were unintended differences between the amd64 and i386 implementation.)
pmap_remove_pages() Eliminate unnecessary ()'s.
|
#
236378 |
|
01-Jun-2012 |
alc |
Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by introducing free_pv_chunk().
|
#
236291 |
|
30-May-2012 |
alc |
Eliminate some purely stylistic differences among the amd64, i386 native, and i386 xen PV entry allocators.
|
#
236243 |
|
29-May-2012 |
alc |
MFi386 pmap r233433 Disable detailed PV entry accounting by default. (A config option for enabling it was already introduced in r233433.)
|
#
236240 |
|
29-May-2012 |
alc |
Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no longer uses the active and inactive paging queues. Instead, the pmap now maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses this list to select pv entries for reclamation.
Note: The old pmap_collect() tried to avoid reclaiming mappings for pages that have either a hold_count or a busy field that is non-zero. However, this isn't necessary for correctness, and the locking in pmap_collect() was insufficient to guarantee that such mappings weren't reclaimed. The new pmap_pv_reclaim() doesn't even try.
Tested by: sbruno MFC after: 5 weeks
|
#
229007 |
|
30-Dec-2011 |
alc |
Merge r216333 and r216555 from the native pmap When r207410 eliminated the acquisition and release of the page queues lock from pmap_extract_and_hold(), it didn't take into account that pmap_pte_quick() sometimes requires the page queues lock to be held. This change reimplements pmap_extract_and_hold() such that it no longer uses pmap_pte_quick(), and thus never requires the page queues lock.
Merge r177525 from the native pmap Prevent the overflow in the calculation of the next page directory. The overflow causes the wraparound with consequent corruption of the (almost) whole address space mapping.
Strictly speaking, r177525 is not required by the Xen pmap because the hypervisor steals the uppermost region of the normal kernel address space. I am nonetheless merging it in order to reduce the number of unnecessary differences between the native and Xen pmap implementations.
Tested by: sbruno
|
#
228935 |
|
28-Dec-2011 |
alc |
Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold(): If the page lock acquisition is retried, then the underlying thread is not unpinned.
Wrap nearby lines that exceed 80 columns.
|
#
228923 |
|
27-Dec-2011 |
alc |
Eliminate many of the unnecessary differences between the native and paravirtualized pmap implementations for i386. This includes some style fixes to the native pmap and several bug fixes that were not previously applied to the paravirtualized pmap.
Tested by: sbruno MFC after: 3 weeks
|
#
228746 |
|
20-Dec-2011 |
alc |
The Xen pmap doesn't support superpages. So, there is no point in it initializing structures, like the pv table, that are only used to implement superpages. In fact, some of the unnecessary code in pmap_init() was actually doing harm. It was preventing the kernel from booting on virtual machines with more than 768 MB of memory.
Tested by: sbruno
|
#
227309 |
|
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
#
226843 |
|
27-Oct-2011 |
alc |
Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
|
#
225418 |
|
06-Sep-2011 |
kib |
Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs.
Document the changes to flags field to only require the page lock.
Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
|
#
224746 |
|
09-Aug-2011 |
kib |
- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged.
Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
|
#
223758 |
|
04-Jul-2011 |
attilio |
With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement.
MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
|
#
223732 |
|
02-Jul-2011 |
alc |
When iterating over a paging queue, explicitly check for PG_MARKER, instead of relying on zeroed memory being interpreted as an empty PV list.
Reviewed by: kib
|
#
223677 |
|
29-Jun-2011 |
alc |
Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages.
This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages.
Update all of the existing assertions on pmap_remove_all() to reflect this change.
Reviewed by: kib
|
#
222813 |
|
07-Jun-2011 |
attilio |
etire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
#
217688 |
|
21-Jan-2011 |
pluknet |
Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
|
#
216960 |
|
04-Jan-2011 |
cperciva |
Add hamfisted locking to the Xen/PV pmap code: Only allow one thread to be in {pmap_pinit, pmap_copy, pmap_release} at a time.
This reduces the rate of panics when running 'make index' from ~0.6/hour to ~0.02/hour (p < 10^-30).
At a later date this locking will be removed, and for this reason, it is wrapped in #ifdef HAMFISTED_LOCKING; this temporary hack is being put in place with the intention of shipping somewhat-stable Xen bits in FreeBSD 8.2-RELEASE.
PR: kern/153672 MFC after: 3 days
|
#
216843 |
|
31-Dec-2010 |
cperciva |
Make i386_set_ldt work on i386/XEN, step 1/5.
Lock the vm page queue mutex around calls to pte_store. As with many other uses of the vm page queue mutex in i386/xen/pmap.c, this is bogus and needs to be replaced at some future date by a spin lock dedicated to protecting the queue of pending xen page mapping hypervisor calls. (But for now, bogus locking is better than a panic.)
MFC after: 3 days
|
#
216762 |
|
28-Dec-2010 |
cperciva |
Remove a "not strictly correct" (and panic-inducing) workaround for a bug which doesn't seem to exist.
PR: kern/141328 MFC after: 3 days
|
#
216703 |
|
26-Dec-2010 |
cperciva |
Lock the vm page queue mutex in pmap_pte_release around the call to PMAP_SET_VA; this fixes a mutex-not-held panic when a process which called mlock(2) exits, and parallels a change made in pmap_pte 10 months ago (svn r204160).
Note: The locking in this code is utterly broken. We should not be using the VM page queue mutex to protect the queue of pending Xen page mapping hypervisor calls. Even if it made sense to do so, this commit and r204160 introduce LORs between the vm page queue mutex and PMAP2mutex.
(However, a possible deadlock is better than a guaranteed panic, and this change will hopefully make life easier for whoever fixes the Xen pmap locking in the future.)
PR: kern/140313 MFC after: 3 days
|
#
215844 |
|
25-Nov-2010 |
cperciva |
Revert r215819 and fix the bug properly. In pmap_qremove, paging table updates were being queued by pmap_kremove, but the queue wasn't being flushed; as a result, the updates didn't happen until *after* the call to pmap_invalidate_range, and old entries could stick around in the TLB. Adding a PT_UPDATES_FLUSH() call immediately before pmap_invalidate_range ensures that after the invalidation the TLB will be repopulated with the correct new entries.
Thanks to: kib, avg, alc
|
#
215819 |
|
25-Nov-2010 |
cperciva |
Work around paging bug. Somehow we seem to be ending up with entries in the TLB which don't correspond to ptes with PG_V set; prior to this commit I'm sometimes getting the wrong data when pages are loaded into the buffer cache (they're being loaded, but the missing TLB invalidation is causing the wrong data to be visible).
|
#
215593 |
|
20-Nov-2010 |
cperciva |
Unifdef XEN. This file is only compiled with the XEN kernel option set, and the !XEN bits get in the way of understanding the code.
|
#
215587 |
|
20-Nov-2010 |
cperciva |
Add VTOM(va) macro as xpmap_ptom(VTOP(va)) to convert to machine addresses.
Clean up the code by converting xpmap_ptom(VTOP(...)) to VTOM(...) and converting xpmap_ptom(VM_PAGE_TO_PHYS(...)) to VM_PAGE_TO_MACH(...). In a few places we take advantage of the fact that xpmap_ptom can commute with setting PG_* flags.
This commit should have no net effect save to improve the readability of this code.
|
#
215525 |
|
19-Nov-2010 |
cperciva |
Make pmap_release consistent with pmap_pinit with respect to unpinning pages. The pinning of NPGPTD pages is #if 0ed out in pmap_pinit (I'm not quite sure why...) and this commit adds a corresponding #if 0 in pmap_release to avoid unpinning those pages.
Some versions of Xen seem to silently ignore requests to unpin pages which were never pinned in the first place, but some return an error (causing FreeBSD to panic) prior to this commit.
|
#
215472 |
|
18-Nov-2010 |
cperciva |
Make pmap_release match pmap_pinit by invoking pmap_qremove(pmap->pm_pdpt) to match pmap_pinit's pmap_qenter(pmap->pm_pdpt) call in the case of PAE.
|
#
215470 |
|
18-Nov-2010 |
cperciva |
Don't KASSERT in pmap_release that xpmap_ptom(VM_PAGE_TO_PHYS(m)) == (pmap->pm_pdpt[i] & PG_FRAME) for i = NPGPTD, since pmap->pm_pdpt[i] is only initialized for 0 <= i < NPGPTD.
This fixes an inevitable panic with XEN && PAE && INVARIANTS when pmap_release is called (e.g., when /sbin/init is launched).
|
#
211197 |
|
11-Aug-2010 |
jhb |
Update various places that store or manipulate CPU masks to use cpumask_t instead of int or u_int. Since cpumask_t is currently u_int on all platforms this should just be a cosmetic change.
|
#
209048 |
|
11-Jun-2010 |
alc |
Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
|
#
208990 |
|
10-Jun-2010 |
alc |
Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment.
Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this.
Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue.
Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.)
Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively.
ARM:
Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases.
PowerPC/AIM:
moea*_page_exits_quick() and moea*_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated.
Push down the page queues lock inside of moea*_clear_bit(), simplifying moea*_clear_modify() and moea*_clear_reference().
The last parameter to moea*_clear_bit() is never used. Eliminate it.
PowerPC/BookE:
Simplify mmu_booke_page_exists_quick()'s control flow.
Reviewed by: kib@
|
#
208667 |
|
31-May-2010 |
alc |
Eliminate a stale comment.
|
#
208657 |
|
30-May-2010 |
alc |
Simplify the inner loop of pmap_collect(): While iterating over the page's pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.
|
#
208651 |
|
30-May-2010 |
alc |
Merge various changes from i386/i386/pmap.c:
The remaining, unmerged portions of r175404 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.)
Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check.
Eliminate \n from a nearby panic string.
Use KASSERT() to reimplement pmap_copy()'s two assertions.
Merge portions of r177659 To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set.
r208609 Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().
r208645 When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call.
When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called.
Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator.
There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this.
Tidy up the variable declarations at the start of pmap_enter().
|
#
208574 |
|
26-May-2010 |
alc |
Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock.
Assert that the page is managed in pmap_is_referenced().
On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock.
Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment.
Correct a spelling error in vm_page_dontneed().
Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable.
Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked.
Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty().
Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions.
Reviewed by: kib
|
#
208504 |
|
24-May-2010 |
alc |
Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
Reviewed by: kib (an earlier version)
|
#
208175 |
|
16-May-2010 |
alc |
On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures.
On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page.
With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running.
Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
|
#
207796 |
|
08-May-2010 |
alc |
Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
Switch to a per-processor counter for the total number of pages cached.
|
#
207419 |
|
30-Apr-2010 |
kmacy |
merge 194209 in to the i386/xen pmap
requested by: alc@
|
#
207410 |
|
29-Apr-2010 |
kmacy |
On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
|
#
207262 |
|
27-Apr-2010 |
alc |
MFi386 r207205 Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it.
|
#
207155 |
|
24-Apr-2010 |
alc |
Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
|
#
204160 |
|
20-Feb-2010 |
kmacy |
- fix bootstrap for variable KVA_PAGES - remove unused CADDR1 - hold lock across page table update
MFC after: 3 days
|
#
204041 |
|
18-Feb-2010 |
ed |
Allow the pmap code to be built with GCC from FreeBSD 7 again.
This patch basically gives us the best of both worlds. Instead of forcing the compiler to emulate GNU-style inline semantics even though we're using ISO C99, it will only use GNU-style inlining when the compiler is configured that way (__GNUC_GNU_INLINE__).
Tested by: jhb
|
#
202628 |
|
19-Jan-2010 |
ed |
Recommit r193732:
Remove __gnu89_inline.
Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems.
Reviewed by: alc
It was backed out, because it prevented us from building kernels using a 7.x compiler. Now that most people use 8.x, there is nothing that holds us back. Even if people run 7.x, they should be able to build a kernel if they run `make kernel-toolchain' or `make buildworld' first.
|
#
201804 |
|
08-Jan-2010 |
bz |
Unbreak the XEN build after r201751.
|
#
201751 |
|
07-Jan-2010 |
alc |
Make pmap_set_pg() static.
|
#
200346 |
|
10-Dec-2009 |
kmacy |
- revert pmap_kenter_temporary to taking a physical address - make minidump work
|
#
199734 |
|
24-Nov-2009 |
kmacy |
fixup kernel core dumps on paravirtual guests
|
#
199184 |
|
11-Nov-2009 |
avg |
reflect that pg_ps_enabled is a tunable, not just a read-only sysctl
Nod from: jhb
|
#
198341 |
|
21-Oct-2009 |
marcel |
o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked).
The key property of this change is that the I-cache is made coherent *after* writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent *before* any writes happen. The difference is key when the I-cache prefetches.
|
#
197070 |
|
10-Sep-2009 |
jkim |
Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce unnecessary #ifdef's for shared code between them.
|
#
197046 |
|
09-Sep-2009 |
kib |
As was done in r196643 for i386 and amd64, swap the start/end virtual addresses in pmap_invalidate_cache_range().
Reported by: Vincent Hoffman <vince unsane co uk> Reviewed by: jhb MFC after: 3 days
|
#
196734 |
|
01-Sep-2009 |
adrian |
Delete whitespace not in i386/pmap.c
|
#
196728 |
|
01-Sep-2009 |
adrian |
Migrate to use cpuset_t.
|
#
196726 |
|
01-Sep-2009 |
adrian |
Merge in the pat_works work from sys/i386/i386/pmap.c - primarily to reduce diff size.
|
#
196725 |
|
01-Sep-2009 |
adrian |
Fix broken build.
|
#
196723 |
|
31-Aug-2009 |
adrian |
Shuffle pagezero() into the same location as in sys/i386/i386/pmap.c.
|
#
195949 |
|
29-Jul-2009 |
kib |
Fix XEN build breakage, by implementing pmap_invalidate_cache_range() and using it when appropriate. Merge analogue of the r195836 optimization to XEN.
Approved by: re (kensmith)
|
#
195840 |
|
24-Jul-2009 |
jhb |
Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver.
Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
|
#
195774 |
|
19-Jul-2009 |
alc |
Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev().
The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless.
Add pmap_page_set_memattr() to the Xen pmap.
Approved by: re (kib)
|
#
195649 |
|
12-Jul-2009 |
alc |
Add support to the virtual memory system for configuring machine- dependent memory attributes:
Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior.
Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages.
Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager:
kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386.
vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes.
Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping.
Notes: (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for. (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7.
In collaboration with: jhb
Approved by: re (kib)
|
#
195385 |
|
05-Jul-2009 |
alc |
PAE adds another level to the i386 page table. This level is a small 4-entry table that must be located within the first 4GB of RAM. This requirement is met by defining an UMA zone with a custom back-end allocator function. This revision makes two changes to this back-end allocator function: (1) It replaces the use of contigmalloc() with the use of kmem_alloc_contig(). This eliminates "double accounting", i.e., accounting by both the UMA zone and malloc tags. (I made the same change for the same reason to the zones supporting jumbo frames a week ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK, M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it. pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the moment, this is harmless only because the default behavior of contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init() doesn't really depend on the memory being zeroed.
The back-end allocator function in the Xen pmap is dead code. I am changing it nonetheless because I don't want to leave any "bad examples" in the source tree for someone to copy at a later date.
Approved by: re (kib)
|
#
193734 |
|
08-Jun-2009 |
ed |
Revert my change; reintroduce __gnu89_inline.
It turns out our compiler in stable/7 can't build this code anymore. Even though my opinion is that those people should just run `make kernel-toolchain' before building a kernel, I am willing to wait and commit this after we've branched stable/8.
Requested by: rwatson
|
#
193732 |
|
08-Jun-2009 |
ed |
Remove __gnu89_inline.
Now that we use C99 almost everywhere, just use C99-style in the pmap code. Since the pmap code is the only consumer of __gnu89_inline, remove it from cdefs.h as well. Because the flag was only introduced 17 months ago, I don't expect any problems.
Reviewed by: alc
|
#
190627 |
|
01-Apr-2009 |
dfr |
Fix the Xen build for i386 PV mode.
|
#
188617 |
|
14-Feb-2009 |
alc |
Remove unnecessary page queues locking around vm_page_wakeup().
Approved by: kmacy
|
#
188341 |
|
08-Feb-2009 |
kmacy |
Don't try to directly update page tables
|
#
186557 |
|
29-Dec-2008 |
kmacy |
merge 186535, 186537, and 186538 from releng_7_xen
Log: - merge in latest xenbus from dfr's xenhvm - fix race condition in xs_read_reply by converting tsleep to mtx_sleep
Log: unmask evtchn in bind_{virq, ipi}_to_irq
Log: - remove code for handling case of not being able to sleep - eliminate tsleep - make sleeps atomic
|
#
183342 |
|
25-Sep-2008 |
kmacy |
Make nkpt dependent on the size of the initial memory allocation
MFC after: 1 month
|
#
182902 |
|
10-Sep-2008 |
kmacy |
Get initial bootstrap of APs working under xen. Note that the APs still blow up in sched_throw().
MFC after: 1 month
|
#
181946 |
|
21-Aug-2008 |
kmacy |
Fix boot time pmap_growkernel panic for case where vm is allocated >= 768M
MFC after: 1 month
|
#
181808 |
|
17-Aug-2008 |
kmacy |
translate machine addresses to physical addresses in new code in pmap_init
MFC after: 1 month
|
#
181747 |
|
15-Aug-2008 |
kmacy |
Compile fixes for xen build.
MFC after: 1 month.
|
#
181641 |
|
12-Aug-2008 |
kmacy |
Import i386 xen sub-arch files.
MFC after: 2 weeks
|