Cross Reference: /freebsd-10.0-release/sys/powerpc/aim/mmu

History log of /freebsd-10.0-release/sys/powerpc/aim/mmu_oea64.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255724	20-Sep-2013	alc	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
# 255503	12-Sep-2013	nwhitehorn	Change VM object lock assertion to match locking higher in the call chain. This repairs a panic observed during pageout on some 64-bit PowerPC systems. Submitted by: grehan Approved by: re (kib) MFC after: 2 weeks Revisit after: 10.0
# 255418	09-Sep-2013	nwhitehorn	Add POWER CPUs to the kernel's knowledge. This does not imply we currently actually run on any machines with POWER CPUs but avoids closing that door unnecessarily. Approved by: re (kib)
# 254667	22-Aug-2013	kib	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
# 254138	09-Aug-2013	attilio	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
# 254025	07-Aug-2013	jeff	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
# 253272	12-Jul-2013	nwhitehorn	Fix check: bitwise and has only one &. MFC after: 1 week
# 250884	21-May-2013	attilio	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc
# 250747	17-May-2013	alc	Relax the object locking assertion in pmap_enter_locked(). Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
# 248508	19-Mar-2013	kib	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
# 248280	14-Mar-2013	kib	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
# 248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 247400	27-Feb-2013	attilio	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# 247297	25-Feb-2013	attilio	Merge from vmobj-rwlock branch: Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h. Sponsored by: EMC / Isilon storage division Tested by: pho Reviewed by: alc
# 243040	14-Nov-2012	kib	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks
# 241020	28-Sep-2012	alc	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.
# 238357	10-Jul-2012	alc	Avoid recursion on the pvh global lock in the aim oea pmap. Correct the return type of the pmap_ts_referenced() implementations. Reported by: jhibbits [1] Tested by: andreast
# 236019	25-May-2012	raj	Fix physical address type to vm_paddr_t also for powerpc64.
# 235689	20-May-2012	nwhitehorn	Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies range operations like pmap_remove() and pmap_protect() as well as allowing simple operations like pmap_extract() not to involve any global state. This substantially reduces lock coverages for the global table lock and improves concurrency.
# 234576	22-Apr-2012	nwhitehorn	Avoid a lock order reversal in pmap_extract_and_hold() from relocking the page. This PMAP requires an additional lock besides the PMAP lock in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release. Suggested by: kib MFC after: 4 days
# 234156	11-Apr-2012	nwhitehorn	We don't need kcopy() in any of the remaining places it is used, so remove it. MFC after: 2 weeks
# 234155	11-Apr-2012	nwhitehorn	Only manipulate the PGA_EXECUTABLE flag on managed pages. This is a proxy for whether the page is physical. On dense phys mem systems (32-bit), VM_PHYS_TO_PAGE will not return NULL for device memory pages if device memory is above physical memory even if there is no allocated vm_page. Attempting to use the returned page could then cause either memory corruption or a page fault.
# 233957	06-Apr-2012	nwhitehorn	Substantially reduce the scope of the locks held in pmap_enter(), which improves concurrency slightly.
# 233949	06-Apr-2012	nwhitehorn	Reduce the frequency that the PowerPC/AIM pmaps invalidate instruction caches, by invalidating kernel icaches only when needed and not flushing user caches for shared pages. Suggested by: kib MFC after: 2 weeks
# 233618	28-Mar-2012	nwhitehorn	More PMAP performance improvements: skip 256 MB segments entirely if they are are not mapped during ranged operations and reduce the scope of the tlbie lock only to the actual tlbie instruction instead of the entire sequence. There are a few more optimization possibilities here as well.
# 233530	26-Mar-2012	nwhitehorn	Make sure to call vm_page_dirty() before the pmap lock is released to prevent a race where another process could conclude the page was clean. Submitted by: alc
# 233529	26-Mar-2012	nwhitehorn	More PMAP concurrency improvements: replace the table lock and (almost) all uses of the page queues mutex with a new rwlock that protects the page table and the PV lists. This reduces system time during a parallel buildworld by 35%. Reviewed by: alc
# 233436	24-Mar-2012	nwhitehorn	Only call vm_page_dirty() on pages that are writable in order not to confuse the VM.
# 233434	24-Mar-2012	nwhitehorn	Following suggestions from alc, skip wired mappings in pmap_remove_pages() and remove moea64_attr_*() in favor of direct calls to vm_page_dirty() and friends.
# 233117	18-Mar-2012	nwhitehorn	Remove acquisition of VM page queues lock from pmap_protect(). Any actual manipulation of the pvo_vlink and pvo_olink entries is already protected by the table lock, so most remaining instances of the acquisition of the page queues lock can likely be replaced with the table lock, or removed if the table lock is already held. Reviewed by: alc
# 233017	15-Mar-2012	nwhitehorn	Implement pmap_remove_pages(). This will be added later to the 32-bit MMU module. Suggested by: alc
# 233011	15-Mar-2012	nwhitehorn	Improve algorithm for deciding whether to loop through all process pages or look them up individually in pmap_remove() and apply the same logic in the other ranged operation (pmap_protect). This speeds up make installworld by a factor of 2 on powerpc64. MFC after: 1 week
# 232980	14-Mar-2012	nwhitehorn	Use LIST_FOREACH_SAFE() instead of LIST_FOREACH() in pmap_remove(), since the point of this loop is to remove elements. This worked by accident before. MFC after: 2 days
# 230779	30-Jan-2012	kib	Fix build for the case of powerpc64 kernel without COMPAT_FREEBSD32. MFC after: 2 months
# 230767	30-Jan-2012	kib	Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported, and some variants of powerpc32. MFC after: 2 months (if ever)
# 228522	15-Dec-2011	alc	Eliminate vestiges of page coloring.
# 228412	11-Dec-2011	nwhitehorn	Keep track of PVO entries in each pmap, which allows much faster pmap_remove() for large sparse requests. This can prevent pmap_remove() operations on 64-bit process destruction or swapout that would take several hundred times the lifetime of the universe to complete. This behavior is largely indistinguishable from a hang.
# 225418	06-Sep-2011	kib	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
# 224746	09-Aug-2011	kib	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
# 223758	04-Jul-2011	attilio	With retirement of cpumask_t and usage of cpuset_t for representing a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient. Remove them and replace their usage with custom pc_cpuid magic (as, atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))). This change is not targeted for MFC because of struct pcpu members removal and dependency by cpumask_t retirement. MD review by: marcel, marius, alc Tested by: pluknet MD testing by: marcel, marius, gonzo, andreast
# 223471	23-Jun-2011	andreast	Fix merge typo.
# 222813	07-Jun-2011	attilio	etire the cpumask_t type and replace it with cpuset_t usage. This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being. Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES. Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64). Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC. People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately. Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development. (I really hope I didn't forget anyone, if it happened I apologize in advance).
# 222666	04-Jun-2011	nwhitehorn	Fix a typo derived from a mismerge from mmu_oea that would cause pmap_sync_icache() to sync random (possibly uncached or nonexisting!) memory, causing kernel page faults or machine checks, most easily triggered by using GDB. While here, add an additional safeguard to only sync cacheable memory. MFC after: 2 days
# 222614	02-Jun-2011	nwhitehorn	Remove some dead code: unnecessary isyncs and memory sorting, which are handled in mtmsr() and mem_regions(), respectively.
# 221981	16-May-2011	nwhitehorn	Remove a useless check that served only to make 64-bit PPC systems unbootable after r221855. Submitted by: andreast MFC after: 1 week
# 220642	14-Apr-2011	andreast	Adjust debugging string to match the actual function. Approved by: nwhitehorn (mentor)
# 220639	14-Apr-2011	andreast	The macro MOEA_PVO_CHECK is empty and not used. It is a left over from the NetBSD import. Remove the definition and all its occurrences. Approved by: nwhitehorn (mentor)
# 217688	21-Jan-2011	pluknet	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
# 217341	13-Jan-2011	nwhitehorn	Fix handling of NX pages on capable CPUs. Thanks to kib for prodding me in the right direction.
# 216563	19-Dec-2010	nwhitehorn	Garbage-collect unused variable.
# 216383	11-Dec-2010	nwhitehorn	Add some isync()s related to the 64-bit MMU scratch page to avoid race conditions on its invalidation.
# 216174	04-Dec-2010	nwhitehorn	Add an abstraction layer to the 64-bit AIM MMU's page table manipulation logic to support modifying the page table through a hypervisor. This uses KOBJ inheritance to provide subclasses of the base 64-bit AIM MMU class with additional methods for page table manipulation. Many thanks to Peter Grehan for suggesting this design and implementing the MMU KOBJ inheritance mechanism.
# 215163	12-Nov-2010	nwhitehorn	Remove use of a separate ofw_pmap on 32-bit CPUs. Many Open Firmware mappings need to end up in the kernel anyway since the kernel begins executing in OF context. Separating them adds needless complexity, especially since the powerpc64 and mmu_oea64 code gave up on it a long time ago. As a side effect, the PPC ofw_machdep code is no longer AIM-specific, so move it to powerpc/ofw.
# 215160	12-Nov-2010	nwhitehorn	Remove or conditionalize some hypervisor-unfriendly instruction sequences.
# 215159	12-Nov-2010	nwhitehorn	Add some platform KOBJ extensions and continue integrating PowerPC hypervisor infrastructure support: - Fix coexistence of multiple platform modules in the same kernel - Allow platform modules to provide an SMP topology - PowerPC hypervisors limit the amount of memory accessible in real mode. Allow the platform modules to specify the maximum real-mode address, and modify the bits of the kernel that need to allocate real-mode-accessible buffers to respect this limits.
# 215158	12-Nov-2010	nwhitehorn	Fix an error in r215067. An existing /chosen/mmu but missing translations property just means we shouldn't add any translations, not that we should panic.
# 215067	09-Nov-2010	nwhitehorn	Make AIM early-boot code function correctly without Open Firmware.
# 214617	01-Nov-2010	alc	Implement pmap_is_prefaultable(). Reviewed by: nwhitehorn
# 213407	04-Oct-2010	nwhitehorn	Follow exactly the steps in architecture manual for correctly invalidating TLB entries instead of trying to cut corners.
# 213335	01-Oct-2010	nwhitehorn	Fix pmap_page_set_memattr() behavior in the presence of fictitious pages by just caching the mode for later use by pmap_enter(), following amd64. While here, correct some mismerges from mmu_oea64 -> mmu_oea and clean up some dead code found while fixing the fictitious page behavior.
# 213307	30-Sep-2010	nwhitehorn	Add support for memory attributes (pmap_mapdev_attr() and friends) on PowerPC/AIM. This is currently stubbed out on Book-E, since I have no idea how to implement it there.
# 212722	16-Sep-2010	nwhitehorn	Split the SLB mirror cache into two kinds of object, one for kernel maps which are similar to the previous ones, and one for user maps, which are arrays of pointers into the SLB tree. This changes makes user SLB updates atomic, closing a window for memory corruption. While here, rearrange the allocation functions to make context switches faster.
# 212715	15-Sep-2010	nwhitehorn	Replace the SLB backing store splay tree used on 64-bit PowerPC AIM hardware with a lockless sparse tree design. This marginally improves the performance of PMAP and allows copyin()/copyout() to run without acquiring locks when used on wired mappings. Submitted by: mdf
# 212627	14-Sep-2010	grehan	Introduce inheritance into the PowerPC MMU kobj interface. include/mmuvar.h - Change the MMU_DEF macro to also create the class definition as well as define the DATA_SET. Add a macro, MMU_DEF_INHERIT, which has an extra parameter specifying the MMU class to inherit methods from. Update the comments at the start of the header file to describe the new macros. booke/pmap.c aim/mmu_oea.c aim/mmu_oea64.c - Collapse mmu_def_t declaration into updated MMU_DEF macro The MMU_DEF_INHERIT macro will be used in the PS3 MMU implementation to allow it to inherit the stock powerpc64 MMU methods. Reviewed by: nwhitehorn
# 212363	09-Sep-2010	nwhitehorn	Reorder statistics tracking and table lock acquisitions already in place to avoid race conditions updating the PVO statistics.
# 212331	08-Sep-2010	nwhitehorn	Fix a printf specifier on 64-bit systems.
# 212322	08-Sep-2010	nwhitehorn	Fix a typo in the original import of this code from NetBSD that caused the wrong element of the VSID bitmap array to be examined after a collision, leading to reallocation of in-use VSIDs under some circumstances, with attendant memory corruption. Also add an assert to check for this kind of problem in the future. MFC after: 4 days
# 212308	07-Sep-2010	nwhitehorn	Fix an error made in r209975 related to context ID allocation for 64-bit PowerPC CPUs running a 32-bit kernel. This bug could cause in-use VSIDs to be allocated again to another process, causing memory space overlaps and corruption. Reported by: linimon
# 212044	31-Aug-2010	nwhitehorn	Missed one place the SLB lock should be held in r211967.
# 211967	29-Aug-2010	nwhitehorn	Avoid a race in the allocation of new segment IDs that could result in memory corruption on heavily loaded SMP systems. MFC after: 2 weeks
# 210704	31-Jul-2010	nwhitehorn	Improve hash coverage for kernel page table entries by modifying the kernel ESID -> VSID map function. This makes ZFS run stably on PowerPC under heavy loads (repeated simultaneous SVN checkouts and updates).
# 209975	13-Jul-2010	nwhitehorn	MFppc64: Kernel sources for 64-bit PowerPC, along with build-system changes to keep 32-bit kernels compiling (build system changes for 64-bit kernels are coming later). Existing 32-bit PowerPC kernel configurations must be updated after this change to specify their architecture.
# 209048	11-Jun-2010	alc	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
# 208990	10-Jun-2010	alc	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@
# 208810	05-Jun-2010	alc	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.
# 208574	26-May-2010	alc	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
# 208504	24-May-2010	alc	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
# 208175	16-May-2010	alc	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
# 207796	08-May-2010	alc	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.
# 207410	29-Apr-2010	kmacy	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
# 207155	24-Apr-2010	alc	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
# 205370	20-Mar-2010	nwhitehorn	Revisit locking in the 64-bit AIM PMAP. The PVO head for a page is generally protected by the VM page queue mutex. Instead of extending the table lock to cover the PVO heads, add some asserts that the page queue mutex is in fact held. This fixes several LORs and possible deadlocks. This also adds an optimization to moea64_kextract() useful for direct-mapped quantities, like UMA buffers. Being able to use this from inside UMA removes an additional LOR.
# 205163	14-Mar-2010	nwhitehorn	Fix two small bugs. The PowerPC 970 does not support non-coherent memory access, and reflects this by autonomously writing LPTE_M into PTE entries. As such, we should not panic if LPTE_M changes by itself. While here, fix a harmless typo in moea64_sync_icache().
# 204719	04-Mar-2010	nwhitehorn	Fix an obvious lock escape and fix a typo in a comment.
# 204694	04-Mar-2010	nwhitehorn	Patch some more concurrency issues here. This expands the page table lock to cover the PVOs, and removes the scratchpage PTEs from the PVOs entirely to avoid the system trying to be helpful and rewriting them.
# 204297	25-Feb-2010	nwhitehorn	Move the OEA64 scratchpage to the end of KVA from the beginning, and set its PVO to map physical address 0 instead of kernelstart. This fixes a situation in which a user process could attempt to return this address via KVM, have it fault while being modified, and then panic the kernel because (a) it is supposed to map a valid address and (b) it lies in the no-fault region between VM_MIN_KERNEL_ADDRESS and virtual_avail. While here, move msgbuf and dpcpu make into regular KVA space for consistency with other implementations.
# 204296	25-Feb-2010	nwhitehorn	Provide an implementation of pmap_dev_direct_mapped() on OEA64. This is required in order to be able to mmap the running kernel, which is turn required to avoid fstat returning gibberish.
# 204269	23-Feb-2010	nwhitehorn	Use dcbz instead of word stores for page zeroing, providing a factor of 3-4 speedup.
# 204268	23-Feb-2010	nwhitehorn	Close a race involving the OEA64 scratchpage. When the scratch page's physical address is changed, there is a brief window during which its PTE is invalid. Since moea64_set_scratchpage_pa() does not and cannot hold the page table lock, it was possible for another CPU to insert a new PTE into the scratch page's PTEG slot during this interval, corrupting both mappings. Solve this by creating a new flag, LPTE_LOCKED, such that moea64_pte_insert will avoid claiming locked PTEG slots even if they are invalid. This change also incorporates some additional paranoia added to solve things I thought might be this bug. Reported by: linimon
# 204128	20-Feb-2010	nwhitehorn	Reduce KVA pressure on OEA64 systems running in bridge mode by mapping UMA segments at their physical addresses instead of into KVA. This emulates the direct mapping behavior of OEA32 in an ad-hoc way. To make this work properly required sharing the entire kernel PMAP with Open Firmware, so ofw_pmap is transformed into a stub on 64-bit CPUs. Also implement some more tweaks to get more mileage out of our limited amount of KVA, principally by extending KVA into segment 16 until the beginning of the first OFW mapping. Reported by: linimon
# 204042	18-Feb-2010	nwhitehorn	Fix a bug where pages being removed from memory entirely no longer have PVOs, and so the modified state of the page can no longer be communicated to the VM layer, causing pages not to be flushed to swap when needed, in turn causing memory corruption. Also make several correctness adjustments to I-Cache synchronization and TLB invalidation for 64-bit Book-S CPUs. Obtained from: projects/ppc64 Discussed with: grehan MFC after: 2 weeks
# 201758	07-Jan-2010	mbr	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
# 199226	12-Nov-2009	nwhitehorn	Provide a real fix to the too-many-translations problem when booting from CD on 64-bit hardware to replace existing band-aids. This occurred when the preloaded mdroot required too many mappings for the static buffer. Since we only use the translations buffer once, allocate a dynamic buffer on the stack. This early in the boot process, the call chain is quite short and we can be assured of having sufficient stack space. Reviewed by: grehan
# 199108	09-Nov-2009	nwhitehorn	Spell sz correctly. Pointed out by: jmallett
# 199084	09-Nov-2009	nwhitehorn	Increase the size of the OFW translations buffer to handle G5 systems that use many translation regions in firmware, and add bounds checking to prevent buffer overflows in case even the new value is exceeded. Reported by: Jacob Lambert MFC after: 3 days
# 198400	23-Oct-2009	nwhitehorn	Do not map the trap vectors into the kernel's address space. They are only used in real mode and keeping them mapped only serves to make NULL a valid address, which results in silent NULL pointer deferences. Suggested by: Patrick Kerharo Obtained from: projects/ppc64
# 198378	23-Oct-2009	nwhitehorn	Add SMP support on U3-based G5 systems. This does not yet work perfectly: at least on my Xserve, getting the decrementer and timebase on APs to tick requires setting up a clock chip over I2C, which is not yet done. While here, correct the 64-bit tlbie function to set the CPU to 64-bit mode correctly. Hardware donated by: grehan
# 198341	21-Oct-2009	marcel	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.
# 195632	12-Jul-2009	nwhitehorn	Increase the size of the page table on 64-bit PowerPC machines as a bandaid to prevent exhaustion of the primary and secondary hash groups in the event of extreme stress on the PMAP layer (e.g. a forkbomb). This wastes memory, and should be revised to properly handle PTEG spills instead. Suggested by: grehan Approved by: re (kensmith)
# 194784	23-Jun-2009	jeff	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
# 192067	13-May-2009	nwhitehorn	Factor out platform dependent things unrelated to device drivers into a new platform module. These are probed in early boot, and have the responsibility of determining the layout of physical memory, determining the CPU timebase frequency, and handling the zoo of SMP mechanisms found on PowerPC. Reviewed by: marcel, raj Book-E parts by: raj
# 190681	03-Apr-2009	nwhitehorn	Add support for 64-bit PowerPC CPUs operating in the 64-bit bridge mode provided, for example, on the PowerPC 970 (G5), as well as on related CPUs like the POWER3 and POWER4. This also adds support for various built-in hardware found on Apple G5 hardware (e.g. the IBM CPC925 northbridge). Reviewed by: grehan