Cross Reference: /freebsd-10.1-release/sys/sparc64/sparc64/pmap.c

History log of /freebsd-10.1-release/sys/sparc64/sparc64/pmap.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 272461	02-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.1-release
# 270920	01-Sep-2014	kib	Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped wired region. Rework the handling of unwire to do the it in batch, both at pmap and object level. All commits below are by alc. MFC r268327: Introduce pmap_unwire(). MFC r268591: Implement pmap_unwire() for powerpc. MFC r268776: Implement pmap_unwire() for arm. MFC r268806: pmap_unwire(9) man page. MFC r269134: When unwiring a region of an address space, do not assume that the underlying physical pages are mapped by the pmap. This fixes a leak of the wired pages on the unwiring of the region mapped with no access allowed. MFC r269339: In the implementation of the new function pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic. MFC r269365: Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed by the combination of r268591 and r269134: When we attempt to add the wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing. (They only set the wired attribute on newly created mappings.) MFC r269433: Handle wiring failures in vm_map_wire() with the new functions pmap_unwire() and vm_object_unwire(). Retire vm_fault_{un,}wire(), since they are no longer used. MFC r269438: Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable "rv" is uninitialized. MFC r269485: Retire pmap_change_wiring(). Reviewed by: alc
# 270441	24-Aug-2014	kib	MFC r270038: Complete r254667, do not destroy pmap lock if KVA allocation failed.
# 270439	24-Aug-2014	kib	Merge the changes to pmap_enter(9) for sleep-less operation (requested by flag). The ia64 pmap.c changes are direct commit, since ia64 is removed on head. MFC r269368 (by alc): Retire PVO_EXECUTABLE. MFC r269728: Change pmap_enter(9) interface to take flags parameter and superpage mapping size (currently unused). MFC r269759 (by alc): Update the text of a KASSERT() to reflect the changes in r269728. MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table page allocation. MFC r270151 (by alc): Replace KASSERT that no PV list locks are held with a conditional unlock. Reviewed by: alc Approved by: re (gjb) Sponsored by: The FreeBSD Foundation
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255724	20-Sep-2013	alc	The pmap function pmap_clear_reference() is no longer used. Remove it. pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
# 255426	09-Sep-2013	jhb	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
# 255028	29-Aug-2013	alc	Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
# 254667	22-Aug-2013	kib	Revert r254501. Instead, reuse the type stability of the struct pmap which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
# 254649	22-Aug-2013	kib	Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9). The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
# 254138	09-Aug-2013	attilio	The soft and hard busy mechanism rely on the vm object lock to work. Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
# 254025	07-Aug-2013	jeff	Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
# 253994	06-Aug-2013	marius	Add MD (for now) atomic_store_acq_<type>() and use it in pmap_activate() to get the semantics when setting the PMAP right. Prior to r251782, the latter already used implicit acquire semantics, which - currently - means to not employ additional explicit memory barriers under the hood (see also r225889).
# 253940	04-Aug-2013	attilio	Remove unused member. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# 251782	15-Jun-2013	ed	Stick to using the documented atomic(9) API. The atomic_store_ptr() function is not part of the atomic(9) API. We only provide a version with a release barrier.
# 250884	21-May-2013	attilio	o Relax locking assertions for vm_page_find_least() o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc
# 250747	17-May-2013	alc	Relax the object locking assertion in pmap_enter_locked(). Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
# 248508	19-Mar-2013	kib	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
# 248280	14-Mar-2013	kib	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
# 248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 247400	27-Feb-2013	attilio	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho
# 243132	16-Nov-2012	kib	Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs. Requested and reviewed by: alc MFC after: 2 weeks
# 242534	03-Nov-2012	attilio	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris
# 241020	28-Sep-2012	alc	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.
# 239079	05-Aug-2012	marius	Merge r236494 from x86: Isolate the global TTE list lock from data and other locks to prevent false sharing within the cache. MFC after: 3 days
# 237623	27-Jun-2012	alc	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.
# 236214	29-May-2012	alc	Replace all uses of the vm page queues lock by a r/w lock that is private to this pmap.c. This new r/w lock is used primarily to synchronize access to the TTE lists. However, it will be used in a somewhat unconventional way. As finer-grained TTE list locking is added to each of the pmap functions that acquire this r/w lock, its acquisition will be changed from write to read, enabling concurrent execution of the pmap functions with finer-grained locking. Reviewed by: attilio Tested by: flo MFC after: 10 days
# 230634	27-Jan-2012	marius	Commit file missed in r230633.
# 226054	06-Oct-2011	marius	- Use atomic operations rather than sched_lock for safely assigning pm_active and pc_pmap for SMP. This is key to allowing adding support for SCHED_ULE. Thanks go to Peter Jeremy for additional testing. - Add support for SCHED_ULE to cpu_switch(). Committed from: 201110DevSummit
# 225901	01-Oct-2011	marius	Remove obsolete macros.
# 225841	28-Sep-2011	kib	Remove locking of the vm page queues from several pmaps, which only protected the dirty mask updates. The dirty mask updates are handled by atomics after the r225840. Submitted by: alc Tested by: flo (sparc64) MFC after: 2 weeks
# 225675	19-Sep-2011	attilio	It is safe to initialize locks even on early boot (and it is the same thing all the other architectures already do) thus just initialize kernel_pmap in pmap_bootstrap(). Reported by: alc Reviewed by: alc, marius Tested by: flo, marius Approved by: re (kib) MFC after: 1 week
# 225418	06-Sep-2011	kib	Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
# 224746	09-Aug-2011	kib	- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
# 223800	05-Jul-2011	marius	- pmap_cache_remove() and pmap_protect_tte() are only used within pmap.c so static'ize them. - Correct a typo.
# 223798	05-Jul-2011	marius	In pmap_remove_all() assert that the page is neither fictitious nor unmanaged as also done on other architectures. Reviewed by: alc
# 223795	05-Jul-2011	marius	Call pmap_qremove() before freeing or unwiring the pages, otherwise there's a window during which a page can be re-used before its previous mapping is removed. Reviewed by: alc MFC after: 1 week
# 223719	02-Jul-2011	marius	- For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from the TLBs in order to get rid of the user mappings but instead traverse them an flush only the latter like we also do for the Spitfire-class. Also flushing the unlocked kernel entries can cause instant faults which when called from within cpu_switch() are handled with the scheduler lock held which in turn can cause timeouts on the acquisition of the lock by other CPUs. This was easily seen with a 16-core V890 but occasionally also happened with 2-way machines. While at it, move the SPARC64-V support code entirely to zeus.c. This causes a little bit of duplication but is less confusing than partially using Cheetah-class bits for these. - For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024- entry, 2-way set associative TLB. - In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back. Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)
# 223377	21-Jun-2011	marius	On machines where we don't need to lock the kernel TSB into the dTLB and thus may basically use the entire 64-bit kernel address space increase the kernel virtual memory to not be limited by VM_KMEM_SIZE_MAX.
# 223347	20-Jun-2011	marius	As astopgap minimize the sched_lock coverage in pmap_activate() in order to reduce lock contention.
# 223346	20-Jun-2011	marius	- Remove MD usage of pc_cpumask and pc_other_cpus. [1] - Remove CTASSERTs which no longer need to hold since r222813. Submitted by: attilio [1]
# 222813	07-Jun-2011	attilio	etire the cpumask_t type and replace it with cpuset_t usage. This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being. Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES. Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64). Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC. People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately. Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development. (I really hope I didn't forget anyone, if it happened I apologize in advance).
# 222531	31-May-2011	nwhitehorn	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
# 220939	22-Apr-2011	marius	Correct spelling in comments. Submitted by: brucec
# 219608	13-Mar-2011	marius	Remove the advertising clause from the UCB license according to the July 22, 1999 addendum.
# 218457	08-Feb-2011	marius	Take advantage of accessing the kernel TSB via ASI_ATOMIC_QUAD_LDD_PHYS on SPARC64-V, too. Tested by: Michael Moll
# 217688	21-Jan-2011	pluknet	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
# 217514	17-Jan-2011	marius	In order to save instructions the MMU trap handlers assumed that the kernel TSB is located within the 32-bit address space, which held true as long as we were using virtual addresses magic-mapped before the location of the kernel for addressing it. However, with r216803 in place when possible we address it via its physical address instead, which on machines like Sun Fire V880 have no physical memory in the 32-bit address space at all requires to use 64-bit addressing. When using physical addressing it still should be safe to assume that we can just ignore the lowest 10 bits of the address as a minor optimization as we did before r216803.
# 216803	29-Dec-2010	marius	On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS, which takes an physical address instead of an virtual one, for loading TTEs of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB, which only has a very limited number of lockable dTLB slots. The net result is that we now basically can handle a kernel TSB of any size and no longer need to limit the kernel address space based on the number of dTLB slots available for locked entries. Consequently, other parts of the trap handlers now also only access the the kernel TSB via its physical address in order to avoid nested traps, as does the PMAP bootstrap code as we haven't taken over the trap table at that point, yet. Apart from that the kernel TSB now is accessed via a direct mapping when we are otherwise taking advantage of ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this is implemented by extending the patching of the TSB addresses and mask as well as the ASIs used to load it into the trap table so the runtime overhead of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS is not yet enabled on SPARC64 CPUs due to lack of testing and due to the fact it might require minor adjustments there. Theoretically it should be possible to use the same approach also for the user TSB, which already is not locked into the dTLB, avoiding nested traps. However, for reasons I don't understand yet OpenSolaris only does that with SPARC64 CPUs. On the other hand I think that also addressing the user TSB physically and thus avoiding nested traps would get us closer to sharing this code with sun4v, which only supports trap level 0 and 1, so eventually we could have a single kernel which runs on both sun4u and sun4v (as does Linux and OpenBSD). Developed at and committed from: 27C3
# 214879	06-Nov-2010	marius	Implement pmap_is_prefaultable(). Reviewed by: alc (with bugfix)
# 214528	29-Oct-2010	marius	- When resetting pm_active and pm_context of a pmap in pmap_pinit() we need locking as otherwise we may race against the other parts of the MD code which expects a consistent state of these. While at it move the resetting of the pmap before entering it in the TSB. - Spell a 0 as TLB_CTX_KERNEL.
# 211568	21-Aug-2010	marius	Skip a KASSERT which isn't appropriate when not employing page coloring. Reported by: Michael Moll
# 211049	07-Aug-2010	marius	For CPUs which ignore TD_CV and support hardware unaliasing don't bother doing page coloring. This results in a small but measurable performance improvement in buildworld times.
# 210334	21-Jul-2010	attilio	KTR_CTx are long time aliased by existing classes so they can't serve their purpose anymore. Axe them out. Sponsored by: Sandvine Incorporated Discussed with: jhb, emaste Possible MFC: TBD
# 209048	11-Jun-2010	alc	Relax one of the new assertions in pmap_enter() a little. Specifically, allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
# 208990	10-Jun-2010	alc	Reduce the scope of the page queues lock and the number of PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea_page_exits_quick() and moea_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea_clear_bit(), simplifying moea_clear_modify() and moea_clear_reference(). The last parameter to moea_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@
# 208846	05-Jun-2010	alc	Don't set PG_WRITEABLE in pmap_enter() unless the page is managed. Correct a typo in a nearby comment on sparc64.
# 208574	26-May-2010	alc	Push down page queues lock acquisition in pmap_enter_object() and pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea_is_modified() and moea_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
# 208504	24-May-2010	alc	Roughly half of a typical pmap_mincore() implementation is machine- independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
# 208175	16-May-2010	alc	On entry to pmap_enter(), assert that the page is busy. While I'm here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
# 207796	08-May-2010	alc	Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.
# 207702	06-May-2010	alc	Push down the page queues lock inside of vm_page_free_toq() and pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues. Update vm_page_unhold() to reflect the above change.
# 207649	05-May-2010	alc	Use an OBJT_PHYS object and thus PG_UNMANAGED pages to implement the TSB. The TSB is not a pageable structure, so there is no point in using managed pages. Reviewed by: kib
# 207537	02-May-2010	marius	Add support for SPARC64 V (and where it already makes sense for other HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the MMU and cache handling, it doesn't add pmap optimizations possible with these CPU, yet, though. With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250 and likely also other models based on SPARC64 V like 450, 650 and 850. Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.
# 207410	29-Apr-2010	kmacy	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
# 207373	29-Apr-2010	alc	MFamd64/i386 r207205 Clearing a page table entry's accessed bit and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it. Moreover, on ia64, don't set the page's dirty field unless pmap_protect() is removing write access.
# 207155	24-Apr-2010	alc	Resurrect pmap_is_referenced() and use it in mincore(). Essentially, pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
# 205399	20-Mar-2010	marius	Improve the KVA space sizing of 186682; on machines with large dTLBs we can actually use all of the available lockable entries of the tiny dTLB for the kernel TSB. With this change the KVA space sizing happens to be more in line with the MI one so up to at least 24GB machines KVA doesn't need to be limited manually. This is just another stopgap though, the real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs providing it so we don't need to lock the kernel TSB pages into the dTLB in the first place.
# 205258	17-Mar-2010	marius	- Add TTE and context register bits for the additional page sizes supported by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs. - Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK which just is the complement of TLB_CXR_CTX_MASK instead of trying to assemble it from the page size bits which vary across CPUs. - Add macros for the remainder of the SFSR bits, which are useful for at least debugging purposes.
# 204152	20-Feb-2010	marius	Some machines can not only consist of CPUs running at different speeds but also of different types, f.e. Sun Fire V890 can be equipped with a mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization and different workarounds for model specific errata. Therefore move the CPU implementation number from a global variable to the per-CPU data. Functions which are called before the latter is available are passed the implementation number as a parameter now.
# 203839	13-Feb-2010	marius	Style fixes
# 198341	21-Oct-2009	marcel	o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache(), that translates the vm_map_t argumument to pmap_t. o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys(). o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint. o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked). The key property of this change is that the I-cache is made coherent after writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent before any writes happen. The difference is key when the I-cache prefetches.
# 195840	24-Jul-2009	jhb	Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
# 195149	28-Jun-2009	marius	- Work around the broken loader behavior of not demapping no longer used kernel TLB slots when unloading the kernel or modules, which results in havoc when loading a kernel and modules which take up less TLB slots afterwards as the unused but locked ones aren't accounted for in virtual_avail. Eventually this should be fixed in the loader which isn't straight forward though and the kernel should be robust against this anyway. [1] - Ensure that the addresses allocated directly from phys_avail[] by pmap_bootstrap_alloc() are always colored properly. This implicit assumption was broken in r194784 as unlike the other consumers the DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE * DCACHE_COLORS. [2] - Remove the no longer used global msgbuf_phys. - Remove the redundant ekva parameter of pmap_bootstrap_alloc(). - Correct some outdated function names in ktr(9) invocations. Requested by: jhb [1] Reported by: gavin [2] Approved by: re (kib) MFC after: 2 weeks
# 194858	24-Jun-2009	kib	Unbreak sparc64 after the swap accounting changes: mark kernel_map entries allocated for translations in pmap_init() as MAP_NOFAULT. This prevents vm_map_insert from trying to account the entries for swap usage, that is both wrong and too early to work. While there, change FALSE to VMFS_NO_SPACE. Reported and tested by: Florian Smeets <flo at kasimir com> Reviewed by: marius
# 194784	23-Jun-2009	jeff	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
# 186682	01-Jan-2009	marius	- Currently the PMAP code is laid out to let the kernel TSB cover the whole KVA space using one locked 4MB dTLB entry per GB of physical memory. On Cheetah-class machines only the dt16 can hold locked entries though, which would be completely consumed for the kernel TSB on machines with >= 16GB. Therefore limit the KVA space to use no more than half of the lockable dTLB slots, given that we need them also for other things. - Add sanity checks which ensure that we don't exhaust the (lockable) TLB slots.
# 182878	08-Sep-2008	marius	For cheetah-class CPUs ensure that the dt512_0 is set to hold 8k pages for all three contexts and configure the dt512_1 to hold 4MB pages for them (e.g. for direct mappings). This might allow for additional optimization by using the faulting page sizes provided by AA_DMMU_TAG_ACCESS_EXT for bypassing the page size walker for the dt512 in the superpage support code. Submitted by: nwhitehorn (initial patch)
# 182877	08-Sep-2008	marius	USIII and beyond CPUs have stricter requirements when it comes to synchronization needed after stores to internal ASIs in order to make side-effects visible. This mainly requires the MEMBAR #Sync after such stores to be replaced with a FLUSH. We use KERNBASE as the address to FLUSH as it is guaranteed to not trap. Actually, the USII synchronization rules also already require a FLUSH in pretty much all of the cases changed. We're also hitting an additional USIII synchronization rule which requires stores to AA_IMMU_SFSR to be immediately followed by a DONE, FLUSH or RETRY. Doing so triggers a RED state exception though so leave the MEMBAR #Sync. Linux apparently also has gotten away with doing the same for quite some time now, apart from the fact that it's not clear to me why we need to clear the valid bit from the SFSR in the first place. Reviewed by: nwhitehorn
# 182767	04-Sep-2008	marius	The physical address space of cheetah-class CPUs has been extended to 43 bits so update TD_PA_BITS accordingly. For the most part this increase is transparent to the existing code except for when reading the physical address from ASI_{D,I}TLB_DATA_ACCESS_REG, which we only do in the loader and which was already adjusted in r182478, or from the OFW translations node. While at it, ensure we are only taking valid OFW mapping entries into account.
# 181701	13-Aug-2008	marius	cosmetic changes and style fixes
# 179081	18-May-2008	alc	Retire pmap_addr_hint(). It is no longer used.
# 178893	09-May-2008	alc	Add a stub for pmap_align_superpage() on machines that don't (yet) implement pmap-level support for superpages.
# 176994	09-Mar-2008	marius	- Do as the comment in pmap_bootstrap() suggests and flush all non-locked TLB entries possibly left over by the firmware and also do so while bootstrapping APs. - Use __FBSDID. MFC after: 1 month
# 175067	03-Jan-2008	alc	Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
# 174933	27-Dec-2007	alc	Update two tracepoints, i.e., CTRx() invocations, to reflect the demise of page coloring a few months ago.
# 173708	17-Nov-2007	alc	Prevent the leakage of wired pages in the following circumstances: First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks
# 173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 172466	07-Oct-2007	alc	Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794
# 171488	18-Jul-2007	jeff	- Remove the global definition of sched_lock in mutex.h to break new code and third party modules which try to depend on it. - Initialize sched_lock in sched_4bsd.c. - Declare sched_lock in sparc64 pmap.c and assert that we're compiling with SCHED_4BSD to prevent accidental crashes from running ULE. This is the sole remaining file outside of the scheduler that uses the global sched_lock. Approved by: re
# 170249	03-Jun-2007	alc	Prepare for the new physical memory allocator: Change the way that the physical page's color is obtained. Approved by: re
# 170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 169805	20-May-2007	jeff	- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. Suggested by: julian@ Contributed by: attilio@
# 169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 164229	12-Nov-2006	alc	Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller. (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)
# 162544	22-Sep-2006	alc	The fix in revision 1.152 converted in the wrong direction. Fix a typo in a comment. Submitted by: Michael Plass
# 161028	06-Aug-2006	alc	Eliminate the unnecessary acquisition and release of the page queues lock from pmap_pinit().
# 160889	01-Aug-2006	alc	Complete the transition from pmap_page_protect() to pmap_remove_write(). Originally, I had adopted sparc64's name, pmap_clear_write(), for the function that is now pmap_remove_write(). However, this function is more like pmap_remove_all() than like pmap_clear_modify() or pmap_clear_reference(), hence, the name change. The higher-level rationale behind this change is described in src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm trying to clean up and fix our support for execute access. Reviewed by: marcel@ (ia64)
# 159627	14-Jun-2006	ups	Remove mpte optimization from pmap_enter_quick(). There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386. Reviewed by: alc,jhb,peter,ps
# 159303	05-Jun-2006	alc	Introduce the function pmap_enter_object(). It maps a sequence of resident pages from the same object. Use it in vm_map_pmap_enter() to reduce the locking overhead of premapping objects. Reviewed by: tegge@
# 159031	29-May-2006	alc	MFalpha/amd64/arm/ia64 Retire pmap_track_modified(). We no longer need it because we do not create managed mappings within the clean submap. To prevent regressions, add assertions blocking the creation of managed mappings within the clean submap.
# 157443	03-Apr-2006	peter	Remove the unused sva and eva arguments from pmap_remove_pages().
# 153958	01-Jan-2006	scottl	Use the correct units when handling the hw.physmem tunable.
# 152630	20-Nov-2005	alc	Eliminate pmap_init2(). It's no longer used.
# 152224	09-Nov-2005	alc	Reimplement the reclamation of PV entries. Specifically, perform reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations. Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away. Reviewed by: tegge
# 149768	03-Sep-2005	alc	Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine whether the mapping should permit execute access.
# 147217	10-Jun-2005	alc	Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time. Remove code from pmap_init() for initializing the vm_page's machine-dependent fields. Remove stale comments from pmap_init(). Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries. Tested by: cognet, kensmith, marcel
# 146726	28-May-2005	alc	pmap_enter() no longer requires Giant. Therefore, stop acquiring and releasing it in pmap_enter_quick(). MFC after: 3 weeks
# 142869	01-Mar-2005	alc	Use the kernel pmap's lock to guarantee that only one thread at a time is using either pmap_temp_map_1 or pmap_temp_map_2. Tested by: kris@
# 141370	05-Feb-2005	alc	Acquire the source pmap's lock in pmap_copy().
# 139825	07-Jan-2005	imp	/* -> /*- for license, minor formatting changes
# 139241	23-Dec-2004	alc	Modify pmap_enter_quick() so that it expects the page queues to be locked on entry and it assumes the responsibility for releasing the page queues lock if it must sleep. Remove a bogus comment from pmap_enter_quick(). Using the first change, modify vm_map_pmap_enter() so that the page queues lock is acquired and released once, rather than each time that a page is mapped.
# 138897	15-Dec-2004	alc	In the common case, pmap_enter_quick() completes without sleeping. In such cases, the busying of the page and the unlocking of the containing object by vm_map_pmap_enter() and vm_fault_prefault() is unnecessary overhead. To eliminate this overhead, this change modifies pmap_enter_quick() so that it expects the object to be locked on entry and it assumes the responsibility for busying the page and unlocking the object if it must sleep. Note: alpha, amd64, i386 and ia64 are the only implementations optimized by this change; arm, powerpc, and sparc64 still conservatively busy the page and unlock the object within every pmap_enter_quick() call. Additionally, this change is the first case where we synchronize access to the page's PG_BUSY flag and busy field using the containing object's lock rather than the global page queues lock. (Modifications to the page's PG_BUSY flag and busy field have asserted both locks for several weeks, enabling an incremental transition.)
# 138697	11-Dec-2004	alc	Pass VM_ALLOC_NOBUSY to vm_page_grab() so that we don't have to call vm_page_flag_clear(PG_BUSY). The object lock is held the entire time. Thus, whether or not the PG_BUSY flag is set is invisible to others.
# 137168	03-Nov-2004	alc	The synchronization provided by vm object locking has eliminated the need for most calls to vm_page_busy(). Specifically, most calls to vm_page_busy() occur immediately prior to a call to vm_page_remove(). In such cases, the containing vm object is locked across both calls. Consequently, the setting of the vm page's PG_BUSY flag is not even visible to other threads that are following the synchronization protocol. This change (1) eliminates the calls to vm_page_busy() that immediately precede a call to vm_page_remove() or functions, such as vm_page_free() and vm_page_rename(), that call it and (2) relaxes the requirement in vm_page_remove() that the vm page's PG_BUSY flag is set. Now, the vm page's PG_BUSY flag is set only when the vm object lock is released while the vm page is still in transition. Typically, this is when it is undergoing I/O.
# 133663	13-Aug-2004	alc	Add pmap locking to pmap_remove_all().
# 133451	10-Aug-2004	alc	Add pmap locking to many of the functions. Implement the protection check required by the pmap_extract_and_hold() specification. Remove the acquisition and release of Giant from pmap_extract_and_hold() and pmap_protect(). Many thanks to Ken Smith for resolving a sparc64-specific initialization problem in my original patch. Tested by: kensmith@
# 133143	04-Aug-2004	alc	- Push down the acquisition and release of Giant into pmap_enter_quick() on those architectures without pmap locking. - Eliminate the acquisition and release of Giant in vm_map_pmap_enter().
# 132899	30-Jul-2004	alc	- Push down the acquisition and release of Giant into pmap_protect() on those architectures without pmap locking. - Eliminate the acquisition and release of Giant from vm_map_protect(). (Translation: mprotect(2) runs to completion without touching Giant on alpha, amd64, i386 and ia64.)
# 132575	23-Jul-2004	alc	Use kmem_alloc_nofault() rather than kmem_alloc_pageable() for allocating KVA for explicitly managed mappings, i.e., mappings created with pmap_qenter().
# 132220	15-Jul-2004	alc	Push down the acquisition and release of the page queues lock into pmap_protect() and pmap_remove(). In general, they require the lock in order to modify a page's pv list or flags. In some cases, however, pmap_protect() can avoid acquiring the lock.
# 129749	26-May-2004	tmm	Move the per-CPU vmspace pointer fixup that is required before a struct vmspace is freed from cpu_sched_exit() to pmap_release(). This has the advantage of being able to rely on MI code to decide when a free should occur, instead of having to inspect the reference count ourselves. At the same time, turn the per-CPU vmspace pointer into a pmap pointer, so that pmap_release() can deal with pmaps exclusively. Reviewed (and embrassing bug spotted) by: jake
# 129068	09-May-2004	alc	Correct the implementation of pmap_page_is_mapped(): It should return TRUE only if the page has one or more managed mappings.
# 129053	08-May-2004	alc	Since revision 1.280 of vm/vm_page.c, vm_page_grab() always returns a zeroed page when passed VM_ALLOC_ZERO. Thus, we can eliminate the check against PG_ZERO from pmap_pinit().
# 128103	11-Apr-2004	alc	Remove avail_end. It is not used.
# 127875	05-Apr-2004	alc	Remove avail_start on those platforms that no longer use it. (Only amd64 does anything with it beyond simple initialization.)
# 127869	04-Apr-2004	alc	Remove unused arguments from pmap_init().
# 126728	07-Mar-2004	alc	Retire pmap_pinit2(). Alpha was the last platform that used it. However, ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps, there has been no reason for having this function on Alpha. Briefly, when pmap_growkernel() relied upon the list of all processes to find and update the various pmaps to reflect a growth in the kernel's valid address space, pmap_init2() served to avoid a race between pmap initialization and pmap_growkernel(). Specifically, pmap_pinit2() was responsible for initializing the kernel portions of the pmap and pmap_pinit2() was called after the process structure contained a pointer to the new pmap for use by pmap_growkernel(). Thus, an update to the kernel's address space might be applied to the new pmap unnecessarily, but an update would never be lost.
# 120722	03-Oct-2003	alc	Migrate pmap_prefault() into the machine-independent virtual memory layer. A small helper function pmap_is_prefaultable() is added. This function encapsulate the few lines of pmap_prefault() that actually vary from machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have much in common. Going forward, it's worth considering their merger.
# 120534	27-Sep-2003	alc	Add vm object locking to pmap_release().
# 120287	20-Sep-2003	jake	Remove an invalid KASSERT. Apparently pmap_remove_all gets called on unmanaged pages.
# 119999	12-Sep-2003	alc	Add a new parameter to pmap_extract_and_hold() that is needed to eliminate Giant from vmapbuf(). Idea from: tegge
# 119869	08-Sep-2003	alc	Introduce a new pmap function, pmap_extract_and_hold(). This function atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe. Reviewed by: tegge
# 119164	20-Aug-2003	alc	Lock the pmap's tsb object when performing vm_page_grab() on it.
# 118239	30-Jul-2003	peter	Deal with 'options KSTACK_PAGES' being a global option.
# 118217	30-Jul-2003	tmm	Return 1 from pmap_protect_tte() instead of 0. When used with tsb_foreach(), 0 signals to terminate the tsb traversal, so when tsb_foreach() was used in pmap_protect() (which only happens when the area to be protected is larger than PMAP_TSB_THRESH = 16MB), only the first tsb entry in the specified range would be protected. Reported by: Andrew Belashov <bel@orel.ru>
# 117294	06-Jul-2003	alc	MFi386 Updates to cnt.v_wire_count, the global count of wired pages, should be performed using atomic ops.
# 117206	03-Jul-2003	alc	Background: pmap_object_init_pt() premaps the pages of a object in order to avoid the overhead of later page faults. In general, it implements two cases: one for vnode-backed objects and one for device-backed objects. Only the device-backed case is really machine-dependent, belonging in the pmap. This commit moves the vnode-backed case into the (relatively) new function vm_map_pmap_enter(). On amd64 and i386, this commit only amounts to code rearrangement. On alpha and ia64, the new machine independent (MI) implementation of the vnode case is smaller and more efficient than their pmap-based implementations. (The MI implementation takes advantage of the fact that objects in -CURRENT are ordered collections of pages.) On sparc64, pmap_object_init_pt() hadn't (yet) been implemented.
# 117045	29-Jun-2003	alc	- Export pmap_enter_quick() to the MI VM. This will permit the implementation of a largely MI pmap_object_init_pt() for vnode-backed objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64 and powerpc. - Correct a mismatch between pmap_object_init_pt()'s prototype and its various implementations. (I plan to keep pmap_object_init_pt() as the MD hook for device-backed objects on i386 and amd64.) - Correct an error in ia64's pmap_enter_quick() and adjust its interface to match the other versions. Discussed with: marcel
# 116543	18-Jun-2003	jake	Ignore fake ttes in pmap_copy, its too hard to deal with them not having a real vm_page right now. This fixes a panic when processes with resident device mappings fork, such as the X server.
# 116508	17-Jun-2003	jake	Handle recursion on the vm_page_queue_mtx manually in pmap_qenter and pmap_qremove, in order to avoid making the mutex recursable. Discussed with: alc
# 116420	15-Jun-2003	jake	The page queue lock is already held in pmap_remove, change acquire/release to assertion of ownership. Serves me right for not booting a witness kernel.
# 116417	15-Jun-2003	jake	- Mirror vm_page_queue_mtx assertions added to the i386 pmap. - Add vm page queue locking in certain places that are only needed on sparc64. This should make pmap_qenter and pmap_qremove MP-safe. Discussed with: alc
# 116355	14-Jun-2003	alc	Migrate the thread stack management functions from the machine-dependent to the machine-independent parts of the VM. At the same time, this introduces vm object locking for the non-i386 platforms. Two details: 1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The different machine-dependent implementations used various combinations of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set KSTACK_GUARD_PAGES to 0. 2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In 5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed to vm_page_alloc() or vm_page_grab().
# 116328	14-Jun-2003	alc	Move the _new_altkstack() and _dispose_altkstack() functions out of the various pmap implementations into the machine-independent vm. They were all identical.
# 113453	13-Apr-2003	jake	- Move the routine for flushing all user mappings from the tlb from pmap to the cpu dependent files. It will need to be done differently for USIII. - Simplify the logic for detecting context rollovers. Instead of dealing with it when the next context switch would cause the context numbers to rollover, deal with it when they actually do rollover. - Move some things around in cpu_switch so that we only do 1 membar #Sync when switching address space, instead of 2. - Detect kernel threads by comparing the new vm space to vmspace0, instead if checking if the tlb context is 0. - Removed some debug code.
# 113238	08-Apr-2003	jake	Use vm_paddr_t for physical addresses.
# 113172	06-Apr-2003	jake	Remove a largely useless statistic (its kept elsewhere too).
# 113166	06-Apr-2003	jake	Use the vis block copy/zero functions for pmap_copy_page and pmap_zero_page. These are called through function pointers so that different implementations can be provided for cheetah, where the block load instructions may or may not be a win, and so they can be disabled with the machdep.use_vis tunable. In terms of raw bandwidth the integer versions are faster, but not allocating lines in the L2 cache for useless data gives a measurable improvement in user time for the benchmarks I tested (mostly buildworld with -j8). As far as I can tell the instructions used are implemented on everything back to UltraSPARC I, so there should not be a problem with different cpu types.
# 113165	06-Apr-2003	jake	Ignore attempts to pmap_kremove or pmap_qremove pages which do not have a valid mapping. This is bug for bug compatible with other platforms.
# 112879	31-Mar-2003	jake	- Allow the physical memory size that will be actually used by the kernel to be overridden by setting hw.physmem. - Fix a vm_map_find arg, we don't want to find space. - Add tracing and statistics for off colored pages. - Detect "stupid" pmap_kenters (same virtual and physical as existing mapping), and do nothing in that case.
# 112697	27-Mar-2003	jake	Handle the fictitious pages created by the device pager. For fictitious pages which represent actual physical memory we must strip off the fake page in order to allow illegal aliases to be detected. Otherwise we map uncacheable in the virtual and physical caches and set the side effect bit, as is required for mapping device memory. This fixes gstat on sparc64, which wants to mmap kernel memory through a character device.
# 112330	17-Mar-2003	jake	Ensure that kstack0 has physical colour equal to virtual colour, so that illegal aliases will not be created in the data cache if its accessed through another such mapping.
# 111462	25-Feb-2003	mux	Cleanup of the d_mmap_t interface. - Get rid of the useless atop() / pmap_phys_address() detour. The device mmap handlers must now give back the physical address without atop()'ing it. - Don't borrow the physical address of the mapping in the returned int. Now we properly pass a vm_offset_t * and expect it to be filled by the mmap handler when the mapping was successful. The mmap handler must now return 0 when successful, any other value is considered as an error. Previously, returning -1 was the only way to fail. This change thus accidentally fixes some devices which were bogusly returning errno constants which would have been considered as addresses by the device pager. - Garbage collect the poorly named pmap_phys_address() now that it's no longer used. - Convert all the d_mmap_t consumers to the new API. I'm still not sure wheter we need a __FreeBSD_version bump for this, since and we didn't guarantee API/ABI stability until 5.1-RELEASE. Discussed with: alc, phk, jake Reviewed by: peter Compile-tested on: LINT (i386), GENERIC (alpha and sparc64) Runtime-tested on: i386
# 108700	05-Jan-2003	jake	- Reorganize PMAP_STATS to scale a little better. - Add some more stats for things that are now considered interesting.
# 108360	28-Dec-2002	alc	Hold the page queues lock around calls to vm_page_flag_clear() and vm_page_wakeup().
# 108344	28-Dec-2002	alc	Use VM_ALLOC_WIRED in pmap_pinit().
# 108301	26-Dec-2002	jake	- Use direct mapped addresses for the message buffer, for the crash dump mappings, and for pmap_map which is used to map the vm_page structures. - Don't allocate kva space for any of the above.
# 108245	23-Dec-2002	jake	- Change the way the direct mapped region is implemented to be generally useful for accessing more than 1 page of contiguous physical memory, and to use 4mb tlb entries instead of 8k. This requires that the system only use the direct mapped addresses when they have the same virtual colour as all other mappings of the same page, instead of being able to choose the colour and cachability of the mapping. - Adapt the physical page copying and zeroing functions to account for not being able to choose the colour or cachability of the direct mapped address. This adds a lot more cases to handle. Basically when a page has a different colour than its direct mapped address we have a choice between bypassing the data cache and using physical addresses directly, which requires a cache flush, or mapping it at the right colour, which requires a tlb flush. For now we choose to map the page and do the tlb flush. This will allows the direct mapped addresses to be used for more things that don't require normal pmap handling, including mapping the vm_page structures, the message buffer, temporary mappings for crash dumps, and will provide greater benefit for implementing uma_small_alloc, due to the much greater tlb coverage.
# 108193	22-Dec-2002	jake	- Rearrange pmap_bootstrap slightly to be more in dependency order. - Put the kernel tsb before before the kernel load address, below VM_MIN_KERNEL_ADDRESS, instead of after the kernel where it consumes usable kva. This is magic mapped so the virtual address is irrelevant, it just needs to be out of the way.
# 108166	21-Dec-2002	jake	- Add a pmap pointer to struct md_page, and use this to find the pmap that a mapping belongs to by setting it in the vm_page_t structure that backs the tsb page that the tte for a mapping is in. This allows the pmap that a mapping belongs to to be found without keeping a pointer to it in the tte itself. - Remove the pmap pointer from struct tte and use the space to make the tte pv lists doubly linked (TAILQs), like on other architectures. This makes entering or removing a mapping O(1) instead of O(n) where n is the number of pmaps a page is mapped by (including kernel_pmap). - Use atomic ops for setting and clearing bits in the ttes, now that they return the old value and can be easily used for this purpose. - Use __builtin_memset for zeroing ttes instead of bzero, so that gcc will inline it (4 inline stores using %g0 instead of a function call). - Initially set the virtual colour for all the vm_page_ts to be equal to their physical colour. This will be more useful once uma_small_alloc is implemented, but basically pages with virtual colour equal to phsyical colour are easier to handle at the pmap level because they can be safely accessed through cachable direct virtual to physical mappings with that colour, without fear of causing illegal dcache aliases. In total these changes give a minor performance improvement, about 1% reduction in system time during buildworld.
# 108157	21-Dec-2002	jake	Make pmap_qenter and pmap_qremove look more like the other pmaps.
# 108155	21-Dec-2002	jake	Removed unused pmap_qenter_flags.
# 108140	20-Dec-2002	jake	Add page queue locking around functions that call vm_page_flag_set. This fixes a failed assertion early in boot on sparc64. Reported by: Roderick van Domburg <r.s.a.vandomburg@student.utwente.nl>
# 106994	16-Nov-2002	jake	MFi386 r1.369. Clear the PG_WRITEABLE flag in pmap_clear_write; return immediately if its already clear. Suggested by: alc
# 106838	13-Nov-2002	alc	Move pmap_collect() out of the machine-dependent code, rename it to reflect its new location, and add page queue and flag locking. Notes: (1) alpha, i386, and ia64 had identical implementations of pmap_collect() in terms of machine-independent interfaces; (2) sparc64 doesn't require it; (3) powerpc had it as a TODO.
# 105531	20-Oct-2002	tmm	Add kernel dump support, based on the ia64 version (which was committed as sparc64/sparc64/dump_machdep.c a while back). Other than ia64 (which uses ELF), sparc64 uses a homegrown format for the dumps (headers are required because the physical address and size of the tsb must be noted, and because physical memory may be discontiguous); ELF would not offer any advantages here. Reviewed by: jake
# 105501	20-Oct-2002	alc	- Lock page queue accesses in pmap_release().
# 105346	17-Oct-2002	tmm	When entering the firmware mappings into the kernel tlb, clear all 'soft' bits that might be set in the firmware tte data field, and set the soft flag TD_EXEC to mark the page executable. Failing to do the latter would cause fatal instruction faults in the prom in certain situations. Reviewed by: jake
# 104354	02-Oct-2002	scottl	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb
# 104271	01-Oct-2002	jake	Get rid of the TODO macro in the few places that still need work; either comment it out or change to explicit panics. It conflicts with things like #if TODO in drivers.
# 102399	25-Aug-2002	alc	o Retire pmap_pageable(). It's an advisory routine that none of our platforms implements.
# 102040	18-Aug-2002	jake	Add pmap support for user mappings of multiple page sizes (super pages). This supports all hardware page sizes (8K, 64K, 512K, 4MB), but only 8k pages are actually used as of yet.
# 101958	15-Aug-2002	jake	Use symbolic constants instead of magic address constants.
# 101956	15-Aug-2002	jake	Removed unneeded pmap_initialized flag.
# 101898	15-Aug-2002	jake	Store the number of itlb and dtlb entries separately; they may be different. Find the prom node for the boot cpu earlier and store it in the per-cpu area, so that cache_init can be called earlier.
# 101870	14-Aug-2002	jake	Set kernel_vm_end. Panic if we try to grow the kernel.
# 101653	10-Aug-2002	jake	Auto size available kernel virtual address space based on phsyical memory size. This avoids blowing out kva in kmeminit() on large memory machines (4 gigs or more). Reviewed by: tmm
# 101642	10-Aug-2002	alc	o Remove the setting and clearing of the PG_MAPPED flag. (This flag is obsolete.)
# 101346	04-Aug-2002	alc	o Don't set PG_MAPPED or PG_WRITEABLE when a page is mapped using pmap_kenter() or pmap_qenter(). o Use VM_ALLOC_WIRED in pmap_new_thread().
# 101121	31-Jul-2002	jake	Modify the cache handling code to assume 2 virtual colours, which is much simpler and easier to get right. Add comments. Add more statistic gathering on cacheable and uncacheable mappings.
# 100830	28-Jul-2002	jake	Fix a bug introduced in previous commit. Due to the interaction of the direct physical mappings with virtual page colour, we need to flush the data cache when a page changes colour. I missed one case which broke pipes.
# 100771	27-Jul-2002	jake	Implement a direct mapped address region, like alpha and ia64. This basically maps all of physical memory 1:1 to a range of virtual addresses outside of normal kva. The advantage of doing this instead of accessing phsyical addresses directly is that memory accesses will go through the data cache, and will participate in the normal cache coherency algorithm for invalidating lines in our own and in other cpus' data caches. So we don't have to flush the cache manually or send IPIs to do so on other cpus. Also, since the mappings never change, we don't have to flush them from the tlb manually. This makes pmap_copy_page and pmap_zero_page MP safe, allowing the idle zero proc to run outside of giant. Inspired by: ia64
# 100718	26-Jul-2002	jake	Remove the tlb argument to tlb_page_demap (itlb or dtlb), in order to better match the pmap_invalidate api.
# 99999	14-Jul-2002	alc	o Lock page queue accesses by vm_page_wire().
# 99927	13-Jul-2002	alc	o Complete the locking of page queue accesses by vm_page_unwire(). o Assert that the page queues lock is held in vm_page_unwire(). o Make vm_page_lock_queues() and vm_page_unlock_queues() visible to kernel loadable modules.
# 99571	08-Jul-2002	peter	Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed. We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86. Use Luigi's hack for preserving the (lack of) priority. Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.
# 99559	07-Jul-2002	peter	Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/ pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code and use a single equivalent MI version. There are other cleanups needed still. While here, use the UMA zone hooks to keep a cache of preinitialized proc structures handy, just like the thread system does. This eliminates one dependency on 'struct proc' being persistent even after being freed. There are some comments about things that can be factored out into ctor/dtor functions if it is worth it. For now they are mostly just doing statistics to get a feel of how it is working.
# 99556	07-Jul-2002	peter	Fix (s/proc/thread/) some typos in two panic messages.
# 99422	04-Jul-2002	peter	Back out proc part of last commit. UMA manages the thread cache only, and we just have to deal with the kstack when told to. We do not have a UMA-managed cache for the proc struct and its associated upage yet. So, go back to the old lazy mechanism. Note that if UMA destroys pages that used to contain proc structures, we'll lose the corresponding upage forever. (zones never did this - once a page was allocated, it stayed attached to the proc zone forever)
# 99420	04-Jul-2002	peter	Take a shot at implementing changes from i386/pmap.c rev 1.328-1.331.
# 99025	29-Jun-2002	jake	Fix a deletion during traversal tailq bug.
# 98813	25-Jun-2002	jake	pmap_kremove can no longer be used to remove the magic device mappings installed with pmap_kenter_flags, since the physical addresses may not have an associated vm_page. Add a function to do this. Tested by: Tomi Vainio <Tomi.Vainio@Sun.COM>
# 98651	22-Jun-2002	jake	Fix a bug related to marking pages virtually uncacheable due to illegal dcache aliasing. A page that already had more than 1 mapping of the same virtual colour would not be correctly uncached. Noticed by: Artur Grabowski <art@openbsd.org>
# 98350	17-Jun-2002	jake	Add constants for the min and max prom addresses. Use these instead of magic numbers. Use stxa_sync instead of stxa; membar #Sync; to ensure that no instruction is placed between the two. This can cause random corruption even though interrupts are already disabled.
# 98031	08-Jun-2002	jake	Fix bizarre SMP problems. The secondary cpus sometimes start up with junk in their tlb which the prom doesn't clear out, so we have to do so manually before mapping the kernel page table or the cpu can hang due various conditions which cause undefined behaviour from the tlb.
# 97449	29-May-2002	jake	Add an MD page flag for tracking if a page is cacheable or not, so that we don't flush all mappings of a physical page in order to make it virtually cachable again, if it is already cachable.
# 97447	29-May-2002	jake	Merge the code in pv.c into pmap.c directly. Place all page mappings onto the pv lists in the vm_page, even unmanaged kernel mappings. This is so that the virtual cachability of these mappings can be tracked when a page is mapped to more than one virtual address. All virtually cachable mappings of a physical page must have the same virtual colour, or illegal alises can be created in the data cache. This is a bit tricky because we still have to recognize managed and unmanaged mappings, even though they are all on the pv lists.
# 97446	29-May-2002	jake	Add pv list linkage and a pmap pointer to struct tte. Remove separately allocated pv entries and use the linkage in the tte for pv operations.
# 97445	29-May-2002	jake	Use a contrived 'tlb_entry' structure for passing the mappings for the kernel text and data from the loader to the kernel, so that the tte format is not part of the loader->kernel ABI.
# 97444	29-May-2002	jake	Remove pmap.pm_pvlist and make the functions that use it no-ops. These are all optimizations for architectures which have large sparse page tables, and/or can't put the pv linkage inside of the page table entries.
# 97030	21-May-2002	jake	Rewrite pmap_enter to avoid copying ttes in all cases. Pass the tte data to tsb_tte_enter instead of a whole tte, also to avoid copying.
# 97027	20-May-2002	jake	Redefine the tte accessor macros to take a pointer to a tte, instead of the value of the tag or data field. Add macros for getting the page shift, size and mask for the physical page that a tte maps (which may be one of several sizes). Use the new cache functions for invalidating single pages.
# 95710	29-Apr-2002	peter	Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)
# 95232	21-Apr-2002	jake	Avoid using pmap_kenter "early", since it may need to dink with vm_page structures, which may not be setup yet. Minor cleanups.
# 95133	20-Apr-2002	jake	Fix off by one errors in cache flush calls (mostly harmless).
# 94777	15-Apr-2002	peter	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibilitly.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
# 93943	06-Apr-2002	jake	Remove invalid KASSERTS.
# 93687	02-Apr-2002	tmm	Fix crashes that would happen when more than one 4MB page was used to hold the kernel text, data and loader metadata by not using a fixed slot to store the TSB page(s) into. Enter fake 8k page entries into the kernel TSB that cover the 4M kernel page(s), sot that pmap_kenter() will work without having to treat these pages as a special case. Problem reported by: mjacob, obrien Problem spotted and 4M page handling proposed by: jake
# 92850	21-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API. Reviewed by: jake
# 92654	19-Mar-2002	jeff	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 92465	16-Mar-2002	jake	Don't demap the requested page from the tlb in pmap_kenter or pmap_kremove, even on the local cpu. These are no longer used unsafely in MI code, and the MD code has been adjusted to compensate.
# 92464	16-Mar-2002	jake	Fix a problem where kernel text could become unmapped when clearing out all the user mappings from the tlb due to the context numbers rolling over. The store to the internal mmu register must be followed by a membar #Sync before much else happens to "avoid data corruption", so we use special inlines which both disable interrupts and ensure that the compiler will not insert extra instructions between the two. Also, load the tte tag and check if the context is nucleus context, rather than relying on the priviledged bit which doesn't actually serve any purpose in our design, and check the lock bit too for sanity.
# 92463	16-Mar-2002	jake	Use the tlb data access register to map the kernel tsb, rather than the data in register. The latter uses the random replacment algorithm to pick the slot, we want a specific slot.
# 91783	07-Mar-2002	jake	Implement delivery of tlb shootdown ipis. This is currently more fine grained than the other implementations; we have complete control over the tlb, so we only demap specific pages. We take advantage of the ranged tlb flush api to send one ipi for a range of pages, and due to the pm_active optimization we rarely send ipis for demaps from user pmaps. Remove now unused routines to load the tlb; this is only done once outside of the tlb fault handlers. Minor cleanups to the smp startup code. This boots multi user with both cpus active on a dual ultra 60 and on a dual ultra 2.
# 91782	07-Mar-2002	jake	Modify the tlb demap API to take a pmap instead of a tlb context number. Due to allocating tlb contexts on the fly, we only ever need to demap the primary context, non-primary contexts have already been implicitly flushed by context switching. All we really need to tell is if its a kernel demap or not, and its easier just to compare against the kernel_pmap which is a constant.
# 91781	07-Mar-2002	jake	Implement kthread context stealing. This is a bit of a misnomer because the context is not actually stolen, as it would be for i386. Instead of deactivating a user vmspace immediately when switching out, and recycling its tlb context, wait until the next context switch to a different user vmspace. In this way we can switch from a user process to any number of kernel threads and back to the same user process again, without losing any of its mappings in the tlb that would not already be knocked by the automatic replacement algorithm. This is not expected to have a measurable performance improvement on the machines we currently run on, but it sounds cool and makes the sparc64 port SMPng buzz word compliant.
# 91613	04-Mar-2002	jake	Allocate tlb contexts on the fly in cpu_switch, instead of statically 1 to 1 with pmaps. When the context numbers wrap around we flush all user mappings from the tlb. This makes use of the array indexed by cpuid to allow a pmap to have a different context number on a different cpu. If the context numbers are then divided evenly among cpus such that none are shared, we can avoid sending tlb shootdown ipis in an smp system for non-shared pmaps. This also removes a limit of 8192 processes (pmaps) that could be active at any given time due to running out of tlb contexts. Inspired by: the brown book Crucial bugfix from: tmm
# 91471	28-Feb-2002	silby	Fix a minor swap leak. Previously, the UPAGES/KSTACK area of processes/threads would leak memory at the time that a previously swapped process was terminated. Lukcily, the leak was only 12K/proc, so it was unlikely to be a major problem unless you had an undersized swap partition. Submitted by: dillon Reviewed by: silby MFC after: 1 week
# 91403	27-Feb-2002	silby	Fix a horribly suboptimal algorithm in the vm_daemon. In order to determine what to page out, the vm_daemon checks reference bits on all pages belonging to all processes. Unfortunately, the algorithm used reacted badly with shared pages; each shared page would be checked once per process sharing it; this caused an O(N^2) growth of tlb invalidations. The algorithm has been changed so that each page will be checked only 16 times. Prior to this change, a fork/sleepbomb of 1300 processes could cause the vm_daemon to take over 60 seconds to complete, effectively freezing the system for that time period. With this change in place, the vm_daemon completes in less than a second. Any system with hundreds of processes sharing pages should benefit from this change. Note that the vm_daemon is only run when the system is under extreme memory pressure. It is likely that many people with loaded systems saw no symptoms of this problem until they reached the point where swapping began. Special thanks go to dillon, peter, and Chuck Cranor, who helped me get up to speed with vm internals. PR: 33542, 20393 Reviewed by: dillon MFC after: 1 week
# 91335	26-Feb-2002	jake	Wrap long lines.
# 91288	26-Feb-2002	jake	Convert pmap.pm_context to an array of contexts indexed by cpuid. This doesn't make sense for SMP right now, but it is a means to an end.
# 91286	26-Feb-2002	jake	Pu back a call to pmap_context_destroy which was accidentily removed in the previous commit. Spotted by: tmm
# 91274	26-Feb-2002	jake	Allow the user tsb to span multiple pages. Make the default 2 pages for now until we do some testing to see what's best. This gives a massive reduction in system time for processes with a relatively large working set. The size of the tsb directly affects the rss size that a user process can keep mapped. When it starts to get full replacements occur and the process takes a lot of soft vm faults. Increasing the default from 1 page to 2 gives the following before and after numbers for compiling vfs_bio.c: before: 14.27 real 6.56 user 5.69 sys after: 8.57 real 6.11 user 1.62 sys This should make self hosted builds more tolerable.
# 91257	25-Feb-2002	jake	Remove code to lock the user tsb into the tlb. We can handle faults on it now, as we do for normal wired kernel memory.
# 91224	25-Feb-2002	jake	Modify the tte format to not include the tlb context number and to store the virtual page number in a much more convenient way; all in one piece. This greatly simplifies the comparison for a matching tte, and allows the fault handlers to be much simpler due to not having to load wierd masks. Rewrite the tlb fault handlers to account for the new format. These are also written to allow faults on the user tsb inside of the fault handlers; the kernel fault handler must be aware of this and not clobber the other's registers. The faults do not yet occur due to other support that is needed (and still under my desk). Bug fixes from: tmm
# 91177	23-Feb-2002	jake	Make use of the ranged tlb demap operations where ever possible. Use pmap_qenter and pmap_qremove in preference to pmap_kenter/pmap_kremove. The former maps in multiple pages at a time, and so can do a ranged flush. Don't assume that pmap_kenter and pmap_kremove will flush the tlb, even though they still do. It will not once the MI code is updated to use pmap_qenter and pmap_qremove.
# 91168	23-Feb-2002	jake	Adapt the tsb_foreach interface to take a source and a destination pmap so that it can be used for pmap_copy. Other consumers ignore the second pmap. Add statistics gathering for tsb_foreach. Implement pmap_copy.
# 91166	23-Feb-2002	jake	Remove debug code.
# 91165	23-Feb-2002	jake	Add statistic gathering for various pmap operations. Submitted by: tmm
# 91164	23-Feb-2002	jake	Remove CADDR1 and CADDR2 which are no longer used. On other architectures these are used for copy and zeroing physical pages; we use physical addresses directly.
# 90625	13-Feb-2002	tmm	Calculate physmem before calling init_param2(). Submitted by: jake
# 89038	08-Jan-2002	jake	Update comments about _start, the kernel entry point, to reflect new parameters needed for smp support. If we are not the boot processor, jump to the smp startup code instead. Implement a per-cpu panic stack, which is used for bootstrapping both primary and secondary processors and during faults on the kernel stack. Arrange the per-cpu page like the pcb, with the struct pcpu at the end of the page and the panic stack before it. Use the boot processor's panic stack for calling sparc64_init. Split the code to set preloaded global registers and to map the kernel tsb out into functions, which non-boot processors can call. Allocate the kstack for thread0 dynamically in pmap_bootstrap, and give it a guard page too.
# 89036	08-Jan-2002	jake	Fix qsort callouts used for sorting memory regions during boot. vm_offset_t is unsigned, so we can't use signed arithmetic. Tripped over by: jhb
# 88903	05-Jan-2002	peter	Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).
# 88826	02-Jan-2002	tmm	1. Implement an optimization for pmap_remove() and pmap_protect(): if a substantial fraction of the number of entries of tte's in the tsb would need to be looked up, traverse the tsb instead. This is crucial in some places, e.g. when swapping out a process, where a certain pmap_remove() call would take very long time to complete without this. 2. Implement pmap_qenter_flags(), which will become used later 3. Reactivate the instruction cache flush done when mapping as executable. This is required e.g. when executing files via NFS, but is known to cause problems on UltraSPARC-IIe CPU's. If you have such a CPU, you will need to comment this call out for now. Submitted by: jake (3)
# 88786	01-Jan-2002	jake	Enable virtual caching for kernel pages. When we enabled virtual caching for certain user pages, stores to kernel pages would not update the affected cache lines, which would sometimes cause the wrong data to be returned for loads from kernel pages. This was especially fatal when the addresses affected held the kernel stack pointer, and a random value was loaded into it. Fix a harmless off by one error in a dcache_inval_phys call.
# 88647	29-Dec-2001	jake	Great pmap rewrite to use a much simpler one level tsb of ttes, instead of sttes. Also removes many differences between this and the other pmaps. Reserve the kva space used by the openfirmware translations. Use physical addresses directly in pmap_zero_page and pmap_copy_page, now that we have the cache line shooting support. Add code to track the virtual cachability of mapped pages. The dmmu requires that multiple mappings of the same phsyical address have the save virtual address bits up to a colour boundary. Violating this requires all mappings to be mapped uncacheable. We do not yet handle the case of a badly aliased mapping becoming cachable again. Many crucial bug fixes from: tmm
# 85258	20-Oct-2001	jake	Add missing includes.
# 85241	20-Oct-2001	jake	Parameterize the size of the kernel virtual address space on KVA_PAGES. Don't use a hard coded address constant for the virtual address of the kernel tsb. Allocate kernel virtual address space for the kernel tsb at runtime. Remove unused parameter to pmap_bootstrap. Adapt pmap.c to use KVA_PAGES. Map the message buffer too. Add some traces. Implement pmap_protect.
# 84844	12-Oct-2001	tmm	Add pmap_kenter_flags(), which is used by MD bus code that will be committed soon, add a stub form pmap_kenter_temporary(), and implement pmap_extract() and pmap_kextract().
# 84183	30-Sep-2001	jake	Move the kernel to end of the first 4 gigabytes of address space, so that one 4 meg page can map both the kernel and the openfirmware mappings. Add the openfirmware mappings to the kernel tsb so we can call the firmware on the kernel trap table and access kernel memory normally. Implement pmap_swapout_proc, pmap_swapin_proc, pmap_swapout_thread, pmap_swapin_thread, pmap_activate, pmap_page_exists, and pmap_phys_address.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 82903	03-Sep-2001	jake	Implement pv_bit_count which is used by pmap_ts_referenced. Remove the modified tte bit and add a softwrite bit. Mappings are only writeable if they have been written to, thus in general modify just duplicates the write bit. The softwrite bit makes it easier to distinguish mappings which should be writeable but are not yet modified. Move the exec bit down one, it was being sign extended when used as an immediate operand. Use the lock bit to mean tsb page and remove the tsb bit. These are the only form of locked (tsb) entries we support and we need to conserve bits where possible. Implement pmap_copy_page and pmap_is_modified and friends. Detect mappings that are being being upgraded from read-only to read-write due to copy-on-write and update the write bit appropriately. Make trap_mmu_fault do the right thing for protection faults, which is necessary to implement copy on write correctly. Also handle a bunch more userland trap types and add ktr traces.
# 82633	31-Aug-2001	peter	Converge with i386/alpha/etc pmap.c for pmap_new_proc/pmap_dispose_proc().
# 81895	18-Aug-2001	jake	Increase the size of the phys_avail memory map. Implement pmap_dispose_proc. Turn some more potentially import functions into nops so we can do stuff until they matter.
# 81385	10-Aug-2001	jake	(forced commit, last was too early). Implement pmap_zero_page_area. Make some pmap functions no-ops for now so we can get through exec. Submitted by: tmm
# 81384	10-Aug-2001	jake	Pass a context to tlb_store_slot, use a member(Sync) after setting the secondary context register.
# 81337	09-Aug-2001	obrien	The author isn't a [UC] Regents. Correct the copyright language.
# 81181	06-Aug-2001	jake	Handle managed and unmanaged mapping better. Allocate an vm object for the tsb pages.
# 81135	04-Aug-2001	tmm	Add floating point context switching code for sparc64. Reviewed by: jake
# 81087	02-Aug-2001	jake	Move some code related to managing pv entries from the pmap module to the pv module. It works now that vtophys for sttes works.
# 80709	31-Jul-2001	jake	Flesh out the sparc64 port considerably. This contains: - mostly complete kernel pmap support, and tested but currently turned off userland pmap support - low level assembly language trap, context switching and support code - fully implemented atomic.h and supporting cpufunc.h - some support for kernel debugging with ddb - various header tweaks and filling out of machine dependent structures
# 80708	31-Jul-2001	jake	Add skeleton machine dependent headers and c files for a port of freebsd to a new architecture. This is the base of the sparc64 port, but contains limited machine dependent code, and can be used a base for ports. Included are: - standard machine dependent headers, tweaked for a 64 bit, big endian architecture, including empty versions of all the machine dependent structures - a machine independent atomic.h, which can be used until a port has support for interrupts and the operations really need to be atomic - stub versions of all the machine dependent functions, which panic when called and print out the name of the function that needs to be implemented. functions which are normally in assembly files are not included, but this should reduce the number of different undefined references on the first few compiles from hundreds to 5 or 6 Given minimal startup code and console support it should be trivial to make this compile and run the first few sysinits on almost any architecture. Requested by: alfred, imp, jhb