History log of /freebsd-10.1-release/sys/vm/vm_zeroidle.c
Revision	Date	Author	Comments
# 272461	02-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 254065	07-Aug-2013	kib	Split the pagequeues per NUMA domains, and split pagedaemon process into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread's inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by a single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
# 198854	03-Nov-2009	attilio	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvement aims to avoid further cache misses in scheduler-specific functions which need to keep track of average thread running time, and further locking in places that set this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
# 181239	03-Aug-2008	trhodes	Fill in a few sysctl descriptions. Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com> Approved by: alc
# 178272	17-Apr-2008	jeff	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia
# 177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# 172836	20-Oct-2007	julian	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
# 171445	14-Jul-2007	alc	Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun". Approved by: re (hrs)
# 170816	16-Jun-2007	alc	Enable the new physical memory allocator. This allocator uses a binary buddy system with a twist. First and foremost, this allocator is required to support the implementation of superpages. As a side effect, it enables a more robust implementation of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9) eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...). The twist is that this allocator tries to reduce the number of TLB misses incurred by accesses through a direct map to small, UMA-managed objects and page table pages. Roughly speaking, the physical pages that are allocated for such purposes are clustered together in the physical address space. The performance benefits vary. In the most extreme case, a uniprocessor kernel running on an Opteron, I measured an 18% reduction in system time during a buildworld. This allocator does not implement page coloring. The reason is that superpages have much the same effect. The contiguous physical memory allocation necessary for a superpage is inherently colored. Finally, the one caveat is that this allocator does not effectively support prezeroed pages. I hope this is temporary. On i386, this is a slight pessimization. However, on amd64, the beneficial effects of the direct-map optimization outweigh the ill effects. I speculate that this is true in general of machines with a direct map. Approved by: re
# 170307	04-Jun-2007	jeff	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling synchronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probably, a general approach is not the best solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 166637	11-Feb-2007	alc	Use the free page queue mutex instead of the page queue mutex to synchronize sleeping and waking of the zero idle thread.
# 166508	05-Feb-2007	alc	Change the free page queue lock from a spin mutex to a default (blocking) mutex. With the demise of Alpha support, there is no longer a reason for it to be a spin mutex.
# 166188	23-Jan-2007	jeff	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
# 164936	06-Dec-2006	julian	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still requires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# 163709	26-Oct-2006	jb	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 161492	21-Aug-2006	alc	Add _vm_stats and _vm_stats_misc to the sysctl declarations in sysctl.h and eliminate their declarations from various source files.
# 161489	20-Aug-2006	alc	vm_page_zero_idle()'s return value serves no purpose. Eliminate it.
# 157815	17-Apr-2006	jhb	Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. This avoids a race where the calling thread could read a snapshot of its current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway. The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.
# 153940	31-Dec-2005	netchild	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows trying different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPUs (like we do on AMD and Transmeta CPUs) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
# 150727	29-Sep-2005	jhb	Trim a couple of unneeded includes.
# 141247	04-Feb-2005	ssouhlal	Set the scheduling class of the zeroidle thread to PRI_IDLE. Reviewed by: jhb Approved by: grehan (mentor) MFC after: 1 week
# 137306	06-Nov-2004	phk	Remove dangling variable
# 137268	05-Nov-2004	jhb	- Set the priority of the page zeroing thread using sched_prio() when the thread is created rather than adjusting the priority in the main function. (kthread_create() should probably take the initial priority as an argument.) - Only yield the CPU in the !PREEMPTION case if there are any other runnable threads. Yielding when there isn't anything else better to do just wastes time in pointless context switches (albeit while the system is idle.)
# 137104	31-Oct-2004	alc	Introduce a Boolean variable wakeup_needed to avoid repeated, unnecessary calls to wakeup() by vm_page_zero_idle_wakeup().
# 137079	30-Oct-2004	alc	Eliminate an unused but initialized variable.
# 134649	02-Sep-2004	scottl	Turn PREEMPTION into a kernel option. Make sure that it's defined if FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is enabled (code inspired by the PREEMPTION warning in kern_switch.c). This is a possible MT5 candidate.
# 134586	01-Sep-2004	julian	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days
# 134461	28-Aug-2004	iedowse	Prevent vm_page_zero_idle_wakeup() from attempting to wake up the page zeroing thread before it has been created. It was possible for calls to free() very early in the boot process to panic here because the sleep queues were not yet initialised. Specifically, sysinit_add() running at SI_SUB_KLD would trigger this if the array of pointers became big enough to require uma_large_alloc() allocations. Submitted by: peter
# 131481	02-Jul-2004	jhb	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switched to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)
# 131473	02-Jul-2004	jhb	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to switch to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
# 126588	04-Mar-2004	bde	Record exactly where this file was copied from. It wasn't repo-copied so this is not very obvious. Fixed some style bugs (mainly missing parentheses around return values).
# 125314	02-Feb-2004	jeff	- Use a separate startup function for the zeroidle kthread. Use this to set P_NOLOAD prior to running the thread.
# 124944	25-Jan-2004	jeff	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and properly adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate parameter and remove direct references to the process statistics.
# 118848	12-Aug-2003	imp	Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's copyrighted files. Approved by: Matt Dillon
# 116226	11-Jun-2003	obrien	Use __FBSDID().
# 113070	04-Apr-2003	des	Rename a static variable to avoid future conflicts.
# 104964	12-Oct-2002	jeff	- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c Reviewed by: -arch
# 100379	19-Jul-2002	peter	Set P_NOLOAD on the pagezero kthread so that it doesn't artificially skew the loadav. This is not real load. If you have a nice process running in the background, pagezero may sit in the run queue for ages and add one to the loadav, thereby affecting other scheduling decisions.
# 100331	18-Jul-2002	alc	o Remove the acquisition and release of Giant from the idle priority thread that pre-zeroes free pages. o Remove GIANT_REQUIRED from some low-level page queue functions. (Instead assertions on the page queue lock are being added to the higher-level functions, like vm_page_wire(), etc.) In collaboration with: peter
# 100193	16-Jul-2002	alc	o Use vm_pageq_remove_nowakeup() and vm_pageq_enqueue() in vm_page_zero_idle() instead of partially duplicated implementations. In particular, this change guarantees that the number of free pages in the free queue(s) matches the global free page count when Giant is released. Submitted by: peter (via his p4 "pmap" branch)
# 99890	12-Jul-2002	dillon	Re-enable the idle page-zeroing code. Remove all IPIs from the idle page-zeroing code as well as from the general page-zeroing code and use a lazy tlb page invalidation scheme based on a callback made at the end of mi_switch. A number of people came up with this idea at the same time so credit belongs to Peter, John, and Jake as well. Two-way SMP buildworld -j 5 tests (second run, after stabilization) 2282.76 real 2515.17 user 704.22 sys before peter's IPI commit 2266.69 real 2467.50 user 633.77 sys after peter's commit 2232.80 real 2468.99 user 615.89 sys after this commit Reviewed by: peter, jhb Approved by: peter
# 99625	08-Jul-2002	peter	vm_page_queue_free_mtx is a spin mutex, not a normal sleep mutex. I do not know why this didn't panic my box, but I have most certainly been using it: peter@overcee[3:14pm]~src/sys/i386/i386-110> sysctl -a | grep zero vm.stats.misc.zero_page_count: 2235 vm.stats.misc.cnt_prezero: 638951 vm.idlezero_enable: 1 vm.idlezero_maxrun: 16 Submitted by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor's patches are never wrong. :-)
# 99624	08-Jul-2002	peter	Turn the zeroidle process off for SMP systems, there is still a possible TLB problem when bouncing from one cpu to another (the original cpu will not have purged its TLB if it simply went idle). Pointed out by: Tor.Egge@cvsup.no.freebsd.org Approved by: Tor is never wrong. :-)
# 99571	08-Jul-2002	peter	Add a special page zero entry point intended to be called via the single threaded VM pagezero kthread outside of Giant. For some platforms, this is really easy since it can just use the direct mapped region. For others, IPI sending is involved or there are other issues, so grab Giant when needed. We still have preemption issues to deal with, but Alan Cox has an interesting suggestion on how to minimize the problem on x86. Use Luigi's hack for preserving the (lack of) priority. Turn the idle zeroing back on since it can now actually do something useful outside of Giant in many cases.
# 99545	07-Jul-2002	alc	o Lock accesses to the free queue(s) in vm_page_zero_idle().
# 99072	29-Jun-2002	julian	Part 1 of KSE-III The ability to schedule multiple threads per process (on one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 94777	15-Apr-2002	peter	Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]() and pmap_copy_page(). This gets rid of a couple more physical addresses in upper layers, with the eventual aim of supporting PAE and dealing with the physical addressing mostly within pmap. (We will need either 64 bit physical addresses or page indexes, possibly both depending on the circumstances. Leaving this to pmap itself gives more flexibility.) Reviewed by: jake Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)
# 90538	11-Feb-2002	julian	In a threaded world, different priorities become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doozy!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 82756	01-Sep-2001	jhb	Process priority is locked by the sched_lock, not the proc lock.
# 82314	25-Aug-2001	peter	Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress. It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on. There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we won't preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop. I cleaned up the includes a fair bit here too.
# 79744	15-Jul-2001	benno	The i386-specific includes in this file were "fixed" by bracketing them with #ifndef __alpha__. Fix this for the rest of the world by turning it into #ifdef __i386__. Reviewed by: obrien
# 79273	05-Jul-2001	mjacob	Apply field bandages to the includes so compiles happen on alpha.
# 79265	04-Jul-2001	dillon	Move vm_page_zero_idle() from machine-dependent sections to a machine-independent source file, vm/vm_zeroidle.c. It was exactly the same for all platforms and updating them all was getting annoying.
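
Illustrative sketch (not taken from vm_zeroidle.c): revisions 82314 and 137104 above describe idle zeroing "with hysteresis" and a Boolean wakeup_needed that avoids repeated wakeup() calls. The user-space Python model below shows how those two ideas can fit together. The class name, the method names, and the 2/3 and 4/5 watermark fractions are assumptions chosen for illustration; the kernel's real thresholds, locking, and sleep/wakeup primitives are not modeled.

```python
class ZeroIdleSim:
    """User-space model of a hysteresis-controlled page zeroer (hypothetical)."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.zeroed = 0                    # pages currently pre-zeroed
        self.low = free_pages * 2 // 3     # assumed low watermark: resume below this
        self.high = free_pages * 4 // 5    # assumed high watermark: pause at or above this
        self.paused = False                # True once the high watermark was reached
        self.wakeup_needed = False         # set when the zeroer goes to sleep
        self.wakeups = 0                   # number of wakeups actually issued

    def should_zero(self):
        # Hysteresis: once paused, stay paused until the count drops below `low`.
        if self.paused and self.zeroed < self.low:
            self.paused = False
        return not self.paused and self.zeroed < self.free_pages

    def idle_pass(self):
        """Zero pages until the high watermark, then 'sleep'. Returns pages zeroed."""
        done = 0
        while self.should_zero():
            self.zeroed += 1
            done += 1
            if self.zeroed >= self.high:
                self.paused = True
        self.wakeup_needed = True          # the zeroer is about to sleep
        return done

    def consume(self, n):
        """An allocator takes n pre-zeroed pages, then notifies the zeroer."""
        self.zeroed = max(0, self.zeroed - n)
        self.notify()

    def notify(self):
        # Issue at most one wakeup per sleep: the flag is cleared on first use.
        if self.wakeup_needed and self.should_zero():
            self.wakeup_needed = False
            self.wakeups += 1


if __name__ == "__main__":
    sim = ZeroIdleSim(100)
    print(sim.idle_pass())   # pages zeroed before pausing at the high watermark
    sim.consume(5)           # still above the low watermark: no wakeup
    sim.consume(10)          # drops below the low watermark: one wakeup
    print(sim.wakeups)
```

The gap between the two watermarks keeps the zeroer from toggling on and off every time a single page is consumed, and the flag ensures that later notifications are cheap no-ops until the zeroer has slept again.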