Cross Reference: /freebsd-current/sys/kern/subr

History log of /freebsd-current/sys/kern/subr_trap.c
Revision	Date	Author	Comments
# b7b78c1c	28-Mar-2024	Randall Stewart <rrs@FreeBSD.org>	Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold HPTS inserts a softclock for system call return that optimizes performance. However when no HPTS threads need the help (i.e. when they have less than 100 or so connections) then there should be little work done i.e. check the counter and return instead of running through all the threads getting locks etc.ptimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold. Reported by: eduardo Reviewed by: gallatin, glebius, tuexen Tested by: gallatin Differential Revision: https://reviews.freebsd.org/D44420
# e3cbc572	04-Dec-2023	Gleb Smirnoff <glebius@FreeBSD.org>	kern/subr_trap.c: repair the HPTS performance hack in userret() It wasn't functional as subr_trap.c doesn't include opt_inet.h. Put a better comment provided by gallatin@ in place of the old one. The idea is to use userret() as a cheap place to call a soft clock. This approach saves CPU on busy machines and saves power on idle machines. An alternative would be to constantly schedule callouts. Running with neither callouts nor the soft clock ruins HPTS precision. Reviewed by: tuexen, rrs Differential Revision: https://reviews.freebsd.org/D42860
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# c159f767	13-Apr-2023	Gordon Bergling <gbe@FreeBSD.org>	kern: remove a double word in a KASSERT in subr_trap - s/with with/with/ MFC after: 5 days
# c46771a7	25-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	kern/subr_trap.c: cleanup no longer needed headers Also bump Foundation' copyright year Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# c6d31b83	18-Jul-2022	Konstantin Belousov <kib@FreeBSD.org>	AST: rework Make most AST handlers dynamically registered. This allows to have subsystem-specific handler source located in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c. Also, it allows to have some handlers designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and NFS server no longer need to be aware about UFS softdep processing. The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code does not make any effort to ensure that the module is not unloaded before all threads processed through AST handler in it. In fact, this is already present behavior for hwpmc.ko and ufs.ko. I do not think it is worth the efforts and the runtime overhead to try to fix it. Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
# b53133a7	12-Feb-2022	Mateusz Guzik <mjg@FreeBSD.org>	proc: load/store p_cowgen using atomic primitives
# 626d6992	26-Dec-2021	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move fork_rfppwait() check into ast() This will always sleep at least once, so it's a slow path by definition. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D33387
# 98168a6e	06-Sep-2021	Konstantin Belousov <kib@FreeBSD.org>	kqueue: drain kqueue taskqueue if syscall tickled it Otherwise return from the syscall and next syscall, which could be kevent(2) on the kqueue that should be notified, races with the kqueue taskqueue thread, and potentially misses the wakeup. This is reliably visible when kevent(2) only peeks into events using zeroed timeout. PR: 258310 Reported by: arichardson, Jan Kokemüller <jan.kokemueller@gmail.com> Reviewed by: arichardson, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31858
# b0f71f1b	10-Aug-2021	Mark Johnston <markj@FreeBSD.org>	amd64: Add MD bits for KMSAN Interrupt and exception handlers must call kmsan_intr_enter() prior to calling any C code. This is because the KMSAN runtime maintains some TLS in order to track initialization state of function parameters and return values across function calls. Then, to ensure that this state is kept consistent in the face of asynchronous kernel-mode excpeptions, the runtime uses a stack of TLS blocks, and kmsan_intr_enter() and kmsan_intr_leave() push and pop that stack, respectively. Use these functions in amd64 interrupt and exception handlers. Note that handlers for user->kernel transitions need not be annotated. Also ensure that trap frames pushed by the CPU and by handlers are marked as initialized before they are used. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31467
# d7955cc0	06-Jul-2021	Randall Stewart <rrs@FreeBSD.org>	tcp: HPTS performance enhancements HPTS drives both rack and bbr, and yet there have been many complaints about performance. This bit of work restructures hpts to help reduce CPU overhead. It does this by now instead of relying on the timer/callout to drive it instead use user return from a system call as well as lro flushes to drive hpts. The timer becomes a backstop that dynamically adjusts based on how "late" we are. Reviewed by: tuexen, glebius Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31083
# 57f22c82	10-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	sigfastblock: do not skip cursig/postsig loop in ast() Even if sigfastblock block is non-zero, non-blockable signals must be checked on ast and delivered now. This also affects debugger ability to attach, because issignal() also calls ptracestop() if there is a pending stop for debugee. Instead of checking for sigfastblock, and either setting PENDING flag for usermode or doing signal delivery loop, always do the loop after checking, and then handle PENDING bit. issignal() already does the right thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to happen. Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>, markj Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28089
# 46588778	02-Oct-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move KTRUSERRET() from userret() to ast(). It's a really long detour - it writes ktrace entries to the filesystem - so the overhead of ast() won't make any difference. Reviewed by: kib Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26404
# fecb19e4	14-Sep-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move td_softdep_cleanup() from userret() to ast(); it's infrequent at best. The schedule_cleanup() function already sets TDF_ASTPENDING. Reviewed by: kib, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26375
# 60f083ef	14-Sep-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move TDP_GEOM check from userret() to ast(); this code path is quite infrequent. Reviewed by: kib No objections: mav Tested by: pho MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26374
# 30d158ee	14-Sep-2020	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move racct/rctl throttling from userret() to ast(). There's no reason for it to sit in the syscall fast path. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26368
# 022c2f55	09-Sep-2020	Gleb Smirnoff <glebius@FreeBSD.org>	In r354148 the goal was to check THREAD_CAN_SLEEP() only once for the purpose of epoch_trace() and for calling subsequent panic, but to keep code fully under INVARIANTS, so don't use bare function call to panic(). However, at the last stage of review a true value slipped in, while always false was assumed. I checked that in email archive with kib@. Noticed by: trasz
# 59838c1a	01-Apr-2020	John Baldwin <jhb@FreeBSD.org>	Retire procfs-based process debugging. Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a supserset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837
# 0bc52b0b	10-Mar-2020	Konstantin Belousov <kib@FreeBSD.org>	Return reschedule_signals() to being static again. It was used after sigfastblock_setpend() call in in ast() when current thread fast-blocks signals. Add a flag to sigfastblock_setpend() to request reschedule, and remove the direct use of the function from subr_trap.c Tested by: pho Sponsored by: The FreeBSD Foundation
# 74cb9a53	20-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Fix a bug in r358168, do not call sigfastblock_setpend() under a mutex. PR: 244250 Reported and tested by: lwhsu Sponsored by: The FreeBSD Foundation
# a113b17f	20-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Do not read sigfastblock word on syscall entry. On machines with SMAP, fueword executes two serializing instructions which can be seen in microbenchmarks. As a measure to restore microbenchmark numbers, only read the word on the attempt to deliver signal in ast(). If the word is set, signal is not delivered and word is kept, preventing interruption of interruptible sleeps by signals until userspace calls sigfastblock(UNBLOCK) which clears the word. This way, the spurious EINTR that userspace can see while in critical section is on first interruptible sleep, if a signal is pending, and on signal posting. It is believed that it is not important for rtld and lbithr critical sections. It might be visible for the application code e.g. for the callback of dl_iterate_phdr(3), but again the belief is that the non-compliance is acceptable. Most important is that the retry of the sleeping syscall does not interrupt unless additional signal is posted. For now I added the knob kern.sigfastblock_fetch_always to enable the word read on syscall entry to be able to diagnose possible issues due to spurious EINTR. While there, do some code restructuting to have all sigfastblock() handling located in kern_sig.c. Reviewed by: jeff Discussed with: mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23622
# 0e84a878	14-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	Annotate branches in the syscall path This in particular significantly shortens amd64_syscall, which otherwise keeps jumping forward over 2KB of code in total. Note some of these branches should be either eliminated altogether or coalesced.
# 146fc63f	09-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Add a way to manage thread signal mask using shared word, instead of syscall. A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Disscussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773
# b52d50cf	11-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes wont block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instaed. The only consumer was always passing "1" as count and never nesting reservations.
# 686bcb5c	15-Dec-2019	Jeff Roberson <jeff@FreeBSD.org>	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778
# 5757b59f	29-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Merge td_epochnest with td_no_sleeping. Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib
# ed9d69b5	24-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Use THREAD_CAN_SLEEP() macro to check if thread can sleep. There is no functional change. Discussed with: kib
# bac06038	15-Oct-2019	Gleb Smirnoff <glebius@FreeBSD.org>	When assertion for a thread not being in an epoch fails also print all entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017
# 0db7afd0	19-Aug-2019	Andriy Gapon <avg@FreeBSD.org>	assert that td_lk_slocks is not leaked upon return from kernel This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura
# 64cf6a62	28-Nov-2018	Mateusz Guzik <mjg@FreeBSD.org>	Deinline racct throttling out of syscall exit path. racct is not enabled by default and even when it is enabled processes are typically not throttled. The order of checks is left unchanged since racct_enable will be annotated as __read_frequently, while checking for the flag in the processes would probably require an extra fetch. Sponsored by: The FreeBSD Foundation
# 5de96e33	03-Jun-2018	Matt Macy <mmacy@FreeBSD.org>	hwpmc: support sampling both kernel and user stacks when interrupted in kernel This adds the -U options to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together. Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney. Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks
# 2466d12b	22-May-2018	Mateusz Guzik <mjg@FreeBSD.org>	sx: port over writer starvation prevention measures from rwlock A constant stream of readers could completely starve writers and this is not a hypothetical scenario. The 'poll2_threads' test from the will-it-scale suite reliably starves writers even with concurrency < 10 threads. The problem was run into and diagnosed by dillon@backplane.com There was next to no change in lock contention profile during -j 128 pkg build, despite an sx lock being at the top. Tested by: pho
# 06bf2a6a	10-May-2018	Matt Macy <mmacy@FreeBSD.org>	Add simple preempt safe epoch API Read locking is over used in the kernel to guarantee liveness. This API makes it easy to provide livenes guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@
# ed9e8bc4	24-Mar-2018	Konstantin Belousov <kib@FreeBSD.org>	Account the size of the vslock-ed memory by the thread. Assert that all such memory is unwired on return to usermode. The count of the wired memory will be used to detect the copyout mode. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# df57947f	18-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	spdx: initial adoption of licensing ID tags. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133
# 83c9dea1	17-Apr-2017	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156
# aca4bb91	25-Feb-2017	Konstantin Belousov <kib@FreeBSD.org>	Do not leak mount references for dying threads. Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 77d68094	18-Jul-2016	Konstantin Belousov <kib@FreeBSD.org>	The assertion re-added in r302614 was triggered when stopping signal is delivered to vforked child. Issue is that we avoid stopping such children in issignal() to not block parents. But executed AST, which ignored stops, leaves the child with the signal pending but no AST pending. On first exec after vfork(), call signotify() to handle pending reenabled signals. Adjust the assert to not check vfork children until exec. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 8f01bee4	11-Jul-2016	Konstantin Belousov <kib@FreeBSD.org>	Revive the check, disabled in r197963. Despite the implication (process has pending signals -> the current thread marked for AST and has TDF_NEEDSIGCHK set) is not true due to other thread might manipulate its signal blocking mask, it should still hold for the single-threaded processes. Enable check for the condition for single-threaded case, and replicate it from userret() to ast() as well, where we check that ast indeed has no signal to deliver. Note that the check is under DIAGNOSTIC, it is not enabled for INVARIANTS but !DIAGNOSTIC since it imposes too heavy-weight locking for day-to-day used debugging kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 4f4c35bb	11-Jul-2016	Konstantin Belousov <kib@FreeBSD.org>	Add assert to complement r302328. AST must not execute with TDF_SBDRY or TDF_SEINTR/TDF_SERESTART thread flags set, which is asserted in userret(). As the consequence, -1 return from cursig() must not be possible. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 3a1e5dd8	26-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
# ae34b6ff	06-Apr-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080
# e94e50af	13-Jul-2015	Mateusz Guzik <mjg@FreeBSD.org>	racct: perform a lockless check for p_throttled This reduces proc lock contention. Reviewed by: trasz
# 4ea6a9a2	10-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	Generalised support for copy-on-write structures shared by threads. Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed. This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary.
# 1bc93bb7	27-May-2015	Konstantin Belousov <kib@FreeBSD.org>	Currently, softupdate code detects overstepping on the workitems limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker' tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitely checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nsfd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# ed95805e	30-Apr-2015	John Baldwin <jhb@FreeBSD.org>	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes
# 4b5c9cf6	29-Apr-2015	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation
# 18cc2ff0	12-Jan-2015	Konstantin Belousov <kib@FreeBSD.org>	Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 52f3c44e	21-Mar-2014	Konstantin Belousov <kib@FreeBSD.org>	Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, success returned from the kernacc(9) does not guarantee that consequent attempt to read or write to the checked address succeed, since other thread might invalidate the address meantime. Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicing. The trap handler would then see a page fault from access, and recover in normal way, making /dev/mem access safer. Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64. Note that at least the second issue exists on other architectures, and requires similar patching for md code. Reported and tested by: clusteradm (gjb, sbruno) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 4a144410	16-Mar-2014	Robert Watson <rwatson@FreeBSD.org>	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
# e7a9eed7	17-Dec-2013	Attilio Rao <attilio@FreeBSD.org>	- Assert for not leaking readers rw locks counter on userland return. - Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock. Sponsored by: EMC / Isilon storage division
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# 3cf3b9f0	18-Mar-2013	John Baldwin <jhb@FreeBSD.org>	Partially revert r195702. Deferring stops is now implemented via a set of calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
# a8efb534	14-Mar-2013	Edward Tomasz Napierala <trasz@FreeBSD.org>	When throttling a process to enforce RACCT limits, do not use neither PBDRY (which simply doesn't make any sense) nor PCATCH (which could be used by a malicious process to work around the PCPU limit). Submitted by: Rudo Tomori Reviewed by: kib
# f9379dc4	01-Mar-2013	John Baldwin <jhb@FreeBSD.org>	Replace the TDP_NOSLEEPING flag with a counter so that the THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio
# 593efaf9	21-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
# 5584e917	30-Oct-2012	Attilio Rao <attilio@FreeBSD.org>	Fixup r240246: hwpmc needs to retain the pinning until ASTs are not executed. This means past the point where userret() is generally executed. Skip the td_pinned check if a callchain tracing is currently happening and add a more robust check to pmc_capture_user_callchain() in order to catch td_pinned leak past ast() in hwpmc case. Reported and tested by: fabient MFC after: 1 week X-MFC: r240246
# 36af9869	26-Oct-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add CPU percentage limit enforcement to RCTL. The resouce name is "pcpu". It was implemented by Rudolf Tomori during Google Summer of Code 2012.
# 9b233e23	14-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a KPI to allow to reserve some amount of space in the numvnodes counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
# 16cbf13b	08-Sep-2012	Attilio Rao <attilio@FreeBSD.org>	Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and TDP_NOSLEEPING leaking from syscallret() to userret() so that also trap handling is covered. Also, the check on td_locks is not duplicated between the two functions. Reported by: avg Reviewed by: kib MFC after: 1 week
# fbe18392	08-Sep-2012	Attilio Rao <attilio@FreeBSD.org>	Move PT_UPDATED_FLUSH() before td_locks check in order to have more coverage also in the XEN case. Reviewed by: kib MFC after: 1 week
# 324e5715	08-Sep-2012	Attilio Rao <attilio@FreeBSD.org>	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week
# effb6326	10-Jun-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove redundant include. MFC after: 1 month
# 88bf5036	20-Apr-2012	John Baldwin <jhb@FreeBSD.org>	Include the associated wait channel message for context switch ktrace records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week
# f5f9340b	28-Mar-2012	Fabien Thomas <fabient@FreeBSD.org>	Add software PMC support. New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month
# 24f3dcfe	03-Oct-2011	Konstantin Belousov <kib@FreeBSD.org>	Assert that exiting process does not return to usermode. Reviewed by: avg, jhb MFC after: 1 week
# 8451d0dd	16-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 26ccf4f1	11-Sep-2011	Konstantin Belousov <kib@FreeBSD.org>	Inline the syscallenter() and syscallret(). This reduces the time measured by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks
# 24c1c3bf	29-Jun-2011	Jonathan Anderson <jonathan@FreeBSD.org>	We may split today's CAPABILITIES into CAPABILITY_MODE (which has to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way. Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future. Approved by: rwatson Sponsored by: Google UK Ltd
# fc94e447	01-Mar-2011	Robert Watson <rwatson@FreeBSD.org>	Continue introducing Capsicum capability mode support: If a system call wasn't listed in capabilities.conf, return ECAPMODE at syscall entry. Reviewed by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months
# bf9ce95b	14-Feb-2011	Bjoern A. Zeeb <bz@FreeBSD.org>	Mfp4 CH=177256: Catch a set vnet upon return to user space. This usually means return paths with CURVNET_RESTORE() missing. If VNET_DEBUG is turned on we can even tell the function that did the CURVNET_SET() which is really helpful; else we print "N/A". Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 11 days
# 6fa39a73	25-Jan-2011	Konstantin Belousov <kib@FreeBSD.org>	Allow debugger to specify that children of the traced process should be automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 25e45560	27-Sep-2010	Ed Maste <emaste@FreeBSD.org>	Remove extra braces for style(9) (found while cleaning up an old work tree).
# b3d354c9	22-Aug-2010	Rui Paulo <rpaulo@FreeBSD.org>	Call the systrace_probe_func() when the error value. Sponsored by: The FreeBSD Foundation
# f2a664ac	15-Jul-2010	John Baldwin <jhb@FreeBSD.org>	Retire td_syscalls now that it is no longer needed.
# 34a39b7b	04-Jul-2010	Konstantin Belousov <kib@FreeBSD.org>	Obey sv_syscallnames bounds in syscallname(). Reported and tested by: pho
# fc0de8f0	30-Jun-2010	John Baldwin <jhb@FreeBSD.org>	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
# 153ac44c	28-Jun-2010	Konstantin Belousov <kib@FreeBSD.org>	Count number of threads that enter and leave dynamically registered syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month
# 699d648a	23-Jun-2010	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for int13 FPU exception reporting on i386. It is believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)
# f05a9476	17-Jun-2010	Rui Paulo <rpaulo@FreeBSD.org>	Make DTrace syscall provider work again by including opt_kdtrace.h here.
# b2318c28	26-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Allow to use syscallname(9) outside subr_trap.c. MFC after: 1 month
# afe1a688	23-May-2010	Konstantin Belousov <kib@FreeBSD.org>	Reorganize syscall entry and leave handling. Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_syscall pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
# 7e767511	19-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r198508, r198509: Reimplement pselect() in kernel, making change of sigmask and sleep atomic. MFC r198538: Move pselect(3) man page to section 2.
# 6ddf1cd2	19-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r197963: Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode. Change cursig() and postsig() to look both into the thread and process signal queues. MFC r197976: Fix typo. MFC r200082: Remove wrong assertion. Debugee is allowed to lose a signal
# 066d836b	27-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Current pselect(3) is implemented in usermode and thus vulnerable to well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 6b286ee8	11-Oct-2009	Konstantin Belousov <kib@FreeBSD.org>	Currently, when signal is delivered to the process and there is a thread not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month
# f33a947b	14-Jul-2009	Konstantin Belousov <kib@FreeBSD.org>	Add new msleep(9) flag PBDY that shall be specified together with PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
# bcf11e8d	05-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 6fe00c78	13-Dec-2008	Joseph Koshy <jkoshy@FreeBSD.org>	- Bug fix: prevent a thread from migrating between CPUs between the time it is marked for user space callchain capture in the NMI handler and the time the callchain capture callback runs. - Improve code and control flow clarity by invoking hwpmc(4)'s user space callchain capture callback directly from low-level code. Reviewed by: jhb (kern/subr_trap.c) Testing (various patch revisions): gnn, Fabien Thomas <fabien dot thomas at netasq dot com>, Artem Belevich <artemb at gmail dot com>
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 50d6e424	18-Oct-2008	Kip Macy <kmacy@FreeBSD.org>	- Forward port flush of page table updates on context switch or userret - Forward port vfork XEN hack
# 8df78c41	16-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia
# b7edba77	21-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needs to enter thread_suspend_check(). - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the thread_suspend_check() to ast() rather than userret(). - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so that we don't miss a suspend request. If this is set use the expensive signal path. - Set NEEDSUSPCHK when creating a new thread in thr in case the creating thread is due to be suspended as well but has not yet. Reviewed by: davidxu (Authored original patch)
# 6617724c	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# d07f36b0	07-Dec-2007	Joseph Koshy <jkoshy@FreeBSD.org>	Kernel and hwpmc(4) support for callchain capture. Sponsored by: FreeBSD Foundation and Google Inc.
# e01eafef	13-Nov-2007	Julian Elischer <julian@FreeBSD.org>	A bunch more files that should probably print out a thread name instead of a process name.
# b61ce5b0	16-Sep-2007	Jeff Roberson <jeff@FreeBSD.org>	- Move all of the PS_ flags into either p_flag or td_flags. - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
# 3036ab79	12-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	- Include opt_sched.h for SCHED_STATS.
# 982d11f8	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# b4b70819	04-Jun-2007	Attilio Rao <attilio@FreeBSD.org>	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)
# 1c4bcd05	31-May-2007	Jeff Roberson <jeff@FreeBSD.org>	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
# 2feb50bf	31-May-2007	Attilio Rao <attilio@FreeBSD.org>	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 222d0195	18-May-2007	Jeff Roberson <jeff@FreeBSD.org>	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 0c14ff0e	04-Mar-2007	Robert Watson <rwatson@FreeBSD.org>	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
# ad1e7d28	05-Dec-2006	Julian Elischer <julian@FreeBSD.org>	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# 8460a577	26-Oct-2006	John Birrell <jb@FreeBSD.org>	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# aed55708	22-Oct-2006	Robert Watson <rwatson@FreeBSD.org>	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 1ca2c018	17-Oct-2006	Bruce Evans <bde@FreeBSD.org>	kern_intr.c: - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that shouldn never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.
# 42925630	10-Feb-2006	David Xu <davidxu@FreeBSD.org>	Test before modifying p_sflag to avoid unconditionally cache line ping-pong on SMP.
# eb2da9a5	08-Feb-2006	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify system time accounting for profiling. Rename struct thread's td_sticks to td_pticks, we will need the other name for more appropriately named use shortly. Reduce it from uint64_t to u_int. Clear td_pticks whenever we enter the kernel instead of recording its value as reference for userret(). Use the absolute value of td->pticks in userret() and eliminate third argument.
# 5b1a8eb3	07-Feb-2006	Poul-Henning Kamp <phk@FreeBSD.org>	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.
# 2c255e9d	13-Nov-2005	Robert Watson <rwatson@FreeBSD.org>	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>
# 9104847f	13-Oct-2005	David Xu <davidxu@FreeBSD.org>	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
# 9d65cdf6	27-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here. Sponsored by: Isilon Systems, Inc.
# eb8d0e01	25-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- The td_locks check is currently broken with snapshots and possibly some case in unmount. Disable the KASSERT until these problems can be diagnosed. Sponsored by: Isilon Systems, Inc.
# 61ef09d1	24-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Fail an assert if we attempt to return with any lockmgr locks held in userret(). Sponsored by: Isilon Systems, Inc.
# 99b808f4	30-Dec-2004	John Baldwin <jhb@FreeBSD.org>	Whitespace fix.
# 6a987020	26-Dec-2004	Jeff Roberson <jeff@FreeBSD.org>	- Run sched_userret() after thread_userret(). Before, sched_userret() would lower the priority of the returning thread to a user priority before calling into thread_userret() which would call wakeup() which in turn would cause the returning thread to eventually context switch rather than completing its slice. Allowing this thread to complete its slice first yields a 15% performance improvement in super-smack on my dual opteron with 4BSD.
# 9197ce2e	23-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add a new per-thread private flag: TDP_GEOM. This flag gets set whenever the thread posts an event on the GEOM event queue, and if the flag is set when the thread is prepared to return to userland from the kernel, g_waitidle() will be called to make sure that the posted events have completed. This can replace an insufficient number of g_waitidle() calls in various other places, and has the advantage of being failsafe: Any system call which does a VOP_OPEN()/VOP_CLOSE will now correctly wait for any geom events it posted as part of spoils or tastes. Assert that topology and Giant is not held in g_waitidle().
# 78c85e8d	05-Oct-2004	John Baldwin <jhb@FreeBSD.org>	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month
# ea73c1ea	23-Sep-2004	John Baldwin <jhb@FreeBSD.org>	Don't try to protect td_sticks with sched_lock. It doesn't need it as it is only accessed by curthread.
# 7eaec467	22-Sep-2004	John Baldwin <jhb@FreeBSD.org>	Various small style fixes.
# 5995adc2	31-Aug-2004	Julian Elischer <julian@FreeBSD.org>	Remove an unneeded argument.. The removed argument could trivially be derived from the remaining one. That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument. Having both proc and thread as an argumen tjust gives an opportunity for them to get out sync. MFC after: 3 days
# 99e9dcb8	31-Aug-2004	Julian Elischer <julian@FreeBSD.org>	Remove sched_free_thread() which was only used in diagnostics. It has outlived its usefulness and has started causing panics for people who turn on DIAGNOSTIC, in what is otherwise good code. MFC after: 2 days
# 2b70a83a	08-Aug-2004	David Xu <davidxu@FreeBSD.org>	Call thread_user_enter for M:N thread, ast() should be treated as another entrance of kernel.
# 52eb8464	16-Jul-2004	John Baldwin <jhb@FreeBSD.org>	- Move TDF_OWEPREEMPT, TDF_OWEUPC, and TDF_USTATCLOCK over to td_pflags since they are only accessed by curthread and thus do not need any locking. - Move pr_addr and pr_ticks out of struct uprof (which is per-process) and directly into struct thread as td_profil_addr and td_profil_ticks as these variables are really per-thread. (They are used to defer an addupc_intr() that was too "hard" until ast()).
# bf0acc27	02-Jul-2004	John Baldwin <jhb@FreeBSD.org>	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
# a3a70178	01-Jul-2004	John Baldwin <jhb@FreeBSD.org>	Tidy up uprof locking. Mostly the fields are protected by both the proc lock and sched_lock so they can be read with either lock held. Document the locking as well. The one remaining bogosity is that pr_addr and pr_ticks should be per-thread but profiling of multithreaded apps is currently undefined.
# 4ccbe07e	31-Mar-2004	Julian Elischer <julian@FreeBSD.org>	Remove unused variable.
# 37814395	13-Mar-2004	Peter Wemm <peter@FreeBSD.org>	Push Giant down a little further: - no longer serialize on Giant for thread_single*() and family in fork, exit and exec - thread_wait() is mpsafe, assert no Giant - reduce scope of Giant in exit to not cover thread_wait and just do vm_waitproc(). - assert that thread_single() family are not called with Giant - remove the DROP/PICKUP_GIANT macros from thread_single() family - assert that thread_suspend_check() s not called with Giant - remove manual drop_giant hack in thread_suspend_check since we know it isn't held. - remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family - mark kse_create() mpsafe
# 16df17d0	05-Mar-2004	Robert Watson <rwatson@FreeBSD.org>	Put "failed to set signal flags properly for ast()" check under DIAGNOSTIC instead of INVARIANTS. INVARIANTS is intended for tests that don't substantially change code flow or behavior (passive), but this test required locking both the proc lock and scheduler lock in order to execute. It also appears to be a very advisory diagnostic as opposed to an invariant violation. Following discussion with: bde
# 91d5354a	04-Feb-2004	John Baldwin <jhb@FreeBSD.org>	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
# 29bcc451	24-Jan-2004	Jeff Roberson <jeff@FreeBSD.org>	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.
# 917cf8d2	05-Sep-2003	Peter Wemm <peter@FreeBSD.org>	Log involuntary context switches correctly.
# 75ea65e3	04-Aug-2003	David Xu <davidxu@FreeBSD.org>	kse.h is not needed for these files.
# aeaead20	30-Jul-2003	Peter Wemm <peter@FreeBSD.org>	When ktracing context switches, make sure we record involuntary switches. Otherwise, when we get a evicted from the cpu, there is no record of it. This is not a default ktrace flag.
# 9dde3bc9	28-Jun-2003	David Xu <davidxu@FreeBSD.org>	o Change kse_thr_interrupt to allow send a signal to a specified thread, or unblock a thread in kernel, and allow UTS to specify whether syscall should be restarted. o Add ability for UTS to monitor signal comes in and removed from process, the flag PS_SIGEVENT is used to indicate the events. o Add a KMF_WAITSIGEVENT for KSE mailbox flag, UTS call kse_release with this flag set to wait for above signal event. o For SA based thread, kernel masks all signal in its signal mask, let UTS to use kse_thr_interrupt interrupt a thread, and install a signal frame in userland for the thread. o Add a tm_syncsig in thread mailbox, when a hardware trap occurs, it is used to deliver synchronous signal to userland, and upcall is schedule, so UTS can process the synchronous signal for the thread. Reviewed by: julian (mentor)
# cd4f6ebb	14-Jun-2003	David Xu <davidxu@FreeBSD.org>	1. Add code to support bound thread. when blocked, a bound thread never schedules an upcall. Signal delivering to a bound thread is same as non-threaded process. This is intended to be used by libpthread to implement PTHREAD_SCOPE_SYSTEM thread. 2. Simplify kse_release() a bit, remove sleep loop.
# 0e2a4d3a	14-Jun-2003	David Xu <davidxu@FreeBSD.org>	Rename P_THREADED to P_SA. P_SA means a process is using scheduler activations.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 90af4afa	13-May-2003	John Baldwin <jhb@FreeBSD.org>	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
# 5eac9e2d	23-Apr-2003	John Baldwin <jhb@FreeBSD.org>	The signotify() sanity check in userret() doesn't need Giant anymore.
# 9752f794	22-Apr-2003	John Baldwin <jhb@FreeBSD.org>	- Move PS_PROFIL and its new cousin PS_STOPPROF back over to p_flag and rename them appropriately. Protect both flags with both the proc lock and the sched_lock. - Protect p_profthreads with the proc lock. - Remove Giant from profil(2).
# f5d5cb3c	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Tweak locking in the PS_XCPU handler to hold the sched_lock while reading p_runtime.
# 4093529d	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
# 1bf4700b	31-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Change trapsignal() to accept a thread and not a proc. - Change all consumers to pass in a thread. Right now this does not cause any functional changes but it will be important later when signals can be delivered to specific threads.
# 21e0492a	10-Mar-2003	David Xu <davidxu@FreeBSD.org>	Fix signal delivering bug for threaded process.
# 26306795	04-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls to WITNESS_WARN().
# ac2e4153	26-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Change the process flags P_KSES to be P_THREADED. This is just a cosmetic change but I've been meaning to do it for about a year.
# 58a3c273	17-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Add a new function, thread_signal_add(), that is called from postsig to add a signal to a mailbox's pending set. - Add a new function, thread_signal_upcall(), this causes the current thread to upcall so that we can deliver pending signals. Reviewed by: mini
# 4a338afd	17-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
# e4625663	16-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into the proc. These counters are only examined through calcru. Submitted by: davidxu Tested on: x86, alpha, UP/SMP
# 6f8132a8	31-Jan-2003	Julian Elischer <julian@FreeBSD.org>	Reversion of commit by Davidxu plus fixes since applied. I'm not convinced there is anything major wrong with the patch but them's the rules.. I am using my "David's mentor" hat to revert this as he's offline for a while.
# 48ed1432	31-Jan-2003	Tim J. Robbins <tjr@FreeBSD.org>	Use a local variable to store the number of ticks that elapsed in kernel mode instead of (unintentionally) using the global `ticks'. This error completely broke profiling.
# 0dbb100b	26-Jan-2003	David Xu <davidxu@FreeBSD.org>	Move UPCALL related data structure out of kse, introduce a new data structure called kse_upcall to manage UPCALL. All KSE binding and loaning code are gone. A thread owns an upcall can collect all completed syscall contexts in its ksegrp, turn itself into UPCALL mode, and takes those contexts back to userland. Any thread without upcall structure has to export their contexts and exit at user boundary. Any thread running in user mode owns an upcall structure, when it enters kernel, if the kse mailbox's current thread pointer is not NULL, then when the thread is blocked in kernel, a new UPCALL thread is created and the upcall structure is transfered to the new UPCALL thread. if the kse mailbox's current thread pointer is NULL, then when a thread is blocked in kernel, no UPCALL thread will be created. Each upcall always has an owner thread. Userland can remove an upcall by calling kse_exit, when all upcalls in ksegrp are removed, the group is atomatically shutdown. An upcall owner thread also exits when process is in exiting state. when an owner thread exits, the upcall it owns is also removed. KSE is a pure scheduler entity. it represents a virtual cpu. when a thread is running, it always has a KSE associated with it. scheduler is free to assign a KSE to thread according thread priority, if thread priority is changed, KSE can be moved from one thread to another. When a ksegrp is created, there is always N KSEs created in the group. the N is the number of physical cpu in the current system. This makes it is possible that even an userland UTS is single CPU safe, threads in kernel still can execute on different cpu in parallel. Userland calls kse_create to add more upcall structures into ksegrp to increase concurrent in userland itself, kernel is not restricted by number of upcalls userland provides. The code hasn't been tested under SMP by author due to lack of hardware. Reviewed by: julian
# 93a7aa79	27-Dec-2002	Julian Elischer <julian@FreeBSD.org>	Add code to ddb to allow backtracing an arbitrary thread. (show thread {address}) Remove the IDLE kse state and replace it with a change in the way threads sahre KSEs. Every KSE now has a thread, which is considered its "owner" however a KSE may also be lent to other threads in the same group to allow completion of in-kernel work. n this case the owner remains the same and the KSE will revert to the owner when the other work has been completed. All creations of upcalls etc. is now done from kse_reassign() which in turn is called from mi_switch or thread_exit(). This means that special code can be removed from msleep() and cv_wait(). kse_release() does not leave a KSE with no thread any more but converts the existing thread into teh KSE's owner, and sets it up for doing an upcall. It is just inhibitted from being scheduled until there is some reason to do an upcall. Remove all trace of the kse_idle queue since it is no-longer needed. "Idle" KSEs are now on the loanable queue.
# 52378b8a	08-Nov-2002	Robert Watson <rwatson@FreeBSD.org>	To reduce per-return overhead of userret(), call into mac_thread_userret() only if PS_MACPEND is set in the process AST mask. This avoids the cost of the entry point in the common case, but requires policies interested in the userret event to set the flag (protected by the scheduler lock) if they do want the event. Since all the policies that we're working with which use mac_thread_userret() use the entry point only selectively to perform operations deferred for locking reasons, this maintains the desired semantics. Approved by: re Requested by: bde Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 053effc6	25-Oct-2002	Julian Elischer <julian@FreeBSD.org>	iBack out david's last commit. the suspension code needs to be called for non KSE processes too.
# 3139ada5	25-Oct-2002	David Xu <davidxu@FreeBSD.org>	Move suspension checking code from userret() into thread_userret().
# b43179fb	11-Oct-2002	Jeff Roberson <jeff@FreeBSD.org>	- Create a new scheduler api that is defined in sys/sched.h - Begin moving scheduler specific functionality into sched_4bsd.c - Replace direct manipulation of scheduler data with hooks provided by the new api. - Remove KSE specific state modifications and single runq assumptions from kern_switch.c Reviewed by: -arch
# 5715307f	09-Oct-2002	John Baldwin <jhb@FreeBSD.org>	- Move p_cpulimit to struct proc from struct plimit and protect it with sched_lock. This means that we no longer access p_limit in mi_switch() and the p_limit pointer can be protected by the proc lock. - Remove PRS_ZOMBIE check from CPU limit test in mi_switch(). PRS_ZOMBIE processes don't call mi_switch(), and even if they did there is no longer the danger of p_limit being NULL (which is what the original zombie check was added for). - When we bump the current processes soft CPU limit in ast(), just bump the private p_cpulimit instead of the shared rlimit. This fixes an XXX for some value of fix. There is still a (probably benign) bug in that this code doesn't check that the new soft limit exceeds the hard limit. Inspired by: bde (2)
# 289e1e23	02-Oct-2002	Juli Mallett <jmallett@FreeBSD.org>	Access td->td_kse inside sched_lock. Submitted by: julian
# bc7b9f1d	02-Oct-2002	Juli Mallett <jmallett@FreeBSD.org>	De-obfuscate local use of members of 'struct thread', for which we have local variables, and group assignment.
# 92dbb82a	01-Oct-2002	Robert Watson <rwatson@FreeBSD.org>	Add a new MAC entry point, mac_thread_userret(td), which permits policy modules to perform MAC-related events when a thread returns to user space. This is required for policies that have floating process labels, as it's not always possible to acquire the process lock at arbitrary points in the stack during system call processing; process labels might represent traditional authentication data, process history information, or other data. LOMAC will use this entry point to perform the process label update prior to the thread returning to userspace, when plugged into the MAC framework. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 1d9c5696	01-Oct-2002	Juli Mallett <jmallett@FreeBSD.org>	Back our kernel support for reliable signal queues. Requested by: rwatson, phk, and many others
# feb24496	01-Oct-2002	John Baldwin <jhb@FreeBSD.org>	Minor style nits in a comment.
# 6cae6dac	01-Oct-2002	John Baldwin <jhb@FreeBSD.org>	Various style fixups. Submitted by: bde (mostly)
# f6ccde83	01-Oct-2002	John Baldwin <jhb@FreeBSD.org>	Actually clear PS_XCPU in ast() when we handle it. Submitted by: bde Pointy hat to: jhb
# dc183990	30-Sep-2002	John Baldwin <jhb@FreeBSD.org>	- Add a new per-process flag PS_XCPU to indicate that at least one thread has exceeded its CPU time limit. - In mi_switch(), set PS_XCPU when the CPU time limit is exceeded. - Perform actual CPU time limit exceeded work in ast() when PS_XCPU is set. Requested by: many
# 1226f694	30-Sep-2002	Juli Mallett <jmallett@FreeBSD.org>	First half of implementation of ksiginfo, signal queues, and such. This gets signals operating based on a TailQ, and is good enough to run X11, GNOME, and do job control. There are some intricate parts which could be more refined to match the sigset_t versions, but those require further evaluation of directions in which our signal system can expand and contract to fit our needs. After this has been in the tree for a while, I will make in kernel API changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo more robustly, such that we can actually pass information with our (queued) signals to the userland. That will also result in using a struct ksiginfo pointer, rather than a signal number, in a lot of kern_sig.c, to refer to an individual pending signal queue member, but right now there is no defined behaviour for such. CODAFS is unfinished in this regard because the logic is unclear in some places. Sponsored by: New Gold Technology Reviewed by: bde, tjr, jake [an older version, logic similar]
# 253fdd5b	23-Sep-2002	Julian Elischer <julian@FreeBSD.org>	slightly clean up the thread_userret() and thread_consider_upcall() calls. also some slight changes for TDF_BOUND testing and small style changes Should ONLY affect KSE programs Submitted by: davidxu
# 1c39a774	22-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Spell proprly properly: failed to set signal flags proprly for ast() failed to set signal flags proprly for ast() failed to set signal flags proprly for ast() failed to set signal flags proprly for ast()
# aaa1c771	10-Jul-2002	Jonathan Mini <mini@FreeBSD.org>	Revert removal of cred_free_thread(): It is used to ensure that a thread's credentials are not improperly borrowed when the thread is not current in the kernel. Requested by: jhb, alfred
# ad22735e	10-Jul-2002	Julian Elischer <julian@FreeBSD.org>	Don't slow every syscall and trap by doing locks and stuff if the 'stop' bits are not set. This is a temporary thing.. I think this code probably needs to be rewritten anyhow.
# e602ba25	29-Jun-2002	Julian Elischer <julian@FreeBSD.org>	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 01ad8a53	24-Jun-2002	Jonathan Mini <mini@FreeBSD.org>	Remove unused diagnostic function cread_free_thread(). Approved by: alfred
# d0c149fc	06-Jun-2002	John Baldwin <jhb@FreeBSD.org>	We no longer need to acqure Giant in ast() for ktrpsig() in postsig() now that ktrace no longer needs Giant.
# 628855e7	29-May-2002	Julian Elischer <julian@FreeBSD.org>	CURSIG() is not a macro so rename it cursig(). Obtained from: KSE tree
# 79065dba	04-Apr-2002	Bruce Evans <bde@FreeBSD.org>	Moved signal handling and rescheduling from userret() to ast() so that they aren't in the usual path of execution for syscalls and traps. The main complication for this is that we have to set flags to control ast() everywhere that changes the signal mask. Avoid locking in userret() in most of the remaining cases. Submitted by: luoqi (first part only, long ago, reorganized by me) Reminded by: dillon
# b454c6dd	29-Mar-2002	Jake Burkholder <jake@FreeBSD.org>	Style fixes purposefully left out of last commit. I checked the kse tree and didn't see any changes that this conflicts with.
# d0ce9a7e	29-Mar-2002	Jake Burkholder <jake@FreeBSD.org>	Remove abuse of intr_disable/restore in MI code by moving the loop in ast() back into the calling MD code. The MD code must ensure no races between checking the astpening flag and returning to usermode. Submitted by: peter (ia64 bits) Tested on: alpha (peter, jeff), i386, ia64 (peter), sparc64
# cb9a238a	20-Mar-2002	Warner Losh <imp@FreeBSD.org>	Remove last two abuses of cpu_critical_{enter,exit} in the MI code. Reviewed by: jake, jhb, rwatson
# 01c04d2d	20-Mar-2002	John Baldwin <jhb@FreeBSD.org>	Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined. Instead of caching the ucred reference, just go ahead and eat the decerement and increment of the refcount. Now that Giant is pushed down into crfree(), we no longer have to get Giant in the common case. In the case when we are actually free'ing the ucred, we would normally free it on the next kernel entry, so the cost there is not new, just in a different place. This also removse td_cache_ucred from struct thread. This is still only done #ifdef DIAGNOSTIC. [ missed this file in the previous commit ] Tested on: i386, alpha
# 39dda4e3	22-Feb-2002	Jake Burkholder <jake@FreeBSD.org>	Make this compile. Pointy hat to: julian
# 77c40664	22-Feb-2002	Julian Elischer <julian@FreeBSD.org>	Add some DIAGNOSTIC code. While in userland, keep the thread's ucred reference in a shadow field so that the usual place to store it is NULL. If DIAGNOSTIC is not set, the thread ucred is kept valid until the next kernel entry, at which time it is checked against the process cred and possibly corrected. Produces a BIG speedup in kernels with INVARIANTS set. (A previous commit corrected it for the non INVARIANTS case already) Reviewed by: dillon@freebsd.org
# 2eb927e2	16-Feb-2002	Julian Elischer <julian@FreeBSD.org>	If the credential on an incoming thread is correct, don't bother reaquiring it. In the same vein, don't bother dropping the thread cred when goinf ot userland. We are guaranteed to nned it when we come back, (which we are guaranteed to do). Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)
# 2c100766	11-Feb-2002	Julian Elischer <julian@FreeBSD.org>	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# e744f309	17-Jan-2002	Bruce Evans <bde@FreeBSD.org>	Changed the type of pcb_flags from u_char to u_int and adjusted things. This removes the only atomic operation on a char type in the entire kernel.
# c86b6ff5	05-Jan-2002	John Baldwin <jhb@FreeBSD.org>	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha
# 9d234f99	04-Jan-2002	John Baldwin <jhb@FreeBSD.org>	Axe a stale comment. Holding sched_lock across both setrunqueue() and mi_switch() is sufficient.
# 48fd1f38	18-Dec-2001	John Baldwin <jhb@FreeBSD.org>	- Change all callers of addupc_task() to check PS_PROFIL explicitly and remove the check from addupc_task(). It would need sched_lock while testing the flag anyways. - Always read sticks while holding sched_lock using a temporary variable where needed. - Always init prticks to 0 in ast() to quiet a warning.
# 7e1f6dfe	17-Dec-2001	John Baldwin <jhb@FreeBSD.org>	Modify the critical section API as follows: - The MD functions critical_enter/exit are renamed to start with a cpu_ prefix. - MI wrapper functions critical_enter/exit maintain a per-thread nesting count and a per-thread critical section saved state set when entering a critical section while at nesting level 0 and restored when exiting to nesting level 0. This moves the saved state out of spin mutexes so that interlocking spin mutexes works properly. - Most low-level MD code that used critical_enter/exit now use cpu_critical_enter/exit. MI code such as device drivers and spin mutexes use the MI wrappers. Note that since the MI wrappers store the state in the current thread, they do not have any return values or arguments. - mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is assigned to curthread->td_savecrit during fork_exit(). Tested on: i386, alpha
# 8e2e767b	26-Oct-2001	John Baldwin <jhb@FreeBSD.org>	Add a per-thread ucred reference for syscalls and synchronous traps from userland. The per thread ucred reference is immutable and thus needs no locks to be read. However, until all the proc locking associated with writes to p_ucred are completed, it is still not safe to use the per-thread reference. Tested on: x86 (SMP), alpha, sparc64
# 278da511	21-Sep-2001	John Baldwin <jhb@FreeBSD.org>	Remove a bogus comment. "atomic" doesn't mean that the operation is done as a physical atomic operation. That would require the code to use the atomic API, which it does not. Instead, the operation is made psuedo atomic (hence the quotes) by use of the lock to protect clearing all of the flags in question.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 356861db	30-Aug-2001	Matthew Dillon <dillon@FreeBSD.org>	Remove the MPSAFE keyword from the parser for syscalls.master. Instead introduce the [M] prefix to existing keywords. e.g. MSTD is the MP SAFE version of STD. This is prepatory for a massive Giant lock pushdown. The old MPSAFE keyword made syscalls.master too messy. Begin comments MP-Safe procedures with the comment: /* * MPSAFE / This comments means that the procedure may be called without Giant held (The procedure itself may still need to obtain Giant temporarily to do its thing). sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE sv_transtrap() is now MP SAFE and assumed to be MP SAFE ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown) trapsignal() is now MP SAFE (Giant Pushdown) Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant) test in syscall[2]() in /*/trap.c now do not. Instead they explicitly unlock Giant if they previously obtained it, and then assert that it is no longer held to catch broken system calls. Rebuild syscall tables.
# 688ebe12	10-Aug-2001	John Baldwin <jhb@FreeBSD.org>	- Close races with signals and other AST's being triggered while we are in the process of exiting the kernel. The ast() function now loops as long as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with preemption disabled so that any further AST's that arrive via an interrupt will be delayed until the low-level MD code returns to user mode. - Use u_int's to store the tick counts for profiling purposes so that we do not need sched_lock just to read p_sticks. This also closes a problem where the call to addupc_task() could screw up the arithmetic due to non-atomic reads of p_sticks. - Axe need_proftick(), aston(), astoff(), astpending(), need_resched(), clear_resched(), and resched_wanted() in favor of direct bit operations on p_sflag. - Fix up locking with sched_lock some. In addupc_intr(), use sched_lock to ensure pr_addr and pr_ticks are updated atomically with setting PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing PS_OWEUPC. We also do not grab the lock just to test a flag. - Simplify the handling of Giant in ast() slightly. Reviewed by: bde (mostly)
# 085be199	04-Jul-2001	Matthew Dillon <dillon@FreeBSD.org>	postsig() currently requires Giant to be held. Giant is held properly at the first postsig() call, but not always held at the second place, resulting in an occassional panic.
# 64acb05b	02-Jul-2001	John Baldwin <jhb@FreeBSD.org>	Grab Giant around postsig() since sendsig() can call into the vm to grow the stack and we already needed Giant for KTRACE.
# 7aa7260e	29-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.
# 6be523bc	29-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md. Reviewed by: jake (in principle)
# 92809bc0	28-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Grab Giant around trap_pfault() for now.
# 06c836bb	22-Jun-2001	John Baldwin <jhb@FreeBSD.org>	- Grab the proc lock around CURSIG and postsig(). Don't release the proc lock until after grabbing the sched_lock to avoid CURSIG racing with psignal. - Don't grab Giant for addupc_task() as it isn't needed. Reported by: tegge (signal race), bde (addupc_task a while back)
# 262c9f8a	05-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Don't hold sched_lock across addupc_task(). Reported by: David Taylor <davidt@yadt.co.uk> Submitted by: bde
# 0dfefe68	23-May-2001	John Baldwin <jhb@FreeBSD.org>	Don't acquire Giant just to call trap_fatal(), we are about to panic anyway so we'd rather see the printf's then block if the system is hosed.
# 1c1771cb	22-May-2001	Bruce Evans <bde@FreeBSD.org>	Convert npx interrupts into traps instead of vice versa. This is much simpler for npx exceptions that start as traps (no assembly required...) and works better for npx exceptions that start as interrupts (there is no longer a problem for nested interrupts). Submitted by: original (pre-SMPng) version by luoqi
# 23955314	18-May-2001	Alfred Perlstein <alfred@FreeBSD.org>	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 8bd57f8f	15-May-2001	John Baldwin <jhb@FreeBSD.org>	Remove unneeded includes of sys/ipl.h and machine/ipl.h.
# 1efb92b7	11-May-2001	John Baldwin <jhb@FreeBSD.org>	Simplify the vm fault trap handling code a bit by using if-else instead of duplicating code in the then case and then using a goto to jump around the else case.
# 6caa8a15	27-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind
# f227364a	06-Mar-2001	John Baldwin <jhb@FreeBSD.org>	- Release Giant a bit earlier on syscall exit. - Don't try to grab Giant before postsig() in userret() as it is no longer needed. - Don't grab Giant before psignal() in ast() but get the proc lock instead.
# 631d7bf3	24-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	- Rename the lcall system call handler from Xsyscall to Xlcall_syscall to be more like Xint0x80_syscall and less like c function syscall(). - Reduce code duplication between the int0x80 and lcall handlers by shuffling the elfags into the right place, saving the sizeof the instruction in tf_err and jumping into the common int0x80 code. Reviewed by: peter
# feb43c5f	22-Feb-2001	John Baldwin <jhb@FreeBSD.org>	The p_md.md_regs member of proc is used in signal handling to reference the the original trapframe of the syscall, trap, or interrupt that entered the kernel. Before SMPng, ast's were handled via a psuedo trap at the end of doerti. With the SMPng commit, ast's were broken out into a separate ast() function that was called from doreti to match the behavior of other architectures. Unfortunately, when this was done, the p_md.md_regs member of curproc was not updateda in ast(), thus when signals are handled by userret() after an interrupt that returns to userland, we end up using a stale trapframe that will result in the registers from the old trapframe overwriting the real trapframe and smashing all the registers right before we return to usermode. The saved %cs:%eip from where we were in usermode are saved in the trapframe for example.
# f308e0d7	22-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Change ast() to take a pointer to a trapframe like other architectures. - Don't use an atomic operation to update cnt.v_soft in ast(). This is the only place the variable is written to, and sched_lock is always held when it is written, so it is already protected and the mutex release of sched_lock asserts a memory barrier that ensures the value will be updated in a timely fashion.
# 26f9f5c7	22-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Use TRAPF_PC() on the alpha to acess the PC in the trap frame. - Don't hold sched_lock around addupc_task() as this apparently breaks profiling badly due to sched_lock being held across copyin(). Reported by: bde (2)
# 5813dc03	19-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Don't call clear_resched() in userret(), instead, clear the resched flag in mi_switch() just before calling cpu_switch() so that the first switch after a resched request will satisfy the request. - While I'm at it, move a few things into mi_switch() and out of cpu_switch(), specifically set the p_oncpu and p_lastcpu members of proc in mi_switch(), and handle the sched_lock state change across a context switch in mi_switch(). - Since cpu_switch() no longer handles the sched_lock state change, we have to setup an initial state for sched_lock in fork_exit() before we release it.
# 0ad74739	19-Feb-2001	Bruce Evans <bde@FreeBSD.org>	Removed all traces of T_ASTFLT (except for gaps where it was). It became unused except in dead code when ast() was split off from trap().
# 86654610	18-Feb-2001	Bruce Evans <bde@FreeBSD.org>	Changed the aston() family to operate on a specified process instead of always on curproc. This is needed to implement signal delivery properly (see a future log message for kern_sig.c). Debogotified the definition of aston(). aston() was defined in terms of signotify() (perhaps because only the latter already operated on a specified process), but aston() is the primitive. Similar changes are needed in the ia64 versions of cpu.h and trap.c. I didn't make them because the ia64 is missing the prerequisite changes to make astpending and need_resched per-process and those changes are too large to make without testing.
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# 3cbe75a4	10-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Clear the reschedule flag after finding it set in userret(). This used to be in cpu_switch(), but I don't see any difference between doing it here.
# 142ba5f3	09-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Make astpending and need_resched process attributes rather than CPU attributes. This is needed for AST's to be properly posted in a preemptive kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and PS_NEEDRESCHED. They are still accesssed by their old macros: aston(), astoff(), etc. For completeness, an astpending() macro has been added to check for a pending AST, and clear_resched() has been added to clear need_resched(). - Rename syscall2() on the x86 back to syscall() to be consistent with other architectures.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 297c46b6	07-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Don't enable interrupts for a kernel breakpoint or trace trap. Otherwise, this negates the explicit disabling of interrupts when entering the debugger in Debugger().
# 1a6e52d0	06-Feb-2001	Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org>	Fix typo: seperate -> separate. Seperate does not exist in the english language.
# 03927d3c	29-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Send "#if NISA > 0" to the bit-bucket and replace it with an option. These were compile-time "is the isa code present?" tests and not 'how many isa busses' tests.
# 28df158b	25-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Push Giant down into the trap handlers that need it, instead of acquiring it unconditionally. Reviewed by: jhb
# 625c76db	24-Jan-2001	John Baldwin <jhb@FreeBSD.org>	- Kill the have_giant parameter to userret() along with all instances of that name as a variable. Use mtx_owned(&Giant) where appropriate instead. - Proc locking. - P_FOO -> PS_FOO. - Update comments about enable interrupts during trap and why this may be bad if we trap while holding a spin mutex. - Don't bother resetting p to curproc in syscall() in case we are the child returning from fork. The child hasn't returned from fork through syscall in a while. - Remove fork_return() as it has been superseded by the MI version.
# a448b62a	21-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Make intr_nesting_level per-process, rather than per-cpu. Setup interrupt threads to run with it always >= 1, so that malloc can detect M_WAITOK from "interrupt" context. This is also necessary in order to context switch from sched_ithd() directly. Reviewed By: peter
# 558226ea	19-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h
# ef73ae4b	09-Jan-2001	Jake Burkholder <jake@FreeBSD.org>	Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables other then curproc.
# 05f9877c	13-Dec-2000	John Baldwin <jhb@FreeBSD.org>	If we fail to emulate a vm86 trap in kernel mode, then we use vm86_trap() to return to the calling program directly. vm86_trap() doesn't return, thus it was never returning to trap() to release Giant. Thus, release Giant before calling vm86_trap().
# 92cf772d	11-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	- Add code to detect if a system call returns with locks other than Giant held and panic if so (conditional on witness). - Change witness_list to return the number of locks held so this is easier. - Add kern/syscalls.c to the kernel build if witness is defined so that the panic message can contain the name of the offending system call. - Add assertions that Giant and sched_lock are not held when returning from a system call, which were missing for alpha and ia64.
# 7da6f977	17-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	- Split the run queue and sleep queue linkage, so that a process may block on a mutex while on the sleep queue without corrupting it. - Move dropping of Giant to after the acquire of sched_lock. Tested by: John Hay <jhay@icomtek.csir.co.za> jhb
# 20cdcc5b	15-Nov-2000	John Baldwin <jhb@FreeBSD.org>	Don't release and acquire Giant in mi_switch(). Instead, release and acquire Giant as needed in functions that call mi_switch(). The releases need to be done outside of the sched_lock to avoid potential deadlocks from trying to acquire Giant while interrupts are disabled. Submitted by: witness
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# 6c567274	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Change fast interrupts on x86 to push a full interrupt frame and to return through doreti to handle ast's. This is necessary for the clock interrupts to work properly. - Change the clock interrupts on the x86 to be fast instead of threaded. This is needed because both hardclock() and statclock() need to run in the context of the current process, not in a separate thread context. - Kill the prevproc hack as it is no longer needed. - We really need Giant when we call psignal(), but we don't want to block during the clock interrupt. Instead, use two p_flag's in the proc struct to mark the current process as having a pending SIGVTALRM or a SIGPROF and let them be delivered during ast() when hardclock() has finished running. - Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was broken on the x86 if it was turned on since cpl is gone. It's only use was to bogusly run softclock() directly during hardclock() rather than scheduling an SWI. - Remove the COM_LOCK simplelock and replace it with a clock_lock spin mutex. Since the spin mutex already handles disabling/restoring interrupts appropriately, this also lets us axe all the *_intr() fu. - Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use temporary fast interrupts for the APIC trial. - Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending signals in hardclock() that are to be delivered in ast(). Submitted by: jakeb (making statclock safe in a fast interrupt) Submitted by: cp (concept of delaying signals until ast())
# a91b7dc1	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Various whitespace cleanups after the SMPng commit, which jumbled things around a bit in the trap handling code.
# 0e2aab12	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Don't treat a kernel stack fault the same as a general protect fault or a segment not present fault in the non-vm86 case.
# 9c15b3c1	12-Sep-2000	Bruce Evans <bde@FreeBSD.org>	Fixed hang on booting with -d. mtx_enter() was called on an uninitialized lock. The quick fix in trap.c was not quite the version tested and had no effect; back it out.
# bbbb2579	12-Sep-2000	Bruce Evans <bde@FreeBSD.org>	Quick fix for hang on booting with -d. mtx_enter() was called before curproc was initialized. curproc == NULL was interpreted as matching the process holding Giant... Just skip mtx_enter() and mtx_exit() in trap() if (curproc == NULL && cold) (&& cold for safety).
# 0384fff8	06-Sep-2000	Jason Evans <jasone@FreeBSD.org>	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
# c206a860	06-Aug-2000	Paul Saab <ps@FreeBSD.org>	Change the behavior of isa_nmi to log an error message instead of panicing and return a status so that we can decide whether to drop into DDB or panic. If the status from isa_nmi is true, panic the kernel based on machdep.panic_on_nmi, otherwise if DDB is enabled, drop to DDB based on machdep.ddb_on_nmi. Reviewed by: peter, phk
# 3fb50adb	31-Jul-2000	Luoqi Chen <luoqi@FreeBSD.org>	Handle write page faults (both write only or read-modify-write) as MI vm write-only faults. This would allow write-only mmapped regions to function correctly.
# 88f675ba	14-Jul-2000	Paul Saab <ps@FreeBSD.org>	Change the way NMI's are handled. Before, if DDB was enabled and a NMI occured, you could type continue in DDB and the kernel would not attempt to detect what type of NMI was recieved. Now we check for the type of NMI first and then go to DDB if it is enabled. This will solve the problem with having DDB enabled and getting an NMI due to some possibly bad error and being able to continue the operation of the kernel when you really want to panic and know what happened. Submitted by: jhb
# c6d3f3bf	30-Jun-2000	Brian S. Dean <bsd@FreeBSD.org>	Fix my own style bugs (use of spaces instead of tabs for indentation). This is a style-only change.
# 36e9f877	28-Mar-2000	Matthew Dillon <dillon@FreeBSD.org>	Commit major SMP cleanups and move the BGL (big giant lock) in the syscall path inward. A system call may select whether it needs the MP lock or not (the default being that it does need it). A great deal of conditional SMP code for various deadended experiments has been removed. 'cil' and 'cml' have been removed entirely, and the locking around the cpl has been removed. The conditional separately-locked fast-interrupt code has been removed, meaning that interrupts must hold the CPL now (but they pretty much had to anyway). Another reason for doing this is that the original separate-lock for interrupts just doesn't apply to the interrupt thread mechanism being contemplated. Modifications to the cpl may now ONLY occur while holding the MP lock. For example, if an otherwise MP safe syscall needs to mess with the cpl, it must hold the MP lock for the duration and must (as usual) save/restore the cpl in a nested fashion. This is precursor work for the real meat coming later: avoiding having to hold the MP lock for common syscalls and I/O's and interrupt threads. It is expected that the spl mechanisms and new interrupt threading mechanisms will be able to run in tandem, allowing a slow piecemeal transition to occur. This patch should result in a moderate performance improvement due to the considerable amount of code that has been removed from the critical path, especially the simplification of the spl*() calls. The real performance gains will come later. Approved by: jkh Reviewed by: current, bde (exception.s) Some work taken from: luoqi's patch
# 6d9a8d3e	02-Mar-2000	Peter Dufault <dufault@FreeBSD.org>	I applied the wrong patch set. Back out anything associated with the known bogus currtpriority. This undoes the previous changes to sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h Now we have the patch set approved by bde. Approved by: bde
# 383774c4	02-Mar-2000	Peter Dufault <dufault@FreeBSD.org>	Patches that eliminate extra context switches in FIFO case. Fixes p1003_1b regression test in the simple case of no RR and FIFO processes competing. Reviewed by: jkh, bde
# de8050f9	20-Feb-2000	Brian S. Dean <bsd@FreeBSD.org>	Don't forget to reset the hardware debug registers when a process that was using them exits. Don't allow a user process to cause the kernel to take a TRCTRAP on a user space address. Reviewed by: jlemon, sef Approved by: jkh
# 35e61cbd	11-Jan-2000	Kazutaka YOKOTA <yokota@FreeBSD.org>	Add a new mechanism, cndbctl(), to tell the console driver that ddb is entered. Don't refer to `in_Debugger' to see if we are in the debugger. (The variable used to be static in Debugger() and wasn't updated if ddb is entered via traps and panic anyway.) - Don't refer to `in_Debugger'. - Add `db_active' to i386/i386/db_interface.d (as in alpha/alpha/db_interface.c). - Remove cnpollc() stub from ddb/db_input.c. - Add the dbctl function to syscons, pcvt, and sio. (The function for pcvt and sio is noop at the moment.) Jointly developed by: bde and me (The final version was tweaked by me and not reviewed by bde. Thus, if there is any error in this commit, that is entirely of mine, not his.) Some changes were obtained from: NetBSD
# b5616833	08-Nov-1999	Alan Cox <alc@FreeBSD.org>	Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It should be "VM_FAULT_NORMAL".
# 923502ff	29-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# a7674320	25-Jul-1999	Martin Cracauer <cracauer@FreeBSD.org>	On FPU exceptions, pass a useful error code (one of the FPE_... macros) to the signal handler, for old-style BSD signal handlers as the second (int) argument, for SA_SIGINFO signal handlers as siginfo_t->si_code. This is source-compatible with Solaris, except that we have no <siginfo.h> (which isn't even mentioned in POSIX 1003.1b). An rather complete example program is at http://www3.cons.org/cracauer/freebsd-signal.c This will be added to the regression tests in src/. This commit also adds code to disable the (hardware) FPU from userconfig, so that you can use a software FP emulator on a machine that has hardware floating point. See LINT.
# 50045fbc	18-Jun-1999	Bruce Evans <bde@FreeBSD.org>	Changed the global `idt' from an array to a pointer so that npx.c automatically hacks on the active copy of the IDT if f00f_hack() has changed it. This also allows simplifications in setidt(). This fixes breakage of FP exception handling by rev.1.55 of sys/kernel.h. FP exceptions were sent to npx.c's probe handlers because npx.c "restored" the old handlers to the wrong copy of the IDT. The SYSINIT for f00f_hack() was purposely run quite late to avoid problems like this, but it is bogusly associated with the SYSINIT for proc0 so it was moved with the latter. Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>
# eb9d435a	01-Jun-1999	Jonathan Lemon <jlemon@FreeBSD.org>	Unifdef VM86. Reviewed by: silence on on -current
# dfd5dee1	06-May-1999	Peter Wemm <peter@FreeBSD.org>	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
# 5206bca1	27-Apr-1999	Luoqi Chen <luoqi@FreeBSD.org>	Enable vmspace sharing on SMP. Major changes are, - %fs register is added to trapframe and saved/restored upon kernel entry/exit. - Per-cpu pages are no longer mapped at the same virtual address. - Each cpu now has a separate gdt selector table. A new segment selector is added to point to per-cpu pages, per-cpu global variables are now accessed through this new selector (%fs). The selectors in gdt table are rearranged for cache line optimization. - fask_vfork is now on as default for both UP and SMP. - Some aio code cleanup. Reviewed by: Alan Cox <alc@cs.rice.edu> John Dyson <dyson@iquest.net> Julian Elischer <julian@whistel.com> Bruce Evans <bde@zeta.org.au> David Greenman <dg@root.com>
# db42d908	19-Apr-1999	Peter Wemm <peter@FreeBSD.org>	unifdef -DVM_STACK - it's been on for a while for x86 and was checked and appeared to be working for the Alpha some time ago.
# a2210fe1	09-Mar-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Make TIMER_FREQ a normal, undocumented option. Raise confusion to a higher level with example in LINT. Clarify comment about PPS_SYNC. Ignore for now that it doesn't work in FLL mode, it will in a few days.
# 2267af78	06-Jan-1999	Julian Elischer <julian@FreeBSD.org>	Add (but don't activate) code for a special VM option to make downward growing stacks more general. Add (but don't activate) code to use the new stack facility when running threads, (specifically the linux threads support). This allows people to use both linux compiled linuxthreads, and also the native FreeBSD linux-threads port. The code is conditional on VM_STACK. Not using this will produce the old heavily tested system. Submitted by: Richard Seaman <dick@tar.com>
# 9959b1a8	28-Dec-1998	Mike Smith <msmith@FreeBSD.org>	Improved DDB_UNATTENDED behaviour. From the submitter: There's something that's been bugging me for a while, so I decided to fix it. FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least in my opinion. The behavior change is such that: 1. Nothing changes when debugger_on_panic != 0. 2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the machine will reboot. Also, if a trap occurs, the machine will panic and reboot, unlike how it broke to DDB before. HOWEVER, a trap inside DDB will not cause a panic, allowing full use of DDB without having to worry about the machine being stuck at a DDB prompt if something goes wrong during the day. Patches for this behavior follow my signature, and it would be a boon to anyone (like me) who uses DDB_UNATTENDED, but actually wants the machine to panic on a trap (otherwise, what's the use, if the machine causes a fatal trap rather than a true panic, of debugger_on_panic?). The changes cause no adverse behavior, but do involve two symbols becoming global Submitted by: Brian Feldman <green@unixhelp.org>
# 4f2129fa	16-Dec-1998	Bruce Evans <bde@FreeBSD.org>	Removed bogus casts of USRSTACK and/or the other operand in binary expressions involving USRSTACK.
# 2326715f	05-Dec-1998	Archie Cobbs <archie@FreeBSD.org>	Avoid compiler warning (printf arg type mismatch) when compiling #ifdef DEBUG
# 9ad861ed	02-Dec-1998	KATO Takenori <kato@FreeBSD.org>	- For some old Cyrix CPUs, %cr2 is clobbered by interrupts. This problem is worked around by using an interrupt gate for the page fault handler. This code was originally made for NetBSD/pc98 by Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already been in PC98 tree. Because of this bug, trap_fatal cannot show correct page fault address if %cr2 is obtained in this function. Therefore, trap_fatal uses the value from trap() function. - The trap handler always enables interruption when buggy application or kernel code has disabled interrupts and then trapped. This code was prepared by Bruce Evans <bde@FreeBSD.org>. Submitted by: Bruce Evans <bde@FreeBSD.org> Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>
# 1fcee469	23-Aug-1998	Bruce Evans <bde@FreeBSD.org>	Fixed printf format errors.
# 288078be	28-Apr-1998	Eivind Eklund <eivind@FreeBSD.org>	Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under Linux emulation. This make Allegro Common Lisp 4.3 work under FreeBSD! Submitted by: Fred Gilham <gilham@csl.sri.com> Commented on by: bde, dg, msmith, tg Hoping he got everything right: eivind
# c1087c13	15-Apr-1998	Bruce Evans <bde@FreeBSD.org>	Support compiling with `gcc -ansi'.
# 227ee8a1	30-Mar-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde
# 08637435	28-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Moved some #includes from <sys/param.h> nearer to where they are actually used.
# 640c4313	23-Mar-1998	Jonathan Lemon <jlemon@FreeBSD.org>	Add the ability to make real-mode BIOS calls from the kernel. Currently, everything is contained inside #ifdef VM86, so this option must be present in the config file to use this functionality. Thanks to Tor Egge, these changes should work on SMP machines. However, it may not be throughly SMP-safe. Currently, the only BIOS calls made are memory-sizing routines at bootup, these replace reading the RTC values.
# 0b08f5f7	05-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Back out DIAGNOSTIC changes.
# 47cfdb16	04-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Turn DIAGNOSTIC into a new-style option.
# e0d781f3	30-Jan-1998	Eivind Eklund <eivind@FreeBSD.org>	Make POWERFAIL_NMI, PPS_SYNC and NATM new style options. This also fixes a couple of defunct options; submitted by bde.
# 2a024a2b	05-Dec-1997	Sean Eric Fagan <sef@FreeBSD.org>	Changes to allow event-based process monitoring and control.
# 4d9deedb	04-Dec-1997	John-Mark Gurney <jmg@FreeBSD.org>	document and make the NO_F00F_HACK a proper option... also, sort some option includes while I'm here.. Forgotten by: sef
# e41b6f2d	04-Dec-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	After consultation with David, change #ifndef NO_F00F_HACK to #if defined(I586_CPU) && !defined(NO_F00F_HACK)
# c4fbf277	02-Dec-1997	Sean Eric Fagan <sef@FreeBSD.org>	Work around for the Intel Pentium F00F bug; this is Intel's recommended workaround. Note that this currently eats up two pages extra in the system; this could be alleviated by aligning idt correctly, and then only dealing with that (as opposed to the current method of allocated two pages and copying the IDT table to that, and then setting that to be the IDT table).
# 21e52415	24-Nov-1997	Bruce Evans <bde@FreeBSD.org>	Fixed some #include messes. Hid the check of the user %cs in syscall() under `#ifdef DIAGNOSTIC'.
# cb226aaa	06-Nov-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
# b67dffda	09-Oct-1997	Peter Wemm <peter@FreeBSD.org>	Compensate for pcb.h tweaks. (Bruce pointed out the nesting)
# 98823b23	10-Oct-1997	Peter Wemm <peter@FreeBSD.org>	Convert the VM86 option from a global option to an option only depended on by the files that use it. Changing the VM86 option now only causes a recompile of a dozen files or so rather than the entire kernel.
# 91942903	21-Sep-1997	Justin T. Gibbs <gibbs@FreeBSD.org>	autoconf.c: Add cpu_rootconf and cpu_dumpconf so that configuring these two devices can be better controlled by the MI configuration code. machdep.c: MD initialization code for the new callout interface. trap.c: Add support for printing out whether cam interrupts are masked during a panic.
# 279a6932	05-Sep-1997	Peter Wemm <peter@FreeBSD.org>	Cosmetic adjustment for the trap/double fault/panic cpu id listing. It now prints the apic id in hex rather than decimal.
# 5f073933	28-Aug-1997	Jonathan Lemon <jlemon@FreeBSD.org>	Remove the vm86 support as an LKM, and link it directly into the kernel if 'options "VM86"' is in the config file. The LKM was really for development, and has probably outlived its usefulness.
# 9a3b3e8b	26-Aug-1997	Peter Wemm <peter@FreeBSD.org>	Clean up the SMP AP bootstrap and eliminate the wretched idle procs. - We now have enough per-cpu idle context, the real idle loop has been revived (cpu's halt now with nothing to do). - Some preliminary support for running some operations outside the global lock (eg: zeroing "free but not yet zeroed pages") is present but appears to cause problems. Off by default. - the smp_active sysctl now behaves differently. It's merely a 'true/false' option. Setting smp_active to zero causes the AP's to halt in the idle loop and stop scheduling processes. - bootstrap is a lot safer. Instead of sharing a statically compiled in stack a number of times (which has caused lots of problems) and then abandoning it, we use the idle context to boot the AP's directly. This should help >2 cpu support since the bootlock stuff was in doubt. - print physical apic id in traps.. helps identify private pages getting out of sync. (You don't want to know how much hair I tore out with this!) More cleanup to follow, this is more of a checkpoint than a 'finished' thing.
# 40d50994	21-Aug-1997	Philippe Charnier <charnier@FreeBSD.org>	Revert my previous commit about using CS_SECURE macro. Requested by: Bruce.
# 7b185ef8	19-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Preperation for moving cpl into critical region access. Several new fine-grained locks. New FAST_INTR() methods: - separate simplelock for FAST_INTR, no more giant lock. - FAST_INTR()s no longer checks ipending on way out of ISR. sio made MP-safe (I hope).
# 15f35491	18-Aug-1997	Philippe Charnier <charnier@FreeBSD.org>	Use CS_SECURE macro. Reviewed by: John Dyson
# 0b6e0f74	12-Aug-1997	John Dyson <dyson@FreeBSD.org>	Back out a part of the disk scheduling "improvements" :-(. Let me know how the system works now!!!
# c0ecffb9	09-Aug-1997	John Dyson <dyson@FreeBSD.org>	Modify the scheduling policy to take into account disk I/O waits as chargeable CPU usage. This should mitigate the problem of processes doing disk I/O hogging the CPU. Various users have reported the problem, and test code shows that the problem should now be gone.
# 48a09cf2	08-Aug-1997	John Dyson <dyson@FreeBSD.org>	VM86 kernel support. Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>, Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>, and probably alot of others. Submitted by: Jnathan Lemon <jlemon@americantv.com>
# e31521c3	20-Jul-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# b3196e4b	22-Jun-1997	Peter Wemm <peter@FreeBSD.org>	Preliminary support for per-cpu data pages. This eliminates a lot of #ifdef SMP type code. Things like _curproc reside in a data page that is unique on each cpu, eliminating the expensive macros like: #define curproc (SMPcurproc[cpunumber()]) There are some unresolved bootstrap and address space sharing issues at present, but Steve is waiting on this for other work. There is still some strictly temporary code present that isn't exactly pretty. This is part of a larger change that has run into some bumps, this part is standalone so it should be safe. The temporary code goes away when the full idle cpu support is finished. Reviewed by: fsmp, dyson
# 7b3c8424	06-Jun-1997	Bruce Evans <bde@FreeBSD.org>	Preserve %fs and %gs across context switches. This has a relatively low cost since it is only done in cpu_switch(), not for every exception. The extra state is kept in the pcb, and handled much like the npx state, with similar deficiencies (the state is not preserved across signal handlers, and error handling loses state).
# 68352337	02-Jun-1997	Doug Rabson <dfr@FreeBSD.org>	Move interrupt handling code from isa.c to a new file. This should make isa.c (slightly) more portable and will make my life developing the really portable version much easier. Reviewed by: peter, fsmp
# 5400ed3b	31-May-1997	Peter Wemm <peter@FreeBSD.org>	Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add <machine/ipl.h> to those files that were depending on getting SWI_* implicitly via <machine/cpufunc.h>
# ae924961	28-May-1997	Peter Wemm <peter@FreeBSD.org>	remove opt_smp.h and fix the reason it was needed.
# 835834c0	07-May-1997	Peter Wemm <peter@FreeBSD.org>	md_regs is now a struct trapframe *
# b332d9a6	04-May-1997	John Dyson <dyson@FreeBSD.org>	Make sure that *fork() always returns with %edx == 1 in the child. This was sometimes not happening correctly during my threads code work.
# 477a642c	26-Apr-1997	Peter Wemm <peter@FreeBSD.org>	Man the liferafts! Here comes the long awaited SMP -> -current merge! There are various options documented in i386/conf/LINT, there is more to come over the next few days. The kernel should run pretty much "as before" without the options to activate SMP mode. There are a handful of known "loose ends" that need to be fixed, but have been put off since the SMP kernel is in a moderately good condition at the moment. This commit is the result of the tinkering and testing over the last 14 months by many people. A special thanks to Steve Passe for implementing the APIC code!
# 58611a61	14-Apr-1997	Bruce Evans <bde@FreeBSD.org>	Fixed printing of registers in dbflalt_handler(). The registers were always in a tss; that tss just changed from the one in the pcb to common_tss (who knows where it was when there was no curpcb?). Not using the pcb also fixed the problem that there is no pcb in idle(), so we now always get useful register values.
# a2a1c95c	07-Apr-1997	Peter Wemm <peter@FreeBSD.org>	The biggie: Get rid of the UPAGES from the top of the per-process address space. (!) Have each process use the kernel stack and pcb in the kvm space. Since the stacks are at a different address, we cannot copy the stack at fork() and allow the child to return up through the function call tree to return to user mode - create a new execution context and have the new process begin executing from cpu_switch() and go to user mode directly. In theory this should speed up fork a bit. Context switch the tss_esp0 pointer in the common tss. This is a lot simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer to each process's tss since the esp0 pointer is a 32 bit pointer, and the sd_base setting is split into three different bit sections at non-aligned boundaries and requires a lot of twiddling to reset. The 8K of memory at the top of the process space is now empty, and unmapped (and unmappable, it's higher than VM_MAXUSER_ADDRESS). Simplity the pmap code to manage process contexts, we no longer have to double map the UPAGES, this simplifies and should measuably speed up fork(). The following parts came from John Dyson: Set PG_G on the UPAGES that are now in kernel context, and invalidate them when swapping them out. Move the upages object (upobj) from the vmspace to the proc structure. Now that the UPAGES (pcb and kernel stack) are out of user space, make rfork(..RFMEM..) do what was intended by sharing the vmspace entirely via reference counting rather than simply inheriting the mappings.
# 271b264e	07-Apr-1997	Peter Wemm <peter@FreeBSD.org>	No longer use an i386tss as the basis of our pcb - it wasn't particularly convenient and makes life difficult for my next commit. We still need an i386tss to point to for the tss slot in the gdt, so we use a common tss shared between all processes. Note that this is going to break debugging until this series of commits is finished. core dumps will change again too. :-( we really need a more modern core dump format that doesn't depend on the pcb/upages. This change makes VM86 mode harder, but the following commits will remove a lot of constraints for the VM86 system, including the possibility of extending the pcb for an IO port map etc. Obtained from: bde
# a04c970a	05-Apr-1997	John Dyson <dyson@FreeBSD.org>	Fix the gdb executable modify problem. Thanks to the detective work by Alan Cox <alc@cs.rice.edu>, and his description of the problem. The bug was primarily in procfs_mem, but the mistake likely happened due to the lack of vm system support for the operation. I added better support for selective marking of page dirty flags so that vm_map_pageable(wiring) will not cause this problem again. The code in procfs_mem is now less bogus (but maybe still a little so.)
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 7e64cb7a	22-Jan-1997	John Dyson <dyson@FreeBSD.org>	Remove some dead code from trapwrite. Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 959c0278	18-Dec-1996	Bruce Evans <bde@FreeBSD.org>	Only handle copyin/out/etc faults when not in an interrupt handler. This makes unexpected faults (in an interrupt handler) more likely to crash properly. It could be done even better (more robustly and more efficiently) using lazy fault handling.
# f313170d	10-Sep-1996	Bruce Evans <bde@FreeBSD.org>	Updated #includes to 4.4Lite style.
# eaed8903	01-Sep-1996	David Greenman <dg@FreeBSD.org>	Change an splclock that needs to be an splhigh into an splhigh. Reviewed by: bde
# 11282a57	11-Aug-1996	David Greenman <dg@FreeBSD.org>	Add support for i686 machine check trap.
# f3460ead	12-Jul-1996	Bruce Evans <bde@FreeBSD.org>	Fixed cloned comments about npx traps to match context.
# 79df6d85	25-Jun-1996	Bruce Evans <bde@FreeBSD.org>	trap.c: Fixed profiling of system times. It was pre-4.4Lite and didn't support statclocks. System times were too small by a factor of 8. Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead of addupc(). Call addupc_task() directly instead of using the ADDUPC() macro. Removed vestigial support for PROFTIMER. switch.s: Removed addupc(). resourcevar.h: Removed ADDUPC() and declarations of addupc(). cpu.h: Updated a comment. i386's never were tahoe's, and the deferred profiling tick became (possibly) multiple ticks in 4.4Lite. Obtained from: mostly from NetBSD
# d7629dff	13-Jun-1996	Satoshi Asami <asami@FreeBSD.org>	A fast memory copy for Pentiums using floating point registers. It is called from copyin and copyout. The new routine is conditioned on I586_CPU and I586_FAST_BCOPY, so you need options "I586_FAST_BCOPY" (quotes essenstial) in your kernel config file. Also, if you have other kernel types configured in your kernel, an additional check to make sure it is running on a Pentium is inserted. (It is not clear why it doesn't help on P6s, it may be just that the Orion chipset doesn't prefetch as efficiently as Tritons and friends.) Bruce can now hack this away. :)
# c23670e2	11-Jun-1996	Gary Palmer <gpalmer@FreeBSD.org>	Clean up -Wunused warnings. Reviewed by: bde
# b18bfc3d	17-May-1996	John Dyson <dyson@FreeBSD.org>	This set of commits to the VM system does the following, and contain contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>, Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me: More usage of the TAILQ macros. Additional minor fix to queue.h. Performance enhancements to the pageout daemon. Addition of a wait in the case that the pageout daemon has to run immediately. Slightly modify the pageout algorithm. Significant revamp of the pmap/fork code: 1) PTE's and UPAGES's are NO LONGER in the process's map. 2) PTE's and UPAGES's reside in their own objects. 3) TOTAL elimination of recursive page table pagefaults. 4) The page directory now resides in the PTE object. 5) Implemented pmap_copy, thereby speeding up fork time. 6) Changed the pv entries so that the head is a pointer and not an entire entry. 7) Significant cleanup of pmap_protect, and pmap_remove. 8) Removed significant amounts of machine dependent fork code from vm_glue. Pushed much of that code into the machine dependent pmap module. 9) Support more completely the reuse of already zeroed pages (Page table pages and page directories) as being already zeroed. Performance and code cleanups in vm_map: 1) Improved and simplified allocation of map entries. 2) Improved vm_map_copy code. 3) Corrected some minor problems in the simplify code. Implemented splvm (combo of splbio and splimp.) The VM code now seldom uses splhigh. Improved the speed of and simplified kmem_malloc. Minor mod to vm_fault to avoid using pre-zeroed pages in the case of objects with backing objects along with the already existant condition of having a vnode. (If there is a backing object, there will likely be a COW... With a COW, it isn't necessary to start with a pre-zeroed page.) Minor reorg of source to perhaps improve locality of ref.
# 4e489ec4	27-Mar-1996	John Dyson <dyson@FreeBSD.org>	Remove a now unnecessary prototype from pmap.c. Also remove now unnecessary vm_fault's of page table pages in trap.c.
# ba00d77a	27-Mar-1996	Bruce Evans <bde@FreeBSD.org>	Print stack pointer and frame pointer in trap messages. Fixed "trace/trap" message. Reviewed by: davidg
# d66a5066	02-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Mega-commit for Linux emulator update.. This has been stress tested under netscape-2.0 for Linux running all the Java stuff. The scrollbars are now working, at least on my machine. (whew! :-) I'm uncomfortable with the size of this commit, but it's too inter-dependant to easily seperate out. The main changes: COMPAT_LINUX is GONE. Most of the code has been moved out of the i386 machine dependent section into the linux emulator itself. The int 0x80 syscall code was almost identical to the lcall 7,0 code and a minor tweak allows them to both be used with the same C code. All kernels can now just modload the lkm and it'll DTRT without having to rebuild the kernel first. Like IBCS2, you can statically compile it in with "options LINUX". A pile of new syscalls implemented, including getdents(), llseek(), readv(), writev(), msync(), personality(). The Linux-ELF libraries want to use some of these. linux_select() now obeys Linux semantics, ie: returns the time remaining of the timeout value rather than leaving it the original value. Quite a few bugs removed, including incorrect arguments being used in syscalls.. eg: mixups between passing the sigset as an int, vs passing it as a pointer and doing a copyin(), missing return values, unhandled cases, SIOC* ioctls, etc. The build for the code has changed. i386/conf/files now knows how to build linux_genassym and generate linux_assym.h on the fly. Supporting changes elsewhere in the kernel: The user-mode signal trampoline has moved from the U area to immediately below the top of the stack (below PS_STRINGS). This allows the different binary emulations to have their own signal trampoline code (which gets rid of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so that the emulator can provide the exact "struct sigcontext *" argument to the program's signal handlers. The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which have the same values as the re-used SA_DISABLE and SA_ONSTACK which are intended for sigaction only. This enables the support of a SA_RESETHAND flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal semantics where the signal handler is reset when it's triggered. makesyscalls.sh no longer appends the struct sysentvec on the end of the generated init_sysent.c code. It's a lot saner to have it in a seperate file rather than trying to update the structure inside the awk script. :-) At exec time, the dozen bytes or so of signal trampoline code are copied to the top of the user's stack, rather than obtaining the trampoline code the old way by getting a clone of the parent's user area. This allows Linux and native binaries to freely exec each other without getting trampolines mixed up.
# 3eb77c83	24-Feb-1996	John Dyson <dyson@FreeBSD.org>	Fix a problem with tracking the modified bit. Eliminate the ugly inline-asm code, and speed up the page-table-page tracking.
# bd7e5f99	18-Jan-1996	John Dyson <dyson@FreeBSD.org>	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
# 0e41ee30	04-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Convert DDB to new-style option.
# db6a20e2	03-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Converted two options over to the new scheme: USER_LDT and KTRACE.
# c9681930	19-Dec-1995	David Greenman <dg@FreeBSD.org>	Corrected a typo in a comment.
# 2838c968	19-Dec-1995	David Greenman <dg@FreeBSD.org>	Implemented a (sorely needed for years) double fault handler to catch stack overflows. It sure would be nice if there was an unmapped page between the PCB and the stack (and that the size of the stack was configurable!). With the way things are now, the PCB will get clobbered before the double fault handler gets control, making somewhat of a mess of things. Despite this, it is still fairly easy to poke around in the overflowed stack to figure out the cause.
# b1529bda	14-Dec-1995	Peter Wemm <peter@FreeBSD.org>	GENERIC/LINT: Remove redundant quoting on some option lines. LINT: add a couple of new/missing/undocumented options files.i386: add linux code so that you can compile a kernel with static linux emulation ("options LINUX") i386/*: use #if defined(COMPAT_LINUX) \|\| defined(LINUX) to enable static support of linux emulation (just like "IBCS2" makes ibcs2 static) The main thing this is going to make obvious, is that the LINUX code (when compiled from LINT) has a lot of warnings, some of which dont look too pleasant..
# 5e463408	14-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Make math_emulators LKMable.
# 7dfe504f	09-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Remove various unused symbols and procedures.
# efeaf95a	06-Dec-1995	David Greenman <dg@FreeBSD.org>	Untangled the vm.h include file spaghetti.
# 4ccc87c5	28-Oct-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused functions and variables, make things static, and other cleanups.
# 029b0fc8	08-Oct-1995	Bruce Evans <bde@FreeBSD.org>	Fix tracing of syscalls. The previous fix required the undocumented option DDB_NO_LCALLS to stop ddb getting control and broke all ddb tracing. Now there is no option and no way for ddb to trace at address _Xsyscall or to _Xsyscall, but tracing everywhere else works. The previous fix did unnecessary things for Linux syscalls. Don't bother checking that syscall frames are for user mode. Make debugger traps inside the kernel (except at addresses _Xsyscall and _Xsyscall+1) fatal if ddb is not configured. They "can't happen". Add prototypes. Remove stupid comments, e.g., /ARGSUSED/ for args that are used.
# 00c6cada	04-Oct-1995	Julian Elischer <julian@FreeBSD.org>	Submitted by: Juergen Lock <nox@jelal.hb.north.de> Obtained from: other people on the net ? 1. stepping over syscalls (gdb ni) sends you to DDB, and returned to the wrong address afterwards, with or without DDB. patch in i386/i386/trap.c below. 2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC, re-applied a patch posted to one of the lists...
# 4219d2b2	21-Aug-1995	David Greenman <dg@FreeBSD.org>	A couple of micro optimizations to improve NULL syscall performance by about 2%.
# a705fd3e	30-Jul-1995	David Greenman <dg@FreeBSD.org>	Fix a bug in my disabled version of trap_pfault()...curpcb may be NULL even when curproc isn't. This condition occurs at system startup and perhaps at other times.
# 1174c7d1	16-Jul-1995	Peter Wemm <peter@FreeBSD.org>	This fixes a compiler warning, and a cosmetic problem with the linux emul code when compiling with "options KTRACE". ktrsyscall() was expecting an array of integers, this was passing the address of a structure containing an array of integers.. The cosmetic problem was that it was calling the "enter syscall" trace hook twice - this looks like a cut/paste error/typo.
# 446cee6e	16-Jul-1995	Joerg Wunsch <joerg@FreeBSD.org>	Include ``options POWERFAIL_NMI'' for owners of older (non-apm) notebooks where a powerfail condition (external power drop; battery state low) is signalled by an NMI. Makes it beep instead of panicing. Reviewed by: davidg
# 9e951f36	15-Jul-1995	David Greenman <dg@FreeBSD.org>	Truncate the fault address to a page boundry when calling vm_fault(). The last change to fix the fault-twice bug with page tables wasn't quite complete.
# 4a67eb71	14-Jul-1995	David Greenman <dg@FreeBSD.org>	Fixed bug that caused page tables to be faulted twice instead of once. Submitted by: John Dyson
# d3628763	11-Jun-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Merge RELENG_2_0_5 into HEAD
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# f550a707	21-Mar-1995	David Greenman <dg@FreeBSD.org>	Added a new version of trap_pfault() that disallows kernel page faults to the user address space unless pcb_onfault is set. The code is currently commented out because iBCS2 and process debugging parts of the kernel need to be changed/fixed first.
# c6d5f3ac	21-Mar-1995	David Greenman <dg@FreeBSD.org>	Changed some #ifdef DIAGNOSTIC code that I added to be #ifdef DEBUG.
# b5e8ce9f	16-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
# 1e1e0b44	14-Feb-1995	Søren Schmidt <sos@FreeBSD.org>	First attempt to run linux binaries. This is only the changes needed to the generic kernel. The actual emulator is a separate LKM. (not finished yet, sorry). Submitted by: sos@freebsd.org & sef@kithrup.com
# b1e4a738	09-Feb-1995	David Greenman <dg@FreeBSD.org>	Removed unnecessary check for pr_scale in the AST/OWEUPC case.
# 5a32829d	09-Feb-1995	David Greenman <dg@FreeBSD.org>	Check P_PROFIL flag for profiling rather than pr_scale as it makes more sense.
# fbdfe8ac	24-Jan-1995	David Greenman <dg@FreeBSD.org>	Changed buffer allocation policy (machdep.c) Moved various pmap 'bit' test/set functions back into real functions; gcc generates better code at the expense of more of it. (pmap.c) Fixed a deadlock problem with pv entry allocations (pmap.c) Added a new, optional function 'pmap_prefault' that does clustered page table preloading (pmap.c) Changed the way that page tables are held onto (trap.c). Submitted by: John Dyson
# 20415301	14-Jan-1995	Bruce Evans <bde@FreeBSD.org>	Fix security holes in sigreturn(), ptrace() and procfs. sigreturn() attempted to check for insecure and fatal eflags and segment selectors, but missed many cases and got the IOPL check back to front. The other syscalls didn't check at all. sys_process.c, machdep.c: Only allow PT_WRITE_U to write to the registers (ordinary and FP). psl.h, locore.s, machdep.c: Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed to assume anything about the reserved bits. Use PSL_USERCHANGE and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER. exception.s: Define a private label for use by doreti when returning to user mode fails. machdep.c: In syscalls, allow changing only the eflags that can be changed on 486's in user mode (no longer attempt to allow benign IOPL changes; allow changing the nasty PSL_NT; don't allow changing the i586 bits). Don't attempt to check all the cases involving invalid selectors and %eip's. Just check for privilege violations and let the invalid things cause a trap. procfs_machdep.c: Call the ptrace register functions to do all the work for reading and writing ordinary registers and for single stepping. trap.c: Ignore traps caused by PSL_NT being set. Previously, users could cause a fatal trap in user mode by setting PSL_NT and executing an iret, and a fatal trap in kernel mode by setting PSL_NT and making a syscall. PSL_NT was cleared too late and not in enough modes to fix the problem. Make all traps in user mode (except T_NMI) nonfatal. Recover from traps caused by attempting to load invalid user registers in doreti by restarting the traps so that they appear to occur in user mode. --- Fix bogons that I noticed while fixing the above: psl.h: Fix some comments. Uniformize idempotency ifdef. exception.s, machdep.c: Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic non-unique) trap numbers. Replace rsvd[1-14] by rsvd. locore.s: Enable alignment check flag on 486's and 586's. machdep.c: Use a better type for kstack[]. Use TFREGP() to find the registers. Reformat ptrace functions from SEF to something closer to KNF. procfs_machdep.c: The wrong pointer to the registers got fixed as a side effect. Implement reading and writing of FP registers. /proc//regs now work (only) for processes that are in memory. Clean up comments. trap.c, trap.h: Remove unused trap types.
# 0d94caff	09-Jan-1995	David Greenman <dg@FreeBSD.org>	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
# 13d7c724	24-Dec-1994	Bruce Evans <bde@FreeBSD.org>	Obtained from: 1.1.5 Fix single-stepping of emulated FPU instructions. Don't panic if an FPU instruction is attempted but there is no FPU and no FPU emulator is configured.
# ab4bc4b2	30-Oct-1994	Bruce Evans <bde@FreeBSD.org>	Fix selector arg to match the (missing) prototype for sdtossd(). Cosmetic. Return from trap() if trap_fatal() returns. trap_fatal() isn't fatal if you have ddb. Returning from trap() is usually the right thing to do and much better than falling through.
# 09f7992a	20-Oct-1994	Garrett Wollman <wollman@FreeBSD.org>	Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).
# fabbd9b7	11-Oct-1994	Søren Schmidt <sos@FreeBSD.org>	Ouch, fixed bug in errno translation (ibcs2 support).
# 76d121f2	10-Oct-1994	Søren Schmidt <sos@FreeBSD.org>	Hmm, only translate errno when doing an actual return. Reviewed by: sef@freefall.cdrom.com
# c96f1293	09-Oct-1994	Søren Schmidt <sos@FreeBSD.org>	Updated to convert errno return in syscall if conversion tabel present.
# 3fb3086e	08-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	db_disasm.c: Unused var zapped. pmap.c: tons of unused vars zapped, various other warnings silenced. trap.c: unused vars zapped. vm_machdep.c: A wrong argument, which by chance did the right thing, was corrected.
# 22414e53	30-Sep-1994	David Greenman <dg@FreeBSD.org>	Laptop Advanced Power Management support by HOSOKAWA Tatsumi. Submitted by: HOSOKAWA Tatsumi
# f7d6afc6	11-Sep-1994	David Greenman <dg@FreeBSD.org>	Be more careful about dereferencing curproc, p_vmspace, and curpcb, otherwise the machine will overflow the stack in a recursive fault loop (causing the machine to spontaneously reboot because of the stack fault that ultimately happens). Submitted by: Inspired by Bruce Evans, but this change is different than what he suggested.
# fe7bb84c	08-Sep-1994	Bruce Evans <bde@FreeBSD.org>	Remove <machine/eflags.h> and all dependencies on it. eflags.h is just the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach version has slightly better names for many macros but is now out of date and little used. It was originally used even less (for spelling PSL_T as EFL_TF in <machine/db_machdep.h>).
# 406d45e0	28-Aug-1994	Bruce Evans <bde@FreeBSD.org>	Don't test if a u_int is < 0. The remaining test is sufficient and the extra one caused a warning.
# 8a129cae	27-Aug-1994	David Greenman <dg@FreeBSD.org>	1) Changed ddb into a option rather than a pseudo-device (use options DDB in your kernel config now). 2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its own file. 3) Added \r handing in db_printf. 4) Added missing memory usage stats to statclock(). 5) Added dummy function to pseudo_set so it will be emitted if there are no other pseudo declarations.
# f3f0ca60	24-Aug-1994	Søren Schmidt <sos@FreeBSD.org>	Changes preparing for iBCS support Reviewed by: Submitted by:
# f23b4c91	18-Aug-1994	Garrett Wollman <wollman@FreeBSD.org>	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
# 5c8b38d4	09-Aug-1994	Garrett Wollman <wollman@FreeBSD.org>	Handle NMI's in accordance with data in van Gilluwe book.
# 03e6c253	01-Aug-1994	David Greenman <dg@FreeBSD.org>	Removed all code related to the pagescan daemon, and changed 'act_count' adjustments to compensate for a world without the pagescan daemon.
# 8c481329	10-Jun-1994	David Greenman <dg@FreeBSD.org>	Fixed minor spelling error.
# 3c256f53	06-Jun-1994	David Greenman <dg@FreeBSD.org>	trap.c: Vastly improved trap.c from me. This rewritten version has a variety of features, amoung them: higher performance and much higher code quality. support.s, cpufunc.h: No longer use gs override to enforce range limits - compare directly against VM_MAXUSER_ADDRESS instead. The old way caused problems in preserving the gs selector...and this method is just as fast or faster.
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# 9f82ad3d	29-Apr-1994	Gary Clark II <gclarkii@FreeBSD.org>	Added ifdef for GPL_MATH_EMULATE to keep the sytem from panicing when using it.
# 28626748	07-Apr-1994	David Greenman <dg@FreeBSD.org>	Make Bruce happy: silently enter ddb on a BPT or trace trap if ddb is configured in the kernel.
# d2306226	02-Apr-1994	David Greenman <dg@FreeBSD.org>	New interrupt code from Bruce Evans. In additional to Bruce's attached list of changes, I've made the following additional changes: 1) i386/include/ipl.h renamed to spl.h as the name conflicts with the file of the same name in i386/isa/ipl.h. 2) changed all use of mask (i.e. netmask, biomask, ttymask, etc) to _imask (net_imask, etc). 3) changed vestige of splnet use in if_is to splimp. 4) got rid of "impmask" completely (Bruce had gotten rid of netmask), and are now using net_imask instead. 5) dozens of minor cruft to glue in Bruce's changes. These require changes I made to config(8) as well, and thus it must be rebuilt. -DG from Bruce Evans: sio: o No diff is supplied. Remove the define of setsofttty(). I hope that is enough. .s: o i386/isa/debug.h no longer exists. The event counters became too much trouble to maintain. All function call entry and exception entry counters can be recovered by using profiling kernel (the new profiling supports all entry points; however, it is too slow to leave enabled all the time; it also). Only BDBTRAP() from debug.h is now used. That is moved to exception.s. It might be worth preserving SHOW_BITS() and calling it from _mcount() (if enabled). o T_ASTFLT is now only set just before calling trap(). o All exception handlers set SWI_AST_MASK in cpl as soon as possible after entry and arrange for _doreti to restore it atomically with exiting. It is not possible to set it atomically with entering the kernel, so it must be checked against the user mode bits in the trap frame before committing to using it. There is no place to store the old value of cpl for syscalls or traps, so there are some complications restoring it. Profiling stuff (mostly in .s): o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet. o All interesting labels `foo' are renamed `_foo' and all uninteresting labels `_bar' are renamed `bar'. A small change to gprof allows ignoring labels not starting with underscores. o MCOUNT_LABEL() is to provide names for counters for times spent in exception handlers. o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception handlers. Its arg is the pc where the exception occurred. The new mcount() pretends that this was a call from that pc to a suitable MCOUNT_LABEL(). o MEXITCOUNT is to turn off any timer started by MCOUNT(). /usr/src/sys/i386/i386/exception.s: o The non-BDB BPTTRAP() macros were doing a sti even when interrupts were disabled when the trap occurred. The sti (fixed) sti is actually a no-op unless you have my changes to machdep.c that make the debugger trap gates interrupt gates, but fixing that would make the ifdefs messier. ddb seems to be unharmed by both interrupts always disabled and always enabled (I had the branch in the fix back to front for some time :-(). o There is no known pushal bug. o tf_err can be left as garbage for syscalls. /usr/src/sys/i386/i386/locore.s: o Fix and update BDE_DEBUGGER support. o ENTRY(btext) before initialization was dangerous. o Warm boot shot was longer than intended. /usr/src/sys/i386/i386/machdep.c: o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require other changes. Use the following: o Remove aston() and setsoftclock(). Maybe use the following: o No netisr.h. o Spelling fix. o Delay to read the Rebooting message. o Fix for vm system unmapping a reduced area of memory after bounds_check_with_label() reduces the size of a physical i/o for a partition boundary. A similar fix is required in kern_physio.c. o Correct use of __CONCAT. It never worked here for non- ANSI cpp's. Is it time to drop support for non-ANSI? o gdt_segs init. 0xffffffffUL is bogus because ssd_limit is not 32 bits. The replacement may have the same value :-), but is more natural. o physmem was one page too low. Confusing variable names. Don't use the following: o Better numbers of buffers. Each 8K page requires up to 16 buffer headers. On my system, this results in 5576 buffers containing [up to] 2854912 bytes of memory. The usual allocation of about 384 buffers only holds 192K of disk if you use it on an fs with a block size of 512. o gdt changes for bdb. o TGT -> IDT changes for bdb. o #ifdefed changes for bdb. /usr/src/sys/i386/i386/microtime.s: o Use the correct asm macros. I think asm.h was copied from Mach just for microtime and isn't used now. It certainly doesn't belong in <sys>. Various macros are also duplicated in sys/i386/boot.h and libc/i386/.h. o Don't switch to and from the IRR; it is guaranteed to be selected (default after ICU init and explicitly selected in isa.c too, and never changed until the old microtime clobbered it). /usr/src/sys/i386/i386/support.s: o Non-essential changes (none related to spls or profiling). o Removed slow loads of %gs again. The LDT support may require not relying on %gs, but loading it is not the way to fix it! Some places (copyin ...) forgot to load it. Loading it clobbers the user %gs. trap() still loads it after certain types of faults so that fuword() etc can rely on it without loading it explicitly. Exception handlers don't restore it. If we want to preserve the user %gs, then the fastest method is to not touch it except for context switches. Comparing with VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on a 486, while loading %gs takes 9 cycles and using it takes another. o Fixed a signed branch to unsigned. /usr/src/sys/i386/i386/swtch.s: o Move spl0() outside of idle loop. o Remove cli/sti from idle loop. sw1 does a cli, and in the unlikely event of an interrupt occurring and whichqs becoming zero, sw1 will just jump back to _idle. o There's no spl0() function in asm any more, so use splz(). o swtch() doesn't need to be superaligned, at least with the new mcounting. o Fixed a signed branch to unsigned. o Removed astoff(). /usr/src/sys/i386/i386/trap.c: o The decentralized extern decls were inconsistent, of course. o Fixed typo MATH_EMULTATE in comments. / o Removed unused variables. o Old netmask is now impmask; print it instead. Perhaps we should print some of the new masks. o BTW, trap() should not print anything for normal debugger traps. /usr/src/sys/i386/include/asmacros.h: o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros as necessary. /usr/src/sys/i386/include/cpu.h: o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal while the kernel is running. o Don't use var++ to set boolean variables. It fails after a mere 4G times :-) and is slower than storing a constant on [3-4]86s. /usr/src/sys/i386/include/cpufunc.h: o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of <machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by almost everything for the inlines. /usr/src/sys/i386/include/ipl.h: o New file. Defines spl inlines and SWI macros and declares most variables related to hard and soft interrupt masks. /usr/src/sys/i386/isa/icu.h: o Moved definitions to <machine/ipl.h> /usr/src/sys/i386/isa/icu.s: o Software interrupts (SWIs) and delayed hardware interrupts (HWIs) are now handled uniformally, and dispatching them from splx() is more like dispatching them from _doreti. The dispatcher is essentially (handler[ffs(ipending & ~cpl)](). o More care (not quite enough) is taken to avoid unbounded nesting of interrupts. o The interface to softclock() is changed so that a trap frame is not required. o Fast interrupt handlers are now handled more uniformally. Configuration is still too early (new handlers would require bits in <machine/ipl.h> and functions to vector.s). o splnnn() and splx() are no longer here; they are inline functions (could be macros for other compilers). splz() is the nontrivial part of the old splx(). /usr/src/sys/i386/isa/ipl.h o New file. Supposed to have only bus-dependent stuff. Perhaps the h/w masks should be declared here. /usr/src/sys/i386/isa/isa.c: o DON'T APPLY ALL OF THIS DIFF. You need only things involving mask and MASK and comments about them. netmask is now a pure software mask. It works like the softclock mask. /usr/src/sys/i386/isa/vector.s: o Reorganize AUTO_EOI macros. o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust fastintr handlers. o fastintr handlers need to metamorphose into ordinary interrupt handlers if their SWI bit has become set. Previously, sio had unintended latency for handling output completions and input of SLIP framing characters because this was not done. /usr/src/sys/net/netisr.h: o The machine-dependent stuff is now imported from <machine/ipl.h>. /usr/src/sys/sys/systm.h o DON'T APPLY ALL OF THIS DIFF. You need mainly the different splx() prototype. The spl*() prototypes are duplicated as inlines in <machine/ipl.h> but they need to be duplicated here in case there are no inlines. I sent systm.h and cpufunc.h to Garrett. We agree that spl0 should be replaced by splnone and not the other way around like I've done. /usr/src/sys/kern/kern_clock.c o splsoftclock() now lowers cpl so the direct call to softclock() works as intended. o softclock() interface changed to avoid passing the whole frame (some machines may need another change for profile_tick()). o profiling renamed _profiling to avoid ANSI namespace pollution. (I had to improve the mcount() interface and may as well fix it.) The GUPROF variant doesn't actually reference profiling here, but the 'U' in GUPROF should mean to select the microtimer mcount() and not change the interface.
# ed7fcbd0	24-Mar-1994	David Greenman <dg@FreeBSD.org>	From John Dyson: performance improvements to the new bounce buffer code.
# 943a66f3	14-Mar-1994	David Greenman <dg@FreeBSD.org>	Performance improvements from John Dyson. 1) A new mechanism has been added to prevent pages from being paged out called "vm_page_hold". Similar to vm_page_wire, but much lower overhead. 2) Scheduling algorithm has been changed to improve interactive performance. 3) Paging algorithm improved. 4) Some vnode and swap pager bugs fixed.
# 04f18356	07-Mar-1994	David Greenman <dg@FreeBSD.org>	1) "Pre-faulting" in of pages into process address space Eliminates vm_fault overhead on process startup and mmap referenced data for in-memory pages. (process startup time using in-memory segments much faster) 2) Even more efficient pmap code. Code partially cleaned up. More comments yet to follow. (generally more efficient pte management) 3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein clustering.) (much faster paging performance on non-write behind disk subsystems, slightly faster performance on other systems.) 4) Slightly changed vm_pageout code for more efficiency and better statistics. Also, resist swapout a little more. (less likely to pageout a recently used page) 5) Slight improvement to the page table page trap efficiency. (generally faster system VM fault performance) 6) Defer creation of unnamed anonymous regions pager until needed. (speeds up shared memory bss creation) 7) Remove possible deadlock from swap_pager initialization. 8) Enhanced procfs to provide "vminfo" about vm objects and user pmaps. 9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net & socket performance and to prepare for things to come. John Dyson dyson@implode.root.com David Greenman davidg@root.com
# b9d60b3f	08-Feb-1994	David Greenman <dg@FreeBSD.org>	Fixed bugs in stack grow code, and moved it back into a seperate function like it was originally. Also added back call to "grow" in sendsig now that this routine actually works.
# 0172c219	01-Feb-1994	David Greenman <dg@FreeBSD.org>	Minor cleanup. Decode state information better in the case of a fatal trap.
# d64f660f	17-Jan-1994	David Greenman <dg@FreeBSD.org>	Improvements mostly from John Dyson, with a little bit from me. * Removed pmap_is_wired * added extra cli/sti protection in idle (swtch.s) * slight code improvement in trap.c * added lots of comments * improved paging and other algorithms in VM system
# 7f8cb368	14-Jan-1994	David Greenman <dg@FreeBSD.org>	"New" VM system from John Dyson & myself. For a run-down of the major changes, see the log of any effected file in the sys/vm directory (swap_pager.c for instance).
# c8a13ecd	03-Jan-1994	David Greenman <dg@FreeBSD.org>	Convert syscall to trapframe. Based on work done by John Brezak.
# aaf08d94	18-Dec-1993	Garrett Wollman <wollman@FreeBSD.org>	Make everything compile with -Wtraditional. Make it easier to distribute a binary link-kit. Make all non-optional options (pagers, procfs) standard, and update LINT to reflect new symtab requirements. NB: -Wtraditional will henceforth be forgotten. This editing pass was primarily intended to detect any constructions where the old code might have been relying on traditional C semantics or syntax. These were all fixed, and the result of fixing some of them means that -Wall is now a realistic possibility within a few weeks.
# 6d01f02e	11-Dec-1993	David Greenman <dg@FreeBSD.org>	1) Added proc file system from Paul Kranenburg with changes from John Dyson to make it reliably work under FreeBSD. 2) Added and enabled PROCFS in the GENERICxx and LINT kernels. 3) New execve() from me. Still work to be done here, but this version works well and is needed before other changes can be made. For a description of the design behind this, see freebsd-arch or ask me. 4) Rewrote stack fault code; made user stack VM grow as needed rather than all up front; improves performance a little and reduces process memory requirements. 5) Incorporated fix from Gene Stark to fault/wire a user page table page to fix a problem in copyout. This is a temporary fix and is not appropriate for pageable page tables. For a description of the problem, see Gene's post to the freebsd-hackers mailing list. 6) Tighten up vm_page struct to reduce memory requirements for it. ifdef pager page lock code as it's not being used currently. 7) Introduced new element to vmspace struct - vm_minsaddr; initial (minimum) stack address. Compliment to vm_maxsaddr. 8) Added a panic if the allocation for process u-pages fails. 9) Improve performance and accuracy of kernel profiling by putting in a little inline assembly instead of spl(). 10) Made serial console with sio driver work. Still has problems with serial input, but is almost useable. 11) Added -Bstatic to SYSTEM_LD in Makefile.i386 so that kernels will build properly with the new ld.
# 05e634ef	02-Dec-1993	Andrew Moore <alm@FreeBSD.org>	From: Jeffrey Hsu <hsu@soda.berkeley.edu> The following patch adds the addr argument to signal handlers. The kernel with the patch is no more and no less in compliance or in violation of POSIX and ANSI C than the kernel before the patch. The added functionality this addr argument provides is quite useful. It enables an entire class of algorithms which use mprotect to trace memory references. Beside garbage collectors, I have heard of this technique being applied to debuggers and profilers. The only benchmarking I've performed is using akcl to compile maxima: without the kernel patch, it takes 7 hours to compile maxima, while with stratified garbage collection, it only takes 50 minutes. Basically, I can't think of a reason not to add the addr argument and there is a compelling need for it. If you find the patch acceptable, please let me know so I can send my FreeBSD akcl config files to wfs for inclusion in the core akcl release. The old 386BSD config files there won't work on either NetBSD or FreeBSD.
# a9627169	28-Nov-1993	David Greenman <dg@FreeBSD.org>	Patch from Gene Stark: Subject: Page fault in PTE area fails in copyout Index: sys/i386/i386/trap.c FreeBSD-1.0.2 Description: Reading files of several megabytes into Emacs, or many small files all at once, would fail with "IO error - bad address". Repeat-By: The bug can be exercised by a test program that malloc()'s a 5MB chunk of memory, and then, without accessing the memory first, filling it with data from a file using read(). (I read 64k chunks from /dev/wd0d into successive 64k regions of the 5MB chunk.) The read() will fail with EFAULT at the first virtual address boundary that is a multiple of 0x400000. Fix: The problem was code in sys/i386/i386/trap.c that tries to figure out what kind of trap occurred and to handle it appropriately. It was interpreting any page fault with virtual address >= vm->vm_maxsaddr as being a user stack segment fault. In fact, addresses >= USRSTACK are in the user structure/PTE area, and if they are handled as stack faults, the proper PTE will not be paged in when it is supposed to be. This situation comes up in copyout() and copyoutstr(), if PTE's are accessed for the first time ever. The page fault on accessing the nonexistent PTE is mishandled as a stack fault, and then the fault that occurs on the subsequent access to the page itself causes copyout to fail with EFAULT.
# 381fe1aa	24-Nov-1993	Garrett Wollman <wollman@FreeBSD.org>	Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and add same (sans -Werror) to Makefile for future compilations.
# 0967373e	12-Nov-1993	David Greenman <dg@FreeBSD.org>	First steps in rewriting locore.s, and making info useful when the machine panics. i386/i386/locore.s: 1) got rid of most .set directives that were being used like #define's, and replaced them with appropriate #define's in the appropriate header files (accessed via genassym). 2) added comments to header inclusions and global definitions, and global variables 3) replaced some hardcoded constants with cpp defines (such as PDESIZE and others) 4) aligned all comments to the same column to make them easier to read 5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to /sys/i386/include/asmacros.h 6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code 7) added new global '_KERNend' to store last location+1 of kernel 8) cleaned up zeroing of bss so that only bss is zeroed 9) fix zeroing of page tables so that it really does zero them all - not just if they follow the bss. 10) rewrote page table initialization code so that 1) works correctly and 2) write protects the kernel text by default 11) properly initialize the kernel page directory, upages, p0stack PT, and page tables. The previous scheme was more than a bit screwy. 12) change allocation of virtual area of IO hole so that it is fixed at KERNBASE + 0xa0000. The previous scheme put it right after the kernel page tables and then later expected it to be at KERNBASE +0xa0000 13) change multiple bogus settings of user read/write of various areas of kernel VM - including the IO hole; we should never be accessing the IO hole in user mode through the kernel page tables 14) split kernel support routines such as bcopy, bzero, copyin, copyout, etc. into a seperate file 'support.s' 15) split swtch and related routines into a seperate 'swtch.s' 16) split routines related to traps, syscalls, and interrupts into a seperate file 'exception.s' 17) remove some unused global variables from locore that got inserted by Garrett when he pulled them out of some .h files. i386/isa/icu.s: 1) clean up global variable declarations 2) move in declaration of astpending and netisr i386/i386/pmap.c: 1) fix calculation of virtual_avail. It previously was calculated to be right in the middle of the kernel page tables - not a good place to start allocating kernel VM. 2) properly allocate kernel page dir/tables etc out of kernel map - previously only took out 2 pages. i386/i386/machdep.c: 1) modify boot() to print a warning that the system will reboot in PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user abort with a key on the console. The machine will wait for ever if a key is typed before the reboot. The default is 15 seconds, but can be set to 0 to mean don't wait at all, -1 to mean wait forever, or any positive value to wait for that many seconds. 2) print "Rebooting..." just before doing it. kern/subr_prf.c: 1) remove PANICWAIT as it is deprecated by the change to machdep.c i386/i386/trap.c: 1) add table of trap type strings and use it to print a real trap/ panic message rather than just a number. Lot's of work to be done here, but this is the first step. Symbolic traceback is in the TODO. i386/i386/Makefile.i386: 1) add support in to build support.s, exception.s and swtch.s ...and various changes to various header files to make all of the above happen.
# d19eeaea	04-Nov-1993	David Greenman <dg@FreeBSD.org>	splnone()'s in the trap code can be deadly. Save/restore previous priority instead.
# d7354624	01-Nov-1993	Christoph Robitschko <chmr@FreeBSD.org>	Modified the "rude stack hack" that it only applies to addresses within the stack area and not memory above VM_MAXUSER_ADDRESS. That way, copyout and friends now work for pages whose page table entries have not yet been allocated/been paged out.
# 960173b9	15-Oct-1993	Rodney W. Grimes <rgrimes@FreeBSD.org>	genassym.c: Remove NKMEMCLUSTERS, it is no longer define or used. locores.s: Fix comment on PTDpde and APTDpde to be pde instead of pte Add new equation for calculating location of Sysmap Remove Bill's old #ifdef garbage for counting up memory, that stuff will never be made to work and was just cluttering up the file. Add code that places the PTD, page table pages, and kernel stack below the 640k ISA hole if there is room for it, otherwise put this stuff all at 1MB. This fixes the 28K bogusity in the boot blocks, that can now go away! Fix the caclulation of where first is to be dependent on NKPDE so that we can skip over the above mentioned areas. The 28K thing is now 44K in size due to the increase in kernel virtual memory space, but since we no longer have to worry about that this is no big deal. Use if NNPX > 0 instead of ifdef NPX for floating point code. machdep.c Change the calculation of for the buffer cache to be 20% of all memory above 2MB and add back the upper limit of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL of the kernel memory space on large memory machines, note that this will not even come into effect unless you have more than 32MB. The current buffer cache limit is 6.7MB due to this caclulation. It seems that we where erroniously allocating bufpages pages for buffer_map. buffer_map is UNUSED in this implementation of the buffer cache, but since the map is referenced in several if statements a quick fix was to simply allocate 1 vm page (but no real memory) to it. pmap.h Remove rcsid, don't want them in the kernel files! Removed some cruft inside an #ifdef DEBUGx that caused compiler errors if you where compiling this for debug. Use the #defines for PD_SHIFT and PG_SHIFT in place of constants. trap.c: Remove patch kit header and rcsid, fix $Id$. Now include "npx.h" and use NNPX for controlling the floating point code. Remove a now completly invalid check for a maximum virtual address, the virtual address now ends at 0xFFFFFFFF so there is no more MAX!! (Thanks David, I completly missed that one!) vm_machdep.c Remove patch kit header and rcsid, fix $Id$. Now include "npx.h" and use NNPX for controlling the floating point code. Replace several 0xFE00000 constants with KERNBASE
# a0ad3760	28-Aug-1993	Rodney W. Grimes <rgrimes@FreeBSD.org>	Changed trap.c so that a panic will occur if we do not have hardware FP and we try to call the emulator when it is not compiled in. Removed the #if defined(i486) \|\| defined(i387) that use to call the panic if we did not have a math emulator. Removed an extranious include of i386/i386/math_emu.h from math_emulate.c.
# 26931201	27-Jul-1993	David Greenman <dg@FreeBSD.org>	* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading, profiling, and various protection checks that cause security holes and system crashes. * Changed min/max/bcmp/ffs/strlen to be static inline functions - included from cpufunc.h in via systm.h. This change improves performance in many parts of the kernel - up to 5% in the networking layer alone. Note that this requires systm.h to be included in any file that uses these functions otherwise it won't be able to find them during the load. * Fixed incorrect call to splx() in if_is.c * Fixed bogus variable assignment to splx() in if_ed.c
# 5b81b6b3	12-Jun-1993	Rodney W. Grimes <rgrimes@FreeBSD.org>	Initial import, 0.1 + pk 0.2.4-B1