Cross Reference: /freebsd-current/sys/kern/kern

History log of /freebsd-current/sys/kern/kern_intr.c
Revision	Date	Author	Comments
# a2409f17	10-May-2024	Warner Losh <imp@FreeBSD.org>	intr: Document how to get the interrupt frame Document that the only way to get the interrupt thread is to use curthread->td_intr_frame, rather than the old-style of having a NULL pointer for the interrupt thread. As of 38c35248fe3b, support for that has been removed. I neglected to update that commit message with these details. Suggested by: mhorne
# 38c35248	08-Oct-2021	Elliott Mitchell <ehem+freebsd@m5p.com>	kern/intr: remove support for passing trap frame as argument While otherwise a handy potential approach, getting the trap frame via the argument isn't documented and isn't supposed to be used. With all uses removed, now remove support to end the mixed calling conventions. Differential Revision: https://reviews.freebsd.org/D37688 Reviewed by: imp, mhorne Pull Request: https://github.com/freebsd/freebsd-src/pull/1225
# a9e0f316	09-May-2024	Elliott Mitchell <ehem+freebsd@m5p.com>	kern/intr: redeclare intr_setaffinity()'s third arg constant This matches reality and allows removal of a __DECONST(). Fixes: 4c72d075a57 ("LinuxKPI: const argument to irq_set_affinity_hint()") Fixes: 9b33b154b53 ("Add support to cpuset for binding hardware interrupts") Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1126
# cd04887b	09-May-2024	Elliott Mitchell <ehem+freebsd@m5p.com>	kern/intr: change ->ie_irq to unsigned All architecture implementations actually want this to be unsigned. INTRNG the equivalent is overtly unsigned. x86 and PowerPC merely avoid the need to explicitly convert at several points. Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/1126
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# 39888ed7	15-Oct-2022	Mitchell Horne <mhorne@FreeBSD.org>	kern_intr: Check for NULL event in intr_destroy() It likely won't happen, but is consistent with the other functions of this KPI. Reviewed by: imp, jhb MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D33479
# 05b727fe	12-Oct-2022	Mitchell Horne <mhorne@FreeBSD.org>	Downgrade tty_intr_event from a global It can be static within uart_tty.c. It is an open question whether there remains any real benefit to having uart instances share a swi thread. Reviewed by: imp, markj, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D36938
# d744e271	04-Sep-2022	Gordon Bergling <gbe@FreeBSD.org>	kern: Remove a double word in a source code comment - s/that that/that/ MFC after: 3 days
# c84c5e00	18-Jul-2022	Mitchell Horne <mhorne@FreeBSD.org>	ddb: annotate some commands with DB_CMD_MEMSAFE This is not completely exhaustive, but covers a large majority of commands in the tree. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35583
# 2cf78708	14-Jul-2022	John Baldwin <jhb@FreeBSD.org>	Collapse interrupt thread priorities. Allow high priority hardware interrupts to run at PI_REALTIME via INTR_TYPE_CLK, but collapse all other hardware interrupt threads to the next priority level (PI_INTR). Collapse all SWI priorities to the same priority level (PI_SOFT) just below PI_INTR. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35646
# ed998d1c	14-Jul-2022	John Baldwin <jhb@FreeBSD.org>	ithreads: Support priority adjustment by schedulers. Use sched_wakeup instead of sched_add when marking an ithread runnable. This allows schedulers to reset their internal time slice tracking state and restore the base ithread priority when an ithread resumes from idle. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35643
# fea89a28	14-Jul-2022	John Baldwin <jhb@FreeBSD.org>	Add sched_ithread_prio to set the base priority of an interrupt thread. Use it instead of sched_prio when setting the priority of an interrupt thread. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D35642
# 254e4e5b	28-Dec-2021	John Baldwin <jhb@FreeBSD.org>	Simplify swi for bus_dma. When a DMA request using bounce pages completes, a swi is triggered to schedule pending DMA requests using the just-freed bounce pages. For a long time this bus_dma swi has been tied to a "virtual memory" swi (swi_vm). However, all of the swi_vm implementations are the same and consist of checking a flag (busdma_swi_pending) which is always true and if set calling busdma_swi. I suspect this dates back to the pre-SMPng days and that the intention was for swi_vm to serve as a mux. However, in the current scheme there's no need for the mux. Instead, remove swi_vm and vm_ih. Each bus_dma implementation that uses bounce pages is responsible for creating its own swi (busdma_ih) which it now schedules directly. This swi invokes busdma_swi directly removing the need for busdma_swi_pending. One consequence is that the swi now works on RISC-V which had previously failed to invoke busdma_swi from swi_vm. Reviewed by: imp, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33447
# 7bc13692	20-Sep-2021	Wojciech Macek <wma@FreeBSD.org>	hwpmc: fix performance issues Differential revision: https://reviews.freebsd.org/D32025 Avoid using atomics as it_wait is guarded by td_lock. Report threshold calculation is done only if at least one PMC hook is installed Fixes: * avoid unnecessary branching (if frame != null ...) by having PMC_HOOK_INSTALLED_ANY condition on the top of them, which should hint the core not to execute speculatively anything which us underneath; * access intr_hwpmc_waiting_report_threshold cacheline only if at least one hook is loaded;
# 6fa041d7	12-Sep-2021	Wojciech Macek <wma@FreeBSD.org>	Measure latency of PMC interruptions Add HWPMC events to measure latency. Provide sysctl to choose the number of outstanding events which trigger HWPMC event. Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D31283
# 67f508db	10-Aug-2021	Alexander Motin <mav@FreeBSD.org>	Mark some sysctls as CTLFLAG_MPSAFE. MFC after: 2 weeks
# 6eb60f5b	09-Mar-2021	Hans Petter Selasky <hselasky@FreeBSD.org>	Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with user-space Linux compatibility support. No functional change. MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking
# fa2528ac	18-Feb-2021	Alex Richardson <arichardson@FreeBSD.org>	Use atomic loads/stores when updating td->td_state KCSAN complains about racy accesses in the locking code. Those races are fine since they are inside a TD_SET_RUNNING() loop that expects the value to be changed by another CPU. Use relaxed atomic stores/loads to indicate that this variable can be written/read by multiple CPUs at the same time. This will also prevent the compiler from doing unexpected re-ordering. Reported by: GENERIC-KCSAN Test Plan: KCSAN no longer complains, kernel still runs fine. Reviewed By: markj, mjg (earlier version) Differential Revision: https://reviews.freebsd.org/D28569
# aba10e13	25-Jul-2020	Alexander Motin <mav@FreeBSD.org>	Allow swi_sched() to be called from NMI context. For purposes of handling hardware error reported via NMIs I need a way to escape NMI context, being too restrictive to do something significant. To do it this change introduces new swi_sched() flag SWI_FROMNMI, making it careful about used KPIs. On platforms allowing IPI sending from NMI context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI, otherwise it works just like SWI_DELAY. To handle the delayed SWIs this patch calls clk_intr_event on every hardclock() tick. MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25754
# 7029da5c	26-Feb-2020	Pawel Biernacki <kaktus@FreeBSD.org>	Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
# f912e8f2	10-Feb-2020	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt handler. Interrupt handlers are removed via intr_event_execute_handlers() when IH_DEAD is set. The thread removing the interrupt is woken up, and calls intr_event_update(). When this happens, the ie_hflags are cleared and re-built from all the remaining handlers sharing the event. When the last IH_NET handler is removed, the IH_NET flag will be cleared from ih_hflags (or ie_hflags may still be being rebuilt in a different context), and the ithread_execute_handlers() may return with ie_hflags missing IH_NET. This can lead to a scenario where IH_NET was present before calling ithread_execute_handlers, and is not present at its return, meaning the need for epoch must be cached locally. This can happen when loading and unloading network drivers. Also make sure the ie_hflags is not cleared before being updated. This is a regression issue after r357004. Backtrace: panic() # trying to access epoch tracker on stack of dead thread _epoch_enter_preempt() ifunit_ref() ifioctl() fo_ioctl() kern_ioctl() sys_ioctl() syscallenter() amd64_syscall() Differential Revision: https://reviews.freebsd.org/D23483 Reviewed by: glebius@, gallatin@, mav@, jeff@ and kib@ Sponsored by: Mellanox Technologies
# 511d1afb	22-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Enter the network epoch for interrupt handlers of INTR_TYPE_NET. Provide tunable to limit how many times handlers may be executed without reentering epoch. Differential Revision: https://reviews.freebsd.org/D23242
# c4eb6630	22-Jan-2020	Gleb Smirnoff <glebius@FreeBSD.org>	Add ie_hflags to struct intr_event, which accumulates flags from all handlers on this event. For now handle only IH_ENTROPY in that manner.
# 686bcb5c	15-Dec-2019	Jeff Roberson <jeff@FreeBSD.org>	schedlock 4/4 Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778
# 61a74c5c	15-Dec-2019	Jeff Roberson <jeff@FreeBSD.org>	schedlock 1/4 Eliminate recursion from most thread_lock consumers. Return from sched_add() without the thread_lock held. This eliminates unnecessary atomics and lock word loads as well as reducing the hold time for scheduler locks. This will eventually allow for lockless remote adds. Discussed with: kib Reviewed by: jhb Tested by: pho Differential Revision: https://reviews.freebsd.org/D22626
# 4b28d96e	13-Dec-2019	John Baldwin <jhb@FreeBSD.org>	Remove the deprecated timeout(9) interface. All in-tree consumers have been converted to callout(9). Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22602
# 5d0e8299	24-May-2019	Conrad Meyer <cem@FreeBSD.org>	Disable intr_storm_threshold mechanism by default The ixl.4 manual page has documented that the threshold falsely detects interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen similar false positives with the ioat(4) DMA device (which can push GB/s). For example, synthetic load can be generated with tools/tools/ioat 'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an interrupt for each one, and do this for 1000 milliseconds). With storm-detection disabled, the Broadwell-EP version of this device is capable of generating ~350k real interrupts per second. The following historical context comes from jhb@: Originally, the threshold worked around incorrect routing of PCI INTx interrupts on single-CPU systems which would end up in a hard hang during boot. Since the threshold was added, our PCI interrupt routing was improved, most PCI interrupts use edge-triggered MSI instead of level-triggered INTx, and typical systems have multiple CPUs available to service interrupts. On the off chance that the threshold is useful in the future, it remains available as a tunable and sysctl. Reviewed by: jhb Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20401
# 4e255d74	10-May-2019	Andrew Gallatin <gallatin@FreeBSD.org>	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134
# 6d51c0fe	24-Mar-2019	Ian Lepore <ian@FreeBSD.org>	Revert accidental change that should not have been included in r345475. I had changed this value as part of a local experiment, and neglected to change it back before committing the other changes.
# 67da50a0	24-Mar-2019	Ian Lepore <ian@FreeBSD.org>	Truncate a too-long interrupt handler name when there is only one handler. There are only 19 bytes available for the name of an interrupt plus the name(s) of handlers/drivers using it. There is a mechanism from the days of shared interrupts that replaces some of the handler names with '+' when they don't all fit into 19 bytes. In modern times there is typically only one device on an interrupt, but long device names are the norm, especially with embedded systems. Also, in systems with multiple interrupt controllers, the names of the interrupts themselves can be long. For example, 'gic0,s54: imx6_anatop0' doesn't fit, and replacing the device driver name with a '+' provides no useful info at all. When there is only one handler but its name was too long to fit, this change truncates enough leading chars of the handler name (replacing them with a '-' char to indicate that some chars are missing) to use all 19 bytes, preserving the unit number typically on the end of the name. Using the prior example, this results in: 'gic0,s54:-6_anatop0' which provides plenty of info to figure out which device is involved. PR: 211946 Reviewed by: gonzo@ (prior version without the '-' char) Differential Revision: https://reviews.freebsd.org/D19675
# 82a5a275	17-Dec-2018	Andriy Gapon <avg@FreeBSD.org>	add support for marking interrupt handlers as suspended The goal of this change is to fix a problem with PCI shared interrupts during suspend and resume. I have observed a couple of variations of the following scenario. Devices A and B are on the same PCI bus and share the same interrupt. Device A's driver is suspended first and the device is powered down. Device B generates an interrupt. Interrupt handlers of both drivers are called. Device A's interrupt handler accesses registers of the powered down device and gets back bogus values (I assume all 0xff). That data is interpreted as interrupt status bits, etc. So, the interrupt handler gets confused and may produce some noise or enter an infinite loop, etc. This change affects only PCI devices. The pci(4) bus driver marks a child's interrupt handler as suspended after the child's suspend method is called and before the device is powered down. This is done only for traditional PCI interrupts, because only they can be shared. At the moment the change is only for x86. Notable changes in core subsystems / interfaces: - BUS_SUSPEND_INTR and BUS_RESUME_INTR methods are added to bus interface along with convenience functions bus_suspend_intr and bus_resume_intr; - rman_set_irq_cookie and rman_get_irq_cookie functions are added to provide a way to associate an interrupt resource with an interrupt cookie; - intr_event_suspend_handler and intr_event_resume_handler functions are added to the MI interrupt handler interface. I added two new interrupt handler flags, IH_SUSP and IH_CHANGED, to implement the new intr_event functions. IH_SUSP marks a suspended interrupt handler. IH_CHANGED is used to implement a barrier that ensures that a change to the interrupt handler's state is visible to future interrupts. While there, I fixed some whitespace issues in comments and changed a couple of logically boolean variables to be bool. MFC after: 1 month (maybe) Differential Revision: https://reviews.freebsd.org/D15755
# 19fa89e9	25-Aug-2018	Mark Murray <markm@FreeBSD.org>	Remove the Yarrow PRNG algorithm option in accordance with due notice given in random(4). This includes updating of the relevant man pages, and no-longer-used harvesting parameters. Ensure that the pseudo-unit-test still does something useful, now also with the "other" algorithm instead of Yarrow. PR: 230870 Reviewed by: cem Approved by: so(delphij,gtetlow) Approved by: re(marius) Differential Revision: https://reviews.freebsd.org/D16898
# e0fa977e	03-Aug-2018	Andriy Gapon <avg@FreeBSD.org>	safer wait-free iteration of shared interrupt handlers The code that iterates a list of interrupt handlers for a (shared) interrupt, whether in the ISR context or in the context of an interrupt thread, does so in a lock-free fashion. Thus, the routines that modify the list need to take special steps to ensure that the iterating code has a consistent view of the list. Previously, those routines tried to play nice only with the code running in the ithread context. The iteration in the ISR context was left to a chance. After commit r336635 atomic operations and memory fences are used to ensure that ie_handlers list is always safe to navigate with respect to inserting and removal of list elements. There is still a question of when it is safe to actually free a removed element. The idea of this change is somewhat similar to the idea of the epoch based reclamation. There are some simplifications comparing to the general epoch based reclamation. All writers are serialized using a mutex, so we do not need to worry about concurrent modifications. Also, all read accesses from the open context are serialized too. So, we can get away just two epochs / phases. When a thread removes an element it switches the global phase from the current phase to the other and then drains the previous phase. Only after the draining the removed element gets actually freed. The code that iterates the list in the ISR context takes a snapshot of the global phase and then increments the use count of that phase before iterating the list. The use count (in the same phase) is decremented after the iteration. This should ensure that there should be no iteration over the removed element when its gets freed. This commit also simplifies the coordination with the interrupt thread context. Now we always schedule the interrupt thread when removing one of handlers for its interrupt. This makes the code both simpler and safer as the interrupt thread masks the interrupt thus ensuring that there is no interaction with the ISR context. P.S. This change matters only for shared interrupts and I realize that those are becoming a thing of the past (and quickly). I also understand that the problem that I am trying to solve is extremely rare. PR: 229106 Reviewed by: cem Discussed with: Samy Al Bahra MFC after: 5 weeks Differential Revision: https://reviews.freebsd.org/D15905
# 111b043c	22-Jul-2018	Andriy Gapon <avg@FreeBSD.org>	change interrupt event's list of handlers from TAILQ to CK_SLIST The primary reason for this commit is to separate mechanical and nearly mechanical code changes from an upcoming fix for unsafe teardown of shared interrupt handlers that have only filters (see D15905). The technical rationale is that SLIST is sufficient. The only operation that gets worse performance -- O(n) instead of O(1) is a removal of a handler, but it is not a critical operation and the list is expected to be rather short. Additionally, it is easier to reason about SLIST when considering the concurrent lock-free access to the list from the interrupt context and the interrupt thread. CK_SLIST is used because the upcoming change depends on the memory order provided by CK_SLIST insert and the fact that CL_SLIST remove does not trash the linkage in a removed element. While here, I also fixed a couple of whitespace issues, made code under ifdef notyet compilable, added a lock assertion to ithread_update() and made intr_event_execute_handlers() static as it had no external callers. Reviewed by: cem (earlier version) MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D16016
# a0638b33	24-May-2018	Conrad Meyer <cem@FreeBSD.org>	Yank crufty INTR_FILTER option It was introduced to the tree in r169320 and r169321 in May 2007. It never got much use and never became a kernel default. The code duplicates the default path quite a bit, with slight modifications. Just yank out the cruft. Whatever goals were being aimed for can probably be met within the existing framework, without a flag day option. Mostly mechanical change: 'unifdef -m -UINTR_FILTER'. Reviewed by: mmacy Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D15546
# fc2e87be	19-May-2018	Matt Macy <mmacy@FreeBSD.org>	intr unbreak KTR/LINT build
# ba3f7276	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	intr: eliminate / annotate unused stack locals
# 8a36da99	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
# 29dfb631	03-May-2017	Conrad Meyer <cem@FreeBSD.org>	Extend cpuset_get/setaffinity() APIs Add IRQ placement-only and ithread-only API variants. intr_event_bind has been extended with sibling methods, as it has many more callsites in existing code. Reviewed by: kib@, adrian@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10586
# 83c9dea1	17-Apr-2017	Gleb Smirnoff <glebius@FreeBSD.org>	- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156
# 01f5e086	21-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	The part of r285680 which removed release semantic for two stores to it_need was wrong []. Restore the releases and add a comment explaining why it is needed. Noted by: alc [] Reviewed by: bde [*] Sponsored by: The FreeBSD Foundation
# 1b79b949	19-Jul-2015	Kirk McKusick <mckusick@FreeBSD.org>	Restructure code for readability improvement. No functional change. Reviewed by: kib
# 283dfee9	18-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	Further cleanup after r285607. Remove useless release semantic for some stores to it_need. For stores where the release is needed, add a comment explaining why. Fence after the atomic_cmpset() op on the it_need should be acquire only, release is not needed (see above). The combination of atomic_cmpset() + fence_acq() is better expressed there as atomic_cmpset_acq(). Use atomic_cmpset() for swi' ih_need read and clear. Discussed with: alc, bde Reviewed by: bde Comments wording provided by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 70a3efc1	15-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	Do not use atomic_swap_int(9), it is not available on all architectures. Atomic_cmpset_int(9) is a direct replacement, due to loop. The change fixes arm, arm64, mips an sparc64, which lack atomic_swap(). Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 615b6ea2	15-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	Reset non-zero it_need indicator to zero atomically with fetching its current value. It is believed that the change is the real fix for the issue which was covered over by the r252683. With the current code, if the interrupt handler sets it_need between read and consequent reset, the update could be lost and ithread_execute_handlers() would not be called in response to the lost update. The r252683 could have hide the issue since at the moment of commit, atomic_load_acq_int() did locked cmpxchg on the variable, which puts the cache line into the exclusive owned state and clears store buffers. Then the immediate store of zero has very high chance of reusing the exclusive state of the cache line and make the load and store sequence operate as atomic swap. For now, add the acq+rel fence immediately after the swap, to not disturb current (but excessive) ordering. Acquire is needed for the ih_need reads after the load, while release does not serve a useful purpose []. Reviewed by: alc Noted by: alc [] Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 03bbcb2f	15-Jul-2015	Konstantin Belousov <kib@FreeBSD.org>	Style. Remove excessive brackets. Compare non-boolean with zero. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# d1b06863	30-Jun-2015	Mark Murray <markm@FreeBSD.org>	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'staic struct start_cache'. - Tighten up up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explictit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)
# ed95805e	30-Apr-2015	John Baldwin <jhb@FreeBSD.org>	Remove support for Xen PV domU kernels. Support for HVM domU kernels remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes
# 10cb2424	30-Oct-2014	Mark Murray <markm@FreeBSD.org>	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)
# 3fe93b94	18-Oct-2014	Adrian Chadd <adrian@FreeBSD.org>	Convert a missed u_char cpu -> int cpu. This was caught by a gcc build. Reported by: luigi Sponsored by: Norse Corp, Inc.
# b7627840	04-Oct-2014	Konstantin Belousov <kib@FreeBSD.org>	Add kernel option KSTACK_USAGE_PROF to sample the stack depth on interrupts and report the largest value seen as sysctl debug.max_kstack_used. Useful to estimate how close the kernel stack size is to overflow. In collaboration with: Larry Baird <lab@gta.com> Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week
# 066da805	17-Sep-2014	Adrian Chadd <adrian@FreeBSD.org>	Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather than u_char. Migrate post_filter to use an int for a CPU rather than u_char. Change intr_event_bind() to use an int for CPU rather than u_char. It touches the ppc, sparc64, arm and mips machdep code but it should (hah!) be a no-op. Tested: * i386, AMD64 laptops Reviewed by: jhb
# af3b2549	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Pull in r267961 and r267973 again. Fix for issues reported will follow.
# 37a107a4	27-Jun-2014	Glen Barber <gjb@FreeBSD.org>	Revert r267961, r267973: These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory
# 3da1cf1e	27-Jun-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies
# 81198539	22-Jun-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Permit changing cpu mask for cpu set 1 in presence of drivers binding their threads to particular CPU. Changing ithread cpu mask is now performed by special cpuset_setithread(). It creates additional cpuset root group on first bind invocation. No objection: jhb Tested by: hiren MFC after: 2 weeks Sponsored by: Yandex LLC
# f02e47dc	04-Oct-2013	Mark Murray <markm@FreeBSD.org>	Snapshot. This passes the build test, but has not yet been finished or debugged. Contains: * Refactor the hardware RNG CPU instruction sources to feed into the software mixer. This is unfinished. The actual harvesting needs to be sorted out. Modified by me (see below). * Remove 'frac' parameter from random_harvest(). This was never used and adds extra code for no good reason. * Remove device write entropy harvesting. This provided a weak attack vector, was not very good at bootstrapping the device. To follow will be a replacement explicit reseed knob. * Separate out all the RANDOM_PURE sources into separate harvest entities. This adds some secuity in the case where more than one is present. * Review all the code and fix anything obviously messy or inconsistent. Address som review concerns while I'm here, like rename the pseudo-rng to 'dummy'. Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)
# c495c935	26-Aug-2013	Mark Murray <markm@FreeBSD.org>	Snapshot; Do some running repairs on entropy harvesting. More needs to follow.
# 3eebd44d	03-Jul-2013	Alfred Perlstein <alfred@FreeBSD.org>	The change in r236456 (atomic_store_rel not locked) exposed a bug in the ithread code where we could lose ithread interrupts if intr_event_schedule_thread() is called while the ithread is already running. Effectively memory writes could be ordered incorrectly such that the unatomic write of 0 to ithd->it_need (in ithread_loop) could be delayed until after it was set to be triggered again by intr_event_schedule_thread(). This was particularly a big problem for CAM because CAM optimizes scheduling of ithreads such that it only signals camisr() when it queues to an empty queue. This means that additional completion events would not unstick CAM and quickly lead to a complete lockup of the CAM stack. To fix this use atomics in all places where ithd->it_need is accessed. Submitted by: delphij, mav Obtained from: TrueOS, iXsystems MFC After: 1 week
# b887a155	05-Apr-2013	Konstantin Belousov <kib@FreeBSD.org>	If filter of the interrupt event is not null, print it, in addition to the handler address. Add a mark to distinguish between filter and handler. Note that the arguments for both filter and handler are same. Sponsored by: The FreeBSD Foundation Reviewed by: jhb MFC after: 1 week
# dbd2e167	04-Mar-2013	Davide Italiano <davide@FreeBSD.org>	MFcalloutng (r244355): Make loadavg calculation callout direct. There are several reasons for it: - it is very simple and doesn't worth context switch to SWI; - since SWI is no longer used here, we can remove twelve years old hack, excluding this SWI from from the loadavg statistics; - it fixes problem when eventtimer (HPET) shares interrupt with some other device, and that interrupt thread counted as permanent loadavg of 1; now loadavg accounted before that interrupt thread is scheduled. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, Fabian Keil, markj
# dae3dc73	06-Feb-2013	Neel Natu <neel@FreeBSD.org>	If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. The text used above is verbatim from r195249 and the code should now be in line with the intent of that commit.
# d95dca1d	25-Sep-2012	John Baldwin <jhb@FreeBSD.org>	Add optional entropy harvesting for software interrupts in swi_sched() as controlled by kern.random.sys.harvest.swi. SWI harvesting feeds into the interrupt FIFO and each event is estimated as providing a single bit of entropy. Reviewed by: markm, obrien MFC after: 2 weeks
# c9516c94	06-Aug-2012	Alexander Kabaev <kan@FreeBSD.org>	Do not add handler to event handlers list until ithread is created. In rare event when fast and ithread interrupts share the same vector and the fast handler was registered first, we can end up trying to schedule the ithread that is not created yet. The kernel built with INVARIANTS then triggers an assertion. Change the order to create the ithread first and only then add the handler that needs it to the interrupt event handlers list. Reviewed by: jhb
# 85729c2c	09-Mar-2012	Juli Mallett <jmallett@FreeBSD.org>	Export intrcnt correctly when running under 32-bit compatibility. Reviewed by: gonzo, nwhitehorn
# 44ad5475	08-Mar-2012	John Baldwin <jhb@FreeBSD.org>	Add a new sched_clear_name() method to the scheduler interface to clear the cached name used for KTR_SCHED traces when a thread's name changes. This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's name changes, most commonly via execve(). MFC after: 2 weeks
# 037f43d3	16-Jan-2012	Sergey Kandaurov <pluknet@FreeBSD.org>	Be pedantic and change // comment to C-style one. Noticed by: Bruce Evans
# dc15eac0	01-Jan-2012	Ed Schouten <ed@FreeBSD.org>	Use strchr() and strrchr(). It seems strchr() and strrchr() are used more often than index() and rindex(). Therefore, simply migrate all kernel code to use it. For the XFS code, remove an empty line to make the code identical to the code in the Linux kernel.
# 521ea19d	18-Jul-2011	Attilio Rao <attilio@FreeBSD.org>	- Remove the eintrcnt/eintrnames usage and introduce the concept of sintrcnt/sintrnames which are symbols containing the size of the 2 tables. - For amd64/i386 remove the storage of intr* stuff from assembly files. This area can be widely improved by applying the same to other architectures and likely finding an unified approach among them and move the whole code to be MI. More work in this area is expected to happen fairly soon. No MFC is previewed for this patch. Tested by: pluknet Reviewed by: jhb Approved by: re (kib)
# 5bd186a6	26-Apr-2011	Jeff Roberson <jeff@FreeBSD.org>	- Catch up to falloc() changes. - PHOLD() before using a task structure on the stack. - Fix a LOR between the sleepq lock and thread lock in _intr_drain().
# e4cd31dd	21-Mar-2011	Jeff Roberson <jeff@FreeBSD.org>	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
# d3305205	11-Jan-2011	John Baldwin <jhb@FreeBSD.org>	- Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY. - Add a macro PI_SWI() that takes a SWI_* constant as an argument and returns the suitable thread priority.
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 1f255bd3	10-Jun-2010	Alexander Motin <mav@FreeBSD.org>	Store interrupt trap frame into struct thread. It allows interrupt handler to obtain both trap frame and opaque argument submitted on registrction. After kernel and all drivers get used to it, legacy hack can be removed. Reviewed by: jhb@
# 932a288f	28-Feb-2010	Andriy Gapon <avg@FreeBSD.org>	MFC r203061: KASSERT contract of return value of interrupt filter X-MFCto7 after: 1 week
# 89fc20cc	27-Jan-2010	Andriy Gapon <avg@FreeBSD.org>	KASSERT that return value of interrupt filter complies with contract For example a return value of zero could lead to a stuck level-triggered interrupt line. Reviewed by: jhb (for INTR_FILTER case) MFC after: 3 weeks
# 7b10638c	21-Jan-2010	John Baldwin <jhb@FreeBSD.org>	MFC 198134,198149,198170,198171,198391,200948: Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64, i386, and sparc64 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler().
# 1b9d701f	03-Nov-2009	Attilio Rao <attilio@FreeBSD.org>	Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvements aims for avoiding further cache-misses in scheduler specific functions which need to keep track of average thread running time and further locking in places setting for this flag. Reported by: jeff (originally), kris (currently) Reviewed by: jhb Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
# d0c9a291	15-Oct-2009	John Baldwin <jhb@FreeBSD.org>	Use language more closely resembling English in a panic message. Pointy hat to: jhb Submitted by: pluknet
# 37b8ef16	15-Oct-2009	John Baldwin <jhb@FreeBSD.org>	Add a facility for associating optional descriptions with active interrupt handlers. This is primarily intended as a way to allow devices that use multiple interrupts (e.g. MSI) to meaningfully distinguish the various interrupt handlers. - Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate a description with an active interrupt handler setup by BUS_SETUP_INTR. It has a default method (bus_generic_describe_intr()) which simply passes the request up to the parent device. - Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports printf(9) style formatting using var args. - Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of an interrupt handler and copy the name passed to intr_event_add_handler() into that buffer instead of just saving the pointer to the name. - Add a new intr_event_describe_handler() which appends a description string to an interrupt handler's name. - Implement support for interrupt descriptions on amd64 and i386 by having the nexus(4) driver supply a custom bus_describe_intr method that invokes a new intr_describe() MD routine which in turn looks up the associated interrupt event and invokes intr_event_describe_handler(). Requested by: many Reviewed by: scottl MFC after: 2 weeks
# cebc7fb1	01-Jul-2009	John Baldwin <jhb@FreeBSD.org>	Improve the handling of cpuset with interrupts. - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail. - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicing. - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group. - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes. - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result. - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread. Approved by: re (kib)
# 9bd55acf	25-Jun-2009	John Baldwin <jhb@FreeBSD.org>	Return errors from intr_event_bind() to the caller of intr_set_affinity(). Specifically, if a non-root user attempts to bind an interrupt the request will now report failure with EPERM rather than silently failing with a successful return code. MFC after: 1 week
# e84bcd84	18-May-2009	Robert Watson <rwatson@FreeBSD.org>	Binding interrupts to a CPU consists of two parts: setting up CPU affinity for the interrupt thread, and requesting that underlying hardware direct interrupts to the CPU. For software interrupt threads, implement a no-op interrupt event binder that returns success, so that the interrupt management code will just set the ithread's affinity and succeed. Reviewed by: jhb MFC after: 1 week
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# b386eb91	23-Sep-2008	David E. O'Brien <obrien@FreeBSD.org>	style(9)
# 37e9511f	15-Sep-2008	John Baldwin <jhb@FreeBSD.org>	Expose a new public routine intr_event_execute_handlers() which executes all the non-filter handlers attached to an interrupt event. This can be used by device drivers which multiplex their interrupt onto the interrupt handlers for child devices.
# 6205924a	22-Aug-2008	Kip Macy <kmacy@FreeBSD.org>	Submit a band-aid for interrupt set up race. MFC after: 1 month
# 2976b312	18-Jul-2008	Kip Macy <kmacy@FreeBSD.org>	revert change from local tree
# 4af83c8c	18-Jul-2008	Kip Macy <kmacy@FreeBSD.org>	import vendor fixes to cxgb
# 04a58b9d	29-Jun-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Remove an unneeded error variable to make clear that if reaching the end of the function we never return an error.
# 8df78c41	16-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Make SCHED_STATS more generic by adding a wrapper to create the variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia
# 9b33b154	10-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add the interrupt vector number to intr_event_create so MI code can lookup hard interrupt events by number. Ignore the irq# for soft intrs. - Add support to cpuset for binding hardware interrupts. This has the side effect of binding any ithread associated with the hard interrupt. As per restrictions imposed by MD code we can only bind interrupts to a single cpu presently. Interrupts can be 'unbound' by binding them to all cpus. Reviewed by: jhb Sponsored by: Nokia
# 8aa9e82e	05-Apr-2008	John Baldwin <jhb@FreeBSD.org>	Move INTR_FILTER from opt_global.h to its own header.
# 1ee1b687	05-Apr-2008	John Baldwin <jhb@FreeBSD.org>	Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt code. - Rename the intr_event 'eoi', 'disable', and 'enable' hooks to 'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric. Also, add a comment describe what the MI code expects them to do. - On amd64, i386, and powerpc this is effectively a NOP. - On arm, don't bother masking the interrupt unless the ithread is scheduled in the non-INTR_FILTER case to match what INTR_FILTER did. Also, don't bother unmasking the interrupt in the post_filter case if we never masked it. The INTR_FILTER case had been doing this by having arm_unmask_irq for the post_filter (formerly 'eoi') hook. - On ia64, stray interrupts are now masked for the non-INTR_FILTER case. They were already masked in the INTR_FILTER case. - On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for both the 'post_filter' and 'post_ithread' hooks to match what the non-INTR_FILTER code did. - On sun4v, retire the ithread wrapper hack by using an appropriate 'post_ithread' hook instead (it's what 'post_ithread'/'enable' was designed to do even in 5.x). Glanced at by: piso Reviewed by: marius Requested by: marius [1], [5] Tested on: amd64, i386, arm, sparc64
# e4b1aa62	03-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Fix a mis-merge that crept in during the softclock changes. Spotted by: jhb
# 8d809d50	02-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	Implement per-cpu callout threads, wheels, and locks. - Move callout thread creation from kern_intr.c to kern_timeout.c - Call callout_tick() on every processor via hardclock_cpu() rather than inspecting callout internal details in kern_clock.c. - Remove callout implementation details from callout.h - Package up all of the global variables into a per-cpu callout structure. - Start one thread per-cpu. Threads are not strictly bound. They prefer to execute on the native cpu but may migrate temporarily if interrupts are starving callout processing. - Run all callouts by default in the thread for cpu0 to maintain current ordering and concurrency guarantees. Many consumers may not properly handle concurrent execution. - The new callout_reset_on() api allows specifying a particular cpu to execute the callout on. This may migrate a callout to a new cpu. callout_reset() schedules on the last assigned cpu while callout_reset_curcpu() schedules on the current cpu. Reviewed by: phk Sponsored by: Nokia
# 6d2d1c04	17-Mar-2008	John Baldwin <jhb@FreeBSD.org>	Simplify the interrupt code a bit: - Always include the ie_disable and ie_eoi methods in 'struct intr_event' and collapse down to one intr_event_create() routine. The disable and eoi hooks simply aren't used currently in the !INTR_FILTER case. - Expand 'disab' to 'disable' in a few places. - Use function casts for arm and i386:intr_eoi_src() instead of wrapper routines since to trim one extra indirection. Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER} Tested on: {amd64,i386} x {FILTER, !FILTER}
# 237fdd78	16-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# eaf86d16	14-Mar-2008	John Baldwin <jhb@FreeBSD.org>	Add preliminary support for binding interrupts to CPUs: - Add a new intr_event method ie_assign_cpu() that is invoked when the MI code wishes to bind an interrupt source to an individual CPU. The MD code may reject the binding with an error. If an assign_cpu function is not provided, then the kernel assumes the platform does not support binding interrupts to CPUs and fails all requests to do so. - Bind ithreads to CPUs on their next execution loop once an interrupt event is bound to a CPU. Only shared ithreads are bound. We currently leave private ithreads for drivers using filters + ithreads in the INTR_FILTER case unbound. - A new intr_event_bind() routine is used to bind an interrupt event to a CPU. - Implement binding on amd64 and i386 by way of the existing pic_assign_cpu PIC method. - For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up an interrupt source and binds its interrupt event to the specified CPU. MI code can currently (ab)use this by doing: intr_bind(rman_get_start(irq_res), cpu); however, I plan to add a truly MI interface (probably a bus_bind_intr(9)) where the implementation in the x86 nexus(4) driver would end up calling intr_bind() internally. Requested by: kmacy, gallatin, jeff Tested on: {amd64, i386} x {regular, INTR_FILTER}
# 6617724c	12-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 539976ff	29-Oct-2007	Julian Elischer <julian@FreeBSD.org>	fix typo in code normally not compiled in.
# 3c1ffc32	28-Oct-2007	Julian Elischer <julian@FreeBSD.org>	Fix typo in code obviously not being compiled on any of my machines. found by: rdivacky@
# 9ef95d01	26-Oct-2007	Julian Elischer <julian@FreeBSD.org>	rename the process to 'idle' and 'intr' as per jhb.
# ca9a0ddf	26-Oct-2007	Julian Elischer <julian@FreeBSD.org>	if one changes a function's arguments, one must also change the callers.
# 7ab24ea3	26-Oct-2007	Julian Elischer <julian@FreeBSD.org>	Introduce a way to make pure kernal threads. kthread_add() takes the same parameters as the old kthread_create() plus a pointer to a process structure, and adds a kernel thread to that process. kproc_kthread_add() takes the parameters for kthread_add, plus a process name and a pointer to a pointer to a process instead of just a pointer, and if the proc * is NULL, it creates the process to the specifications required, before adding the thread to it. All other old kthread_xxx() calls return, but act on (struct thread ) instead of (struct proc ). One reason to change the name is so that any old kernel modules that are lying around and expect kthread_create() to make a process will not just accidentally link. fix top to show kernel threads by their thread name in -SH mode add a tdnam formatting option to ps to show thread names. make all idle threads actual kthreads and put them into their own idled process. make all interrupt threads kthreads and put them in an interd process (mainly for aesthetic and accounting reasons) rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper' man page fixes to follow.
# 3745c395	20-Oct-2007	Julian Elischer <julian@FreeBSD.org>	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
# 982d11f8	04-Jun-2007	Jeff Roberson <jeff@FreeBSD.org>	Commit 14/14 of sched_lock decomposition. - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 67596082	04-Jun-2007	Attilio Rao <attilio@FreeBSD.org>	Rework the PCPU_* (MD) interface: - Rename PCPU_LAZY_INC into PCPU_INC - Add the PCPU_ADD interface which just does an add on the pcpu member given a specific value. Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This is a point that needs some discussions/work in the next days. Reviewed by: alc, bde Approved by: jeff (mentor)
# 3401f2c1	31-May-2007	Paolo Pisati <piso@FreeBSD.org>	In some particular cases (like in pccard and pccbb), the real device handler is wrapped in a couple of functions - a filter wrapper and an ithread wrapper. In this case (and just in this case), the filter wrapper could ask the system to schedule the ithread and mask the interrupt source if the wrapped handler is composed of just an ithread handler: modify the "old" interrupt code to make it support this situation, while the "new" interrupt code is already ok. Discussed with: jhb
# bafe5a31	06-May-2007	Paolo Pisati <piso@FreeBSD.org>	Bring in the reminaing bits to make interrupt filtering work: o push much of the i386 and amd64 MD interrupt handling code (intr_machdep.c::intr_execute_handlers()) into MI code (kern_intr.c::ithread_loop()) o move filter handling to kern_intr.c::intr_filter_loop() o factor out the code necessary to mask and ack an interrupt event (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()), and make them part of 'struct intr_event', passing them as arguments to kern_intr.c::intr_event_create(). o spawn a private ithread per handler (struct intr_handler::ih_thread) with filter and ithread functions. Approved by: re (implicit?)
# 0ae62c18	18-Apr-2007	Nate Lawson <njl@FreeBSD.org>	Bump the interrupt storm detection counter to 1000. My slow fileserver gets a bogus irq storm detected when periodic daily kicks off at 3 am and disconnects the disk. Change the print logic to print once per second when the storm is occurring instead of only once. Otherwise, it appeared that something else was causing the errors each night at 3 am since the print only occurred the first time. Reviewed by: jhb MFC after: 1 week
# e41bcf3c	02-Mar-2007	John Baldwin <jhb@FreeBSD.org>	- Don't do the interrupt storm protection stuff for software interrupt handlers. - Use pause() when throtting during an interrupt storm. Reported by: kris (1)
# f2d619c8	27-Feb-2007	Paolo Pisati <piso@FreeBSD.org>	Do not execute filter only handlers in ithread_execute_handlers(): this fixes the panics when filter only and ithread only handlers where sharing the same irq .
# ef544f63	22-Feb-2007	Paolo Pisati <piso@FreeBSD.org>	o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr() o add an int return code to all fast handlers o retire INTR_FAST/IH_FAST For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current Reviewed by: many Approved by: re@
# f0393f06	23-Jan-2007	Jeff Roberson <jeff@FreeBSD.org>	- Remove setrunqueue and replace it with direct calls to sched_add(). setrunqueue() was mostly empty. The few asserts and thread state setting were moved to the individual schedulers. sched_add() was chosen to displace it for naming consistency reasons. - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be different on all three schedulers where it was only called in one place each. - Remove the long ifdef'd out remrunqueue code. - Remove the now redundant ts_state. Inspect the thread state directly. - Don't set TSF_* flags from kern_switch.c, we were only doing this to support a feature in one scheduler. - Change sched_choose() to return a thread rather than a td_sched. Also, rely on the schedulers to return the idlethread. This simplifies the logic in choosethread(). Aside from the run queue links kern_switch.c mostly does not care about the contents of td_sched. Discussed with: julian - Move the idle thread loop into the per scheduler area. ULE wants to do something different from the other schedulers. Suggested by: jhb Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
# c3045318	12-Dec-2006	John Baldwin <jhb@FreeBSD.org>	Add a function to return the MD interrupt source cookie associated with an interrupt event. Use this in the x86 code to fixup the intrcnt names when an interrupt handler is removed.
# bc17acb2	12-Dec-2006	John Baldwin <jhb@FreeBSD.org>	Add a comment and fix a whitespace nit.
# ad1e7d28	05-Dec-2006	Julian Elischer <julian@FreeBSD.org>	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# 8460a577	26-Oct-2006	John Birrell <jb@FreeBSD.org>	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 1ca2c018	17-Oct-2006	Bruce Evans <bde@FreeBSD.org>	kern_intr.c: - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that shouldn never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.
# 19e9205a	12-Jul-2006	John Baldwin <jhb@FreeBSD.org>	Simplify the pager support in DDB. Allowing different db commands to install custom pager functions didn't actually happen in practice (they all just used the simple pager and passed in a local quit pointer). So, just hardcode the simple pager as the only pager and make it set a global db_pager_quit flag that db commands can check when the user hits 'q' (or a suitable variant) at the pager prompt. Also, now that it's easy to do so, enable paging by default for all ddb commands. Any command that wishes to honor the quit flag can do so by checking db_pager_quit. Note that the pager can also be effectively disabled by setting $lines to 0. Other fixes: - 'show idt' on i386 and pc98 now actually checks the quit flag and terminates early. - 'show intr' now actually checks the quit flag and terminates early.
# 0f180a7c	17-Apr-2006	John Baldwin <jhb@FreeBSD.org>	Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. This avoids a race where the calling thread could read a snapshot of it's current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway. The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.
# bb141be1	15-Apr-2006	Scott Long <scottl@FreeBSD.org>	Take a better stab at making this compile.
# 83bc5d54	15-Apr-2006	Scott Long <scottl@FreeBSD.org>	Take a stab at making this compile.
# 9477358d	13-Apr-2006	John Baldwin <jhb@FreeBSD.org>	Turn on ithread_destroy() and call it from intr_event_destroy() to tear down an interrupt event's associated thread (if it has one).
# fe486a37	26-Oct-2005	John Baldwin <jhb@FreeBSD.org>	Add a swi_remove() function to teardown software interrupt handlers. For now it just calls intr_event_remove_handler(), but at some point it might also be responsible for tearing down interrupt events created via swi_add.
# e0f66ef8	25-Oct-2005	John Baldwin <jhb@FreeBSD.org>	Reorganize the interrupt handling code a bit to make a few things cleaner and increase flexibility to allow various different approaches to be tried in the future. - Split struct ithd up into two pieces. struct intr_event holds the list of interrupt handlers associated with interrupt sources. struct intr_thread contains the data relative to an interrupt thread. Currently we still provide a 1:1 relationship of events to threads with the exception that events only have an associated thread if there is at least one threaded interrupt handler attached to the event. This means that on x86 we no longer have 4 bazillion interrupt threads with no handlers. It also means that interrupt events with only INTR_FAST handlers no longer have an associated thread either. - Renamed struct intrhand to struct intr_handler to follow the struct intr_foo naming convention. This did require renaming the powerpc MD struct intr_handler to struct ppc_intr_handler. - INTR_FAST no longer implies INTR_EXCL on all architectures except for powerpc. This means that multiple INTR_FAST handlers can attach to the same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach to the same interrupt. Sharing INTR_FAST handlers may not always be desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun either. Drivers can always still use INTR_EXCL to ask for an interrupt exclusively. The way this sharing works is that when an interrupt comes in, all the INTR_FAST handlers are executed first, and if any threaded handlers exist, the interrupt thread is scheduled afterwards. This type of layout also makes it possible to investigate using interrupt filters ala OS X where the filter determines whether or not its companion threaded handler should run. - Aside from the INTR_FAST changes above, the impact on MD interrupt code is mostly just 's/ithread/intr_event/'. - A new MI ddb command 'show intrs' walks the list of interrupt events dumping their state. It also has a '/v' verbose switch which dumps info about all of the handlers attached to each event. - We currently don't destroy an interrupt thread when the last threaded handler is removed because it would suck for things like ppbus(8)'s braindead behavior. The code is present, though, it is just under #if 0 for now. - Move the code to actually execute the threaded handlers for an interrrupt event into a separate function so that ithread_loop() becomes more readable. Previously this code was all in the middle of ithread_loop() and indented halfway across the screen. - Made struct intr_thread private to kern_intr.c and replaced td_ithd with a thread private flag TDP_ITHREAD. - In statclock, check curthread against idlethread directly rather than curthread's proc against idlethread's proc. (Not really related to intr changes) Tested on: alpha, amd64, i386, sparc64 Tested on: arm, ia64 (older version of patch by cognet and marcel)
# 10f508d9	15-Sep-2005	John Baldwin <jhb@FreeBSD.org>	Don't disallow sleeping for handlers on swi's since some swi handlers (like CAM) do sleep in their handlers. Requested by: scottl
# 51460da8	15-Sep-2005	John Baldwin <jhb@FreeBSD.org>	- Add a new simple facility for marking the current thread as being in a state where sleeping on a sleep queue is not allowed. The facility doesn't support recursion but uses a simple private per-thread flag (TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is set and INVARIANTS is enabled. - Use this new facility to replace the g_xup and g_xdown mutexes that were (ab)used to achieve similar behavior. - Disallow sleeping in interrupt threads when invoking interrupt handlers. MFC after: 1 week Reviewed by: phk
# 943928c9	20-Jun-2005	John Baldwin <jhb@FreeBSD.org>	Simplify the storming logic and remove a variable as a result. Approved by: re (dwhite)
# aa9aa68d	12-Apr-2005	John Baldwin <jhb@FreeBSD.org>	Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic operations in some places and simple non-per CPU math in others.
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 63710c4d	30-Dec-2004	John Baldwin <jhb@FreeBSD.org>	Stop explicitly touching td_base_pri outside of the scheduler and simply set a thread's priority via sched_prio() when that is the desired action. The schedulers will start managing td_base_pri internally shortly.
# d0b4135e	17-Nov-2004	John Baldwin <jhb@FreeBSD.org>	Don't bother exiting storming mode once a second to see if it has gone away, instead only exit storming mode when an interrupt stops firing long enough for the ithread to exit the loop and go back to sleep. Tested by: macrus (cruder version)
# a51dae09	16-Nov-2004	John Baldwin <jhb@FreeBSD.org>	Adjust the interrupt storm handling code to better handle a storm. When a storm is detected, enter "storming" mode which throttles the interrupt source such that the handlers are run once every clock tick. Previously we allowed a full set of storm_threshold interations through the handler before going back to sleep. Also, this currently will intentionally exit storming mode once a second to see if the storm has passed. Tested by: marcus Discussed with: bde
# 0811d60a	05-Nov-2004	John Baldwin <jhb@FreeBSD.org>	- Make setting of IT_ENTROPY a bit simpler in ithread_update(). - Tweak the updating of the ithread name in ithread_update() so that the '+' and '*' characters for device names that were too short only get added at the end after as many device names as possible were fit into the allocated space. Prior to this, some long devices would result in '+' chars showing up between two different devices rather than at the end.
# c957c14d	03-Nov-2004	John Baldwin <jhb@FreeBSD.org>	Revert most of 1.109. Although it improved the situation on one particular motherboard, in practice the changes resulted in many false positives for heavy network loads, etc. resulting in poor performance. Also, the motherboard referenced in the 1.109 log has other problems and simply does not seem to work with the APIC enabled even with the changes in 1.109. The correct fix for that board seems to be to not use the APIC at all. One thing kept from 1.109 is that throttled interrupts are now effectively polled on every clock tick rather than just 10 times per second. MFC after: 1 month Tested by: Shunsuke SHINOMIYA shino at fornext dot org
# d39d4a6e	01-Nov-2004	John Baldwin <jhb@FreeBSD.org>	- Change the ddb paging "support" to use a variable (db_lines_per_page) to control the number of lines per page rather than a constant. The variable can be examined and changed in ddb as '$lines'. Setting the variable to 0 will effectively turn off paging. - Change db_putchar() to force out pending whitespace before outputting newlines and carriage returns so that one can rub out content on the current line via '\r \r' type strings. - Change the simple pager to rub out the --More-- prompt explicitly when the routine exits. - Add some aliases to the simple pager to make it more compatible with more(1): 'e' and 'j' do a single line. 'd' does half a page, and 'f' does a full page. MFC after: 1 month Inspired by: kris
# ed062c8d	04-Sep-2004	Julian Elischer <julian@FreeBSD.org>	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
# 2630e4c9	31-Aug-2004	Julian Elischer <julian@FreeBSD.org>	Give setrunqueue() and sched_add() more of a clue as to where they are coming from and what is expected from them. MFC after: 2 days
# 2cfe973b	16-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Annotate call to DELAY() in interrupt storm mitigation as being something to revisit. Approved by: re (scottl)
# 6f40c417	05-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	In ithread_schedule(), when we plan to go harvest some entropy as a result of scheduling an ithread, cut a KTR_INTR trace record so that it's clear in tracing interrupt activity where and when the entropy harvesting code is invoked.
# 0c0b25ae	02-Jul-2004	John Baldwin <jhb@FreeBSD.org>	Implement preemption of kernel threads natively in the scheduler rather than as one-off hacks in various other parts of the kernel: - Add a function maybe_preempt() that is called from sched_add() to determine if a thread about to be added to a run queue should be preempted to directly. If it is not safe to preempt or if the new thread does not have a high enough priority, then the function returns false and sched_add() adds the thread to the run queue. If the thread should be preempted to but the current thread is in a nested critical section, then the flag TDF_OWEPREEMPT is set and the thread is added to the run queue. Otherwise, mi_switch() is called immediately and the thread is never added to the run queue since it is switch to directly. When exiting an outermost critical section, if TDF_OWEPREEMPT is set, then clear it and call mi_switch() to perform the deferred preemption. - Remove explicit preemption from ithread_schedule() as calling setrunqueue() now does all the correct work. This also removes the do_switch argument from ithread_schedule(). - Do not use the manual preemption code in mtx_unlock if the architecture supports native preemption. - Don't call mi_switch() in a loop during shutdown to give ithreads a chance to run if the architecture supports native preemption since the ithreads will just preempt DELAY(). - Don't call mi_switch() from the page zeroing idle thread for architectures that support native preemption as it is unnecessary. - Native preemption is enabled on the same archs that supported ithread preemption, namely alpha, i386, and amd64. This change should largely be a NOP for the default case as committed except that we will do fewer context switches in a few cases and will avoid the run queues completely when preempting. Approved by: scottl (with his re@ hat)
# bf0acc27	02-Jul-2004	John Baldwin <jhb@FreeBSD.org>	- Change mi_switch() and sched_switch() to accept an optional thread to switch to. If a non-NULL thread pointer is passed in, then the CPU will switch to that thread directly rather than calling choosethread() to pick a thread to choose to. - Make sched_switch() aware of idle threads and know to do TD_SET_CAN_RUN() instead of sticking them on the run queue rather than requiring all callers of mi_switch() to know to do this if they can be called from an idlethread. - Move constants for arguments to mi_switch() and thread_single() out of the middle of the function prototypes and up above into their own section.
# 05b2c96f	05-Jun-2004	Bruce Evans <bde@FreeBSD.org>	Detect interrupt storms better. The storm detection didn't work at all with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts don't repeat immediately. Use DELAY(1) to wait a bit for them to repeat. This affects all systems. Only delay for the first (10 * intr_storm_threshold) interrupts (per interrupt handler) so that this is only a pessimization while warming up. Throttle after calling the sub-handlers instead of before so that the long delay given by throttling can be used instead of the DELAY(1) to detect storms after warming up. Reduced the throttling period from 1/10 second to 1/hz seconds so that throttling doesn't destroy performance so much. Interrupts that are detected as storming are effectively handled by polling at a frequency of hz Hz. On A7N8X-E's there is another hardware or configuration bug that makes the throttled frequency closer to 2*hz Hz.
# 7b1fe905	16-Apr-2004	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs in previous commit (mainly an insertion sort error for declarations, and poorly worded messages). Fixed some nearby style bugs (unsorted declarations).
# 7870c3c6	16-Apr-2004	John Baldwin <jhb@FreeBSD.org>	- Enable (unmask) interrupt sources earlier in the ithread loop. Specifically, we used to enable the source after locking sched_lock and just before we had already decided to do a context switch. This meant that an ithread could never process more than one interrupt per context switch. Enabling earlier in the loop before sched_lock is acquired allows an ithread to handle multiple interrupts per context switch if interrupts fire very rapidly. For the case of heavy interrupt load this can reduce the number of context switches (and thus overhead) as well as reduce interrupt latency. - Now that we can handle multiple interrupts per context switch, add simple interrupt storm protection to threaded interrupts. If X number of consecutive interrupts are triggered before the itherad voluntarily yields to another thread, then the interrupt thread will sleep with the associated interrupt source disabled (masked) for 1/10th of a second. The default value of X is 500, but it can be tweaked via the tunable/ sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then a message is output to the kernel console on the first occurrence per interrupt thread. Interrupt storm protection can be disabled completely by setting this value to 0. There is no scientific reasoning for the 1/10th of a second or 500 interrupts values, so they may require tweaking at some point in the future. Tested by: rwatson (an earlier version w/o the storm protection) Tested by: mux (reportedly made a machine with two PCI interrupts storming usable rather than hard locked) Reviewed by: imp
# 60744399	05-Mar-2004	John Baldwin <jhb@FreeBSD.org>	kthread_exit() no longer requires Giant, so don't force callers to acquire Giant just to call kthread_exit(). Requested by: many
# 29bcc451	24-Jan-2004	Jeff Roberson <jeff@FreeBSD.org>	- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or SW_INVOL. Assert that one of these is set in mi_switch() and propery adjust the rusage statistics. This is to simplify the large number of users of this interface which were previously all required to adjust the proper counter prior to calling mi_switch(). This also facilitates more switch and locking optimizations. - Change all callers of mi_switch() to pass the appropriate paramter and remove direct references to the process statistics.
# 288e351b	13-Jan-2004	Don Lewis <truckman@FreeBSD.org>	If a device attach routine fails during boot and calls bus_teardown_intr(), ithread_remove_handler() may fail to remove the interrupt handler if it decides to let the ithread do the removal. The problem is that during boot "cold" is set, which causes msleep() to return immediately. This will cause ithread_remove_handler() to fail to wait for the ithread to do the removal from the handler TAILQ before freeing the handler back to the heap. Bad things will happen when some other user of the TAILQ, such as ithread_add_handler() or the actual ithread attempts to use the freed handler. Fix the problem by forcing ithread_remove_handler() to do the actual removal itself if the "cold" flag is set. Reviewed by: jhb
# 4e3a7a14	20-Nov-2003	Mark Murray <markm@FreeBSD.org>	Fix a major faux pas of mine. I was causing 2 very bad things to happen in interrupt context; 1) sleep locks, and 2) malloc/free calls. 1) is fixed by using spin locks instead. 2) is fixed by preallocating a FIFO (implemented with a STAILQ) and using elements from this FIFO instead. This turns out to be rather fast. OK'ed by: re (scottl) Thanks to: peter, jhb, rwatson, jake Apologies to: *
# 3fed54aa	18-Nov-2003	Mark Murray <markm@FreeBSD.org>	Hackfix to patch around a kernel panic I introduced. Real fix to follow. In the meanwhile, we are not harvesting interrupt entropy. Approved by: re (jhb)
# 90e3387e	16-Nov-2003	Peter Wemm <peter@FreeBSD.org>	Expand the argument to the ithread enable/disable helper hooks from an int to something big enough to hold a pointer. amd64 needs this.
# 8bc08464	03-Nov-2003	John Baldwin <jhb@FreeBSD.org>	Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead, let the MD code choose whether or not to implement such a policy. The new i386 interrupt code allows multiple FAST handlers for a given source for example. However, the code does not allow FAST and non-FAST handlers to be mixed.
# 8b201c42	24-Oct-2003	John Baldwin <jhb@FreeBSD.org>	- Add a DDB command 'show intrcnt' to show the non-zero interrupt counts. - Add a DDB function to dump the contents of an ithread and optionally details about each handler in that ithread. This function can be used by MD code to implement DDB commands that display information about interrupt sources and their registered handlers.
# 79501b66	01-Jul-2003	Scott Long <scottl@FreeBSD.org>	Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate busdma_swi(). Now that busdma_swi() uses driver-provided locking, this should be safe.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 67096659	31-May-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused variable(s). Found by: FlexeLint
# b1ac98d8	01-May-2003	Julian Elischer <julian@FreeBSD.org>	Move the flag that indicates an idle thread from the KSE to the thread. It was always referenced via the thread anyhow. Reviewed by: jhb (a LOOOOONG time ago)
# e674d807	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Add some locking in for a few proc and thread fields.
# 8804bf6b	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Use local struct proc variables to reduce repeated td->td_proc dereferences and improve readability.
# 9520fc2b	17-Apr-2003	John Baldwin <jhb@FreeBSD.org>	Adjust a KTR trace to log thread state instead of proc state as that is more relevant.
# 9b4982bf	04-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Add a WITNESS_WARN() call to verify that we hold no locks after running a handler from an interrupt thread.
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# 4a338afd	17-Feb-2003	Julian Elischer <julian@FreeBSD.org>	Move a bunch of flags from the KSE to the thread. I was in two minds as to where to put them in the first case.. I should have listenned to the other mind. Submitted by: parts by davidxu@ Reviewed by: jeff@ mini@
# c11110ea	14-Feb-2003	Alfred Perlstein <alfred@FreeBSD.org>	Fix crash dumps on ata and scsi. To fix scsi, don't wait for ithreads if we're dumping, it makes the debugger sad. To fix ata, use what appears to be a polling method if we're dumping, I stole this from tmm but added code to ensure that this change is only in effect while dumping. Tested by: des
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 24fbeaf9	28-Dec-2002	Jake Burkholder <jake@FreeBSD.org>	Don't put a newline in KTR traces.
# bb8992b3	17-Oct-2002	Robert Drehmel <robert@FreeBSD.org>	Instead of (sizeof(source_buffer) - 1) bytes, copy at most (sizeof(destination_buffer) - 1) bytes into the destination buffer. This was not harmful because they currently both provide space for (MAXCOMLEN + 1) bytes.
# e80fb434	17-Oct-2002	Robert Drehmel <robert@FreeBSD.org>	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.
# 316ec49a	02-Oct-2002	Scott Long <scottl@FreeBSD.org>	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb
# 37c84183	28-Sep-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
# 98f93c07	22-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Removed unneeded include (missed in last revision).
# e3b6e33c	21-Sep-2002	Jake Burkholder <jake@FreeBSD.org>	Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a comment.
# 71fad9fd	11-Sep-2002	Julian Elischer <julian@FreeBSD.org>	Completely redo thread states. Reviewed by: davidxu@freebsd.org
# 65c17e74	05-Sep-2002	David Xu <davidxu@FreeBSD.org>	Remove extra ';'
# 04774f23	01-Aug-2002	Julian Elischer <julian@FreeBSD.org>	Slight cleanup of some comments/whitespace. Make idle process state more consistant. Add an assert on thread state. Clean up idleproc/mi_switch() interaction. Use a local instead of referencing curthread 7 times in a row (I've been told curthread can be expensive on some architectures) Remove some commented out code. Add a little commented out code (completion coming soon) Reviewed by: jhb@freebsd.org
# e602ba25	29-Jun-2002	Julian Elischer <julian@FreeBSD.org>	Part 1 of KSE-III The ability to schedule multiple threads per process (one one cpu) by making ALL system calls optionally asynchronous. to come: ia64 and power-pc patches, patches for gdb, test program (in tools) Reviewed by: Almost everyone who counts (at various times, peter, jhb, matt, alfred, mini, bernd, and a cast of thousands) NOTE: this is still Beta code, and contains lots of debugging stuff. expect slight instability in signals..
# 2d0231f5	29-May-2002	Julian Elischer <julian@FreeBSD.org>	diff reduction from KSE to keep WW-III from happenning on -current
# b106d2f5	11-Apr-2002	John Baldwin <jhb@FreeBSD.org>	- Set the base priority of an ithread that has no handlers when we set its normal priority. - Lock sched_lock while we dink with the priorities. - Remove a few extra blank lines.
# 2b60cfc5	09-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Don't lock the ithread lock in ithread_create(). The ithread isn't on any lists or in any tables yet so there are no other references to it, thus we don't need to lock it.
# 6008862b	04-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# 2dbd9d5b	09-Mar-2002	Luigi Rizzo <luigi@FreeBSD.org>	Make the DEVICE_POLLING code compile with -Werror and in LINT
# daccb638	11-Feb-2002	Luigi Rizzo <luigi@FreeBSD.org>	MFS: synchronize the code with the version in -stable, specifically: + SYSCTL_ULONG -> SYSCTL_UINT + some procedure renaming and variable rearrangement + fix the 'interface going deaf' problem same as in -stable.
# 2c100766	11-Feb-2002	Julian Elischer <julian@FreeBSD.org>	In a threaded world, differnt priorirites become properties of different entities. Make it so. Reviewed by: jhb@freebsd.org (john baldwin)
# 079b7bad	07-Feb-2002	Julian Elischer <julian@FreeBSD.org>	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
# c86b6ff5	05-Jan-2002	John Baldwin <jhb@FreeBSD.org>	Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex releease and swi schedule, respectively when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag. Note that to ensure that swi_sched()'s in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.) I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree so they shouldn't be broken. Sparc64 is broken, but that's been ok'd by jake and tmm who will be fixing the interrupt code for sparc64 shortly. Reviewed by: peter Tested on: i386, alpha
# e4fc250c	14-Dec-2001	Luigi Rizzo <luigi@FreeBSD.org>	Device Polling code for -current. Non-SMP, i386-only, no polling in the idle loop at the moment. To use this code you must compile a kernel with options DEVICE_POLLING and at runtime enable polling with sysctl kern.polling.enable=1 The percentage of CPU reserved to userland can be set with sysctl kern.polling.user_frac=NN (default is 50) while the remainder is used by polling device drivers and netisr's. These are the only two variables that you should need to touch. There are a few more parameters in kern.polling but the default values are adequate for all purposes. See the code in kern_poll.c for more details on them. Polling in the idle loop will be implemented shortly by introducing a kernel thread which does the job. Until then, the amount of CPU dedicated to polling will never exceed (100-user_frac). The equivalent (actually, better) code for -stable is at http://info.iet.unipi.it/~luigi/polling/ and also supports polling in the idle loop. NOTE to Alpha developers: There is really nothing in this code that is i386-specific. If you move the 2 lines supporting the new option from sys/conf/{files,options}.i386 to sys/conf/{files,options} I am pretty sure that this should work on the Alpha as well, just that I do not have a suitable test box to try it. If someone feels like trying it, I would appreciate it. NOTE to other developers: sure some things could be done better, and as always I am open to constructive criticism, which a few of you have already given and I greatly appreciated. However, before proposing radical architectural changes, please take some time to possibly try out this code, or at the very least read the comments in kern_poll.c, especially re. the reason why I am using a soft netisr and cannot (I believe) replace it with a simple timeout. Quick description of files touched by this commit: sys/conf/files.i386 new file kern/kern_poll.c sys/conf/options.i386 new option sys/i386/i386/trap.c poll in trap (disabled by default) sys/kern/kern_clock.c initialization and hardclock hooks. sys/kern/kern_intr.c minor swi_net changes sys/kern/kern_poll.c the bulk of the code. sys/net/if.h new flag sys/net/if_var.h declaration for functions used in device drivers. sys/net/netisr.h NETISR_POLL sys/dev/fxp/if_fxp.c sys/dev/fxp/if_fxpvar.h sys/pci/if_dc.c sys/pci/if_dcreg.h sys/pci/if_sis.c sys/pci/if_sisreg.h device driver modifications
# 91f91617	09-Dec-2001	David E. O'Brien <obrien@FreeBSD.org>	Repeat after me -- "Use of ANSI string concatenation can be bad." In this case, C99's __func__ is properly defined as: static const char __func__[] = "function-name"; and GCC 3.1 will not allow it to be used in bogus string concatenation.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 6533ba2e	02-Sep-2001	David E. O'Brien <obrien@FreeBSD.org>	Match the declaration in net/netisr.h. Submitted by: gcc 3.0.1
# 688ebe12	10-Aug-2001	John Baldwin <jhb@FreeBSD.org>	- Close races with signals and other AST's being triggered while we are in the process of exiting the kernel. The ast() function now loops as long as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with preemption disabled so that any further AST's that arrive via an interrupt will be delayed until the low-level MD code returns to user mode. - Use u_int's to store the tick counts for profiling purposes so that we do not need sched_lock just to read p_sticks. This also closes a problem where the call to addupc_task() could screw up the arithmetic due to non-atomic reads of p_sticks. - Axe need_proftick(), aston(), astoff(), astpending(), need_resched(), clear_resched(), and resched_wanted() in favor of direct bit operations on p_sflag. - Fix up locking with sched_lock some. In addupc_intr(), use sched_lock to ensure pr_addr and pr_ticks are updated atomically with setting PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing PS_OWEUPC. We also do not grab the lock just to test a flag. - Simplify the handling of Giant in ast() slightly. Reviewed by: bde (mostly)
# a300519d	29-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Make the schedlock saved critical section state a per-thread property.
# 84bbc4db	25-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Count the switch when an ithread goes idle as a voluntary context switch. Submitted by: bde
# 2e1aaccc	20-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Preemption by an interrupt thread is an involuntary switch, not a voluntary one. Pointy-hat to: me
# 5a280d9c	16-Jun-2001	Peter Wemm <peter@FreeBSD.org>	Add INTR_TYPE_AV so that we can get to the PI_AV priority in the ithread handlers. This is beneficial since it means that pcm's MPSAFE handler can get run before things that will block on Giant in the shared irq case.
# d279178d	01-Jun-2001	Thomas Moestl <tmm@FreeBSD.org>	Clean up the code exporting interrupt statistics via sysctl a bit: - move the sysctl code to kern_intr.c - do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine the length of the intrcnt array - move the declarations of intrnames, eintrnames, intrcnt and eintrcnt from machine-dependent include files to sys/interrupt.h - remove the hw.nintr sysctl, it is not needed. - fix various style bugs Requested by: bde Reviewed by: bde (some time ago)
# 4d29cb2d	17-May-2001	John Baldwin <jhb@FreeBSD.org>	- Remove the global ithread_list_lock spin lock in favor of per-ithread sleep locks. - Delay returning from ithread_remove_handler() until we are certain that the interrupt handler being removed has in fact been removed from the ithread. - XXX: There is still a problem in that nothing protects the kernel from adding a new handler while the ithread is running, though with our current architectures this is not a problem. Requested by: gibbs (2)
# 8bd57f8f	15-May-2001	John Baldwin <jhb@FreeBSD.org>	Remove unneeded includes of sys/ipl.h and machine/ipl.h.
# 6caa8a15	27-Apr-2001	John Baldwin <jhb@FreeBSD.org>	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind
# f34fa851	28-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to header include changes: - <sys/mutex.h> now requires <sys/systm.h> - <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
# b944b903	27-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Catch up to the mtx_saveintr -> mtx_savecrit change.
# 1f723035	23-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Use (..., "%s", foo) instead of (..., foo) to avoid a warning about a non-constant format string when calling kthread_create() to create an ithread.
# 003fb9ec	01-Mar-2001	John Baldwin <jhb@FreeBSD.org>	Ok, the kernel will panic in kmem_malloc() if the kernel map is full, so malloc with M_WAITOK can't actually return NULL. I wish I could get two people to give me the same answer about this when I ask... Submitted by: jake
# 653dd8c2	01-Mar-2001	John Baldwin <jhb@FreeBSD.org>	- Check to see if malloc() returned NULL even with M_WAITOK. - Add a KASSERT() to ensure an ithread has a backing kernel thread when we schedule it. - Don't attempt to preemptively switch to an ithread if p_stat of curproc is not SRUN.
# 5b270b2a	27-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Sigh. Try to get priorities sorted out. Don't bother trying to update native priority, it is diffcult to get right and likely to end up horribly wrong. Use an honestly wrong fixed value that seems to work; PUSER for user threads, and the interrupt priority for ithreads. Set it once when the process is created and forget about it. Suggested by: bde Pointy hat: me
# de271f01	21-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Work around a race condition where an interrupt handler can be removed from an interrupt thread while the interrupt thread is blocked on Giant waiting to execute the interrupt handler being removed. The result was that the intrhand structure would be free'd, and we would call 0xdeadc0de. The work around is to check to see if the interrupt thread is idle when removing a handler. If not, then we mark the interrupt handler as being dead using the new IH_DEAD flag and don't remove it from the interrupt threads' list of handlers. When the interrupt thread resumes, it will see a dead handler while traversing the list of handlers and will remove the handler then.
# 60f2b032	21-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Just use the ithread->it_proc directly in a KTR tracepoint instead of assigning a local var to it and using it, as otherwise the local var wasn't used, and generated a warning in the !KTR case. Noticed by: bde
# addec20c	21-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Add KTR tracepoints for adding/removing interrupt handlers, creating/destroying interrupt threads, and updating the state of an interrupt thread.
# 76bd604e	21-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Fix a bug where the 'ithread' variable was being set in a KASSERT() condition and thus was not initialized properly in the !INVARIANTS case. Noticed by: bde Pointy hat to: me
# 719f43d3	21-Feb-2001	John Baldwin <jhb@FreeBSD.org>	Remove attempt to add in PREEMPTION #ifdef test in MI code that didn't work because opt_preemption.h wasn't #include'd. Instead, make use of the do_switch parameter to ithread_schedule() and do the check in the alpha interrupt code.
# 3e5da754	20-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Add a new ithread_schedule() function to do the bulk of the work of scheduling an interrupt thread to run when needed. This has the side effect of enabling support for entropy gathering from interrupts on all architectures. - Change the software interrupt and x86 and alpha hardware interrupt code to use ithread_schedule() for most of their processing when scheduling an interrupt to run. - Remove the pesky Warning message about interrupt threads having entropy enabled. I'm not sure why I put that in there in the first place. - Add more error checking for parameters and change some cases that returned EINVAL to panic on failure instead via KASSERT(). - Instead of doing a documented evil hack of setting the P_NOLOAD flag on every interrupt thread whose pri was SWI_CLOCK, set the flag explicity for clk_ithd's proc during start_softintr().
# d5a08a60	11-Feb-2001	Jake Burkholder <jake@FreeBSD.org>	Implement a unified run queue and adjust priority levels accordingly. - All processes go into the same array of queues, with different scheduling classes using different portions of the array. This allows user processes to have their priorities propogated up into interrupt thread range if need be. - I chose 64 run queues as an arbitrary number that is greater than 32. We used to have 4 separate arrays of 32 queues each, so this may not be optimal. The new run queue code was written with this in mind; changing the number of run queues only requires changing constants in runq.h and adjusting the priority levels. - The new run queue code takes the run queue as a parameter. This is intended to be used to create per-cpu run queues. Implement wrappers for compatibility with the old interface which pass in the global run queue structure. - Group the priority level, user priority, native priority (before propogation) and the scheduling class into a struct priority. - Change any hard coded priority levels that I found to use symbolic constants (TTIPRI and TTOPRI). - Remove the curpriority global variable and use that of curproc. This was used to detect when a process' priority had lowered and it should yield. We now effectively yield on every interrupt. - Activate propogate_priority(). It should now have the desired effect without needing to also propogate the scheduling class. - Temporarily comment out the call to vm_page_zero_idle() in the idle loop. It interfered with propogate_priority() because the idle process needed to do a non-blocking acquire of Giant and then other processes would try to propogate their priority onto it. The idle process should not do anything except idle. vm_page_zero_idle() will return in the form of an idle priority kernel thread which is woken up at apprioriate times by the vm system. - Update struct kinfo_proc to the new priority interface. Deliberately change its size by adjusting the spare fields. It remained the same size, but the layout has changed, so userland processes that use it would parse the data incorrectly. The size constraint should really be changed to an arbitrary version number. Also add a debug.sizeof sysctl node for struct kinfo_proc.
# b4151f71	09-Feb-2001	John Baldwin <jhb@FreeBSD.org>	- Move struct ithd to sys/interrupt.h. - Add a set of MI helper functions for interrupt threads: - ithread_create() creates a new interrupt thread - ithread_destroy() destroys an interrupt thread - ithread_add_handler() attaches a new handler to an interrupt thread - ithread_remove_handler() detaches a handler from an interrupt thread - Rename sinthand_add() and sched_swi() to swi_add() and swi_sched() respectively so that they live in a consistent namespace. - struct intrhand is no longer a public type. It would be private to kern_intr.c but the current implementation of fast interrupts on the alpha requires the type to be exported. However, all handlers should be treated as void * cookies in the way that new-bus treats them. This includes references to software interrupt handlers.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 198c5b08	19-Jan-2001	Peter Wemm <peter@FreeBSD.org>	Remove the static splXXX functions and replace them by static __inline stubs. Remove the xxx_imask variables which have been all but gone for a while.
# 338c0bc6	30-Dec-2000	Seigo Tanimura <tanimura@FreeBSD.org>	Ignore a net interrupt if the corresponding handler is not registered. This fixes panic on my laptop where a spurious arp packet is received when arp is not ready to run.
# 48786ef4	18-Dec-2000	John Baldwin <jhb@FreeBSD.org>	Fix another sched_sihand -> sched_swi in a KTR trace message.
# 1eb44f02	04-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Remove the last of the MD netisr code. It is now all MI. Remove spending, which was unused now that all software interrupts have their own thread. Make the legacy schednetisr use an atomic op for setting bits in the netisr mask. Reviewed by: jhb
# f6a6e37a	04-Dec-2000	Jake Burkholder <jake@FreeBSD.org>	Whitespace. Fix an overly long line.
# fa2fbc3d	18-Nov-2000	Jake Burkholder <jake@FreeBSD.org>	- Protect the callout wheel with a separate spin mutex, callout_lock. - Use the mutex in hardclock to ensure no races between it and softclock. - Make softclock be INTR_MPSAFE and provide a flag, CALLOUT_MPSAFE, which specifies that a callout handler does not need giant. There is still no way to set this flag when regstering a callout. Reviewed by: -smp@, jlemon
# 896c2303	15-Nov-2000	John Baldwin <jhb@FreeBSD.org>	- Replace some instances of sched_ithd with sched_swi in KTR tracepoints. - Assert that Giant is not owned during the main loop of sithd_loop().
# b5d09a79	10-Nov-2000	John Baldwin <jhb@FreeBSD.org>	Ignore the INTR_MPSAFE flag when calculating the priority of an interrupt thread.
# a924ab97	06-Nov-2000	John Baldwin <jhb@FreeBSD.org>	Minor nit: missed ithd_loop -> sithd_loop in the KTR tracepoints.
# 8088699f	24-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Overhaul the software interrupt code to use interrupt threads for each type of software interrupt. Roughly, what used to be a bit in spending now maps to a swi thread. Each thread can have multiple handlers, just like a hardware interrupt thread. - Instead of using a bitmask of pending interrupts, we schedule the specific software interrupt thread to run, so spending, NSWI, and the shandlers array are no longer needed. We can now have an arbitrary number of software interrupt threads. When you register a software interrupt thread via sinthand_add(), you get back a struct intrhand that you pass to sched_swi() when you wish to schedule your swi thread to run. - Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit more intuitive. Also, prefix all the members of struct intrhand with 'ih_'. - Make swi_net() a MI function since there is now no point in it being MD. Submitted by: cp
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# 1931cf94	05-Oct-2000	John Baldwin <jhb@FreeBSD.org>	- Heavyweight interrupt threads on the alpha for device I/O interrupts. - Make softinterrupts (SWI's) almost completely MI, and divorce them completely from the x86 hardware interrupt code. - The ihandlers array is now gone. Instead, there is a MI shandlers array that just contains SWI handlers. - Most of the former machine/ipl.h files have moved to a new sys/ipl.h. - Stub out all the spl*() functions on all architectures. Submitted by: dfr
# 9a94c9c5	13-Sep-2000	John Baldwin <jhb@FreeBSD.org>	- Remove the inthand2_t type and use the equivalent driver_intr_t type from newbus for referencing device interrupt handlers. - Move the 'struct intrec' type which describes interrupt sources into sys/interrupt.h instead of making it just be a x86 structure. - Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd' and 'struct intrec' - Move the code to translate new-bus interrupt flags into an interrupt thread priority out of the x86 nexus code and into a MI ithread_priority() function in sys/kern/kern_intr.c. - Remove now-uneeded x86-specific headers from sys/dev/ata/ata-all.c and sys/pci/pci_compat.c.
# d1f088da	11-Oct-1999	Peter Wemm <peter@FreeBSD.org>	Trim unused options (or #ifdef for undoc options). Submitted by: phk
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 54a8c693	21-Apr-1999	Peter Wemm <peter@FreeBSD.org>	Stage 1 of a cleanup of the i386 interrupt registration mechanism. Interrupts under the new scheme are managed by the i386 nexus with the awareness of the resource manager. There is further room for optimizing the interfaces still. All the users of register_intr()/intr_create() should be gone, with the exception of pcic and i386/isa/clock.c.
# 1c5bb3ea	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()
# da653c61	26-Sep-1998	Doug Rabson <dfr@FreeBSD.org>	Start using the new SWI registration system instead of hardwiring everything.
# 18c5a6c4	11-Aug-1998	Bruce Evans <bde@FreeBSD.org>	Implemented dynamic registration of software interrupt handlers. Not used yet. Use dummy SWI handlers to avoid some checks for null pointers.
# a23d65bf	14-Jul-1998	Bruce Evans <bde@FreeBSD.org>	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
# 9a2daf91	18-Jun-1998	Bruce Evans <bde@FreeBSD.org>	Changed the type of an isa/general interrupt handler to take a `void *' arg. Fixed or hid most of the resulting type mismatches. Handlers can now be updated locally (except for reworking their global declarations in isa_device.h).
# 3900ddb2	11-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	Only build this on i386 for now. I may use it for the alpha later but currently it doesn't compile.
# ecbb00a2	07-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# ab36c3d3	16-Apr-1998	Bruce Evans <bde@FreeBSD.org>	Really finish supporting compiling with `gcc -ansi'.
# cfa5673e	10-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Move include of <machine/ipl.h> inside ifndef SMP where it is used, to avoid getting 'unused include file' warnings in the SMP case.
# fd35ecc3	05-Oct-1997	Nate Williams <nate@FreeBSD.org>	- Hide the 'device doesn't supported shared interrupts' code behind bootverbose, since the older register_intr() code didn't print out anything, and the laptop support will cause lots of these un-necessary messages.
# fbca51f5	21-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Added a half dozen casts to eliminate annoying warnings.
# 77625cfe	19-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Moved splq() to isa/ipl_funcs.c for SMP only. This is in preperation for moving all cpl accesses behind a critical region lock.
# 1fd0b058	02-Aug-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# c6d372f6	09-Jul-1997	Andrey A. Chernov <ache@FreeBSD.org>	Back out changes for 'conflicts' with IRQ, remove intr_registered()
# 7f533ff7	08-Jun-1997	Andrey A. Chernov <ache@FreeBSD.org>	Add safety check in case "conflicts" keyword specified more times than needed
# 0c514a25	02-Jun-1997	Doug Rabson <dfr@FreeBSD.org>	The defines INTR_FAST and INTR_EXCL are part of the public interface. The previous commit made them private which broke things.
# 68352337	02-Jun-1997	Doug Rabson <dfr@FreeBSD.org>	Move interrupt handling code from isa.c to a new file. This should make isa.c (slightly) more portable and will make my life developing the really portable version much easier. Reviewed by: peter, fsmp
# 8c046d14	01-Jun-1997	Peter Wemm <peter@FreeBSD.org>	Move "typedef struct intrec {} intrec" from sys/interrupt.h to kern_intr.c since that's the only place that it's used. Submitted by: se (apparently on suggestion from dfr)
# 4407ec71	31-May-1997	Peter Wemm <peter@FreeBSD.org>	<machine/spl.h> -> <machine/ipl.h> s/intrmask/intrmask_t/g Reviewed by: bde, se
# f9220cde	28-May-1997	Stefan Eßer <se@FreeBSD.org>	Fix problem reported by PHK: Panic in pcic probe because of NULL pointer dereference (head->next in intr_disconnect).
# 425f9fda	26-May-1997	Stefan Eßer <se@FreeBSD.org>	Add support for shared interrupts to the kernel. This code is meant be (eventually) architecture independent. It provides an emulation of the ISA interrupt registration function register_intr(), but that function does no longer manipulated the interrupt controller and interrupt descriptor table, but calls the architecture dependent function setup_icu() for that purpose. After theISA/EISA bus code has been modified to directly call the new interrupt registartion functions (intr_create() and intr_connect()), the emulation of register_intr() should be dropped. The C level interrupt handler function should take a (void*) argument, and the function pointer type (inthand2_t) should defined in some other place than isa_device.h. This commit is a pre-requisite for the removal of the PCI specific shared interrupt code. Reviewed by: dfr,bde