History log of /freebsd-current/sys/kern/kern_intr.c
Revision Date Author Comments
# a2409f17 10-May-2024 Warner Losh <imp@FreeBSD.org>

intr: Document how to get the interrupt frame

Document that the only way to get the interrupt thread is to use
curthread->td_intr_frame, rather than the old-style of having a NULL
pointer for the interrupt thread. As of 38c35248fe3b, support for that
has been removed. I neglected to update that commit message with these
details.

Suggested by: mhorne


# 38c35248 08-Oct-2021 Elliott Mitchell <ehem+freebsd@m5p.com>

kern/intr: remove support for passing trap frame as argument

While otherwise a handy potential approach, getting the trap frame via
the argument isn't documented and isn't supposed to be used. With all
uses removed, now remove support to end the mixed calling conventions.

Differential Revision: https://reviews.freebsd.org/D37688

Reviewed by: imp, mhorne
Pull Request: https://github.com/freebsd/freebsd-src/pull/1225


# a9e0f316 09-May-2024 Elliott Mitchell <ehem+freebsd@m5p.com>

kern/intr: redeclare intr_setaffinity()'s third arg constant

This matches reality and allows removal of a __DECONST().

Fixes: 4c72d075a57 ("LinuxKPI: const argument to irq_set_affinity_hint()")
Fixes: 9b33b154b53 ("Add support to cpuset for binding hardware interrupts")
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1126


# cd04887b 09-May-2024 Elliott Mitchell <ehem+freebsd@m5p.com>

kern/intr: change ->ie_irq to unsigned

All architecture implementations actually want this to be unsigned.
INTRNG the equivalent is overtly unsigned. x86 and PowerPC merely avoid
the need to explicitly convert at several points.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1126


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 39888ed7 15-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

kern_intr: Check for NULL event in intr_destroy()

It likely won't happen, but is consistent with the other functions of
this KPI.

Reviewed by: imp, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33479


# 05b727fe 12-Oct-2022 Mitchell Horne <mhorne@FreeBSD.org>

Downgrade tty_intr_event from a global

It can be static within uart_tty.c. It is an open question whether there
remains any real benefit to having uart instances share a swi thread.

Reviewed by: imp, markj, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36938


# d744e271 04-Sep-2022 Gordon Bergling <gbe@FreeBSD.org>

kern: Remove a double word in a source code comment

- s/that that/that/

MFC after: 3 days


# c84c5e00 18-Jul-2022 Mitchell Horne <mhorne@FreeBSD.org>

ddb: annotate some commands with DB_CMD_MEMSAFE

This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583


# 2cf78708 14-Jul-2022 John Baldwin <jhb@FreeBSD.org>

Collapse interrupt thread priorities.

Allow high priority hardware interrupts to run at PI_REALTIME via
INTR_TYPE_CLK, but collapse all other hardware interrupt threads to
the next priority level (PI_INTR). Collapse all SWI priorities to
the same priority level (PI_SOFT) just below PI_INTR.

Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35646


# ed998d1c 14-Jul-2022 John Baldwin <jhb@FreeBSD.org>

ithreads: Support priority adjustment by schedulers.

Use sched_wakeup instead of sched_add when marking an ithread
runnable. This allows schedulers to reset their internal time slice
tracking state and restore the base ithread priority when an ithread
resumes from idle.

Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35643


# fea89a28 14-Jul-2022 John Baldwin <jhb@FreeBSD.org>

Add sched_ithread_prio to set the base priority of an interrupt thread.

Use it instead of sched_prio when setting the priority of an interrupt
thread.

Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35642


# 254e4e5b 28-Dec-2021 John Baldwin <jhb@FreeBSD.org>

Simplify swi for bus_dma.

When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447


# 7bc13692 20-Sep-2021 Wojciech Macek <wma@FreeBSD.org>

hwpmc: fix performance issues

Differential revision: https://reviews.freebsd.org/D32025

Avoid using atomics as it_wait is guarded by td_lock.

Report threshold calculation is done only if at least one PMC hook
is installed

Fixes:
* avoid unnecessary branching (if frame != null ...)
by having PMC_HOOK_INSTALLED_ANY
condition on the top of them, which should hint
the core not to execute speculatively anything
which us underneath;
* access intr_hwpmc_waiting_report_threshold cacheline
only if at least one hook is loaded;


# 6fa041d7 12-Sep-2021 Wojciech Macek <wma@FreeBSD.org>

Measure latency of PMC interruptions

Add HWPMC events to measure latency.
Provide sysctl to choose the number of outstanding events which
trigger HWPMC event.

Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D31283


# 67f508db 10-Aug-2021 Alexander Motin <mav@FreeBSD.org>

Mark some sysctls as CTLFLAG_MPSAFE.

MFC after: 2 weeks


# 6eb60f5b 09-Mar-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

Use the word "LinuxKPI" instead of "Linux compatibility", to not confuse with
user-space Linux compatibility support. No functional change.

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# fa2528ac 18-Feb-2021 Alex Richardson <arichardson@FreeBSD.org>

Use atomic loads/stores when updating td->td_state

KCSAN complains about racy accesses in the locking code. Those races are
fine since they are inside a TD_SET_RUNNING() loop that expects the value
to be changed by another CPU.

Use relaxed atomic stores/loads to indicate that this variable can be
written/read by multiple CPUs at the same time. This will also prevent
the compiler from doing unexpected re-ordering.

Reported by: GENERIC-KCSAN
Test Plan: KCSAN no longer complains, kernel still runs fine.
Reviewed By: markj, mjg (earlier version)
Differential Revision: https://reviews.freebsd.org/D28569


# aba10e13 25-Jul-2020 Alexander Motin <mav@FreeBSD.org>

Allow swi_sched() to be called from NMI context.

For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# f912e8f2 10-Feb-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt
handler.

Interrupt handlers are removed via intr_event_execute_handlers() when
IH_DEAD is set. The thread removing the interrupt is woken up, and
calls intr_event_update(). When this happens, the ie_hflags are
cleared and re-built from all the remaining handlers sharing the
event. When the last IH_NET handler is removed, the IH_NET flag will
be cleared from ih_hflags (or ie_hflags may still be being rebuilt in
a different context), and the ithread_execute_handlers() may return
with ie_hflags missing IH_NET. This can lead to a scenario where
IH_NET was present before calling ithread_execute_handlers, and is not
present at its return, meaning the need for epoch must be cached
locally.

This can happen when loading and unloading network drivers. Also make
sure the ie_hflags is not cleared before being updated.

This is a regression issue after r357004.

Backtrace:
panic()
# trying to access epoch tracker on stack of dead thread
_epoch_enter_preempt()
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()

Differential Revision: https://reviews.freebsd.org/D23483
Reviewed by: glebius@, gallatin@, mav@, jeff@ and kib@
Sponsored by: Mellanox Technologies


# 511d1afb 22-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Enter the network epoch for interrupt handlers of INTR_TYPE_NET.

Provide tunable to limit how many times handlers may be executed
without reentering epoch.

Differential Revision: https://reviews.freebsd.org/D23242


# c4eb6630 22-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Add ie_hflags to struct intr_event, which accumulates flags from all
handlers on this event. For now handle only IH_ENTROPY in that manner.


# 686bcb5c 15-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

schedlock 4/4

Don't hold the scheduler lock while doing context switches. Instead we
unlock after selecting the new thread and switch within a spinlock
section leaving interrupts and preemption disabled to prevent local
concurrency. This means that mi_switch() is entered with the thread
locked but returns without. This dramatically simplifies scheduler
locking because we will not hold the schedlock while spinning on
blocked lock in switch.

This change has not been made to 4BSD but in principle it would be
more straightforward.

Discussed with: markj
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22778


# 61a74c5c 15-Dec-2019 Jeff Roberson <jeff@FreeBSD.org>

schedlock 1/4

Eliminate recursion from most thread_lock consumers. Return from
sched_add() without the thread_lock held. This eliminates unnecessary
atomics and lock word loads as well as reducing the hold time for
scheduler locks. This will eventually allow for lockless remote adds.

Discussed with: kib
Reviewed by: jhb
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22626


# 4b28d96e 13-Dec-2019 John Baldwin <jhb@FreeBSD.org>

Remove the deprecated timeout(9) interface.

All in-tree consumers have been converted to callout(9).

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D22602


# 5d0e8299 24-May-2019 Conrad Meyer <cem@FreeBSD.org>

Disable intr_storm_threshold mechanism by default

The ixl.4 manual page has documented that the threshold falsely detects
interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen
similar false positives with the ioat(4) DMA device (which can push GB/s).

For example, synthetic load can be generated with tools/tools/ioat
'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an
interrupt for each one, and do this for 1000 milliseconds). With
storm-detection disabled, the Broadwell-EP version of this device is capable
of generating ~350k real interrupts per second.

The following historical context comes from jhb@: Originally, the threshold
worked around incorrect routing of PCI INTx interrupts on single-CPU systems
which would end up in a hard hang during boot. Since the threshold was
added, our PCI interrupt routing was improved, most PCI interrupts use
edge-triggered MSI instead of level-triggered INTx, and typical systems have
multiple CPUs available to service interrupts.

On the off chance that the threshold is useful in the future, it remains
available as a tunable and sysctl.

Reviewed by: jhb
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20401


# 4e255d74 10-May-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Bind TCP HPTS (pacer) threads to NUMA domains

Bind the TCP pacer threads to NUMA domains and build per-domain
pacer-thread lookup tables. These tables allow us to use the
inpcb's NUMA domain information to match an inpcb with a pacer
thread on the same domain.

The motivation for this is to keep the TCP connection local to a
NUMA domain as much as possible.

Thanks to jhb for pre-reviewing an earlier version of the patch.

Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20134


# 6d51c0fe 24-Mar-2019 Ian Lepore <ian@FreeBSD.org>

Revert accidental change that should not have been included in r345475.
I had changed this value as part of a local experiment, and neglected to
change it back before committing the other changes.


# 67da50a0 24-Mar-2019 Ian Lepore <ian@FreeBSD.org>

Truncate a too-long interrupt handler name when there is only one handler.

There are only 19 bytes available for the name of an interrupt plus the
name(s) of handlers/drivers using it. There is a mechanism from the days of
shared interrupts that replaces some of the handler names with '+' when they
don't all fit into 19 bytes.

In modern times there is typically only one device on an interrupt, but long
device names are the norm, especially with embedded systems. Also, in systems
with multiple interrupt controllers, the names of the interrupts themselves
can be long. For example, 'gic0,s54: imx6_anatop0' doesn't fit, and
replacing the device driver name with a '+' provides no useful info at all.

When there is only one handler but its name was too long to fit, this
change truncates enough leading chars of the handler name (replacing them
with a '-' char to indicate that some chars are missing) to use all 19
bytes, preserving the unit number typically on the end of the name. Using
the prior example, this results in: 'gic0,s54:-6_anatop0' which provides
plenty of info to figure out which device is involved.

PR: 211946
Reviewed by: gonzo@ (prior version without the '-' char)
Differential Revision: https://reviews.freebsd.org/D19675


# 82a5a275 17-Dec-2018 Andriy Gapon <avg@FreeBSD.org>

add support for marking interrupt handlers as suspended

The goal of this change is to fix a problem with PCI shared interrupts
during suspend and resume.

I have observed a couple of variations of the following scenario.
Devices A and B are on the same PCI bus and share the same interrupt.
Device A's driver is suspended first and the device is powered down.
Device B generates an interrupt. Interrupt handlers of both drivers are
called. Device A's interrupt handler accesses registers of the powered
down device and gets back bogus values (I assume all 0xff). That data is
interpreted as interrupt status bits, etc. So, the interrupt handler
gets confused and may produce some noise or enter an infinite loop, etc.

This change affects only PCI devices. The pci(4) bus driver marks a
child's interrupt handler as suspended after the child's suspend method
is called and before the device is powered down. This is done only for
traditional PCI interrupts, because only they can be shared.

At the moment the change is only for x86.

Notable changes in core subsystems / interfaces:
- BUS_SUSPEND_INTR and BUS_RESUME_INTR methods are added to bus
interface along with convenience functions bus_suspend_intr and
bus_resume_intr;
- rman_set_irq_cookie and rman_get_irq_cookie functions are added to
provide a way to associate an interrupt resource with an interrupt
cookie;
- intr_event_suspend_handler and intr_event_resume_handler functions
are added to the MI interrupt handler interface.

I added two new interrupt handler flags, IH_SUSP and IH_CHANGED, to
implement the new intr_event functions. IH_SUSP marks a suspended
interrupt handler. IH_CHANGED is used to implement a barrier that
ensures that a change to the interrupt handler's state is visible
to future interrupts.
While there, I fixed some whitespace issues in comments and changed a
couple of logically boolean variables to be bool.

MFC after: 1 month (maybe)
Differential Revision: https://reviews.freebsd.org/D15755


# 19fa89e9 25-Aug-2018 Mark Murray <markm@FreeBSD.org>

Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR: 230870
Reviewed by: cem
Approved by: so(delphij,gtetlow)
Approved by: re(marius)
Differential Revision: https://reviews.freebsd.org/D16898


# e0fa977e 03-Aug-2018 Andriy Gapon <avg@FreeBSD.org>

safer wait-free iteration of shared interrupt handlers

The code that iterates a list of interrupt handlers for a (shared)
interrupt, whether in the ISR context or in the context of an interrupt
thread, does so in a lock-free fashion. Thus, the routines that modify
the list need to take special steps to ensure that the iterating code
has a consistent view of the list. Previously, those routines tried to
play nice only with the code running in the ithread context. The
iteration in the ISR context was left to a chance.

After commit r336635 atomic operations and memory fences are used to
ensure that ie_handlers list is always safe to navigate with respect to
inserting and removal of list elements.

There is still a question of when it is safe to actually free a removed
element.

The idea of this change is somewhat similar to the idea of the epoch
based reclamation. There are some simplifications comparing to the
general epoch based reclamation. All writers are serialized using a
mutex, so we do not need to worry about concurrent modifications. Also,
all read accesses from the open context are serialized too.

So, we can get away just two epochs / phases. When a thread removes an
element it switches the global phase from the current phase to the other
and then drains the previous phase. Only after the draining the removed
element gets actually freed. The code that iterates the list in the ISR
context takes a snapshot of the global phase and then increments the use
count of that phase before iterating the list. The use count (in the
same phase) is decremented after the iteration. This should ensure that
there should be no iteration over the removed element when its gets
freed.

This commit also simplifies the coordination with the interrupt thread
context. Now we always schedule the interrupt thread when removing one
of handlers for its interrupt. This makes the code both simpler and
safer as the interrupt thread masks the interrupt thus ensuring that
there is no interaction with the ISR context.

P.S. This change matters only for shared interrupts and I realize that
those are becoming a thing of the past (and quickly). I also understand
that the problem that I am trying to solve is extremely rare.

PR: 229106
Reviewed by: cem
Discussed with: Samy Al Bahra
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D15905


# 111b043c 22-Jul-2018 Andriy Gapon <avg@FreeBSD.org>

change interrupt event's list of handlers from TAILQ to CK_SLIST

The primary reason for this commit is to separate mechanical and nearly
mechanical code changes from an upcoming fix for unsafe teardown of
shared interrupt handlers that have only filters (see D15905).

The technical rationale is that SLIST is sufficient. The only operation
that gets worse performance -- O(n) instead of O(1) is a removal of a
handler, but it is not a critical operation and the list is expected to
be rather short.

Additionally, it is easier to reason about SLIST when considering the
concurrent lock-free access to the list from the interrupt context and
the interrupt thread.

CK_SLIST is used because the upcoming change depends on the memory order
provided by CK_SLIST insert and the fact that CL_SLIST remove does not
trash the linkage in a removed element.

While here, I also fixed a couple of whitespace issues, made code under
ifdef notyet compilable, added a lock assertion to ithread_update() and
made intr_event_execute_handlers() static as it had no external callers.

Reviewed by: cem (earlier version)
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D16016


# a0638b33 24-May-2018 Conrad Meyer <cem@FreeBSD.org>

Yank crufty INTR_FILTER option

It was introduced to the tree in r169320 and r169321 in May 2007.

It never got much use and never became a kernel default. The code
duplicates the default path quite a bit, with slight modifications. Just
yank out the cruft. Whatever goals were being aimed for can probably be met
within the existing framework, without a flag day option.

Mostly mechanical change: 'unifdef -m -UINTR_FILTER'.

Reviewed by: mmacy
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15546


# fc2e87be 19-May-2018 Matt Macy <mmacy@FreeBSD.org>

intr unbreak KTR/LINT build


# ba3f7276 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

intr: eliminate / annotate unused stack locals


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 29dfb631 03-May-2017 Conrad Meyer <cem@FreeBSD.org>

Extend cpuset_get/setaffinity() APIs

Add IRQ placement-only and ithread-only API variants. intr_event_bind
has been extended with sibling methods, as it has many more callsites in
existing code.

Reviewed by: kib@, adrian@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10586


# 83c9dea1 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156


# 01f5e086 21-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

The part of r285680 which removed release semantic for two stores to
it_need was wrong [*]. Restore the releases and add a comment
explaining why it is needed.

Noted by: alc [*]
Reviewed by: bde [*]
Sponsored by: The FreeBSD Foundation


# 1b79b949 19-Jul-2015 Kirk McKusick <mckusick@FreeBSD.org>

Restructure code for readability improvement. No functional change.

Reviewed by: kib


# 283dfee9 18-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Further cleanup after r285607.

Remove useless release semantic for some stores to it_need. For
stores where the release is needed, add a comment explaining why.

Fence after the atomic_cmpset() op on the it_need should be acquire
only, release is not needed (see above). The combination of
atomic_cmpset() + fence_acq() is better expressed there as
atomic_cmpset_acq().

Use atomic_cmpset() for swi' ih_need read and clear.

Discussed with: alc, bde
Reviewed by: bde
Comments wording provided by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 70a3efc1 15-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Do not use atomic_swap_int(9), it is not available on all
architectures. Atomic_cmpset_int(9) is a direct replacement, due to
loop. The change fixes arm, arm64, mips an sparc64, which lack
atomic_swap().

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 615b6ea2 15-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Reset non-zero it_need indicator to zero atomically with fetching its
current value. It is believed that the change is the real fix for the
issue which was covered over by the r252683.

With the current code, if the interrupt handler sets it_need between
read and consequent reset, the update could be lost and
ithread_execute_handlers() would not be called in response to the lost
update.

The r252683 could have hide the issue since at the moment of commit,
atomic_load_acq_int() did locked cmpxchg on the variable, which puts
the cache line into the exclusive owned state and clears store
buffers. Then the immediate store of zero has very high chance of
reusing the exclusive state of the cache line and make the load and
store sequence operate as atomic swap.

For now, add the acq+rel fence immediately after the swap, to not
disturb current (but excessive) ordering. Acquire is needed for the
ih_need reads after the load, while release does not serve a useful
purpose [*].

Reviewed by: alc
Noted by: alc [*]
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 03bbcb2f 15-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

Style. Remove excessive brackets. Compare non-boolean with zero.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# d1b06863 30-Jun-2015 Mark Murray <markm@FreeBSD.org>

Huge cleanup of random(4) code.

* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.

* src/share/man/...
- Update the man pages.

* src/etc/...
- The startup/shutdown work is done in D2924.

* src/UPDATING
- Add UPDATING announcement.

* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.

* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.

* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.

* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.

* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.

* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.

* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.

* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.

* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])

* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.

Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)


# ed95805e 30-Apr-2015 John Baldwin <jhb@FreeBSD.org>

Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains. Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead. Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
components for PV kernels. These components are now standard as they
are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision: https://reviews.freebsd.org/D2362
Reviewed by: royger
Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes: yes


# 10cb2424 30-Oct-2014 Mark Murray <markm@FreeBSD.org>

This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random.

This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources.

The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people.

The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway.

Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to.

My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise.

My Nomex pants are on. Let the feedback commence!

Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?)
Approved by: so(des)


# 3fe93b94 18-Oct-2014 Adrian Chadd <adrian@FreeBSD.org>

Convert a missed u_char cpu -> int cpu.

This was caught by a gcc build.

Reported by: luigi
Sponsored by: Norse Corp, Inc.


# b7627840 04-Oct-2014 Konstantin Belousov <kib@FreeBSD.org>

Add kernel option KSTACK_USAGE_PROF to sample the stack depth on
interrupts and report the largest value seen as sysctl
debug.max_kstack_used. Useful to estimate how close the kernel stack
size is to overflow.

In collaboration with: Larry Baird <lab@gta.com>
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week


# 066da805 17-Sep-2014 Adrian Chadd <adrian@FreeBSD.org>

Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
than u_char.

Migrate post_filter to use an int for a CPU rather than u_char.

Change intr_event_bind() to use an int for CPU rather than u_char.

It touches the ppc, sparc64, arm and mips machdep code but it should
(hah!) be a no-op.

Tested:

* i386, AMD64 laptops

Reviewed by: jhb


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 81198539 22-Jun-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection: jhb
Tested by: hiren
MFC after: 2 weeks
Sponsored by: Yandex LLC


# f02e47dc 04-Oct-2013 Mark Murray <markm@FreeBSD.org>

Snapshot. This passes the build test, but has not yet been finished or debugged.

Contains:

* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).

* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.

* Remove device write entropy harvesting. This provided a weak
attack vector, was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.

* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some secuity in the case where more than one
is present.

* Review all the code and fix anything obviously messy or inconsistent.
Address som review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.

Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)


# c495c935 26-Aug-2013 Mark Murray <markm@FreeBSD.org>

Snapshot; Do some running repairs on entropy harvesting. More needs to follow.


# 3eebd44d 03-Jul-2013 Alfred Perlstein <alfred@FreeBSD.org>

The change in r236456 (atomic_store_rel not locked) exposed a bug
in the ithread code where we could lose ithread interrupts if
intr_event_schedule_thread() is called while the ithread is already
running. Effectively memory writes could be ordered incorrectly
such that the unatomic write of 0 to ithd->it_need (in ithread_loop)
could be delayed until after it was set to be triggered again by
intr_event_schedule_thread().

This was particularly a big problem for CAM because CAM optimizes
scheduling of ithreads such that it only signals camisr() when it
queues to an empty queue. This means that additional completion
events would not unstick CAM and quickly lead to a complete lockup
of the CAM stack.

To fix this use atomics in all places where ithd->it_need is accessed.

Submitted by: delphij, mav
Obtained from: TrueOS, iXsystems
MFC After: 1 week


# b887a155 05-Apr-2013 Konstantin Belousov <kib@FreeBSD.org>

If filter of the interrupt event is not null, print it, in addition to
the handler address. Add a mark to distinguish between filter and
handler.

Note that the arguments for both filter and handler are same.

Sponsored by: The FreeBSD Foundation
Reviewed by: jhb
MFC after: 1 week


# dbd2e167 04-Mar-2013 Davide Italiano <davide@FreeBSD.org>

MFcalloutng (r244355):
Make loadavg calculation callout direct. There are several reasons for it:
- it is very simple and doesn't worth context switch to SWI;
- since SWI is no longer used here, we can remove twelve years old hack,
excluding this SWI from from the loadavg statistics;
- it fixes problem when eventtimer (HPET) shares interrupt with some other
device, and that interrupt thread counted as permanent loadavg of 1; now
loadavg accounted before that interrupt thread is scheduled.

Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo, marius, ian, Fabian Keil, markj


# dae3dc73 06-Feb-2013 Neel Natu <neel@FreeBSD.org>

If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

The text used above is verbatim from r195249 and the code should now be
in line with the intent of that commit.


# d95dca1d 25-Sep-2012 John Baldwin <jhb@FreeBSD.org>

Add optional entropy harvesting for software interrupts in swi_sched()
as controlled by kern.random.sys.harvest.swi. SWI harvesting feeds into
the interrupt FIFO and each event is estimated as providing a single bit of
entropy.

Reviewed by: markm, obrien
MFC after: 2 weeks


# c9516c94 06-Aug-2012 Alexander Kabaev <kan@FreeBSD.org>

Do not add handler to event handlers list until ithread is created.

In rare event when fast and ithread interrupts share the same vector
and the fast handler was registered first, we can end up trying to
schedule the ithread that is not created yet. The kernel built with
INVARIANTS then triggers an assertion.

Change the order to create the ithread first and only then add the
handler that needs it to the interrupt event handlers list.

Reviewed by: jhb


# 85729c2c 09-Mar-2012 Juli Mallett <jmallett@FreeBSD.org>

Export intrcnt correctly when running under 32-bit compatibility.

Reviewed by: gonzo, nwhitehorn


# 44ad5475 08-Mar-2012 John Baldwin <jhb@FreeBSD.org>

Add a new sched_clear_name() method to the scheduler interface to clear
the cached name used for KTR_SCHED traces when a thread's name changes.
This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
name changes, most commonly via execve().

MFC after: 2 weeks


# 037f43d3 16-Jan-2012 Sergey Kandaurov <pluknet@FreeBSD.org>

Be pedantic and change // comment to C-style one.

Noticed by: Bruce Evans


# dc15eac0 01-Jan-2012 Ed Schouten <ed@FreeBSD.org>

Use strchr() and strrchr().

It seems strchr() and strrchr() are used more often than index() and
rindex(). Therefore, simply migrate all kernel code to use it.

For the XFS code, remove an empty line to make the code identical to
the code in the Linux kernel.


# 521ea19d 18-Jul-2011 Attilio Rao <attilio@FreeBSD.org>

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is previewed for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


# 5bd186a6 26-Apr-2011 Jeff Roberson <jeff@FreeBSD.org>

- Catch up to falloc() changes.
- PHOLD() before using a task structure on the stack.
- Fix a LOR between the sleepq lock and thread lock in _intr_drain().


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# d3305205 11-Jan-2011 John Baldwin <jhb@FreeBSD.org>

- Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and
PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY.
- Add a macro PI_SWI() that takes a SWI_* constant as an argument and
returns the suitable thread priority.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 1f255bd3 10-Jun-2010 Alexander Motin <mav@FreeBSD.org>

Store interrupt trap frame into struct thread. It allows interrupt handler
to obtain both trap frame and opaque argument submitted on registrction.
After kernel and all drivers get used to it, legacy hack can be removed.

Reviewed by: jhb@


# 932a288f 28-Feb-2010 Andriy Gapon <avg@FreeBSD.org>

MFC r203061: KASSERT contract of return value of interrupt filter

X-MFCto7 after: 1 week


# 89fc20cc 27-Jan-2010 Andriy Gapon <avg@FreeBSD.org>

KASSERT that return value of interrupt filter complies with contract

For example a return value of zero could lead to a stuck level-triggered
interrupt line.

Reviewed by: jhb (for INTR_FILTER case)
MFC after: 3 weeks


# 7b10638c 21-Jan-2010 John Baldwin <jhb@FreeBSD.org>

MFC 198134,198149,198170,198171,198391,200948:
Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64, i386, and sparc64 by
having the nexus(4) driver supply a custom bus_describe_intr method that
invokes a new intr_describe() MD routine which in turn looks up the
associated interrupt event and invokes intr_event_describe_handler().


# 1b9d701f 03-Nov-2009 Attilio Rao <attilio@FreeBSD.org>

Split P_NOLOAD into a per-thread flag (TDF_NOLOAD).
This improvements aims for avoiding further cache-misses in scheduler
specific functions which need to keep track of average thread running
time and further locking in places setting for this flag.

Reported by: jeff (originally), kris (currently)
Reviewed by: jhb
Tested by: Giuseppe Cocomazzi <sbudella at email dot it>


# d0c9a291 15-Oct-2009 John Baldwin <jhb@FreeBSD.org>

Use language more closely resembling English in a panic message.

Pointy hat to: jhb
Submitted by: pluknet


# 37b8ef16 15-Oct-2009 John Baldwin <jhb@FreeBSD.org>

Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().

Requested by: many
Reviewed by: scottl
MFC after: 2 weeks


# cebc7fb1 01-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

Approved by: re (kib)


# 9bd55acf 25-Jun-2009 John Baldwin <jhb@FreeBSD.org>

Return errors from intr_event_bind() to the caller of intr_set_affinity().
Specifically, if a non-root user attempts to bind an interrupt the request
will now report failure with EPERM rather than silently failing with a
successful return code.

MFC after: 1 week


# e84bcd84 18-May-2009 Robert Watson <rwatson@FreeBSD.org>

Binding interrupts to a CPU consists of two parts: setting up CPU
affinity for the interrupt thread, and requesting that underlying
hardware direct interrupts to the CPU. For software interrupt
threads, implement a no-op interrupt event binder that returns
success, so that the interrupt management code will just set the
ithread's affinity and succeed.

Reviewed by: jhb
MFC after: 1 week


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# b386eb91 23-Sep-2008 David E. O'Brien <obrien@FreeBSD.org>

style(9)


# 37e9511f 15-Sep-2008 John Baldwin <jhb@FreeBSD.org>

Expose a new public routine intr_event_execute_handlers() which executes
all the non-filter handlers attached to an interrupt event. This can be
used by device drivers which multiplex their interrupt onto the interrupt
handlers for child devices.


# 6205924a 22-Aug-2008 Kip Macy <kmacy@FreeBSD.org>

Submit a band-aid for interrupt set up race.

MFC after: 1 month


# 2976b312 18-Jul-2008 Kip Macy <kmacy@FreeBSD.org>

revert change from local tree


# 4af83c8c 18-Jul-2008 Kip Macy <kmacy@FreeBSD.org>

import vendor fixes to cxgb


# 04a58b9d 29-Jun-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Remove an unneeded error variable to make clear that if reaching
the end of the function we never return an error.


# 8df78c41 16-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Make SCHED_STATS more generic by adding a wrapper to create the
variables and sysctl nodes.
- In reset walk the children of kern_sched_stats and reset the counters
via the oid_arg1 pointer. This allows us to add arbitrary counters to
the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch().
These types are named SWT_*. These types correspond to SCHED_STATS
counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats.
There are now stats for idle switches, remote idle wakeups, remote
preemption ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include
how much migration there is, how often affinity was successful, how
often threads were migrated to the local cpu on wakeup, etc.

Sponsored by: Nokia


# 9b33b154 10-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


# 8aa9e82e 05-Apr-2008 John Baldwin <jhb@FreeBSD.org>

Move INTR_FILTER from opt_global.h to its own header.


# 1ee1b687 05-Apr-2008 John Baldwin <jhb@FreeBSD.org>

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


# e4b1aa62 03-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Fix a mis-merge that crept in during the softclock changes.

Spotted by: jhb


# 8d809d50 02-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

Implement per-cpu callout threads, wheels, and locks.

- Move callout thread creation from kern_intr.c to kern_timeout.c
- Call callout_tick() on every processor via hardclock_cpu() rather than
inspecting callout internal details in kern_clock.c.
- Remove callout implementation details from callout.h
- Package up all of the global variables into a per-cpu callout structure.
- Start one thread per-cpu. Threads are not strictly bound. They prefer
to execute on the native cpu but may migrate temporarily if interrupts
are starving callout processing.
- Run all callouts by default in the thread for cpu0 to maintain current
ordering and concurrency guarantees. Many consumers may not properly
handle concurrent execution.
- The new callout_reset_on() api allows specifying a particular cpu to
execute the callout on. This may migrate a callout to a new cpu.
callout_reset() schedules on the last assigned cpu while
callout_reset_curcpu() schedules on the current cpu.

Reviewed by: phk
Sponsored by: Nokia


# 6d2d1c04 17-Mar-2008 John Baldwin <jhb@FreeBSD.org>

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


# 237fdd78 16-Mar-2008 Robert Watson <rwatson@FreeBSD.org>

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# eaf86d16 14-Mar-2008 John Baldwin <jhb@FreeBSD.org>

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


# 6617724c 12-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


# 539976ff 29-Oct-2007 Julian Elischer <julian@FreeBSD.org>

fix typo in code normally not compiled in.


# 3c1ffc32 28-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Fix typo in code obviously not being compiled on any of my machines.
found by: rdivacky@


# 9ef95d01 26-Oct-2007 Julian Elischer <julian@FreeBSD.org>

rename the process to 'idle' and 'intr' as per jhb.


# ca9a0ddf 26-Oct-2007 Julian Elischer <julian@FreeBSD.org>

if one changes a function's arguments, one must also change the callers.


# 7ab24ea3 26-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Introduce a way to make pure kernal threads.
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.

kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.

All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.

fix top to show kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.

make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and it's swapper thread is now 'swapper'

man page fixes to follow.


# 3745c395 20-Oct-2007 Julian Elischer <julian@FreeBSD.org>

Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.


# 982d11f8 04-Jun-2007 Jeff Roberson <jeff@FreeBSD.org>

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


# 67596082 04-Jun-2007 Attilio Rao <attilio@FreeBSD.org>

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


# 3401f2c1 31-May-2007 Paolo Pisati <piso@FreeBSD.org>

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


# bafe5a31 06-May-2007 Paolo Pisati <piso@FreeBSD.org>

Bring in the reminaing bits to make interrupt filtering work:

o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.

Approved by: re (implicit?)


# 0ae62c18 18-Apr-2007 Nate Lawson <njl@FreeBSD.org>

Bump the interrupt storm detection counter to 1000. My slow fileserver
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk. Change the print logic to print once per second
when the storm is occurring instead of only once. Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.

Reviewed by: jhb
MFC after: 1 week


# e41bcf3c 02-Mar-2007 John Baldwin <jhb@FreeBSD.org>

- Don't do the interrupt storm protection stuff for software interrupt
handlers.
- Use pause() when throtting during an interrupt storm.

Reported by: kris (1)


# f2d619c8 27-Feb-2007 Paolo Pisati <piso@FreeBSD.org>

Do not execute filter only handlers in ithread_execute_handlers():
this fixes the panics when filter only and ithread only handlers where
sharing the same irq .


# ef544f63 22-Feb-2007 Paolo Pisati <piso@FreeBSD.org>

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


# f0393f06 23-Jan-2007 Jeff Roberson <jeff@FreeBSD.org>

- Remove setrunqueue and replace it with direct calls to sched_add().
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.

Discussed with: julian

- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.

Suggested by: jhb

Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.


# c3045318 12-Dec-2006 John Baldwin <jhb@FreeBSD.org>

Add a function to return the MD interrupt source cookie associated with
an interrupt event. Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.


# bc17acb2 12-Dec-2006 John Baldwin <jhb@FreeBSD.org>

Add a comment and fix a whitespace nit.


# ad1e7d28 05-Dec-2006 Julian Elischer <julian@FreeBSD.org>

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


# 8460a577 26-Oct-2006 John Birrell <jb@FreeBSD.org>

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


# 1ca2c018 17-Oct-2006 Bruce Evans <bde@FreeBSD.org>

kern_intr.c:
- Count (scheduling of) software interrupts (SWIs) as SWIs, not as
hardware interrupts.
- Don't count (scheduling of) delayed SWIs as interrupts at all, since
in the delayed case it is expected that there are many more scheduling
calls than handling calls. Perhaps all interrupts should be counted
only when they are handled, but it is only counts of delayed SWIs that
shouldn never be combined with the other counts.

subr_trap.c:
- Count (handling of) Asynchronous System Traps (ASTs) as traps, not as
software interrupts.

Before these changes, the counter for SWIs only counted ASTs, and SWIs
weren't counted separately, but a subcounter for ASTs alone is less
needed than for most other exception sources.

4.4BSD-Lite uses the counters for similar things (actually matching
their names) on its main arches (hp300, ..., !i386) where more of the
exceptions are in hardware.


# 19e9205a 12-Jul-2006 John Baldwin <jhb@FreeBSD.org>

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


# 0f180a7c 17-Apr-2006 John Baldwin <jhb@FreeBSD.org>

Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero. This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread. I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.


# bb141be1 15-Apr-2006 Scott Long <scottl@FreeBSD.org>

Take a better stab at making this compile.


# 83bc5d54 15-Apr-2006 Scott Long <scottl@FreeBSD.org>

Take a stab at making this compile.


# 9477358d 13-Apr-2006 John Baldwin <jhb@FreeBSD.org>

Turn on ithread_destroy() and call it from intr_event_destroy() to tear
down an interrupt event's associated thread (if it has one).


# fe486a37 26-Oct-2005 John Baldwin <jhb@FreeBSD.org>

Add a swi_remove() function to teardown software interrupt handlers. For
now it just calls intr_event_remove_handler(), but at some point it might
also be responsible for tearing down interrupt events created via swi_add.


# e0f66ef8 25-Oct-2005 John Baldwin <jhb@FreeBSD.org>

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


# 10f508d9 15-Sep-2005 John Baldwin <jhb@FreeBSD.org>

Don't disallow sleeping for handlers on swi's since some swi handlers
(like CAM) do sleep in their handlers.

Requested by: scottl


# 51460da8 15-Sep-2005 John Baldwin <jhb@FreeBSD.org>

- Add a new simple facility for marking the current thread as being in a
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.

MFC after: 1 week
Reviewed by: phk


# 943928c9 20-Jun-2005 John Baldwin <jhb@FreeBSD.org>

Simplify the storming logic and remove a variable as a result.

Approved by: re (dwhite)


# aa9aa68d 12-Apr-2005 John Baldwin <jhb@FreeBSD.org>

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 63710c4d 30-Dec-2004 John Baldwin <jhb@FreeBSD.org>

Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.


# d0b4135e 17-Nov-2004 John Baldwin <jhb@FreeBSD.org>

Don't bother exiting storming mode once a second to see if it has gone
away, instead only exit storming mode when an interrupt stops firing long
enough for the ithread to exit the loop and go back to sleep.

Tested by: macrus (cruder version)


# a51dae09 16-Nov-2004 John Baldwin <jhb@FreeBSD.org>

Adjust the interrupt storm handling code to better handle a storm. When
a storm is detected, enter "storming" mode which throttles the interrupt
source such that the handlers are run once every clock tick. Previously
we allowed a full set of storm_threshold interations through the handler
before going back to sleep. Also, this currently will intentionally exit
storming mode once a second to see if the storm has passed.

Tested by: marcus
Discussed with: bde


# 0811d60a 05-Nov-2004 John Baldwin <jhb@FreeBSD.org>

- Make setting of IT_ENTROPY a bit simpler in ithread_update().
- Tweak the updating of the ithread name in ithread_update() so that the
'+' and '*' characters for device names that were too short only get
added at the end after as many device names as possible were fit into
the allocated space. Prior to this, some long devices would result
in '+' chars showing up between two different devices rather than at the
end.


# c957c14d 03-Nov-2004 John Baldwin <jhb@FreeBSD.org>

Revert most of 1.109. Although it improved the situation on one particular
motherboard, in practice the changes resulted in many false positives for
heavy network loads, etc. resulting in poor performance. Also, the
motherboard referenced in the 1.109 log has other problems and simply does
not seem to work with the APIC enabled even with the changes in 1.109. The
correct fix for that board seems to be to not use the APIC at all. One
thing kept from 1.109 is that throttled interrupts are now effectively
polled on every clock tick rather than just 10 times per second.

MFC after: 1 month
Tested by: Shunsuke SHINOMIYA shino at fornext dot org


# d39d4a6e 01-Nov-2004 John Baldwin <jhb@FreeBSD.org>

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


# ed062c8d 04-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


# 2630e4c9 31-Aug-2004 Julian Elischer <julian@FreeBSD.org>

Give setrunqueue() and sched_add() more of a clue as to
where they are coming from and what is expected from them.

MFC after: 2 days


# 2cfe973b 16-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

Annotate call to DELAY() in interrupt storm mitigation as being
something to revisit.

Approved by: re (scottl)


# 6f40c417 05-Aug-2004 Robert Watson <rwatson@FreeBSD.org>

In ithread_schedule(), when we plan to go harvest some entropy as
a result of scheduling an ithread, cut a KTR_INTR trace record so
that it's clear in tracing interrupt activity where and when the
entropy harvesting code is invoked.


# 0c0b25ae 02-Jul-2004 John Baldwin <jhb@FreeBSD.org>

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


# bf0acc27 02-Jul-2004 John Baldwin <jhb@FreeBSD.org>

- Change mi_switch() and sched_switch() to accept an optional thread to
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to pick
a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.


# 05b2c96f 05-Jun-2004 Bruce Evans <bde@FreeBSD.org>

Detect interrupt storms better. The storm detection didn't work at all
with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts
don't repeat immediately. Use DELAY(1) to wait a bit for them to repeat.
This affects all systems. Only delay for the first
(10 * intr_storm_threshold) interrupts (per interrupt handler) so that
this is only a pessimization while warming up. Throttle after calling
the sub-handlers instead of before so that the long delay given by
throttling can be used instead of the DELAY(1) to detect storms after
warming up.

Reduced the throttling period from 1/10 second to 1/hz seconds so that
throttling doesn't destroy performance so much. Interrupts that are
detected as storming are effectively handled by polling at a frequency
of hz Hz. On A7N8X-E's there is another hardware or configuration bug
that makes the throttled frequency closer to 2*hz Hz.


# 7b1fe905 16-Apr-2004 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs in previous commit (mainly an insertion sort error
for declarations, and poorly worded messages).

Fixed some nearby style bugs (unsorted declarations).


# 7870c3c6 16-Apr-2004 John Baldwin <jhb@FreeBSD.org>

- Enable (unmask) interrupt sources earlier in the ithread loop.
Specifically, we used to enable the source after locking sched_lock
and just before we had already decided to do a context switch.
This meant that an ithread could never process more than one interrupt
per context switch. Enabling earlier in the loop before sched_lock is
acquired allows an ithread to handle multiple interrupts per context
switch if interrupts fire very rapidly. For the case of heavy interrupt
load this can reduce the number of context switches (and thus overhead)
as well as reduce interrupt latency.
- Now that we can handle multiple interrupts per context switch, add simple
interrupt storm protection to threaded interrupts. If X number of
consecutive interrupts are triggered before the itherad voluntarily
yields to another thread, then the interrupt thread will sleep with the
associated interrupt source disabled (masked) for 1/10th of a second.
The default value of X is 500, but it can be tweaked via the tunable/
sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then
a message is output to the kernel console on the first occurrence per
interrupt thread. Interrupt storm protection can be disabled completely
by setting this value to 0. There is no scientific reasoning for the
1/10th of a second or 500 interrupts values, so they may require tweaking
at some point in the future.

Tested by: rwatson (an earlier version w/o the storm protection)
Tested by: mux (reportedly made a machine with two PCI interrupts
storming usable rather than hard locked)
Reviewed by: imp


# 60744399 05-Mar-2004 John Baldwin <jhb@FreeBSD.org>

kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by: many


# 29bcc451 24-Jan-2004 Jeff Roberson <jeff@FreeBSD.org>

- Add a flags parameter to mi_switch. The value of flags may be SW_VOL or
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.


# 288e351b 13-Jan-2004 Don Lewis <truckman@FreeBSD.org>

If a device attach routine fails during boot and calls bus_teardown_intr(),
ithread_remove_handler() may fail to remove the interrupt handler if
it decides to let the ithread do the removal. The problem is that during
boot "cold" is set, which causes msleep() to return immediately. This
will cause ithread_remove_handler() to fail to wait for the ithread
to do the removal from the handler TAILQ before freeing the handler
back to the heap. Bad things will happen when some other user of the
TAILQ, such as ithread_add_handler() or the actual ithread attempts to use
the freed handler. Fix the problem by forcing ithread_remove_handler()
to do the actual removal itself if the "cold" flag is set.

Reviewed by: jhb


# 4e3a7a14 20-Nov-2003 Mark Murray <markm@FreeBSD.org>

Fix a major faux pas of mine. I was causing 2 very bad things to
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.

1) is fixed by using spin locks instead.

2) is fixed by preallocating a FIFO (implemented with a STAILQ)
and using elements from this FIFO instead. This turns out
to be rather fast.

OK'ed by: re (scottl)
Thanks to: peter, jhb, rwatson, jake
Apologies to: *


# 3fed54aa 18-Nov-2003 Mark Murray <markm@FreeBSD.org>

Hackfix to patch around a kernel panic I introduced. Real fix to
follow. In the meanwhile, we are not harvesting interrupt entropy.

Approved by: re (jhb)


# 90e3387e 16-Nov-2003 Peter Wemm <peter@FreeBSD.org>

Expand the argument to the ithread enable/disable helper hooks from an
int to something big enough to hold a pointer. amd64 needs this.


# 8bc08464 03-Nov-2003 John Baldwin <jhb@FreeBSD.org>

Don't require INTR_FAST handlers to be exclusive in the MI layer. Instead,
let the MD code choose whether or not to implement such a policy. The new
i386 interrupt code allows multiple FAST handlers for a given source for
example. However, the code does not allow FAST and non-FAST handlers to be
mixed.


# 8b201c42 24-Oct-2003 John Baldwin <jhb@FreeBSD.org>

- Add a DDB command 'show intrcnt' to show the non-zero interrupt counts.
- Add a DDB function to dump the contents of an ithread and optionally
details about each handler in that ithread. This function can be used
by MD code to implement DDB commands that display information about
interrupt sources and their registered handlers.


# 79501b66 01-Jul-2003 Scott Long <scottl@FreeBSD.org>

Make swi_vm be INTR_MPSAFE. On all platforms, it is only used to activate
busdma_swi(). Now that busdma_swi() uses driver-provided locking, this
should be safe.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 67096659 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variable(s).

Found by: FlexeLint


# b1ac98d8 01-May-2003 Julian Elischer <julian@FreeBSD.org>

Move the flag that indicates an idle thread from the KSE to the thread.
It was always referenced via the thread anyhow.

Reviewed by: jhb (a LOOOOONG time ago)


# e674d807 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Add some locking in for a few proc and thread fields.


# 8804bf6b 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Use local struct proc variables to reduce repeated td->td_proc dereferences
and improve readability.


# 9520fc2b 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Adjust a KTR trace to log thread state instead of proc state as that is
more relevant.


# 9b4982bf 04-Mar-2003 John Baldwin <jhb@FreeBSD.org>

Add a WITNESS_WARN() call to verify that we hold no locks after running
a handler from an interrupt thread.


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 4a338afd 17-Feb-2003 Julian Elischer <julian@FreeBSD.org>

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


# c11110ea 14-Feb-2003 Alfred Perlstein <alfred@FreeBSD.org>

Fix crash dumps on ata and scsi.

To fix scsi, don't wait for ithreads if we're dumping, it makes the
debugger sad.

To fix ata, use what appears to be a polling method if we're dumping,
I stole this from tmm but added code to ensure that this change is
only in effect while dumping.

Tested by: des


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 24fbeaf9 28-Dec-2002 Jake Burkholder <jake@FreeBSD.org>

Don't put a newline in KTR traces.


# bb8992b3 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

Instead of (sizeof(source_buffer) - 1) bytes, copy at most
(sizeof(destination_buffer) - 1) bytes into the destination buffer.
This was not harmful because they currently both provide space for
(MAXCOMLEN + 1) bytes.


# e80fb434 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

Use strlcpy() instead of strncpy() to copy NUL terminated strings
for safety and consistency.


# 316ec49a 02-Oct-2002 Scott Long <scottl@FreeBSD.org>

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


# 37c84183 28-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 98f93c07 22-Sep-2002 Jake Burkholder <jake@FreeBSD.org>

Removed unneeded include (missed in last revision).


# e3b6e33c 21-Sep-2002 Jake Burkholder <jake@FreeBSD.org>

Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a
comment.


# 71fad9fd 11-Sep-2002 Julian Elischer <julian@FreeBSD.org>

Completely redo thread states.

Reviewed by: davidxu@freebsd.org


# 65c17e74 05-Sep-2002 David Xu <davidxu@FreeBSD.org>

Remove extra ';'


# 04774f23 01-Aug-2002 Julian Elischer <julian@FreeBSD.org>

Slight cleanup of some comments/whitespace.
Make idle process state more consistant.
Add an assert on thread state.
Clean up idleproc/mi_switch() interaction.
Use a local instead of referencing curthread 7 times in a row
(I've been told curthread can be expensive on some architectures)
Remove some commented out code.
Add a little commented out code (completion coming soon)

Reviewed by: jhb@freebsd.org


# e602ba25 29-Jun-2002 Julian Elischer <julian@FreeBSD.org>

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


# 2d0231f5 29-May-2002 Julian Elischer <julian@FreeBSD.org>

diff reduction from KSE to keep WW-III from happenning on -current


# b106d2f5 11-Apr-2002 John Baldwin <jhb@FreeBSD.org>

- Set the base priority of an ithread that has no handlers when we set its
normal priority.
- Lock sched_lock while we dink with the priorities.
- Remove a few extra blank lines.


# 2b60cfc5 09-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Don't lock the ithread lock in ithread_create(). The ithread isn't on any
lists or in any tables yet so there are no other references to it, thus
we don't need to lock it.


# 6008862b 04-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 2dbd9d5b 09-Mar-2002 Luigi Rizzo <luigi@FreeBSD.org>

Make the DEVICE_POLLING code compile with -Werror and in LINT


# daccb638 11-Feb-2002 Luigi Rizzo <luigi@FreeBSD.org>

MFS: synchronize the code with the version in -stable, specifically:
+ SYSCTL_ULONG -> SYSCTL_UINT
+ some procedure renaming and variable rearrangement
+ fix the 'interface going deaf' problem same as in -stable.


# 2c100766 11-Feb-2002 Julian Elischer <julian@FreeBSD.org>

In a threaded world, differnt priorirites become properties of
different entities. Make it so.

Reviewed by: jhb@freebsd.org (john baldwin)


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# c86b6ff5 05-Jan-2002 John Baldwin <jhb@FreeBSD.org>

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


# e4fc250c 14-Dec-2001 Luigi Rizzo <luigi@FreeBSD.org>

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


# 91f91617 09-Dec-2001 David E. O'Brien <obrien@FreeBSD.org>

Repeat after me -- "Use of ANSI string concatenation can be bad."
In this case, C99's __func__ is properly defined as:

static const char __func__[] = "function-name";

and GCC 3.1 will not allow it to be used in bogus string concatenation.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 6533ba2e 02-Sep-2001 David E. O'Brien <obrien@FreeBSD.org>

Match the declaration in net/netisr.h.

Submitted by: gcc 3.0.1


# 688ebe12 10-Aug-2001 John Baldwin <jhb@FreeBSD.org>

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


# a300519d 29-Jun-2001 John Baldwin <jhb@FreeBSD.org>

Make the schedlock saved critical section state a per-thread property.


# 84bbc4db 25-Jun-2001 John Baldwin <jhb@FreeBSD.org>

Count the switch when an ithread goes idle as a voluntary context switch.

Submitted by: bde


# 2e1aaccc 20-Jun-2001 John Baldwin <jhb@FreeBSD.org>

Preemption by an interrupt thread is an involuntary switch, not a voluntary
one.

Pointy-hat to: me


# 5a280d9c 16-Jun-2001 Peter Wemm <peter@FreeBSD.org>

Add INTR_TYPE_AV so that we can get to the PI_AV priority in the ithread
handlers. This is beneficial since it means that pcm's MPSAFE handler
can get run before things that will block on Giant in the shared irq
case.


# d279178d 01-Jun-2001 Thomas Moestl <tmm@FreeBSD.org>

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by: bde
Reviewed by: bde (some time ago)


# 4d29cb2d 17-May-2001 John Baldwin <jhb@FreeBSD.org>

- Remove the global ithread_list_lock spin lock in favor of per-ithread
sleep locks.
- Delay returning from ithread_remove_handler() until we are certain that
the interrupt handler being removed has in fact been removed from the
ithread.
- XXX: There is still a problem in that nothing protects the kernel from
adding a new handler while the ithread is running, though with our
current architectures this is not a problem.

Requested by: gibbs (2)


# 8bd57f8f 15-May-2001 John Baldwin <jhb@FreeBSD.org>

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


# 6caa8a15 27-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# f34fa851 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# b944b903 27-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Catch up to the mtx_saveintr -> mtx_savecrit change.


# 1f723035 23-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Use (..., "%s", foo) instead of (..., foo) to avoid a warning about a
non-constant format string when calling kthread_create() to create an
ithread.


# 003fb9ec 01-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Ok, the kernel will panic in kmem_malloc() if the kernel map is full, so
malloc with M_WAITOK can't actually return NULL. I wish I could get two
people to give me the same answer about this when I ask...

Submitted by: jake


# 653dd8c2 01-Mar-2001 John Baldwin <jhb@FreeBSD.org>

- Check to see if malloc() returned NULL even with M_WAITOK.
- Add a KASSERT() to ensure an ithread has a backing kernel thread when we
schedule it.
- Don't attempt to preemptively switch to an ithread if p_stat of curproc
is not SRUN.


# 5b270b2a 27-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Sigh. Try to get priorities sorted out. Don't bother trying to
update native priority, it is diffcult to get right and likely
to end up horribly wrong. Use an honestly wrong fixed value
that seems to work; PUSER for user threads, and the interrupt
priority for ithreads. Set it once when the process is created
and forget about it.

Suggested by: bde
Pointy hat: me


# de271f01 21-Feb-2001 John Baldwin <jhb@FreeBSD.org>

Work around a race condition where an interrupt handler can be removed from
an interrupt thread while the interrupt thread is blocked on Giant waiting
to execute the interrupt handler being removed. The result was that the
intrhand structure would be free'd, and we would call 0xdeadc0de. The work
around is to check to see if the interrupt thread is idle when removing a
handler. If not, then we mark the interrupt handler as being dead using
the new IH_DEAD flag and don't remove it from the interrupt threads' list
of handlers. When the interrupt thread resumes, it will see a dead handler
while traversing the list of handlers and will remove the handler then.


# 60f2b032 21-Feb-2001 John Baldwin <jhb@FreeBSD.org>

Just use the ithread->it_proc directly in a KTR tracepoint instead of
assigning a local var to it and using it, as otherwise the local var wasn't
used, and generated a warning in the !KTR case.

Noticed by: bde


# addec20c 21-Feb-2001 John Baldwin <jhb@FreeBSD.org>

Add KTR tracepoints for adding/removing interrupt handlers,
creating/destroying interrupt threads, and updating the state of an
interrupt thread.


# 76bd604e 21-Feb-2001 John Baldwin <jhb@FreeBSD.org>

Fix a bug where the 'ithread' variable was being set in a KASSERT()
condition and thus was not initialized properly in the !INVARIANTS case.

Noticed by: bde
Pointy hat to: me


# 719f43d3 21-Feb-2001 John Baldwin <jhb@FreeBSD.org>

Remove attempt to add in PREEMPTION #ifdef test in MI code that didn't
work because opt_preemption.h wasn't #include'd. Instead, make use of the
do_switch parameter to ithread_schedule() and do the check in the alpha
interrupt code.


# 3e5da754 20-Feb-2001 John Baldwin <jhb@FreeBSD.org>

- Add a new ithread_schedule() function to do the bulk of the work of
scheduling an interrupt thread to run when needed. This has the side
effect of enabling support for entropy gathering from interrupts on
all architectures.
- Change the software interrupt and x86 and alpha hardware interrupt code
to use ithread_schedule() for most of their processing when scheduling
an interrupt to run.
- Remove the pesky Warning message about interrupt threads having entropy
enabled. I'm not sure why I put that in there in the first place.
- Add more error checking for parameters and change some cases that
returned EINVAL to panic on failure instead via KASSERT().
- Instead of doing a documented evil hack of setting the P_NOLOAD flag
on every interrupt thread whose pri was SWI_CLOCK, set the flag
explicity for clk_ithd's proc during start_softintr().


# d5a08a60 11-Feb-2001 Jake Burkholder <jake@FreeBSD.org>

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


# b4151f71 09-Feb-2001 John Baldwin <jhb@FreeBSD.org>

- Move struct ithd to sys/interrupt.h.
- Add a set of MI helper functions for interrupt threads:
- ithread_create() creates a new interrupt thread
- ithread_destroy() destroys an interrupt thread
- ithread_add_handler() attaches a new handler to an interrupt thread
- ithread_remove_handler() detaches a handler from an interrupt thread
- Rename sinthand_add() and sched_swi() to swi_add() and swi_sched()
respectively so that they live in a consistent namespace.
- struct intrhand is no longer a public type. It would be private to
kern_intr.c but the current implementation of fast interrupts on the
alpha requires the type to be exported. However, all handlers should
be treated as void * cookies in the way that new-bus treats them. This
includes references to software interrupt handlers.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 198c5b08 19-Jan-2001 Peter Wemm <peter@FreeBSD.org>

Remove the static splXXX functions and replace them by static __inline
stubs. Remove the xxx_imask variables which have been all but gone for
a while.


# 338c0bc6 30-Dec-2000 Seigo Tanimura <tanimura@FreeBSD.org>

Ignore a net interrupt if the corresponding handler is not
registered.

This fixes panic on my laptop where a spurious arp packet
is received when arp is not ready to run.


# 48786ef4 18-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Fix another sched_sihand -> sched_swi in a KTR trace message.


# 1eb44f02 04-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


# f6a6e37a 04-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

Whitespace. Fix an overly long line.


# fa2fbc3d 18-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

- Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.

Reviewed by: -smp@, jlemon


# 896c2303 15-Nov-2000 John Baldwin <jhb@FreeBSD.org>

- Replace some instances of sched_ithd with sched_swi in KTR tracepoints.
- Assert that Giant is not owned during the main loop of sithd_loop().


# b5d09a79 10-Nov-2000 John Baldwin <jhb@FreeBSD.org>

Ignore the INTR_MPSAFE flag when calculating the priority of an interrupt
thread.


# a924ab97 06-Nov-2000 John Baldwin <jhb@FreeBSD.org>

Minor nit: missed ithd_loop -> sithd_loop in the KTR tracepoints.


# 8088699f 24-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


# 35e0e5b3 20-Oct-2000 John Baldwin <jhb@FreeBSD.org>

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# 1931cf94 05-Oct-2000 John Baldwin <jhb@FreeBSD.org>

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


# 9a94c9c5 13-Sep-2000 John Baldwin <jhb@FreeBSD.org>

- Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
priority out of the x86 nexus code and into a MI ithread_priority()
function in sys/kern/kern_intr.c.
- Remove now-uneeded x86-specific headers from sys/dev/ata/ata-all.c and
sys/pci/pci_compat.c.


# d1f088da 11-Oct-1999 Peter Wemm <peter@FreeBSD.org>

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 54a8c693 21-Apr-1999 Peter Wemm <peter@FreeBSD.org>

Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager. There is further room for optimizing
the interfaces still. All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.


# 1c5bb3ea 10-Nov-1998 Peter Wemm <peter@FreeBSD.org>

add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()


# da653c61 26-Sep-1998 Doug Rabson <dfr@FreeBSD.org>

Start using the new SWI registration system instead of hardwiring everything.


# 18c5a6c4 11-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Implemented dynamic registration of software interrupt handlers. Not
used yet.

Use dummy SWI handlers to avoid some checks for null pointers.


# a23d65bf 14-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


# 9a2daf91 18-Jun-1998 Bruce Evans <bde@FreeBSD.org>

Changed the type of an isa/general interrupt handler to take a
`void *' arg. Fixed or hid most of the resulting type mismatches.
Handlers can now be updated locally (except for reworking their
global declarations in isa_device.h).


# 3900ddb2 11-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

Only build this on i386 for now. I may use it for the alpha later but
currently it doesn't compile.


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# ab36c3d3 16-Apr-1998 Bruce Evans <bde@FreeBSD.org>

Really finish supporting compiling with `gcc -ansi'.


# cfa5673e 10-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Move include of <machine/ipl.h> inside ifndef SMP where it is used, to
avoid getting 'unused include file' warnings in the SMP case.


# fd35ecc3 05-Oct-1997 Nate Williams <nate@FreeBSD.org>

- Hide the 'device doesn't supported shared interrupts' code behind
bootverbose, since the older register_intr() code didn't print out
anything, and the laptop support will cause lots of these un-necessary
messages.


# fbca51f5 21-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Added a half dozen casts to eliminate annoying warnings.


# 77625cfe 19-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Moved splq() to isa/ipl_funcs.c for SMP only.
This is in preperation for moving all cpl accesses behind a critical region lock.


# 1fd0b058 02-Aug-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# c6d372f6 09-Jul-1997 Andrey A. Chernov <ache@FreeBSD.org>

Back out changes for 'conflicts' with IRQ, remove intr_registered()


# 7f533ff7 08-Jun-1997 Andrey A. Chernov <ache@FreeBSD.org>

Add safety check in case "conflicts" keyword specified more times than
needed


# 0c514a25 02-Jun-1997 Doug Rabson <dfr@FreeBSD.org>

The defines INTR_FAST and INTR_EXCL are part of the public interface. The
previous commit made them private which broke things.


# 68352337 02-Jun-1997 Doug Rabson <dfr@FreeBSD.org>

Move interrupt handling code from isa.c to a new file. This should make
isa.c (slightly) more portable and will make my life developing the really
portable version much easier.

Reviewed by: peter, fsmp


# 8c046d14 01-Jun-1997 Peter Wemm <peter@FreeBSD.org>

Move "typedef struct intrec {} intrec" from sys/interrupt.h to kern_intr.c
since that's the only place that it's used.

Submitted by: se (apparently on suggestion from dfr)


# 4407ec71 31-May-1997 Peter Wemm <peter@FreeBSD.org>

<machine/spl.h> -> <machine/ipl.h>
s/intrmask/intrmask_t/g

Reviewed by: bde, se


# f9220cde 28-May-1997 Stefan Eßer <se@FreeBSD.org>

Fix problem reported by PHK: Panic in pcic probe because of NULL pointer
dereference (head->next in intr_disconnect).


# 425f9fda 26-May-1997 Stefan Eßer <se@FreeBSD.org>

Add support for shared interrupts to the kernel. This code is meant
be (eventually) architecture independent. It provides an emulation
of the ISA interrupt registration function register_intr(), but that
function does no longer manipulated the interrupt controller and
interrupt descriptor table, but calls the architecture dependent
function setup_icu() for that purpose.

After theISA/EISA bus code has been modified to directly call the new
interrupt registartion functions (intr_create() and intr_connect()),
the emulation of register_intr() should be dropped.

The C level interrupt handler function should take a (void*) argument,
and the function pointer type (inthand2_t) should defined in some other
place than isa_device.h.

This commit is a pre-requisite for the removal of the PCI specific shared
interrupt code.

Reviewed by: dfr,bde