History log of /freebsd-current/sys/sys/smp.h
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 9fb6718d 24-Apr-2023 Mark Johnston <markj@FreeBSD.org>

smp: Dynamically allocate the stoppcbs array

This avoids bloating the kernel image when MAXCPU is large.

A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer. Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.

PR: 269572
MFC after: never
Reviewed by: mjg, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39806


# d28d74de 14-Dec-2021 sebastien.bini <sebastien.bini@stormshield.eu>

smp.h: make sign conversion explicit

When comparing singed with unsigned the signed value is casted
to unsigned. Make this explicit as it might lead to compilation
warnings otherwise.

Obtained from: Stormshield


# ef50d5fb 23-Sep-2021 Alexander Motin <mav@FreeBSD.org>

x86: Add NUMA nodes into CPU topology.

Depending on hardware, NUMA nodes may match last level caches, or
they may be above them (AMD Zen 2/3) or below (Intel Xeon w/ SNC).
This information is provided by ACPI instead of CPUID, and it is
provided for each CPU individually instead of mask widths, but
this code should be able to properly handle all the above cases.

This change should immediately allow idle stealing in sched_ule(4)
to prefer load from NUMA-local CPUs to remote ones when the node
does not match LLC. Later we may think of how to better handle it
on sched_pickcpu() side.

MFC after: 1 month


# aefe0a8c 28-Jul-2021 Alexander Motin <mav@FreeBSD.org>

Refactor/optimize cpu_search_*().

Remove cpu_search_both(), unused for many years. Without it there is
less sense for the trick of compiling common cpu_search() into separate
cpu_search_lowest() and cpu_search_highest(), so split them completely,
making code more readable. While there, split iteration over children
groups and CPUs, complicating code for very small deduplication.

Stop passing cpuset_t arguments by value and avoid some manipulations.
Since MAXCPU bump from 64 to 256, what was a single register turned
into 32-byte memory array, requiring memory allocation and accesses.
Splitting struct cpu_search into parameter and result parts allows to
even more reduce stack usage, since the first can be passed through
on recursion.

Remove CPU_FFS() from the hot paths, precalculating first and last CPU
for each CPU group in advance during initialization. Again, it was
not a problem for 64 CPUs before, but for 256 FFS needs much more code.

With these changes on 80-thread system doing ~260K uncached ZFS reads
per second I observe ~30% reduction of time spent in cpu_search_*().

MFC after: 1 month


# 6eecc07f 11-Aug-2020 Conrad Meyer <cem@FreeBSD.org>

smp.h: Reconcile definition and declaration of smp_ncpus

The variable is defined unconditionally; declare it unconditionally as well.

It is already initialized to the correct value (1) for !SMP builds.

No functional change.


# e4f58497 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Add smp_rendezvous_cpus_retry

This is a wrapper around smp_rendezvous_cpus which enables use of IPI
handlers which can fail and require retrying.

wait_func argument is added to to provide a routine which can be used to
poll CPU of interest for when the IPI can be retried.

Handlers which succeed must call smp_rendezvous_cpus_done to denote that
fact.

Discussed with: jeff
Differential Revision: https://reviews.freebsd.org/D23582


# 5032fe17 30-Nov-2019 Mateusz Guzik <mjg@FreeBSD.org>

Add a way to inject fences using IPIs

A variant of this facility was already used by rmlocks where IPIs would
enforce ordering.

This allows to elide fences where they are rarely needed and the cost of
IPI (should it be necessary) is cheaper.

Reviewed by: kib, jeff (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21740


# 6b83069e 04-Jan-2019 Conrad Meyer <cem@FreeBSD.org>

Expose threads-per-core and physical core count information

With new sysctls (to the best of our ability do detect them). Restructured
smp.4 slightly for clarity (keep relevant stuff closer to the top) while
documenting.

Reviewed by: markj, jhibbits (ppc parts)
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18322


# da9fba54 21-Dec-2017 Bruce Evans <bde@FreeBSD.org>

Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.
restart_cpus() worked well enough by accident. Before this set of fixes,
resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to
restart) as restart_cpus(). resume_cpus() waited for the wrong cpuset
(stopped_cpus) to become empty, but since mixtures of stopped and suspended
CPUs are not close to working, stopped_cpus must be empty when resuming so
the wait is null -- restart_cpus just allows the other CPUs to restart and
returns without waiting.

Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and
add further kludges to try to keep it working for the XEN case. It
was only used for XEN. It waited on suspended_cpus. This works for
XEN. However, for ACPI, resuming is a 2-step process. ACPI has already
woken up the other CPUs and removed them from suspended_cpus. This
fix records the move by putting them in a new cpuset resuming_cpus.
Waiting on suspended_cpus would give the same null wait as waiting on
stopped_cpus. Wait on resuming_cpus instead.

Add a cpuset toresume_cpus to map the CPUs being told to resume to keep
this separate from the cpuset started_cpus for mapping the CPUs being told
to restart. Mixtures of stopped and suspended/resuming CPUs are still far
from working. Describe new and some old cpusets in comments.

Add further kludges to cpususpend_handler() to try to avoid breaking it
for XEN. XEN doesn't use resumectx(), so it doesn't use the second
return path for savectx(), and it goes from the suspended state directly
to the restarted state, while ACPI resume goes through the resuming state.
Enter the resuming state early for all cases so that resume_cpus can test
for being in this state and not have to worry about the intermediate
!suspended state for ACPI only.

Reviewed by: kib


# 64de3fdd 30-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

SPDX: use the Beerware identifier.


# bb14d564 21-Aug-2017 Conrad Meyer <cem@FreeBSD.org>

subr_smp: Clean up topology analysis, add additional layers

Rather than repeatedly nesting loops, separate concerns with a single loop
per call stack level. Use a table to drive the recursive routine. Handle
missing topology layers more gracefully (infer a single unit).

Analyze some additional optional layers which may be present on e.g. AMD Zen
systems (groups, aka dies, per package; and cachegroups, aka CCXes, per
group).

Display that additional information in the boot-time topology information,
when it is relevent (non-one).

Reviewed by: markj@, mjoras@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12019


# 67d955aa 08-Apr-2017 Patrick Kelsey <pkelsey@FreeBSD.org>

Corrected misspelled versions of rendezvous.

The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().

Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313


# 4725e6bf 04-Apr-2016 Andriy Gapon <avg@FreeBSD.org>

new x86 smp topology detection code

Previously, the code determined a topology of processing units
(hardware threads, cores, packages) and then deduced a cache topology
using certain assumptions. The new code builds a topology that
includes both processing units and caches using the information
provided by the hardware.

At the moment, the discovered full topology is used only to creeate
a scheduling topology for SCHED_ULE.
There is no KPI for other kernel uses.

Summary:
- based on APIC ID derivation rules for Intel and AMD CPUs
- can handle non-uniform topologies
- requires homogeneous APIC ID assignment (same bit widths for ID
components)
- topology for dual-node AMD CPUs may not be optimal
- topology for latest AMD CPU models may not be optimal as the code is
several years old
- supports only thread/package/core/cache nodes

Todo:
- AMD dual-node processors
- latest AMD processors
- NUMA nodes
- checking for homogeneity of the APIC ID assignment across packages
- more flexible cache placement within topology
- expose topology to userland, e.g., via sysctl nodes

Long term todo:
- KPI for CPU sharing and affinity with respect to various resources
(e.g., two logical processors may share the same FPU, etc)

Reviewed by: mav
Tested by: mav
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D2728


# c0ae6688 08-Jan-2015 John Baldwin <jhb@FreeBSD.org>

Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision: https://reviews.freebsd.org/D1232
Submitted by: jeff (kernel bits)
Reviewed by: adrian, jeff


# 60ad8150 26-Apr-2014 Scott Long <scottl@FreeBSD.org>

Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Discussed with: peter, jhb
MFC after: 3 days
Obtained from: Netflix


# 428b7ca2 19-Sep-2013 Justin T. Gibbs <gibbs@FreeBSD.org>

Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that are no MMU related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.

sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.


# 28d91af3 14-Nov-2012 Jeff Roberson <jeff@FreeBSD.org>

- Implement run-time expansion of the KTR buffer via sysctl.
- Implement a function to ensure that all preempted threads have switched
back out at least once. Use this to make sure there are no stale
references to the old ktr_buf or the lock profiling buffers before
updating them.

Reviewed by: marius (sparc64 parts), attilio (earlier patch)
Sponsored by: EMC / Isilon Storage Division


# fb864578 08-Jun-2012 Mitsuru IWASAKI <iwasaki@FreeBSD.org>

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.

Reviewed by: attilio@, acpi@


# e3fd0bc1 18-May-2012 Mitsuru IWASAKI <iwasaki@FreeBSD.org>

Add SMP/i386 suspend/resume support.
Most part is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after: 3 days


# df3f1d68 22-May-2011 Attilio Rao <attilio@FreeBSD.org>

Merge r221901 from largeSMP project branch:
Increase the size of cg_count in order to enable usage of > 127 CPUs.
cg_children is also bumped in order to keep the structure naturally
padded, even if this is not strictly necessary.

Submitted and tested by: sbruno


# d59dd76c 16-May-2011 Attilio Rao <attilio@FreeBSD.org>

Merge r221278 from largeSMP project:
idle_cpus_mask is just used in sched_4bsd, thus make it private for it.

Tested by: several


# bd514531 14-May-2011 Sean Bruno <sbruno@FreeBSD.org>

Increase size of cg_count to allow us to utilize >128 CPUs.

Pad cg_count and cg_children to keep the struct aligned

Reviewed by: attilio@


# 71a19bdc 05-May-2011 Attilio Rao <attilio@FreeBSD.org>

Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno


# 3121f534 30-Apr-2011 Attilio Rao <attilio@FreeBSD.org>

idle_cpus_mask is just used in the SMP case and within sched_4BSD.
Declare appropriately.


# 6413b057 11-Nov-2010 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

Add some platform KOBJ extensions and continue integrating PowerPC
hypervisor infrastructure support:
- Fix coexistence of multiple platform modules in the same kernel
- Allow platform modules to provide an SMP topology
- PowerPC hypervisors limit the amount of memory accessible in real mode.
Allow the platform modules to specify the maximum real-mode address,
and modify the bits of the kernel that need to allocate
real-mode-accessible buffers to respect this limits.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# e762f666 11-Jun-2010 John Baldwin <jhb@FreeBSD.org>

Add helper macros to iterate over available CPUs in the system.
CPU_FOREACH(i) iterates over the CPU IDs of all available CPUs. The
CPU_FIRST() and CPU_NEXT(i) macros can also be used to iterate over
available CPU IDs. CPU_NEXT(i) wraps around to CPU_FIRST() rather than
returning some sort of terminator.

Requested by: rwatson
Reviewed by: attilio


# b57f5ce0 28-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r197390:
Remove forward_roundrobin().

Approved by: re (kensmith)


# 51a6ef34 21-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

Remove forward_roundrobin(), it is unused for quite some time.

Reviewed by: jhb
MFC after: 1 week


# be105717 13-Aug-2009 Attilio Rao <attilio@FreeBSD.org>

MFC r196196:

* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


# dc6fbf65 13-Aug-2009 Attilio Rao <attilio@FreeBSD.org>

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


# 7b55ab05 28-Apr-2009 Jeff Roberson <jeff@FreeBSD.org>

- Remove the bogus idle thread state code. This may have a race in it
and it only optimized out an ipi or mwait in very few cases.
- Skip the adaptive idle code when running on SMT or HTT cores. This
just wastes cpu time that could be used on a busy thread on the same
core.
- Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use
CG_FLAG_THREAD to mean SMT or HTT.

Sponsored by: Nokia


# c66d2b38 16-Mar-2009 Jung-uk Kim <jkim@FreeBSD.org>

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 833b4a13 22-May-2008 John Birrell <jb@FreeBSD.org>

Allow a rendezvous with just a specified CPU too.

Make the API work in the non-smp case too so that a kernel module
can work the same regardless of whether or not it is loaded on a SMP
kernel or not.


# 81aa7175 02-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


# f53d15fe 08-Nov-2007 Stephan Uphoff <ups@FreeBSD.org>

Initial checkin for rmlock (read mostly lock) a multi reader single writer
lock optimized for almost exclusive reader access. (see also rmlock.9)

TODO:
Convert to per cpu variables linkerset as soon as it is available.
Optimize UP (single processor) case.


# 58553b99 24-Oct-2005 John Baldwin <jhb@FreeBSD.org>

Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.


# fdc9713b 30-Apr-2005 Doug White <dwhite@FreeBSD.org>

Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by: ups


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 293968d8 03-Sep-2004 Julian Elischer <julian@FreeBSD.org>

ooops finish last commit.
moved the variables but not the declarations.


# 82a1dfc1 03-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Move 4bsd specific experimental IP code into the 4bsd file.
Move the sysctls into kern.sched


# 6804a3ab 01-Sep-2004 Julian Elischer <julian@FreeBSD.org>

Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after: 5 days


# dd68efd0 27-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


# f1009e1e 23-Aug-2004 Peter Wemm <peter@FreeBSD.org>

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
aquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to aquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


# b2ae7ed7 27-Mar-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Change the type of the various CPU masks to cpumask_t. Note that as
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.


# b6c71225 03-Dec-2003 John Baldwin <jhb@FreeBSD.org>

Fix all users of mp_maxid to use the same semantics, namely:

1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by: re (scottl)
Tested on: i386, amd64, alpha


# 45c1c90f 03-Dec-2003 John Baldwin <jhb@FreeBSD.org>

Export a few SMP related symbols in UP kernels as well. This is needed to
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function. This also means that CPU_ABSENT() is now
always implemented the same on all kernels.

Approved by: re (scottl)


# 798a4596 21-Nov-2003 John Baldwin <jhb@FreeBSD.org>

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


# 107902b8 28-Jun-2003 Jeff Roberson <jeff@FreeBSD.org>

- Add structures for defining cpu topologies more complex than SMP.
smp_topology may be left NULL by architectures which have vanilla SMP
setups.


# 0f33dc7b 20-May-2002 Jake Burkholder <jake@FreeBSD.org>

Forward declare struct thread.


# 752dff3d 06-Mar-2002 Jake Burkholder <jake@FreeBSD.org>

Add needed includes of machine/smp.h, remove nested include in sys/smp.h
so that inlines in machine/smp.h can use variables declared in sys/smp.h.


# 88c99cfb 05-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

Add a new variable mp_maxid. This is used so that per cpu datastructures may
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.

Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.

Reviewed by: jake
Approved by: jake


# 8b3e7871 31-Oct-2001 Marcel Moolenaar <marcel@FreeBSD.org>

Make smp_started volatile in sys/smp.h and remove the volatile
declaration in subr_smp.c. This solves a compile problem with
gcc 3.0.1 (ia64 cross-build).

Reviewed: jhb


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# bb6f838c 31-Jul-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Move CPU_ABSENT() macro to smp.h, where it belongs anyway. It will be
defined to 0 in the non-SMP case, which very much makes sense as it
permits its usage in per-CPU initialization loops (for an example, check
out subr_mbuf.c).
Further, on a UP system, make mb_alloc always use the first per-CPU
container, regardless of cpuid (i.e. remove reliability on cpuid in the
UP case).

Requested by: alfred


# ba228f6d 10-May-2001 John Baldwin <jhb@FreeBSD.org>

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake


# 6caa8a15 27-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


# ca7ef17c 10-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by: peter (the idea)


# 48bed924 27-Jan-2001 Tor Egge <tegge@FreeBSD.org>

Defer assignment of low level interrupt handlers for PCI interrupts
described in the MP table until something asks for the interrupt number
later on.


# a263238c 05-Dec-2000 Peter Wemm <peter@FreeBSD.org>

Move io_apic_{read,write} from apic_ipl.s (where they do not belong) into
mpapic.c. This gives us the benefit of C type checking. These functions
are not called in any critical paths and are not used by the interrupt
routines.


# 9a18987b 05-Dec-2000 Peter Wemm <peter@FreeBSD.org>

GC unused assembler function apic_eoi()


# 5ee171d2 04-Dec-2000 Peter Wemm <peter@FreeBSD.org>

Cleanup some leftover lint from the old interrupt system.
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.

Reviewed by: jhb, jakeb


# 92b123a0 22-Sep-2000 Paul Saab <ps@FreeBSD.org>

Move MAXCPU from machine/smp.h to machine/param.h to fix breakage
with !SMP kernels. Also, replace NCPUS with MAXCPU since they are
redundant.


# 7321545f 22-Sep-2000 Paul Saab <ps@FreeBSD.org>

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


# c866ec47 16-Sep-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Make LINT compile.


# 746a9354 07-Sep-2000 John Baldwin <jhb@FreeBSD.org>

Test for both SMP and I386_CPU being set before generating an error.


# 0384fff8 06-Sep-2000 Jason Evans <jasone@FreeBSD.org>

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


# 8ade1681 18-Aug-2000 Mike Smith <msmith@FreeBSD.org>

Increase the default NAPIC from 1 to 2 as a bandaid until we allocate
these dynamically (ie. typically you shouldn't have to set NAPIC at all)


# e666f57c 05-Aug-2000 Tor Egge <tegge@FreeBSD.org>

Be more verbose when changing APIC ID on an IO APIC.

Don't allow cpu entries in the MP table to contain APIC IDs out of range.

Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.

Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:

- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.

- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.

- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.

- IDs out of range is considered to be in conflict.

During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.

PR: 20312, 18919


# c3c50c4e 31-May-2000 Mike Smith <msmith@FreeBSD.org>

Further fixes for multiple-IO-APIC systems from Tor Egge:

Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.

IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED

Both IOAPICs had ID 0.

Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.

The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.

Submitted by: tegge


# db6a4261 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


# 82916a11 04-Jan-2000 Tor Egge <tegge@FreeBSD.org>

ISA device drivers use the ISA source interrupt number in locations where
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.

Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.

Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# f327fe4c 25-Sep-1999 Matt Jacob <mjacob@FreeBSD.org>

Fix from Tor so that if we enter the debugger in the tristate going to
SMP (other CPUs stopped but SMP mode not really started).

Obtained from:Tor.Egge@fast.no


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# 91fe3dc1 20-Jul-1999 Mike Smith <msmith@FreeBSD.org>

Implement an all-CPU shootdown-style rendezvous facility. This allows
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.

The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.

Reviewed by: peter (partially)


# 5206bca1 27-Apr-1999 Luoqi Chen <luoqi@FreeBSD.org>

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 572d053e 06-Sep-1998 Tor Egge <tegge@FreeBSD.org>

Maintain a mapping from irq number to (ioapic number, int pin) tuple,
and use this when masking/unmasking interrupts.

Maintain a mapping from (iopaic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.

Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.

Don't let an AP enter _cpu_switch before all local apics are initialized.


# 2f1e7069 17-May-1998 Tor Egge <tegge@FreeBSD.org>

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


# 5758c2de 01-Apr-1998 Tor Egge <tegge@FreeBSD.org>

Add two workarounds for broken MP tables:

- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.

- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.


# 300e9a76 01-Apr-1998 Tor Egge <tegge@FreeBSD.org>

Declare some variables modified by interrupt handlers as volatile.


# 8f9110f6 07-Mar-1998 John Dyson <dyson@FreeBSD.org>

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# 5dd528cd 05-Mar-1998 Tor Egge <tegge@FreeBSD.org>

Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.


# 02c1dc3b 03-Mar-1998 Tor Egge <tegge@FreeBSD.org>

When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.


# 3163861c 03-Mar-1998 Tor Egge <tegge@FreeBSD.org>

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


# 221e0c59 03-Mar-1998 Tor Egge <tegge@FreeBSD.org>

forward_statclock and forward_hardclock are located in mp_machdep.c.


# 66095752 24-Feb-1998 John Dyson <dyson@FreeBSD.org>

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


# 82d86d37 01-Jan-1998 Bruce Evans <bde@FreeBSD.org>

Moved the SMP declarations of INTREN() and INTRDIS() to the correct header,
i.e., the same header as corresponding non-SMP #defines.


# eae8fc2c 08-Dec-1997 Steve Passe <fsmp@FreeBSD.org>

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smpyests.h)

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


# 20233f27 07-Sep-1997 Steve Passe <fsmp@FreeBSD.org>

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


# 78292efe 30-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Another round of lock pushdown.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.

Help from: Bruce Evans <bde@zeta.org.au>


# 9a3b3e8b 26-Aug-1997 Peter Wemm <peter@FreeBSD.org>

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


# 8ee0110a 24-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

A clean fix for the spl "deadlock before smp_active" problem.

Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.


# 4a73d99f 20-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Made PEND_INTS default.
Made NEW_STRATEGY default.
Removed misc. old cruft.

Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h

More cleanup in the direction of making splxx()/cpl MP-safe.


# 59964619 19-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Preperation for moving cpl into critical region access.
Several new fine-grained locks.
Control of new FAST_INTR() methods.


# 7e48002a 17-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

Removed volatile from arg to simple_lock & friends.


# b5cdece0 14-Aug-1997 Steve Passe <fsmp@FreeBSD.org>

The promised "better fix" for "Trap 9 When Boot SMP" problem.
We now tsleep() in kthread_init() between start_init()
and prepare_usermode() while waiting for ALL the idle_loop()
processes to come online.

Debugged & tested by: "Thomas D. Dean" <tomdean@ix.netcom.com>

Reviewed by: David Greenman <dg@root.com>


# 98bf2bff 31-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Fixed imen declaration.

Submitted by: Bruce Evans <bde@zeta.org.au>


# 2e6a5b15 30-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Converted the TEST_LOPRIO code to default.


# 412f3e4d 27-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Modified the PEND_INTS algorithm to fix the ISA INT loss problem.

Noticed by: dave adkins <adkin003@gold.tc.umn.edu> and others.


# e27fb7b6 24-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

param.h:
Macros to convert the Lite2 lock manager primitives to the names used
in the kernel proper. This allows us to hide them from the lock
manager till they can be turned on.
smp.h:
declarations for the new simplelock functions.


# d9593fb9 23-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Forced 32bit alignment of struct simple_lock in param.h.

Added declarations of new simple_lock data and functions to smp.h.


# 919bdda1 22-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Coded simple_lock and friends in asm.


# 87a6f310 22-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Last commit didn't take, operator error???


# 03aad533 20-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Pass string arg to apic_dump.


# c5f838ab 12-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Many new test defines, including:
- TEST_CPUSTOP adds stop_cpus()/restart_cpus(), OFF by default
- TEST_ALTTIMER new method for attaching 8259 PIC to APIC
this method avoids 'ExtInt' programming, ON by default
- TIMER_ALL sends 8259/8254 timer INTs to all CPUs, ON by default
- ASMPOSTCODExxx code to display bytes to POST hardware, OFF by default


# ad2e8ff4 08-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

General cleanup of APIC code.
stop_cpus/restart_cpus STILL not working!


# 69f0a823 06-Jul-1997 Steve Passe <fsmp@FreeBSD.org>

Additional debugging functions and macros.
"spurious INTerrupt" support.


# 734d28f8 27-Jun-1997 Steve Passe <fsmp@FreeBSD.org>

Preliminaries for stop_cpus()/restart_cpus().
Both are turned off by default.

Added macro for displaying POST codes from kernel.


# 89e4e0c0 25-Jun-1997 Steve Passe <fsmp@FreeBSD.org>

Modified to declare merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()
- get_pci_apic_irq() -> pci_apic_pin()


# b3196e4b 22-Jun-1997 Peter Wemm <peter@FreeBSD.org>

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


# 08942d46 28-May-1997 Steve Passe <fsmp@FreeBSD.org>

apic.h now has structure definitions for both the local APIC and io APIC.

apic.h has defines like:
#define lapic__id lapic->id

Once private pages and "known virtual addr" mapping of the APICs is
ready all 'lapic__XXX' will be changed to 'lapic.XXX', and the defines
will be removed.

Changes to smp.h for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


# b8d67ee0 28-May-1997 Steve Passe <fsmp@FreeBSD.org>

Add declaration of mp_probe().

This is now called directly from machdep.c.


# 2263dd14 07-May-1997 Peter Wemm <peter@FreeBSD.org>

remove #include "opt_smp.h"
remove declarations for the SMPcurproc[NCPU] etc arrays. There was no
need to mention NCPU there, and they've been moved to their normal home.


# f258030f 06-May-1997 Steve Passe <fsmp@FreeBSD.org>

Force user to config SMP kernel with "options APIC_IO".

Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>


# 2479ac60 05-May-1997 Steve Passe <fsmp@FreeBSD.org>

Code to handle SMP/APIC_IO mapping of ISA INTs to APIC pins above IRQ15.

- doesn't break my system.
- NOT yet verified on the affected motherboard.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


# 004cb623 03-May-1997 Steve Passe <fsmp@FreeBSD.org>

added declaration for get_isa_apic_mask().

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


# a1b71271 01-May-1997 Steve Passe <fsmp@FreeBSD.org>

cleaned up FAST_IPI code.
- one-liners all become inline.
- multi-liners become functions.
- FAST_IPI defines go away.

re-worked APICIPI_BANDAID code.
- now refered to as DETECT_DEADLOCK.
- on by default.


# dff5c18a 30-Apr-1997 Steve Passe <fsmp@FreeBSD.org>

changed expect_lock() to try_lock(), the real name used in mplock.s


# 2c5d02ff 27-Apr-1997 Steve Passe <fsmp@FreeBSD.org>

remove all the SMP_INVLTLB defines, making the code default for APIC_IO.

Reviewed by: informal discussion with Peter Wemm <peter@spinner.DIALix.COM>


# 477a642c 26-Apr-1997 Peter Wemm <peter@FreeBSD.org>

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!